Read File

From Clariopedia

Read File Node Icon

Overview

The clario® node Read File is used to read in either a delimited or fixed-length flat file.

IMPORTANT: All workflows must begin with one or more Read File nodes.

Usage

Files to be read in must be associated with the clario project in which you created your workflow.

Configuration

The Read File node has two configuration faces: Define File and Define Attributes.

Define File
Define File Face

The first step in configuring a Read File node is to click the browse button […], which launches the file browser. Only data that has been associated with the project is displayed. Select the desired file from the list and click [OK]. Once a file has been selected, you can click the raw file preview button to display the first 50 rows of the selected file, unformatted. This is a quick way to learn the structure of the file. To close the raw file preview, click the [x] in the upper right corner.

The second step is to select a file type. Options are ‘delimited’ or ‘fixed’. After you select a file type, the user interface displays different options depending on your selection.

For delimited data files, select the attribute delimiter from the “Delimiter” drop-down list. The available delimiters are comma, semi-colon, pipe, double pipe, tab and space. If values in the file are wrapped in an enclosure (single quote or double quote), the enclosure must be specified. Leave the enclosure blank if none is present.
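
To make the effect of the delimiter and enclosure settings concrete, here is a minimal Python sketch of how a delimited row splits into attributes. It is an analogy only, not clario's implementation: the `DELIMITERS` mapping and the `split_rows` function are hypothetical names introduced for illustration.

```python
import csv
import io

# Hypothetical mapping from the delimiter names in the drop-down list
# to the characters they stand for (an assumption, not clario internals).
DELIMITERS = {
    "comma": ",",
    "semi-colon": ";",
    "pipe": "|",
    "double pipe": "||",
    "tab": "\t",
    "space": " ",
}


def split_rows(text, delimiter="comma", enclosure=None):
    """Split raw text into rows of attribute values."""
    sep = DELIMITERS[delimiter]
    if len(sep) > 1:
        # Python's csv module only accepts one-character delimiters, so the
        # double pipe is handled with a plain split (enclosure is ignored here).
        return [line.split(sep) for line in text.splitlines()]
    if enclosure is None:
        # No enclosure: quote characters are treated as ordinary data.
        reader = csv.reader(io.StringIO(text), delimiter=sep,
                            quoting=csv.QUOTE_NONE)
    else:
        # Enclosure given: delimiters inside the enclosure are kept as data.
        reader = csv.reader(io.StringIO(text), delimiter=sep,
                            quotechar=enclosure)
    return [row for row in reader]


rows = split_rows('a,"b,c",d\n1,2,3', delimiter="comma", enclosure='"')
print(rows)
```

With a double-quote enclosure, the comma inside `"b,c"` stays part of the value, which is why the enclosure must be specified when one is present.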

For either delimited or fixed data files, if header rows are present in the file, check the [header rows] box and specify the number of header rows in the # Rows value. The header rows are excluded from any further data processing, but are used by the attribute guesser on the Define Attributes face.

To read and process a sequential subset (from the beginning) of the data in the selected data file, uncheck [All] and specify a value in the [# Rows to Read From File] field. Leave the [All] box checked to read all rows in the file.
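
The header-row and row-limit settings can be pictured with the short Python sketch below. It is illustrative only: `select_rows` is a hypothetical helper, and it assumes that the [# Rows to Read From File] value counts data rows after the header rows have been skipped, which the text above does not state explicitly.

```python
from itertools import islice


def select_rows(lines, header_rows=0, rows_to_read=None):
    """Return (headers, data) from an iterable of lines.

    header_rows  -- number of leading rows treated as headers
    rows_to_read -- None mimics leaving [All] checked; an integer mimics
                    the [# Rows to Read From File] value (assumed to count
                    data rows only)
    """
    it = iter(lines)
    headers = list(islice(it, header_rows))   # set aside the header rows
    data = list(islice(it, rows_to_read))     # sequential subset from the top
    return headers, data


headers, data = select_rows(["name,zip", "a,1", "b,2", "c,3"],
                            header_rows=1, rows_to_read=2)
print(headers, data)
```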

If errors occur while reading data, you can choose to null the affected values by checking the [Set Errors to Null] box; otherwise leave the box unchecked. For example, if the defined attribute is numeric but one row contains an alphabetic character, the value for that row and attribute is set to NULL. Additionally, checking this box places null values in malformed rows (missing delimiters, errant carriage returns) and returns a count of malformed rows in the run log. If the box is not checked and the read node encounters malformed rows, the run fails.
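
The behavior described above can be sketched in Python as follows. This is a rough illustration, not clario's code: `parse_number_column` and its parameters are hypothetical, and raising an error on a bad value when the box is unchecked is an assumption (the text only states that malformed rows fail the run).

```python
def parse_number_column(rows, index, expected_fields, set_errors_to_null):
    """Parse one numeric attribute from already-split rows.

    Returns (values, malformed_count). A row is treated as malformed when
    it does not have the expected number of fields.
    """
    values = []
    malformed = 0
    for row in rows:
        if len(row) != expected_fields:
            if not set_errors_to_null:
                # Box unchecked: a malformed row fails the run.
                raise ValueError(f"malformed row: {row!r}")
            malformed += 1          # counted, as in the run log
            values.append(None)     # null placed in the malformed row
            continue
        try:
            values.append(float(row[index]))
        except ValueError:
            if not set_errors_to_null:
                raise               # assumption: a bad value also fails the run
            values.append(None)     # e.g. an alphabetic value in a Number attribute
    return values, malformed


values, malformed = parse_number_column(
    [["1", "2"], ["x", "3"], ["4"]], index=0, expected_fields=2,
    set_errors_to_null=True)
print(values, malformed)
```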

Define Attributes
The Define Attributes user interface changes slightly depending on what type of file was selected in the define file node face.
Define Attributes Face
If you specified a delimited file on the Define File face, there is an attribute guesser button in the lower left corner of the face. Clicking this button uses the Define File configuration to guess the Name, Type and, in the case of Date types, Format of each attribute. Name and Type are required for each attribute in the data file. If an attribute's Type is Date, a Format is also required. If the [header rows] check box is checked on the Define File face, the data in the first header row is used to fill in the attribute Names. If the [header rows] check box is not checked, the attribute Names default to attribute1 … attribute’n’. The first 100 data rows (after skipping the number of rows specified as headers in the # Rows value on the Define File face) are used to determine the attribute Type (String, Number or Date).
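
As a rough picture of what an attribute guesser of this kind does, the Python sketch below names attributes from a header row (or defaults them to attribute1 … attribute’n’) and infers a Type from a sample of the first 100 data rows. The inference rules, the `DATE_FORMATS` list, and the function names are all assumptions made for illustration; clario's actual guessing logic and supported date formats are not documented here.

```python
from datetime import datetime

# Hypothetical candidate formats; clario's real list is in its drop-down box.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]


def guess_type(values):
    """Return (type_name, date_format) for one column of sample strings."""
    if not values:
        return "String", None
    try:
        for value in values:
            float(value)                      # every value must parse as a number
        return "Number", None
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)  # every value must match this format
            return "Date", fmt
        except ValueError:
            continue
    return "String", None


def guess_attributes(header, data_rows, sample_size=100):
    """Guess (name, type, format) per attribute from the first sample_size rows."""
    sample = data_rows[:sample_size]
    if header:
        names = list(header)
    else:
        width = len(sample[0]) if sample else 0
        names = [f"attribute{i + 1}" for i in range(width)]
    guesses = []
    for i, name in enumerate(names):
        column = [row[i] for row in sample]   # assumes uniform row length
        guesses.append((name, *guess_type(column)))
    return guesses


print(guess_attributes(None, [["1", "2020-01-02", "x"],
                              ["2.5", "2021-03-04", "y"]]))
```

Because only a sample is inspected, a guessed Type can be wrong for later rows, so the guessed definitions are worth reviewing before running the workflow.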

If you specified a fixed-length file, you must enter the Name, Type, Start Position and Length of each attribute in the data file; Format is only required for Date types. Each data row starts in position 1. It is possible to group together a block of contiguous data attributes by entering the start position of the first attribute and the total length of all attributes combined. This is helpful if you want to treat a group of data as one attribute and just pass it through processing. You can also read only selected attributes by entering the Name, Type, Start Position and Length of the attributes you are interested in and ignoring the remainder of the data file. Also, the same data can be read in multiple times under different attribute definitions. For example, you may want to read all digits of a zipcode (5 digits), all digits of a zip+4 (9 digits), as well as just the sectional center, the first three digits of the zipcode (3 digits).
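
The Start Position and Length fields amount to slicing each row, with positions counted from 1. The Python sketch below illustrates this with the zipcode example; `read_fixed` and the attribute names are hypothetical, chosen only for this illustration.

```python
def read_fixed(line, attributes):
    """Slice one fixed-length row into named attributes.

    attributes is a list of (name, start_position, length) tuples, where
    start_position is 1-based, matching "each data row starts in position 1".
    """
    return {
        name: line[start - 1:start - 1 + length]
        for name, start, length in attributes
    }


# The same nine digits read three ways, as overlapping attributes.
layout = [
    ("zip9", 1, 9),   # all digits of the zip+4
    ("zip5", 1, 5),   # the five-digit zipcode
    ("scf", 1, 3),    # sectional center: first three digits
]
print(read_fixed("123456789", layout))
```

Attributes may overlap or skip parts of the row; any positions not covered by a definition are simply not read.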

Regardless of the type of file selected on the Define File face, there is a formatted file preview button that displays the first 50 rows of data in a grid, using the configuration from both the Define File and Define Attributes faces. This is useful for checking that you have properly defined all attributes.

Additionally, the read node allows users to import attribute definitions directly from a comma-separated file via the "Import from CSV" button.


NOTE: Valid clario data types are String, Number, and Date. For Dates, valid Formats are listed in a drop-down box. If the Date Format is not listed, you can read the attribute in as a string or number and then transform it into the desired format using other clario nodes.


Field Definitions

Read File lets you name attributes, but only certain characters are valid in attribute names: A-Z, a-z, 0-9, "-" and "_". If the attribute guesser has been used and invalid attribute names appear, they are highlighted in red and an error occurs stating: "You must define a valid name for each attribute". To fix the error, click the specific attribute to highlight it, click once more to access the text box, then rename the attribute using only the valid characters listed above.
Invalid Attribute Error
  • Valid types – String, Number, and Date.
  • Valid formats – For date, valid formats are listed in a drop-down box. If the date format is not listed, you can use the transform node to manipulate a string or number attribute into the desired format.
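
The attribute naming rule above (A-Z, a-z, 0-9, "-" and "_") can be checked ahead of time, for example before preparing a header row. The following Python sketch expresses the rule as a regular expression; `is_valid_attribute_name` is a hypothetical helper, and treating an empty name as invalid is an assumption.

```python
import re

# Characters listed as valid for attribute names: A-Z, a-z, 0-9, "-", "_".
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_attribute_name(name):
    """Return True if name uses only the valid characters (and is not empty)."""
    return VALID_NAME.fullmatch(name) is not None


for candidate in ["cust_id-1", "first name", "amount($)"]:
    print(candidate, is_valid_attribute_name(candidate))
```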


Results

There are no results faces for the read node. It is assumed the read node will be connected to another node to actually process the data. In fact, a valid workflow must have at least two nodes (a read node and at least one more node).

Output Stream

Due to the unique design of clario, where data is streamed throughout all processing, there is no direct data output as a result of executing the read node. The read node is designed to stream data into other nodes to explore, manipulate, cleanse, and model the data. The data can be exported at any point in a workflow by using the Write File node.



