Univariate
From Clariopedia
Overview
The clario® Univariate node lets you explore and understand your data through frequency distributions, graphs, and a variety of statistical metadata for each attribute you select. You can connect the Univariate node to a cleanse node (Missing or Outliers) to use the univariate output in the file cleansing process.
Usage
Input Stream
The node connector can be connected to a variety of nodes (e.g., Read, Aggregate, Append, Missing), but it requires a valid stream of data.
Configuration
The Univariate node has only one configuration face.
Configuration Face
The configuration face contains an Available Attributes and a Selected Attributes list box. The Available Attributes list box displays all of the attributes available on the input data stream connected to the input link node connector. Univariate results will be returned for attributes chosen as Selected Attributes.
The first and only step is to select one or more attributes to analyze. Click an attribute in the Available Attributes box to highlight it, then drag and drop the highlighted attributes from the Available Attributes list box to the Selected Attributes list box. You must select at least one attribute to run through the Univariate node.
NOTES:
- To find attribute names efficiently, begin typing an attribute name in the text box directly above Available Attributes. The list jumps to the attribute(s) beginning with the letter(s) you type.
- To select multiple attributes at once, use [Ctrl]+click to select them one at a time, [Shift]+down arrow to select them in order of appearance, or [Shift]+click on the first and last attribute to select every attribute in between.
- To de-select an attribute, click it in the Selected Attributes box and drag and drop it into the Available Attributes box.
- To re-order attributes in the Selected Attributes list, click and hold an attribute and drag it to the desired position within the Selected Attributes box.
Field Definitions
- Valid Inputs – You must link from a valid data stream (e.g., Read, Append, Filter).
Results
The Univariate node has one results face with two tabs: a numeric summary (View Summary) and a results explorer (View Explorer).
View Summary
Click the View Summary button for a high-level summary of all attributes. Each attribute is listed along with several summary statistics: Non-missing rows, Mean, Minimum, Maximum, Standard Deviation, Sum, and Median. You can sort the display by any of these columns by clicking its column heading. Click the [x] at the upper right corner to close any of the results pages.
View Explorer
Click the View Explorer button for more detailed statistics and charts for one attribute at a time. Click on any attribute listed in the left hand box, and details will be displayed in three different sections:
Summary statistics: The gray box in the upper center portion of the screen contains the summary statistics of Mean, Minimum, Maximum, Standard Deviation, Sum, Median and Non-missing rows.
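For readers who want to cross-check the summary figures outside clario, the following is a minimal Python sketch of the same seven statistics. It is not clario's implementation: the function name `summarize`, the treatment of `None` and NaN as missing, and the use of the sample (n - 1) standard deviation are all assumptions made for illustration.

```python
import math
import statistics


def summarize(values):
    """Return the seven summary statistics for one attribute."""
    # Assumption: missing values appear as None or NaN and are skipped.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return {
        "non_missing_rows": len(present),
        "mean": statistics.fmean(present),
        "minimum": min(present),
        "maximum": max(present),
        # Assumption: sample standard deviation (n - 1 denominator);
        # this needs at least two non-missing values.
        "std_dev": statistics.stdev(present),
        "sum": math.fsum(present),
        "median": statistics.median(present),
    }


# Hypothetical attribute with one missing row.
summary = summarize([4, 8, None, 15, 16, 23, 42])
```

If the standard deviation shown in the node differs slightly from this sketch, the likely cause is a population (n denominator) rather than sample formula; `statistics.pstdev` computes the former.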
Bar graph: A bar graph representation of the data is displayed along the right hand side of the screen. As you hover over each bar of the graph, actual value ranges and counts (non-missing rows) are displayed.
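The value ranges and counts behind such a bar graph can be approximated with a simple equal-width histogram. The sketch below is only an illustration: the function name `histogram`, the default of 10 bins, and the equal-width binning rule are assumptions, since the page does not say how the node chooses its ranges.

```python
def histogram(values, bins=10):
    """Return (range_start, range_end, count) for each equal-width bin."""
    # Assumption: missing values appear as None and are not counted.
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    # Fall back to a width of 1 when every value is identical.
    width = (high - low) / bins or 1
    counts = [0] * bins
    for v in present:
        # The maximum value belongs to the last bin, not a bin past the end.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, counts[i])
        for i in range(bins)
    ]


# Hypothetical attribute values split into three ranges.
bars = histogram([1, 2, 2, 3, 10], bins=3)
```

Each tuple corresponds to what hovering over one bar would show: the range covered and the number of non-missing rows falling in it.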
Detailed Statistics: The lower center portion of the screen contains additional univariate statistics. These statistics are displayed in a series of tabs:
| Location | Variability | Moments | Quartile | Extremes |
- Location, Variability, and Moments: Uncorrected Sum of Squares, Corrected Sum of Squares, Standard Error of the Mean, Percent Coefficient of Variation, Variable Sum, Distinct Values, Non-Missing Coverage Percent, Percent Highest Frequency Value, Non-Missing Rows, Processed Rows, z of Minimum, z of Maximum
- Quartile: Minimum, Quartile 1, Median (Quartile 2), Quartile 3, Maximum
- Extremes: 10 Minimum Values, 10 Maximum Values
IMPORTANT: Within each of these univariate tabs, you can resize the columns by clicking and dragging the column heading to the right or left. You can also resize the Results Explorer page by dragging on the three vertical lines button (|||), between the attributes list and the detailed univariate results, or between the univariate results and the graph.
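Several of the detailed statistics listed above have standard textbook definitions, sketched below in Python. The page does not give clario's formulas, so every definition here is an assumption: z of Minimum and z of Maximum are taken as (value - mean) / standard deviation, the percentages use non-missing rows as the denominator except coverage (which uses processed rows), and quartiles use the default method of Python's `statistics.quantiles`. The function name `detailed_statistics` is hypothetical.

```python
import math
import statistics
from collections import Counter


def detailed_statistics(values, n_extremes=10):
    """Sketch of the detailed univariate statistics for one attribute."""
    processed_rows = len(values)
    # Assumption: missing values appear as None.
    present = sorted(v for v in values if v is not None)
    n = len(present)
    mean = statistics.fmean(present)
    std_dev = statistics.stdev(present)  # sample standard deviation
    q1, median, q3 = statistics.quantiles(present, n=4)
    top_count = Counter(present).most_common(1)[0][1]
    return {
        "uncorrected_ss": math.fsum(v * v for v in present),
        "corrected_ss": math.fsum((v - mean) ** 2 for v in present),
        "std_error_of_mean": std_dev / math.sqrt(n),
        # Undefined when the mean is zero; not handled in this sketch.
        "pct_coefficient_of_variation": 100 * std_dev / mean,
        "variable_sum": math.fsum(present),
        "distinct_values": len(set(present)),
        "non_missing_coverage_pct": 100 * n / processed_rows,
        "pct_highest_frequency_value": 100 * top_count / n,
        "non_missing_rows": n,
        "processed_rows": processed_rows,
        "z_of_minimum": (present[0] - mean) / std_dev,
        "z_of_maximum": (present[-1] - mean) / std_dev,
        "quartiles": (present[0], q1, median, q3, present[-1]),
        "minimum_values": present[:n_extremes],
        "maximum_values": present[-n_extremes:],
    }


# Hypothetical attribute with one missing row.
stats = detailed_statistics([4, 8, None, 15, 16, 23, 42])
```

Quartile conventions differ between tools, so small differences in Quartile 1 and Quartile 3 between this sketch and the node's output would not necessarily indicate an error.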
Output Stream
The Univariate node can be connected to a Missing or Outliers node to aid in replacing missing values or capping outliers. The Univariate output can also be written out to a file, for future use, using the Write node.
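To illustrate how univariate output can inform cleansing, the sketch below replaces missing values with the median and caps outliers at quartile-based bounds. This is not a description of how the Missing or Outliers nodes work: the function name `cleanse`, the median replacement, and the 1.5 × interquartile-range capping rule (a common convention) are all assumptions chosen for the example.

```python
import statistics


def cleanse(values, k=1.5):
    """Fill missing values with the median, then cap outliers."""
    # Assumption: missing values appear as None.
    present = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Assumption: cap at k interquartile ranges beyond the quartiles.
    lower, upper = q1 - k * iqr, q3 + k * iqr
    filled = [median if v is None else v for v in values]
    return [min(max(v, lower), upper) for v in filled]


# Hypothetical attribute with one missing row and one extreme value.
cleaned = cleanse([4, 8, None, 15, 16, 23, 42, 500])
```

In this example the missing row receives the median of the non-missing values and the value 500 is pulled down to the upper bound, while all other rows pass through unchanged.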
