Univariate

From Clariopedia

Univariate Node

Overview

The clario® Univariate node lets you explore and understand your data by examining frequency distributions, graphs, and a variety of statistical metadata for all attributes. You can connect the Univariate node to a cleanse node (Missing or Outliers) to use the Univariate output in the file cleansing process.

Usage

Input Stream

The node connector can be connected to a variety of nodes (e.g., Read, Aggregate, Append, Missing), but it requires a valid stream of data.

Configuration

The Univariate node has only one configuration face.

Configuration Face

The configuration face contains an Available Attributes and a Selected Attributes list box. The Available Attributes list box displays all of the attributes available on the input data stream connected to the input link node connector. Univariate results will be returned for attributes chosen as Selected Attributes.

Configuration Face

The first and only step is to select the attribute(s) to be analyzed. Click an attribute in the Available Attributes box to highlight it, then drag and drop the highlighted attributes from the Available Attributes list box to the Selected Attributes list box. You must select at least one attribute to run through the Univariate node.

NOTES:

  • To find attribute names efficiently, begin typing an attribute name in the text box directly above Available Attributes. The list jumps to the attribute(s) beginning with the letter(s) you type.
  • To select multiple attributes at once, use [Ctrl]+click to select them one at a time, [Shift]+down arrow to select them in order of appearance, or [Shift]+click on the first and the last attribute to select every attribute in between.
  • To de-select an attribute, click it in the Selected Attributes box and drag and drop it into the Available Attributes box.
  • To re-order attributes in the Selected Attributes list, click and hold an attribute and drag it to the desired position within the Selected Attributes box.



Field Definitions

  • Valid Inputs – You must link from a valid data stream (e.g., Read, Append, Filter).

Results

The Univariate node has one results face with two tabs (Numeric Summary and Results Explorer).

Results Face

View Summary

Click the View Summary button for a high level summary of all attributes. Each attribute is listed, along with several summary statistics. Statistics include: Non-missing rows, Mean, Minimum, Maximum, Standard Deviation, Sum and Median. Any of these columns can be sorted for display, by clicking on the column heading. Click the [x] at the upper right corner to close any of the results pages.

Numeric Summary

View Explorer

Results Explorer: String
Results Explorer: Number

Click the View Explorer button for more detailed statistics and charts for one attribute at a time. Click any attribute listed in the left-hand box, and its details will be displayed in three sections:

Summary statistics: The gray box in the upper center portion of the screen contains the summary statistics of Mean, Minimum, Maximum, Standard Deviation, Sum, Median and Non-missing rows.
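As a rough illustration of what these summary statistics mean, the sketch below computes them for a single attribute in plain Python. It is not clario code: the function name numeric_summary, the use of None for missing values, and the choice of the sample (n - 1) standard deviation are all assumptions made for the example.

```python
import statistics

def numeric_summary(values):
    """Summary statistics for one attribute; None marks a missing value (an assumption)."""
    present = [v for v in values if v is not None]
    return {
        "non_missing_rows": len(present),
        "mean": statistics.mean(present),
        "minimum": min(present),
        "maximum": max(present),
        # Sample standard deviation (n - 1). Whether the node uses the sample
        # or the population form is not stated in this page.
        "standard_deviation": statistics.stdev(present),
        "sum": sum(present),
        "median": statistics.median(present),
    }

# Hypothetical attribute with two missing rows.
ages = [23, 35, None, 41, 35, 52, None, 29]
summary = numeric_summary(ages)
```

Missing rows are dropped before any statistic is computed, which is why Non-missing rows can be smaller than the number of rows processed.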

Bar graph: A bar graph representation of the data is displayed along the right-hand side of the screen. As you hover over each bar of the graph, the actual value range and count (non-missing rows) for that bar are displayed.
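The value ranges and counts behind such a bar graph can be sketched as equal-width binning. This is only an illustration under assumptions: the function name bin_counts, the default number of bars, and the equal-width rule are hypothetical, and the binning the node actually applies is not documented here.

```python
def bin_counts(values, bins=5):
    """Return (range_start, range_end, count) for each bar, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    # Equal-width bins; fall back to a width of 1 when every value is identical.
    width = (hi - lo) / bins or 1
    counts = [0] * bins
    for v in present:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(bins)]

# Hypothetical attribute values with one missing row.
bars = bin_counts([1, 2, 2, 3, 10, None], bins=3)
```

Each tuple corresponds to what the hover text reports for one bar: the range covered and the number of non-missing rows falling in it.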

Detailed Statistics: The lower center portion of the screen contains additional univariate statistics. These statistics are displayed in a series of tabs:



  • Location: Mean, Median, Mode
  • Variability: Standard Deviation, Variance, Range, Interquartile Range
  • Moments: Skewness, Kurtosis, Uncorrected Sum of Squares, Corrected Sum of Squares, Standard Error of the Mean, Percent Coefficient of Variation, Variable Sum, Distinct Values, Non-Missing Coverage Percent, Percent Highest Frequency Value, Non-Missing Rows, Processed Rows, z of Minimum, z of Maximum
  • Quartile: Minimum, Quartile 1, Median (Quartile 2), Quartile 3, Maximum
  • Extremes: 10 Minimum Values, 10 Maximum Values
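For readers who want to check a few of these detailed statistics by hand, the following sketch computes a subset of them with the Python standard library. It is an illustration under assumptions, not the node's implementation: the function name detailed_statistics is hypothetical, the quartiles use the default method of statistics.quantiles, and the variance and standard deviation are the sample (n - 1) forms. Skewness and kurtosis are omitted because several competing formulas exist and this page does not say which one is used.

```python
import math
import statistics

def detailed_statistics(values):
    """A subset of the detailed statistics for one attribute; None marks a missing value."""
    x = [v for v in values if v is not None]
    n = len(x)
    mean = statistics.mean(x)
    sd = statistics.stdev(x)                    # sample standard deviation (assumption)
    q1, _q2, q3 = statistics.quantiles(x, n=4)  # quartile method is an assumption
    return {
        "mode": statistics.mode(x),
        "variance": statistics.variance(x),
        "range": max(x) - min(x),
        "interquartile_range": q3 - q1,
        "uncorrected_sum_of_squares": sum(v * v for v in x),
        "corrected_sum_of_squares": sum((v - mean) ** 2 for v in x),
        "standard_error_of_mean": sd / math.sqrt(n),
        "percent_coefficient_of_variation": 100 * sd / mean,
        "distinct_values": len(set(x)),
        "non_missing_coverage_percent": 100 * n / len(values),
        # z-scores: distance of the extremes from the mean, in standard deviations.
        "z_of_minimum": (min(x) - mean) / sd,
        "z_of_maximum": (max(x) - mean) / sd,
    }

# Hypothetical attribute: eight non-missing rows out of ten processed rows.
stats = detailed_statistics([2, 4, 4, 4, 5, 5, 7, 9, None, None])
```

The corrected sum of squares equals the uncorrected sum of squares minus n times the squared mean, which gives a quick consistency check between the two values.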




IMPORTANT: Within each of these univariate tabs, you can resize the columns by clicking and dragging the column heading to the right or left. You can also resize the Results Explorer page by dragging on the three vertical lines button (|||), between the attributes list and the detailed univariate results, or between the univariate results and the graph.

Output Stream

The Univariate node can be connected to a Missing or Outliers node to aid in replacing missing values or capping outliers. The Univariate output can also be written to a file for future use with the Write node.
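To give a sense of how univariate statistics can feed a cleansing step, the sketch below fills missing values with the median and caps values that lie more than a chosen number of standard deviations from the mean. This is a generic illustration only: the function name fill_and_cap, the z_limit parameter, and both rules are assumptions, and the behavior of the Missing and Outliers nodes is not described on this page.

```python
import statistics

def fill_and_cap(values, z_limit=3.0):
    """Replace missing (None) values with the median, then cap values beyond z_limit."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample standard deviation (assumption)
    low, high = mean - z_limit * sd, mean + z_limit * sd
    filled = [median if v is None else v for v in values]
    # Clamp each value into the [low, high] interval.
    return [min(max(v, low), high) for v in filled]

# Hypothetical attribute with one missing row and one extreme value.
raw = [10, 12, None, 11, 13, 12, 11, 100]
cleaned = fill_and_cap(raw, z_limit=2.0)
```

In this example the missing row receives the median of the non-missing values, and the extreme value is pulled down to the upper cap while the remaining rows are unchanged.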
