Sample

From Clariopedia


Sample Node

Overview

The clario® Sample node performs either simple random sampling or stratified sampling, randomly selecting rows from an input data stream.

Usage

Input Stream

The node connector can be connected to a variety of nodes (e.g. Read, Aggregate, Append, Missing), but it requires a valid stream of data.

Configuration

The Sample node has only one configuration face.

Configuration Face
The configuration face consists of a few drop-down lists and text boxes.
First, specify the seed value (any integer between 1 and 1,000,000,000). Next, specify the number of samples (between 1 and 15). Sample generates a unique value in the Replicate ID Attribute, which is added to the beginning of the output data stream metadata. To rename this attribute, click in the Replicate ID Attribute box and type a new name.
Specify the desired sampling method, Simple Random or Stratified:
  • When Simple Random is selected: Select either Rows or Percent, then specify the corresponding Sample Size.
  • When Stratified is selected: Select the Class Attribute from the drop-down list (only String type attributes are available for selection). Then select either Rows or Percent; the corresponding Stratum value(s) and Sample Size fields will appear in the ‘Sample Size’ box. Type in the value of each stratum to be included in the sample. Lastly, type in the corresponding sample size (number of rows or percent) for each stratum.
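The two sampling methods can be approximated outside clario with a short Python sketch. This is not clario's implementation: the function names, the `replicate_id` key, the per-replicate seed offset, and the choice of sampling without replacement are all assumptions made for illustration, since this page does not specify them.

```python
import random
from collections import defaultdict

def sample_size(population, size, percent):
    """Turn a Rows or Percent setting into a row count, capped at the population."""
    n = round(population * size / 100) if percent else size
    return max(0, min(n, population))

def simple_random_sample(rows, size, seed, percent=False):
    """Simple random sampling without replacement (an assumption)."""
    rng = random.Random(seed)  # the same seed reproduces the same sample
    return rng.sample(rows, sample_size(len(rows), size, percent))

def stratified_sample(rows, class_attr, sizes, seed, percent=False):
    """Sample each stratum separately; `sizes` maps stratum value -> sample size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[class_attr]].append(row)
    sample = []
    for value, size in sizes.items():
        members = strata.get(value, [])
        sample.extend(rng.sample(members, sample_size(len(members), size, percent)))
    return sample

def draw_replicates(rows, n_samples, seed, sampler):
    """Draw n_samples samples; each output row starts with a hypothetical replicate ID."""
    out = []
    for rep in range(1, n_samples + 1):
        # Offsetting the seed per replicate is an illustrative choice only.
        for row in sampler(rows, seed + rep):
            out.append({"replicate_id": rep, **row})
    return out

# Hypothetical input: ten rows with a String class attribute named "segment".
rows = [{"id": i, "segment": "A" if i < 6 else "B"} for i in range(10)]
picked = stratified_sample(rows, "segment", {"A": 2, "B": 1}, seed=42)
```

Here `picked` holds two rows from stratum A and one from stratum B. Passing `percent=True` interprets each size as a percentage of the stratum instead of a row count.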



Field Definitions

Because Sample lets you name the 'Replicate ID Attribute', only a specific set of keys is valid: A-Z, a-z, 0-9, "-", and "_". If an invalid key is pressed while the text box is open, nothing will appear.
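The naming rule above can be expressed as a small check. The sketch below is illustrative only; `is_valid_attribute_name` and `filter_keystrokes` are hypothetical helpers, not part of clario, and treating an empty name as invalid is an assumption.

```python
import re

# Characters the page lists as valid: A-Z, a-z, 0-9, "-", "_".
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_attribute_name(name: str) -> bool:
    """True if every character is valid and the name is non-empty (assumed)."""
    return VALID_NAME.fullmatch(name) is not None

def filter_keystrokes(typed: str) -> str:
    """Mimic the text box: invalid characters are silently dropped."""
    return "".join(ch for ch in typed if VALID_NAME.fullmatch(ch))
```

For example, `filter_keystrokes("Rep ID#1")` drops the space and the `#`, leaving `RepID1`.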

  • Valid Inputs – You must link to a valid data stream (e.g. Read, Append, Filter).
  • When ‘Stratified’ is selected as the sampling method, a valid (String) Class Attribute must be selected from the drop-down list and all Stratum value(s) must be specified in the 'Sample Size' box.

Results

There is one results face for the Sample node which contains the following:
  • Method Name (Simple or Stratified)
  • Total Row Count (total number of rows in the input file)
  • Sample Row Count (total number of rows sampled)
  • Selection Probability
  • Sampling Weight
Note: To ensure consistent and reliable results, the input data stream must be sorted on the selected Class Attribute.
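This page does not define Selection Probability or Sampling Weight. Under the usual survey-sampling definitions, which are assumed here and not confirmed for clario, the selection probability is the sampled fraction n/N and the sampling weight is its inverse, N/n. A minimal sketch with a hypothetical `sampling_summary` helper:

```python
def sampling_summary(method, total_rows, sample_rows):
    """Build a results-style summary using textbook definitions (assumed)."""
    if not 0 < sample_rows <= total_rows:
        raise ValueError("sample_rows must be between 1 and total_rows")
    return {
        "Method Name": method,
        "Total Row Count": total_rows,
        "Sample Row Count": sample_rows,
        # Probability that any given row is selected: n / N.
        "Selection Probability": sample_rows / total_rows,
        # Number of input rows each sampled row represents: N / n.
        "Sampling Weight": total_rows / sample_rows,
    }

summary = sampling_summary("simple", total_rows=1000, sample_rows=100)
```

With 100 rows sampled from 1,000, each row has a one-in-ten chance of selection and each sampled row stands in for ten input rows. For stratified sampling, the same calculation would apply within each stratum.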

Output Stream

Because clario streams data throughout all processing, executing the Sample node produces no direct data output. The Sample node samples a dataset so that other nodes can explore, manipulate, cleanse, and model the data. The data can be exported at any point in a workflow by using the Write File node.
