Sample
From Clariopedia
Overview
The clario® Sample node performs either simple random sampling or stratified sampling by randomly selecting rows from an input data stream.
Usage
Input Stream
The node connector can be connected to a variety of nodes (e.g., Read, Aggregate, Append, Missing), but it requires a valid stream of data.
Configuration
The Sample node has only one configuration face.
Configuration Face
The configuration face consists of a few drop-down lists and text boxes. First, specify the seed value (any integer between 1 and one billion). Then specify the number of samples (between 1 and 15). Sample will generate a unique value in the Replicate ID Attribute, which is added to the beginning of the output data stream metadata. This attribute can be renamed by clicking in the Replicate ID Attribute box and typing a new name. Finally, specify the sampling method desired, Simple Random or Stratified:
- When Simple Random is selected: Select either Rows or Percent, then specify the corresponding Sample Size.
- When Stratified is selected: Select the Class Attribute from the drop-down list (only String type attributes are available for selection). Then select either Rows or Percent; the corresponding Stratum value(s) and Sample Size fields will appear in the 'Sample Size' box. Type in the value of each stratum to be included in the sample. Lastly, type in the corresponding sample size (number of rows or percent) for each stratum.
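The two sampling methods and the replicate numbering can be illustrated with a minimal Python sketch. This is not clario's implementation. It assumes sampling without replacement, a default attribute name of "Replicate", and that a single seed drives every replicate; all function names here are hypothetical.

```python
import random
from collections import defaultdict


def n_to_draw(population, size, percent):
    """Turn a Rows or Percent setting into a row count, capped at the population."""
    n = round(population * size / 100) if percent else size
    return min(n, population)


def simple_random(rows, size, rng, percent=False):
    """Simple Random: draw rows without replacement (an assumption)."""
    return rng.sample(rows, n_to_draw(len(rows), size, percent))


def stratified(rows, class_attr, sizes, rng, percent=False):
    """Stratified: sample each stratum separately; sizes maps stratum value -> size."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[class_attr]].append(row)
    picked = []
    for value, size in sizes.items():
        members = strata.get(value, [])
        picked.extend(rng.sample(members, n_to_draw(len(members), size, percent)))
    return picked


def replicates(rows, n_samples, seed, draw, id_attr="Replicate"):
    """Repeat a draw n_samples times, putting a replicate ID first in each row."""
    rng = random.Random(seed)  # the same seed reproduces the same samples
    out = []
    for rep in range(1, n_samples + 1):
        for row in draw(rows, rng):
            out.append({id_attr: rep, **row})
    return out


# Hypothetical input: 100 rows, half "east" and half "west".
rows = [{"region": "east" if i % 2 == 0 else "west", "id": i} for i in range(100)]

# Three replicates of a 10-row simple random sample.
simple = replicates(rows, 3, seed=42, draw=lambda r, rng: simple_random(r, 10, rng))

# One stratified sample: 10 percent of "east", 20 percent of "west".
strat = replicates(
    rows, 1, seed=42,
    draw=lambda r, rng: stratified(r, "region", {"east": 10, "west": 20}, rng, percent=True),
)
```

Rerunning with the same seed returns the same rows, which is the usual reason for exposing a seed value in a sampling tool.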
Field Definitions
Because Sample lets you name the 'Replicate ID Attribute', only certain keys are valid in that text box: A-Z, a-z, 0-9, "-", and "_". If an invalid key is pressed while the text box is open, nothing will appear.
- Valid Inputs – You must link to a valid data stream (e.g., Read, Append, Filter).
- When ‘Stratified’ is selected as the sampling method, a valid (string) stratum must be selected from the drop-down list and all Stratum value(s) must be specified in the 'Sample Size' box.
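The key restriction on the Replicate ID Attribute name can be mimicked with a short Python sketch. The function name `accept_keys` is hypothetical, and the code only mirrors the documented behavior of ignoring invalid keys; it is not how clario itself is implemented.

```python
import re

# Characters the text box accepts: A-Z, a-z, 0-9, "-", "_".
VALID_KEY = re.compile(r"[A-Za-z0-9_-]")


def accept_keys(typed):
    """Return what would appear in the box: invalid keys are silently dropped."""
    return "".join(ch for ch in typed if VALID_KEY.fullmatch(ch))


name = accept_keys("Rep ID#1")  # the space and "#" are ignored
```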
Results
There is one results face for the Sample node, which contains the following:
- Method Name (simple, stratified)
- Total Row Count (total number of rows in the input file)
- Sample Row Count (total number of rows sampled)
- Selection Probability
- Sampling Weight
Note: To ensure consistent and reliable results, the input data stream must be sorted on the selected Class Attribute.
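The page does not define how Selection Probability and Sampling Weight are calculated. The Python sketch below assumes the conventional definitions for equal-probability sampling: the selection probability is the sample row count divided by the total row count, and the sampling weight is its inverse. The function name `sample_summary` is hypothetical.

```python
def sample_summary(method, total_rows, sample_rows):
    """Build the results-face fields under the assumed n/N and N/n definitions."""
    if total_rows <= 0 or not 0 < sample_rows <= total_rows:
        raise ValueError("need 0 < sample_rows <= total_rows")
    return {
        "Method Name": method,
        "Total Row Count": total_rows,
        "Sample Row Count": sample_rows,
        # Assumed: chance that any given row is selected.
        "Selection Probability": sample_rows / total_rows,
        # Assumed: number of input rows each sampled row represents.
        "Sampling Weight": total_rows / sample_rows,
    }


summary = sample_summary("simple", total_rows=1000, sample_rows=50)
```

Under these assumptions, a stratified sample would have one probability and one weight per stratum, computed from that stratum's own row counts.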
Output Stream
Because clario streams data throughout all processing, executing the Sample node produces no direct data output. Instead, the node passes the sampled dataset on so that other nodes can explore, manipulate, cleanse, and model the data. The data can be exported at any point in a workflow by using the Write File node.
