Candidates

From Clariopedia

Candidates Node

Overview

The clario® Candidates node builds a series of regression models to help determine the “best” predictive model. You can control the number of predictor attributes in each model and the number of models generated.

IMPORTANT: Candidates builds only OLS regression models.

Usage

Input Stream

The node connector can be connected to a variety of nodes (e.g., Read, Aggregate, Append, Missing), but it requires a valid stream of data.

Configuration

The Candidates node has only one configuration face.
Configuration Face

The configuration face contains an Available Attributes list box and a Selected Attributes list box, along with a Dependent Attribute drop-down box. The first step is to drag and drop the attributes you want to use in constructing models from the Available Attributes box to the Selected Attributes box.

NOTES:

  • To find an attribute quickly, begin typing its name in the text box directly above Available Attributes. The list is then narrowed to the attribute(s) beginning with the letter(s) you type.
  • To select multiple attributes at once, use [Ctrl]+click to pick them one at a time, [Shift]+down arrow to extend the selection in order of appearance, or [Shift]+click on the first and last attribute to select those two and all attributes in between.
  • To de-select an attribute, click on it in the Selected Attributes box and drag and drop it into the Available Attributes box.
  • To re-order the Selected Attributes list, click and hold on an attribute and drag it to the desired position within the Selected Attributes box.

Below the list boxes, you can control the model size in terms of number of attributes, as well as the number of models of each size. First, select your dependent attribute from the drop-down box. This attribute must be numeric for Candidates to run. Even if you ultimately intend to build a logistic model, which needs a string dependent attribute, Candidates needs a numeric dependent attribute because it builds OLS regression models. Second, select a minimum and maximum number of attributes. The minimum number of attributes must be LESS THAN or equal to the number of selected attributes; the maximum number of attributes must also be LESS THAN or equal to the number of selected attributes, but greater than the minimum number of attributes. Next, select the number of models per size. The valid range is 1 to 5, and the number of models per size must be less than the number of selected attributes.
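If the outcome you eventually want to model with a logistic model is stored as a string, it has to be recoded to a number before Candidates can use it. The following is a minimal sketch of one way to do that outside the node, assuming a two-valued outcome. The function name and the "yes"/"no" labels are made up for illustration and are not part of clario.

```python
def to_numeric_target(values, positive_label):
    """Map a string-valued outcome to 1.0 / 0.0 so it can act as a numeric dependent attribute."""
    return [1.0 if value == positive_label else 0.0 for value in values]


# Made-up example values; real data would come from the input stream.
responses = ["yes", "no", "no", "yes"]
print(to_numeric_target(responses, "yes"))  # [1.0, 0.0, 0.0, 1.0]
```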

Field Definitions

  • Available Attributes (and therefore Selected Attributes & Dependent Attribute) must be numeric.
  • Selected Attributes – You must have at least one Selected Attribute.
  • Dependent Attribute – You must specify a dependent attribute.
  • Model Size – The minimum number of attributes must be LESS THAN or equal to the number of selected attributes; the maximum number of attributes must also be LESS THAN or equal to the number of selected attributes, but greater than the minimum number of attributes.
  • Model Size (Minimum and Maximum) – valid range is 1 to 99.
  • Number of models per size – valid range is 1 to 5.
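The rules above, together with the models-per-size limit stated in the Configuration Face text, can be summarized as a small checker. This is only an illustrative sketch: `CandidatesConfig` and `validate` are hypothetical names, and the node's own validation logic and messages are not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidatesConfig:
    """Hypothetical container for the settings on the configuration face."""
    selected_attributes: List[str]
    dependent_attribute: str
    min_size: int
    max_size: int
    models_per_size: int


def validate(config: CandidatesConfig) -> List[str]:
    """Return a list of rule violations; an empty list means the settings pass."""
    errors = []
    n = len(config.selected_attributes)
    if n < 1:
        errors.append("at least one Selected Attribute is required")
    if not config.dependent_attribute:
        errors.append("a Dependent Attribute must be specified")
    for label, size in (("minimum", config.min_size), ("maximum", config.max_size)):
        if not 1 <= size <= 99:
            errors.append(f"{label} model size must be between 1 and 99")
        if size > n:
            errors.append(f"{label} model size must not exceed the {n} selected attributes")
    if config.max_size <= config.min_size:
        errors.append("maximum model size must be greater than the minimum")
    if not 1 <= config.models_per_size <= 5:
        errors.append("number of models per size must be between 1 and 5")
    if config.models_per_size >= n:
        errors.append("number of models per size must be less than the number of selected attributes")
    return errors
```

For example, three selected attributes with sizes 1 to 3 and 2 models per size pass every check, while two selected attributes with a minimum and maximum of 2 fail the "maximum greater than minimum" rule.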

Results

There is one results face with one tab (Detailed Results) containing the following:
The left two boxes describe all of the resulting models. The upper left box shows all models produced, along with each model’s Cp statistic and R² statistic. The lower left box graphs all models, with R² on the x-axis and Cp statistic on the y-axis.
Each model is represented by a colored circle, and the color corresponds with a specific number of attributes. The color key is shown below the graph. In general, the models with higher R² statistics are better, and the models with lower Cp statistics are better.
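To make the two statistics concrete, here is a rough sketch of how R² and Cp could be computed for candidate OLS models of each size. It rests on assumptions that this page does not confirm: that Cp means Mallows's Cp (with the error variance estimated from the model containing all selected attributes), and that candidates are found by exhaustive search and ranked by R² within each size. The function names and the synthetic data are made up for illustration.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Fit OLS with an intercept; return (sum of squared errors, R-squared)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sse = float(residuals @ residuals)
    sst = float(((y - y.mean()) ** 2).sum())
    return sse, 1.0 - sse / sst


def candidate_models(X, y, names, min_size, max_size, models_per_size):
    """Score every attribute subset and keep the top models of each size by R-squared."""
    n, k = X.shape
    full_sse, _ = fit_ols(X, y)
    sigma2 = full_sse / (n - k - 1)  # assumed: error variance from the full model
    results = []
    for size in range(min_size, max_size + 1):
        scored = []
        for subset in combinations(range(k), size):
            sse, r2 = fit_ols(X[:, list(subset)], y)
            p = size + 1  # parameters, counting the intercept
            cp = sse / sigma2 - n + 2 * p  # Mallows's Cp (assumed definition)
            scored.append(([names[i] for i in subset], r2, cp))
        scored.sort(key=lambda model: model[1], reverse=True)
        results.extend(scored[:models_per_size])
    return results


# Synthetic data: only attributes "a" and "c" actually drive the dependent attribute.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
for attrs, r2, cp in candidate_models(X, y, ["a", "b", "c", "d"], 1, 3, 2):
    print(attrs, round(r2, 3), round(cp, 2))
```

Under this definition, a model that omits an important attribute shows a large Cp, while a model that includes every relevant attribute has a Cp close to its parameter count, which is why low Cp combined with high R² is the pattern to look for.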

The three boxes on the right describe an individual model. Clicking on one of the models in the upper left box brings up the three boxes specific to that model. The upper two right boxes contain overall model statistics. The lower right box lists the attributes in the model, along with coefficients and other statistics for each attribute. As you click on different models, the three right boxes update to reflect the selected model.

To compare multiple models easily, you can export each model of interest to a spreadsheet and compare them side by side.

Output Stream

The Candidates results tables can be exported into Excel. Candidates is a terminal node, so it produces no data file output. Once you choose a final model, you can go back and use either linear or logistic to build the final model, and then continue on to score, rank and evaluate that model.
