The following options appear on the Classification Tree dialogs.

Variables In Input Data

The variables included in the data set appear here.

Selected Variables

Variables selected to be included in the output appear here.

Output Variable

The dependent variable or the variable to be classified appears here.

# Classes

Displays the number of classes in the Output Variable.

Specify "Success" class (for Lift Chart)

This option is selected by default. Click the drop-down arrow to select the value to specify a success. This option is enabled when the number of classes for the Output Variable is equal to 2.

Specify initial cutoff probability for success

Enter a value between 0 and 1 here to denote the cutoff probability for success. If the calculated probability for success for an observation is greater than or equal to this value, a success (or a 1) is predicted for that observation. If the calculated probability for success for an observation is less than this value, a non-success (or a 0) is predicted for that observation. The default value is 0.5. This option is enabled when the number of classes for the output variable is equal to 2.

 

Partitioning Options

Analytic Solver Data Mining includes the ability to partition a dataset from within a classification or prediction method by clicking Partition Data on the Parameters dialog. Analytic Solver Data Mining will partition your dataset (according to the partition options you set) immediately before running the classification method. If partitioning has already occurred on the dataset, this option will be disabled. For more information on partitioning, please see the Data Mining Partitioning chapter.

Rescale Data

Use Rescaling to normalize one or more features in your data during the data preprocessing stage. Analytic Solver Data Mining provides the following methods for feature scaling: Standardization, Normalization, Adjusted Normalization and Unit Norm. For more information on this new feature, see the Rescale Continuous Data section within the Transform Continuous Data chapter that occurs earlier in this guide.

Tree Growth

In the Tree Growth section, select Levels, Nodes, Splits, and Records in Terminal Nodes. Values entered for these options limit tree growth, i.e. if 10 is entered for Levels, the tree will be limited to 10 levels.

Prior Probability

Three options appear in the Prior Probability Dialog: Empirical, Uniform and Manual. 

  • If the first option is selected, Empirical, Analytic Solver Data Mining will assume that the probability of encountering a particular class in the dataset is the same as the frequency with which it occurs in the training data.   
  • If the second option is selected, Uniform, Analytic Solver Data Mining will assume that all classes occur with equal probability. 
  • Select the third option, Manual, to manually enter the desired class and probability value.

Prune (Using Validation Set)

If a validation partition exists, this option is enabled. When this option is selected, Analytic Solver Data Mining will prune the tree using the validation set. Pruning the tree using the validation set reduces the error from over-fitting the tree to the training data.

Show Feature Importance

Select Feature Importance to include the Features Importance table in the output. This table displays the variables that are included in the model along with their Importance value.

Maximum Number of Levels

This option specifies the maximum number of levels in the tree to be displayed in the output.
Note: If a tree is limited to X levels in the output (intentionally or due to Analytic Solver Basic's limit of 7 levels), Analytic Solver will draw the first X levels of the diagram.

Trees to Display

Select Trees to Display to select the types of trees to display: Fully Grown, Best Pruned, Minimum Error or User Specified.
• Select Fully Grown to “grow” a complete tree using the training data.
• Select Best Pruned to create a tree with the fewest number of nodes, subject to the constraint that the error be kept below a specified level (minimum error rate plus the standard error of that error rate).
• Select Minimum error to produce a tree that yields the minimum classification error rate when tested on the validation data.
• To create a tree with a specified number of decision nodes select User Specified and enter the desired number of nodes.

The following options appear on the Classification Tree - Step 3 of 3 dialog.

Score Training Data

Select these options to show an assessment of the performance of the tree in classifying the training Set. The report is displayed according to the specifications - Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score Validation Data

These options are enabled when a Validation Set is present. Select these options to show an assessment of the performance of the tree in classifying the Validation Set. The report is displayed according to the specifications - Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score Test Data

These options are enabled when a Test Set is present. Select these options to show an assessment of the performance of the tree in classifying the test data. The report is displayed according to the specifications - Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.>

Score New Data

For information on the Score New Data options, see the Scoring New Data section.