The following options appear on one of the six Classification Tree dialogs.

Classification Tree - Step 1 of 3 Dialog

Variables In Input Data

The variables included in the data set appear here.

Selected Variables

Variables selected to be included in the output appear here.

Output Variable

The dependent variable or the variable to be classified appears here.

# Classes

Displays the number of classes in the Output Variable.

Specify "Success" class (for Lift Chart)

This option is selected by default. Click the drop-down arrow to select the value that denotes a success. This option is enabled when the number of classes for the Output Variable is equal to 2.

Specify initial cutoff probability for success

Enter a value between 0 and 1 here to denote the cutoff probability for success. If the calculated probability for success for an observation is greater than or equal to this value, a success (or a 1) is predicted for that observation. If the calculated probability for success for an observation is less than this value, a non-success (or a 0) is predicted for that observation. The default value is 0.5. This option is enabled when the number of classes for the output variable is equal to 2.
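The cutoff rule above can be sketched in a few lines. This is an illustrative sketch, not XLMiner's implementation; the function name is hypothetical.

```python
# Hypothetical sketch: how a cutoff probability turns a predicted
# probability of success into a 0/1 classification (default cutoff 0.5).
def classify(prob_success, cutoff=0.5):
    """Return 1 (success) if prob_success >= cutoff, else 0."""
    return 1 if prob_success >= cutoff else 0

print(classify(0.62))  # above the cutoff -> 1
print(classify(0.5))   # equal to the cutoff counts as success -> 1
print(classify(0.31))  # below the cutoff -> 0
```

Note that a probability exactly equal to the cutoff is classified as a success, matching the "greater than or equal to" rule above.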

Classification Tree Boosting - Step 2 of 3 Dialog 

Normalize Input Data

XLMiner normalizes the data when Normalize input data is selected. Normalization helps if linear combinations of the input variables are used when splitting the tree. This option is not selected by default.
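One common normalization is the z-score transform, sketched below. This is an assumption for illustration; the text above does not specify which normalization formula XLMiner applies.

```python
# Illustrative z-score normalization (an assumption; XLMiner's exact
# formula is not documented here): scale each input column to mean 0
# and standard deviation 1.
def normalize(column):
    """Return the column scaled to mean 0 and (population) std 1."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    std = var ** 0.5
    return [(x - mean) / std for x in column]

scaled = normalize([2.0, 4.0, 6.0, 8.0])
print(scaled)  # values centered on 0, unit spread
```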

Boosting algorithm

The difference in the algorithms is the way in which the weights assigned to each observation or record are updated. See the Ensemble Methods section.

In AdaBoost.M1 (Freund), the constant is calculated as:

α_b = ln((1 − e_b) / e_b)

In AdaBoost.M1 (Breiman), the constant is calculated as:

α_b = (1/2) ln((1 − e_b) / e_b)

In SAMME, the constant is calculated as:

α_b = (1/2) ln((1 − e_b) / e_b) + ln(k − 1), where k is the number of classes

(When the number of classes is equal to 2, SAMME behaves the same as AdaBoost.M1 (Breiman).)

The default selection is AdaBoost.M1 (Breiman).
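The three constants above can be computed side by side. This sketch uses e_b for the weighted error of weak learner b and k for the number of classes, as in the formulas; it also verifies the note that SAMME reduces to the Breiman constant when k = 2.

```python
import math

# Sketch of the three learner-weight constants above; e_b is the weighted
# error of weak learner b and k is the number of classes.
def alpha_freund(e_b):
    return math.log((1 - e_b) / e_b)

def alpha_breiman(e_b):
    return 0.5 * math.log((1 - e_b) / e_b)

def alpha_samme(e_b, k):
    return 0.5 * math.log((1 - e_b) / e_b) + math.log(k - 1)

# With k = 2, ln(k - 1) = ln(1) = 0, so SAMME equals the Breiman constant.
print(alpha_samme(0.2, 2) == alpha_breiman(0.2))  # True
```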

Number of weak learners

This option controls the number of weak classification models that will be created. The ensemble method stops when the number of classification models created reaches the value set for the Number of weak learners. The algorithm then computes the weighted sum of votes for each class and assigns the winning classification to each record. The default is 50.
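The final weighted vote can be sketched as follows. This is a hypothetical illustration of the description above: each weak learner's vote is weighted by its constant α_b, and the class with the largest total wins.

```python
# Hypothetical sketch of the final weighted vote: each weak learner casts
# a vote for a class, weighted by its constant alpha_b; the class with
# the largest total weight wins.
def weighted_vote(predictions, alphas):
    """predictions: class label from each weak learner; alphas: its weight."""
    totals = {}
    for label, alpha in zip(predictions, alphas):
        totals[label] = totals.get(label, 0.0) + alpha
    return max(totals, key=totals.get)

# Two learners vote "A" (0.4 + 0.3 = 0.7) against one strong "B" (0.9).
print(weighted_vote(["A", "B", "A"], [0.4, 0.9, 0.3]))  # B
```

Note that a single accurate learner (large α_b) can outvote several weaker ones, which is the point of weighting the votes.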

Use re-weighting

If Use re-weighting is selected, the AdaBoost algorithm calculates a weight for each record and updates that weight on each iteration, assigning higher weights to misclassified records.
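A typical AdaBoost-style update looks like the sketch below. The exact update rule XLMiner uses is not given here, so the multiplicative exp(α_b) form is an assumption for illustration.

```python
import math

# Illustrative AdaBoost-style re-weighting step (the exact rule is an
# assumption): misclassified records have their weights multiplied by
# exp(alpha_b), then all weights are renormalized to sum to 1.
def reweight(weights, misclassified, alpha_b):
    updated = [w * math.exp(alpha_b) if miss else w
               for w, miss in zip(weights, misclassified)]
    total = sum(updated)
    return [w / total for w in updated]

w = reweight([0.25, 0.25, 0.25, 0.25], [False, True, False, False], 1.0)
print(w[1] > w[0])  # the misclassified record now carries more weight -> True
```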

Use re-sampling

If Use re-sampling is selected, the AdaBoost algorithm chooses a sample of records in each iteration, and assigns higher probabilities to misclassified records so that those records are favored in the next sample selection.
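Weighted re-sampling can be sketched with a weighted draw with replacement. The function and record names are hypothetical; the point is that heavier (previously misclassified) records are more likely to appear in the next sample.

```python
import random

# Hypothetical sketch of weighted re-sampling: records are drawn with
# replacement, with selection probability proportional to their weights,
# so previously misclassified (heavier) records are favored.
def resample(records, weights, k, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.choices(records, weights=weights, k=k)

sample = resample(["r1", "r2", "r3", "r4"], [0.1, 0.6, 0.2, 0.1], k=8)
print(len(sample))  # 8
```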

Maximum number of tree levels

The size of the tree may also be limited by the number of splits or nodes by clicking the drop-down arrow next to levels. If levels is chosen, the tree contains the specified number of levels. If splits is selected, the number of times the tree is split is limited to the value entered, and if nodes is selected, the number of nodes in the entire tree is limited to the value specified. The defaults are levels and 7.

Minimum # records in a terminal node

XLMiner will stop splitting the tree when all nodes contain the minimum number of records. The default setting is 10% of the number of records in the training dataset.

Partitioning Options

XLMiner V2015 provides the ability to partition a data set from within a classification or prediction method by selecting Partitioning Options on the Step 2 of 3 dialog. If this option is selected, XLMiner partitions the data set (according to the partitioning options) immediately before running the prediction method. If partitioning has already occurred on the data set, this option is disabled. For more information on partitioning, see the Data Mining Partition section.

Classification Tree Bagging - Step 2 of 3 Dialog 

Please see below for options that are unique to the Classification Tree Bagging - Step 2 of 3 dialog. For remaining option explanations, please see above.

Bootstrapping Random Seed

If an integer value appears for Bootstrapping Random seed, XLMiner uses this value to set the bootstrapping random number seed. Setting the random number seed to a nonzero value (any number of your choice is OK) ensures that the same sequence of random numbers is used each time the data set is chosen for the classifier. The default value is 12345. If left blank, the random number generator is initialized from the system clock, so the sequence of random numbers will be different in each calculation. If you need the results from one run of the algorithm to another to be strictly comparable, you should set the seed. To do this, type the desired number into the box. This option accepts both positive and negative integers with up to nine digits.
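The effect of fixing the seed can be sketched as follows. This is an illustration of the general principle, not XLMiner's generator: the same seed always produces the same bootstrap sample, which is what makes successive runs comparable.

```python
import random

# Sketch of why a fixed seed makes runs comparable: the same seed yields
# the same bootstrap sample (indices drawn with replacement) every time;
# a clock-based seed would not.
def bootstrap_indices(n, seed):
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

print(bootstrap_indices(5, 12345) == bootstrap_indices(5, 12345))  # True
```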

Random Trees Classification - Step 2 of 3 Dialog

Feature selection random seed

If an integer value appears for Feature Selection Random seed, XLMiner uses this value to set the feature selection random number seed. Setting the random number seed to a nonzero value (any number of your choice is OK) ensures that the same sequence of random numbers is used each time the data set is chosen for the classifier. The default value is 12345. If left blank, the random number generator is initialized from the system clock, so the sequence of random numbers is different in each calculation. If you need the results from one run of the algorithm to another to be strictly comparable, you should set the seed. This option accepts both positive and negative integers with up to nine digits.

Number of randomly selected features

The Random Trees ensemble method works by training multiple weak classification trees using a fixed number of randomly selected features, then taking the mode of each class to create a strong classifier. Number of randomly selected features controls the fixed number of randomly selected features in the algorithm. The default setting is 4.
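The feature-subset step can be sketched as a seeded random draw without replacement. The feature names below are hypothetical; the subset size of 4 matches the default above.

```python
import random

# Hypothetical sketch: Random Trees trains each weak tree on a fixed-size
# random subset of the features (default size 4, per the option above).
def select_features(all_features, n_selected=4, seed=12345):
    rng = random.Random(seed)
    return rng.sample(all_features, n_selected)  # no duplicates

features = ["age", "income", "balance", "tenure", "region", "products"]
subset = select_features(features)
print(len(subset))  # 4
```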

Following are descriptions of options that are unique to the Classification Tree - Step 2 of 3 dialog.

Classification Tree - Step 2 of 3 Dialog

Prune Tree

XLMiner prunes the tree using the Validation Set when Prune Tree is selected. (Pruning the tree using the Validation Set reduces the error from over-fitting the tree using the Training Set.) This option is selected by default. If no Validation Set exists, this option is disabled.

The following five options appear on the Classification Tree - Step 3 of 3 dialog.

Note:  The Trees section is only included on the Single Tree Step 3 of 3 dialog. 

Classification Tree - Step 3 of 3 Dialog

Maximum # levels to be displayed

Enter the maximum number of levels to be displayed in the full tree.

Full tree (grown using training data)

Select this option to grow a complete tree using the training data.

Best pruned tree (pruned using validation data)

Selecting this option results in a tree with the fewest number of nodes, subject to the constraint that the error be kept below a specified level (minimum error rate plus the standard error of that error rate).
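The selection rule above (smallest tree within one standard error of the minimum error) can be sketched with illustrative candidate trees. The candidate tuples below are made-up data, not XLMiner output.

```python
# Sketch of the "best pruned" rule described above: among candidate trees,
# pick the one with the fewest nodes whose validation error is at most the
# minimum error plus that error's standard error. Candidate data is
# illustrative only.
def best_pruned(candidates):
    """candidates: list of (num_nodes, error_rate, std_error) tuples."""
    min_err, min_se = min((e, se) for _, e, se in candidates)
    threshold = min_err + min_se
    eligible = [c for c in candidates if c[1] <= threshold]
    return min(eligible, key=lambda c: c[0])  # fewest nodes wins

trees = [(15, 0.10, 0.02), (9, 0.11, 0.02), (5, 0.13, 0.03)]
print(best_pruned(trees))  # (9, 0.11, 0.02)
```

Here the 15-node tree has the minimum error (0.10), but the 9-node tree is within one standard error (0.10 + 0.02) and is smaller, so it is chosen.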

Minimum error tree (pruned using validation data)

Select this option to produce a tree that yields the minimum classification error rate when tested on the Validation Set.

Tree with specified number of decision nodes

To create a tree with a specified number of decision nodes, select this option and enter the desired number of nodes.

 

The following options appear on all four Classification Tree - Step 3 of 3 dialogs.

Score Training Data

Select these options to show an assessment of the performance of the tree in classifying the Training Set. The report is displayed according to the specifications - Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score Validation Data

These options are enabled when a Validation Set is present. Select these options to show an assessment of the performance of the tree in classifying the Validation Set. The report is displayed according to the specifications - Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score Test Data

These options are enabled when a Test Set is present. Select these options to show an assessment of the performance of the tree in classifying the test data. The report is displayed according to the specifications - Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score New Data

For information on the Score New Data options, see the Scoring New Data section.