This example compares the results from the Ensemble Methods with the results from a single classification tree. On the XLMiner ribbon, from the Data Mining tab, select Partition - Standard Partition to open the Standard Partition dialog, then select a cell on the Data_Partition worksheet.

On the XLMiner ribbon, from the Data Mining tab, select Classify - Classification Tree - Single Tree to open the Classification Tree - Step 1 of 3 dialog.

Classification Tree - Step 1 of 3 Dialog 

At Output Variable, select CAT. MEDV, then from the Selected Variables list, select all remaining variables except MEDV. The MEDV variable is not included, since the CAT. MEDV variable is derived from the MEDV variable.

At Specify "Success" class (for Lift Chart), choose the value that will be the indicator of Success by clicking the down arrow. In this example, we will use the default of 1.

At Specify initial cutoff probability for success, enter a value between 0 and 1. If the Probability of success (probability of the output variable = 1) is less than this value, then a 0 is entered for the class value; otherwise, a 1 is entered for the class value. In this example, we will keep the default of 0.5.
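To make the cutoff rule concrete, here is a minimal sketch in Python (the function name and inputs are illustrative, not XLMiner's internals) of how a predicted probability of success is mapped to a class value:

import numpy as np

def apply_cutoff(prob_success, cutoff=0.5):
    # P(success) >= cutoff yields class 1; otherwise class 0.
    return (np.asarray(prob_success) >= cutoff).astype(int)

print(apply_cutoff([0.12, 0.50, 0.87]))  # -> [0 1 1]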

For Maximum number of levels, keep the default of 7.

Click Next to advance to the Classification Tree - Step 2 of 3 dialog.

XLMiner normalizes the data when Normalize Input Data is selected. Normalization helps only if linear combinations of the input variables are used when splitting the tree. Keep this option unchecked.

Classification Tree - Step 2 of 3 Dialog 

Under Tree Growth, leave the defaults of levels and 7 for Maximum number of levels. The tree may also be limited by the number of splits or nodes by clicking the drop-down arrow next to levels. If levels is chosen, the tree contains the specified number of levels; if splits is selected, the number of times the tree is split is limited to the value entered; and if nodes is selected, the number of nodes in the entire tree is limited to the value specified.

Set Minimum #records in a terminal node to 30. XLMiner stops splitting a node (during tree growth) when the number of Training Set records in the node falls below this setting.
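For readers reproducing comparable settings outside XLMiner, the sketch below shows a rough scikit-learn analogue; the parameter names (max_depth, min_samples_split) are sklearn's, not XLMiner's, and the two tools' split searches are not guaranteed to match:

from sklearn.tree import DecisionTreeClassifier

# Roughly comparable growth limits: a 7-level tree that stops splitting
# nodes containing fewer than 30 training records.
tree = DecisionTreeClassifier(max_depth=7, min_samples_split=30, random_state=0)
# tree.fit(X_train, y_train)  # X_train, y_train: the Training Set partition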

Prune Tree is selected by default when a Validation Set exists. When this option is selected, XLMiner prunes the tree using the Validation Set. (Pruning the tree using the Validation Set reduces the error from over-fitting the tree using the Training Set.)

XLMiner V2015 provides the ability to partition a data set from within a classification or prediction method by selecting Partition Options on the Step 2 of 3 dialog. If this option is selected, XLMiner partitions the data set (according to the partition options) immediately before running the prediction method. If partitioning has already occurred on the data set, this option is disabled. For more information on partitioning, see the Data Mining Partition section.

Click Next to advance to the Classification Tree - Step 3 of 3 dialog.

Classification Tree - Step 3 of 3 Dialog 

For Maximum # levels to be displayed, keep the default setting of 7. This option specifies the maximum number of tree levels displayed in the output.

Select Full tree (grown using training data) to grow a complete tree using the Training Set.

Select Best pruned tree (pruned using validation data) to grow the tree with the fewest nodes, subject to the constraint that the error be kept below a specified level (the minimum error rate plus the standard error of that error rate).

Select Minimum error tree (pruned using validation data) to produce a tree that yields the minimum classification error rate when tested on the Validation Set.

To create a tree with a specified number of decision nodes, select Tree with specified number of decision nodes, and enter the desired number of nodes. Leave this option unselected for this example.

Under both Score Training Data and Score Validation Data, the Summary Report and Lift Chart options are selected by default; select Detailed Report under both to produce a detailed assessment of the tree's performance on each partition. Since we did not create a test partition, the options for Score Test Data are disabled. For information on creating a test partition, see the Data Mining Partition section.

Click Finish. Worksheets containing the output of the Classification Tree algorithm are inserted at the end of the workbook. Click the CT_Output worksheet to view the Output Navigator. Click any link in this section to navigate to various sections of the output.

 

Classification Tree Single Tree Output Navigator 

Click the CT_FullTree worksheet tab to view the full tree.

Classification Tree Single Tree Full Tree Output 

 

The objective of this example is to classify each case as a 0 (low median value) or a 1 (high median value). Consider the top decision node (denoted by a circle). The label above this node indicates the variable represented at this node (i.e., the variable selected for the first split), in this case RM (average number of rooms). The value inside the node indicates the split threshold. (Hover over the decision node to read the decision rule.) If the RM value for a specific record is greater than or equal to 6.861 (RM >= 6.861), the record is assigned to the right node; if the RM value is less than 6.861, the record is assigned to the left node. There are 43 records with RM values greater than or equal to 6.861, while 261 records have RM values less than 6.861. We can think of records where RM < 6.861 as tentatively classified as 0 (low median value), and records where RM >= 6.861 as tentatively classified as 1 (high median value).
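Written as a plain rule, the root split amounts to the following sketch (the 6.861 threshold comes from the tree output above; the function name is illustrative):

def root_split(rm_value, threshold=6.861):
    # Route a record at the root node and return its tentative class.
    if rm_value >= threshold:
        return 1  # right branch: tentatively high median value
    return 0      # left branch: tentatively low median value

print(root_split(7.2), root_split(5.9))  # -> 1 0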

The 261 records with RM values less than 6.861 are further split as we move down the tree. The second split occurs with the LSTAT variable (percent of the population that is of lower socioeconomic status). The LSTAT values for 11 records (out of 261) fell below the split value of 4.91. These records are tentatively classified as a 1, meaning these records have low percentages of the population with lower socioeconomic status. The LSTAT values for the remaining 250 records are greater than or equal to 4.91, and are tentatively classified as 0.

A square node indicates a terminal node, which means there are no further splits. The 11 records split to the left from the LSTAT node are classified as 1, as indicated by the 1 in the middle of the square node; there are no further splits for this group. In other words, a record with fewer rooms and a low percentage of the population with lower socioeconomic status is classified as a 1.

The 250 records assigned to the right are split again on the DIS variable (weighted distances to the five Boston employment centers). Records with a DIS variable value greater than or equal to 1.245 (247 records) are assigned to the node to the right (TAX), and records with a DIS variable value less than 1.245 (3 records) are assigned to the terminal node to the left.

The structure of the full tree becomes clear by reading the Full-Grown Tree Rules. On the Output Navigator, click the Full-Grown Tree Rules link to open the Full-Grown Tree Rules table shown on the CT_Output worksheet.

Full-Grown Tree Rules (Using Training Dataset) 

The first entry in this table shows a split on the RM variable with a split value of 6.861. The 304 total cases were split between nodes 1 (LeftChild column) and 2 (RightChild column).

Moving to NodeID 1, we find that 261 cases were assigned to this node (from node 0), which has a 0 value (Class column). These cases were split on the LSTAT variable using a value of 4.91: 250 cases assigned to node 4, and 11 cases assigned to node 3.

Moving to NodeID 4, we find that 250 cases were assigned to this node (from node 1), which has a 0 value. From here, these cases were split on the DIS variable using a value of 1.245 between nodes 7 (3 cases), and 8 (247 cases).

Moving to NodeID 8, we find that 247 cases were assigned to this node (from node 4), which has a 0 value. From here, these cases were split on the TAX variable using a value of 210.5 between nodes 11 (7 cases) and 12 (240 cases).

Moving to NodeID 11, we find that seven cases were assigned to this node (from node 8), which has a 0 value. Node 11 is a terminal node, so no other splits occur on this branch. This node has been classified as 0.
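If you are recreating this analysis with scikit-learn, an analogous rules listing can be printed with export_text; the format differs from XLMiner's Full-Grown Tree Rules table, and the feature names below are illustrative:

from sklearn.tree import export_text

# Assumes `tree` was fitted on the Training Set with these predictor columns.
print(export_text(tree, feature_names=["RM", "LSTAT", "DIS", "TAX", "CRIM"]))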

On the Output Navigator, click the Min-Error Tree link to view the Minimum Error Tree on the CT_MinErrorTree worksheet.

Min-Error Tree 

The minimum error tree is the tree that yields a minimum classification error rate when tested on the Validation Set. The misclassification (error) rate is measured as the tree is pruned. The tree that produces the lowest error rate is selected.

On the Output Navigator, click the Best-Pruned Tree link to view the Best Pruned Tree.

Best Pruned Tree 

 

The best pruned tree is based on the Validation Set, and is the smallest tree whose misclassification rate is within one standard error of the misclassification rate of the minimum error tree.
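The sketch below illustrates the same two selection rules using scikit-learn's cost-complexity pruning path; this parallels XLMiner's pruning in spirit but is not the same algorithm:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def pick_trees(X_train, y_train, X_valid, y_valid):
    # Compute the cost-complexity pruning path on the Training Set.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
        X_train, y_train)
    trees, errors = [], []
    for alpha in path.ccp_alphas:
        t = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
        trees.append(t.fit(X_train, y_train))
        errors.append(1.0 - t.score(X_valid, y_valid))
    errors = np.asarray(errors)
    i_min = errors.argmin()  # minimum error tree: lowest validation error
    # One standard error of the minimum misclassification rate.
    se = np.sqrt(errors[i_min] * (1 - errors[i_min]) / len(y_valid))
    # Best pruned tree: smallest tree (largest alpha) within one SE.
    i_best = np.where(errors <= errors[i_min] + se)[0][-1]
    return trees[i_min], trees[i_best]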

On the Output Navigator, click the Train Log link to navigate to the Training Log, which displays the misclassification (error) rate as each additional node is added to the tree. Starting at 0 nodes with the full data set, all records would be classified as low median value (0).

Training Log 

 

Click the Valid. Score Summary link to navigate to the Classification Confusion Matrix, which displays counts of the cases that were correctly and incorrectly classified in the Validation Set. Eight cases were misclassified in the Validation Set, resulting in a % error of 3.96.

Classification Confusion Matrix 

 

Click the CT Train. Lift Chart and CT Valid. Lift Chart links to navigate to the Lift Charts, shown below.

Lift Charts consist of a lift curve and a baseline. After the model is built using the Training Set, the model is used to score the Training Set and the Validation Set (if one exists). Each data set is then sorted by the predicted value of the output variable. After sorting, the actual values of the output variable are cumulated, and the lift curve is drawn as the number of cases (x-axis) versus the cumulated value (y-axis). The baseline (the red line connecting the origin to the end point of the blue line) is drawn as the number of cases versus the average of the actual output variable values multiplied by the number of cases. The greater the area between the lift curve and the baseline, the better the model.
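As a sketch of this construction (illustrative names, not XLMiner's code), the lift curve and baseline can be computed as follows:

import numpy as np

def lift_curve(y_actual, p_predicted):
    # Sort cases by predicted score, highest first, then cumulate outcomes.
    order = np.argsort(p_predicted)[::-1]
    cum_actual = np.cumsum(np.asarray(y_actual)[order])
    cases = np.arange(1, len(y_actual) + 1)
    baseline = cases * np.mean(y_actual)  # random-selection reference line
    return cases, cum_actual, baseline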

In the Lift Chart (Training Set) pictured below, the red line originating from the origin and connecting to the point (200, 37) is a reference line that represents the expected number of CAT. MEDV predictions if XLMiner simply selected random cases (i.e., no model was used). This reference line provides a yardstick against which to compare the model's performance. From the Lift Chart below, we can infer that if we assigned 100 cases to class 1, about 37 1s would be included. If 100 cases were selected at random, we could expect about 17 1s.

The decile-wise lift curve is drawn as the decile number versus the cumulative actual output variable value divided by the decile's mean output variable value. The bars in this chart indicate the factor by which the classification tree model outperforms a random assignment, one decile at a time. Refer to the validation graph below. In the first decile, which contains the highest predicted housing values in the data set, the predictive performance of the model is about 5.5 times better than simply assigning a random predicted value.
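A minimal sketch of the decile-wise calculation, assuming y_actual holds the actual 0/1 outcomes and p_predicted the model's scores (both names are stand-ins):

import numpy as np

def decile_lift(y_actual, p_predicted):
    # Mean actual outcome per decile (sorted by score) over the overall mean.
    order = np.argsort(p_predicted)[::-1]
    deciles = np.array_split(np.asarray(y_actual)[order], 10)
    overall = np.mean(y_actual)
    return [d.mean() / overall for d in deciles]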

ROC curves plot the performance of binary classifiers by graphing true positive rates (TPR) versus false positive rates (FPR) as the cutoff value varies from 0 to 1. The closer the curve is to the top left corner of the graph (in other words, the smaller the area above the curve), the better the performance of the model.

In an ROC curve, we can compare the performance of a classifier with that of a random guess, which would lie at a point along the diagonal line (red line) running from the origin (0, 0) to the point (1, 1). (This line is sometimes called the line of no-discrimination.) Anything to the upper left of this line signifies a better prediction, and anything to the lower right signifies a worse prediction. The best possible prediction performance is denoted by the point (0, 1) at the top left of the graph, where the true positive rate is 1 and the false positive rate is 0. This point is sometimes referred to as the perfect classification. Area Under the Curve (AUC) is the space in the graph that appears below the ROC curve. (This value is reported at the top of the ROC graph.) AUC is a value between 0 and 1; the closer the AUC value is to 1, the better the performance of the classification model. In this example, the AUC is very close to 1 in both the Training and Validation Sets, which indicates that this model is a good fit.
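With scikit-learn, the curve and AUC described above can be computed as in this sketch (y_valid and p_valid are stand-ins for the Validation Set outcomes and the tree's predicted probabilities of class 1):

from sklearn.metrics import roc_auc_score, roc_curve

fpr, tpr, cutoffs = roc_curve(y_valid, p_valid)  # one (FPR, TPR) per cutoff
auc = roc_auc_score(y_valid, p_valid)
print(f"AUC = {auc:.3f}")  # closer to 1 indicates better performance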

Single Tree Lift Chart & ROC Curve for Training Dataset 

Single Tree Lift Chart & ROC Curve for Validation Dataset

XLMiner generates the CT_Stored worksheet along with the other output sheets. For more information on scoring data, see the Scoring New Data section.