XLMiner provides four options when creating a Neural Network classifier: Boosting and Bagging (the two ensemble methods), Automatic, and Manual. This example focuses on creating a Neural Network using the boosting ensemble method. See the sections below for examples of creating a Neural Network using the bagging ensemble method, a manual architecture, and an automatic architecture.

On the XLMiner ribbon, from the Applying Your Model tab, select Help - Examples, then select Forecasting/Data Mining Examples to open the file Wine.xlsx. This file contains 13 quantitative variables measuring the chemical attributes of wine samples from three different wineries (Type variable). The objective is to assign a wine classification to each record. A portion of this data set is shown below.

Wine.xlsx Sample Dataset 

On the XLMiner ribbon, from the Data Mining tab, select Partition - Standard Partition to open the Standard Data Partition dialog. Partition the data into Training and Validation Sets using a Standard Data Partition with percentages of 80% of the data randomly allocated to the Training Set, and 20% of the data randomly allocated to the Validation Set. For more information on partitioning a data set, see the Data Mining Partition section.

 Standard Data Partition Dialog
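
Outside of XLMiner, the same 80/20 random split can be sketched in a few lines. The snippet below is only an illustration under the assumption that Wine.xlsx has been saved locally; the pandas-based approach is not part of the XLMiner workflow itself.

```python
import pandas as pd

# Illustrative sketch of an 80/20 random partition of the wine data.
wine = pd.read_excel("Wine.xlsx")

training = wine.sample(frac=0.8, random_state=12345)  # 80% to the Training Set
validation = wine.drop(training.index)                # remaining 20% to the Validation Set

print(len(training), len(validation))
```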

Select a cell on the newly created Data_Partition worksheet, then on the XLMiner ribbon, from the Data Mining tab, select Classify - Neural Network -- Boosting to open the Boosting - Neural Network Classification - Step 1 of 3 dialog.

 Boosting - Neural Network Classification -- Step 1 of 3 Dialog

Select Type as the Output Variable, and move all the remaining variables to the Selected Variables list. Since the Output Variable contains three classes (A, B, and C) to denote the three different wineries, the options for Classes in the Output Variable are disabled. (The options under Classes in the Output Variable are only enabled when the number of classes is equal to 2.)

Click Next to advance to the Step 2 of 3 dialog.

Boosting - Neural Network Classification -- Step 2 of 3 Dialog 

The option Normalize input data is selected by default. Normalizing the data (subtracting the mean and dividing by the standard deviation) is important to ensure that the distance measure accords equal weight to each variable -- without normalization, the variable with the largest scale would dominate the measure.
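
For reference, this normalization is the familiar z-score transformation; a minimal sketch, assuming X holds the 13 input columns as a pandas DataFrame:

```python
import pandas as pd

def normalize(X: pd.DataFrame) -> pd.DataFrame:
    # Subtract each column's mean and divide by its standard deviation so that
    # every variable contributes on a comparable scale.
    return (X - X.mean()) / X.std()
```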

Leave the Number of weak learners at the default of 50. This option controls the number of weak classification models that will be created. The ensemble method stops when the number of classification models created reaches the value set for the Number of weak learners. The algorithm then computes the weighted sum of votes for each class and assigns the winning classification to each record.
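
Conceptually, that final step is a weighted majority vote across the weak learners. The sketch below illustrates the idea with hypothetical learner predictions and weights (the constants described below); it is not XLMiner's internal code.

```python
import numpy as np

def weighted_vote(predictions: np.ndarray, alphas: np.ndarray, classes: list) -> list:
    """predictions: (n_learners, n_records) array of predicted class labels.
    alphas: (n_learners,) array of learner weights.
    Returns, for each record, the class receiving the largest total weight."""
    winners = []
    for i in range(predictions.shape[1]):
        votes = {c: 0.0 for c in classes}
        for b in range(predictions.shape[0]):
            votes[predictions[b, i]] += alphas[b]
        winners.append(max(votes, key=votes.get))
    return winners
```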

Leave the default selection for Boosting Algorithm, AdaBoost.M1 (Breiman). The difference in the algorithms is the way in which the weights assigned to each observation or record are updated. See the Ensemble Method section.

In AdaBoost.M1 (Freund), the constant is calculated as:

αb = ln((1 - eb) / eb)

In AdaBoost.M1 (Breiman), the constant is calculated as:

αb = (1/2) ln((1 - eb) / eb)

In SAMME, the constant is calculated as:

αb = (1/2) ln((1 - eb) / eb) + ln(k - 1), where k is the number of classes

(When the number of categories is equal to 2, SAMME behaves the same as AdaBoost Breiman.)
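
Written as a small function, the three constants look as follows; this is a direct transcription of the formulas above, where e_b is the weak learner's weighted error rate and k the number of classes.

```python
import math

def boosting_constant(e_b: float, k: int, variant: str) -> float:
    """Weight alpha_b for a weak learner with weighted error e_b (0 < e_b < 1)."""
    if variant == "freund":
        return math.log((1 - e_b) / e_b)
    if variant == "breiman":
        return 0.5 * math.log((1 - e_b) / e_b)
    if variant == "samme":                      # reduces to Breiman when k = 2
        return 0.5 * math.log((1 - e_b) / e_b) + math.log(k - 1)
    raise ValueError(f"unknown variant: {variant}")
```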

Leave the default integer value of 12345 for Neuron weight initialization seed. XLMiner uses this value to set the random number seed for initializing the neuron weights. Setting the seed to a non-zero value ensures that the same sequence of random numbers is used each time the neuron weights are calculated. If the field is left blank, the random number generator is initialized from the system clock, so the sequence of random numbers will differ from one calculation to the next. If you need the results from one run of the algorithm to another to be strictly comparable, set the seed by typing the desired number into the box.
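
The effect is the same as fixing the seed of any random number generator. The hypothetical illustration below assumes 13 inputs feeding 12 hidden nodes (the defaults discussed next); it does not reflect XLMiner's internal initialization.

```python
import numpy as np

# With a fixed seed, the randomly initialized weights are identical on every run,
# so successive runs of the training algorithm are strictly comparable.
rng = np.random.default_rng(12345)
initial_weights = rng.normal(size=(13, 12))  # e.g., 13 inputs feeding 12 hidden nodes
```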

Keep the default setting (1) for # Hidden Layers (max 4). Keep the default setting of 12 for # Nodes Per Layer. (Since # Hidden layers (max 4) is set to 1, only the first text box is enabled.)

Keep the default setting of 30 for # Epochs. An epoch is one sweep through all records in the Training Set.

Keep the default setting of 0.1 for Gradient Descent Step Size. This is the multiplying factor for the error correction during backpropagation; it is roughly equivalent to the learning rate for the neural network. A low value produces slow but steady learning; a high value produces rapid but erratic learning. Values for the step size typically range from 0.1 to 0.9.

Keep the default setting of 0.6 for Weight change momentum. In each new round of error correction, some memory of the prior correction is retained so that an outlier that crops up does not spoil accumulated learning.

Keep the default setting of 0.01 for Error tolerance. The error in a particular iteration is backpropagated only if it is greater than the error tolerance. Typically error tolerance is a small value in the range from 0 to 1.

Keep the default setting of 0 for Weight Decay. To prevent over-fitting of the network on the Training Data, set a Weight Decay to penalize the weights in each iteration. Each calculated weight is multiplied by (1 - decay).
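
The step size, momentum, and weight decay options all enter the weight update during training; the following is a rough conceptual sketch of such an update, not XLMiner's exact rule.

```python
def update_weight(w, gradient, prev_delta, step_size=0.1, momentum=0.6, decay=0.0):
    """One backpropagation-style weight update (conceptual sketch only).
    step_size scales the error correction, momentum retains part of the previous
    correction, and (1 - decay) shrinks the weight to discourage over-fitting."""
    delta = -step_size * gradient + momentum * prev_delta
    new_w = (w + delta) * (1 - decay)
    return new_w, delta
```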

Nodes in the hidden layer receive input from the input layer. Each hidden node computes a weighted sum of its input values, using weights that are initially set to random values; as the network learns, these weights are adjusted. The node's output is then obtained by passing this weighted sum through a transfer function, or activation function. Select Standard (the default setting) to use a logistic function for the activation function, with a range of 0 to 1. This function has a squashing effect on very small or very large values, but is almost linear in the range where the value of the function is between 0.1 and 0.9. Select Symmetric to use the tanh function for the activation function, with a range of -1 to 1. Keep the default selection, Standard, for this example. If more than one hidden layer exists, this function is used for all layers.
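
For reference, the two hidden-layer choices correspond to the following functions (a sketch using NumPy for the math):

```python
import numpy as np

def standard_activation(x):
    # Logistic (sigmoid) function: squashes any input into the range 0 to 1.
    return 1.0 / (1.0 + np.exp(-x))

def symmetric_activation(x):
    # Hyperbolic tangent: squashes any input into the range -1 to 1.
    return np.tanh(x)
```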

As in the hidden layer output calculation (explained in the paragraph above), the output layer is also computed using a transfer function. Select Standard (the default setting) to use a logistic function for the transfer function, with a range of 0 to 1. Select Symmetric to use the tanh function for the transfer function, with a range of -1 to 1. Select Softmax to use a generalization of the logistic function that maps a vector of real-valued scores to a vector of class probabilities. In neural networks, the Softmax function is often applied at the final layer of a classification network to impose the constraints that the posterior probabilities for the output variable be >= 0, <= 1, and sum to 1. Keep the default selection, Standard, for this example.
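
The Softmax option corresponds to the standard softmax function; a minimal sketch with illustrative class scores:

```python
import numpy as np

def softmax(z):
    # Map real-valued scores to probabilities that lie between 0 and 1 and sum to 1;
    # subtracting the maximum score first improves numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g., scores for classes A, B, and C
```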

XLMiner V2015 provides the ability to partition a data set from within a classification or prediction method by selecting Partitioning Options on the Step 2 of 3 dialog. If this option is selected, XLMiner partitions the data set (according to the partition options) before running the prediction method. If partitioning has already occurred on the data set, this option is disabled. For more information on partitioning, see the Data Mining Partition section. 

Click Next to advance to the Boosting - Neural Network Classification - Step 3 of 3 dialog.

Boosting - Neural Network Classification -- Step 3 of 3 Dialog 

Under Score Training Data and Score Validation Data, Summary Report is selected by default. Select Detailed Report under both Score Training Data and Score Validation Data. Lift Charts are disabled when the number of classes is greater than two. Since a Test Data partition was not created, the options under Score Test Data are disabled.

Click Finish to view the output.

Click the NNCBoost_Output worksheet to view the Output Navigator.

Boosting - Neural Network Classification Output Navigator 

Scroll down to the Classification Matrices to view how the Neural Network Classification algorithm performed. The algorithm finished with no errors in the Training Set. The ensemble method misclassified one record in the Validation Set, resulting in an overall error of 2.78%. This small error in the Validation Set indicates that the model created with the boosting ensemble method is a very good fit to the data.

Boosting - Neural Network Classification Output: Confusion Matrices
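
The overall error reported here is simply the number of off-diagonal (misclassified) entries in the confusion matrix divided by the total number of scored records; one misclassification producing a 2.78% error corresponds to 36 validation records (1/36 ≈ 2.78%). The sketch below uses a hypothetical confusion matrix with those totals.

```python
import numpy as np

# Hypothetical 3x3 confusion matrix (rows = actual class, columns = predicted class)
# with a single off-diagonal entry among 36 validation records.
confusion = np.array([[12,  0, 0],
                      [ 0, 14, 1],
                      [ 0,  0, 9]])
error = (confusion.sum() - np.trace(confusion)) / confusion.sum()
print(f"{error:.2%}")  # 2.78%
```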

On the Output Navigator, click NNCBoost_TrainScore to view the Classification of the Training Data. Here we see the Predicted class assignment versus the Actual class for each record in the Training Set, along with the probability computed for each class. The class with the largest probability becomes the assigned class.

 Boosting - Neural Network Classification Output Training Dataset Score

On the Output Navigator, click NNCBoost_ValidScore to view the Classification of the Validation Data. Here we see the Predicted class assignment versus the Actual class for each record in the Validation Set, along with the probability computed for each class. The class with the largest probability becomes the assigned class.

Boosting - Neural Network Classification Output:  Validation Dataset Score 

See the Scoring New Data section for information on the Stored Model Sheet NNC_Stored1.

Continue on with the Bagging Neural Network Classification Example to compare the results between the two ensemble methods.