XLMiner provides four options when creating a Neural Network classifier: Boosting and Bagging (ensemble methods), Automatic, and Manual. This example focuses on creating a Neural Network using the Automatic network architecture.

Select a cell on the Data_Partition worksheet. Then, on the XLMiner ribbon, from the Data Mining tab, select Classify - Neural Network - Automatic Network to open the Neural Network Classification (Automatic Arch.) - Step 1 of 3 dialog.

Neural Network Classification - Step 1 of 3 

At Output Variable, select Type, and from the Selected Variables list, select all remaining variables. Since the Output Variable contains three classes (A, B, and C), the options for Classes in the Output Variable are disabled. The options under Classes in the Output Variable are only enabled when the number of classes is equal to 2.

Click Next to advance to the Step 2 of 3 dialog.

Neural Network Classification - Step 2 of 3 

When an automated network is created, several networks are run with increasing complexity in the architecture. The networks are limited to two hidden layers, and the number of hidden neurons in each layer is bounded by UB1 = (#features + #classes) * 2/3 on the 1st layer and UB2 = (UB1 + #classes) * 2/3 on the 2nd layer.

Networks with a single hidden layer are trained first, with the number of neurons ranging up to the UB1 bound. A second hidden layer is then added, and two-layer architectures are trained until the UB2 limit on the second layer is reached.

The limit on the total number of trained networks is the minimum of 100 and (UB1 * (1+UB2)). In this data set, there are 13 features in the model and three classes in the Type Output Variable with the following bounds.

UB1 = (13 + 3) * 2/3 = 10.67 ~ 10

UB2 = (10 + 3) * 2/3 = 8.67 ~ 8

# Networks Trained = MIN {100, 10 * (1 + 8)} = 90
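
To check these bounds, here is a minimal sketch in Python (not XLMiner code; the feature and class counts are taken from this example) that enumerates the candidate architectures and reproduces the count of 90:

    # Reproduce the architecture bounds and network count described above.
    n_features, n_classes = 13, 3

    ub1 = int((n_features + n_classes) * 2 / 3)   # int(10.67) = 10
    ub2 = int((ub1 + n_classes) * 2 / 3)          # int(8.67) = 8

    # One-hidden-layer candidates: 1..UB1 neurons.
    candidates = [(n1,) for n1 in range(1, ub1 + 1)]
    # Two-hidden-layer candidates: 1..UB1 x 1..UB2 neurons.
    candidates += [(n1, n2) for n1 in range(1, ub1 + 1)
                            for n2 in range(1, ub2 + 1)]

    # UB1 * (1 + UB2) candidates in total, capped at 100.
    n_networks = min(100, len(candidates))
    print(ub1, ub2, n_networks)   # 10 8 90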

The option Normalize input data is selected by default. Normalizing the data (subtracting the mean and dividing by the standard deviation) is important to ensure that the distance measure accords equal weight to each variable -- without normalization, the variable with the largest scale dominates the measure.
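
For reference, this normalization is the familiar z-score transform. A minimal sketch in Python with NumPy (illustrative only, not XLMiner's internal code):

    import numpy as np

    def normalize(X):
        """Z-score each column: subtract the mean, divide by the std."""
        return (X - X.mean(axis=0)) / X.std(axis=0)

    # Before normalization, the second column's large scale would dominate;
    # afterward, both columns have mean 0 and standard deviation 1.
    X = np.array([[1.0, 1000.0],
                  [2.0, 3000.0],
                  [3.0, 2000.0]])
    print(normalize(X))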

If an integer value appears for Neuron weight initialization seed, XLMiner uses this value to set the neuron weight random number seed. Setting the random number seed to a nonzero value ensures that the same sequence of random numbers is used each time the neuron weights are calculated. The default value is 12345. If left blank, the random number generator is initialized from the system clock, so the sequence of random numbers is different in each calculation. If you need the results from successive runs of the algorithm to be strictly comparable, enter a value for this option.
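
The effect of the seed can be demonstrated with any random number generator. A sketch in Python (the initialization range is an arbitrary assumption, and this is not XLMiner's internal generator):

    import numpy as np

    def init_weights(n_inputs, n_neurons, seed=12345):
        """Random initial weights for one layer; the same seed
        always produces the same weight matrix."""
        rng = np.random.default_rng(seed)
        return rng.uniform(-0.5, 0.5, size=(n_inputs, n_neurons))

    w1 = init_weights(13, 5)
    w2 = init_weights(13, 5)
    assert (w1 == w2).all()   # seeded runs are strictly comparable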

At # Epochs, keep the default setting of 30. An epoch is one sweep through all records in the Training Set.

At Gradient Descent Step Size, keep the default setting of 0.1. This is the multiplying factor for the error correction during backpropagation; it is roughly equivalent to the learning rate for the neural network. A low value produces slow learning; a high value produces rapid, but potentially erratic, learning. Values for the step size typically range from 0.1 to 0.9.

For Weight Change Momentum, keep the default setting of 0.6. In each new round of error correction, some memory of the prior correction is retained so that an outlier that crops up does not spoil accumulated learning.

For Error Tolerance, keep the default setting of 0.01. The error in a particular iteration is backpropagated only if it is greater than the error tolerance. Typically, error tolerance is a small value in the range from 0 to 1.

For Weight Decay, keep the default setting of 0. To prevent over-fitting of the network on the Training Set, specify a weight decay to penalize the weights in each iteration. Each calculated weight is multiplied by (1 - decay).
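
Taken together, # Epochs, Gradient Descent Step Size, Weight Change Momentum, Error Tolerance, and Weight Decay govern the backpropagation weight updates. A schematic single-weight update in Python (a simplification for illustration, not XLMiner's exact implementation):

    STEP_SIZE = 0.1         # Gradient Descent Step Size (learning rate)
    MOMENTUM = 0.6          # Weight Change Momentum
    ERROR_TOLERANCE = 0.01  # smaller errors are not backpropagated
    WEIGHT_DECAY = 0.0      # each weight is multiplied by (1 - decay)
    N_EPOCHS = 30           # sweeps through the Training Set

    def update_weight(w, prev_delta, error, gradient):
        """One schematic backpropagation update for a single weight."""
        if abs(error) <= ERROR_TOLERANCE:
            return w, 0.0                 # error within tolerance: skip
        delta = STEP_SIZE * gradient + MOMENTUM * prev_delta
        return (w - delta) * (1.0 - WEIGHT_DECAY), delta

    # Training sweeps the records N_EPOCHS times:
    # for epoch in range(N_EPOCHS):
    #     for record in training_set:
    #         ... compute error and gradient, then call update_weight ...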

Nodes in the hidden layer receive input from the input layer. Each hidden node computes a weighted sum of its input values; the weights are initially set to random values and are adjusted as the network learns. The node's output is then obtained by applying a transfer function, or activation function, to this weighted sum. Select Standard (the default setting) to use a logistic function for the activation function, with a range of 0 to 1. This function has a squashing effect on very small or very large values but is almost linear in the range where the value of the function is between 0.1 and 0.9. Select Symmetric to use the tanh function for the transfer function, with a range of -1 to 1. For this example, keep the default selection, Standard. If more than one hidden layer exists, this function is used for all layers.
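
The squashing behavior of the two functions is easy to see numerically. A small Python sketch (illustrative only):

    import numpy as np

    def standard(x):    # logistic function, range (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def symmetric(x):   # tanh function, range (-1, 1)
        return np.tanh(x)

    x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(standard(x))   # extreme inputs are squashed toward 0 or 1
    print(symmetric(x))  # extreme inputs are squashed toward -1 or 1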

As in the hidden layer output calculation, the output layer is also computed using a transfer function. Select Standard (the default setting) to use a logistic function for the transfer function, with a range of 0 to 1. Select Symmetric to use the tanh (hyperbolic tangent) function for the transfer function, with a range of -1 to 1. Select Softmax to use a generalization of the logistic function that maps a vector of real values (one per class) to a vector of values between 0 and 1 that sum to 1. In neural networks, the Softmax function is often applied at the final layer of a classification network to impose the constraints that the posterior probabilities for the output variable must be >= 0 and <= 1 and sum to 1. For this example, keep the default selection, Standard.
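
A short sketch of the Softmax mapping (illustrative Python; the class scores are made up) showing that the outputs satisfy exactly these constraints:

    import numpy as np

    def softmax(z):
        """Map a vector of real scores to values in (0, 1) that sum to 1."""
        e = np.exp(z - z.max())   # shift by the max for numerical stability
        return e / e.sum()

    scores = np.array([2.0, 1.0, 0.1])   # one score per class (A, B, C)
    probs = softmax(scores)
    print(probs, probs.sum())            # approx. [0.66 0.24 0.10], 1.0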

XLMiner V2015 provides the ability to partition a data set from within a classification or prediction method by selecting Partitioning Options on the Step 2 of 3 dialog. If this option is selected, XLMiner partitions the data set before running the prediction method. If partitioning has already occurred on the data set, this option is disabled. For more information on partitioning, please see the Data Mining Partition section.

Click Finish.

Click the NNCAuto_Output worksheet inserted to the right of the Data_Partition worksheet.

The top section of this worksheet includes the Output Navigator, which can be used to quickly navigate to various sections of the output.

  Output Navigator

The Data, Variables, and Parameters/Options sections of the output worksheet all reflect user input. Further down is the Error Report.

 

Neural Network Classification Output Error Report

 

This report may be sorted by clicking the arrow next to each column heading. Click the arrow next to V: % Error and, from the menu, select Sort Smallest to Largest.

Excel Sort Menu

The network for Net ID 45 contains two hidden layers. Layer one contains five neurons, and layer two contains three neurons. The number of errors for this network was 37 (26.06% Error) for the Training Set, and 10 (27.78% Error) for the Validation Set. This Network ID has the smallest number of errors in the Validation Set.

  Sorted Error Report 
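
The % Error column is simply the error count divided by the number of records in that partition. Working backward from the percentages above, this example's partitions must hold roughly 142 training and 36 validation records (these counts are inferred, not stated in the report); a quick check in Python:

    # Record counts inferred from the report: 37/142 = 26.06%, 10/36 = 27.78%.
    for errors, n_records in [(37, 142), (10, 36)]:
        print(f"{100 * errors / n_records:.2f}% Error")   # 26.06, then 27.78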

On the Error Report, click the Net ID 45 hyperlink to open the Neural Network Classification (Manual Arch.) - Step 1 of 3 dialog. Click Finish to run the Neural Network Classification method with Manual Architecture, using the input and option settings specified for Net ID 45.

The layout of this report changes when the number of classes is reduced to two. See the NNC example below, which uses a data set with a two-class output variable.

Scroll down the worksheet to see the Confusion Matrices for each Neural Network listed in the table above.

These matrices expand upon the information shown in the Error Report for each network ID. See the matrices below for Net ID 45.

  Net ID 45 Confusion Matrix

Notice the number of networks trained and reported in the Error Report was 90 (# Networks Trained = MIN {100, 10 * (1 + 8)} = 90).

With this particular data set, the ensemble methods resulted in a better fit to the data than the Automatic Network selection.