The following options appear on the k-Nearest Neighbors Classification dialogs.

k-Nearest Neighbors Classification - Step 1 of 3

Variables In Input Data

This list contains the variables in the data set.

Selected Variables

This list contains the variables selected as input variables.

Output Variable

Enter the output variable to be classified.

# Classes

Enter the number of classes in the output variable.

Specify "Success" class (for Lift Chart)

This option is selected by default. Click the down arrow to select the class value that represents a success. This option is enabled only when the output variable has exactly two classes.

Specify initial cutoff probability for success

Enter a value between 0 and 1 to set the cutoff probability for success. If the calculated probability of success for an observation is greater than or equal to this value, a success (1) is predicted for that observation. If it is less than this value, a non-success (0) is predicted. The default value is 0.5. This option is enabled only when the output variable has exactly two classes.
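The cutoff rule above can be sketched in a few lines of Python. The helper name `classify_by_cutoff` is hypothetical and is not part of XLMiner. It assumes the success probabilities have already been computed and applies the "greater than or equal to" comparison described above.

```python
def classify_by_cutoff(success_probs, cutoff=0.5):
    """Predict 1 (success) when p >= cutoff, otherwise 0 (non-success)."""
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0 and 1")
    return [1 if p >= cutoff else 0 for p in success_probs]

probs = [0.2, 0.5, 0.73, 0.49]
print(classify_by_cutoff(probs))        # default cutoff of 0.5
print(classify_by_cutoff(probs, 0.75))  # stricter cutoff
```

A probability exactly equal to the cutoff (0.5 here) is predicted as a success. Raising the cutoff reduces the number of predicted successes.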

k-Nearest Neighbors Classification - Step 2 of 3

Normalize input data

When this option is selected, XLMiner normalizes the data by expressing each input variable in terms of standard deviations from its mean. Normalization keeps the distance measure from being dominated by variables with large magnitudes. In this example, the values for Petal_width range from 0.1 to 2.5, while the values for Sepal_length range from 4.3 to 7.9. When the data is normalized, each actual value is replaced by the number of standard deviations it lies from the mean of that variable (its z-score).
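The following sketch shows the z-score transformation described above. It is an illustration only. It assumes the sample standard deviation (n - 1 denominator), and XLMiner may use a different convention. The helper `normalize_columns` and the data values are hypothetical. A constant column (standard deviation 0) would raise a division error and would need separate handling.

```python
import statistics

def normalize_columns(rows):
    """Replace each value with (value - column mean) / column standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Sample standard deviation (n - 1 denominator); an assumption, not XLMiner's documented choice.
    stdevs = [statistics.stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stdevs)] for row in rows]

# Hypothetical (Petal_width, Sepal_length) rows, chosen only for illustration.
data = [[0.2, 5.1], [1.3, 5.7], [2.5, 7.9]]
for row in normalize_columns(data):
    print([round(v, 3) for v in row])
```

After the transformation, both columns have mean 0 and standard deviation 1. The larger Sepal_length values therefore no longer dominate the distance calculation.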

Number of nearest neighbors (k)

This is the parameter k in the k-Nearest Neighbors algorithm. If the number of observations (rows) is less than 50, k should be between 1 and the number of observations. Otherwise, k should be between 1 and 50. If k is set to the total number of observations in the Training Set, every observation in the Training Set becomes a nearest neighbor. The default value for this option is 1.
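The sketch below illustrates how k is used. It is a plain k-nearest-neighbors classifier under several assumptions. It uses Euclidean distance and a simple majority vote, it ignores prior class probabilities, and it breaks ties in favor of the label held by the nearer row. The helpers `max_k` and `knn_classify` are hypothetical names and do not reflect XLMiner's internal implementation. `max_k` encodes the range for k described above.

```python
import math
from collections import Counter

def max_k(n_rows):
    """Upper bound on k as described above: the row count if below 50, else 50."""
    return n_rows if n_rows < 50 else 50

def knn_classify(train_X, train_y, query, k):
    """Predict a class for `query` by majority vote among its k nearest training rows."""
    if not 1 <= k <= max_k(len(train_X)):
        raise ValueError(f"k must be between 1 and {max_k(len(train_X))}")
    # Order training rows by Euclidean distance to the query (assumed metric).
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    # Count labels among the k closest rows. On a tie, Counter.most_common
    # returns the label encountered first, i.e. the one held by the nearer row.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train_X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
train_y = ["A", "A", "B", "B"]
print(knn_classify(train_X, train_y, [1.1, 1.0], k=3))
print(knn_classify(train_X, train_y, [5.2, 4.8], k=3))
```

If k equals the number of training rows, every query sees the same vote counts. The prediction then collapses to the overall majority class (apart from tie-breaking).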

Scoring Option

If Score on specified value of k as above is selected, the output is displayed for the specified value of k.

If Score on best k between 1 and specified value is selected, XLMiner displays the output for the best k between 1 and the value entered for Number of nearest neighbors (k).
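The option above does not state how "best" is measured. This sketch assumes that the best k is the one with the lowest misclassification rate on the Validation Set and that ties go to the smaller k. Treat both as assumptions. The helpers `knn_predict` and `best_k` are hypothetical. The classifier uses Euclidean distance and a simple majority vote.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training rows (Euclidean distance assumed)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def best_k(train_X, train_y, valid_X, valid_y, k_max):
    """Return (k, error_rate) with the lowest validation error for k in 1..k_max."""
    best = None
    for k in range(1, k_max + 1):
        errors = sum(knn_predict(train_X, train_y, x, k) != y
                     for x, y in zip(valid_X, valid_y))
        rate = errors / len(valid_y)
        # Strict "<" keeps the smallest k when error rates tie (an assumption).
        if best is None or rate < best[1]:
            best = (k, rate)
    return best

train_X = [[0.0], [1.0], [2.0], [2.5], [10.0], [11.0]]
train_y = ["A", "A", "A", "B", "B", "B"]  # the row at 2.5 is a noisy "B"
valid_X = [[2.4]]
valid_y = ["A"]
print(best_k(train_X, train_y, valid_X, valid_y, k_max=5))
```

In this toy data, k = 1 and k = 2 are misled by the noisy row at 2.5. From k = 3 upward, the validation record is classified correctly, so the smallest such k is reported.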

Prior Class Probabilities

XLMiner can incorporate prior assumptions about how frequently the different classes occur.

According to relative occurrences in training data - XLMiner assumes that the probability of encountering a particular class in the data set equals the frequency with which that class occurs in the Training Set.

Use equal prior probabilities - XLMiner assumes that all classes occur with equal probability.

User specified prior probabilities - Select this option to manually enter the desired class and probability value.
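The three prior options can be illustrated by how each set of priors is computed. This sketch covers only the computation. It does not show how XLMiner combines the priors with the neighbor votes, because the options above do not describe that step. The helper names are hypothetical. The check that user-entered priors lie in [0, 1] and sum to 1 is also an assumption, not documented XLMiner behavior.

```python
import math
from collections import Counter

def empirical_priors(train_labels):
    """Prior for each class = its relative frequency in the Training Set."""
    counts = Counter(train_labels)
    n = len(train_labels)
    return {label: c / n for label, c in counts.items()}

def equal_priors(classes):
    """Every class receives the same prior, 1 / (number of classes)."""
    return {label: 1 / len(classes) for label in classes}

def user_priors(priors):
    """Accept user-entered priors (hypothetical check: each in [0, 1], summing to 1)."""
    if any(not 0.0 <= p <= 1.0 for p in priors.values()):
        raise ValueError("each prior must be between 0 and 1")
    if not math.isclose(sum(priors.values()), 1.0):
        raise ValueError("priors must sum to 1")
    return dict(priors)

labels = ["A", "A", "A", "B"]  # hypothetical Training Set labels
print(empirical_priors(labels))
print(equal_priors(["A", "B"]))
print(user_priors({"A": 0.6, "B": 0.4}))
```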

Partitioning Options

XLMiner V2015 provides the ability to partition a data set from within a classification or prediction method by selecting Partitioning Options on the Step 2 of 3 dialog. If this option is selected, XLMiner partitions the data set (according to the partition options) immediately before running the selected method. If partitioning has already occurred on the data set, this option is disabled. For more information on partitioning, see the Data Mining Partitioning section.

k-Nearest Neighbors Classification - Step 3 of 3

Score Training Data

Select these options to show an assessment of the performance of the algorithm in classifying the Training Set. The report is displayed according to your specifications: Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.
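As a rough illustration of the figures a performance assessment is built from, the sketch below computes a confusion matrix and an overall error rate from actual and predicted classes. It is not a reproduction of XLMiner's Summary report layout. The helpers `confusion_matrix` and `error_rate` and the data values are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count each (actual class, predicted class) pair."""
    return Counter(zip(actual, predicted))

def error_rate(actual, predicted):
    """Fraction of records whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
classes = sorted(set(actual) | set(predicted))
for a in classes:
    # One row per actual class, one column per predicted class.
    print(a, [cm[(a, p)] for p in classes])
print(f"error rate: {error_rate(actual, predicted):.1%}")
```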

Score Validation Data

These options are enabled when a validation data set exists. Select these options to show an assessment of the performance of the algorithm in classifying the Validation Set. The report is displayed according to your specifications: Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score Test Data

These options are enabled when a test data set exists. Select these options to show an assessment of the performance of the algorithm in classifying the Test Set. The report is displayed according to your specifications: Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.
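The Lift Chart available in the scoring options applies only to two-class output variables. The sketch below shows the usual way a cumulative lift curve is computed. It is a generic construction, assumed here rather than taken from XLMiner's documentation. Records are sorted by predicted probability of the "Success" class (highest first), and actual successes are accumulated. The result is compared with a baseline that assumes successes are spread evenly. The helper `cumulative_lift` and the data are hypothetical.

```python
def cumulative_lift(success_probs, actual, success=1):
    """Return (cases, cumulative actual successes, baseline) points for a lift curve."""
    # Sort records by predicted success probability, highest first (stable for ties).
    ranked = sorted(zip(success_probs, actual), key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(a == success for a in actual) / len(actual)
    points = []
    hits = 0
    for cases, (_, a) in enumerate(ranked, start=1):
        if a == success:
            hits += 1
        # Baseline: successes expected if records were taken in random order.
        points.append((cases, hits, cases * overall_rate))
    return points

probs = [0.9, 0.2, 0.7, 0.4, 0.8]
actual = [1, 0, 1, 0, 0]
for cases, hits, baseline in cumulative_lift(probs, actual):
    print(cases, hits, round(baseline, 2))
```

A model with useful lift keeps the cumulative-successes curve above the baseline, especially for the first few records.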

Score New Data

For more information on the Score New Data options, see the Scoring New Data section.