The following options appear on the k-Nearest Neighbors Classification dialogs.

k-Nearest Neighbors Step 1 of 3 Dialog

Variables In Input Data

This list contains the variables in the data set.

Selected Variables

This list contains the variables selected as input variables.

Output Variable

Enter the output variable to be classified.

# Classes

Enter the number of classes in the output variable.

Specify "Success" class (for Lift Chart)

This option is selected by default. Click the down arrow to select the class value that represents a success. This option is enabled only when the output variable has exactly 2 classes.
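The success class determines which outcome a lift chart accumulates. The Python sketch below computes the cumulative number of successes captured when records are ranked by predicted probability of success, next to the baseline expected from random selection. It is a generic cumulative-gains calculation written under that assumption, not Analytic Solver's own charting code, and the function name `cumulative_lift` is hypothetical.

```python
def cumulative_lift(probabilities, actuals, success):
    """Rank records by predicted success probability (highest first) and
    return (cumulative successes captured, random-selection baseline)."""
    ranked = sorted(zip(probabilities, actuals), key=lambda pair: pair[0], reverse=True)
    total_success = sum(1 for a in actuals if a == success)
    n = len(actuals)
    captured, baseline = [], []
    running = 0
    for i, (_, actual) in enumerate(ranked, start=1):
        if actual == success:  # only the chosen success class is counted
            running += 1
        captured.append(running)
        baseline.append(total_success * i / n)  # expected count if records were picked at random
    return captured, baseline

probs = [0.9, 0.2, 0.7, 0.4]
actual = ["yes", "no", "yes", "no"]
captured, baseline = cumulative_lift(probs, actual, success="yes")
print(captured)  # [1, 2, 2, 2]
print(baseline)  # [0.5, 1.0, 1.5, 2.0]
```

Choosing the other class as the success value would produce a different curve from the same predictions, which is why the success class must be specified before the chart is drawn.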

Specify initial cutoff probability for success

Enter a value between 0 and 1 to set the cutoff probability for success. If the calculated probability of success for an observation is greater than or equal to this value, a success (1) is predicted for that observation; if it is less than this value, a non-success (0) is predicted. The default value is 0.5. This option is enabled only when the output variable has exactly 2 classes.
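The cutoff rule can be sketched in a few lines of Python. The function name `classify_with_cutoff` is hypothetical, and the range check assumes that the endpoints 0 and 1 are allowed.

```python
def classify_with_cutoff(p_success, cutoff=0.5):
    """Predict 1 (success) if p_success >= cutoff, otherwise 0 (non-success)."""
    # Assumed validation: the cutoff must lie in the closed interval [0, 1].
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0 and 1")
    return 1 if p_success >= cutoff else 0

print([classify_with_cutoff(p) for p in (0.3, 0.5, 0.8)])        # [0, 1, 1]
print([classify_with_cutoff(p, 0.75) for p in (0.3, 0.5, 0.8)])  # [0, 0, 1]
```

Note that a probability exactly equal to the cutoff (0.5 with the default) is classified as a success.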

# Neighbors (k)

This is the parameter k in the k-Nearest Neighbors algorithm. If the number of observations (rows) is less than 50, k should be between 1 and the total number of observations; otherwise, k should be between 1 and 50. If k equals the total number of observations in the Training Set, every observation in the Training Set becomes a nearest neighbor. The default value for this option is 1.
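A minimal Python sketch of k-nearest-neighbor classification by majority vote follows. It assumes Euclidean distance on numeric inputs and a simple unweighted vote; Analytic Solver's exact distance measure, input scaling, and tie handling are not described here, and `knn_classify` is a hypothetical name. The last call uses k equal to the number of training rows, so every row is a neighbor and the prediction collapses to the majority class of the Training Set.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=1):
    """Predict the class of `query` by majority vote among its k nearest
    training rows, using Euclidean distance (an assumption)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Sort training rows by distance to the query and keep the k closest.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    # Counter keeps first-seen order, so vote ties go to the nearer neighbor's label.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
train_y = ["A", "A", "B", "B", "B"]
query = (1.2, 1.4)
print(knn_classify(train_X, train_y, query, k=1))  # A
print(knn_classify(train_X, train_y, query, k=3))  # A
print(knn_classify(train_X, train_y, query, k=5))  # B (all rows are neighbors)
```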

Nearest Neighbors Search

If Score on Fixed K is selected, the output is displayed for the specified value of k.

If Score on Search 1...K is selected, Analytic Solver displays the output for the best k between 1 and the value entered for # Neighbors (k).
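The search over 1...K can be illustrated with a toy Python sketch. It assumes that the "best" k is the one with the lowest misclassification rate on the validation rows, with ties going to the smaller k; the selection criterion is an assumption, not a documented rule. To keep the block short it uses a one-variable k-NN, and both function names are hypothetical.

```python
from collections import Counter

def knn_predict_1d(train, x, k):
    """Toy k-NN with one numeric input: majority label among the k closest rows.
    Vote ties go to the label of the nearer neighbor (Counter keeps insertion order)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def search_best_k(train, validation, max_k):
    """Score k = 1..max_k on the validation rows; return (best k, its error rate).
    Assumption: best = lowest misclassification rate."""
    best_k, best_rate = None, float("inf")
    for k in range(1, max_k + 1):
        errors = sum(knn_predict_1d(train, x, k) != y for x, y in validation)
        rate = errors / len(validation)
        if rate < best_rate:  # strict '<' keeps the smaller k on ties
            best_k, best_rate = k, rate
    return best_k, best_rate

train = [(1.0, "A"), (2.0, "A"), (2.9, "B"), (8.0, "B"), (9.0, "B")]
validation = [(2.7, "A"), (8.5, "B")]
print(search_best_k(train, validation, max_k=5))  # (3, 0.0)
```

Here k = 1 is misled by the single nearby "B" row at 2.9, while k = 3 outvotes it, which is the kind of difference the search is meant to expose.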

Prior Probability

Analytic Solver can incorporate prior assumptions about how frequently the different classes occur. Select one of the following options.

If the first option, Empirical, is selected, Analytic Solver Data Mining assumes that the probability of encountering a particular class in the data set is the same as the frequency with which it occurs in the training data.

If the second option, Uniform, is selected, Analytic Solver Data Mining assumes that all classes occur with equal probability.

If the third option, Manual (User specified prior probabilities), is selected, you can manually enter the desired class and probability value.
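The Empirical and Uniform options correspond to the two prior vectors computed below. The check in `manual_priors` that user-entered probabilities sum to 1 is an assumption for illustration, and all three function names are hypothetical. How Analytic Solver combines these priors with the neighbor votes is not covered here.

```python
import math
from collections import Counter

def empirical_priors(train_labels):
    """Empirical: each class's prior equals its relative frequency in the training data."""
    counts = Counter(train_labels)
    n = len(train_labels)
    return {cls: count / n for cls, count in counts.items()}

def uniform_priors(classes):
    """Uniform: every class gets the same prior probability."""
    return {cls: 1 / len(classes) for cls in classes}

def manual_priors(user_values):
    """Manual: accept user-entered priors (assumed check: they must sum to 1)."""
    if not math.isclose(sum(user_values.values()), 1.0):
        raise ValueError("prior probabilities should sum to 1")
    return dict(user_values)

labels = ["A", "A", "A", "B"]
print(empirical_priors(labels))             # {'A': 0.75, 'B': 0.25}
print(uniform_priors(["A", "B"]))           # {'A': 0.5, 'B': 0.5}
print(manual_priors({"A": 0.9, "B": 0.1}))  # {'A': 0.9, 'B': 0.1}
```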

Partitioning Options

Analytic Solver provides the ability to partition a data set from within a classification or prediction method by selecting Partitioning Options on the Step 2 of 3 dialog. If this option is selected, Analytic Solver partitions the data set (according to the partition options) immediately before running the method. If partitioning has already occurred on the data set, this option is disabled. For more information on partitioning, see the Data Mining Partitioning section.
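A random split of rows into training, validation, and test sets can be sketched as follows. The 60%/40% default fractions, the fixed seed, and the name `partition_rows` are hypothetical choices for illustration, not Analytic Solver's partition settings; rows left over after the training and validation shares form the test set.

```python
import random

def partition_rows(rows, train_frac=0.6, valid_frac=0.4, seed=12345):
    """Shuffle rows with a fixed seed, then slice off training and validation
    shares; any remaining rows become the test set."""
    if train_frac + valid_frac > 1 + 1e-9:
        raise ValueError("training and validation fractions exceed 100%")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_valid = min(round(n * valid_frac), n - n_train)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

rows = list(range(10))
train, valid, test = partition_rows(rows)
print(len(train), len(valid), len(test))  # 6 4 0
train, valid, test = partition_rows(rows, 0.5, 0.3)
print(len(train), len(valid), len(test))  # 5 3 2
```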

Score Training Data

Select these options to show an assessment of the performance of the algorithm in classifying the Training Set. The report is displayed according to your specifications: Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score Validation Data

These options are enabled when a validation data set exists. Select these options to show an assessment of the performance of the algorithm in classifying the Validation Set. The report is displayed according to your specifications: Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.

Score Test Data

These options are enabled when a test set is present. Select these options to show an assessment of the performance of the algorithm in classifying the test data. The report is displayed according to your specifications: Detailed, Summary, and Lift Chart. Lift Charts are only available when the Output Variable contains two categories.
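A summary assessment of a classifier is commonly built from a confusion matrix and the misclassification (error) rate. The sketch below computes both for one set of actual and predicted classes; it assumes this is the kind of information a Summary report contains, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count each (actual class, predicted class) pair."""
    return Counter(zip(actual, predicted))

def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm[(1, 1)], cm[(1, 0)], cm[(0, 1)], cm[(0, 0)])  # 2 1 1 2
print(misclassification_rate(actual, predicted))       # 0.3333333333333333
```

The same two functions apply unchanged to training, validation, or test data; only the rows passed in differ.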

Score New Data

For more information on the Score New Data options, see the Scoring New Data section.