To score data with a prediction or classification method, or perform a time series forecast manually (without selecting the Score icon on XLMiner ribbon), enter a Psi Solver function into an Excel cell as an array.

Scoring Data Using PsiPredictXxx()

On the XLMiner ribbon, from the Applying Your Model tab, select Score to open the Select New Data Sheet & Stored Model Sheet dialog. Under Data to be Scored, select the New Data worksheet, and at Data range, select cells N2:N11. Enter =PsiPredictMLR(MLR_Stored!A2:R8,'New Data'!A2:M11) then press SHIFT+CTRL+ENTER to enter the formula as an array into all 10 cells (N2:N11).

The first argument, MLR_Stored!A2:R8, is the range of cells used by XLMiner to store the Multiple Linear Regression model on the MLR_Stored worksheet. This data range changes as the classification or prediction method changes, and as the number of features included in the data set changes. When entering this first argument, start with cell A2 and end with the last populated lowest, right-most cell. (There should be no other populated cells to the right or below the last cell in the range.)

The second argument, New Data!A2:M11, is the range containing the new data on the New Data worksheet. The new data must contain at least one row of data containing the same number of features (or columns) as the data used to create the model. In this example, we included 13 features in our model (CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, and LSTAT). As a result, our new data also contained these same exact features. We could have performed this prediction on only one row of new data, NewData!A2:M2, but chose to use all ten rows.

To enter this formula using the Function Arguments dialog, from the Excel Formula Bar, select Insert Function (fx) to open the Insert Function dialog. For category, select PSI Data Mining, then from function, select PsiPredictMLR, and click OK. The Function Arguments dialog opens.

 At Params, enter the first argument, MLR_Stored!A2:R8 (the range of cells used by XLMiner to store the Multiple Linear Regression model on the MLR_Stored worksheet), and at New_data, enter  the second argument, 'NewData'!A2:M11, and click OK.

 

Function Arguments Dialog

The scoring results are shown in the screenshot below in the N column.

New Data with Predicted Values

The PsiPredictMLR() functions are interactive, meaning that if a variable value is changed (i.e., the first LSTAT value in cell M2 changes from 4.5 to 9.5), the Predicted Value in cell N2 updates to reflect a new predicted value.

The remaining three PsiForecast functions can be used similarly using models from their respective stored model sheets.

Multiple Linear Regression

  • Psi Function:  PsiPredictMLR()
  • Stored Model Sheet:  MLR_Stored

Regression Tree

  • Psi Function:  PsiPredictRT()
  • Stored Model Sheet:  RT_Stored

Regression Tree Ensemble Methods

  • Psi Function:  PsiPredictRTEnsemble()
  • Stored Model Sheets:  RTBoost_Stored, RTBag_Stored or RTRandTrees_Stored

Neural Networks:

  • Psi Function:  PsiPredictNN()
  • Stored Model Sheet:  NNP_Stored

Neural Networks Ensemble Methods

  • Psi Function:  PsiPredictNNEnsemble()
  • Stored Model Sheets:  NNPBoost_Stored or NNPBag_Stored

Note: Scoring using k-Nearest Neighbors must be performed by clicking the Score icon on the XLMiner ribbon.

Scoring Data Using PsiClassifyXxx()

To score the same data using a classification algorithm, click back to the Data_Partition worksheet, then on the XLMiner ribbon, from the Data Mining tab, select Classify - Logistic Regression to open the Logistic Regression - Step 1 of 3 dialog. From the Output Variable list, select CAT.MEDV, then select all remaining variables (except MEDV) from the Input Variables list. (The CAT. MEDV variable is a categorical variable based on the MEDV variable, so there is no need to include both in the classification algorithm.) Click Finish to accept the default settings for options on the Logistic Regression - Step 2 of 3 dialog. (For a detailed explanation of this classification method, please see the Logistic Regression Classification Method section.)

From Excel, click the New Data tab to display ten new records containing the same 13 input variables used in the k-Nearest Neighbors classification method (CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, & LSTAT).

On the XLMiner ribbon, from the Applying Your Model tab, select Score to open the Select New Data Sheet & Stored Model Sheet dialog. Under Data to be Scored, confirm that New Data appears as the Worksheet, Scoring.xlsx as the Workbook, and the Data range is A1:M11. Under Stored Model, select MLR_Stored, then click Match By Name to match the Variables in the New Data with the Variables in the Stored Model. Click OK to score the data.

Select New Data Sheet & Stored Model Sheet Dialog 

XLMiner scores the new data and inserts the Predicted column on the LR_ModelScore worksheet.

New Data with Predicted Values

The Predicted column (cells (B12:B21) contains the scoring formulas. These formulas may be entered manually into a cell to bypass clicking the Score icon. These formulas are now interactive, meaning that if a value is changed for a variable, the Predicted value is updated. For example, change the RM value (average number of rooms) of 5.454 in cell H12 to 10. When the ENTER key is pressed, the Predicted value in cell B12 changes from 0 to 1.

To enter these PsiClassifyLR() formulas manually, click back to the New Data worksheet and select cells O2:O11. Enter =PsiClassifyLR(LR_Stored!A2:R11,'New Data'!A2:M11), then press SHIFT+CTRL+ENTER to enter the formula as an array into all ten cells (O2:O11).

The first argument, LR_Stored!A2:R11, is the range of cells used by XLMiner to store the logistic regression model on the LR_Stored worksheet. This data range changes as the classification or prediction method changes, and as the number of features included in the data set changes. When entering this first argument, start with cell A2 and end with the last populated lowest, right-most cell. (There should be no other populated cells to the right or below the last cell in the range.)

The second argument, New Data!A2:M11, is the range containing the new data on the New Data worksheet. The new data must contain at least one row of data containing the same number of features (or columns) as the data used to create the model. In this example, we included 13 features in the model (CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B & LSTAT). As a result, our new data also contained these same features. We could have performed this prediction on only one row of new data, NewData!A2:M2, but chose to use all ten rows.

The scoring results are shown in the screenshot below in the O column.

New Data with Predicted Values

The PsiClassifyLR() functions are interactive, meaning that if a variable value is changed (i.e., if the RM value in cell F5 changes from 5 to 10), the Predicted classification in cell O2 updates to reflect a new predicted value of 1.

The remaining five PsiClassify functions can be used in the same way using models from their respective stored model sheets.

Discriminant Analysis

  • Psi Function:  PsiClassifyDA()
  • Stored Model Sheet:  DA_Stored

Logistic Regression

  • Psi Function:  PsiClassifyLR()
  • Stored Model Sheet:  LR_Stored

Classification Tree

  • Psi Function:  PsiClassifyCT()
  • Stored Model Sheet:  CT_Stored

Classification Tree Ensemble Methods

  • Psi Function:  PsiClassifyCTEnsemble()
  • Store Model Sheets:  CTBoost_Stored, CTBag_Stored or CTRandTrees_Stored

Naive Bayes***

  • Psi Function:  PsiClassifyNB()
  • Stored Model Sheet:  NNB_Stored

Neural Network**

  • Psi Function:  PsiClassifyNN()
  • Stored Model Sheet:  NNC_Stored

Neural Network Ensemble Methods

  • Psi Function:  PsiClassifyNNEnsemble()
  • Stored Model Sheets:  NNCBoost_Stored or NNCBag_Stored

Note: To score new data with the k-Nearest Neighbors classification method, on the Xlminer ribbon, from the Applying Your Model tab, select the Score icon.

Note: The Excel error #N/A is returned for PsiClassifyNB() when one or more records (in the Validation, Testing, or New Data partitions) are unable to be classified by the algorithm. This can happen when new feature values appear in the Validation Set that did not appear in the Training Set. This can be a result of the following scenarios.

  • If the data is partitioned, this can occur if the training partition data set does not include feature realization values that appear in the Validation or Test Sets.
  • The Training Seet could be too small.
  • The Training Set is not a representative sample.
  • The Training Set is of a different nature than the Validation Set.

For more information on this algorithm, see the Naïve Bayes Classification section.