On the Analytic Solver Data Science ribbon, from the Applying Your Model tab, select Help - Examples to open the IrisFacto.xlsx example database, then select Forecasting/Data Science Examples. (This data set is derived from the example data set Iris.xlsx.) This example illustrates how to create dummy variables and category scores. Note that in this data set, Species_Name is a string variable.
Select a cell within the data set, and on the XLMiner ribbon, from the Data Analysis tab, select Transform - Transform Categorical Data - Create Dummies to open the Create Dummies dialog.
From the Variables list, select Species_name, then click > to move the variable to the Variables to be factored field.
Click OK, and view the output on the CreateDummies worksheet inserted after Sheet1.
The variable Species_name is expressed as three dummy variables: Species_name_Setosa, Species_name_Verginica, and Species_name_Versicolor. These new dummy variables are assigned values of either 1 to indicate that the record belongs, or 0 to indicate that the record does not belong. For example, Species_name_Setosa is assigned a value of 1 only when the value of Species_name=Setosa is in the data set. Otherwise, Species_name_Setosa = 0. The same is true for the two remaining dummy variables (i.e., Species_name_Verginica and Species_name_Versicolor).
XLMiner converted the string variable into three categorical variables that resulted in an entirely numeric data set.
Click back to the data set on Sheet1, select cell A2, and on the XLMiner ribbon, from the Data Analysis tab, select Transform - Transform Categorical Data - Create Category Scores to open the Create Category Scores dialog.
From the Variables list, select Species_name, and click > to move the variable to the Variables to be factored list. Keep the default option of Assign numbers 1,2,3....
Click OK and view the results on the CategoryVar worksheet that is inserted directly to the right of Sheet 1 and to the left of the CreateDummies worksheet.
XLMiner has sorted the values of the Species_name variable alphabetically, and then assigned values of 1, 2, or 3 to each record depending upon the species type (starting from 1 because we selected Assign numbers 1,2,3....). To force XLMiner to start from 0, select the option Assign numbers 0, 1, 2,… on the Create Category Scores dialog. The variable Species_name_ord is created to store these assigned numbers. Again, XLMiner has converted this data set to an entirely numeric data set.
On the XLMiner ribbon, from the Applying Your Model tab, select Help - Examples to open the Iris.xlsx example data set. Select a cell within the data set, then on the XLMiner ribbon, from the Data Analysis tab, select Transform - Transform Categorical Data - Reduce Categories to open the XLMiner Reduce Categories dialog.
At Category variable, click the down arrow, and select Petal_length as the variable.
Under Assign Category, select Manually.
All unique values of the Petal_length variable are listed. Select all categories with values less than 2, then at Category, click the down arrow and select 1, and click Apply.
Repeat these steps for categories with values from 3 to 3.9 and apply a Category number of 2. Continue repeating these steps until values ranging from 4 thru 4.9 are assigned a category number = 3, values ranging from 5 thru 5.9 are assigned a category number = 4, and values ranging from 6 thru 6.9 are assigned a category = 5.
If using Analytic Solver Platform or XLMiner Platform, the maximum number of categories will be equal to the number of unique values for the selected variable. In this instance the petal_length variable contains 43 unique values. If using Analytic Solver Pro or XLMiner Pro, the number of categories is limited to 30.
Note: Analytic Solver Pro and XLMiner Pro are limited to 30 categories. If you select By frequency, XLMiner assigns category numbers 1 through 29 to the most frequent 29 unique values, and category number 30 to all other unique values.
Click OK to produce the following output on the ReduceCat1 worksheet that is inserted directly to the right of the Data worksheet.
In the output, XLMiner has assigned new categories as shown in the column, Petal_length_red, based on the choices made in the Reduce Categories dialog.
Click back to the Data worksheet and on the XLMiner ribbon, from the Data Analysis tab, select Transform - Transform Categorical Data - Reduce Categories to open the Reduce Categories dialog. At Category variable, click the down arrow and select Petal_width. Leave the default setting of By frequency, and at Limit number of categories to, enter 12. Click Apply, then click OK.
The output worksheet ReduceCat2 is inserted to the right of the Data worksheet.
There are 22 unique values for Petal_width, and XLMiner has classified the Petal_width variable using 12 different categories. The most frequently appearing value is 0.2 (with 29 instances), which has been assigned to category 1. The second most frequently appearing value is 1.3 (with 13 instances), which has been assigned to category 2.
Incrementally increased category numbers are assigned to each value as the number of instances decreases, and until the 11th category is assigned. All remaining values are then lumped into category 12.