On the Analytic Solver Data Science ribbon, from the Applying Your Model tab, select Help - Examples, then select Forecasting/Data Science Examples, and open the example file Utilities.xlsx. This example data set provides data on 22 public utilities in the U.S.
Select a cell within the data set, then on the XLMiner ribbon, from the Data Analysis tab, select Transform - Principal Components to open the Principal Components Analysis - Step1 of 3 dialog.
From the Variables In Input Data list, select variables x1 to x8, then click the > button to move them to the Selected Variables list, and click Next to open the Principal Components Analysis - Step 2 of 3 dialog.
XLMiner provides two routines for specifying the number of principal components: Fixed #components and Smallest # components explaining. Use the Fixed # components method to specify a fixed number of components or variables to be included in the reduced model. The Smallest #components explaining method allows the user to specify a percentage of the variance. When this method is selected, XLMiner calculates the minimum number of principal components required to account for that percentage of the variance.
XLMiner provides two methods for calculating the principal components: using the covariance, or the correlation matrix. When using the correlation matrix method, the data is normalized first before the method is applied (i.e., the data set is normalized by dividing each variable by its standard deviation). Normalizing gives all variables equal importance in terms of variability. If the covariance method is selected, the data set should first be normalized.
Select Use Correlation Matrix (Use Standardized Variables), then click Next to open the Principal Components - Step 3 of 3 dialog.
Confirm Show principal components score is selected, then click Finish. This option displays an output matrix where the columns are the principal components, the rows are the individual data records, and the value in each cell is the calculated score for that record on the relevant principal component.
For a description of Show Q-Statistics and Show Hotteling's T-Squared Statistics options, please see the Principal Components Options section.
Three worksheets are inserted after the Description worksheet: PCA_Summary, PCA_Scores, and PCA_Components. The output from PCA_Summary is displayed below.
The top section of the PCA_Output worksheet displays the number of principal components created (eight as selected in the Step 2 of 3 dialog), the number of records in the data set (No. of Patterns: 22), the method chosen (Matrix Used: Correlation selected in the Step 2 of 3 dialog), and the Component chosen (Component: Fixed Number as selected in the Step 2 of 3 dialog).
PCA_Components displays the principal component table. The maximum magnitude element for Component 1 corresponds to x2 (|-0.5712|). This signifies that the first principal component is measuring the effect of x2 on the utility companies. The second component appears to be measuring the effect of x4 on the utility companies (maximum magnitude = |-0.4091|). The first component accounts for 27.16% of the variance, while the second component accounts for 23.75%. Together, these two components account for more than 50% of the total variation. The maximum magnitude element for component 1 corresponds to x2.
The output from the PCA_Scores worksheet is displayed below. This table holds the weighted averages of the normalized variables (after each variable's mean is subtracted). This matrix is described in the secoond step of the PCA algorithm. We are looking for the magnitude or absolute value of each figure in the table.
Click back to the Data worksheet, select any cell in the data set, then on the XLMiner ribbon, from the Data Analysis tab, select Transform - Principal Components. Select cells x1 through x8, then click Next to advance to the Step 2 of 3 dialog.
Select Smallest # of components explaining, next to at least, enter 50 for % of variance, select Use Correlation Matrix (Use Standardized Variables), then click Finish.
From the output worksheet PCA_Components1, only the first two components are included in the output file, since these two components account for over 50% of the variation.
The output from the PCA_Scores1 worksheet is below. This table holds the weighted averages of the normalized variables (after each variable's mean is subtracted). This matrix is described in the second step of the PCA algorithm. We are looking for the magnitude or absolute value of each figure in the table.
After applying the PCA algorithm, proceed to analyze the data set by applying additional data science algorithms featured in Analytic Solver.
1. Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce. Data Mining for Business Intelligence. 2nd ed. New Jersey: Wiley, 2010.