The following examples illustrate how XLMiner can be used to explore the data to uncover trends and seasonalities. On the XLMiner ribbon, from the Applying Your Model tab, select Help - Examples, then Forecasting/Data Mining Examples and open the example data set, Income.xlsx. This data set contains the average income of tax payers by state.
Typically the following steps are performed in a time series analysis.
1. The data is partitioned into two sets with 60% of the data assigned to the Training Set and 40% assigned to the Validation Set.
2. Exploratory techniques are applied to both the Training and Validation Sets. If the results are in synch, then the model can be fit. If the ACF and PACF plots are the same, then the same model can be used for both sets.
3. The model is fit using the ARIMA (Autoregressive Integrated Moving Average) method.
4. When a model is fit using the ARIMA method, XLMiner displays the ACF and PACF plots for residuals. If these plots are in the band of UCL and LCL, then the residuals are random and the model is adequate.
5. If the residuals are not within the bands, then some correlations exist, and the model should be improved.
First, perform a partition on the data. Select a cell within the data set, then on the XLMiner ribboon, from the Time Series tab, select Partition to open the Time Series Partition Data dialog.
Under the Variables list, select Year and click > to move to Time Variable. Select the remaining variables under the Variables list, and click > to include them in the Variables in the Partition Data list.
Under Specify Partitioning Options, select Specify #Records to specify the number of records assigned to the Training and Validation Sets. Under Specify #Records for Partitioning, select Specify # records, then enter 50 for the number of Training Set records and 21 for the number of Validation Set records.
If Specify percentages is selected under Specify Partitioning Options, XLMiner assigns a percentage of records to each set according to the values entered by the user or automatically entered by XLMiner under Specify Percentages for Partitioning.
Click OK. The Data_PartitionTS worksheet is inserted to the right of the Income worksheet.
In the output above, the partitioning method is sequential (versus random). The first 50 observations have been assigned to the Training Set and the remaining 21 observations have been assigned to the Validation Set.
Select a cell on the Data_PartitionTS worksheet, then on the XLMiner ribbon, from the Time Series tab, select ARIMA - Autocorrelations to display the ACF dialog.
Select CA as the Selected variable, enter 10 for both ACF Parameters for Training Data and Validation Data. Plot ACF chart is selected by default.
Click OK. The worksheet ACF_Output is inserted after the Data_PartitionTS worksheet.
Note on each chart that the autocorrelation decreases as the number of lags increase. This suggests that a definite pattern exists in each partition. However, since the pattern does not repeat, it can be assumed that no seasonality is included in the data. In addition, since both charts exhibit a similar pattern, we can fit the same model to both the Validation and Training Sets.
Click back to the Data_PartitionTS worksheet and on the XLMiner ribbon, from the Time Series tab, select ARIMA -- Partial Autocorrelations to open the PACF dialog.
Select CA from the Variables In Input Data list, then click > to move the variable to Selected variable. Enter 40 for Maximum Lag under PACF Parameters for Training Data, and 15 for PACF Parameters for Validation Data. Plot PACF chart is selected by default.
Click OK. The worksheet ACF_Output is inserted directly to the right of the Data_PartitionTS worksheet. Both PACF plots show similar patterns in both the Validation and Training Sets. As a result, we can use the same model for both sets.
PACF Output for Training Data
PACF Output for Validation Data
The PACF function shows a definite pattern, which means there is a trend in the data. However, since the pattern does not repeat, we can conclude that the data does not show any seasonality.
Both the ACF and PACF charts suggest that a definite pattern exists, but without any seasonality. Both data sets exhibit the same behavior in both the Training and Validation Sets, which suggests that the same model is appropriate for each. Now we are ready to fit the model.
The ARIMA model accepts three parameters: p - the number of autoregressive terms; d - the number of non-seasonal difference;, and q - the number of lagged errors (moving averages).
Recall that the ACF plot showed no seasonality in the data, which means that autocorrelation is almost static, decreasing with the number of lags increasing. This suggests setting q = 0 since there appears to be no lagged errors. The PACF plot displayed a large value for the first lag, but minimal plots for successive lags. This suggest setting p =1. With most data sets, setting d =1 is sufficient or can at least be a starting point.
Click back to the Data_PartitionTS worksheet and on the XLMiner ribbon, from the Time Series tab, select ARIMA - ARIMA model to bring up the Time Series - ARIMA dialog.
Select CA from the Variables In Input Data list, then click > to move the variable to the Selected variable field. Under Non-seasonal Parameters set Autoregressive (p) to 1, Difference (d) to 1, and Moving Average (q) to 0.
Click Advanced to open the ARIMA - Advanced Options dialog.
Select Fitted Values and residuals, Produce forecasts, and Report forecast confidence intervals. The default Confidence Level setting of 95 is automatically entered. Variance-covariance matrix is selected by default.
Click OK on the ARIMA-Advanced Options dialog and again on the Time Series - ARIMA dialog. XLMiner calculates and displays various parameters and charts in two output sheets, ARIMA_Output and ARIMA_Residuals. The ARIMA_Output worksheet contains the ARIMA model, shown below.
On this same worksheet, XLMiner has calculated the constant term and the AR1 term for our model. These are the constant and f1 terms of the forecasting equation. See the following output of the Chi-square test.
The small p-value for the constant term (0.9704) and AR1 term (0) suggests that the model is a good fit to our data.
Open the worksheet ARIMA_Residuals. This table plots the actual and fitted values and the resulting residuals. As shown in the graph below, the Actual and Forecasted values match up fairly well. The usefulness of the model in forecasting will depend upon how close the actual and forecasted values are in the Time plot of Validation Set.
Next, we will view the ACF and PACF plots for Errors found at the bottom of the ARIMA_Output worksheet.
All lags, except lag 1, are clearly within the UCL and LCL bands. This indicates that the residuals are random and are not correlated, which is the first indication that the model parameters are adequate for this data.
See the Forecast table on the ARIMA_Output worksheet.
The table shows the actual and forecasted value. The Lower and Upper values represent the lower and upper bounds of the confidence interval. There is a 95% chance that the forecasted value will fall into this range.
The time plot to the right indicates how the model, which we fitted using the Training Set and performed on the Validation Set. The actual and forecasted values are fairly close, which confirms that our model should be good for forecasting.
To plot the values under the Lower and Upper column in the same chart, select the graph, then on the Excel ribbon, select Design - Select Data to open the Select Data Source dialog.
For Chart data range, enter =ARIMA_Output!$B$56:$G$77, then uncheck Error under Legend Entries. Click OK.
This plot shows that the Actual and Forecasted values lie inside the Lower and Upper 95% Confidence Level bands. Although the Actual values do fluctuate a bit, these values fall within the center of the range. We can conclude from the ARIMA output, that our model using parameters (1, 1, 0) has been shown to adequately fit the data.