XLMiner provides a comprehensive set of analysis features based on both statistical and machine learning methods. A problem or a data set can be analyzed by several methods. It is usually a good idea to try different approaches, compare their results, and then choose a model that suits the problem well.
Databases, Spreadsheets and Size Limits
XLMiner can work with large data sets which may exceed the limits in Excel. A standard procedure is to sample data from a larger database, bring it into Excel to fit a model, and, in the case of supervised learning routines, score output back out to the database. XLMiner can sample data from Oracle, SQL Server and Access databases, and in V4.0 from PowerPivot in-memory databases.
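The sampling step can be illustrated outside XLMiner. The sketch below is only an analogy: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Oracle, SQL Server or Access, and the helper name sample_rows and the customers table are hypothetical, not part of XLMiner.

```python
import random
import sqlite3

def sample_rows(conn, table, n, seed=0):
    """Draw a simple random sample of n rows from a table (hypothetical helper)."""
    # The table name is interpolated into the SQL, so it must come from trusted code.
    rowids = [r[0] for r in conn.execute(f"SELECT rowid FROM {table}")]
    rng = random.Random(seed)                      # fixed seed for a repeatable sample
    chosen = rng.sample(rowids, min(n, len(rowids)))
    placeholders = ",".join("?" * len(chosen))
    query = f"SELECT * FROM {table} WHERE rowid IN ({placeholders})"
    return conn.execute(query, chosen).fetchall()

# Stand-in database: 1000 made-up customer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, spend REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1000)])

sample = sample_rows(conn, "customers", 50)
```

The sample is small enough to fit in a worksheet, while the full table stays in the database.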
Since XLMiner V4.0 (unlike earlier XLMiner versions), this feature is available in both the Professional edition and the Educational edition. However, the Educational edition limits the size of the database table or view from which you can sample, as well as the size of the sample drawn. Further information on XLMiner's limits is available here.
Data Exploration and Visualization
XLMiner has built-in features for data exploration and visualization. It is no longer necessary to use external tools such as Tableau or SpotFire to visualize your data; this can be done easily and at no extra cost in XLMiner itself. You can create any number of chart windows (each containing multiple linked charts), name, and save these windows in the workbook. In a chart window, you can create bar, line, scatterplot, boxplot, and histogram charts with one click. Also available (with one click) are quick charts of all variables, scatterplot matrix charts, and parallel coordinates charts.
You can use your mouse to zoom in and out, or select points of interest in a given chart; these points will be 'brushed' or highlighted in other charts in the same chart window, and the actual values of variables at each highlighted point are visible in a scrollable side pane. You can use filters for each continuous and categorical variable -- simple sliders and checkbox arrays that appear automatically in a side pane -- to include or exclude points of interest. With a simple point and click, you can change axes or colors, or create multiple panels based on the values of any categorical variable.
Exploring data in this fashion often yields quick insights about hidden relationships in the data, as well as "what is important, and what is not." This step can inform your choices of further operations, from data preparation and transformation to the fitting of data mining models.
There are five broad groups of operations in XLMiner:
A data set with known values of an outcome (response) variable is necessary to train a data mining model. For training a model, we usually choose (at random) a fraction of the available data -- the training partition. Trained models can then be applied to another partition -- the validation partition -- of the same data set to see how well they do with data that they were not trained on. In this phase, models can be adjusted and the best performing model selected. After a final model is selected, it can be applied to a third partition -- the test partition -- to test how well the final model will do with data that have been used neither in training nor in validation.
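A three-way random partition of this kind can be sketched in a few lines of Python. The function name partition, the 60/20/20 proportions and the seed are assumptions chosen for illustration; they are not XLMiner defaults.

```python
import random

def partition(records, train=0.6, valid=0.2, seed=12345):
    """Randomly split records into training, validation and test partitions."""
    rows = list(records)
    random.Random(seed).shuffle(rows)   # fixed seed makes the split repeatable
    n_train = round(len(rows) * train)
    n_valid = round(len(rows) * valid)
    training = rows[:n_train]
    validation = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]     # whatever remains is held out for the final check
    return training, validation, test

train_set, valid_set, test_set = partition(range(100))
```

Because every record lands in exactly one partition, the test partition is never seen during training or model selection.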
XLMiner also supports partitioning with oversampling, used when rare events are modeled and you need to ensure an adequate supply of those events in the modeling process.
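One common way to build an oversampled training set is to keep the rare records and draw only as many common records as needed to reach a target share. The sketch below shows that idea under stated assumptions: the function name, the 50% target share and the made-up fraud flag are illustrative, and this is not necessarily the procedure XLMiner follows.

```python
import random

def oversampled_training_set(records, is_rare, rare_share=0.5, seed=0):
    """Build a training set in which rare records make up roughly rare_share."""
    rng = random.Random(seed)
    rare = [r for r in records if is_rare(r)]
    common = [r for r in records if not is_rare(r)]
    # Number of common records needed so that rare ones reach the target share.
    n_common = round(len(rare) * (1 - rare_share) / rare_share)
    picked = rng.sample(common, min(n_common, len(common)))
    training = rare + picked
    rng.shuffle(training)
    return training

# Made-up data: 2% of the records are flagged as the rare event.
records = [{"id": i, "fraud": i % 50 == 0} for i in range(1000)]
training = oversampled_training_set(records, lambda r: r["fraud"])
```

For brevity this sketch places every rare record in the training set; in practice some would be held back so that the model can still be validated on data it has not seen.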
When the outcome variable is discrete or categorical, the objective of the data mining exercise is to classify the records into the discrete classes or categories.
XLMiner offers several techniques for classification:
- Discriminant Analysis
- Logistic Regression with best subset selection
- Classification Trees
- Naive Bayes Classifier
- Neural Networks
- k-Nearest Neighbors
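To make the idea of classification concrete, here is a minimal sketch of one of the listed techniques, k-nearest neighbors, written in plain Python (3.8 or later for math.dist). The function name, the toy data and the choice of k = 3 are assumptions for illustration, not XLMiner's implementation.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify query by majority vote among its k nearest training records.

    train is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training partition with a categorical outcome.
train = [((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
         ((5.0, 5.2), "high"), ((4.8, 5.1), "high"), ((5.1, 4.9), "high")]

prediction = knn_classify(train, (4.5, 5.0))
```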
When the outcome variable is continuous, the objective is to predict the value of the outcome variable for each of the data records.
XLMiner offers the following methods of prediction:
- Multiple Linear Regression with best subset selection
- k-Nearest Neighbors
- Regression Trees
- Neural Networks
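As a small illustration of prediction, the sketch below fits a least-squares line with a single predictor and uses it to predict a new value. This is a deliberate simplification of multiple linear regression (one predictor, no subset selection); the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data generated from y = 1 + 2x, so the fit should recover those values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(xs, ys)
predicted = intercept + slope * 6.0   # predict the continuous outcome for a new record
```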
Some problems involve detecting association among the properties of data records. XLMiner supports generation of Association Rules for showing which attributes of the data occur frequently together. One common application is to determine groups of products customers are likely to buy together, also known as Market Basket Analysis.
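The two standard measures behind an association rule, support (how often the items occur together) and confidence (how often the right-hand item appears given the left-hand item), can be sketched as follows. The sketch only considers rules between single items, and the function name, thresholds and baskets are assumptions for illustration.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def count(itemset):
        # Number of baskets containing every item in itemset.
        return sum(1 for b in baskets if itemset <= b)

    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b})
        if both / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / count({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both / n, confidence))
    return rules

transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
rules = pair_rules(transactions)
```

With these five baskets, bread and butter appear together in three of them, so both "bread -> butter" and "butter -> bread" pass the thresholds.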
XLMiner offers time series forecasting, with the exploratory techniques ACF (Autocorrelation function) and PACF (Partial autocorrelation function), smoothing techniques (moving average, exponential, double exponential and Holt-Winters), as well as ARMA and ARIMA modeling.
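Three of the simpler techniques named above (moving average, simple exponential smoothing, and the sample autocorrelation function) can be sketched in plain Python. The function names, the smoothing weight alpha = 0.5 and the sales series are assumptions for illustration; they are not XLMiner's routines or defaults.

```python
def moving_average(series, window):
    """Trailing moving average; the first value covers series[0:window]."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, initialised with the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

sales = [10.0, 12.0, 14.0, 13.0, 15.0, 17.0, 16.0, 18.0]
ma3 = moving_average(sales, 3)
ses = exponential_smoothing(sales, alpha=0.5)
r1 = acf(sales, 1)
```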
Data Reduction and Exploration
It is often useful or necessary to reduce the dimensionality of data into only a few attributes that matter more than others. In this situation, we do not attempt to classify or predict an outcome variable. Instead, the objective is to discover similarities in records and group them together using the available attributes (variables).
One such method involves deciding which variables matter most in explaining differences among records. Other methods categorize data into clusters that can be represented as a new categorical variable added to the data.
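The idea of turning clusters into a new categorical variable can be illustrated with a minimal k-means sketch. k-means is used here only as a familiar example of clustering; the function name, the naive initialisation from the first k points and the toy data are assumptions, not a description of XLMiner's routines.

```python
import math

def kmeans(points, k, iterations=20):
    """Very small k-means: returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]   # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.8), (0.8, 1.1), (4.9, 5.2)]
labels, centroids = kmeans(points, 2)
```

The labels list has one entry per record and can be appended to the data as a new categorical column.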
XLMiner supports the following methods of data exploration and reduction:
Output Presentation and Graphics
XLMiner provides special graphics to enhance the understanding of the data and the analysis outcomes. For instance, tree diagrams in classification and regression trees, and dendrograms in hierarchical clustering give very useful insights.
You can also use Excel's built-in features to work with XLMiner output. For instance, histograms, scatter plots and bubble plots are useful for gaining insight into the data and the fitted outcomes. Lift charts and gain charts can be easily generated from XLMiner outputs to see the benefit produced by the data mining exercise.
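The numbers behind a gain chart and a lift figure can be sketched as follows: records are ranked by model score, and the cumulative share of actual positives captured is compared with what random selection would capture. The function name, scores and outcomes below are made up for illustration.

```python
def cumulative_gains(scores, actuals):
    """Share of all positives captured after each record, ranked by score."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total = sum(actuals)
    gains, hits = [], 0
    for _, actual in ranked:
        hits += actual
        gains.append(hits / total)
    return gains

# Hypothetical model scores and the true 0/1 outcomes for eight records.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0]

gains = cumulative_gains(scores, actuals)
# Lift for the top quarter: share of positives captured divided by share of records used.
lift_top_quarter = gains[1] / (2 / len(scores))
```

A lift above 1 for the top-ranked records indicates that the model finds positives faster than random selection would.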