Introduction

The Explore icon, on the Analytic Solver Data Science tab, provides access to Dimensionality Reduction via Feature Selection. Dimensionality Reduction is the process of deriving a lower-dimensional representation of the original data, one that still captures its most significant relationships, for use in a model. The field can be divided into two branches, Feature Selection and Feature Extraction. Feature Selection attempts to discover a subset of the original variables, while Feature Extraction attempts to map a high-dimensional model to a lower-dimensional space. In past versions, Analytic Solver Data Science contained one feature extraction tool that could be used outside of a classification or prediction method: Principal Components Analysis (Data Analysis - Transform - Principal Components).
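
The distinction can be made concrete with a short sketch in Python (scikit-learn is used here purely for illustration; it is not part of Analytic Solver Data Science). Feature Selection keeps a subset of the original columns unchanged, while Feature Extraction, such as Principal Components Analysis, derives new columns from all of the originals.

    # Illustrative only: the same 4-variable dataset reduced to 2 dimensions two ways.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))      # 100 rows, 4 original variables

    # Feature Selection: keep a subset of the original variables as-is.
    X_selected = X[:, [0, 2]]          # e.g. keep variables 1 and 3

    # Feature Extraction: map all 4 variables to 2 new derived components.
    X_extracted = PCA(n_components=2).fit_transform(X)

    print(X_selected.shape, X_extracted.shape)   # (100, 2) (100, 2)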

Feature Selection

Analytic Solver Data Science offers a new tool for Dimensionality Reduction, Feature Selection. Feature Selection attempts to identify the best subset of the available variables (or features) to be used as input to a classification or prediction method. The main goals of Feature Selection are to clean the data, eliminate redundancies, and identify the most relevant and useful information hidden within the data, thereby reducing its scale or dimensionality. Feature Selection enhances the ability to explore and visualize the data, and can make some previously infeasible analytic models feasible.

One important issue in Feature Selection is how to define the best subset. When using a supervised learning technique (a classification or prediction model), the best subset is the one that yields the model with the lowest misclassification rate or residual error. This raises a further question: which classification method should we use? A subset of variables may be optimal for one method but not for another. One might suggest trying all possible subsets. Unfortunately, the number of possible combinations of variables grows combinatorially, making an exhaustive search for the best subset infeasible for even a moderate number of variables. For example, finding the best subset of 10 variables out of a total of 50 would require evaluating 10,272,278,170 combinations.
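
The count comes directly from the binomial coefficient C(50, 10) and can be verified with a one-line check (shown here in Python for convenience):

    from math import comb

    # Number of distinct 10-variable subsets drawn from 50 candidate variables.
    print(comb(50, 10))   # 10272278170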

Feature Selection methods are divided into three major categories: filters, wrappers, and embedded approaches. The Feature Selection tool uses filter methods, which rank variables according to one or more univariate measures and select the top-ranked variables to represent the data in the model. This form of Feature Selection applies only to supervised learning methods: the importance of a variable is based on its relationship to, or ability to predict, the value of the output variable. The measures used to rank the variables fall into three main categories: correlation-based tests, statistical tests, and information-theoretic measures. The defining characteristic of filter methods is their independence from any particular model, which makes them widely applicable as a preprocessing step for supervised learning algorithms. Filter methods are usually much less computationally expensive than other Feature Selection approaches; for big data problems, they are sometimes the only methods that are computationally feasible. Their major drawback is that filters rank variables individually rather than examining subsets of multiple variables: a feature that is unimportant by itself may become relevant when combined with other features.
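
A minimal sketch of the filter idea is shown below, using scikit-learn's SelectKBest with an F-test as the univariate statistical measure. The synthetic dataset and the choice of measure are illustrative assumptions, not the specific measures exposed by the Feature Selection dialog itself; each feature is scored individually against the output variable and the top-ranked features are retained.

    # Filter method sketch: score each feature individually against the output
    # variable with a univariate statistical test, then keep the top k features.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = make_classification(n_samples=200, n_features=20,
                               n_informative=5, random_state=0)

    selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
    print("Per-feature F-scores:", selector.scores_.round(2))
    print("Top-ranked feature indices:", selector.get_support(indices=True))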

Feature Selection is an important topic that becomes more relevant as the number of variables in a model increases.