Discriminant Analysis

Introduction

Linear Discriminant Analysis (LDA) is a generative classifier: it models the joint probability distribution of the input and target variables. As a result, it can “generate” new values of the input variables given a value of the target variable.
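To illustrate what “generating” inputs means, the sketch below draws new observations for one class from a Gaussian model with that class's mean and the shared covariance matrix, which is the class-conditional distribution LDA assumes. It is a generic, hypothetical example limited to two predictors (so the Cholesky factor can be written out by hand); the function names and parameter values are made up for illustration and are not part of Analytic Solver.

```python
import math
import random

def cholesky_2x2(cov):
    """Lower-triangular L with L @ L.T == cov for a 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def sample_given_class(mean, cov, n, rng):
    """Hypothetical helper: draw n points x ~ N(mean, cov), i.e. from p(x | class)."""
    L = cholesky_2x2(cov)
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # x = mean + L z turns independent standard normals into the target Gaussian
        samples.append((mean[0] + L[0][0] * z1,
                        mean[1] + L[1][0] * z1 + L[1][1] * z2))
    return samples

# Hypothetical fitted parameters: one mean per class, one shared covariance matrix
class_means = {"A": (0.0, 0.0), "B": (3.0, 2.0)}
shared_cov = [[1.0, 0.5], [0.5, 2.0]]

rng = random.Random(0)
points_b = sample_given_class(class_means["B"], shared_cov, 5000, rng)
mean_x = sum(p[0] for p in points_b) / len(points_b)
mean_y = sum(p[1] for p in points_b) / len(points_b)
print(round(mean_x, 1), round(mean_y, 1))  # should be close to class B's mean (3.0, 2.0)
```

A discriminative classifier, which models only the class given the inputs, has no such distribution to sample from.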

The discriminant analysis model is built from a set of observations whose classes are known, sometimes referred to as the training set. From the training set, the technique constructs a set of linear functions of the predictors, known as discriminant functions, of the form L = b1x1 + b2x2 + … + bnxn + c, where the b's are the discriminant coefficients, the x's are the input variables (predictors), and c is a constant.
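As a minimal sketch of how such coefficients can be derived, the code below assumes the standard Gaussian formulation of LDA: a pooled within-class covariance matrix S, class priors estimated from class frequencies, b = S⁻¹·(class mean) and c = -½·(class mean)ᵀS⁻¹(class mean) + ln(prior). It is restricted to two predictors so the 2x2 inverse can be written out by hand. The function name `fit_lda_2d` and the training data are hypothetical, and this is not necessarily how Analytic Solver computes its coefficients.

```python
import math

def fit_lda_2d(X, y):
    """Hypothetical helper: return {class: (b, c)} so that L_k(x) = b[0]*x1 + b[1]*x2 + c."""
    classes = sorted(set(y))
    n = len(X)
    means, counts = {}, {}
    for k in classes:
        pts = [x for x, lab in zip(X, y) if lab == k]
        counts[k] = len(pts)
        means[k] = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
    # Pooled within-class covariance: the single covariance matrix shared by all classes
    s11 = s12 = s22 = 0.0
    for (x1, x2), lab in zip(X, y):
        d1, d2 = x1 - means[lab][0], x2 - means[lab][1]
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
    dof = n - len(classes)
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof
    det = s11 * s22 - s12 * s12
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    coeffs = {}
    for k in classes:
        m1, m2 = means[k]
        # b = inverse(S) @ mean
        b = (inv[0][0] * m1 + inv[0][1] * m2,
             inv[1][0] * m1 + inv[1][1] * m2)
        # c = -0.5 * mean' inverse(S) mean + log(prior); note b . mean == mean' inverse(S) mean
        c = -0.5 * (b[0] * m1 + b[1] * m2) + math.log(counts[k] / n)
        coeffs[k] = (b, c)
    return coeffs

# Hypothetical training set: two predictors, classes known
X = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (6.0, 5.0), (7.0, 7.0), (8.0, 6.0)]
y = ["A", "A", "A", "B", "B", "B"]
for k, (b, c) in fit_lda_2d(X, y).items():
    print(f"L_{k} = {b[0]:.3f}*x1 + {b[1]:.3f}*x2 + {c:.3f}")
```

For this toy data the pooled covariance works out to [[1, 0.5], [0.5, 2/3]], giving L_A = 0·x1 + 4·x2 + (-16/3 + ln 0.5) and L_B = 4·x1 + 6·x2 + (-32 + ln 0.5).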

These discriminant functions are used to predict the class of a new observation whose class is unknown. For a k-class problem, k discriminant functions are constructed. Given a new observation, all k discriminant functions are evaluated, and the observation is assigned to the class whose discriminant function has the largest value.
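The assignment rule can be sketched in a few lines: evaluate L = b1x1 + … + bnxn + c for every class and pick the class with the largest value. The coefficients below are hypothetical values for a 3-class problem with two predictors, chosen only to show the mechanics.

```python
def classify(x, discriminants):
    """Evaluate every class's discriminant function and return the argmax class."""
    scores = {
        label: sum(b_i * x_i for b_i, x_i in zip(b, x)) + c
        for label, (b, c) in discriminants.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical coefficients (b1, b2) and constant c for each of three classes
discriminants = {
    "A": ((2.0, 1.0), -3.0),
    "B": ((1.0, 2.0), -4.0),
    "C": ((0.5, 3.0), -6.0),
}
label, scores = classify((1.5, 2.0), discriminants)
print(label)  # → A  (scores: A = 2.0, B = 1.5, C = 0.75)
```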

Discriminant analysis assumes that:

Within each class, the data follow a (multivariate) normal distribution.
Each class has its own mean.
All classes share a common covariance matrix.

When these assumptions hold, LDA produces linear decision boundaries between the classes.
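The reason the boundary is linear can be sketched with the standard Gaussian model. Here f_k is the normal density of class k with mean μ_k and common covariance Σ, π_k is the prior probability of class k, and p is the number of predictors; these symbols are introduced for this derivation and do not appear elsewhere in the text. Expanding the log of the weighted density shows that the only quadratic term in x is the same for every class, so it drops out of any comparison between classes:

```latex
\begin{aligned}
\ln\bigl(\pi_k f_k(x)\bigr)
  &= \ln\pi_k - \tfrac{p}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma|
     - \tfrac{1}{2}(x-\mu_k)^\top\Sigma^{-1}(x-\mu_k) \\
  &= \underbrace{-\tfrac{1}{2}x^\top\Sigma^{-1}x - \tfrac{p}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma|}_{\text{same for every class}}
     + \underbrace{\mu_k^\top\Sigma^{-1}x - \tfrac{1}{2}\mu_k^\top\Sigma^{-1}\mu_k + \ln\pi_k}_{L_k(x)} \\
L_k(x) &= b_k^\top x + c_k,\qquad b_k = \Sigma^{-1}\mu_k,\qquad
  c_k = -\tfrac{1}{2}\mu_k^\top\Sigma^{-1}\mu_k + \ln\pi_k \\
L_j(x) = L_k(x) &\iff (b_j - b_k)^\top x + (c_j - c_k) = 0
\end{aligned}
```

The last line is the equation of a hyperplane, which is why the boundary between any two classes is linear. If each class had its own Σ_k, the x^\top\Sigma_k^{-1}x terms would no longer cancel.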

The latest version of Analytic Solver Data Science also includes Quadratic Discriminant Analysis (QDA). QDA produces a quadratic decision boundary rather than a linear one. Like LDA, QDA assumes that the data within each class are normally distributed, but unlike LDA, it does not assume that all classes share the same covariance matrix.
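A minimal sketch of QDA scoring, assuming the standard Gaussian discriminant δ_k(x) = -½ ln|Σ_k| - ½ (x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k) + ln π_k with a separate covariance matrix Σ_k per class (the constant term shared by all classes is dropped). The two classes below are hypothetical: they have the same mean but very different spreads, a case LDA cannot separate because its shared covariance makes the scores differ only through the means and priors, while QDA produces a circular (quadratic) boundary around the center.

```python
import math

def qda_score(x, mean, cov, prior):
    """Quadratic discriminant for one class with its own 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    # Entries of the inverse of the 2x2 covariance matrix
    i11, i12, i22 = d / det, -b / det, a / det
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    mahalanobis_sq = i11 * d1 * d1 + 2 * i12 * d1 * d2 + i22 * d2 * d2
    return -0.5 * math.log(det) - 0.5 * mahalanobis_sq + math.log(prior)

# Hypothetical per-class parameters: (mean, covariance, prior)
classes = {
    "tight":  ((0.0, 0.0), [[0.2, 0.0], [0.0, 0.2]], 0.5),
    "spread": ((0.0, 0.0), [[4.0, 0.0], [0.0, 4.0]], 0.5),
}

def qda_classify(x):
    """Assign x to the class with the largest quadratic discriminant."""
    return max(classes, key=lambda k: qda_score(x, *classes[k]))

print(qda_classify((0.1, 0.1)))  # → tight   (near the shared center)
print(qda_classify((2.0, 2.0)))  # → spread  (far from the center)
```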

QDA is more flexible than LDA, and it tends to outperform LDA when the class covariance matrices differ substantially. Because each class has its own covariance matrix, however, the number of parameters that must be estimated grows much faster with the number of dimensions (predictors) than it does for LDA. As a result, LDA may be the better choice for datasets with few observations and many classes or predictors. It's advisable to try both techniques to determine which one performs better on your data. You can switch between LDA and QDA by setting the corresponding option to true or false.
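The growth in parameters can be made concrete with a rough count under the Gaussian model: both methods estimate one mean vector per class, but LDA estimates a single symmetric p × p covariance matrix while QDA estimates one per class. In this illustrative sketch, k (number of classes) and p (number of predictors) are symbols introduced here, and the class priors are left out because both methods estimate the same k - 1 of them.

```python
def covariance_parameters(p):
    """Free entries in one symmetric p x p covariance matrix."""
    return p * (p + 1) // 2

def parameter_counts(k, p):
    """Means plus covariance entries estimated by LDA and by QDA."""
    lda = k * p + covariance_parameters(p)      # k means, one shared covariance
    qda = k * p + k * covariance_parameters(p)  # k means, k covariances
    return lda, qda

for k, p in [(2, 5), (3, 20), (10, 50)]:
    lda, qda = parameter_counts(k, p)
    print(f"k={k:2d}, p={p:2d}: LDA {lda:6d}  QDA {qda:6d}")
```

For example, with 10 classes and 50 predictors QDA must estimate 13,250 parameters against 1,775 for LDA, so a small training set can leave QDA's per-class covariance matrices poorly estimated.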