
Linear (canonical) discriminant functions assume that all groups share a common covariance matrix, whereas quadratic discriminant functions, while still assuming multivariate normality, allow each group its own covariance matrix.

The linear discriminant functions

\[ g_i(x) = w'_i x + w_{i0} \] \[ w_i = \Sigma^{-1}\mu_i \] and \[ w_{i0} = -\frac{1}{2}\mu'_i \Sigma^{-1}\mu_i + \log{P(\omega_i)} \]

are extended with the quadratic term \(W_i\), giving the discriminant functions from [1] below.

\[ g_i(x) = x'W_ix + x'w_i + w_{i0} \]

\[ = -\frac{1}{2} x' \Sigma_i^{-1} x + x'\Sigma_i^{-1} \mu_i - \frac{1}{2}\mu'_i\Sigma_i^{-1}\mu_i - \frac{1}{2}\log{|\Sigma_i|} + \log P(\omega_i) \]

where \[ W_i = -\frac{1}{2}\Sigma_i^{-1} \] \[ w_i = \Sigma_i^{-1}\mu_i \] and

\[ w_{i0} = -\frac{1}{2}\mu'_i \Sigma_i^{-1}\mu_i - \frac{1}{2}\log |\Sigma_i| + \log P(\omega_i) \]
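These terms follow from taking the logarithm of the multivariate normal density for group \(i\) and adding the log prior, then dropping \(-\frac{d}{2}\log 2\pi\), which is constant across groups:

\[ g_i(x) = \log p(x \mid \omega_i) + \log P(\omega_i) = -\frac{1}{2}(x-\mu_i)'\Sigma_i^{-1}(x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i) \]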

The calculation of the quadratic functions can employ the eigendecomposition of the estimate \(\hat{\Sigma}_i\) as follows (from [2]).

\[ \hat{\Sigma}_i = U_i D_i U'_i \]

\[ \hat{\mu}'_i \hat{\Sigma}_i^{-1} \hat{\mu}_i = [U'_i\hat{\mu}_i]'D_i^{-1}[U'_i\hat{\mu}_i] \] \[ (x-\hat{\mu}_i)' \hat{\Sigma}_i^{-1} (x-\hat{\mu}_i) = [U'_i(x-\hat{\mu}_i)]'D_i^{-1}[U'_i(x-\hat{\mu}_i)] \]

Note that \(-\frac{1}{2}(x-\hat{\mu}_i)' \hat{\Sigma}_i^{-1} (x-\hat{\mu}_i)\) expands to \(-\frac{1}{2}x'\hat{\Sigma}_i^{-1}x + x'\hat{\Sigma}_i^{-1}\hat{\mu}_i - \frac{1}{2}\hat{\mu}'_i\hat{\Sigma}_i^{-1}\hat{\mu}_i = x'W_ix + x'w_i - \frac{1}{2}\hat{\mu}'_i\hat{\Sigma}_i^{-1}\hat{\mu}_i\), recovering the quadratic and linear terms of \(g_i(x)\).

\[ \log|\Sigma_i| = \sum_l \log d_{il} \] where \(d_{il}\) is the \(l\)th diagonal entry of the eigenvalue matrix \(D_i\).
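Putting these pieces together, a minimal sketch of the per-group score in terms of the eigendecomposition is given below. It assumes Breeze and that mu, sigma and prior hold the group's mean vector, covariance estimate and prior probability; the names are illustrative rather than part of the library's API.

import breeze.linalg.{DenseMatrix, DenseVector, eigSym, sum}
import breeze.numerics.log

// Discriminant score g_i(x) for a single group, computed via the
// eigendecomposition Sigma_i = U_i D_i U_i'.
def quadraticScore(x: DenseVector[Double],
                   mu: DenseVector[Double],
                   sigma: DenseMatrix[Double],
                   prior: Double): Double = {
  val es = eigSym(sigma)
  val u = es.eigenvectors
  val d = es.eigenvalues
  // rotate into the eigenbasis: z = U'(x - mu)
  val z = u.t * (x - mu)
  // (x - mu)' Sigma^{-1} (x - mu) = z' D^{-1} z
  val mahalanobis = (z *:* z) dot d.map(1.0 / _)
  // log|Sigma| = sum_l log d_l
  val logDet = sum(log(d))
  -0.5 * mahalanobis - 0.5 * logDet + math.log(prior)
}

Since the covariance estimate is symmetric, eigSym applies and the inverse never needs to be formed explicitly.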

The observation is then assigned to the group whose discriminant function \(g_i(x)\) attains the maximum value.
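Continuing the sketch above, the assignment is an argmax over the per-group scores (again with illustrative names):

// Assign x to the group with the largest discriminant score, where
// `params` pairs each group label with its (mu, sigma, prior) estimates.
def classify(x: DenseVector[Double],
             params: Seq[(String, (DenseVector[Double], DenseMatrix[Double], Double))]): String =
  params.maxBy { case (_, (mu, sigma, prior)) => quadraticScore(x, mu, sigma, prior) }._1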

The implementation provided here is a naive one.

The primary drawback is that as the number of groups \(m\) increases, so does the number of decompositions, since a separate covariance matrix is estimated within each group.

Hence memory and storage can become an issue when computing the entire set of discriminant functions in one go.

One option is to compute each group's decomposition separately and store the results individually, rather than computing all of them at once, as sketched below.
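A minimal sketch of that option, assuming Breeze with rows as observations; groupDecompositions is a hypothetical helper, not part of the library:

import breeze.linalg._
import breeze.stats.mean

// Compute (mean, eigenvalues, eigenvectors) one group at a time, so only a
// single group's rows and decomposition need to be in memory together; the
// summaries could equally be written to disk as each group completes.
def groupDecompositions(data: DenseMatrix[Double], groups: List[String])
  : Map[String, (DenseVector[Double], DenseVector[Double], DenseMatrix[Double])] =
  groups.distinct.map { g =>
    val idx = groups.zipWithIndex.collect { case (label, i) if label == g => i }
    val sub = data(idx, ::).toDenseMatrix
    val mu = mean(sub(::, *)).t
    // centre the group's rows and estimate the within-group covariance
    val centred = sub(*, ::) - mu
    val sigma = (centred.t * centred) / (idx.size - 1).toDouble
    val es = eigSym(sigma)
    g -> (mu, es.eigenvalues, es.eigenvectors)
  }.toMap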

An issue of interest is that of visualisation. The ordination available in a method such as lda is convenient because a single projection applies across all groups. I am unsure whether qda can be used directly for visualisation; intuitively it may be possible, perhaps by computing a projection for each group and combining the results. I'll need to look further afield for potential applications in this area.

Example Classification Application

The following example makes use of the wine data set, which associates chemical measurements of different wines with one of three producers. The task of the discriminant functions is to define the boundaries that best separate the observations into the three groups.

import au.id.cxd.math.count.CrossTabulate
import au.id.cxd.math.data.MatrixReader
import au.id.cxd.math.function.transform.StandardisedNormalisation
import au.id.cxd.math.model.components.{CanonicalDiscriminantAnalysis, QuadraticDiscriminant}

// basePath is assumed to point at the directory containing the example data.
val file1:String = s"$basePath/data/wine_data_train.csv"

val file2:String = s"$basePath/data/wine_data_test.csv"

val mat1 = MatrixReader.readFileAt(file1)
val mat2 = MatrixReader.readFileAt(file2)

// 1st column is group
// 14 columns
val trainGroups = mat1(::,0).toArray.map(_.toString).toList
val testGroups = mat2(::,0).toArray.map(_.toString).toList
val temp1 = mat1(::,1 to 13)
val temp2 = mat2(::,1 to 13)
val trainData = StandardisedNormalisation().transform(temp1)
val testData = StandardisedNormalisation().transform(temp2)


// build the quadratic discriminant model.
val quadParams = QuadraticDiscriminant(trainData, trainGroups)

// perform the test classification.
val predictedClasses = QuadraticDiscriminant.classifyDiscriminant(testData, quadParams)

val predictGroups = predictedClasses.map(_._1)

val crosstab = CrossTabulate(testGroups, predictGroups)

val results = CrossTabulate.metrics(crosstab)


// translate for display in R

val metrics = Array(results._1, results._2, results._3, results._4, results._5, results._6)
crosstab
##      V1 V2 V3
## [1,] 16  4  0
## [2,]  0 12  0
## [3,]  0  0 12
(accuracy <- metrics[4])
## [1] 0.9090909
(error <- metrics[5])
## [1] 0.09090909

The qda method correctly assigns 40 of the 44 test examples to their original groups.

The comparison with lda is given below.

val groupNames = trainGroups.distinct.sorted

val compareLDA = CanonicalDiscriminantAnalysis(trainGroups, trainData)
val test = CanonicalDiscriminantAnalysis.classifyDiscriminant(testData,
  compareLDA._2,
  compareLDA._3,
  compareLDA._7,
  groupNames)

val predictions = test._4.map(_._2)

val crosstab2 = CrossTabulate(testGroups, predictions.toList)

val results2 = CrossTabulate.metrics(crosstab2)


// translate for display in R
val metrics2 = Array(results2._1, results2._2, results2._3, results2._4, results2._5, results2._6)
(accuracy <- metrics2[4])
## [1] 0.9318182
(error <- metrics2[5])
## [1] 0.06818182

Note that lda generalises quite well in this case; despite being the less complex model, it correctly labels 41 of the 44 test examples.

The ability to perform multi-class assignment with both methods, through the decomposition of the covariance estimates, has demonstrable utility both in inference (such as where applied in manova) and in classification tasks.

[1] Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification, 2nd Edition. Wiley-Interscience.

[2] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning, 2nd Edition. Springer.