Class/Object

au.id.cxd.math.probability.analysis

Anova

class Anova extends StatisticalTest

This class implements an Anova procedure as derived from 'Mathematical Statistics with Applications', 7th edition, by Wackerly et al.

That resource provides a full explanation; a summary is given below.

The goal of Anova is to "identify the important independent variables and determine how they affect the response".

Each independent variable is called a factor, and the intensity setting of a factor is called its level.

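The procedure calculates the total sum of squares, which is the sum of the squared deviations of each observation from the overall mean. It then partitions this total into portions associated with the independent variables and a remainder attributed to random error.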

Under the null hypothesis, the independent variables are assumed to be unrelated to the response variable. Each portion of the total sum of squares, divided by an appropriate constant, then provides an independent and unbiased estimator of the experimental error variance $\sigma^2$.

If a variable is highly related to the response, its portion of the total sum of squares will be large.

The sum of squares for treatments, $SST$, is compared with the sum of squares for error, $SSE$.

An F test is used to determine whether the null hypothesis should be rejected.

This implementation addresses the one-way layout with $k$ samples (treatments). It performs the F test of the null hypothesis $\mu_1 = \mu_2 = \dots = \mu_k$, assuming a common variance $\sigma^2$.

This implementation currently works with matrices provided by the breeze library. In matrix form every sample occupies a column with the same number of rows. It would be useful to later change the implementation to a structure that allows samples of uneven length.

The procedure itself, however, permits unequal sample sizes $n_i$ for the $i$th sample.

An AnovaTable is used to hold the variables for the anova process. It provides a "one way layout" that comprises the following elements.

The total sum of squares partitions into the $SST$ and the $SSE$ ($TotalSS = SST + SSE$), and is defined as

$$ TotalSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 $$

It can be computed as the sum of all squared observations minus the correction for the mean, $CM$:

$$ TotalSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij}^2 - CM $$

where the correction for the mean is the square of the total of all observations, divided by the total number of observations $n = \sum_{i=1}^{k} n_i$:

$$ CM = \frac{1}{n} \left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} Y_{ij} \right)^2 $$
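The following minimal sketch compares the two forms of $TotalSS$ in plain Scala. It is not the library's implementation. It assumes the samples are held as a ragged `Seq[Seq[Double]]`, with one inner sequence per sample, rather than the breeze `DenseMatrix` used by the class. The object `TotalSumOfSquares` and the small data set are hypothetical. Both forms should agree up to floating-point rounding.

```scala
// Hypothetical helper object; not part of au.id.cxd.math
object TotalSumOfSquares {
  // Correction for the mean: CM = (sum of all observations)^2 / n
  def cm(samples: Seq[Seq[Double]]): Double = {
    val all = samples.flatten
    math.pow(all.sum, 2) / all.size
  }

  // Definition: squared deviations of every observation from the grand mean
  def byDefinition(samples: Seq[Seq[Double]]): Double = {
    val all = samples.flatten
    val grandMean = all.sum / all.size
    all.map(y => math.pow(y - grandMean, 2)).sum
  }

  // Shortcut: sum of the squared observations minus CM
  def byShortcut(samples: Seq[Seq[Double]]): Double =
    samples.flatten.map(y => y * y).sum - cm(samples)
}

// k = 3 samples of unequal size, n = 9 observations in total
val samples = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0), Seq(6.0, 7.0, 8.0, 9.0))
println(TotalSumOfSquares.cm(samples))           // 225.0
println(TotalSumOfSquares.byDefinition(samples)) // 60.0
println(TotalSumOfSquares.byShortcut(samples))   // 60.0
```

The shortcut form avoids computing the grand mean first, which is why it suits an accumulating computation.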

The total of each sample set is defined as $Y_{i.}$:

$$ Y_{i.} = \sum_{j=1}^{n_i} Y_{ij} $$

and the mean of each sample set is estimated as $\bar{Y_{i.}}$.

$$ \bar{Y_{i.}} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} = \frac{1}{n_i} Y_{i.} $$
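The sample totals $Y_{i.}$ and sample means $\bar{Y_{i.}}$ can be sketched directly. This sketch assumes the samples are a ragged `Seq[Seq[Double]]`, one inner sequence per sample. The object name `SampleSummaries` is hypothetical and is not part of the class's API.

```scala
// Hypothetical helper object; not part of au.id.cxd.math
object SampleSummaries {
  // Y_i. : total of the i-th sample
  def totals(samples: Seq[Seq[Double]]): Seq[Double] = samples.map(_.sum)

  // Ybar_i. = Y_i. / n_i : mean of the i-th sample
  def means(samples: Seq[Seq[Double]]): Seq[Double] =
    samples.map(s => s.sum / s.size)
}

val samples = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0), Seq(6.0, 7.0, 8.0, 9.0))
println(SampleSummaries.totals(samples)) // List(6.0, 9.0, 30.0)
println(SampleSummaries.means(samples))  // List(2.0, 4.5, 7.5)
```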

The sample means are used to calculate the sum of squares for treatments. This quantity will be large when the differences between the treatment means are large:

$$ SST = \sum_{i=1}^{k} n_i (\bar{Y_{i.}} - \bar{Y})^2 $$

Note also that, under the null hypothesis, $SST/\sigma^2$ has a $\chi^2$ distribution with $k-1$ degrees of freedom for $k$ treatments.

For computation, $SST$ can equivalently be written in terms of the sample totals:

$$ SST = \sum_{i=1}^{k} \frac{Y_{i.}^2}{n_i} - CM $$
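The following sketch checks that the definitional and computational forms of $SST$ agree. It assumes ragged `Seq[Seq[Double]]` samples, one inner sequence per sample. The object name `TreatmentSumOfSquares` is hypothetical and is not the class's `ssTreatment` member.

```scala
// Hypothetical helper object; not part of au.id.cxd.math
object TreatmentSumOfSquares {
  // Definition: sum_i n_i (Ybar_i. - Ybar)^2
  def byDefinition(samples: Seq[Seq[Double]]): Double = {
    val all = samples.flatten
    val grandMean = all.sum / all.size
    samples.map { s =>
      val mean = s.sum / s.size
      s.size * math.pow(mean - grandMean, 2)
    }.sum
  }

  // Computational form: sum_i Y_i.^2 / n_i - CM
  def byTotals(samples: Seq[Seq[Double]]): Double = {
    val all = samples.flatten
    val cm = math.pow(all.sum, 2) / all.size
    samples.map(s => math.pow(s.sum, 2) / s.size).sum - cm
  }
}

val samples = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0), Seq(6.0, 7.0, 8.0, 9.0))
println(TreatmentSumOfSquares.byDefinition(samples)) // 52.5
println(TreatmentSumOfSquares.byTotals(samples))     // 52.5
```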

The second part, the sum of squares for error, is computed as $$ SSE = TotalSS - SST $$

As shown in Wackerly, it can also be written as a sum of the sample variances, each weighted by its degrees of freedom $n_i - 1$:

$$ SSE = \sum_{i=1}^{k} (n_i - 1) S_i^2 $$

where the sample variance of the $i$th sample is

$$ S_i^2 = \frac{1}{n_i-1} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y_{i.}})^2 $$

which is an unbiased estimator of $\sigma_i^2 = \sigma^2$.
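The following sketch computes $SSE$ in two ways: as $TotalSS - SST$, and as $\sum_i (n_i - 1) S_i^2$. It assumes ragged `Seq[Seq[Double]]` samples with at least two observations each, because a single-observation sample leaves $S_i^2$ undefined. The object name `ErrorSumOfSquares` is hypothetical.

```scala
// Hypothetical helper object; not part of au.id.cxd.math
object ErrorSumOfSquares {
  // Sample variance S_i^2 with divisor n_i - 1 (needs n_i >= 2)
  def sampleVariance(s: Seq[Double]): Double = {
    val mean = s.sum / s.size
    s.map(y => math.pow(y - mean, 2)).sum / (s.size - 1)
  }

  // SSE = sum_i (n_i - 1) S_i^2
  def fromVariances(samples: Seq[Seq[Double]]): Double =
    samples.map(s => (s.size - 1) * sampleVariance(s)).sum

  // SSE = TotalSS - SST, both in their computational forms
  def byDifference(samples: Seq[Seq[Double]]): Double = {
    val all = samples.flatten
    val cm = math.pow(all.sum, 2) / all.size
    val totalSS = all.map(y => y * y).sum - cm
    val sst = samples.map(s => math.pow(s.sum, 2) / s.size).sum - cm
    totalSS - sst
  }
}

val samples = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0), Seq(6.0, 7.0, 8.0, 9.0))
// Both are 7.5 up to floating-point rounding
println(ErrorSumOfSquares.fromVariances(samples))
println(ErrorSumOfSquares.byDifference(samples))
```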

The mean square for error is the pooled estimator $S^2$ of $\sigma^2$, with $n-k$ degrees of freedom:

$$ MSE = \frac{SSE}{n-k} $$

The mean square for treatments is based on the sample means and has $k-1$ degrees of freedom: $$ MST = \frac{SST}{k-1} $$
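The following minimal sketch combines these steps to produce the two mean squares. It assumes ragged `Seq[Seq[Double]]` samples, one inner sequence per sample. The `MeanSquares` object is hypothetical and is not the class's `mst`/`mse` members.

```scala
// Hypothetical helper object; not part of au.id.cxd.math
object MeanSquares {
  // Returns (MST, MSE) for k samples holding n observations in total
  def apply(samples: Seq[Seq[Double]]): (Double, Double) = {
    val k = samples.size
    val all = samples.flatten
    val n = all.size
    val cm = math.pow(all.sum, 2) / n
    val totalSS = all.map(y => y * y).sum - cm
    val sst = samples.map(s => math.pow(s.sum, 2) / s.size).sum - cm
    val sse = totalSS - sst
    (sst / (k - 1), sse / (n - k)) // MST has k-1 df, MSE has n-k df
  }
}

val samples = Seq(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0), Seq(6.0, 7.0, 8.0, 9.0))
println(MeanSquares(samples)) // (26.25,1.25)
```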

Once the anova table is calculated, the F test is used to test the null hypothesis $\mu_1 = \mu_2 = \dots = \mu_k$ under equal variances. The null hypothesis is rejected at significance level $\alpha$ when

$$ F = \frac{MST}{MSE} > F_\alpha $$ Under the null hypothesis, the statistic has an F distribution with $k-1$ numerator and $n-k$ denominator degrees of freedom.
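The following minimal sketch shows the decision rule. Here the caller supplies the critical value $F_\alpha$, for example from an F table. The class instead approximates it through its CriticalValue trait, which this sketch does not reproduce. `FTest` is a hypothetical name.

```scala
// Hypothetical helper object; not part of au.id.cxd.math
object FTest {
  // Observed statistic F = MST / MSE
  def statistic(mst: Double, mse: Double): Double = mst / mse

  // Reject H0 when F exceeds the upper-tail critical value F_alpha,
  // which is supplied by the caller in this sketch
  def rejects(mst: Double, mse: Double, fAlpha: Double): Boolean =
    statistic(mst, mse) > fAlpha
}

// For illustration: MST = 26.25, MSE = 1.25 with (2, 6) df; F_0.05(2, 6) is about 5.14
println(FTest.statistic(26.25, 1.25))     // 21.0
println(FTest.rejects(26.25, 1.25, 5.14)) // true
```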

The key assumptions are that the $k$ samples are independent random samples from normal populations with a common variance $\sigma^2$.

Example Usage

The example is derived from the test case TestAnovaInference, which is in turn based on an example in Wackerly on page 671.

The test data for the example is the following matrix with $k = 4$ sets of observations.

import breeze.linalg.DenseMatrix

/**
 * columns correspond to the k samples
 * rows correspond to the sample observations Y_ij
 */
val table = DenseMatrix(
  (65.0, 75.0, 59.0, 94.0),
  (87.0, 69.0, 78.0, 89.0),
  (73.0, 83.0, 67.0, 80.0),
  (79.0, 81.0, 62.0, 88.0),
  (81.0, 72.0, 83.0, 0.0),
  (69.0, 79.0, 76.0, 0.0),
  (0.0, 90.0, 0.0, 0.0))

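Note that the shorter samples have been padded with 0. The class takes $n$ = rows * cols, so the padded zeros are counted as observations. The report below therefore shows 24 denominator degrees of freedom, whereas the 23 genuine observations would give $n - k = 19$. With this implementation it is best to use a "balanced" set of samples, in which every sample has the same number of observations.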
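The following small sketch shows the effect of the padding. It runs the same one-way computation on the example's columns, first with and then without the padded zeros. It assumes ragged `Seq[Seq[Double]]` samples and uses a hypothetical `oneWay` helper that is not the class's API. It strips the padding with `filter(_ != 0.0)`, which is only safe here because no genuine observation in this data is 0.

```scala
// Hypothetical helper object; not part of au.id.cxd.math
object PaddingEffect {
  // One-way ANOVA on ragged samples: (numerator df, denominator df, F)
  def oneWay(samples: Seq[Seq[Double]]): (Int, Int, Double) = {
    val k = samples.size
    val all = samples.flatten
    val n = all.size
    val cm = math.pow(all.sum, 2) / n
    val totalSS = all.map(y => y * y).sum - cm
    val sst = samples.map(s => math.pow(s.sum, 2) / s.size).sum - cm
    val sse = totalSS - sst
    (k - 1, n - k, (sst / (k - 1)) / (sse / (n - k)))
  }
}

// Columns of the example matrix, zero padding included (7 rows each)
val padded = Seq(
  Seq(65.0, 87.0, 73.0, 79.0, 81.0, 69.0, 0.0),
  Seq(75.0, 69.0, 83.0, 81.0, 72.0, 79.0, 90.0),
  Seq(59.0, 78.0, 67.0, 62.0, 83.0, 76.0, 0.0),
  Seq(94.0, 89.0, 80.0, 88.0, 0.0, 0.0, 0.0))
// The same columns with the padding removed (assumes no real 0.0 values)
val ragged = padded.map(_.filter(_ != 0.0))

println(PaddingEffect.oneWay(padded)) // df (3, 24), F about 0.975
println(PaddingEffect.oneWay(ragged)) // df (3, 19), F about 3.77
```

The padded zeros drag the sample means down unevenly and inflate $SSE$, so the two versions can lead to different conclusions.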

The Anova instance is created with

import au.id.cxd.math.probability.analysis.Anova

val anova = Anova(table)

The null hypothesis is then tested at significance level $\alpha = 0.05$ using

val testResult = anova.test(0.05)

The test result contains the anova table, which can be printed or inspected for each of its values. Printing the table gives a report such as:

NumeratorDF: 3
DenominatorDF: 24
SST: 2876.107142857145
SSE: 23604.857142857145
MSE: 983.5357142857143
MST: 958.7023809523816
TotalDF: 27
TotalSS: 26480.96428571429
F-stat (observed statistic): 0.9747509592456765
F-alpha (critical value): 3.0000000000000013
P-Value: 0.4219003172019309
Observed-Prob 0.44613162513336035
alpha (significance level):0.05

In this example the F-stat is smaller than F-alpha (0.975 < 3.0), and the P-value (0.42) exceeds $\alpha = 0.05$, so the test does not reject the null hypothesis.
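As a cross-check, the report's entries can be recomputed by hand. The following minimal sketch uses plain Scala arrays laid out like the example `DenseMatrix`, with rows as observations and columns as samples. It counts the padded zeros as observations, since the class takes $n$ = rows * cols. `ReportCheck` is a hypothetical name, and the values should match the report up to floating-point rounding.

```scala
// Hypothetical check object; not part of au.id.cxd.math
object ReportCheck {
  // Same layout as the example matrix: rows are observations, columns are samples
  val table: Array[Array[Double]] = Array(
    Array(65.0, 75.0, 59.0, 94.0),
    Array(87.0, 69.0, 78.0, 89.0),
    Array(73.0, 83.0, 67.0, 80.0),
    Array(79.0, 81.0, 62.0, 88.0),
    Array(81.0, 72.0, 83.0, 0.0),
    Array(69.0, 79.0, 76.0, 0.0),
    Array(0.0, 90.0, 0.0, 0.0))

  val k: Int = table.head.length // number of samples (columns)
  val n: Int = table.length * k  // padded zeros counted: n = rows * cols
  val columns: Seq[Seq[Double]] = (0 until k).map(j => table.map(row => row(j)).toSeq)
  val all: Seq[Double] = columns.flatten

  val cm: Double = math.pow(all.sum, 2) / n
  val totalSS: Double = all.map(y => y * y).sum - cm
  val sst: Double = columns.map(c => math.pow(c.sum, 2) / c.size).sum - cm
  val sse: Double = totalSS - sst
  val mst: Double = sst / (k - 1) // numerator df = k - 1
  val mse: Double = sse / (n - k) // denominator df = n - k
  val f: Double = mst / mse
}

println(ReportCheck.sst)
println(ReportCheck.sse)
println(ReportCheck.f)
```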

The test result also has a "rejected" flag, which indicates whether the null hypothesis is rejected. After the initial test, a $k$-fold approach can be used to determine which of the samples may be rejected.

The critical value is approximated using the trait CriticalValue for the UpperTail of the FDistribution.

Created by cd on 17/09/2014.

Linear Supertypes
StatisticalTest, AnyRef, Any

Instance Constructors

  1. new Anova(X: DenseMatrix[Double])

Type Members

  1. class Intermediate extends AnyRef

    internal class for intermediate results

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val X: DenseMatrix[Double]
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. def cm(): Intermediate

    the correction for the mean

  8. val criticalVal: (Seq[Double]) ⇒ CriticalValue
  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  11. val fdist: FDistribution
  12. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int
    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  16. val k: Int
  17. def mse(accum: Intermediate): Intermediate

    the mean square for error

  18. def mst(accum: Intermediate): Intermediate

    the mean square for treatments

  19. val n: Int

    f distribution n = rows * cols k = cols

    df = (k-1), (n-k)

  20. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. final def notify(): Unit
    Definition Classes
    AnyRef
  22. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  23. def ssTreatment(accum: Intermediate): Intermediate

    sum of squares treatment

    $$ SST = \sum_{i=1}^{k} n_i (\bar{Y_{i.}} - \bar{Y})^2 = \sum_{i=1}^{k} \frac{Y_{i.}^2}{n_i} - CM $$

  24. def sse(accum: Intermediate): Intermediate

    the sum of squares error

  25. def statistic(): (Double, Intermediate)

    compute the F-statistic for the assertion $H_0: \mu_1 = \mu_2 = \dots = \mu_k$ vs $H_a$: at least one of the means differs.

  26. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  27. def test(alpha: Double): TestResult

    perform the anova test at the supplied significance level alpha

    Definition Classes
    Anova → StatisticalTest
  28. def toString(): String
    Definition Classes
    AnyRef → Any
  29. def totalSS(accum: Intermediate): Intermediate

    compute the total SS:

    $$ TotalSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2 - CM $$

    $$ CM = \frac{1}{n} \left( \sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij} \right)^2 $$

  30. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
