Caret:Documentation:StatisticsGLM

From Van Essen Lab


General Linear Model (GLM)

The equation of the General Linear Model is Y = Xb + e, where Y, X, b, and e are matrices. Statistical tests such as Analysis of Variance and regression are special cases of the General Linear Model, so they can be formulated and solved using it.

Regression Via Least Squares

The goal of linear regression is to predict the value of a dependent variable, Y, from one or more independent variables, X, where X and Y are observations from multiple subjects.

Regression Equation

Yi = β0 + β1 * Xi + εi

  • Yi is the dependent variable that is predicted and depends upon the value Xi.
  • Xi is the independent variable.
  • β0 and β1 are coefficients. Furthermore, β0 is the Y-intercept (value of Yi when Xi is zero) and β1 is the slope of the line.
  • εi is random error.
  • i ranges from 1 to N, the number of subjects.


It can be shown that:

  • {\beta}_1 = \frac{\sum_{i=1}^N {(X_i - \overline{X}) (Y_i - \overline{Y})}}
                                                {\sum_{i=1}^N {(X_i - \overline{X})^2}}
  • {\beta}_0 = \overline{Y} - {\beta}_1 * \overline{X}


Example

Data from Applied Linear Regression Models, page 44.

i Yi Xi
1 73 30
2 50 20
3 128 60
4 170 80
5 87 40
6 108 50
7 135 60
8 69 30
9 148 70
10 132 60

Image:RegressionDataPlot.jpg

For the data from the table above:

  • β0 = 10.0
  • β1 = 2.0

so

Y = 10.0 + 2.0X
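The fitted coefficients can be reproduced directly from the formulas for β1 and β0 above; a minimal sketch in Python (the page's own worked solutions are Matlab screenshots), using the data from the table:

```python
# Simple linear regression via the least-squares formulas above.
# Data: Applied Linear Regression Models, p. 44 (the table above).
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# beta1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) \
      / sum((x - x_bar) ** 2 for x in X)
# beta0 = Ybar - beta1 * Xbar
beta0 = y_bar - beta1 * x_bar

print(beta0, beta1)  # 10.0 2.0
```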

Matlab Solution

Image:Regression_LS_Matlab.jpg

Matrices

A matrix is a two dimensional array of numbers. The elements in the matrix are indexed by i and j where i is the row number (starting from the top at zero) and j is the column number (starting at zero from the left). In the matrix below, X01 is the element 2 and X12 is 4.


\begin{vmatrix}
 1 & 2 & 3\\
 6 & 5 & 4
\end{vmatrix}


\begin{vmatrix}
 X_{00} & X_{01} & X_{02}\\
 X_{10} & X_{11} & X_{12}
\end{vmatrix}

Transpose

Given a matrix A, its transpose is A'.

A = 
\begin{vmatrix}
 1 & 2 & 3\\
 6 & 5 & 4
\end{vmatrix}

A' = 
\begin{vmatrix}
 1 & 6 \\
 2 & 5 \\
 3 & 4
\end{vmatrix}

Addition and Subtraction

To add or subtract matrices, the matrices involved MUST contain the same number of rows and columns.

A = 
\begin{vmatrix}
 1 & 2 & 3\\
 6 & 5 & 4
\end{vmatrix}

B = 
\begin{vmatrix}
 17 & 8 & 12\\
 11 & 7 & 15
\end{vmatrix}

A + B = 
\begin{vmatrix}
 1 + 17 & 2 + 8 & 3 + 12\\
 6 + 11 & 5 + 7 & 4 + 15
\end{vmatrix} =
\begin{vmatrix}
 18 & 10 & 15\\
 17 & 12 & 19
\end{vmatrix}

A - B = 
\begin{vmatrix}
 1 - 17 & 2 - 8 & 3 - 12\\
 6 - 11 & 5 - 7 & 4 - 15
\end{vmatrix} =
\begin{vmatrix}
 -16 & -6 & -9\\
 -5 & -2 & -11
\end{vmatrix}

Multiplication

To multiply matrices, the number of columns in the matrix on the left side of the "*" must equal the number of rows in the matrix on the right side of the "*".

A = 
\begin{vmatrix}
 1 & 2 & 3\\
 6 & 5 & 4
\end{vmatrix}

B = 
\begin{vmatrix}
 17 & 8 \\
 11 & 7  \\
 9   & 10 
\end{vmatrix}

A * B = 
\begin{vmatrix}
(1*17 + 2*11 + 3*9)  & (1*8 + 2*7 + 3*10) \\
(6*17 + 5*11 + 4*9)  & (6*8 + 5*7 + 4*10)
\end{vmatrix} =
\begin{vmatrix}
 66 & 52 \\
 193 & 123
\end{vmatrix}
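The row-by-column rule above can be written as a short Python sketch (`mat_mul` is a helper defined here, not a library call), reproducing the product A * B:

```python
def mat_mul(A, B):
    """Multiply matrices given as lists of rows.
    The number of columns of A must equal the number of rows of B."""
    assert len(A[0]) == len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3],
     [6, 5, 4]]
B = [[17, 8],
     [11, 7],
     [9, 10]]

print(mat_mul(A, B))  # [[66, 52], [193, 123]]
```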

Identity

An identity matrix is a square matrix (number of rows equals number of columns) with ones where the row index equals the column index (the diagonal) and zeros for all other elements. Multiplying a matrix by the identity matrix leaves the matrix unchanged (A * I = A).

I = 
\begin{vmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{vmatrix}

Linear Independence and Rank

A column (row) of a matrix is linearly dependent if it is a linear combination of other columns (rows). In the matrix A below, the elements of the first column (1 2 3) can be multiplied by 3, producing (3 6 9), which is the third column of the matrix.

A = 
\begin{vmatrix}
1 & 4 & 3 \\
2 & 6 & 6 \\
3 & 7 & 9
\end{vmatrix}

The Rank of a matrix is the maximum number of linearly independent rows (equivalently, columns). For the matrix A above, the Rank is 2.
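Rank can be computed by Gaussian elimination, counting the pivots; a sketch in Python (`rank` is a helper written here, not a library call), checked against the matrix A above:

```python
def rank(M, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    M = [list(map(float, row)) for row in M]   # work on a copy
    rows, cols = len(M), len(M[0])
    r = 0                                      # current pivot row
    for c in range(cols):
        # find the largest pivot in column c at or below row r
        pivot = max(range(r, rows), key=lambda i: abs(M[i][c]), default=None)
        if pivot is None or abs(M[pivot][c]) < tol:
            continue                           # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, 4, 3],
     [2, 6, 6],
     [3, 7, 9]]
print(rank(A))  # 2
```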

Inverse

For the matrix A, its inverse is denoted by A^{-1}, and A * A^{-1} = A^{-1} * A = I (multiplying a matrix by its inverse yields the identity matrix). Only square matrices can have an inverse. In addition, a matrix must be nonsingular, that is, its rank must equal its number of rows and columns, to have an inverse. If the rank of a matrix is less than its number of rows and columns, it is singular and has no inverse; a pseudoinverse can be used in its place.

A = 
\begin{vmatrix}
5 & 3 & 4 \\
-3 & 2 & 5 \\
7 & 4 & 6
\end{vmatrix}

A^{-1} =
\begin{vmatrix}
-0.533 & -0.133 & 0.466 \\
3.533 & 0.133 & -2.466 \\
-1.733 & 0.066 & 1.266
\end{vmatrix}
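A matrix inverse can be computed by Gauss-Jordan elimination; a sketch in Python (`inverse` is a helper written here, not a library call), checked against the matrix A above. Note the page's A^{-1} entries are truncated (e.g. 0.466 for 7/15 ≈ 0.4667):

```python
def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination
    (raises ValueError on singular input)."""
    n = len(A)
    # augment A with the identity matrix
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [x / M[c][c] for x in M[c]]     # scale pivot row to 1
        for i in range(n):
            if i != c:                          # clear the rest of column c
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

A = [[5, 3, 4],
     [-3, 2, 5],
     [7, 4, 6]]
Ainv = inverse(A)
print([[round(v, 3) for v in row] for row in Ainv])  # rounds to A^{-1} above
```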

Regression Via GLM

Data from Applied Linear Regression Models, page 44.

i Yi Xi
1 73 30
2 50 20
3 128 60
4 170 80
5 87 40
6 108 50
7 135 60
8 69 30
9 148 70
10 132 60


GLM


\mathbf{Y} = \mathbf{X} * \mathbf{B} + \boldsymbol{\epsilon}


\mathbf{Y} = 
\begin{vmatrix}
73 \\
50 \\
128 \\
170 \\
87 \\
108 \\
135 \\
69 \\
148 \\
132 
\end{vmatrix},~~

\mathbf{X} = 
\begin{vmatrix}
1 & 30 \\
1 & 20 \\
1 & 60 \\
1 & 80 \\
1 & 40 \\
1 & 50 \\
1 & 60  \\
1 & 30 \\
1 & 70 \\
1 & 60
\end{vmatrix}, ~~

\mathbf{B} = 
\begin{vmatrix}
b_0 \\
b_1
\end{vmatrix},~~

\boldsymbol{\epsilon} = 
\begin{vmatrix}
\epsilon_1 \\
\epsilon_2 \\
\epsilon_3 \\
\epsilon_4 \\
\epsilon_5 \\
\epsilon_6 \\
\epsilon_7 \\
\epsilon_8 \\
\epsilon_9 \\
\epsilon_{10}
\end{vmatrix}


Least Squares Estimation


\mathbf{X'} * \mathbf{X} * \mathbf{B} = \mathbf{X'} * \mathbf{Y}



\mathbf{B} = {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'} * \mathbf{Y}


Least Squares Estimation Using Example Data


\mathbf{X'} = 
\begin{vmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
30 & 20 & 60 & 80 & 40 & 50 & 60 & 30 & 70 & 60
\end{vmatrix}


\mathbf{X'} * \mathbf{X} = 
\begin{vmatrix}
10 & 500 \\
500 & 28400
\end{vmatrix},~~

{(\mathbf{X'} * \mathbf{X})}^{-1} = 
\begin{vmatrix}
0.8353& -0.0147 \\
-0.0147& 0.0003
\end{vmatrix}


Result


\mathbf{B} = 
\begin{vmatrix}
10 \\
2
\end{vmatrix}
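The matrix solution B = (X'X)^{-1} * X' * Y can be reproduced in a few lines; a sketch in Python (the page's own solution is a Matlab screenshot; `transpose`, `mat_mul`, and `inv2` are small helpers written here):

```python
# Regression via the GLM: B = (X'X)^-1 * X' * Y, using the example data above.
Y = [[73], [50], [128], [170], [87], [108], [135], [69], [148], [132]]
X = [[1, x] for x in (30, 20, 60, 80, 40, 50, 60, 30, 70, 60)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

Xt = transpose(X)
B = mat_mul(mat_mul(inv2(mat_mul(Xt, X)), Xt), Y)
print(B)  # ≈ [[10.0], [2.0]]
```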

Matlab Solution

Image:Regression_GLM_Matlab.jpg

One-Way ANOVA Via Sums Of Squares

A one-way ANOVA determines if the mean values at each node for two or more groups of subjects are statistically different. The groups being compared are allowed to have a different number of subjects.

Sum of Squares Formulas

K = Number of Groups

N = Total Number of Subjects

Ni = Number of Subjects in Group "i"

dfTotal = N − 1

df_{Error} = \sum_{i=1}^{K} (N_i - 1) = N - K

dfTreatment = K − 1

Xij = Measurement for subject "j" in group "i"

Mean of group i, \bar{X_i} = \frac{\sum_{j=1}^{N_i} x_{ij}} {N_i}

Grand Mean, \bar{X_{..}} = \frac{\sum_{i=1}^{K} \sum_{j=1}^{N_i} X_{ij}}{N}

SS_{Total} = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (X_{ij} - \bar{X_{..}})^2

SS_{Error} = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (X_{ij} - \bar{X_i})^2

SS_{Treatment} = \sum_{i=1}^{K} N_i (\bar{X_i} - \bar{X_{..}})^2

SSTotal = SSError + SSTreatment

 MS_{Treatment} = \frac{SS_{Treatment}} {df_{Treatment}}

 MS_{Error} = \frac{SS_{Error}} {df_{Error}}

 F = \frac{MS_{Treatment}} {MS_{Error}}

Sum Of Squares Example

Data from Statistical Methods for Psychology, page 608.

Treatment 1 Treatment 2 Treatment 3 Treatment 4
8 5 3 6
9 7 4 4
7 3 1 9

SSTotal = 73.00

SSError = 27.333

SSTreatment = 45.667

dfTotal = 11

dfError = 8

dfTreatment = 3

 MS_{Treatment} = \frac{SS_{Treatment}} {df_{Treatment}} = \frac{45.667}{3} = 15.222

 MS_{Error} = \frac{SS_{Error}} {df_{Error}} = \frac{27.333}{8} = 3.417

 F = \frac{MS_{Treatment}} {MS_{Error}} = \frac{15.222}{3.417} = 4.46
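The sums-of-squares formulas and the worked values above can be reproduced in Python (the page's own solution is a Matlab screenshot):

```python
# One-way ANOVA via the sums-of-squares formulas above
# (data from Statistical Methods for Psychology, p. 608).
groups = [[8, 9, 7], [5, 7, 3], [3, 4, 1], [6, 4, 9]]

K = len(groups)                                  # number of groups
N = sum(len(g) for g in groups)                  # total subjects
grand_mean = sum(x for g in groups for x in g) / N
group_means = [sum(g) / len(g) for g in groups]

ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
ss_error = sum((x - m) ** 2
               for g, m in zip(groups, group_means) for x in g)
ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))

ms_treatment = ss_treatment / (K - 1)
ms_error = ss_error / (N - K)
F = ms_treatment / ms_error
print(round(ss_total, 3), round(ss_treatment, 3),
      round(ss_error, 3), round(F, 2))  # 73.0 45.667 27.333 4.46
```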

Sum of Squares Example in Matlab

Image:Anova_SS_Data_Matlab.jpg

Image:Anova_SS_Result_Matlab.jpg

One-Way ANOVA Via GLM

Recall that the equation of the General Linear Model is Y = Xb + e where Y, X, b, and e are matrices.

In the case of a One-Way ANOVA with N total subjects from K groups:

Matrix Dimensions
Y N x 1
X N x (K + 1)
b (K + 1) x 1
e N x 1

Design Matrix

In the case of ANOVA, X is a Design Matrix that indicates the group membership of each subject. The matrix X has one row per subject and (K + 1) columns. The first column always contains ones. Each remaining column corresponds to one group: a one in that column indicates the subject is a member of the group, and a zero indicates the subject is not.

The matrix Y contains the value from each subject, one per row.

Data from Statistical Methods for Psychology, page 608.

Treatment 1 Treatment 2 Treatment 3 Treatment 4
8 5 3 6
9 7 4 4
7 3 1 9



\mathbf{Y} = 
\begin{vmatrix}
8 \\
9 \\
7 \\
5 \\
7 \\
3 \\
3 \\
4 \\
1 \\
6 \\
4 \\
9
\end{vmatrix}

\mathbf{X} = 
\begin{vmatrix}
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1
\end{vmatrix}


Recall from the General Linear Model that 
\mathbf{b} = {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'} * \mathbf{Y}

However, (\mathbf{X'} * \mathbf{X}) is singular, so its inverse cannot be computed. To resolve this problem, the design matrix X is modified in two ways. First, the last column, which indicates membership in the last group, is removed. Second, membership in the last group (whose column was removed) is indicated by a value of negative one in each of the remaining group columns; ones and zeros keep their meaning for the other groups.


\mathbf{X} = 
\begin{vmatrix}
1 & 1 & 0 & 0  \\
1 & 1 & 0 & 0  \\
1 & 1 & 0 & 0  \\
1 & 0 & 1 & 0  \\
1 & 0 & 1 & 0  \\
1 & 0 & 1 & 0  \\
1 & 0 & 0 & 1  \\
1 & 0 & 0 & 1  \\
1 & 0 & 0 & 1  \\
1 & -1 & -1 & -1  \\
1 & -1 & -1 & -1  \\
1 & -1 & -1 & -1 
\end{vmatrix}

Computing Sums of Squares using General Linear Model

SS_{Total} = \mathbf{Y'} * (\mathbf{I} - \frac{1}{N} * \mathbf{J}) * \mathbf{Y}

SS_{Error} = \mathbf{Y'} * \mathbf{Y} - \mathbf{b'} * \mathbf{X'} * \mathbf{Y}

SS_{Treatment} = \mathbf{Y'} * (\mathbf{H} - \frac{1}{N} * \mathbf{J}) * \mathbf{Y}

\mathbf{b} = {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'} * \mathbf{Y}

\mathbf{I} is an NxN identity matrix.

\mathbf{J} is an NxN matrix with all elements having the value 1.

\mathbf{H} is the Hat matrix equal to \mathbf{X} * {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'}
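The matrix formulas above can be assembled into a complete GLM solution for the example data; a sketch in Python (the page's own solution is a series of Matlab screenshots; `transpose`, `mat_mul`, and `inverse` are small helpers written here, not library calls):

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse(A):
    """Gauss-Jordan inversion of a square matrix."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for i in range(n):
            if i != c:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

# Example data (p. 608) and the -1-coded design matrix from above.
Y = [[8], [9], [7], [5], [7], [3], [3], [4], [1], [6], [4], [9]]
X = ([[1, 1, 0, 0]] * 3 + [[1, 0, 1, 0]] * 3 +
     [[1, 0, 0, 1]] * 3 + [[1, -1, -1, -1]] * 3)
N = len(Y)

Xt = transpose(X)
XtX_inv = inverse(mat_mul(Xt, X))
b = mat_mul(mat_mul(XtX_inv, Xt), Y)
H = mat_mul(mat_mul(X, XtX_inv), Xt)               # hat matrix
I = [[float(i == j) for j in range(N)] for i in range(N)]
J = [[1.0] * N for _ in range(N)]                  # all-ones matrix

Yt = transpose(Y)
M_total = [[I[i][j] - J[i][j] / N for j in range(N)] for i in range(N)]
M_treat = [[H[i][j] - J[i][j] / N for j in range(N)] for i in range(N)]
ss_total = mat_mul(mat_mul(Yt, M_total), Y)[0][0]
ss_treatment = mat_mul(mat_mul(Yt, M_treat), Y)[0][0]
ss_error = mat_mul(Yt, Y)[0][0] - mat_mul(mat_mul(transpose(b), Xt), Y)[0][0]
print(round(ss_total, 3), round(ss_treatment, 3),
      round(ss_error, 3))  # 73.0 45.667 27.333
```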

Solving One-Way ANOVA Using GLM With Matrices in Matlab

Image:AnovaMatlabGLM_Data1.jpg

Image:AnovaMatlabGLM_Data1.5.jpg

Image:AnovaMatlabGLM_Data2.jpg

Image:AnovaMatlabGLM_SS.jpg

Image:AnovaMatlabGLM_F.jpg

Two-Sample T-Test via GLM

A two-sample T-Test is simply a one-way ANOVA with only two groups. Note that t is the square root of F (equivalently, t² = F).

Example Data

Data from http://www.statsdirect.com/help/parametric_methods/utt.htm.

High Protein: 134 146 104 119 124 161 107 83 113 129 97 123
Low Protein: 80 118 101 85 107 132 94
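The t² = F relationship can be checked on this data; a sketch in Python (the page's own solutions are Matlab screenshots), assuming the five unpaired values in the table belong to the High Protein group (n = 12 vs. n = 7, as in the StatsDirect example):

```python
import math

# Two-sample pooled t-test vs. one-way ANOVA with the same two groups.
high = [134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123]
low = [80, 118, 101, 85, 107, 132, 94]

def mean(x):
    return sum(x) / len(x)

def ss(x):
    """Sum of squared deviations from the group mean."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x)

n1, n2 = len(high), len(low)

# pooled two-sample t statistic
sp2 = (ss(high) + ss(low)) / (n1 + n2 - 2)
t = (mean(high) - mean(low)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# one-way ANOVA F with the same two groups
grand = mean(high + low)
ss_treatment = n1 * (mean(high) - grand) ** 2 + n2 * (mean(low) - grand) ** 2
ss_error = ss(high) + ss(low)
F = (ss_treatment / 1) / (ss_error / (n1 + n2 - 2))

print(round(t, 4), round(math.sqrt(F), 4))  # the two printed values agree
```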

Solution with Matlab Using the GLM with Matrices

Image:TTestMatlabGLM_DataY.jpg Image:TTestMatlabGLM_DataX.jpg


Image:TTestMatlabGLM_Datab.jpg


Image:TTestMatlabGLM_DataI_DataJ.jpg


Image:TTestMatlabGLM_DataH.jpg


Image:TTestMatlabGLM_Data_SS_DF_F.jpg


Solution Using Matlab's Two-Sample T-Test

Image:TTestMatlab_T.jpg



References

The General Linear Model by S.J. Kiebel and A.P. Holmes

Neter, J., Wasserman, W., and Kutner, M.H. 1989. Applied Linear Regression Models. IRWIN, Homewood, IL, Second Edition.

Neter, J., Wasserman, W., and Kutner, M.H. Applied Linear Statistical Models. IRWIN, Homewood, IL, Third Edition.

Howell, David C. 2002. Statistical Methods for Psychology. Duxbury, Pacific Grove, CA, Fifth Edition.

StatsDirect, http://www.statsdirect.com/help/parametric_methods/utt.htm
