Caret:Documentation:StatisticsGLM
From Van Essen Lab
General Linear Model (GLM)
The equation of the General Linear Model is Y = Xb + e, where Y, X, b, and e are matrices. Statistical tests such as Analysis of Variance and Regression are special cases of the General Linear Model and can be formulated and solved using it.
Regression Via Least Squares
The goal of linear regression is to predict the value of a dependent variable, Y, from one or more independent variables, X, where X and Y are observations from multiple subjects.
Regression Equation
Y_{i} = β_{0} + β_{1} * X_{i} + ε_{i}
- Y_{i} is the dependent variable that is predicted and depends upon the value X_{i}.
- X_{i} is the independent variable.
- β_{0} and β_{1} are coefficients. Furthermore, β_{0} is the Y-intercept (value of Y_{i} when X_{i} is zero) and β_{1} is the slope of the line.
- ε_{i} is random error.
- i ranges from 1 to N, the number of subjects.
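It can be shown that the least squares estimates of the coefficients are:
β_{1} = Σ_{i} (X_{i} − X̄)(Y_{i} − Ȳ) / Σ_{i} (X_{i} − X̄)^{2}
β_{0} = Ȳ − β_{1} * X̄
where X̄ and Ȳ are the means of the X_{i} and the Y_{i}.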
Example
Data from Applied Linear Regression Models, page 44.
i | Y_{i} | X_{i} |
---|---|---|
1 | 73 | 30 |
2 | 50 | 20 |
3 | 128 | 60 |
4 | 170 | 80 |
5 | 87 | 40 |
6 | 108 | 50 |
7 | 135 | 60 |
8 | 69 | 30 |
9 | 148 | 70 |
10 | 132 | 60 |
For the data from the table above:
- β_{0} = 10.0
- β_{1} = 2.0
so
Y = 10.0 + 2.0X
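The coefficients above can be checked with a few lines of code. The following is a minimal Python sketch (Python is used here instead of Matlab, and the function name `fit_line` is a hypothetical helper, not part of Caret) that applies the standard least squares formulas for the slope and intercept to the example data.

```python
# Example data from the table above (Y_i and X_i for i = 1..10).
X = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
Y = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared X deviations.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

b0, b1 = fit_line(X, Y)
print(f"Y = {b0:.1f} + {b1:.1f}X")  # prints "Y = 10.0 + 2.0X"
```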
Matlab Solution
Matrices
A matrix is a two-dimensional array of numbers. The elements of the matrix are indexed by i and j, where i is the row number (starting at zero from the top) and j is the column number (starting at zero from the left). In the matrix below, X_{01} is the element 2 and X_{12} is 4.
Transpose
Given a matrix A, its transpose is A'.
A =
A' =
Addition and Subtraction
To add or subtract matrices, the matrices involved MUST contain the same number of rows and columns.
A =
B =
A + B =
A - B =
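As an illustration of the same-size requirement, here is a small Python sketch. The helper name `combine` and the matrix values are hypothetical and chosen only for this example; matrices are stored as lists of rows.

```python
import operator

def combine(a, b, op):
    """Apply op element by element to two matrices stored as lists of rows."""
    same_shape = len(a) == len(b) and all(
        len(ra) == len(rb) for ra, rb in zip(a, b))
    if not same_shape:
        raise ValueError("matrices must have the same number of rows and columns")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [3, 4]]   # hypothetical values
B = [[5, 6], [7, 8]]   # hypothetical values

total = combine(A, B, operator.add)       # [[6, 8], [10, 12]]
difference = combine(A, B, operator.sub)  # [[-4, -4], [-4, -4]]
print(total, difference)
```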
Multiplication
To multiply matrices, the number of columns in the matrix on the left side of the "*" must equal the number of rows in the matrix on the right side of the "*".
A =
B =
A * B =
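The dimension rule for multiplication can be sketched in Python as follows. The function name `matmul` and the matrix values are assumptions made for this example only. Element (i, j) of the product is the sum of the products of row i of the left matrix with column j of the right matrix.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of left matrix must equal rows of right matrix")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3 (hypothetical values)
B = [[1, 0],
     [0, 1],
     [2, 2]]         # 3 x 2 (hypothetical values)

C = matmul(A, B)     # a 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 matrix
print(C)             # [[7, 8], [16, 17]]
```

Calling `matmul(A, A)` raises an error because A has three columns but only two rows.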
Identity
An identity matrix is a square matrix (number of rows equals number of columns) with the element one where the row index equals the column index (the diagonal) and zeros for all other elements. Multiplying a matrix by the identity matrix leaves the matrix unchanged (A * I = A).
I =
Linear Independence and Rank
A column (row) of a matrix is linearly dependent if it is a linear combination of the other columns (rows). In the matrix A below, the elements of column 1 (1 2 3) can be multiplied by 3, producing (3 6 9), which is the third column of the matrix.
A =
The Rank of a matrix is its maximum number of linearly independent rows (equivalently, columns). For the matrix A, its Rank is 2.
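One way to compute a rank is Gaussian elimination, counting the pivot rows that remain. The Python sketch below is a hedged illustration: the function name `rank` is hypothetical, and the middle column of the example matrix is an assumed value, since only the first column (1 2 3) and third column (3 6 9) are given in the text. Exact fractions are used to avoid rounding problems.

```python
from fractions import Fraction

def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier ones
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

# Column 1 is (1 2 3) and column 3 is (3 6 9); column 2 is an assumed value.
A = [[1, 0, 3],
     [2, 1, 6],
     [3, 1, 9]]
print(rank(A))  # 2
```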
Inverse
For the matrix A, its inverse is denoted by A^{−1}, and A * A^{−1} = A^{−1} * A = I (multiplying a matrix by its inverse yields the Identity matrix). Only square matrices have an inverse. In addition, a matrix must be nonsingular, that is, its rank must equal its number of rows and columns, to have an inverse. If the rank of a matrix is less than its number of rows and columns, it is singular and has no inverse; a pseudoinverse can be used in its place.
A =
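The following Python sketch shows one common way to compute an inverse, Gauss-Jordan elimination on the matrix augmented with the identity. The function name `inverse` and the example values are assumptions for illustration. The function raises an error for a singular matrix instead of returning a result.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination (exact fractions)."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("only square matrices have an inverse")
    # Build the augmented matrix [A | I].
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # scale pivot row to 1
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]

A = [[2, 1],
     [5, 3]]            # hypothetical nonsingular matrix
A_inv = inverse(A)      # equals [[3, -1], [-5, 2]]
print([[int(v) for v in row] for row in A_inv])  # [[3, -1], [-5, 2]]
```

Passing a singular matrix such as `[[1, 2], [2, 4]]` (rank 1) raises `ValueError`.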
Regression Via GLM
Data from Applied Linear Regression Models, page 44.
i | Y_{i} | X_{i} |
---|---|---|
1 | 73 | 30 |
2 | 50 | 20 |
3 | 128 | 60 |
4 | 170 | 80 |
5 | 87 | 40 |
6 | 108 | 50 |
7 | 135 | 60 |
8 | 69 | 30 |
9 | 148 | 70 |
10 | 132 | 60 |
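GLM
Y = Xb + e, where Y is the N x 1 matrix of the observations Y_{i}, X is the N x 2 matrix whose first column contains ones and whose second column contains the values X_{i}, b is the 2 x 1 matrix of the coefficients β_{0} and β_{1}, and e is the N x 1 matrix of the errors ε_{i}.
Least Squares Estimation
b = (X'X)^{−1}X'Y
Least Squares Estimation Using Example Data
For the data in the table above, X'X has rows (10 500) and (500 28400), and X'Y has the elements 1100 and 61800.
Result
β_{0} = 10.0 and β_{1} = 2.0, the same values obtained with the regression equations.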
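The least squares estimate for the General Linear Model, b = (X'X)^{−1}X'Y, can be worked through for the example data with the Python sketch below. The helper names are hypothetical, and the 2 x 2 inverse is written out with the determinant formula to keep the example short.

```python
x_values = [30, 20, 60, 80, 40, 50, 60, 30, 70, 60]
y_values = [73, 50, 128, 170, 87, 108, 135, 69, 148, 132]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Design matrix: a column of ones (intercept) and a column of X values.
X = [[1, x] for x in x_values]
Y = [[y] for y in y_values]

XtX = matmul(transpose(X), X)   # [[10, 500], [500, 28400]]
XtY = matmul(transpose(X), Y)   # [[1100], [61800]]

# Inverse of a 2 x 2 matrix: adjugate divided by the determinant.
(a, b_), (c, d) = XtX
det = a * d - b_ * c            # 34000
adjugate = [[d, -b_], [-c, a]]
b = [[v[0] / det] for v in matmul(adjugate, XtY)]
print(b)  # [[10.0], [2.0]]
```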
Matlab Solution
One-Way ANOVA Via Sums Of Squares
A one-way ANOVA tests whether the mean values at each node differ significantly among two or more groups of subjects. The groups being compared may contain different numbers of subjects.
Sum of Squares Formulas
K = Number of Groups
N = Total Number of Subjects
N_{i} = Number of Subjects in Group "i"
df_{Total} = N − 1
df_{Treatment} = K − 1
X_{ij} = Measurement for subject "j" in group "i"
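Mean of group i, X̄_{i} = (Σ_{j} X_{ij}) / N_{i}
Grand Mean, X̄ = (Σ_{i} Σ_{j} X_{ij}) / N
SS_{Total} = Σ_{i} Σ_{j} (X_{ij} − X̄)^{2}
SS_{Within} = Σ_{i} Σ_{j} (X_{ij} − X̄_{i})^{2} (also called SS_{Error}, with df_{Error} = N − K)
SS_{Treatment} = Σ_{i} N_{i} * (X̄_{i} − X̄)^{2}
SS_{Total} = SS_{Within} + SS_{Treatment}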
Sum Of Squares Example
Data from Statistical Methods for Psychology, page 608.
Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
---|---|---|---|
8 | 5 | 3 | 6 |
9 | 7 | 4 | 4 |
7 | 3 | 1 | 9 |
SS_{Total} = 73.00
SS_{Error} = 27.333
SS_{Treatment} = 45.667
df_{Total} = 11
df_{Error} = 8
df_{Treatment} = 3
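The values above can be reproduced with a short Python sketch. The function name `sums_of_squares` is hypothetical. It uses the usual definitions: total deviations are taken about the grand mean, within-group (error) deviations about each group mean, and treatment deviations are group means about the grand mean, weighted by group size.

```python
# One list per treatment group, from the table above.
groups = [[8, 9, 7], [5, 7, 3], [3, 4, 1], [6, 4, 9]]

def sums_of_squares(groups):
    """Return (SS_total, SS_within, SS_treatment) for a one-way layout."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = 0.0
    ss_treatment = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
        ss_treatment += len(g) * (group_mean - grand_mean) ** 2
    return ss_total, ss_within, ss_treatment

ss_total, ss_within, ss_treatment = sums_of_squares(groups)
n_total = sum(len(g) for g in groups)
df_total = n_total - 1            # 11
df_treatment = len(groups) - 1    # 3
df_error = n_total - len(groups)  # 8
print(f"{ss_total:.3f} {ss_within:.3f} {ss_treatment:.3f}")  # 73.000 27.333 45.667
```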
Sum of Squares Example in Matlab
One-Way ANOVA Via GLM
Recall that the equation of the General Linear Model is Y = Xb + e where Y, X, b, and e are matrices.
In the case of a One-Way ANOVA with N total subjects from K groups:
Matrix | Dimensions |
---|---|
Y | N x 1 |
X | N x (K + 1) |
b | (K + 1) x 1 |
e | N x 1 |
Design Matrix
In the case of ANOVA, X is a Design Matrix that indicates group membership of each subject. In the matrix X, there is one row for each subject and (K + 1) columns. The first column always contains ones. The remaining columns are used to indicate membership in a group with a value of one in column M indicating membership in group (M - 1) and a value of zero indicating not a member of the group.
The matrix Y contains the value from each subject, one per row.
Data from Statistical Methods for Psychology, page 608.
Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
---|---|---|---|
8 | 5 | 3 | 6 |
9 | 7 | 4 | 4 |
7 | 3 | 1 | 9 |
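As an illustration, the design matrix X and the matrix Y for this data could be built as in the Python sketch below. The variable names are hypothetical. Code columns are numbered from zero, so column 0 holds the ones and column g + 1 marks membership in the group with zero-based index g.

```python
# One list per treatment group, from the table above.
groups = [[8, 9, 7], [5, 7, 3], [3, 4, 1], [6, 4, 9]]
K = len(groups)
N = sum(len(g) for g in groups)

X = []  # N x (K + 1) design matrix
Y = []  # N x 1 matrix of measurements
for group_index, group in enumerate(groups):
    for value in group:
        row = [1] + [0] * K          # first column always contains one
        row[group_index + 1] = 1     # mark membership in this group
        X.append(row)
        Y.append([value])

for row in X:
    print(row)

# Every subject belongs to exactly one group, so the first column equals
# the sum of the membership columns: the columns are linearly dependent.
first_column_is_sum = all(row[0] == sum(row[1:]) for row in X)
print(first_column_is_sum)  # True
```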
Recall from the General Linear Model that the least squares estimate is b = (X'X)^{−1}X'Y.
However, X'X is singular, so an inverse cannot be computed: the first column of X (all ones) is the sum of the K group membership columns. To resolve this problem, the design matrix X is modified in two ways. First, the last column, indicating membership in the last group, is removed. Recall that a value of one in a column indicates membership in a group and a value of zero in a column indicates non-membership. The second modification is to use a value of negative one in each of the remaining membership columns for subjects in the last group (the group whose column was removed).
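The effect of the two modifications can be seen in the Python sketch below. The helper names are hypothetical, and the linear system (X'X)b = X'Y is solved by Gauss-Jordan elimination with exact fractions. With the full membership coding the solver reports a singular matrix. With the modified coding (often called effect coding) it returns coefficients in which the first element is the mean of the group means and the others are the offsets of the first K − 1 group means from it.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def solve(a, rhs):
    """Solve a * x = rhs for square a; raises ValueError if a is singular."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(r[0])]
           for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

def design_matrix(groups, modified):
    k = len(groups)
    rows = []
    for g_index, group in enumerate(groups):
        for _ in group:
            if not modified:             # one membership column per group
                rows.append([1] + [int(c == g_index) for c in range(k)])
            elif g_index < k - 1:        # last membership column removed
                rows.append([1] + [int(c == g_index) for c in range(k - 1)])
            else:                        # last group coded as negative one
                rows.append([1] + [-1] * (k - 1))
    return rows

def fit(X, Y):
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))

groups = [[8, 9, 7], [5, 7, 3], [3, 4, 1], [6, 4, 9]]
Y = [[v] for g in groups for v in g]

try:
    fit(design_matrix(groups, modified=False), Y)
    full_design_is_singular = False
except ValueError:
    full_design_is_singular = True

b = fit(design_matrix(groups, modified=True), Y)
print(full_design_is_singular, [float(v) for v in b])
```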
Computing Sums of Squares using General Linear Model
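SS_{Total} = Y'(I − J/N)Y
SS_{Error} = Y'(I − H)Y
SS_{Treatment} = Y'(H − J/N)Y
I is an NxN identity matrix.
J is an NxN matrix with all elements having the value 1.
H is the Hat matrix, equal to X(X'X)^{−1}X'.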
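The Python sketch below evaluates the sums of squares as quadratic forms for the example data. It assumes the standard matrix expressions SS_{Total} = Y'(I − J/N)Y, SS_{Error} = Y'(I − H)Y and SS_{Treatment} = Y'(H − J/N)Y, where I is the identity matrix, J is the matrix of ones, and H = X(X'X)^{−1}X' is the Hat matrix. The helper names are hypothetical, and X uses the modified coding (last column removed, last group coded as negative one) so that X'X can be inverted.

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(m):
    """Gauss-Jordan inverse with exact fractions; raises if m is singular."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

groups = [[8, 9, 7], [5, 7, 3], [3, 4, 1], [6, 4, 9]]
K = len(groups)
N = sum(len(g) for g in groups)

# Modified design matrix: ones column plus K - 1 membership columns.
X = []
for g_index, group in enumerate(groups):
    for _ in group:
        if g_index < K - 1:
            X.append([1] + [int(c == g_index) for c in range(K - 1)])
        else:
            X.append([1] + [-1] * (K - 1))
Y = [[v] for g in groups for v in g]

I = [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]
J_over_N = [[Fraction(1, N)] * N for _ in range(N)]
Xt = transpose(X)
H = matmul(matmul(X, inverse(matmul(Xt, X))), Xt)   # Hat matrix

def quadratic_form(m):
    """Return the scalar Y' * m * Y."""
    return matmul(matmul(transpose(Y), m), Y)[0][0]

ss_total = quadratic_form(subtract(I, J_over_N))
ss_error = quadratic_form(subtract(I, H))
ss_treatment = quadratic_form(subtract(H, J_over_N))
print(f"{float(ss_total):.3f} {float(ss_error):.3f} {float(ss_treatment):.3f}")
# prints "73.000 27.333 45.667"
```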
Solving One-Way ANOVA Using GLM With Matrices in Matlab
Two-Sample T-Test via GLM
A two-sample T-Test is equivalent to a one-way ANOVA with only two groups. Note that the square of T equals F, so the magnitude of T is the square root of F.
Example Data
Data from http://www.statsdirect.com/help/parametric_methods/utt.htm.
High Protein | Low Protein |
---|---|
134 | 80 |
146 | 118 |
104 | 101 |
119 | 85 |
124 | 107 |
161 | 132 |
107 | 94 |
83 | |
113 | |
129 | |
97 | |
123 | |
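The relationship between T and F can be checked numerically. The Python sketch below uses the values exactly as they appear in the table above, computes T with the usual pooled-variance formula, and computes F from the one-way ANOVA sums of squares with two groups. The function names are hypothetical.

```python
import math

# Values as listed in the table above.
high_protein = [134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123]
low_protein = [80, 118, 101, 85, 107, 132, 94]

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_variance = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_variance * (1 / na + 1 / nb))

def anova_f(groups):
    """F statistic of a one-way ANOVA."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_treatment = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                       for g in groups)
    ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_treatment / (k - 1)) / (ss_error / (n - k))

t = pooled_t(high_protein, low_protein)
f = anova_f([high_protein, low_protein])
print(math.isclose(t * t, f))  # True
```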
Solution with Matlab Using the GLM with Matrices
Solution Using Matlab's Two-Sample T-Test
References
Kiebel, S.J. and Holmes, A.P. The General Linear Model.
Neter, J., Wasserman, W., and Kutner, M.H. 1989. Applied Linear Regression Models. IRWIN, Homewood, IL, Second Edition.
Neter, J., Wasserman, W., and Kutner, M.H. Applied Linear Statistical Models. IRWIN, Homewood, IL, Third Edition.
Howell, David C. 2002. Statistical Methods for Psychology. Duxbury, Pacific Grove, CA, Fifth Edition.