# General Linear Model (GLM)

The equation of the General Linear Model is Y = Xb + e, where Y, X, b, and e are matrices. Statistical tests such as Analysis of Variance (ANOVA) and regression are special cases of the General Linear Model, and these tests can be formulated and solved using it.

# Regression Via Least Squares

The goal of linear regression is to predict the value of a dependent variable, Y, from one or more independent variables, X, where X and Y are observations from multiple subjects.

## Regression Equation

$Y_i = {\beta}_0 + {\beta}_1 * X_i + {\epsilon}_i$

• Yi is the dependent variable, predicted from the value of Xi.
• Xi is the independent variable.
• β0 and β1 are coefficients: β0 is the Y-intercept (the value of Yi when Xi is zero) and β1 is the slope of the line.
• εi is random error.
• i ranges from 1 to N, the number of subjects.

It can be shown that:

• ${\beta}_1 = \frac{\sum_{i=1}^N {(X_i - \overline{X}) (Y_i - \overline{Y})}} {\sum_{i=1}^N {(X_i - \overline{X})^2}}$
• ${\beta}_0 = \overline{Y} - {\beta}_1 * \overline{X}$

## Example

Data from Applied Linear Regression Models, page 44.

| i | Yi | Xi |
|---|----|----|
| 1 | 73 | 30 |
| 2 | 50 | 20 |
| 3 | 128 | 60 |
| 4 | 170 | 80 |
| 5 | 87 | 40 |
| 6 | 108 | 50 |
| 7 | 135 | 60 |
| 8 | 69 | 30 |
| 9 | 148 | 70 |
| 10 | 132 | 60 |

For the data from the table above:

• β0 = 10.0
• β1 = 2.0

so

Y = 10.0 + 2.0X
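The coefficients above can be checked by applying the least-squares formulas directly. The sketch below uses NumPy (an assumed tool, not part of the original text):

```python
import numpy as np

# Example data from Applied Linear Regression Models, page 44
X = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)

# Least-squares formulas for the slope and intercept
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

print(b0, b1)  # 10.0 2.0
```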

# Matrices

A matrix is a two-dimensional array of numbers. The elements of a matrix are indexed by i and j, where i is the row number (starting from zero at the top) and j is the column number (starting from zero at the left). In the matrices below, X01 is the element 2 and X12 is 4.

$\begin{vmatrix} 1 & 2 & 3\\ 6 & 5 & 4 \end{vmatrix}$ $\begin{vmatrix} X_{00} & X_{01} & X_{02}\\ X_{10} & X_{11} & X_{12} \end{vmatrix}$

## Transpose

Given a matrix A, its transpose is A'.

A = $\begin{vmatrix} 1 & 2 & 3\\ 6 & 5 & 4 \end{vmatrix}$

A' = $\begin{vmatrix} 1 & 6 \\ 2 & 5 \\ 3 & 4 \end{vmatrix}$

## Addition and Subtraction

To add or subtract matrices, the matrices involved MUST contain the same number of rows and columns.

A = $\begin{vmatrix} 1 & 2 & 3\\ 6 & 5 & 4 \end{vmatrix}$

B = $\begin{vmatrix} 17 & 8 & 12\\ 11 & 7 & 15 \end{vmatrix}$

A + B = $\begin{vmatrix} 1 + 17 & 2 + 8 & 3 + 12\\ 6 + 11 & 5 + 7 & 4 + 15 \end{vmatrix} = \begin{vmatrix} 18 & 10 & 15\\ 17 & 12 & 19 \end{vmatrix}$

A - B = $\begin{vmatrix} 1 - 17 & 2 - 8 & 3 - 12\\ 6 - 11 & 5 - 7 & 4 - 15 \end{vmatrix} = \begin{vmatrix} -16 & -6 & -9\\ -5 & -2 & -11 \end{vmatrix}$

## Multiplication

To multiply matrices, the number of columns in the matrix on the left side of the "*" must equal the number of rows in the matrix on the right side of the "*".

A = $\begin{vmatrix} 1 & 2 & 3\\ 6 & 5 & 4 \end{vmatrix}$

B = $\begin{vmatrix} 17 & 8 \\ 11 & 7 \\ 9 & 10 \end{vmatrix}$

A * B = $\begin{vmatrix} (1*17 + 2*11 + 3*9) & (1*8 + 2*7 + 3*10) \\ (6*17 + 5*11 + 4*9) & (6*8 + 5*7 + 4*10) \end{vmatrix} = \begin{vmatrix} 66 & 52 \\ 193 & 123 \end{vmatrix}$
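The transpose, addition, subtraction, and multiplication examples above can be reproduced with NumPy (an assumed tool; the operations themselves are standard):

```python
import numpy as np

A = np.array([[1, 2, 3], [6, 5, 4]])
B = np.array([[17, 8, 12], [11, 7, 15]])   # same shape as A, so A + B and A - B are defined
C = np.array([[17, 8], [11, 7], [9, 10]])  # 3 rows = number of columns of A, so A * C is defined

A_t = A.T     # transpose: rows become columns
S = A + B     # element-wise addition
D = A - B     # element-wise subtraction
P = A @ C     # matrix multiplication: (2 x 3) * (3 x 2) gives (2 x 2)
```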

## Identity

An identity matrix is a square matrix (number of rows equals number of columns) with ones where the row index equals the column index (the diagonal) and zeros for all other elements. Multiplying a matrix by the identity matrix leaves the matrix unchanged (A * I = A).

I = $\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix}$

## Linear Independence and Rank

A column (row) of a matrix is linearly dependent if it is a linear combination of the other columns (rows). In the matrix A below, multiplying the elements of the first column (1 2 3) by 3 produces (3 6 9), which is the third column of the matrix.

A = $\begin{vmatrix} 1 & 4 & 3 \\ 2 & 6 & 6 \\ 3 & 7 & 9 \end{vmatrix}$

The rank of a matrix is the maximum number of linearly independent rows (equivalently, columns). The rank of the matrix A above is 2.
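NumPy (an assumed tool) can confirm the rank of the example matrix:

```python
import numpy as np

A = np.array([[1, 4, 3], [2, 6, 6], [3, 7, 9]])

# The third column is 3 times the first, so at most two columns are independent
rank = np.linalg.matrix_rank(A)
print(rank)  # 2
```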

## Inverse

For a matrix A, its inverse is denoted $A^{-1}$, and $A * A^{-1} = A^{-1} * A = I$ (multiplying a matrix by its inverse yields the identity matrix). Only square matrices have an inverse. In addition, a matrix must be nonsingular, that is, its rank must equal its number of rows and columns, to have an inverse. If the rank of a matrix is less than its number of rows and columns, it is singular and has no inverse; a pseudoinverse can be computed instead.

A = $\begin{vmatrix} 5 & 3 & 4 \\ -3 & 2 & 5 \\ 7 & 4 & 6 \end{vmatrix}$ $A^{-1} = \begin{vmatrix} -0.533 & -0.133 & 0.466 \\ 3.533 & 0.133 & -2.466 \\ -1.733 & 0.066 & 1.266 \end{vmatrix}$
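As a sketch using NumPy (an assumption): `np.linalg.inv` computes the inverse of a nonsingular matrix, and `np.linalg.pinv` computes a pseudoinverse when the matrix is singular:

```python
import numpy as np

A = np.array([[5, 3, 4], [-3, 2, 5], [7, 4, 6]], dtype=float)

A_inv = np.linalg.inv(A)  # A is nonsingular (rank 3), so the inverse exists
product = A @ A_inv       # numerically equal to the 3x3 identity matrix

# A singular matrix has no inverse, but a pseudoinverse can still be computed
S = np.array([[1, 4, 3], [2, 6, 6], [3, 7, 9]], dtype=float)  # rank 2
S_pinv = np.linalg.pinv(S)
```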

# Regression Via GLM

Data from Applied Linear Regression Models, page 44.

| i | Yi | Xi |
|---|----|----|
| 1 | 73 | 30 |
| 2 | 50 | 20 |
| 3 | 128 | 60 |
| 4 | 170 | 80 |
| 5 | 87 | 40 |
| 6 | 108 | 50 |
| 7 | 135 | 60 |
| 8 | 69 | 30 |
| 9 | 148 | 70 |
| 10 | 132 | 60 |

## GLM

$\mathbf{Y} = \mathbf{X} * \boldsymbol{\beta} + \boldsymbol{\epsilon}$

$\mathbf{Y} = \begin{vmatrix} 73 \\ 50 \\ 128 \\ 170 \\ 87 \\ 108 \\ 135 \\ 69 \\ 148 \\ 132 \end{vmatrix},~~ \mathbf{X} = \begin{vmatrix} 1 & 30 \\ 1 & 20 \\ 1 & 60 \\ 1 & 80 \\ 1 & 40 \\ 1 & 50 \\ 1 & 60 \\ 1 & 30 \\ 1 & 70 \\ 1 & 60 \end{vmatrix}, ~~ \boldsymbol{\beta} = \begin{vmatrix} b_0 \\ b_1 \end{vmatrix},~~ \boldsymbol{\epsilon} = \begin{vmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \\ \epsilon_5 \\ \epsilon_6 \\ \epsilon_7 \\ \epsilon_8 \\ \epsilon_9 \\ \epsilon_{10} \end{vmatrix}$

## Least Squares Estimation

$\mathbf{X'} * \mathbf{X} * \mathbf{B} = \mathbf{X'} * \mathbf{Y}$

$\mathbf{B} = {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'} * \mathbf{Y}$

## Least Squares Estimation Using Example Data

$\mathbf{X'} = \begin{vmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 30 & 20 & 60 & 80 & 40 & 50 & 60 & 30 & 70 & 60 \end{vmatrix}$

$\mathbf{X'} * \mathbf{X} = \begin{vmatrix} 10 & 500 \\ 500 & 28400 \end{vmatrix},~~ {(\mathbf{X'} * \mathbf{X})}^{-1} = \begin{vmatrix} 0.8353 & -0.0147 \\ -0.0147 & 0.0003 \end{vmatrix}$

## Result

$\mathbf{B} = \begin{vmatrix} 10 \\ 2 \end{vmatrix}$
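The normal-equations solution can be checked numerically. The sketch below assumes NumPy:

```python
import numpy as np

Y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)

# Design matrix: a column of ones for the intercept, then the predictor
X = np.column_stack([np.ones_like(x), x])

# B = (X'X)^-1 * X'Y
B = np.linalg.inv(X.T @ X) @ X.T @ Y
print(B)  # [10.  2.]
```

The result matches the least-squares formulas from the regression section: an intercept of 10 and a slope of 2.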

# One-Way ANOVA Via Sums Of Squares

A one-way ANOVA determines whether the mean values of two or more groups of subjects are statistically different. The groups being compared may contain different numbers of subjects.

## Sum of Squares Formulas

K = Number of groups

N = Total number of subjects

Ni = Number of subjects in group "i"

$df_{Total} = N - 1$

$df_{Error} = \sum_{i=1}^{K} (N_i - 1) = N - K$

$df_{Treatment} = K - 1$

Xij = Measurement for subject "j" in group "i"

Mean of group i, $\bar{X_i} = \frac{\sum_{j=1}^{N_i} X_{ij}} {N_i}$

Grand mean, $\bar{X_{..}} = \frac{\sum_{i=1}^{K} \sum_{j=1}^{N_i} X_{ij}}{N}$

$SS_{Total} = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (X_{ij} - \bar{X_{..}})^2$

$SS_{Error} = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (X_{ij} - \bar{X_i})^2$

$SS_{Treatment} = \sum_{i=1}^{K} N_i (\bar{X_i} - \bar{X_{..}})^2$

$SS_{Total} = SS_{Error} + SS_{Treatment}$

$MS_{Treatment} = \frac{SS_{Treatment}} {df_{Treatment}}$

$MS_{Error} = \frac{SS_{Error}} {df_{Error}}$

$F = \frac{MS_{Treatment}} {MS_{Error}}$

## Sum Of Squares Example

Data from Statistical Methods for Psychology, page 608.

| Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
|---|---|---|---|
| 8 | 5 | 3 | 6 |
| 9 | 7 | 4 | 4 |
| 7 | 3 | 1 | 9 |

SSTotal = 73.00

SSError = 27.333

SSTreatment = 45.667

dfTotal = 11

dfError = 8

dfTreatment = 3

$MS_{Treatment} = \frac{SS_{Treatment}} {df_{Treatment}} = \frac{45.667}{3} = 15.222$

$MS_{Error} = \frac{SS_{Error}} {df_{Error}} = \frac{27.333}{8} = 3.417$

$F = \frac{MS_{Treatment}} {MS_{Error}} = \frac{15.222}{3.417} = 4.46$
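The sums-of-squares arithmetic above can be reproduced directly from the formulas (a sketch assuming NumPy):

```python
import numpy as np

# Data from Statistical Methods for Psychology, page 608
groups = [np.array([8, 9, 7], dtype=float),   # Treatment 1
          np.array([5, 7, 3], dtype=float),   # Treatment 2
          np.array([3, 4, 1], dtype=float),   # Treatment 3
          np.array([6, 4, 9], dtype=float)]   # Treatment 4

all_x = np.concatenate(groups)
grand_mean = all_x.mean()
N, K = all_x.size, len(groups)

ss_total = np.sum((all_x - grand_mean) ** 2)
ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)
ss_treatment = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

ms_treatment = ss_treatment / (K - 1)   # df_Treatment = K - 1
ms_error = ss_error / (N - K)           # df_Error = N - K
F = ms_treatment / ms_error
```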

# One-Way ANOVA Via GLM

Recall that the equation of the General Linear Model is Y = Xb + e where Y, X, b, and e are matrices.

In the case of a One-Way ANOVA with N total subjects from K groups:

Matrix Dimensions
Y N x 1
X N x (K + 1)
b (K + 1) x 1
e N x 1

## Design Matrix

In the case of ANOVA, X is a Design Matrix that indicates the group membership of each subject. The matrix X has one row for each subject and (K + 1) columns. The first column always contains ones. Each of the remaining K columns corresponds to one group: a value of one in that column indicates membership in the group, and a value of zero indicates non-membership.

The matrix Y contains the value from each subject, one per row.

Data from Statistical Methods for Psychology, page 608.

| Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
|---|---|---|---|
| 8 | 5 | 3 | 6 |
| 9 | 7 | 4 | 4 |
| 7 | 3 | 1 | 9 |

$\mathbf{Y} = \begin{vmatrix} 8 \\ 9 \\ 7 \\ 5 \\ 7 \\ 3 \\ 3 \\ 4 \\ 1 \\ 6 \\ 4 \\ 9 \end{vmatrix}$ $\mathbf{X} = \begin{vmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 \end{vmatrix}$

Recall from the General Linear Model that $\mathbf{b} = {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'} * \mathbf{Y}$

However, $(\mathbf{X'} * \mathbf{X})$ is singular, so its inverse cannot be computed. To resolve this problem, the design matrix X is modified in two ways. First, the last column, indicating membership in the last group, is removed. Second, membership in the last group is now indicated by a value of negative one in each of the remaining group columns (recall that a one in a column indicates membership in that group and a zero indicates non-membership). $\mathbf{X} = \begin{vmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & -1 & -1 & -1 \\ 1 & -1 & -1 & -1 \\ 1 & -1 & -1 & -1 \end{vmatrix}$

## Computing Sums of Squares Using the General Linear Model

$SS_{Total} = \mathbf{Y'} * (\mathbf{I} - \frac{1}{N} * \mathbf{J}) * \mathbf{Y}$

$SS_{Error} = \mathbf{Y'} * \mathbf{Y} - \mathbf{b'} * \mathbf{X'} * \mathbf{Y}$

$SS_{Treatment} = \mathbf{Y'} * (\mathbf{H} - \frac{1}{N} * \mathbf{J}) * \mathbf{Y}$

$\mathbf{b} = {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'} * \mathbf{Y}$

$\mathbf{I}$ is an N x N identity matrix. $\mathbf{J}$ is an N x N matrix with all elements having the value one. $\mathbf{H}$ is the hat matrix, equal to $\mathbf{X} * {(\mathbf{X'} * \mathbf{X})}^{-1} * \mathbf{X'}$.
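These matrix formulas can be checked against the sums of squares computed earlier (a sketch assuming NumPy; the modified design matrix from the previous section is used so that X'X is invertible):

```python
import numpy as np

# Y and the modified (full-rank) design matrix from the sections above
Y = np.array([8, 9, 7, 5, 7, 3, 3, 4, 1, 6, 4, 9], dtype=float).reshape(-1, 1)
X = np.array([[1,  1,  0,  0]] * 3 +
             [[1,  0,  1,  0]] * 3 +
             [[1,  0,  0,  1]] * 3 +
             [[1, -1, -1, -1]] * 3, dtype=float)
N = Y.size

I = np.eye(N)                         # N x N identity matrix
J = np.ones((N, N))                   # N x N matrix of ones
b = np.linalg.inv(X.T @ X) @ X.T @ Y
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix

ss_total = (Y.T @ (I - J / N) @ Y).item()
ss_error = (Y.T @ Y - b.T @ X.T @ Y).item()
ss_treatment = (Y.T @ (H - J / N) @ Y).item()
```

The three values agree with the sums-of-squares section: 73.0, 27.333, and 45.667.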

# Two-Sample T-Test via GLM

A two-sample t-test is simply a one-way ANOVA with only two groups. Note that t is the square root of F.

## Example Data

| High Protein | Low Protein |
|---|---|
| 134 | 80 |
| 146 | 118 |
| 104 | 101 |
| 119 | 85 |
| 124 | 107 |
| 161 | 132 |
| 107 | 94 |
| 83 | |
| 113 | |
| 129 | |
| 97 | |
| 123 | |
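Running the two groups above through the one-way ANOVA machinery (a sketch assuming NumPy) yields the F statistic, and t is its square root:

```python
import numpy as np

high = np.array([134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123], dtype=float)
low = np.array([80, 118, 101, 85, 107, 132, 94], dtype=float)

groups = [high, low]
all_x = np.concatenate(groups)
grand_mean = all_x.mean()
N, K = all_x.size, len(groups)  # unequal group sizes are allowed

ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)
ss_treatment = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

F = (ss_treatment / (K - 1)) / (ss_error / (N - K))
t = np.sqrt(F)  # for two groups, t is the square root of F
```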

# References

Neter, J., Wasserman, W., and Kutner, M.H. 1989. Applied Linear Regression Models. IRWIN, Homewood, IL, Second Edition.

Neter, J., Wasserman, W., and Kutner, M.H. Applied Linear Statistical Models. IRWIN, Homewood, IL, Third Edition.

Howell, David C. 2002. Statistical Methods for Psychology. Duxbury, Pacific Grove, CA, Fifth Edition.