# WARNING

THIS DOCUMENT IS IN DEVELOPMENT AND DESCRIBES FUTURE VERSIONS OF CARET

# Descriptive Statistics

Descriptive statistics summarize the data with measures such as the mean (average), median (middle value), mode (most common value), standard deviation, and variance. When computing the standard deviation, one must know whether the data values represent the entire population, in which case the sum of squared deviations is divided by N (the number of items), or a sample of the population, in which case it is divided by N - 1.

## Population Descriptive Statistics

• Population Mean $\mu = \frac{\sum_{i=1}^N x_i}{N}$
• Population Standard Deviation $\sigma = \sqrt{\frac{\sum_{i=1}^N (x_i - \mu)^2}{N}}$ OR $\sigma = \sqrt{\frac{\sum_{i=1}^N x_i^2 - \frac{(\sum_{i=1}^N x_i)^2}{N}}{N}}$
• Population Variance = $\sigma^2$
• Standard Deviation of the Mean $SD_{\overline{x}} = \frac{\sigma}{\sqrt{N}}$

## Sample Descriptive Statistics

• Sample Mean $M = \frac{\sum_{i=1}^N x_i}{N}$
• Sample Standard Deviation $S = \sqrt{\frac{\sum_{i=1}^N (x_i - M)^2}{N-1}}$ OR $S = \sqrt{\frac{\sum_{i=1}^N x_i^2 - \frac{(\sum_{i=1}^N x_i)^2}{N}}{N-1}}$
• Sample Variance = $S^2$
• Standard Error of the Mean $SE_{\overline{x}} = \frac{S}{\sqrt{N}}$
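As an illustration only (caret_stats is written in Java, and these Python function names are hypothetical), the population and sample formulas above can be computed for the values at a single node as follows. The computational form of $S$ is included to show that it agrees with the definitional form.

```python
import math

def population_stats(values):
    """Population mean, standard deviation (divide by N), variance, and SD of the mean."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    sigma = math.sqrt(variance)
    return mu, sigma, variance, sigma / math.sqrt(n)

def sample_stats(values):
    """Sample mean, standard deviation (divide by N - 1), variance, and standard error of the mean."""
    n = len(values)
    m = sum(values) / n
    variance = sum((x - m) ** 2 for x in values) / (n - 1)
    s = math.sqrt(variance)
    return m, s, variance, s / math.sqrt(n)

def sample_sd_computational(values):
    """Computational form: sqrt((sum(x^2) - (sum(x))^2 / N) / (N - 1))."""
    n = len(values)
    total = sum(values)
    sum_sq = sum(x * x for x in values)
    return math.sqrt((sum_sq - total * total / n) / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_stats(data)[:3])  # (5.0, 2.0, 4.0)
print(sample_stats(data)[2])       # 32 / 7, about 4.571
```

For these eight values, $\sigma = 2$ while $S \approx 2.138$, which shows the effect of dividing by N - 1 instead of N.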

## Miscellaneous Descriptive Statistics

• Z-Score $Z = \frac{x_i - \mu}{\sigma}$

# Inferential Statistic Tests

## Parametric Inferential Tests

For parametric tests, the data are assumed to follow a specific probability distribution, typically the normal (Gaussian) distribution.

### ANOVA (Analysis of Variance), One Way

A one-way ANOVA determines if the mean values at each node for two or more groups of subjects are statistically different. The groups being compared are allowed to have a different number of subjects.

$K$ = Number of Groups

$N$ = Total Number of Subjects

$N_i$ = Number of Subjects in Group $i$

$df_{Total} = N - 1$

$df_{Error} = \sum_{i=1}^{K} (N_i - 1) = N - K$

$df_{Treatment} = K - 1$

$X_{ij}$ = Measurement for subject $j$ in group $i$

Mean of group $i$, $\bar{X_i} = \frac{\sum_{j=1}^{N_i} X_{ij}} {N_i}$

Grand Mean, $\bar{X_{..}} = \frac{\sum_{i=1}^{K} \sum_{j=1}^{N_i} X_{ij}}{N}$

$SS_{Total} = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (X_{ij} - \bar{X_{..}})^2$

$SS_{Error} = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (X_{ij} - \bar{X_i})^2$

$SS_{Treatment} = \sum_{i=1}^{K} N_i (\bar{X_i} - \bar{X_{..}})^2$

$SS_{Total} = SS_{Error} + SS_{Treatment}$

$MS_{Treatment} = \frac{SS_{Treatment}} {df_{Treatment}}$

$MS_{Error} = \frac{SS_{Error}} {df_{Error}}$

$F = \frac{MS_{Treatment}} {MS_{Error}}$

If the ANOVA is run with two groups of data, the F-Statistic equals the square of the T-Statistic produced by a Two-Sample T-Test with equal (pooled) variances.
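The one-way ANOVA equations above can be sketched for the values at a single node as follows. This is an illustrative Python function with a hypothetical name, not the caret_stats code; it returns F together with its two degrees of freedom.

```python
def one_way_anova(groups):
    """One-way ANOVA for the values at one node; groups is a list of lists of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) and within-group (error) sums of squares
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_treatment, df_error = k - 1, n - k
    f = (ss_treatment / df_treatment) / (ss_error / df_error)
    return f, df_treatment, df_error

print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 1, 4)
```

For these two groups the pooled T-Statistic is $3 / \sqrt{2/3} \approx 3.674$, and its square is 13.5, matching F.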

### T-Test, One-Sample (Single Sample)

A one-sample T-Test determines whether the mean value at each node is statistically different from a specified value, often zero.

$t = \frac{M - \mu}{\sqrt{\frac{S^2}{N}}}$, where $M$ is the sample mean, $S$ is the sample standard deviation, and $\mu$ is the specified value.

$df = N - 1$
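A minimal Python sketch of the one-sample T-Statistic at one node (the function name is hypothetical; it simply evaluates the formula above with the sample variance $S^2$):

```python
import math

def one_sample_t(values, mu=0.0):
    """Return (t, df) testing whether the mean of values differs from mu."""
    n = len(values)
    m = sum(values) / n
    s_squared = sum((x - m) ** 2 for x in values) / (n - 1)  # sample variance S^2
    return (m - mu) / math.sqrt(s_squared / n), n - 1

t, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(t, 4), df)  # 4.2426 4
```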

### T-Test, Paired (Dependent Means)

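A paired T-Test determines whether the mean at each node is statistically different between two measurements (X and Y) on one group of subjects. It is equivalent to a one-sample T-Test on the differences $x_i - y_i$.

$\overline{D} = \frac{\sum_{i=1}^N (x_i - y_i)}{N}$

$t = \frac{\overline{D} - \mu}{\sqrt{\frac{S_D^2}{N}}}$, where $S_D$ is the sample standard deviation of the differences and $\mu$ is the hypothesized mean difference, typically zero.

$df = N - 1$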
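The paired test can be sketched in Python by forming the differences first and then applying the one-sample formula to them. The function name is hypothetical and the code is illustrative only.

```python
import math

def paired_t(x, y, mu=0.0):
    """Return (t, df) for paired measurements x[i], y[i] on the same subjects."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    s_squared = sum((d - d_bar) ** 2 for d in diffs) / (n - 1)  # variance of the differences
    return (d_bar - mu) / math.sqrt(s_squared / n), n - 1

t, df = paired_t([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 5.0, 5.0])
print(round(t, 4), df)  # 4.899 3
```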

### T-Test, Two-Sample (Independent Means)

A two-sample T-Test determines if the means at each node for two groups of subjects are statistically different. The groups being compared are allowed to have a different number of subjects.

#### Equal (Pooled) Variances

$S^2 = \frac{ \sum_{i=1}^{N_1} (x_i - \overline{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \overline{x}_2)^2} {N_1 + N_2 - 2}$

$t = \frac{\overline{x}_1 - \overline{x}_2} { \sqrt{S^2(\frac{1}{N_1} + \frac{1}{N_2})} }$

$df = N_1 + N_2 - 2$
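A Python sketch of the pooled-variance T-Statistic for the values at one node (hypothetical function name; it follows the $S^2$ and $t$ formulas above term by term):

```python
import math

def pooled_t(x1, x2):
    """Two-sample T-Statistic with equal (pooled) variances; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    s_squared = ss / (n1 + n2 - 2)  # pooled variance S^2
    t = (m1 - m2) / math.sqrt(s_squared * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t([4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 2.0])
```

For two groups, $t^2$ from this function reproduces the one-way ANOVA F computed on the same data.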

#### Unequal (Unpooled) Variances

$S_1^2 = \frac{\sum_{i=1}^{N_1} (x_i - \overline{x}_1)^2} {N_1 - 1}$

$S_2^2 = \frac{\sum_{j=1}^{N_2} (x_j - \overline{x}_2)^2} {N_2 - 1}$

$t = \frac{\overline{x}_1 - \overline{x}_2} {\sqrt{\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2} }}$

$d\mathit{f} = \frac{(\frac{S_1^2}{N_1} + \frac{S_2^2}{N_2})^2} {\frac{(\frac{S_1^2}{N_1})^2}{N_1 - 1} + \frac{(\frac{S_2^2}{N_2})^2}{N_2 - 1} }$
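The unpooled (Welch) statistic and its approximate degrees of freedom can be sketched in Python as follows; the function name is hypothetical and the code is for illustration only.

```python
import math

def welch_t(x1, x2):
    """Two-sample T-Statistic with unequal (unpooled) variances; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    a = sum((x - m1) ** 2 for x in x1) / (n1 - 1) / n1  # S1^2 / N1
    b = sum((x - m2) ** 2 for x in x2) / (n2 - 1) / n2  # S2^2 / N2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df
```

Unlike the pooled version, df is generally not an integer: for [4, 5, 6] versus [1, 2, 3, 2] it is 27/7, about 3.857. With equal group sizes and equal sample variances, t and df coincide with the pooled values.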

## Non-Parametric (Distribution Free) Inferential Statistic Tests

For non-parametric tests, no assumptions are made about the distribution of the data.

# caret_stats

caret_stats is a command line program that performs statistical operations on GIFTI surface data files. The first parameter indicates the operation that will be performed. Run the command with just the operation for help information.

The program is written in Java and requires the Java SE Development Kit (JDK) for optimal execution. If you are using a Mac, Java is already installed and you can skip this step. If you are running Linux or Windows, download and install the Java SE Development Kit (JDK) from http://java.sun.com/javase/downloads/index.jsp. You must then set the "path" environment variable to include the Java installation's "bin" directory so that "java" can be run from the command line.

Note: Do not use the Java Runtime Environment (JRE). It does not support Java's "-server" option, which reduces the runtime of caret_stats by fifty percent. If you get the error message "No Server JVM", you are using the JRE, not the JDK.

After Java is installed, download the caret6 distribution. Install it in the desired location, such as "Program Files" on Windows, "/Applications" on a Mac, or "/usr/local" on Linux. When the distribution is unzipped, it will create the subdirectory "caret6". Located in the caret6 directory are several directories whose names begin with "bin". You must update your PATH environment variable to point to the appropriate "bin" directory so that "caret_stats" can be run from the command line. In addition, Windows users will need to set the environment variable CARET6_HOME to the full path of the caret6 directory (eg: C:\caret6).

## Descriptive Statistical Operations

• -descriptive: mean, standard deviation, etc.

## Inferential Statistical Operations

The inferential statistical operations take the input files, perform a statistical test at each node, and create a new file containing one or more statistical measurements (F, T, Z, etc.) at each node.

## Performing Inferential Statistical Tests in Caret

Inferential statistical tests in Caret are performed on metric or surface shape files. All of the data (metric or shape files) must be on a co-registered surface so that all data files have the same number of nodes and each node number i is "in register" across subjects (i.e., all subjects' surfaces have undergone surface-based registration using Caret, Freesurfer, CIVET, or other software).

The goal is to find clusters (regions) where the groups of input data are statistically different, that is, regions where one can reject the null hypothesis that the metric/shape values at each node are essentially the same.

The steps in Caret are:

1. Run the input files through an inferential statistical test to produce the statistic file and the randomized statistic file.
2. Perform a significance test to assign P-Values to the statistic file.

Each of the inferential tests in Caret produces two files. The statistic file contains the results of the statistical test performed on the input data. The randomized statistic file contains columns with the same statistical test performed on randomly assigned groups of the input data. This randomized file is used during significance testing.

### One Sample T-Test

-inferential-t-test-one-sample

### Paired T-Test

-inferential-t-test-paired

### Two-Sample T-Test

-inferential-t-test-two-sample Two sample T-Test with or without pooled variance.

### Interhemispheric Clusters

-inferential-interhemispheric

The interhemispheric clusters test is used to determine asymmetry (and symmetry) between the left and right hemispheres of two groups of subjects. All subjects' left and right hemispheres must be co-registered to an atlas, typically the PALS atlas.

Inputs:

• AL is group A, left hemispheres.
• AR is group A, right hemispheres.
• BL is group B, left hemispheres.
• BR is group B, right hemispheres.
• ITER_LEFT_RIGHT is the number of iterations for T-Statistics of random combinations of left or right subjects.
• ITERATIONS is the number of iterations for the randomized T-Statistic file.

Algorithm:

• Create TL, a T-Statistic metric file comparing the left hemispheres of the two groups, TL = T-Statistic(AL, BL).
• Create TR, a T-Statistic metric file comparing the right hemispheres of the two groups, TR = T-Statistic(AR, BR).
• Create TP, a metric file containing the product of the left and right T-Statistic, TP = TL * TR.
• Create RANDTL, a metric file containing T-Statistics for ITER_LEFT_RIGHT randomized combinations of the left hemispheres from both groups, RANDTL = T-Statistic(RandomCombinations(AL, BL)).
• Create RANDTR, a metric file containing T-Statistics for ITER_LEFT_RIGHT randomized combinations of the right hemispheres from both groups, RANDTR = T-Statistic(RandomCombinations(AR, BR)).
• Create RANDTP, a metric file containing ITERATIONS random combinations of the product of one column from each of the left and right T-Statistic randomized files, RANDTP = RandomColumn(RANDTL) * RandomColumn(RANDTR).

Output:

• TP is the statistic file for input to the significance testing command.
• RANDTP is the randomized statistic file for input to the significance testing command.
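The algorithm can be sketched in Python as follows. This is not the caret_stats implementation: the function names are hypothetical, each hemisphere file is modeled as a list of subjects holding per-node values, "T-Statistic" is assumed to be the pooled two-sample T-Test, and the sketch does not enforce that randomizations are unique.

```python
import math
import random

def t_map(group_a, group_b):
    """Pooled two-sample T-Statistic at each node.

    Each group is a list of subjects; each subject is a list of per-node values.
    """
    n1, n2 = len(group_a), len(group_b)
    result = []
    for node in range(len(group_a[0])):
        a = [subj[node] for subj in group_a]
        b = [subj[node] for subj in group_b]
        m1, m2 = sum(a) / n1, sum(b) / n2
        ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
        s_squared = ss / (n1 + n2 - 2)
        result.append((m1 - m2) / math.sqrt(s_squared * (1 / n1 + 1 / n2)))
    return result

def random_t_maps(group_a, group_b, iterations, rng):
    """T maps for random reassignments of the pooled subjects into groups of the original sizes."""
    pool = group_a + group_b
    maps = []
    for _ in range(iterations):
        shuffled = rng.sample(pool, len(pool))
        maps.append(t_map(shuffled[:len(group_a)], shuffled[len(group_a):]))
    return maps

def interhemispheric(al, ar, bl, br, iter_left_right, iterations, seed=0):
    """Return (TP, RANDTP) following the algorithm steps listed above."""
    rng = random.Random(seed)
    tl = t_map(al, bl)                            # TL = T-Statistic(AL, BL)
    tr = t_map(ar, br)                            # TR = T-Statistic(AR, BR)
    tp = [lt * rt for lt, rt in zip(tl, tr)]      # TP = TL * TR
    rand_tl = random_t_maps(al, bl, iter_left_right, rng)
    rand_tr = random_t_maps(ar, br, iter_left_right, rng)
    rand_tp = []
    for _ in range(iterations):
        # Product of one random column from each randomized hemisphere file
        left, right = rng.choice(rand_tl), rng.choice(rand_tr)
        rand_tp.append([lt * rt for lt, rt in zip(left, right)])
    return tp, rand_tp
```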

### Coordinate Difference Analysis of Variance

In coordinate difference analysis of variance, the input data are coordinate files from participants in two or more groups. In the ANOVA equations shown previously, each measurement $X_{ij}$ is, in the case of coordinate difference ANOVA, a three-dimensional coordinate, and a subtraction such as $(X_{ij} - \bar{X_i})$ is the Euclidean (straight line) distance between two coordinates.

In the numerator of the F-Statistic is $SS_{Treatment} = \sum_{i=1}^{K} N_i (\bar{X_i} - \bar{X_{..}})^2$. In the parentheses is the distance between a group average coordinate and the population average coordinate (the average of all coordinates). If the participants are all from the same population, each of the group average coordinates will be very close to the population average coordinate and this quantity will be small. If participants are from different populations, the group average coordinates will differ from the population average coordinate and this quantity will be large.

In the denominator of the F-Statistic is $SS_{Error} = \sum_{i=1}^{K} \sum_{j=1}^{N_i} (X_{ij} - \bar{X_i})^2$. In the parentheses is the distance between the coordinate of each participant in the group and the average coordinate for the group. When the participants in a group are spatially clustered, this quantity will be small. When the participants in a group are spatially separated, this quantity will be large.

Consider the two-dimensional examples below. In each example, there are two groups of data, with participants labeled "O" and "+". The average coordinate for each group is "(O)" and "(+)", with the population average coordinate at "(A)".

In the plot below, both groups appear to be from the same population. As a result, $SS_{Treatment}$ will be small, resulting in a small F-Statistic, and one is unable to reject the null hypothesis.

In the plot below, the average coordinates of the two groups are spatially separated, resulting in a large $SS_{Treatment}$. In addition, each group is spatially clustered, resulting in a small $SS_{Error}$. As a result, the numerator is large and the denominator is small, creating a large F-Statistic and the rejection of the null hypothesis.
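The two situations described above can be reproduced numerically with a short Python sketch (hypothetical function names, illustration only). It evaluates the same F-Statistic as the scalar ANOVA, but with every $X_{ij}$ a coordinate tuple and every difference replaced by a Euclidean distance (`math.dist`, Python 3.8+).

```python
import math

def centroid(points):
    """Average coordinate of a list of coordinate tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def coordinate_anova_f(groups):
    """F-Statistic where each X_ij is a coordinate and differences are Euclidean distances.

    groups: list of groups, each a list of (x, y, z) (or (x, y)) tuples at one node.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = centroid([p for g in groups for p in g])
    means = [centroid(g) for g in groups]
    ss_treatment = sum(len(g) * math.dist(m, grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum(math.dist(p, m) ** 2 for g, m in zip(groups, means) for p in g)
    return (ss_treatment / (k - 1)) / (ss_error / (n - k))

separated = [[(0, 0), (0, 1), (1, 0), (1, 1)], [(10, 10), (10, 11), (11, 10), (11, 11)]]
overlapping = [[(0, 0), (0, 2), (2, 0), (2, 2)], [(0.5, 0.5), (0.5, 1.5), (1.5, 0.5), (1.5, 1.5)]]
print(coordinate_anova_f(separated))    # about 600
print(coordinate_anova_f(overlapping))  # 0.0
```

With one-dimensional coordinates the function reduces to the ordinary one-way ANOVA, which is a quick sanity check on the distance-based reformulation.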

### Coordinate Difference

NOTE: At this time, coordinate difference is not implemented in caret_stats.

Definitions:

• $N_x$ is the number of participants in group X.
• $D(i,j) = \sqrt{ {(X_i - X_j)}^2 + {(Y_i - Y_j)}^2 + {(Z_i - Z_j)}^2}$ (the Euclidean distance between two three-dimensional points).
• $AVG_{xj}$ is the average coordinate at node j for group x.
• $X_{dev} = \sqrt{\frac{\sum_{i=1}^{N_x} \sum_{j=1}^M D(XYZ_{ij},AVG_{xj})^2}{N_x - 1}}$, where $N_x$ is the number of participants in group X and M is the number of nodes.

Algorithm:

• Create $A_{avg}$, the average coordinate file for group A.
• Create $B_{avg}$, the average coordinate file for group B.
• Create $A_{dev}$, the deviations at each node for group A.
• Create $B_{dev}$, the deviations at each node for group B.
• If the mode is COORD_DIFF, create the statistic-file where the statistic at each node is $D(A_{avg}, B_{avg})$.
• If the mode is TMAP_DIFF, create the statistic-file where the statistic at each node is $\frac{D(A_{avg}, B_{avg})}{\sqrt{A_{dev} + B_{dev}}}$.
• Create the randomized-statistic-file. For each column in it, create two coordinate files that are randomized combinations of all of the input coordinate files, and perform the COORD_DIFF or TMAP_DIFF test on those two files.

The statistic Donna desires, which matches the formula for an unpooled two-sample T-Test, is:

$\frac{D(A_{avg}, B_{avg})}{\sqrt{\frac{{A_{dev}}^2}{N_A} + \frac{{B_{dev}}^2}{N_B}}}$
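A single-node Python sketch of COORD_DIFF, TMAP_DIFF, and the unpooled form above. It assumes the deviation $X_{dev}$ is computed separately at each node (the algorithm refers to "the deviations at each node"); the function names are hypothetical, and caret_stats does not yet implement this test.

```python
import math

def centroid(points):
    """Average coordinate of a group's coordinates at one node."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def deviation(points, center):
    """Assumed per-node X_dev: sqrt(sum of squared distances to the group average / (N_x - 1))."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / (len(points) - 1))

def coordinate_difference(group_a, group_b):
    """Return (COORD_DIFF, TMAP_DIFF, unpooled statistic) for one node."""
    a_avg, b_avg = centroid(group_a), centroid(group_b)
    a_dev, b_dev = deviation(group_a, a_avg), deviation(group_b, b_avg)
    d = math.dist(a_avg, b_avg)                   # D(A_avg, B_avg)
    tmap_diff = d / math.sqrt(a_dev + b_dev)
    unpooled = d / math.sqrt(a_dev ** 2 / len(group_a) + b_dev ** 2 / len(group_b))
    return d, tmap_diff, unpooled
```

For example, `coordinate_difference([(0, 0, 0), (2, 0, 0)], [(10, 0, 0), (12, 0, 0)])` gives COORD_DIFF = 10 and an unpooled statistic of $10/\sqrt{2} \approx 7.071$.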

## Significance Testing

Significance testing in Caret is a non-parametric technique based on randomization (permutation) testing.

Two data files are required for significance testing. The first is the file containing the test statistic. The second file is the "randomized statistic" file that contains test statistics from many random combinations of the test subjects.

### Randomization

Randomization testing is used to determine the P-Values.

#### Randomization With One Group of Subjects

When there is one group of subjects, such as in a one-sample T-Test, it is not possible to randomize among groups. So, the randomization is performed by randomly flipping the signs of the values for each subject. The statistical test is then run on each of these randomizations and the largest clusters are identified.
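A Python sketch of sign-flip randomization for a one-sample T-Test (hypothetical function names, illustration only). All of a subject's node values are flipped together, and each randomization becomes one column of the randomized statistic file. This sketch does not check that sign patterns are unique; with N subjects there are $2^N$ possible patterns.

```python
import math
import random

def one_sample_t_map(subjects):
    """One-sample T-Statistic (against zero) at each node; subjects are lists of node values."""
    n = len(subjects)
    result = []
    for node in range(len(subjects[0])):
        values = [s[node] for s in subjects]
        m = sum(values) / n
        s_squared = sum((v - m) ** 2 for v in values) / (n - 1)
        result.append(m / math.sqrt(s_squared / n))
    return result

def sign_flip_columns(subjects, iterations, seed=0):
    """Randomized statistic columns: flip each subject's signs with probability 1/2."""
    rng = random.Random(seed)
    columns = []
    for _ in range(iterations):
        flipped = [[-v for v in s] if rng.random() < 0.5 else list(s) for s in subjects]
        columns.append(one_sample_t_map(flipped))
    return columns
```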

#### Randomization With Multiple Groups of Subjects

With multiple groups of subjects, all of the subjects are placed into a pool. Subjects are then randomly drawn from the pool and placed into new groups. The new groups contain the same number of subjects as the original groups. When randomizing subjects, each new randomization of subjects should be unique when compared to any previously generated groups of subjects. Statistical tests are then run on each of these randomizations and the largest clusters are identified.
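A Python sketch of drawing unique regroupings from the pool (hypothetical function name, illustration only). Groups are compared as sets, so the order of subjects within a group does not matter, and the function refuses to run when more unique regroupings are requested than exist.

```python
import math
import random

def unique_regroupings(subjects, group_sizes, count, seed=0):
    """Randomly re-deal pooled subjects into groups of the given sizes, skipping repeats."""
    if sum(group_sizes) != len(subjects):
        raise ValueError("group sizes must add up to the number of subjects")
    possible = math.factorial(len(subjects))  # multinomial: N! / (N_1! N_2! ...)
    for size in group_sizes:
        possible //= math.factorial(size)
    if count > possible:
        raise ValueError(f"only {possible} distinct regroupings exist")
    rng = random.Random(seed)
    pool = list(subjects)
    seen, results = set(), []
    while len(results) < count:
        rng.shuffle(pool)
        groups, start = [], 0
        for size in group_sizes:
            groups.append(frozenset(pool[start:start + size]))
            start += size
        key = tuple(groups)
        if key not in seen:  # keep only regroupings not generated before
            seen.add(key)
            results.append([sorted(g) for g in groups])
    return results
```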

Given a group of three subjects, choosing two at a time, there are 3 combinations and 6 permutations. For example, selecting two subjects from {A,B,C} results in the combinations {A,B}, {A,C}, and {B,C}, and in the permutations {A,B}, {A,C}, {B,C}, {B,A}, {C,A}, and {C,B}. With combinations, two groups of elements are equal if they contain the same elements in any order (ie: {A,B} and {B,A} are equivalent). With permutations, two groups of elements are equal only if they contain the same elements in an identical order (ie: {A,B} and {B,A} are NOT equivalent).

Mathematical formulas for the number of permutations and combinations when choosing k elements from a total of n elements:

$P(n,k) = \frac{n!} {(n - k)!}$

$C(n,k) = \frac{n!}{k!(n-k)!}$
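These formulas also bound how many unique randomizations are available. A short Python illustration (hypothetical function names): splitting a pool of $N_1 + N_2$ subjects into groups of sizes $N_1$ and $N_2$ can be done in $C(N_1 + N_2, N_1)$ distinct ways, and a single group of N subjects has $2^N$ sign-flip patterns.

```python
import math

def permutations_count(n, k):
    """P(n, k) = n! / (n - k)!"""
    return math.factorial(n) // math.factorial(n - k)

def combinations_count(n, k):
    """C(n, k) = n! / (k! (n - k)!)"""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(permutations_count(3, 2), combinations_count(3, 2))  # 6 3
# Distinct ways to split 22 pooled subjects into groups of 10 and 12:
print(combinations_count(22, 10))  # 646646
# Distinct sign-flip patterns for a single group of 12 subjects:
print(2 ** 12)  # 4096
```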

### P's and Q's

The significance tests in Caret produce both P and Q values. Q is simply 1 - P. Q is useful for thresholding in Caret. One selects the statistic for viewing and thresholds with Q. Since Caret thresholds by inhibiting the display of data BELOW the threshold, one can threshold with Q and set the threshold to 0.95 to see statistics with a P-Value of 0.05 or less.

### Cluster Based Thresholding

For cluster-based threshold significance testing use "caret_stats -significance-cluster-threshold".

• The user provides positive and negative thresholds and a desired significance level (P-Value, eg: 0.05).
• Clusters of nodes passing the threshold tests are identified in the statistic file. Note that positive and negative values are processed separately.
• The largest cluster is identified in each column of the randomized statistic file using the thresholds.
• The clusters identified from the randomized statistic file are ranked based upon surface area (possibly corrected for surface distortion).
• The user-provided P-Value is multiplied by the number of columns in the randomized statistic file (eg: 0.05 * 500 = 25), giving the significant cluster rank. The cluster at this rank is identified and its surface area is noted as the "significant surface area".
• For each cluster in the statistic file, determine where its surface area ranks among the ranked randomized clusters. Set the P-Value for the statistic file's cluster to its rank divided by the total number of columns in the randomized file. For example, if the statistic cluster is ranked 3 out of 100, the cluster receives a P-Value of 0.03.
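The ranking steps above can be sketched in Python (hypothetical function names, illustration only). Two details are assumptions, not taken from Caret: the observed value's rank counts the randomized maxima that are greater than or equal to it, and the significant rank is rounded to the nearest integer (at least 1). The same logic applies to the randomized maximum TFCE values used in threshold-free cluster enhancement.

```python
def significant_value(random_maxima, p_value):
    """Value at the significant rank (p_value * number of columns), ranking largest first."""
    ranked = sorted(random_maxima, reverse=True)
    rank = max(1, round(p_value * len(ranked)))  # e.g. 0.05 * 500 = 25
    return ranked[rank - 1]

def rank_p_value(observed, random_maxima):
    """P-Value = rank among the randomized maxima / number of columns.

    Assumption: the rank counts randomized maxima greater than or equal to the observed value.
    """
    rank = sum(1 for m in random_maxima if m >= observed)
    return rank / len(random_maxima)

maxima = list(range(1, 101))            # hypothetical largest-cluster areas, 100 columns
print(significant_value(maxima, 0.05))  # 96
print(rank_p_value(98, maxima))         # 0.03
```

Under this convention, a cluster larger than every randomized maximum receives a P-Value of 0.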

The difficult part of cluster-based thresholding is selecting the thresholds. There is no "correct" threshold value. In general, lower thresholds produce more clusters, larger clusters, or both, while higher thresholds produce fewer clusters, smaller clusters, or both.

### Threshold-Free Cluster Enhancement (TFCE)

For threshold-free cluster enhancement significance testing use "caret_stats -significance-threshold-free".

The difficulty of selecting a threshold in cluster-based thresholding led to the development of threshold-free cluster enhancement (See Smith and Nichols in the References section at the bottom of this page). With threshold-free cluster enhancement, the user does not need to choose thresholds.

• Apply the TFCE transform to the statistic in the statistic file.
• Apply the TFCE transform to all columns in the randomized statistic file.
• Find the largest TFCE value in each column of the TFCE transformed randomized statistic file and rank them.
• The user-provided P-Value is multiplied by the number of columns in the randomized statistic file (eg: 0.05 * 500 = 25), giving the significant TFCE rank. The TFCE at this rank is identified and its value is noted as the "significant TFCE value".
• For each node in the statistic file, determine where its TFCE value ranks among the ranked, randomized maximum TFCE values. Set the P-Value for the statistic file's node to its rank divided by the total number of columns in the randomized file. For example, if the statistic node TFCE is ranked 3 out of 100, the node receives a P-Value of 0.03.

The value of the TFCE output at node p where node p has a positive input value is given by the following integral:

$TFCE(p) = \int_{h_0}^{h_f} e(h, p)^E \, h^H \, dh$, where $h$ is a threshold, $e(h,p)$ is the area of the cluster containing node $p$ at threshold $h$ (in Caret, the sum of the surface areas of the nodes in the cluster), and $h_0$ and $h_f$ are typically zero and the highest value of a node in the surface, respectively. $E$ and $H$ are constants (default values 1.0 and 2.0 for surfaces) that determine which cluster shapes and sizes the transform is most sensitive to. In practice, this integral is approximated numerically, because the cluster size varies unpredictably with height. At a high level, our approach was to use many thresholds, computing the approximate integral for each "slice" of each cluster with the trapezoidal rule. To obtain the values for negative nodes, the input values are sign flipped, run through the same process, and then sign flipped to be negative again. See Caret:Documentation:Statistics:TFCE_Implementation for details.
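To make the integral concrete, here is a minimal Python sketch under simplifying assumptions that are not Caret's: nodes lie on a hypothetical 1-D chain where neighbors are adjacent indices (rather than a surface mesh), thresholds are evenly spaced, and a plain trapezoidal rule is applied per node rather than per cluster slice. $E$ and $H$ default to the surface values quoted above, and negative values are handled by sign flipping as described.

```python
def clusters_at(values, h):
    """Runs of consecutive nodes with value >= h on a 1-D chain (stand-in for surface neighbors)."""
    runs, current = [], []
    for i, v in enumerate(values):
        if v >= h:
            current.append(i)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs

def tfce_positive(values, areas, steps=100, e_exp=1.0, h_exp=2.0):
    """Approximate the TFCE integral from 0 to max(values) with the trapezoidal rule."""
    n = len(values)
    out = [0.0] * n
    h_max = max(values)
    if h_max <= 0:
        return out
    dh = h_max / steps
    prev = [0.0] * n  # integrand at the previous threshold; h^H is zero at h = 0
    for s in range(1, steps + 1):
        h = h_max if s == steps else s * dh
        cur = [0.0] * n
        for run in clusters_at(values, h):
            extent = sum(areas[i] for i in run)  # e(h, p): summed area of the cluster
            for i in run:
                cur[i] = extent ** e_exp * h ** h_exp
        for i in range(n):
            out[i] += 0.5 * (prev[i] + cur[i]) * dh
        prev = cur
    return out

def tfce(values, areas, **kwargs):
    """Positive and negative values are enhanced separately; negatives are sign flipped."""
    pos = tfce_positive([max(v, 0.0) for v in values], areas, **kwargs)
    neg = tfce_positive([max(-v, 0.0) for v in values], areas, **kwargs)
    return [p - q for p, q in zip(pos, neg)]
```

For a single node with value 2 and area 1, the exact integral is $\int_0^2 h^2 \, dh = 8/3$, and the sketch returns a value within about $10^{-3}$ of it.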

Figure: Flat surface with Z-coordinate set to the TFCE-enhanced T-Statistic.

The significance testing commands have a parameter named "-number-of-threads". Threads allow a task to be broken down into pieces that may be run in parallel and take advantage of either multiple processors or multi-core processors. Using threads will typically reduce the execution time of the command if more than one logical processor is available.

# References

## Books

• Howell, David C. (2002) Statistical Methods for Psychology. Pacific Grove, CA: Duxbury.

## Journal Articles

• Nichols, Thomas E. and Holmes, Andrew P. (2002) Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping 15(1).
• Smith, Stephen M. and Nichols, Thomas E. (2009) Threshold-Free Cluster Enhancement: Addressing Problems of Smoothing, Threshold Dependence and Localisation in Cluster Inference. NeuroImage 44(1).