FsTutorial/GlmReview - Free Surfer Wiki

Linear modeling describes the observed data as a linear combination of explanatory factors plus noise, and determines how well that description explains the data being analyzed.

For group morphometric analysis, the observed data consists of a set of surface measures (such as cortical thickness) at each vertex in a surface model, for each subject in the group. This data can be organized as a set of vectors, one per vertex in the surface model, each containing the surface measurement at that vertex for every subject in the group.

First, a linear model must be designed to include all explanatory variables (EVs) that may account for each vector's values. A simple linear model is given by y=a*x+b+e. Here the observed data y is a one-dimensional vector of surface measures, one measurement per subject at a vertex; x is a one-dimensional vector containing a variable, such as age, describing each subject; a is the parameter estimate (PE) for x, that is, the value by which a subject's age must be multiplied to fit the data in y; b is a constant, which in this example corresponds to the baseline measurement present in the data; and e is the error in the model fit. If an additional explanatory variable is added to explain the observed data, the model becomes y=a1*x1+a2*x2+b+e, containing two model terms, a1*x1 and a2*x2, corresponding to two variables, such as age and gender, that describe all subjects in the study.
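As a minimal sketch of the two-variable model y=a1*x1+a2*x2+b+e, the EVs and the constant term can be stacked into one array with a row per subject. The subject values, the 0/1 coding of gender, and the helper name `build_design_matrix` are hypothetical and chosen only for illustration; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical values for five subjects (not taken from any real study).
age = np.array([23.0, 35.0, 47.0, 58.0, 69.0])   # x1
gender = np.array([0.0, 1.0, 0.0, 1.0, 1.0])     # x2, coded 0/1 (an assumption)


def build_design_matrix(x1, x2):
    """Stack two EVs and a constant column; one row per subject."""
    constant = np.ones_like(x1)  # multiplies b, the baseline term
    return np.column_stack([x1, x2, constant])


X = build_design_matrix(age, gender)
print(X.shape)  # (5, 3): five subjects, two EVs plus the constant
```

Each column of `X` then pairs with one unknown: a1, a2, and b.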

2.1 Estimation overview
Once the model is specified, an estimation step follows, in which the model is fit to each vertex's vector separately; no interactions between vertices are taken into account in the examples presented here. For each vertex, this step produces a parameter estimate for every explanatory variable, reflecting how well that variable fits the vector of surface measurements. Thus if the measurements at a particular vertex are strongly related to the explanatory variable x1, model-fitting will produce an a1 of large magnitude; if the data appears unrelated to x2, then a2 will be close to zero.
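The vertex-by-vertex fit can be sketched with ordinary least squares. This is an illustration on simulated data, not FreeSurfer's own implementation: the array `Y` (subjects by vertices), the simulated PEs in `true_pe`, and the helper `fit_per_vertex` are all assumptions made for the example.

```python
import numpy as np

n_subjects, n_vertices = 20, 4

# Hypothetical EVs: x1 (e.g. age) and x2 (a 0/1 group code), plus a constant.
x1 = np.linspace(20.0, 80.0, n_subjects)
x2 = np.tile([0.0, 1.0], n_subjects // 2)
X = np.column_stack([x1, x2, np.ones(n_subjects)])

# Simulated PEs per vertex: rows are a1, a2, b; columns are vertices.
true_pe = np.array([[-0.01, 0.0, -0.02, 0.0],
                    [0.0, 0.0, 0.1, 0.0],
                    [3.0, 2.5, 3.2, 2.8]])

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.05, size=(n_subjects, n_vertices))
Y = X @ true_pe + noise  # one column of measurements per vertex


def fit_per_vertex(X, Y):
    """Least-squares PEs; each column of Y is fit independently."""
    pe, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    return pe


pe_hat = fit_per_vertex(X, Y)  # shape (3, n_vertices)
```

Passing all vertices at once gives the same result as looping over vertices, because least squares treats each column of `Y` as a separate problem.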

This kind of linear modeling is commonly expressed in matrix notation, where the matrix X contains all the explanatory variables (designed effects and confounds) in the model, and the matrix A contains all the PEs. The matrix X is also commonly called the design matrix, and it can be user-specified in FreeSurfer in the form of an FSGD (FreeSurfer Group Descriptor) file, as the exercises below illustrate. Each column of X corresponds to a different explanatory variable (also called a regressor or a covariate). As typically formulated and solved, the estimation step produces a set of estimates of the PEs, which in turn are used in hypothesis testing.
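For a single vertex, the matrix form can be written out as follows. The hat notation, the residual-variance symbol, and the assumption that X has full column rank are additions made here for illustration. The closed-form solution shown is the ordinary least-squares estimate, one common way to solve the model.

```latex
% n subjects, p columns in the design matrix (constant column included)
y = X A + e, \qquad
y \in \mathbb{R}^{n},\; X \in \mathbb{R}^{n \times p},\; A \in \mathbb{R}^{p}

% Ordinary least-squares estimate of the PEs (X assumed full column rank)
\hat{A} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} y

% Residual variance estimate, used later in hypothesis testing
\hat{\sigma}^{2} = \frac{(y - X\hat{A})^{\mathsf{T}} (y - X\hat{A})}{n - p}
```

With the two-variable model above, the columns of X are x1, x2, and a column of ones, and A holds a1, a2, and b.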

2.2 Inference overview
Estimates of the PEs can be converted into statistical parametric maps, which are commonly visualized as a color-coded surface overlay. The overlay assigns each vertex a value based on the strength of the statistical evidence against the null hypothesis at that vertex. A linear combination of estimates of PEs is used to encode the particular hypothesis of interest. This encoding is accomplished with a user-specified "contrast vector", which assigns a contrast weight to each column of the design matrix. A simple example of a contrast vector that tests the null hypothesis for the explanatory variable associated with the first design matrix column would be [1 0 0 0 ...]. To compute this particular contrast at each vertex, the PE value associated with the first design matrix column at that vertex is divided by the standard error of its estimate, yielding a t-value. The t-value provides a good measure of confidence in the estimate of the PE value, and can be converted into a probability (P) or Z statistic at that vertex via a standard statistical transformation. T, P and Z values all convey the same information about how significantly the observed data is related to a given explanatory variable.
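The t-value computation for one vertex can be sketched as below, assuming ordinary least squares with independent, equal-variance errors. The function name `contrast_t_value` and the data are hypothetical, and this is not a description of how FreeSurfer computes its maps internally.

```python
import numpy as np


def contrast_t_value(X, y, c):
    """t-value for contrast vector c at one vertex (ordinary least squares)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    pe = xtx_inv @ X.T @ y                 # parameter estimates
    residuals = y - X @ pe
    dof = n - p                            # residual degrees of freedom
    sigma2 = residuals @ residuals / dof   # residual variance estimate
    effect = c @ pe                        # weighted combination of PEs
    std_err = np.sqrt(sigma2 * (c @ xtx_inv @ c))
    return effect / std_err, dof


# Hypothetical thickness values (mm) for six subjects at one vertex.
age = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
y = np.array([3.1, 2.9, 2.8, 2.6, 2.5, 2.2])
X = np.column_stack([age, np.ones_like(age)])

c = np.array([1.0, 0.0])  # tests the first column (age) only
t_value, dof = contrast_t_value(X, y, c)
```

A two-sided P value could then be obtained from a Student's t distribution with `dof` degrees of freedom, for example with `scipy.stats.t.sf(abs(t_value), dof) * 2` if SciPy is available.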

A t-value map can be produced for each explanatory variable of interest. Each map indicates how strongly vertices on the surface are related to one explanatory variable. Parameter estimates can also be compared to see if one explanatory variable is more strongly related to the data than another. To encode this kind of hypothesis, one PE is subtracted from another using a "contrast" vector such as [1 -1 0 0 ...], a combined standard error is computed, and a new t-map is generated. In a similar fashion, to test for a more complicated collection of effects, a matrix of contrast weights can be specified. A more rigorous description of single and multiple linear regression and GLM, types of analyses, estimation and hypothesis testing is available at http://www.statsoft.com/textbook/stglm.html.
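The two kinds of test described above can be sketched for a single vertex: a difference contrast [1 -1 0] evaluated as a t-value, and a matrix of contrast weights evaluated as an F statistic. The F statistic is one standard way to test several contrasts jointly and is an assumption here, since the text does not name the statistic. The data and function names are hypothetical, and the rows of the contrast matrix are assumed to be linearly independent.

```python
import numpy as np


def glm_fit(X, y):
    """Return PEs, residual variance, and (X^T X)^-1 for one vertex."""
    xtx_inv = np.linalg.inv(X.T @ X)
    pe = xtx_inv @ X.T @ y
    residuals = y - X @ pe
    sigma2 = residuals @ residuals / (X.shape[0] - X.shape[1])
    return pe, sigma2, xtx_inv


def t_contrast(X, y, c):
    """t-value for a single contrast vector c."""
    pe, sigma2, xtx_inv = glm_fit(X, y)
    return (c @ pe) / np.sqrt(sigma2 * (c @ xtx_inv @ c))


def f_contrast(X, y, C):
    """F statistic for a contrast matrix C (one contrast per row)."""
    pe, sigma2, xtx_inv = glm_fit(X, y)
    effect = C @ pe
    cov = C @ xtx_inv @ C.T
    return effect @ np.linalg.solve(cov, effect) / (C.shape[0] * sigma2)


# Hypothetical data: two EVs on comparable scales plus a constant.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
X = np.column_stack([x1, x2, np.ones(8)])
y = np.array([2.1, 2.0, 3.9, 3.2, 5.8, 5.1, 8.2, 6.9])

c_diff = np.array([1.0, -1.0, 0.0])   # is a1 different from a2?
t_diff = t_contrast(X, y, c_diff)

C_both = np.array([[1.0, 0.0, 0.0],   # is either a1 or a2 nonzero?
                   [0.0, 1.0, 0.0]])
f_both = f_contrast(X, y, C_both)
```

For a contrast matrix with a single row, the F statistic equals the square of the corresponding t-value, which gives a quick consistency check on the two functions.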