- This Data Set
- General Linear Model (GLM) DODS Setup
If You're at an Organized Course
If you are taking one of the formally organized courses, everything has been set up for you on the provided laptop. The only thing you will need to do is run the following commands in every new terminal window (aka shell) you open throughout this tutorial. Copy and paste the commands below to get started:
setenv SUBJECTS_DIR $TUTORIAL_DATA/buckner_data/tutorial_subjs/group_analysis_tutorial cd $SUBJECTS_DIR/glm
To copy: Highlight the command in the box above, right click and select copy (or use keyboard shortcut Ctrl+c), then use the middle button of your mouse to click inside the terminal window (this will paste the command). Press enter to run the command.
These two commands set the SUBJECTS_DIR variable to the directory where the data is stored and then navigates into this directory. You can now skip ahead to the tutorial (below the gray line).
If You're not at an Organized Course
If you are NOT taking one of the formally organized courses, then to follow this exercise exactly be sure you've downloaded the tutorial data set before you begin. If you choose not to download the data set you can follow these instructions on your own data, but you will have to substitute your own specific paths and subject names. These are the commands that you need to run before getting started:
tcsh source your_freesurfer_dir/SetUpFreeSurfer.csh setenv SUBJECTS_DIR $TUTORIAL_DATA/buckner_data/tutorial_subjs/group_analysis_tutorial cd $SUBJECTS_DIR/glm
Notice the command to open tcsh. If you are already running the tcsh command shell, then the 'tcsh' command is not necessary. If you are not using the tutorial data you should set your SUBJECTS_DIR to the directory in which the recon(s) of the subject(s) you will use for this tutorial are located.
This tutorial is designed to introduce you to the "command-line" group analysis stream in FreeSurfer (as opposed to QDEC which is GUI-driven), including correction for multiple comparisons. While this tutorial shows you how to perform a surface-based thickness study, it is important to realize that most of the concepts learned here apply to any group analysis in FreeSurfer, surface or volume, thickness or fMRI.
This Data Set
The data used for this tutorial is 40 subjects from Randy Buckner's lab. It consists of males and females ages 18 to 93. You can see the demographics here. You will perform an analysis looking for the effect of age on cortical thickness accounting for the effects of gender in the analysis. This is the same data set used in the QDEC Tutorial, and you will get the same result.
General Linear Model (GLM) DODS Setup
Design Matrix/FSGD File
For this example, we will model the thickness as a straight line. A line has two parameters: intercept (or offset) and a slope.
- The slope is the change of thickness with age
- The intercept/offset is interpreted as the thickness at age=0.
Parameter estimates are also called "regression coefficients".
To account for effects of gender, we will model each sex with its own line, meaning that there will be four linear parameters (also called "betas"):
- Intercept for Females
- Intercept for Males
- Slope for Females
- Slope for Males
In FreeSurfer, you can either create your own design matrices, or, if you can specify your design in terms of a FreeSurfer Group Descriptor File (FSGD), FreeSurfer will create them for you. The FSGD file is a simple text file. See this page for the format. The demographics page also has an example FSGD file for this data.
Things to do:
- Create an FSGD file for the above design. One (gender_age.fsgd) already exists so that you can continue with the exercises.
Information on how to create/view text files can be found here.
A contrast is a vector that embodies the hypothesis we want to test. In this case, we wish to test the change in thickness with age after removing the effects of gender. To do this, create a simple text file with the following numbers:
0 0 0.5 0.5
- There is one value for each parameter (so 4 values)
- The intercept/offset values are 0 (nuisance)
- The slope values are 0.5 so as to average the F and M slopes
- A file called lh-Avg-thickness-age-Cor.mtx already exists
Assemble the Data (mris_preproc)
Assembling the data simply means:
- Resampling each subject's data into a common space, and
- Concatenating all the subjects' into a single file
- Spatial smoothing (can be done between 1 and 2)
This can be done in two equivalent ways:
Previously Cached (qcached) Data
When you have run recon-all with the -qcache option, recon-all will resample data onto the average subject (fsaverage) and smooth it at various FWHM (full-width/half-max), usually 0, 5, 10, 10, 20, and 25mm. This can speed later processing. The data for this tutorial have been cached, so run:
mris_preproc --fsgd gender_age.fsgd \ --cache-in thickness.fwhm10.fsaverage \ --target fsaverage --hemi lh \ --out lh.gender_age.thickness.10.mgh
- Only takes a few seconds because the data have been cached
- The FSGD file lists all the subjects, helps keep order.
- The independent variable is the thickness smoothed to 10mm FWHM.
- The data are for the left hemisphere
- The output is lh.gender_age.thickness.10.mgh (which already exists)
Things to do:
- Run mri_info on lh.gender_age.thickness.10.mgh to find its dimensions,ie,
In the case that you have not cached the data, you can perform the same operations manually using the two commands below. There is no need to do this for this tutorial. OPTIONAL: THIS WILL TAKE ABOUT 10 MINUTES
mris_preproc --fsgd gender_age.fsgd \ --target fsaverage --hemi lh \ --meas thickness \ --out lh.gender_age.thickness.00.mgh
- This resamples each subjects data to fsaverage
- Output is lh.gender_age.thickness.00.mgh, which is unsmoothed
OPTIONAL: THIS WILL TAKE ABOUT 5 MINUTES
mri_surf2surf --hemi lh \ --s fsaverage \ --sval lh.gender_age.thickness.00.mgh \ --fwhm 10 \ --cortex \ --tval lh.gender_age.thickness.10B.mgh
- This smooths by 10mm FWHM
- "--cortex" means only smooth areas in cortex (exclude medial wall). This is automatically done with qcache. You can also specify other labels.
- Output is lh.gender_age.thickness.10B.mgh.
GLM Analysis (mri_glmfit)
mri_glmfit \ --y lh.gender_age.thickness.10.mgh \ --fsgd gender_age.fsgd dods\ --C lh-Avg-thickness-age-Cor.mtx \ --surf fsaverage lh \ --cortex \ --glmdir lh.gender_age.glmdir
- Input is lh.gender_age.thickness.10.mgh
- Same FSGD used as with mris_preproc. Maintains subject order!
- DODS is specified (it is the default)
- Only one contrast is used (lh-Avg-thickness-age-Cor.mtx) but you can specify multiple contrasts.
- "--cortex" specifies that the analysis only be done in cortex (ie, medial wall is zeroed out). Other labels can be used.
- The output directory is lh.gender_age.glmdir
- Should only take about 1min to run.
Things to do:
When this command is finished you will have an lh.gender_age.glmdir. There will be a number of output files in this directory, as well as two other directories. If you did an ls in your glmdir there will be:
y.fsgd -- copy of input FSGD file (text) Xg.dat -- design matrix (text) mask.mgh -- binary mask (surface overlay) beta.mgh -- all parameter estimates (surface overlay) rstd.mgh -- residual standard deviation (surface overlay) sar1.mgh -- residual spatial AR1 (surface overlay) fwhm.dat -- average FWHM of residual (text) dof.dat -- degrees of freedom (text) lh-Avg-thickness-age-Cor -- contrast subdirectory mri_glmfit.log -- log file (text, send this with bug reports)
- What was the DOF for this experiment?
- What was the FWHM?
- How many frames does beta have? Why?
There will be a subdirectory for each contrast that you specify. The name of the directory will be that of the contrast matrix file (without the .mtx extension). The lh-Avg-thickness-age-Cor directory will have the following files:
C.dat -- original contrast matrix (text) gamma.mgh -- contrast effect size (surface overlay) F.mgh -- F ratio of contrast (surface overlay) sig.mgh -- significance, -log10(pvalue), uncorrected (surface overlay)
View the uncorrected significance map with tksurfer:
tksurfer fsaverage lh inflated \ -annot aparc.annot -fthresh 2 \ -overlay lh.gender_age.glmdir/lh-Avg-thickness-age-Cor/sig.mgh
You should see:
The threshold is set to 2, meaning vertices with p<.01, uncorrected, will have color.
- Blue means a negative correlation (thickness decreases with age), red is positive correlation.
- Click on a point in the Precentral Gyrus. What is it's value? What does it mean?
Viewing the medial surface, change the overlay threshold to something very, very low (say, .01):
View->Configure->Overlay, set "Min" .01
You should see:
- Almost all of cortex now has color
- The non-cortical areas are still blank (0) because they were excluded with --cortex in mri_glmfit above.
Clusterwise Correction for Multiple Comparisons
Note: The method used is based on: Smoothing and cluster thresholding for cortical surface-based group analysis of fMRI data. Hagler DJ Jr, Saygin AP, Sereno MI. NeuroImage (2006).
To perform a cluster-wise correction for multiple comparisons, we will run a simulation. The simulation is a way to get a measure of the distribution of the maximum cluster size under the null hypothesis. This is done by iterating over the following steps:
- Synthesize a z map
- Smooth z map (using residual FWHM)
- Threshold z map (level and sign)
- Find clusters in thresholded map
- Record area of maximum cluster
Repeat over desired number of iterations (usually > 5000)
In FreeSurfer, this information is stored in a simple text file called a CSD (Cluster Simulation Data).
Once we have the distribution of the maximum cluster size, we correct for multiple comparisons by:
- Going back to the original data
- Thresholding using same level and sign
- Finding clusters in thresholded map
- For each cluster, p = probability of seeing a maximum cluster that size or larger during simulation.
Run the simulation
All these steps are performed with the mri_glmfit-sim:
mri_glmfit-sim \ --glmdir lh.gender_age.glmdir \ --sim mc-z 5 4 mc-z.negative \ --sim-sign neg --cwpvalthresh 0.4\ --overwrite
- Specify the same GLM directory (--glmdir)
- The simulation type is Z Monte Carlo (mc-z)
- Specify the sign (neg for negative, pos, or abs)
Vertex-wise threshold of 4 (p < .0001)
Only 5 iterations (usually > 5000)
- CSD files will have mc-z.negative base
"--cwpvalthresh 0.4" Keep clusters that have cluster-wise p-values < 0.4 (usually you will want something like .05). To see all clusters, set to .999
- --overwrite deletes previous CSD files
This has been run this data with 1000 iterations (CSD base of mc-z.neg). This can often take many hours.
View CSD File
The CSD files are stored in lh.gender_age.glmdir/csd. Look at one of them:
Or click here.
To end the less command, type 'q' for quit.
- Rows that begin with '#' are headers or comments.
- Each data row is an iteration.
- The 3rd column is the maximum cluster size for that iteration.
- The CSD file itself is not particularly important, so don't worry too much about it.
View the Corrected Results
In the contrast subdirectory, you will see several new files, all beginning with mc-z:
mc-z.neg4.sig.cluster.summary - summary of clusters (text) mc-z.neg4.sig.cluster.mgh - cluster-wise corrected map (overlay) mc-z.neg4.sig.ocn.annot - annotation of clusters
First, look at the cluster summary (or click here):
- This is a list of all the clusters that were found (38 of them)
- The CWP column is the cluster-wise probability (the number you are interested in). It is a simple p (ie, NOT -log10(p)).
- Note that ALL clusters are found, regardless of their significance.
- Cluster number 1 has a CWP of p=.001.
- Cluster number 21 has a CWP of p=.077.
Load the cluster annotation in tksurfer
tksurfer fsaverage lh inflated \ -annot lh.gender_age.glmdir/lh-Avg-thickness-age-Cor/mc-z.neg4.sig.ocn.annot \ -fthresh 2 -curv -gray
You should see:
- These are all clusters, regardless of significance.
- When you click on a cluster, the label will tell you the cluster number (eg, cluster-021).
Change the label mode to "Outline" by hitting the outline button . Now load the cluster p-value overlay with
File->LoadOverlay Click "Browse..." Next to where it says "Selection", paste (middle click with a 3-button mouse) the following: lh.gender_age.glmdir/lh-Avg-thickness-age-Cor/mc-z.neg4.sig.cluster.mgh Hit OK Hit OK
You should see:
Things to do:
- Find and click on cluster 1. It is blue and has a value of -3 since this is log10(.001). The -3 is because the correlation is negative.
- Find and click on cluster 21. Its value is -1.11 because this is log10(.077). Note that it is transparent because its significance
is worse than the threshold we set (-fthresh 2, p < .01).
- All vertices within a cluster are the same value (the p-value of the cluster).
- You can changed the cluster-wise threshold by clicking on
View->Configure->Overlay, and setting the "Min" to your desired level. As you do this, some clusters will change transparency.