BACKGROUND THEORY
A standard nonparametric test is exact, in that the false positive rate is exactly equal to the specified α level. Using randomise with a GLM that corresponds to one of the following simple statistical models will result in exact inference:
- One sample t-test on difference measures
- Two sample t-test
- One-way ANOVA
- Simple correlation

Use of almost any other GLM will result in approximately exact inference. In particular, when the model includes both the effect tested (e.g., difference in FA between two groups) and nuisance variables (e.g., age), exact tests are not generally available. Permutation tests rely on an assumption of exchangeability; with the models above, the null hypothesis implies complete exchangeability of the observations. When there are nuisance effects, however, the null hypothesis no longer assures the exchangeability of the data (e.g., even when the null hypothesis of no FA difference is true, age effects imply that you can't permute the data without altering the structure of the data).
Permutation tests for the General Linear Model
For an arbitrary GLM, randomise uses the method of Freedman & Lane (1983). Based on the contrast (or set of contrasts defining an F test), the design matrix is automatically partitioned into tested effects and nuisance (confound) effects. The data are first fit to the nuisance effects alone and nuisance-only residuals are formed. These residuals are permuted, and then the estimated nuisance signal is added back on, creating an (approximate) realization of data under the null hypothesis. This realization is fit to the full model and the desired test statistic is computed as usual. This process is repeated to build a distribution of test statistics under the null hypothesis specified by the contrast(s). For the simple models above, this method is equivalent to the standard exact tests; otherwise, it accounts for nuisance variation present under the null. Note that randomise v2.0 and earlier used a method due to Kennedy (1995). While both the Freedman-Lane and Kennedy methods are accurate for large n, for small n the Kennedy method can tend to falsely inflate significance. For a review of these issues and even more possible methods, see Anderson & Robinson (2001).
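The steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not randomise's implementation: the function name `freedman_lane_pvalue`, its arguments, the restriction to a single tested regressor, and the two-sided comparison of t statistics are all assumptions made for the example.

```python
import numpy as np

def freedman_lane_pvalue(y, x, Z, n_perm=1000, seed=0):
    """Illustrative Freedman-Lane permutation test for one tested regressor.

    y : (n,) data vector
    x : (n,) tested regressor (effect of interest)
    Z : (n, q) nuisance regressors (include a column of ones for the mean)
    Returns the observed t statistic and a two-sided permutation P-value.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    M = np.column_stack([np.asarray(x, dtype=float), Z])  # full model
    dof = len(y) - np.linalg.matrix_rank(M)

    def t_stat(data):
        # Fit the full model and return the t statistic of the tested effect.
        beta, *_ = np.linalg.lstsq(M, data, rcond=None)
        resid = data - M @ beta
        sigma2 = resid @ resid / dof
        cov = sigma2 * np.linalg.pinv(M.T @ M)
        return beta[0] / np.sqrt(cov[0, 0])

    # Fit the nuisance-only model and form nuisance-only residuals.
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    nuisance_fit = Z @ gamma
    nuisance_resid = y - nuisance_fit

    t_obs = t_stat(y)
    count = 1  # the unpermuted data count as one permutation
    for _ in range(n_perm - 1):
        # Permute the residuals, add the nuisance signal back, refit.
        y_star = nuisance_fit + rng.permutation(nuisance_resid)
        if abs(t_stat(y_star)) >= abs(t_obs):
            count += 1
    return t_obs, count / n_perm
```

For a group comparison with age as a confound, `x` would be the group indicator and `Z` would hold a column of ones plus the age values.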
This approximate permutation test is asymptotically exact, meaning that the results become more accurate with an ever-growing sample size (for a fixed number of regressors). For large sample sizes, with 50-100 or more degrees of freedom, the P-values should be highly accurate. When the sample size is low and there are many nuisance regressors, accuracy could be a problem. (The accuracy is easily assessed by generating random noise data and fitting it to your design; the uncorrected P-values should be uniformly spread between zero and one; the test will be invalid if there is an excess of small P-values and conservative if there is a deficit of small P-values.)
Monte Carlo Permutation Tests
A proper "exact" test arises from evaluating every possible permutation. Often this is not feasible, e.g., a simple correlation with 12 scans has nearly half a billion possible permutations (12! = 479,001,600). Instead, a random sample of possible permutations can be used, creating a Monte Carlo permutation test. On average the Monte Carlo test is exact and will give similar results to carrying out all possible permutations.
If the number of possible permutations is large, one can show that a true, exhaustive P-value of p will produce Monte Carlo P-values between p ± 2√(p(1-p)/n) about 95% of the time, where n is the number of Monte Carlo permutations. The table below shows confidence limits for p=0.05 for various n. At least 5,000 permutations are required to reduce the uncertainty appreciably, though 10,000 permutations are required to reduce the margin-of-error to below 10% of the nominal alpha.
| n | Confidence limits |
|---|---|
| 100 | 0.0500 ± 0.0436 |
| 1,000 | 0.0500 ± 0.0138 |
| 5,000 | 0.0500 ± 0.0062 |
| 10,000 | 0.0500 ± 0.0044 |
| 50,000 | 0.0500 ± 0.0019 |
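The confidence limits follow directly from the formula 2√(p(1-p)/n) and can be reproduced with the short sketch below. The function name `monte_carlo_margin` is made up for this illustration.

```python
import math

def monte_carlo_margin(p, n):
    """Approximate 95% margin of error of a Monte Carlo P-value.

    p : true (exhaustive) P-value
    n : number of Monte Carlo permutations
    """
    return 2.0 * math.sqrt(p * (1.0 - p) / n)

if __name__ == "__main__":
    for n in (100, 1000, 5000, 10000, 50000):
        print(f"{n:>6,d}  0.0500 ± {monte_carlo_margin(0.05, n):.4f}")
```

The same function can be used to check the margin at other P-values of interest, e.g. `monte_carlo_margin(0.01, 5000)`.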
In randomise, the number of permutations to use is specified with the -n option. If this number is greater than or equal to the number of possible permutations, an exhaustive test is run. If it is less than the number of possible permutations, a Monte Carlo permutation test is performed. The default is 5000, though if time permits, 10000 is recommended.
Counting Permutations
Exchangeability under the null hypothesis justifies the permutation of the data. For n scans, there are n! (n factorial, n×(n-1)×(n-2)×...×2) possible ways of shuffling the data. For some designs, though, many of these shuffles are redundant. For example, in a two-sample t-test, permuting two scans within a group will not change the value of the test statistic. The number of possible permutations for different designs is given below.
| Model | Sample Size(s) | Number of Permutations |
|---|---|---|
| One sample t-test on difference measures | n | 2^n |
| Two sample t-test | n1, n2 | (n1+n2)! / ( n1! × n2! ) |
| One-way ANOVA | n1, ..., nk | (n1+n2+ ... + nk)! / ( n1! × n2! × ... × nk! ) |
| Simple correlation | n | n! |
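To get a feel for how quickly these counts grow, the formulas in the table can be evaluated with Python's standard library. The function names below are invented for this sketch; the one-sample count assumes sign flipping, giving two choices per scan.

```python
import math

def count_one_sample(n):
    """Sign flips for a one-sample t-test: two choices per scan."""
    return 2 ** n

def count_two_sample(n1, n2):
    """Distinct relabellings for a two-sample t-test."""
    return math.comb(n1 + n2, n1)

def count_one_way_anova(*sizes):
    """Multinomial coefficient for group sizes n1, ..., nk."""
    total = math.factorial(sum(sizes))
    for size in sizes:
        total //= math.factorial(size)
    return total

def count_correlation(n):
    """All n! orderings for a simple correlation."""
    return math.factorial(n)

if __name__ == "__main__":
    print(count_one_sample(10))           # 1024
    print(count_two_sample(10, 10))       # 184756
    print(count_one_way_anova(2, 2, 2))   # 90
    print(count_correlation(12))          # 479001600
```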
Note that the one-sample t-test is an exception. Data are not permuted, but rather their signs are randomly flipped. For all designs except a one-sample t-test, randomise uses a generic algorithm which counts the number of unique possible permutations for each contrast. If X is the design matrix and c is the contrast of interest, then Xc is the sub-design matrix of the effect of interest. The number of unique rows in Xc is counted and a one-way ANOVA calculation is used, treating each set of identical rows as one group.
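The counting rule just described can be illustrated as follows. This is a sketch of the idea only, not randomise's actual code: the function name `count_unique_permutations`, the list-based inputs, and the exact (untolerant) comparison of row values are assumptions of the example.

```python
import math
from collections import Counter

def count_unique_permutations(X, contrasts):
    """Count unique permutations from the rows of Xc.

    X         : design matrix as a list of rows
    contrasts : list of contrast vectors (one for a t test, several for an F test)
    """
    # Each row of Xc is the design row multiplied by every contrast.
    rows = [
        tuple(sum(x * c for x, c in zip(row, con)) for con in contrasts)
        for row in X
    ]
    # Identical rows form a group; apply the one-way ANOVA formula
    # n! / (n1! * n2! * ... * nk!) to the group sizes.
    total = math.factorial(len(rows))
    for size in Counter(rows).values():
        total //= math.factorial(size)
    return total

if __name__ == "__main__":
    # Two groups of three scans, contrast [1, -1]: 6! / (3! * 3!) permutations.
    two_group = [[1, 0]] * 3 + [[0, 1]] * 3
    print(count_unique_permutations(two_group, [[1, -1]]))  # 20
```

With a continuous covariate whose values are all distinct, every row of Xc is unique and the count reduces to n!, matching the simple correlation entry in the table.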