Orginally hosted on http://www.math.mcgill.ca/keith

Moved here to ensure continuity.

Download it here!

SurfStat is a Matlab toolbox for the statistical analysis of univariate and multivariate surface and volumetric data using linear mixed effects models and random field theory. It is inspired by Jason Lerch's Thickness Statistics written in R, and Jonathan Taylor's BrainStat, part of NIPY, written in Python. It is intended for cortical thickness data on triangular meshes, either for the whole cortex or one for each hemisphere. It will handle any triangulated surface data, written in FreeSurfer or MNI object format. The only requirement is that the triangulation scheme must be the same for all surfaces, i.e. the data must be registered to a common surface.

Worsley, K.J., Taylor, J.E., Carbonell, F., Chung, M.K., Duerden, E., Bernhardt, B., Lyttelton, O., Boucher, M., Evans, A.C. (2009). SurfStat: A Matlab toolbox for the statistical analysis of univariate and multivariate surface and volumetric data using linear mixed effects models and random field theory. *NeuroImage*, OHBM poster, accepted.

- SurfStat now does statistical analysis for volumetric data, e.g. VBM, DBM and PET data.
- New viewers added for volumetric data.
- Memory mapping has been added to cope with very large data sets. There is now essentially no limit on the size of the data, but you will need some free disk space, equal to twice the size of the data, for scratch space.
- Its main engine fits fixed effects and mixed effects, univariate and multivariate, linear models
and makes inference using T, F, Hotelling's T
^{2}and Roy's maximum root statistics. - Random field theory (RFT) for peaks and clusters, as well as False Discovery Rate (FDR).
- Model formula rather than a design matrix for specifying the linear model.
- Mixed effects models fitted by ReML.
- Off-the-shelf Matlab graphics that are ready to publish.
- FAST! everything loaded into memory.
- Truly interactive analysis, no need for batch.
- A mere 200K of code ...

Thanks to Oliver Lyttelton for suggesting the multivariate statistics, Moo Chung and Cate Hartley for help with FreeSurfer, Boris Bernhardt for helping with the mixed effects, and Emma Duerden for constant feedback that has really improved the package.

`C:\Program Files\MATLAB\R2006a\toolbox\local/surfstat/`

.
`C:\Program Files\MATLAB\R2006a\toolbox\local/surfstat/`

`doc/SurfStat`

directory.
You can get complete on-line documentation
about any Matlab function, including the ones in SurfStat,
by typing `help`

followed by the
name of the function, e.g.

First let's read in one subject whose id is 100:

and perhaps save this as a .jpg file for publication or PowerPoint:

The point is marked by the little black and white square in the top centre figure. Its id is 66180, so you can type

For further customizing, see the code of the `SurfStatViewData.m`

function.

You may just want to work with one hemisphere:

Finally a FreeSurfer example, courtesy Moo Chung:

You can check that the FreeSurfer surface has no topological deformities by calculating its Euler characteristic (EC):

`SurfStatListDir.m`

to list the contents of a directory:
You might want to also exclude the brain stem. To do this, I clicked on where I thought the centre of the brain stem was: (0; -16; -8)mm, and I clicked a bit further out to guess the radius: 20mm. Then you can add a ball-shaped ROI to the mask as follows (see ROI):

Since I don't really know where the brain stem is, I shall just stick to the first mask in what follows.

`glim_image`

, i.e. a text file with fields for the thickness
file names, and the covariates:
`textread`

:
To read in the actual thickness data from the above files:

Let's take a look at the mean thickness for vertices:

We can save this for later:

As expected, thickness decreases with age. The line looks a bit faint, and the markers are too small. You can change the line width and marker size as follows:

or you can edit the figure by clicking on the "Edit Plot" button (the one with the little arrow) in the Figure toolbar, then double-clicking on the feature you want to edit.

For gender:

Males appear to have overall thicker cortex than females.

FreeSurfer group registered data is often stored in two `.mgh`

files. In the following example, data on 18 subjects is in two files. To read these in:

`Y`

is an 18 × 327684 matrix of cortical thickness data on the 18 subjects. To read in one of the surfaces:
It looks as if this is a "whole brain" component,
i.e. the thickness on the entire surface is positively correlated.
An explanation is that some subjects have overall thick cortex, some
have overall thin cortex, perhaps due to differences in contrast at
the grey-white boundary. See `SurfStatNorm.m`

to normalize the
cortical surface data, either by subtracting or dividing the
global mean. However before doing this, let's plot the
subject component against age:

The global effect decreases with age; in other words, it could be just an age-related atrophy effect. This makes sense: since the spatial component is roughly constant across the brain, then the subject component must be the global average, which we have already noted decreases with age.

Now let's look at the second component, which explains a mere 4.1% of the variability:

Plotting the subject component against gender:

We note that the second component might be a gender effect. Females score positively on this component, so they have higher cortical thickness in the red areas, lower cortical thickness in the blue areas, compared to males.

Higher order components seem to be more difficult to interpret.

`term`

. For the
technically minded, a `term`

has two components: a matrix, and a cell of strings
for the names of the columns. The function `term.m`

converts a vector or a
cell array of strings to a `term`

, which is displayed as follows:
`double`

and `char`

convert
a `term`

back to its matrix and name components.
The operators `+`

, `*`

, `^`

and `-`

have been redefined (overloaded) for `term`

s
so that they match the operation of terms in a model formula.
You now write down a model formula (without the coefficients)
e.g. a main effect of Age and Gender:
(The empty space on the right is reserved for the design matrix for the
variance of a mixed effects model.)
Now let's fit the linear model, saving the results in `slm`

:

`slm`

by typing
`slm.X`

is the design matrix,
`slm.df`

is the error degrees of freedom, `slm.coef`

is the coefficients, `slm.SSE`

is
error sum of squares, `slm.tri`

is the list of triangle indices and `slm.resl`

is a function of the resels along each edge.
Normally you never need to look at any of these things!
`help SurfStatT`

for more information on this):
`contrast`

is a numeric 147 × 1 vector. Note how `Gender.male`

gives you the numeric indicator variable for
`male`

, i.e. 1 if the observation is `male`

, and 0 otherwise. Then to get the T statistic:
`slm.c`

. It equals [0 0 -1 1] (to machine accuracy), i.e. column 4 - column 3. `slm.k`

is the number of variates (univariate in this case). Effects are in `slm.ef`

, and their standard deviations (standard
errors) are in `slm.sd`

.
The T statistic is in `slm.t`

, and by multiplying it by `mask`

, we set
all the values outside the mask to zero, which makes it better
to look at:
To find the threshold for P=0.05, corrected, the resels are:

`pval.P`

contains P-values for peaks, and
`pval.C`

contains P-values for clusters.
This special structure is recognised by `SurfStatView`

which draws the figure in a special way:
Maybe there is something hidden; try inflating the average brain:

If you want to inflate it more, try

`peak`

and `clus`

contain a list of peaks and clusters, together with their P-values, as in
FmriStat. They can be displayed nicely by converting them to a `term`

:
Note that the the F statistic on the plot is the square of the maximum T statistic for the contrast, i.e. 33.24 = 5.7658^2. F statistics with one degree of freedom always equal the square of T statistics. Moreover the P-values are also the same, i.e. the P-value of the F is the P-value of the 2-sided T (equal to twice the P-value of the one-sided T).

Finally, Q values or False Discovery Rate:

Recall that:

Let's try to click on the maximum T statistic with the "Data Cursor" button. I can't hit it exactly - the cursor is just to the right of the mask in the middle right figure. To find the id try:

Only clusters are significant, and not peaks. This suggests that the age effect covers large regions, rather than local foci. This agrees with our conclusion above that there is age-related atrophy right across the brain. FDR tells the same story:

Nothing doing here. Nevertheless, let's take a look at what is happening at the seed point. Here's a plot of `Yseed`

against `age`

with two separate lines for each `gender`

:

As before, a negative effect of age (atrophy), but little difference in slope between the two genders.

`slm.t`

, and their degrees of freedom are in `slm.df`

=[3 143].
The P and Q values are calculated using the same functions. The software knows that `slm.t`

is an F statistic, rather than a T statistic, because
an F statistic has two degrees of freedom:

A large part of the brain is correlated with the seed. However I would claim that pure connectivity as above is of little scientific interest. A more interesting question is how the connectivity changes with say gender. To answer this, add an interaction between seed and the rest of the model, then look at the interaction between seed and gender:

There is really no evidence for a gender effect on connectivity. Nevertheless, let's take a look at the most significant peak at `id=3698`

tucked inside the left temproal lobe. The T statistic is `slm.t(3698)=3.7024`

. To see how the connectivity differs between genders, here's a plot of the data `Y3698`

at that point. The data is adjusted for `Age`

and its interaction with `Yseed`

, and plotted against `Yseed`

for each gender:

As you can see, the connectivity is more in females than males, which is why the T statistic was positive. The F statistic is `13.71=3.7024^2`

as expected.

How does the connectivity change with age?

This looks more promising; for confirmation:

This time there is one small significant cluster where connectivity increases with age ... or maybe this is just the 1 time out of 20 when we expect to see a false positive!

Unfortunately there is no easy way to make a plot that show the effect of an interaction between two continuous variables (here `Yseed`

and `age`

).

`[x; y; z]`

coordinates or by a vertex id. For example, one of our users was interested in a region in the top right figure. We clicked on the centre and found the `id`

=53815. To make the ROI with a radius of 10mm:
The region shows up as white in the top right figure, just behind the temporal lobe.

You could also make an ROI from a cluster. Let's take the clusters of the T statistic for age in the model `M = Age + Gender`

. The variable `clusid`

contains the cluster id's for each vertex, obtained by adding a fourth output argument to `SurfStatP`

:

(The Figure is the same as before.)
Suppose we want to make an ROI from the dark blue cluster. First click on any point in the cluster, find its `id`

=27805 (its P-value is 0.016783), then

`SurfStatP`

(see `help SurfStatP`

) you can change the threshold for defining clusters to any arbitrary value; by default it is the uncorrected P=0.001 threshold.
You can find clusters for any (non-statistical) image using `SurfStatPeakClus`

, as follows. Suppose we want to form an ROI from the mean thickness, as here. First make a structure, say `s`

, with two fields `t`

containing the data, and `tri`

containing the triangles (`s`

is arbitrary, but you must use `t`

and `tri`

as fields):

The figure shows the cluster id's (by coincidence there are four clusters). To make an ROI out of the green cluster (`clusid`

=2, read from the colour bar in the figure):

`maskROI1`

and `maskROI2`

.
Using the Matlab "or" operator `|`

, their union is
`&`

.Going back to the linear model with age and gender, we can find the resels of the first ROI:

`YROI`

can be treated exactly like `Y`

itself, but without specifying the surface, e.g.

`.mnc`

), ANALYZE (`.img`

), NIFTI (`.nii`

) or AFNI (`.brik`

) format. All the matlab functions for ANALYZE (`.img`

) and NIFTI (`.nii`

) are included with SurfStat. If you are using AFNI (`.brik`

) formatted files, you will need to install the AFNI Matlab Library and set the matlab path as here.
If you are using MINC (`.mnc`

) formatted files you will need the extra tools in the
emma toolbox also available for
Windows and Linux.
Installing the Windows and Linux versions is a bit tricky.
In the zipped file that you download, there are `.dll`

files and `.exe`

files -
these must be moved to a directory in the operating system path, e.g. the `matlab/bin/win32`

directory.
Also you must create a directory `c:/tmp`

or `/tmp`

for
emma to use as scratch space. Finally you must set the matlab path to the emma directory, as here. To test it out, see if you can read the MINC template file
supplied with SurfStat:

`.dll`

files and `.exe`

files into the path of the operating system. If you don't get any red error messages, it has worked! Now check to see if you can write
a MINC file:
`c:/tmp`

or `/tmp`

directories. Again if there are no red warning messages, you should have written a MINC file called `test.mnc`

in your working directory (it will just contain zeros). Congratulations!
Now we are ready for the real thing. First read in your volumetric data from `c:/keith/surfstat/myfile.mnc`

:

`s`

to a volume `vol2`

, then write it to `c:/keith/surfstat/myfile2.mnc`

:
`c:/keith/surfstat/myfile.mnc`

and `c:/keith/surfstat/myfile2.mnc`

side by side. If you are using MINC files, then `register`

is convenient (go here for a Windows version). You can call it up from within Matlab as follows:
`c:/keith/surfstat/myfile2.mnc`

has chunks of data missing in places where there are no cortical surfaces. Also you will notice that `c:/keith/surfstat/myfile2.mnc`

is a lot smoother looking than the original `c:/keith/surfstat/myfile.mnc`

because the surfaces that it interpolates onto are all different. The different surfaces have the effect of blurring the data.

`subj`

to a cell array of strings, then to a term. What can we do with `Subj`

? There are three choices; the first one is to ignore `Subj`

, as above:
Suppose in fact there are correlations between observations on the same subject. Then coefficients of fixed effects (`Age`

and `Gender`

) are still unbiased but less precise. But most importantly, their estimated standard deviations are biased (usually too small), so that the resulting T statistics are wrong (usually too big). One possibiility is to allow for `Subj`

as a fixed effect:

(The image is a horrible mess because the 35 `Subj`

indicators dominate.) This model also gives unbiased estimates of fixed effects (`Age`

and `Gender`

), but again they are less precise. The reason is that by allowing for a fixed effect of `Subj`

, all our inference comes from differences *within* a subject. We are throwing away information from age or gender differences *between* subjects. In particular, we are throwing away any subject with just one observation, i.e. the 5 subjects (#28, 31, 33, 34, 35). We are only using age information within a subject. It is as if all subjects had the same age (in years).

Age effects are only inferred from the monthly effect within each subject, and information about age effects between subjects is lost. Nevertheless, inference about the age effects that remain are valid even if the observations within a subject are equally correlated. In other words, the estimated standard deviation of the age effects are unbiased, and T statistics have the correct null distribution. This analysis is better than `M1`

, at least for age.

However it is a disaster for gender: it is now impossible to estimate gender, because gender only varies between subjects, and we have just removed a subject effect!.

The solution is to try a mixed effects model:

Here we have made `Subj`

into a random effect, meaning that its coefficients are independent Gaussian random variables. This induces equal correlations between observations on the same subject. On the right you will see the model for the variance of the observations. We have forgotten to add the identity matrix `I`

to allow for independent "white" noise in every observation (this is added by default to any fixed effect model, but it must be specifically added to a mixed effects model):

The variance is now a linear model with two "variables", a matrix with 1's in each subject, plus the identity matrix with 1's down the diagonal. This mixed effects model `M3`

lies somewhere between the two fixed effects models `M1`

and `M2`

. In `M3`

, if the variance of `random( Subj )`

is 0, you get `M1`

; if the variance of `random( Subj )`

is infinite, you get `M2`

. Model `M3`

estimates a variance in between.
You might be curious to know that if the design is balanced then inference about the age effect is the same in `M2`

as in `M3`

. A balanced design has the same number of observations at the same time instants (e.g. at ages 21.0, 21.5, 21.75 years) on each subject.

The rest of the analysis proceeds as before. `SurfStatLinMod`

fits model `M3`

by first fitting `M1`

. It then uses the residuals to estimate the variance model by ReML. The fixed effects and the overall variance are re-estimated, again by ReML. Thus all parameter estimates are approximately unbiased. If this procedure were iterated, it would converge to the ReML estimates, but this takes much more time, and the resulting estimates are not much different. If you are worried about this, you should use Jason Lerch's
Thickness Statistics which calls R's `nlme`

, but it is much more time consuming (up to 1 day per fit!). `nlme`

will also fit more elaborate models such as an AR(p) time-series model for the correlation structure, which is not implemented in SurfStat.

Here we go:

`slm.r`

contains the estimated correlation within subjects. It is clamped to a minimum of a small positive number to avoid a non-positive definite variance matrix. Most of the image is in fact clamped, indicating no correlations within "subjects". This is what we expect, since the "subjects" are fake. The results for a negative age effect (atrophy) are:

The first thing to note is that the degrees of freedom is not equal to #observations - #variables = 147 - 3 = 144 as for `M1`

. It is not even an integer. It is a non-integer *effective* degrees of freedom that varies spatially (since the variance model varies spatially), and is available in `slm.dfs`

. Its average inside the mask is 54.9, and this is used for the P-value calculations (not shown). For a gender effect:

Note that you get different degrees of freedom for different contrasts, unlike for `M1`

. Now the degrees of freedom is less. Why? Essentially because gender is estimated between subjects, and there are only 35 subjects, so the degrees of freedom has to be less than 35.

More elaborate models are possible. It is common practice to assume that the interaction between a fixed effect and a random effect is also random. In the case of age, this allows for a random age slope for different subjects. The test for the age main effect still makes sense: it is testing to see if the mean of these random slopes is different from 0. To do this:

It turns out that this model is not appropriate because it is not invariant to a rescaling of age. If for instance we measured age from a baseline of (say) 18 years, then the fitted model would be different. To overcome this, we need to allow for a covariance between the random age-subject interaction and the random subject main effect, which can be written in two identical ways:

Now there are two extra components that allow for the covariance. As a result, fitting takes a little longer:

The df has gone down from 54.9 to 12.7. The reason is that if we assume that there is a different age slope for each subject, then the age sd is now estimated between the 30 subjects with at least two observations, so the age sd has to have less than 30 degrees of freedom. The 30 is reduced even more, since more weight is put on the few subjects with larger numbers of observations. In general, you lose degrees of freedom as the variance model becomes more elaborate, which means you lose sensitivity to detect effects.

So my advice is to stick to simple models, such as `M3`

, unless there is strong evidence to the contrary. You might be curious to know that
FmriStat (and all the other common packages such as SPM and FSL) do in fact allow for an interaction between fixed effects and random effects. This is implemented by finding one slope per subject, throwing the rest of the data away, then combining the slopes in another linear model. The reason why it is appropriate for fMRI data is that the number of repeated measures in an fMRI time series is huge (100-200) compared to the number of repeated measures here (1-10). This means that sd's can be estimated very accurately within subjects, there is no need to pool sd information between subjects, and so no need for a complex mixed effects model.

Finally, it is possible to do Welch's T test, which tests for a difference in group means allowing for a difference in variance. For example, to test for a difference in gender, allowing for different variances for males and females, but ignoring age and subject,

Actually the model is over-parameterized. To reduce the model by eliminating redundant variables both for mean and variance:

Now it is clearer what is going on: there is a separate mean for males and females, and a separate variance for males and females. The effective degrees of freedom is 137.3, down slightly from the 147 - 2 = 145 without allowing for separate variances, which is the price you pay for allowing for separate variances.

The most general model would be to interact all fixed effects with all random effects:

Apart from taking a long time to fit, such a model leaves too little degrees of freedom to make any useful inference.

So far only T tests for univariate mixed effects models are implemented - F tests and multivariate mixed effects models are yet to come.

A final note of caution: the methods used to fit mixed effects models are only approximate. Convergence is not guaranteed. The way in which the model is forced to have a positive definite variance matrix restricts the types of correlation structures. The method is not completely invariant to a linear re-parameterization, even if fixed effects terms are added before crossing with a random effect as recommended above. However the algorithm will give *exact* results for a single random effect if the design is balanced. It will also give *exact* results for Welch's test even if the grouping factor has more than one group. In general it will give good results whenever there is just one additive random effect, such as `random(Subj)`

, so my suggestion is to stick to that.

`SurfStat`

functions handle memory mapping without you knowing. However there might be occasions when you need to know where your data is really stored, in which case, read on.
When the amount of data is big, it can exceed the available memory. In this case, the three functions that read data (`SurfStatReadSurf`

, `SurfStatReadData`

, `SurfStatReadVol`

) all resort to memory mapping instead of reading the data into memory. Memory mapping is invoked if the total storage exceeds 64Mb of memory. You can change this memory limit to e.g. 128MB by issuing the following command:

`help`

for the reader.
Memory mapping works as follows. The data is first written to a temporary file as 4 byte reals. The temporary file is in a directory `SurfStat`

in the same directory as the first file to be read in. This directory will be created if it doesn't exist already. If it can't be created because you don't have permission, then an error message will be produced prompting you to supply a directory where you do have permission to write. Note that neither the temporary file, nor the temporary directory, are ever deleted. To avoid wasted disk space, you should delete these files after you have quit Matlab.

Then the output parameter of the reader, say `Y`

, is a special structure that contains information on the name of the file where the data is stored, the format, and the dimensions of the data array (see below for an example). You can access the data by:

`SurfStat`

functions will handle memory mapped data where appropriate.As an example of memory mapping, we shall analyze the trivariate coordinates of the surfaces themselves. We shall see if coordinates depend on age allowing for a difference in coordinates between males and females. First we must read in all the surface data (this takes a few minutes):

`SurfStatReadSurf`

, only `surfs.coord`

is memory mapped, and `surfs.tri`

is read into memory in the usual way. To see this, type:
`Filename`

is the memory mapped file. It is mapped as a 81924 × 3 × 147 single precision array. To access all the coordinates of vertex 1000 on subjects 20 and 25:

As an example, let's look for an age effect on the smoothed surface coordinates, allowing for gender.

It's interesting to compare this with the T statistic for an age effect on cortical thickness. However before getting excited, we should find out the corrected threshold:

Unfortunately the P-values for peaks and clusters are not significant:

`P>1`

for the last two peaks? The reason is that `P`

approximates the expected number of peaks above `t`

, which only approximates the P-value if `P`

is small, say less than 0.1.
Both the P-value and Q-value images are blank, so we don't show them, but here's the code:
Here's an even better way of showing the movement of the surface with age. We can use Matlab's `quiver3`

to show the age effects in `slm.ef`

as little arrows. First let's pick three local maxima in the table above as origins for the arrows, then find their coordinates (the rows are `x`

, `y`

, `z`

, the columns are the vertices):

`h`

by putting `h`

as the output of the `SurfStatView`

function above. Then we loop over axes, adding red arrows in the positive direction and blue arrows in the negative direction, so we can always see the arrows:
(Note that the arrows are not to scale.) The red arrows are pointing up and away from the brain centre, suggesting a movement of anatomy in that direction with increasing age. However we shouldn't read too much into this, because the P-values are not significant.

Finally, let's test for a quadratic age effect, allowing for gender, using the maximum F statistic (Roy's maximum root):

The corrected P-values are not significant, but there is one significant cluster:

What is going on at the peak inside this region (`id=14465`

)? Here's plot of the x,y,z coordinates against age for males and females separately:

Looks like the quadratic effect is caused by one male and one female (ages 45 and 42, respectively) with very low z coordinate.

`z=0`

, laid out in two groups of trauma and control. To do this, we first read in just the one slice with `z`

coordinate `0`

from all data sets, which easily fits into memory:
`x`

slices, all `y`

slices, but just the one slice with `z=0`

.
We then use
`SurfStatViews`

(the `s`

is for slice) with a last parameter, `layout`

, which is a matrix which lays
out the slices (in the order in filenames) where you want them in the
array of images (0 leaves out an image). The variable `control`

indicates with a 1 which files are in the control group, in the order in `filenames`

:
Even to the naked eye, it looks like the trauma group has less WM than the control group. To save storage, we now find a mask based on the average white matter density in the control group:

`SurfStatView1`

is a viewer for both volumetric and surface data with only one view (hence the name). By default it will show three orthogonal slices through the middle of the data:
Now we need to choose a suitable threshold to mask the white matter. 0.05, i.e. 5% or more white matter, looks like a good choice. To check this, add a thresholded image at 0.05 by leaving out the `clf`

command (which clears the figure window):

Other properties such as the colour and transparency can be modified, see the help of `SurfStatView1`

, and strung together in pairs of arguments after `volwmav`

.

Now we are ready to read in all the data inside the mask. If there is a lot of data, `SurfStatReadVol`

uses memory mapping rather than loading all the data into memory.

Of course you may prefer to write out the T-statistic so you can load it into your own favourite viewer:

The damage is quite extensive, indicating that the white matter surrounding the ventricles has atrophied. To see how much atrophy, let's look at the estimated effect (control minus trauma) and its standard error:

You may prefer the more traditional view of slices, say every 10mm from -30mm to 70mm:

It looks like the trauma subjects have lost between 20% and 40% WM, ± ~5%, in the regions surrounding the ventricles.

`x`

, `y`

and `z`

deformation components for each subject. We first read in the file names then rearrange them into 17 × 3 and 19 × 3 cell arrays of file names as follows:
`z=0`

of all the data, arranged in rows of `x`

, `y`

and `z`

deformations for the trauma group, followed by `x`

, `y`

and `z`

deformations for the control group:
It's hard to see any difference in deformations between trauma and control, but at least the data look OK. Next we will need a mask. Since the trauma is expected to atrophy the white matter, let's form a mask by thresholding average white matter density. A file for this has already been created:

0.05, i.e. 5% or more white matter, looks like a reasonable threshold:

Now let's read in all the data, which just exceeds 64Mb, so it is memory mapped:

Now for the P-values:

The damage is quite extensive, but what are the actual deformation differences? We can visualize the deformations by adding them as little arrows to the image. To do this, let's first select some positions for the arrows. I've chosen positions on a lattice of 10mm intervals using Matlab's `ndgrid`

:

`slm.ef`

but since the deformations are measured from subject to atlas, i.e. atlas minus subject, we should flip the sign so that we have trauma minus control deformations. Then we can use Matlab's `quiver3`

to add the arrows:
The "0" as the 7th argument of `quiver3`

ensures that the arrows are drawn to scale, that is, they are the real movements in anatomy, in mm. Let's zoom in by hitting the "+" magnifying glass in the figure tool bar:

It looks like the arrows are pointing out from the centre, indicating that the ventricles of the trauma subjects have expanded, or equivalently, that the white matter surrounding the ventricles has atrophied.

This illustrates nicely how DBM can not only detect where the anatomy has changed, but it can also show how the anatomy has moved, i.e. the direction of the changes.

`h1`

, `h2`

, `h3`

and
`h4`

). These are to be compared to the 5th scan of rest (labeled `hb`

).
We can get a list of all the 70 PET data files as follows:
`z=0`

, laid out in a task × subjects matrix. To do this, we first read in just the one slice with `z`

coordinate `0`

from all data sets, which easily fits into memory:
`x`

slices, all `y`

slices, but just the one slice with `z=0`

.
We then use
`SurfStatViews`

with a last parameter, `layout`

, which is a matrix which lays
out the slices (in the order in filenames) where you want them in the
array of images:
It is obvious that there is a bad scan (#25). In the FmriStat analysis, the entire subject subject #5 was dropped from the analysis. Here we shall retain all the data on subject #5, except the bad scan, and do a mixed effects analysis to recover as much information as possible from the data. To exclude the bad scan, #25:

`SurfStatAvVol`

tells the function that we want the minimum over scans (other possible choices are `@max`

for the maximum, or `@plus`

for the average, which is the default). The third argument replaces NaN ("not a number") by 0. `SurfStatView1`

is a viewer for both volumetric and surface data with only one view (hence the name). By default it will show three orthogonal slices through the middle of the data:

Now we need to choose a suitable threshold to mask the interior of the brain. 5000 looks like a good choice. To check this, reset the colour limits as follows:

Now we are ready to read in all the data inside the mask. If there is a lot of data, as in this case,
`SurfStatReadVol`

uses memory mapping rather than loading all the data into
memory:

`SurfStatReadVol1`

, extract the data part, then convert to a logical:
`z=0`

(0 in `layout`

leaves a gap in the array for the deleted scan):
It is immediately obvious that the data need normalizing so that all images have roughly the same average intensity, as follows:

Since the design is almost balanced, the simplest way for generating the levels of the factors for subject and condition is to generate them for a balanced design, then delete the levels for the bad scan:

Fitting the model, the within-subject correlation is quite high, averaging 0.59:

Now we test for the difference between the 3Hz condition and the baseline, picking two slices at `z=[-20, 60]`

for display:

To get a better look, let's threshold the T-statistic at 3 and add it to the figure by leaving off the `clf`

command (which clears the figure window):

We could add the mask to the image as well:

Other properties such as the colour and transparency can be modified, see the help of `SurfStatView1`

, and strung together in pairs of arguments after `vol`

. You may like the more traditional view of slices, say every 10mm:

Of course you may prefer to write out the T-statistic so you can load it into your own favourite viewer:

Note that the mean effective df is slightly higher than the fixed effects df, which would be 69 - 14 - 5 + 1 = 51. This is because the mixed effects model has recovered some of the between-subject variation to increase the amount of information about the contrast. Finally the P-values and Q-values are:

Finally, just for amusement, let's add a surface to the last image:

An alternative is to make up a new linear model with an explicit linear variable for frequency. To do this, we extract the frequencies from the `cond`

variable, and set the baseline to 0:

`Baseline + Hz`

does not force the intercept of the Hz effect to go through the baseline data, since `Hz=0`

might not produce the same response as the baseline. In other words, the intercept of the Hz effect is modeled separately from the baseline:
Now we fit the model in the usual way:

As we can see, the two T-statistics, the first a linear contrast, the second a linear effect of Hz, are almost identical. The main difference is that the df of the first is lower (51) than the second (53). Why? The reason is that the first model has a *separate* effect for each level of Hz, in other words, it allows for an arbitrary non-linear effect of Hz; the second only has a linear effect of Hz, so it has ~2 more df for the error. Which is better? As usual, it's a trade-off; the first will have an sd that is less biased, but less accurate; the second will have an sd that might be biased (if the the true Hz effect is non-linear, e.g. quadratic) but more accurate. Luckily both methods give almost identical results. Here are the P-values for the second method:

The activated local maxima are almost too small to see. Let's plot the data at the maximum T-statistic:

Although the T-statistic is high (6.04), the data doesn't seem to be highly correlated. This is because the Y values are adjusted only for the fixed effects, not the fixed and random (subject) effects. One way to adjust Y for subject is to treat subject as fixed:

The command `ylim`

sets the same y limits, so we can compare the plots.
The slopes are almost identical, but now the data in the second plot are far less variable because the subject effect has been removed, so now the data looks more correlated.

Let's look at a quadratic effect of Hz. Again there are two ways of doing it, either through contrasts or through an explicit quadratic model. Here's the first approach, through a quadratic contrast:

Here's the model for the second approach, though a quadratic model:

Here's the analysis of the quadratic effect:

The analyses are almost identical, apart from the different df, an neither shows any significant quadratic effect.

Finally, let's look at a cubic effect of Hz, done both ways:

`slm.SSE`

's, are the same as well. The answer is that they are exactly the same model! A cubic model in Hz (`M3`

) is identical to a separate mean for each of the four levels of Hz (`M`

). The only difference is that the models are parameterized in different ways. But the fitted models are identical. Why? A cubic can be fitted exactly to any data at four points.