Originally hosted on http://www.math.mcgill.ca/keith. Moved here to ensure continuity. Download it here!
SurfStat is a Matlab toolbox for the statistical analysis of univariate and multivariate surface and volumetric data using linear mixed effects models and random field theory. It is inspired by Jason Lerch's Thickness Statistics written in R, and Jonathan Taylor's BrainStat, part of NIPY, written in Python. It is intended for cortical thickness data on triangular meshes, either for the whole cortex or one for each hemisphere. It will handle any triangulated surface data, written in FreeSurfer or MNI object format. The only requirement is that the triangulation scheme must be the same for all surfaces, i.e. the data must be registered to a common surface.
Worsley, K.J., Taylor, J.E., Carbonell, F., Chung, M.K., Duerden, E., Bernhardt, B., Lyttelton, O., Boucher, M., Evans, A.C. (2009). SurfStat: A Matlab toolbox for the statistical analysis of univariate and multivariate surface and volumetric data using linear mixed effects models and random field theory. NeuroImage, OHBM poster, accepted.
Thanks to Oliver Lyttelton for suggesting the multivariate statistics, Moo Chung and Cate Hartley for help with FreeSurfer, Boris Bernhardt for helping with the mixed effects, and Emma Duerden for constant feedback that has really improved the package.
Unzip the toolbox into e.g. C:\Program Files\MATLAB\R2006a\toolbox\local/surfstat/. Then add this directory to the Matlab path: File --> Set Path --> Add Folder --> C:\Program Files\MATLAB\R2006a\toolbox\local/surfstat/ --> OK --> Save.
The documentation is in the doc/SurfStat directory. You can get complete on-line documentation about any Matlab function, including the ones in SurfStat, by typing help followed by the name of the function.
First let's read in one subject whose id is 100:
and perhaps save this as a .jpg file for publication or PowerPoint:
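For example, the current figure can be saved with Matlab's saveas (the file name here is hypothetical):

```matlab
saveas( gcf, 'thickness.jpg' );   % save the current figure as a .jpg
```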
The point is marked by the little black and white square in the top centre figure. Its id is 66180, so you can type
For further customizing, see the code of the SurfStatViewData.m function.
You may just want to work with one hemisphere:
Finally, a FreeSurfer example, courtesy of Moo Chung:
You can check that the FreeSurfer surface has no topological deformities by calculating its Euler characteristic (EC):
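A minimal sketch of the EC calculation, assuming surf comes from SurfStatReadSurf with fields surf.coord (3 × v vertices) and surf.tri (t × 3 triangles):

```matlab
edg = SurfStatEdg( surf );    % unique edges of the triangulated mesh
v = size( surf.coord, 2 );    % number of vertices
e = size( edg, 1 );           % number of edges
f = size( surf.tri, 1 );      % number of triangular faces
ec = v - e + f                % EC = 2 for each sphere-like closed hemisphere
```

Any topological defect (a handle or a hole) changes the EC, so a value other than expected flags a deformity.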
Use SurfStatListDir.m to list the contents of a directory:
You might want to also exclude the brain stem. To do this, I clicked on where I thought the centre of the brain stem was: (0; -16; -8)mm, and I clicked a bit further out to guess the radius: 20mm. Then you can add a ball-shaped ROI to the mask as follows (see ROI):
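A hedged sketch using SurfStatROI; avsurf is the assumed name of the average surface read in earlier:

```matlab
maskball = SurfStatROI( [0; -16; -8], 20, avsurf );   % 20mm ball at (0, -16, -8)mm
mask2 = mask & ~maskball;                             % exclude the ball from the mask
```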
Since I don't really know where the brain stem is, I shall just stick to the first mask in what follows.
The subject information is in a text file in the same format as that used by glim_image, i.e. a text file with fields for the thickness file names, and the covariates:
This file can be read in with Matlab's textread:
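A sketch of reading such a file; the file name and field layout are hypothetical, so adapt the format string to your own covariates file:

```matlab
% %s reads a string column, %f a numeric column; skip one header line
[ id, gender, age, filenames ] = ...
    textread( 'myinfofile.txt', '%s %s %f %s', 'headerlines', 1 );
```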
To read in the actual thickness data from the above files:
Let's take a look at the mean thickness:
We can save this for later:
As expected, thickness decreases with age. The line looks a bit faint, and the markers are too small. You can change the line width and marker size as follows:
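A sketch of one way to do this; age and meanthick are hypothetical variable names for the data in the plot above:

```matlab
h = plot( age, meanthick, '.' );              % h is the handle to the line object
set( h, 'LineWidth', 2, 'MarkerSize', 12 );   % thicker line, bigger markers
```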
or you can edit the figure by clicking on the "Edit Plot" button (the one with the little arrow) in the Figure toolbar, then double-clicking on the feature you want to edit.
For gender:
Males appear to have overall thicker cortex than females.
FreeSurfer group registered data is often stored in two .mgh
files. In the following example, data on 18 subjects is in two files. To read these in:
Y is an 18 × 327684 matrix of cortical thickness data on the 18 subjects. To read in one of the surfaces:
It looks as if this is a "whole brain" component, i.e. the thickness on the entire surface is positively correlated. An explanation is that some subjects have overall thick cortex, some have overall thin cortex, perhaps due to differences in contrast at the grey-white boundary. See SurfStatNorm.m to normalize the cortical surface data, either by subtracting or dividing by the global mean. However before doing this, let's plot the subject component against age:
The global effect decreases with age; in other words, it could be just an age-related atrophy effect. This makes sense: since the spatial component is roughly constant across the brain, then the subject component must be the global average, which we have already noted decreases with age.
Now let's look at the second component, which explains a mere 4.1% of the variability:
Plotting the subject component against gender:
We note that the second component might be a gender effect. Females score positively on this component, so they have higher cortical thickness in the red areas, lower cortical thickness in the blue areas, compared to males.
Higher order components seem to be more difficult to interpret.
Each covariate must be converted to a term. For the technically minded, a term has two components: a matrix, and a cell array of strings for the names of the columns. The function term.m converts a vector or a cell array of strings to a term, which is displayed as follows:
double and char convert a term back to its matrix and name components.
The operators +, *, ^ and - have been redefined (overloaded) for terms so that they match the operation of terms in a model formula.
You now write down a model formula (without the coefficients)
e.g. a main effect of Age and Gender:
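A sketch of the formula, assuming Age and Gender have already been converted to terms as above:

```matlab
M = 1 + Age + Gender;   % intercept plus main effects of Age and Gender
image( M );             % view the design matrix (image is overloaded for terms)
```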
(The empty space on the right is reserved for the design matrix for the
variance of a mixed effects model.)
Now let's fit the linear model, saving the results in slm:
You can look inside slm by typing slm. slm.X is the design matrix, slm.df is the error degrees of freedom, slm.coef is the coefficients, slm.SSE is the error sum of squares, slm.tri is the list of triangle indices, and slm.resl is a function of the resels along each edge. Normally you never need to look at any of these things!
First specify the contrast (see help SurfStatT for more information on this):
The contrast is a numeric 147 × 1 vector. Note how Gender.male gives you the numeric indicator variable for male, i.e. 1 if the observation is male, and 0 otherwise. Then to get the T statistic:
The contrast is stored in slm.c. It equals [0 0 -1 1] (to machine accuracy), i.e. column 4 - column 3. slm.k is the number of variates (univariate in this case). Effects are in slm.ef, and their standard deviations (standard errors) are in slm.sd.
The T statistic is in slm.t, and by multiplying it by mask, we set all the values outside the mask to zero, which makes it easier to look at:
To find the threshold for P=0.05, corrected, the resels are:
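A sketch of the calculation, assuming FmriStat's stat_threshold function (bundled with SurfStat) with resels as the search volume and unit FWHM:

```matlab
resels = SurfStatResels( slm, mask );                         % resels of the search region
peak_threshold = stat_threshold( resels, length( mask ), 1, slm.df )
```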
pval.P contains P-values for peaks, and pval.C contains P-values for clusters.
This special structure is recognised by SurfStatView, which draws the figure in a special way:
Maybe there is something hidden; try inflating the average brain:
If you want to inflate it more, try
peak and clus contain a list of peaks and clusters, together with their P-values, as in FmriStat. They can be displayed nicely by converting them to a term:
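A sketch of the conversion, assuming peak and clus come from SurfStatP as described above:

```matlab
[ pval, peak, clus ] = SurfStatP( slm, mask );
term( peak )   % displays the peaks as a readable table
term( clus )   % displays the clusters as a readable table
```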
Note that the F statistic on the plot is the square of the maximum T statistic for the contrast, i.e. 33.24 = 5.7658^2. F statistics with one degree of freedom always equal the square of T statistics. Moreover the P-values are also the same, i.e. the P-value of the F is the P-value of the 2-sided T (equal to twice the P-value of the one-sided T).
Finally, Q values or False Discovery Rate:
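A sketch of the FDR step; avsurf is the assumed name of the average surface:

```matlab
qval = SurfStatQ( slm, mask );              % Q-values (False Discovery Rate)
SurfStatView( qval, avsurf, 'Q-values' );   % viewed the same way as P-values
```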
Recall that:
Let's try to click on the maximum T statistic with the "Data Cursor" button. I can't hit it exactly - the cursor is just to the right of the mask in the middle right figure. To find the id try:
Only clusters are significant, and not peaks. This suggests that the age effect covers large regions, rather than local foci. This agrees with our conclusion above that there is age-related atrophy right across the brain. FDR tells the same story:
Nothing doing here. Nevertheless, let's take a look at what is happening at the seed point. Here's a plot of Yseed against age, with two separate lines for each gender:
As before, a negative effect of age (atrophy), but little difference in slope between the two genders.
The F statistics are in slm.t, and their degrees of freedom are in slm.df=[3 143]. The P and Q values are calculated using the same functions. The software knows that slm.t is an F statistic, rather than a T statistic, because an F statistic has two degrees of freedom:
A large part of the brain is correlated with the seed. However I would claim that pure connectivity as above is of little scientific interest. A more interesting question is how the connectivity changes with say gender. To answer this, add an interaction between seed and the rest of the model, then look at the interaction between seed and gender:
There is really no evidence for a gender effect on connectivity. Nevertheless, let's take a look at the most significant peak at id=3698, tucked inside the left temporal lobe. The T statistic is slm.t(3698)=3.7024. To see how the connectivity differs between genders, here's a plot of the data Y3698 at that point. The data is adjusted for Age and its interaction with Yseed, and plotted against Yseed for each gender:
As you can see, the connectivity is greater in females than males, which is why the T statistic was positive. The F statistic is 13.71=3.7024^2, as expected.
How does the connectivity change with age?
This looks more promising; for confirmation:
This time there is one small significant cluster where connectivity increases with age ... or maybe this is just the 1 time out of 20 when we expect to see a false positive!
Unfortunately there is no easy way to make a plot that shows the effect of an interaction between two continuous variables (here Yseed and age).
An ROI can be specified either by its [x; y; z] coordinates or by a vertex id. For example, one of our users was interested in a region in the top right figure. We clicked on the centre and found the id=53815. To make the ROI with a radius of 10mm:
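A sketch assuming SurfStatROI accepts a vertex id as the centre; avsurf is the assumed average surface:

```matlab
maskROI = SurfStatROI( 53815, 10, avsurf );   % 10mm ball around vertex id 53815
```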
The region shows up as white in the top right figure, just behind the temporal lobe.
You could also make an ROI from a cluster. Let's take the clusters of the T statistic for age in the model M = Age + Gender. The variable clusid contains the cluster id's for each vertex, obtained by adding a fourth output argument to SurfStatP:
(The Figure is the same as before.)
Suppose we want to make an ROI from the dark blue cluster. First click on any point in the cluster, find its id=27805 (its P-value is 0.016783), then:
In SurfStatP (see help SurfStatP) you can change the threshold for defining clusters to any arbitrary value; by default it is the uncorrected P=0.001 threshold.
You can find clusters for any (non-statistical) image using SurfStatPeakClus, as follows. Suppose we want to form an ROI from the mean thickness, as here. First make a structure, say s, with two fields: t containing the data, and tri containing the triangles (the name s is arbitrary, but you must use t and tri as field names):
The figure shows the cluster id's (by coincidence there are four clusters). To make an ROI out of the green cluster (clusid=2, read from the colour bar in the figure):
Suppose we have two ROIs, maskROI1 and maskROI2. Using the Matlab "or" operator |, we can form their union; the "and" operator & gives their intersection. Going back to the linear model with age and gender, we can find the resels of the first ROI:
YROI can be treated exactly like Y itself, but without specifying the surface, e.g.
Volumetric data can be in MINC (.mnc), ANALYZE (.img), NIFTI (.nii) or AFNI (.brik) format. All the Matlab functions for ANALYZE (.img) and NIFTI (.nii) are included with SurfStat. If you are using AFNI (.brik) formatted files, you will need to install the AFNI Matlab Library and set the Matlab path as here.
If you are using MINC (.mnc) formatted files, you will need the extra tools in the emma toolbox, also available for Windows and Linux.
Installing the Windows and Linux versions is a bit tricky. In the zipped file that you download, there are .dll files and .exe files; these must be moved to a directory in the operating system path, e.g. the matlab/bin/win32 directory.
Also you must create a directory c:/tmp or /tmp for emma to use as scratch space. Finally you must set the Matlab path to the emma directory, as here. To test it out, see if you can read the MINC template file supplied with SurfStat:
This tests that you have moved the .dll files and .exe files into the path of the operating system. If you don't get any red error messages, it has worked! Now check to see if you can write a MINC file:
This tests the c:/tmp or /tmp directories. Again if there are no red warning messages, you should have written a MINC file called test.mnc in your working directory (it will just contain zeros). Congratulations!
Now we are ready for the real thing. First read in your volumetric data from c:/keith/surfstat/myfile.mnc:
To put the data on the surface s back into a volume vol2, then write it to c:/keith/surfstat/myfile2.mnc:
Compare c:/keith/surfstat/myfile.mnc and c:/keith/surfstat/myfile2.mnc side by side. If you are using MINC files, then register is convenient (go here for a Windows version). You can call it up from within Matlab as follows:
You will notice that c:/keith/surfstat/myfile2.mnc has chunks of data missing in places where there are no cortical surfaces. Also you will notice that c:/keith/surfstat/myfile2.mnc is a lot smoother looking than the original c:/keith/surfstat/myfile.mnc because the surfaces that it interpolates onto are all different. The different surfaces have the effect of blurring the data.
First convert subj to a cell array of strings, then to a term. What can we do with Subj? There are three choices; the first one is to ignore Subj, as above:
Suppose in fact there are correlations between observations on the same subject. Then the coefficients of the fixed effects (Age and Gender) are still unbiased, but less precise. Most importantly, their estimated standard deviations are biased (usually too small), so the resulting T statistics are wrong (usually too big). One possibility is to allow for Subj as a fixed effect:
(The image is a horrible mess because the 35 Subj indicators dominate.) This model also gives unbiased estimates of the fixed effects (Age and Gender), but again they are less precise. The reason is that by allowing for a fixed effect of Subj, all our inference comes from differences within a subject. We are throwing away information from age or gender differences between subjects. In particular, we are throwing away any subject with just one observation, i.e. the 5 subjects (#28, 31, 33, 34, 35). We are only using age information within a subject. It is as if all subjects had the same age (in years).
Age effects are only inferred from the monthly effect within each subject, and information about age effects between subjects is lost. Nevertheless, inference about the age effects that remain is valid even if the observations within a subject are equally correlated. In other words, the estimated standard deviations of the age effects are unbiased, and the T statistics have the correct null distribution. This analysis is better than M1, at least for age.
However it is a disaster for gender: it is now impossible to estimate a gender effect, because gender only varies between subjects, and we have just removed a subject effect!
The solution is to try a mixed effects model:
Here we have made Subj into a random effect, meaning that its coefficients are independent Gaussian random variables. This induces equal correlations between observations on the same subject. On the right you will see the model for the variance of the observations. We have forgotten to add the identity matrix I to allow for independent "white" noise in every observation (this is added by default to any fixed effects model, but it must be specifically added to a mixed effects model):
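A sketch of the corrected model, using SurfStat's random() and I notation; avsurf is the assumed average surface:

```matlab
M3 = 1 + Age + Gender + random( Subj ) + I;   % I adds independent "white" noise
slm = SurfStatLinMod( Y, M3, avsurf );        % fit the mixed effects model
```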
The variance is now a linear model with two "variables": a matrix with 1's in each subject, plus the identity matrix with 1's down the diagonal. This mixed effects model M3 lies somewhere between the two fixed effects models M1 and M2. In M3, if the variance of random(Subj) is 0, you get M1; if the variance of random(Subj) is infinite, you get M2. Model M3 estimates a variance in between.
You might be curious to know that if the design is balanced, then inference about the age effect is the same in M2 as in M3. A balanced design has the same number of observations at the same time instants (e.g. at ages 21.0, 21.5, 21.75 years) on each subject.
The rest of the analysis proceeds as before. SurfStatLinMod fits model M3 by first fitting M1. It then uses the residuals to estimate the variance model by ReML. The fixed effects and the overall variance are re-estimated, again by ReML. Thus all parameter estimates are approximately unbiased. If this procedure were iterated, it would converge to the ReML estimates, but this takes much more time, and the resulting estimates are not much different. If you are worried about this, you should use Jason Lerch's Thickness Statistics, which calls R's nlme, but it is much more time consuming (up to 1 day per fit!). nlme will also fit more elaborate models, such as an AR(p) time-series model for the correlation structure, which is not implemented in SurfStat.
Here we go:
slm.r contains the estimated correlation within subjects. It is clamped to a minimum of a small positive number to avoid a non-positive definite variance matrix. Most of the image is in fact clamped, indicating no correlations within "subjects". This is what we expect, since the "subjects" are fake. The results for a negative age effect (atrophy) are:
The first thing to note is that the degrees of freedom is not equal to #observations - #variables = 147 - 3 = 144 as for M1. It is not even an integer. It is a non-integer effective degrees of freedom that varies spatially (since the variance model varies spatially), and is available in slm.dfs. Its average inside the mask is 54.9, and this is used for the P-value calculations (not shown). For a gender effect:
Note that you get different degrees of freedom for different contrasts, unlike for M1. Now the degrees of freedom is less. Why? Essentially because gender is estimated between subjects, and there are only 35 subjects, so the degrees of freedom has to be less than 35.
More elaborate models are possible. It is common practice to assume that the interaction between a fixed effect and a random effect is also random. In the case of age, this allows for a random age slope for different subjects. The test for the age main effect still makes sense: it is testing to see if the mean of these random slopes is different from 0. To do this:
It turns out that this model is not appropriate because it is not invariant to a rescaling of age. If for instance we measured age from a baseline of (say) 18 years, then the fitted model would be different. To overcome this, we need to allow for a covariance between the random age-subject interaction and the random subject main effect, which can be written in two identical ways:
Now there are two extra components that allow for the covariance. As a result, fitting takes a little longer:
The df has gone down from 54.9 to 12.7. The reason is that if we assume that there is a different age slope for each subject, then the age sd is now estimated between the 30 subjects with at least two observations, so the age sd has to have less than 30 degrees of freedom. The 30 is reduced even more, since more weight is put on the few subjects with larger numbers of observations. In general, you lose degrees of freedom as the variance model becomes more elaborate, which means you lose sensitivity to detect effects.
So my advice is to stick to simple models, such as M3, unless there is strong evidence to the contrary. You might be curious to know that FmriStat (and all the other common packages such as SPM and FSL) do in fact allow for an interaction between fixed effects and random effects. This is implemented by finding one slope per subject, throwing the rest of the data away, then combining the slopes in another linear model. The reason why it is appropriate for fMRI data is that the number of repeated measures in an fMRI time series is huge (100-200) compared to the number of repeated measures here (1-10). This means that sd's can be estimated very accurately within subjects, there is no need to pool sd information between subjects, and so no need for a complex mixed effects model.
Finally, it is possible to do Welch's T test, which tests for a difference in group means allowing for a difference in variance. For example, to test for a difference in gender, allowing for different variances for males and females, but ignoring age and subject,
Actually the model is over-parameterized. To reduce the model by eliminating redundant variables both for mean and variance:
Now it is clearer what is going on: there is a separate mean for males and females, and a separate variance for males and females. The effective degrees of freedom is 137.3, down slightly from the 147 - 2 = 145 without allowing for separate variances, which is the price you pay for allowing for separate variances.
The most general model would be to interact all fixed effects with all random effects:
Apart from taking a long time to fit, such a model leaves too little degrees of freedom to make any useful inference.
So far only T tests for univariate mixed effects models are implemented - F tests and multivariate mixed effects models are yet to come.
A final note of caution: the methods used to fit mixed effects models are only approximate. Convergence is not guaranteed. The way in which the model is forced to have a positive definite variance matrix restricts the types of correlation structures. The method is not completely invariant to a linear re-parameterization, even if fixed effects terms are added before crossing with a random effect as recommended above. However the algorithm will give exact results for a single random effect if the design is balanced. It will also give exact results for Welch's test even if the grouping factor has more than one group. In general it will give good results whenever there is just one additive random effect, such as random(Subj), so my suggestion is to stick to that.
Most SurfStat functions handle memory mapping without you knowing. However there might be occasions when you need to know where your data is really stored, in which case, read on.
When the amount of data is big, it can exceed the available memory. In this case, the three functions that read data (SurfStatReadSurf
, SurfStatReadData
, SurfStatReadVol
) all resort to memory mapping instead of reading the data into memory. Memory mapping is invoked if the total storage exceeds 64Mb of memory. You can change this memory limit to e.g. 128MB by issuing the following command:
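An assumed sketch: the readers are taken to consult a global MAXMEM variable (in MB); check the help of the readers for the exact mechanism.

```matlab
global MAXMEM
MAXMEM = 128;   % raise the memory-mapping threshold from 64MB to 128MB
```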
The remaining details are left to the help for the reader.
Memory mapping works as follows. The data is first written to a temporary file as 4 byte reals. The temporary file is in a directory SurfStat
in the same directory as the first file to be read in. This directory will be created if it doesn't exist already. If it can't be created because you don't have permission, then an error message will be produced prompting you to supply a directory where you do have permission to write. Note that neither the temporary file, nor the temporary directory, are ever deleted. To avoid wasted disk space, you should delete these files after you have quit Matlab.
Then the output parameter of the reader, say Y, is a special structure that contains information on the name of the file where the data is stored, the format, and the dimensions of the data array (see below for an example). You can access the data by:
All SurfStat functions will handle memory mapped data where appropriate. As an example of memory mapping, we shall analyze the trivariate coordinates of the surfaces themselves. We shall see if coordinates depend on age, allowing for a difference in coordinates between males and females. First we must read in all the surface data (this takes a few minutes):
In SurfStatReadSurf, only surfs.coord is memory mapped; surfs.tri is read into memory in the usual way. To see this, type:
Filename is the memory mapped file. It is mapped as a 81924 × 3 × 147 single precision array. To access all the coordinates of vertex 1000 on subjects 20 and 25:
As an example, let's look for an age effect on the smoothed surface coordinates, allowing for gender.
It's interesting to compare this with the T statistic for an age effect on cortical thickness. However before getting excited, we should find out the corrected threshold:
Unfortunately the P-values for peaks and clusters are not significant:
Why is P>1 for the last two peaks? The reason is that P approximates the expected number of peaks above t, which only approximates the P-value if P is small, say less than 0.1.
Both the P-value and Q-value images are blank, so we don't show them, but here's the code:
Here's an even better way of showing the movement of the surface with age. We can use Matlab's quiver3 to show the age effects in slm.ef as little arrows. First let's pick three local maxima in the table above as origins for the arrows, then find their coordinates (the rows are x, y, z, the columns are the vertices):
You can capture the axis handles h by putting h as the output of the SurfStatView function above. Then we loop over axes, adding red arrows in the positive direction and blue arrows in the negative direction, so we can always see the arrows:
(Note that the arrows are not to scale.) The red arrows are pointing up and away from the brain centre, suggesting a movement of anatomy in that direction with increasing age. However we shouldn't read too much into this, because the P-values are not significant.
Finally, let's test for a quadratic age effect, allowing for gender, using the maximum F statistic (Roy's maximum root):
The corrected P-values are not significant, but there is one significant cluster:
What is going on at the peak inside this region (id=14465)? Here's a plot of the x, y, z coordinates against age for males and females separately:
Looks like the quadratic effect is caused by one male and one female (ages 45 and 42, respectively) with very low z coordinate.
First let's take a look at one slice, z=0, laid out in two groups of trauma and control. To do this, we first read in just the one slice with z coordinate 0 from all data sets, which easily fits into memory:
This reads all x slices and all y slices, but just the one slice with z=0.
We then use SurfStatViews (the s is for slice) with a last parameter, layout, which is a matrix that lays out the slices (in the order in filenames) where you want them in the array of images (0 leaves out an image). The variable control indicates with a 1 which files are in the control group, in the order in filenames:
Even to the naked eye, it looks like the trauma group has less WM than the control group. To save storage, we now find a mask based on the average white matter density in the control group:
SurfStatView1
is a viewer for both volumetric and surface data with only one view (hence the name). By default it will show three orthogonal slices through the middle of the data:
Now we need to choose a suitable threshold to mask the white matter. 0.05, i.e. 5% or more white matter, looks like a good choice. To check this, add a thresholded image at 0.05 by leaving out the clf
command (which clears the figure window):
Other properties such as the colour and transparency can be modified (see the help of SurfStatView1) and strung together in pairs of arguments after volwmav.
Now we are ready to read in all the data inside the mask. If there is a lot of data, SurfStatReadVol
uses memory mapping rather than loading all the data into memory.
Of course you may prefer to write out the T-statistic so you can load it into your own favourite viewer:
The damage is quite extensive, indicating that the white matter surrounding the ventricles has atrophied. To see how much atrophy, let's look at the estimated effect (control minus trauma) and its standard error:
You may prefer the more traditional view of slices, say every 10mm from -30mm to 70mm:
It looks like the trauma subjects have lost between 20% and 40% WM, ± ~5%, in the regions surrounding the ventricles.
The deformations consist of x, y and z components for each subject. We first read in the file names, then rearrange them into 17 × 3 and 19 × 3 cell arrays of file names as follows:
Take a look at slice z=0 of all the data, arranged in rows of x, y and z deformations for the trauma group, followed by x, y and z deformations for the control group:
It's hard to see any difference in deformations between trauma and control, but at least the data look OK. Next we will need a mask. Since the trauma is expected to atrophy the white matter, let's form a mask by thresholding average white matter density. A file for this has already been created:
0.05, i.e. 5% or more white matter, looks like a reasonable threshold:
Now let's read in all the data, which just exceeds 64Mb, so it is memory mapped:
Now for the P-values:
The damage is quite extensive, but what are the actual deformation differences? We can visualize the deformations by adding them as little arrows to the image. To do this, let's first select some positions for the arrows. I've chosen positions on a lattice of 10mm intervals using Matlab's ndgrid
:
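A sketch of the lattice; the bounds here are hypothetical, so adjust them to your volume's extent:

```matlab
[ x, y, z ] = ndgrid( -60:10:60, -90:10:60, -40:10:70 );   % 10mm lattice
xyz = [ x(:)'; y(:)'; z(:)' ];                             % 3 x n matrix of points
```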
The effects are in slm.ef, but since the deformations are measured from subject to atlas, i.e. atlas minus subject, we should flip the sign so that we have trauma minus control deformations. Then we can use Matlab's quiver3 to add the arrows:
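A sketch of the arrow drawing, assuming xyz (3 × n lattice points) and ef (3 × n deformation effects sampled at those points, atlas minus subject):

```matlab
ef = -ef;   % flip the sign: now trauma minus control
hold on
quiver3( xyz(1,:), xyz(2,:), xyz(3,:), ef(1,:), ef(2,:), ef(3,:), 0 );
hold off
```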
The "0" as the 7th argument of quiver3
ensures that the arrows are drawn to scale, that is, they are the real movements in anatomy, in mm. Let's zoom in by hitting the "+" magnifying glass in the figure tool bar:
It looks like the arrows are pointing out from the centre, indicating that the ventricles of the trauma subjects have expanded, or equivalently, that the white matter surrounding the ventricles has atrophied.
This illustrates nicely how DBM can not only detect where the anatomy has changed, but it can also show how the anatomy has moved, i.e. the direction of the changes.
The four stimulus scans (labeled h1, h2, h3 and h4) are to be compared to the 5th scan of rest (labeled hb).
We can get a list of all the 70 PET data files as follows:
First let's take a look at one slice, z=0, laid out in a task × subjects matrix. To do this, we first read in just the one slice with z coordinate 0 from all data sets, which easily fits into memory:
This reads all x slices and all y slices, but just the one slice with z=0.
We then use SurfStatViews with a last parameter, layout, which is a matrix that lays out the slices (in the order in filenames) where you want them in the array of images:
It is obvious that there is a bad scan (#25). In the FmriStat analysis, the entire subject #5 was dropped from the analysis. Here we shall retain all the data on subject #5, except the bad scan, and do a mixed effects analysis to recover as much information as possible from the data. To exclude the bad scan, #25:
The second argument of SurfStatAvVol tells the function that we want the minimum over scans (other possible choices are @max for the maximum, or @plus for the average, which is the default). The third argument replaces NaN ("not a number") by 0. SurfStatView1 is a viewer for both volumetric and surface data with only one view (hence the name). By default it will show three orthogonal slices through the middle of the data:
Now we need to choose a suitable threshold to mask the interior of the brain. 5000 looks like a good choice. To check this, reset the colour limits as follows:
Now we are ready to read in all the data inside the mask. If there is a lot of data, as in this case, SurfStatReadVol uses memory mapping rather than loading all the data into memory:
To make the mask, read the volume with SurfStatReadVol1, extract the data part, then convert it to a logical:
Take a look at slice z=0 (a 0 in layout leaves a gap in the array for the deleted scan):
It is immediately obvious that the data need normalizing so that all images have roughly the same average intensity, as follows:
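A manual sketch of one normalization, assuming Y (scans × voxels) fits in memory and mask is a logical vector over voxels (SurfStatNorm offers similar functionality):

```matlab
Ymean = mean( Y( :, mask ), 2 );                   % mean intensity of each scan
Y = Y ./ repmat( Ymean, 1, size( Y, 2 ) ) * 100;   % rescale each scan to mean 100
```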
Since the design is almost balanced, the simplest way for generating the levels of the factors for subject and condition is to generate them for a balanced design, then delete the levels for the bad scan:
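A sketch of this, assuming scans are ordered by subject (14 subjects × 5 conditions) and that scan #25 is the bad one:

```matlab
subj = kron( (1:14)', ones(5,1) );           % subject number of each of the 70 scans
cond = repmat( (1:5)', 14, 1 );              % condition number of each scan
subj(25) = [];  cond(25) = [];               % delete the levels for the bad scan
Subj = term( cellstr( num2str( subj ) ) );   % convert to factors via strings
Cond = term( cellstr( num2str( cond ) ) );
```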
Fitting the model, the within-subject correlation is quite high, averaging 0.59:
Now we test for the difference between the 3Hz condition and the baseline, picking two slices at z=[-20, 60] for display:
To get a better look, let's threshold the T-statistic at 3 and add it to the figure by leaving off the clf
command (which clears the figure window):
We could add the mask to the image as well:
Other properties such as the colour and transparency can be modified (see the help of SurfStatView1) and strung together in pairs of arguments after vol. You may like the more traditional view of slices, say every 10mm:
Of course you may prefer to write out the T-statistic so you can load it into your own favourite viewer:
Note that the mean effective df is slightly higher than the fixed effects df, which would be 69 - 14 - 5 + 1 = 51. This is because the mixed effects model has recovered some of the between-subject variation to increase the amount of information about the contrast. Finally the P-values and Q-values are:
Finally, just for amusement, let's add a surface to the last image:
An alternative is to make up a new linear model with an explicit linear variable for frequency. To do this, we extract the frequencies from the cond variable, and set the baseline to 0:
Baseline + Hz does not force the intercept of the Hz effect to go through the baseline data, since Hz=0 might not produce the same response as the baseline. In other words, the intercept of the Hz effect is modeled separately from the baseline:
Now we fit the model in the usual way:
As we can see, the two T-statistics, the first a linear contrast, the second a linear effect of Hz, are almost identical. The main difference is that the df of the first is lower (51) than the second (53). Why? The reason is that the first model has a separate effect for each level of Hz; in other words, it allows for an arbitrary non-linear effect of Hz. The second only has a linear effect of Hz, so it has ~2 more df for the error. Which is better? As usual, it's a trade-off; the first will have an sd that is less biased, but less accurate; the second will have an sd that might be biased (if the true Hz effect is non-linear, e.g. quadratic) but more accurate. Luckily both methods give almost identical results. Here are the P-values for the second method:
The activated local maxima are almost too small to see. Let's plot the data at the maximum T-statistic:
Although the T-statistic is high (6.04), the data doesn't seem to be highly correlated. This is because the Y values are adjusted only for the fixed effects, not the fixed and random (subject) effects. One way to adjust Y for subject is to treat subject as fixed:
The command ylim sets the same y limits, so we can compare the plots.
The slopes are almost identical, but now the data in the second plot are far less variable because the subject effect has been removed, so now the data looks more correlated.
Let's look at a quadratic effect of Hz. Again there are two ways of doing it, either through contrasts or through an explicit quadratic model. Here's the first approach, through a quadratic contrast:
Here's the model for the second approach, though a quadratic model:
Here's the analysis of the quadratic effect:
The analyses are almost identical, apart from the different df, and neither shows any significant quadratic effect.
Finally, let's look at a cubic effect of Hz, done both ways:
Note that the error sums of squares, slm.SSE's, are the same as well. The answer is that they are exactly the same model! A cubic model in Hz (M3) is identical to a separate mean for each of the four levels of Hz (M). The only difference is that the models are parameterized in different ways. But the fitted models are identical. Why? A cubic can be fitted exactly to any data at four points.