Thursday 18 April 2024

Parcellation theory

The argument of this post is snapped above.

Contents

Introduction

Numerical digression

Scans

Reference brains and their parcels

Anatomical parcels

Functional parcels

Diffusion

Other considerations

Conclusions

References

Introduction

A tutorial digression prompted by my trying to read a paper about detecting sleep patterns from the connectivity graphs which can be derived from fMRI scans. To which last matter I shall return in due course.

Scans – MRI scans – of human brains produce huge amounts of data at the voxel level, where a voxel might be as small as a two-millimetre cube – in which case there are going to be a lot of them, say more than 100,000 in an entire human brain. Maybe with a data point every five seconds over a scan running for 10 minutes. There are now huge libraries of such scans, more or less publicly available. While the cost of these scans comes down and their power goes up. Two views of a functional scan are snapped above, with the high valued voxels being highlighted in red.

Analysing, visualising or modelling this sort of data is challenging and parcellating the voxels into something of the order of 100 parcels (aka regions) for the brain as a whole can often be a way forward. Parcels are a way to get the volume of data down to something more manageable. A more robust way to compare one scan with another, or one subject with another, with a better signal to noise ratio than that of the disaggregated data. A way for one research team to talk to another. On the other hand, parcels blur detail, blur differences between scans and blur differences between subjects. And whatever it is that you are interested in might easily be lost inside the parcels.

There are lots of ways to do parcels and I have found the tutorial material at references 1 and 2 helpful. The classification snapped below comes from reference 2 and while it should not be taken too seriously, it does provide a useful framework on which to hang a discussion of the various approaches to parcellation. One such parcellation, from reference 5, is snapped above. While some tutorial material on fMRI analysis more generally is to be found at references 3 and 4.

In this, I associate to an orthodontist who attended to my front teeth when I was around 12 or 13 years old. In his spare time, he was interested in the way in which faces and jaws grow and I remember my father explaining to me that you had to come up with some fixed point in the subject’s face so that you could superimpose one (two-dimensional) image on top of another, in order to see how growth was affecting things. In which getting both the images aligned to the same vertical at least was relatively straightforward. A problem of the same sort as those which are about to follow, albeit a rather more tractable one.

Numerical digression

Excluding the neuron-rich cerebellum, a cubic millimetre of brain might contain around 20,000 neurons – at a spacing of not much more than a thirtieth of a millimetre – and a 2mm voxel 150,000. If we have 100 parcels, a parcel might contain 200 million of them. So a voxel can do a lot of computing – and a parcel orders of magnitude more. 
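
By way of a check, the arithmetic in a few lines of Python, using the round numbers above – round numbers, so only the orders of magnitude mean anything:

neurons_per_mm3 = 20_000                                    # rough figure, cerebellum excluded
voxel_side_mm = 2
neurons_per_voxel = neurons_per_mm3 * voxel_side_mm ** 3    # 8 cubic millimetres
voxels_per_brain = 100_000                                  # order of magnitude for a whole brain
parcels = 100
neurons_per_parcel = neurons_per_voxel * voxels_per_brain // parcels
print(f"{neurons_per_voxel:,} neurons per voxel, {neurons_per_parcel:,} per parcel")

Which comes out at 160,000 per voxel and 160 million per parcel – the same ballpark as the numbers above.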

Noting here that the density of neurons varies a good deal from place to place – perhaps by a factor of ten – and that the neurons themselves also vary a good deal. Some are a good deal more complicated than others. The taxonomy of neurons is an industry in its own right.

Looking for computing parallels, I asked Bing how many nodes there might be in a neural network, where a node can be thought of as a fairly simple-minded version of a neuron. Bing was a bit vague, beyond saying that it was all rather complicated, not least because a network can have too many nodes as well as too few. Gemini was prepared to be a bit more specific offering the numbers in the snap above.

The small numbers are consistent with what I got from Bing, but I have not attempted to check the moderate and large numbers. With the large number suggesting that one of these large language models – consuming, we are told, large amounts of electricity – has computing power of the same order of magnitude as a human brain.

Gemini gets a bit more vague when asked about electricity but does offer: ‘… Training models like GPT-3 can potentially use hundreds or thousands of megawatt-hours of electricity (enough to power a small town for a period)…’.

I didn’t check this one either, but I did try him with: ‘moving on, what can you tell me about the battle of Grorton Rushett, which I think was fought between the Saxons and the Mercians, somewhere in middle England, sometime in the 8th century’. And while a few months ago he might have fallen for this non-existent battle, plucked out of the air, and told me all about it, on this occasion he was more careful.

A rather different sort of number would be the number of neurons that could be implemented in a simulation package like NEURON from Yale. This package is as interested in what goes on inside a neuron as in their interactions, and I imagine the numbers would be quite small compared with those given above. See reference 11.

Scans

In the beginning, and for many purposes, the idea of a scan was to produce a static image of a brain or of some other part of the body. A better image than you could get from an X-ray, without the risks associated with X-rays. You just stuck the image up on a screen and looked at it.

Now, instead of X-rays, you put the brain into a very strong magnetic field and then hit it with a radio-frequency pulse. This pulse disturbs the charged particles in the brain and the signal is what you get as those particles settle back to where they were before the pulse, where the nature of the signal is a function of how exactly you organise the field and the pulse, with quantities called T1 and T2 being relevant in this context. In the simple case, the signal is a real number, say between zero and one, which says something about the voxel in question. Is it, perhaps, mainly fat or mainly protein? Is it full of oxygen – that is to say active – or not?

In the latter case people were looking at what the brain was doing (function), rather than how it was built (structure) – the ‘f’ for functional bit of fMRI – perhaps how the activity of one person compared with that of another. Then how the activity of an individual brain was changing at time scales of a few seconds – noting in passing that EEG and its friends can do much better than a few seconds – at the cost of poor localisation of the electrical artefact you are looking at. 

You wanted a series of snapshots of the activity of the brain, a series of volumes making up a take, in the jargon of the snap at the top of this post. You also wanted the activity recorded for all the voxels of a volume to be adjusted as if they were all recorded at the same instant of time, rather than over a few seconds. You wanted all the slices of a volume and all the volumes of a take to be adjusted for the inevitable small head movements so that the x, y and z coordinates of all the slices and all the volumes all lined up. So your time series of activity for the voxel [x= α, y= β, z= γ], where [α, β, γ] is some point of interest in the brain, is exactly that.
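
To make the end point of all that pre-processing concrete, a minimal sketch in Python, using the nibabel library, of pulling out the time series for one voxel of a 4D take. The file name and the voxel coordinates are made up for the purpose.

import nibabel as nib

img = nib.load("take.nii.gz")                 # a pre-processed 4D image: x, y, z, time
data = img.get_fdata()                        # numpy array of shape (X, Y, Z, T)
alpha, beta, gamma = 30, 40, 25               # some voxel of interest, indices made up
ts = data[alpha, beta, gamma, :]              # its activity over the whole take
print(ts.shape, ts.mean())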

And then, having got all that out of the way, you wanted to be able to compare one brain with another, or perhaps the image of a brain at some point in time with the image of that same brain some days or years later. The problem here being that the brain is a rather jelly-like quantity, with a good deal of change over time and a good deal of variation between people. How do you compare one blob of jelly with another?

Reference brains and their parcels

Which is where reference brains come in, sometimes called templates. The gross structure – in particular the layout of the hills (aka gyri) and valleys (aka sulci) – of one healthy human brain is much like that of another, small differences notwithstanding. To which one can add bony landmarks. So we can continuously deform an image of one brain so that, at that level, it matches another. In the hope that the matching carries down, at least more or less, to the voxels below. A process which can be automated and which is usually called registration.

Registration to a reference brain which has been thoroughly examined and which has already been parcellated. A reference brain which might be that of a person or which might be a collective. For example, from reference 12 we have: ‘… MNI305 was the first MNI [Montreal Neurological Institute] template. The current standard MNI template is the ICBM152, which is the average of 152 normal MRI scans that have been matched to the MNI305 using a 9 parameter affine transform. The International Consortium for Brain Mapping adopted this as their standard template; it is the standard template in SPM99 [Statistical Parametric Mapping]…’.

In this, one can either choose a well-known reference brain, with access to all the data which has been registered to that brain, or one can choose a reference brain which better fits the brains in question. Perhaps one is interested in young people between the ages of 20 and 25, and given that brains do change in a systematic way with age, one might take a reference brain of that sort. It all depends on what one is trying to do.
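
For what it is worth, the Python library nilearn packages up some of this – a sketch of loading a standard MNI152 template and one ready-made parcellation, the Harvard-Oxford atlas being just an example. Noting that the actual non-linear registration of a new scan to such a template is done with tools like SPM, FSL or ANTs, which is not shown here.

from nilearn import datasets, plotting

template = datasets.load_mni152_template()    # the standard MNI152 reference brain
atlas = datasets.fetch_atlas_harvard_oxford("cort-maxprob-thr25-2mm")
print(len(atlas.labels), "labelled regions")  # the parcels and their names

plotting.plot_roi(atlas.maps, bg_img=template, title="Harvard-Oxford parcels")
plotting.show()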


 As noted above, parcels are both a way of reducing the complexity of the brain and a way of facilitating the comparison of one study, one brain with another. Perhaps a hundred parcels, with most of them corresponding to more or less convex patches of the thin cerebral cortex which covers most of the brain. Where by thin, we mean between one and five millimetres, with there being a good deal of variation across the brain, over time, in sickness and in health. Parcels are usually connected, but sometimes not, as in the case, for example, when a parcel corresponds to one of the brain networks, perhaps to the well-known resting state network. Parcels are usually mutually exclusive, but again, sometimes not, perhaps because one is taking a probabilistic rather than a deterministic view.

Most of what follows is derived from reference 2, the top of which is snapped above. The classification on which the paper is hung is snapped below.

Anatomical parcels

We are looking here to parcel up the grey matter of the human brain. A large part of this is to be found in the cortical mantle, which can be regarded as a highly folded, two-dimensional sheet, around 2,500 square centimetres in extent and just a few millimetres thick. But not forgetting the various structures below that, inter alia, connecting the top of the brain stem to that mantle.

In the beginning, more than a hundred years ago now, parcels were defined in terms of the microscopic structure of the layers of the cortex, of the neurons in those layers, definitions which could then be expressed in terms of the macroscopic structure of gyri and sulci. See, for example, reference 5.

Much more recently, when MRI first arrived, a trained anatomist would map a new scan onto those pre-existing parcels, with the result that one could assign every voxel in that scan to a parcel and then get the computer to analyse the new data at the level of those parcels. The direct approach at the top of the snap above.

But this was an expensive business and given the number of scans being done, something quicker was needed and registration of the new scan to a reference scan was the answer. Registration was much quicker, much easier to automate than mapping the chosen parcels onto the new scan directly.

It was then found that one got a better result – in the sense of better alignment with what an anatomist would do – by constructing one’s reference from several subjects, reducing the bias which might be introduced by odd features of the chosen subject brain. The multiple-subjects-single-reference line of the snap above.

With the last variation here being to allow different references for different parts of the new scan, taking into account its peculiarities. But all this parcellation is essentially based on anatomy, on the sort of thing that Brodmann was doing a century ago. And all grounded in something that one could see in a brain, either with the naked eye or with the help of a microscope.

The paper goes on to offer a catalogue of parcellations of this sort.

Functional parcels

The next step was to cut loose from anatomy and define parcels of grey matter in terms of the fMRI scans, in terms of what the brain was doing and of how the various parts of the brain were connected. One draw being the attractive, mathematically flavoured theory. Another being automation. You could get your clusters without having to look at the data. From where I associate to the anecdote from the 1970s, according to which the analysts at the US Department of Defence (DoD) were all for machine translation because then you got all the benefit of having Pravda (otherwise the truth or Правда) in English without the bother of having to read it.

The core of these new methods was to consider each voxel as a time series and to look at the pair-wise correlation of one voxel with another. Which allowed one either to cluster the voxels or to graph them, with both approaches giving one parcels, perhaps probabilistic rather than deterministic. What I am not yet so clear about is how one compares the clusters or graph from one scan with those from another. Or given suitable aggregation, of one group of scans with another group of scans. Comparisons which were achieved by registration in the case of anatomical parcels.
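
The first step here is easy enough to sketch in a few lines of Python, with a random matrix standing in for the real voxel time series:

import numpy as np

rng = np.random.default_rng(0)
ts_matrix = rng.standard_normal((500, 120))   # stand-in for (n_voxels, n_timepoints) of real data

corr = np.corrcoef(ts_matrix)                 # (n_voxels, n_voxels), symmetric, values in [-1, 1]
print(corr.shape, round(corr[0, 1], 3))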

The paper includes two sorts of scan, yielding two sorts of data. The first, which measures changes in the BOLD signal, is about the level of blood oxygen in the voxel in question, a measure of neural activity. The second uses DWI (diffusion weighted imaging, aka DTI for diffusion tensor imaging) to estimate the dominant direction of white matter fibres in the voxel – in the case of a fibre bundle, the direction of that bundle.

All of which can be summarised in terms of ellipsoids, prettily snapped in the plane from Wikipedia above.

Similarity between voxels usually means some measure of covariance and there is plenty of debate about exactly what sort of covariance works best in this context. And then there is the debate about whether or not to take magnitude as well as variation into account.

A lot of the time here, we are working with a large real-valued, symmetric square matrix, voxel by voxel. In this case it is easy to convert the matrix to a graph by thresholding the values of that matrix: two voxels are connected if the value is greater than some chosen threshold. Alternatively, one can take the values, or some function of those values, to be the weights of the connections, thus giving a weighted graph. Either way, the nodes of the graph are the rows (or columns) of the matrix.

This graph is not directed: there is no causation in these connections, just covariance. Although I dare say one could introduce some of that by introducing lagging into the correlations.
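
A sketch of the thresholding with the networkx library, again with random data standing in, and with the 0.3 threshold plucked out of the air:

import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
corr = np.corrcoef(rng.standard_normal((500, 120)))    # as in the sketch above
adjacency = (np.abs(corr) > 0.3).astype(int)           # arbitrary threshold
np.fill_diagonal(adjacency, 0)                         # no self-loops
G = nx.from_numpy_array(adjacency)                     # nodes are voxel indices
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
# nx.from_numpy_array(np.abs(corr)) would give the weighted version instead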

Clustering

There are lots of clustering algorithms out there, a lot of them derived from data mining. Some of them fit work on brains better than others. Most of them involve testing the quality of the clusters so far in some way.

K-means is a group of iterative methods which involve starting by guessing at the centroids of each of the desired number of clusters – with needing to know that number in advance being a downside in some contexts. It seems that algorithms of this sort also tend to favour round clusters, not particularly appropriate in this case.
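
A minimal scikit-learn sketch of the K-means version, with K fixed at ten in advance and random data standing in for the voxel time series:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
ts_matrix = rng.standard_normal((500, 120))           # (n_voxels, n_timepoints)

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(ts_matrix)
labels = km.labels_                                   # parcel index for each voxel
print(np.bincount(labels))                            # voxels per parcel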

Hierarchical clustering looks to start with every voxel being its own cluster. One then merges clusters in an iterative way until one reaches the desired number of clusters.
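
And the corresponding sketch of the hierarchical version, this time using scipy: build the tree of merges, then cut it at the desired number of clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
ts_matrix = rng.standard_normal((200, 120))           # kept small: linkage is costly on big data

Z = linkage(ts_matrix, method="ward")                 # the full merge history
labels = fcluster(Z, t=10, criterion="maxclust")      # cut the tree into 10 clusters
print(np.bincount(labels)[1:])                        # fcluster labels run from 1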

Spectral clustering works in two stages. First the raw data is taken down to a much lower dimension; the data about each voxel is compressed in some way. Possibly by something akin to principal components analysis. Second, one clusters the voxels using that reduced, compressed data.
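
Scikit-learn will also do the spectral version, given a similarity matrix – here the correlations shifted into the interval [0, 1] – with the eigenvector-based reduction and the final clustering step happening inside:

import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
corr = np.corrcoef(rng.standard_normal((300, 120)))   # voxel-by-voxel correlations
affinity = (corr + 1) / 2                             # shift into [0, 1] so it can act as a similarity

sc = SpectralClustering(n_clusters=10, affinity="precomputed", random_state=0)
labels = sc.fit_predict(affinity)                     # parcel index per voxel
print(np.bincount(labels))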

Graph

These algorithms draw on graph theory, which includes lots of work on partitioning graphs. Algorithms which have been used on brains include:

‘… Community detection algorithms search for subsets or ‘communities’ of nodes in graphs that are more strongly connected to other nodes in the community compared to nodes not included in the community. Such communities or modules provide a partitioning of the graph. Community detection algorithms can estimate the optimal number of communities…’.

‘… The graph cut algorithm formulates the parcellation problem as an energy optimization problem. An energy function is defined by the researcher that captures the parcellation objective, which is maximizing the similarity of voxels grouped together. Other desired properties for the parcellation can be achieved by incorporating appropriate terms into the energy function…’.

With the two quotes above being taken from Table 2 of the paper.
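
By way of illustration of the first of these, a sketch of community detection by greedy modularity maximisation, using networkx on the sort of thresholded graph built above. Here the number of communities comes out of the algorithm rather than being fixed in advance:

import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(0)
corr = np.corrcoef(rng.standard_normal((300, 120)))
adjacency = (np.abs(corr) > 0.2).astype(int)          # arbitrary threshold
np.fill_diagonal(adjacency, 0)
G = nx.from_numpy_array(adjacency)

communities = greedy_modularity_communities(G)        # list of sets of nodes
print(len(communities), "communities; largest has", max(len(c) for c in communities), "voxels")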

Statistical

I failed to make much sense of this section until I was reminded of reference 7, which included an accessible introduction to both K-means and mixture models. With one of the examples that Bishop uses being the clustering of colour values in an image in the context of lossy image compression. Another being the timings of the eruptions of the well-known geyser ‘Old Faithful’, expressed as a two-dimensional plot of duration of eruption by time since last eruption, both expressed in minutes. A plot which has two, possibly three, clusters.

I think that in the present context, the data used by these methods are the time series associated with each of a large number of voxels. I don’t think that this is supplemented with data about the voxel, such as its position in the brain or its coordinates in the scan.

I find that one of the drivers for all this is the need for accessible ways to describe complex probability distributions. Not all distributions can be adequately approximated by a single normal (Gaussian) distribution, but one can do a lot better if one is allowed to sum a number of such distributions, rather in the way of a Fourier transform. One can have a complex distribution, while retaining its analytic tractability; one can have one’s cake and eat it.

This then links to mixture models, where one’s data is supposed to come from some number (often K) of distributions, all drawn from some well-known family of distributions. The idea is to estimate the parameters of those distributions, then assign each data point to one or other of them, thus arriving at one’s K clusters. Going further, one can apply a quality measure to those clusters, and maximise that measure over K. You get to know how many clusters there are and the assignment of data points to those clusters.

In this context the family of von Mises–Fisher distributions – probability distributions on the perimeter of a circle or the surface of a sphere, invented about a hundred years ago to investigate the clustering of different measurements of the same atomic weight – has proved useful and there are lots of vMF mixture models. Models which are likely to make use of the expectation-maximization (EM) algorithm. Maximisation over K is likely to make use of the Bayesian information criterion (BIC) in the energy function.
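
Scikit-learn does not offer a von Mises–Fisher mixture, but a Gaussian mixture will serve to illustrate the general pattern: fit the model for a range of K, pick the K with the lowest BIC, then read off the cluster assignments:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 1, (100, 5)),          # toy data with two lumps
                  rng.normal(4, 1, (100, 5))])

best_k, best_bic, best_model = None, np.inf, None
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(data)
    bic = gm.bic(data)                                 # lower is better
    if bic < best_bic:
        best_k, best_bic, best_model = k, bic, gm

labels = best_model.predict(data)                      # cluster for each data point
print("chosen K =", best_k, "; cluster sizes:", np.bincount(labels))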

At first, I thought it odd to model the behaviour in time of voxels as a probability distribution across voxels. But now, maybe not. Given the complexity of the brain, with its thousands of millions of neurons, maybe this strikes a useful balance between determinism and probability.

What is called dictionary learning seems to be a variation where each data point is expressed as a weighted sum of K basis functions. The largest weight is taken to indicate to which of the K clusters the data point (a time series) should be assigned.
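
A sketch of that idea using the DictionaryLearning class from scikit-learn, once again with random data standing in for the real voxel time series:

import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
ts_matrix = rng.standard_normal((200, 120))            # (n_voxels, n_timepoints)

dl = DictionaryLearning(n_components=10, random_state=0, max_iter=200)
weights = dl.fit_transform(ts_matrix)                  # (n_voxels, K) weights on the K basis series
labels = np.abs(weights).argmax(axis=1)                # dominant basis for each voxel
print(np.bincount(labels, minlength=10))               # random data, so expect little structure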

Surface

‘… Edge detection algorithms [are] based on the observation that similarity in connectivity between an arbitrary seed voxel and all other voxels does not change smoothly and has sharp ‘transition zones’ over the cortical surface…’. This can be used to make boundaries and parcels.

On this account, these algorithms do not need to know the position or coordinates of the voxels: their mutual connectivity is enough.
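
The principle can be illustrated on a flat toy map – the real methods work on the folded cortical surface, so this is illustration only. Compute the spatial gradient of a similarity map and call the places where that gradient is large boundaries:

import numpy as np

rng = np.random.default_rng(0)
similarity_map = np.kron(rng.random((4, 4)), np.ones((16, 16)))   # blocky 64 x 64 toy map

gy, gx = np.gradient(similarity_map)                   # spatial rates of change
gradient_magnitude = np.hypot(gx, gy)
boundaries = gradient_magnitude > 0.1                  # arbitrary threshold
print(boundaries.sum(), "boundary pixels out of", boundaries.size)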

I supplemented the short account of region growing with the helpful paper at reference 9, with supplementary material available for the keen. Another complicated pipeline: pre-processing, stability map, seed selection from among the local stability maxima, region growing from the seeds, hierarchical parcel aggregation from the regions. Quite a lot of work went into choosing among the many algorithms available for each of these steps.

I also learned, possibly not for the first time, about Statistical Parametric Mapping. Some history of which is to be found at reference 10. From which I learn that the SPM acronym was, inter alia, a nod to the Significance Probability Mapping which came before, from the world of EEG. But perhaps more important, I learn how much statistics has been poured into this work – and I imagine that most of us are going to have to take the statistical answers on trust.

Comment

I wonder about the extent to which this classification of clustering methods derives from their different origins, in different branches of science, rather than from the methods being all that different in themselves.

The von Mises–Fisher distributions are examples of probability distributions of random variables defined on a circle, rather than on the real line. From where I associate to the investigations of Winfree, noticed here from time to time, most recently at reference 8.

Diffusion

This paper covers the diffusion scan based clustering along with the functional, but to my mind it is rather more anatomical, more based on the facts on the ground than is the BOLD-based clustering, which is at quite some remove. The difference being that the anatomical clustering is based on cortical macro and micro structure, while this clustering is based on the wiring diagram, on the white matter rather than the grey matter, on the axons rather than the neural cell bodies.

Furthermore, one is doing the clustering entirely on the computer, rather than relying on registration and an anatomist.

Other considerations

I continue to worry about the huge amount of processing which is going on in the background to this imaging. What one sees on the screen is a very long way from what one gets from looking at a slice of frozen brain under a microscope. And I continue to worry about the amount of knowledge required to understand all that is going on in the background. I imagine that successful teams have to build trust in their collective knowledge and team members have to be content, to some large extent, within their own specialities.

It is easy to spend quality time on new and better parcellations, perhaps better suited to the immediate task at hand. But it needs to be remembered that a good part of the point of a parcellation is that everyone else is using it, that we are all singing from the same song sheet. So the best parcellation for some particular task may not be the best parcellation for the endeavour as a whole. I associate to the business of standards for the coding of home video tapes, where the second-best candidate became the standard. Which might make a nice story in the present context, but which a quick look at reference 6 suggests is rather oversimplified!

The production of volumes of data at the level of millimetre sized voxels is very seductive. But all is not as tidy and homogeneous as might at first appear: the quality of this data varies a good deal across a volume. Some parts of the brain image better than other parts. Imaging is affected by the presence of bone or of blood vessels. And I dare say the list goes on. One tries to tidy the data up with pre-processing, but there is only so much that one can do.

A thought experiment about listening to an orchestra. All the instruments are playing in time and the corresponding electrical signals are dominated by that beat. So we have a clump of neurons, holding information about all these different instruments, made into a parcel because their signals appear to be highly correlated. But if we are taking an interest in orchestras in the brain, would we want all the instruments to be clumped together in one parcel?

Conclusions

This tutorial post complements that offered a few years back at reference 4.

Which being out of the way, clears the road back to the sleeping patterns and brain connectivity from where this rather lengthy digression kicked off.

References

Reference 1: https://dartbrains.org/content/Parcellations.html

Reference 2: A Review on MR Based Human Brain Parcellation Methods – Pantea Moghimi, Anh The Dang, Theoden Netoff, Kelvin Lim, Gowtham Atluri – 2021. Floraoddmenu_758.

Reference 3: Functional MRI: an introduction to methods – Jezzard P, Matthews P, Smith S, editors – 2001. Book_134. Some presently useful introductory material is to be found in chapter 11.

Reference 4: https://psmv4.blogspot.com/2021/07/teach-myself-all-about-fmri.html

Reference 5: https://en.wikipedia.org/wiki/Brodmann_area

Reference 6: https://en.wikipedia.org/wiki/Videotape_format_war

Reference 7: Pattern Recognition and Machine Learning – Christopher M Bishop – 2006.

Reference 8: https://psmv5.blogspot.com/2024/02/an-invisible-fingerprint.html

Reference 9: Spatially Constrained Hierarchical Parcellation of the Brain with Resting-State fMRI – Blumensath, Thomas, Saad Jbabdi, Matthew F. Glasser, David C. Van Essen, Kamil Ugurbil, Timothy E. J. Behrens, Stephen M. Smith – 2013.

Reference 10: https://www.fil.ion.ucl.ac.uk/spm/doc/history.html

Reference 11: https://www.neuron.yale.edu/neuron/

Reference 12: https://www.brainmap.org/training/BrettTransform.html. Quite old now, but hopefully gives the right idea about reference brains.
