Monday 6 May 2024

Outgroups

I have been doing a bit of botanising on my walks around Epsom, with the result of my making an excursion into the world of plant systematics, the business of organising plants into classifications and trees. A big business, given that there are maybe 250,000 species of flowering plants alone, organised into maybe 12,500 genera. With species being given binomial names, so for the buttercup we have ‘Ranunculus repens’ where Ranunculus is the name of the genus in which the buttercup is to be found and repens is the name of the species within the genus. Binomial names are supposed to be unique and are regulated by both national and international organisations. The snap above, a bit animalist, gives the general idea.

An excursion which almost started as a result of the Raynes Park haul noticed at reference 14, a haul of which just Bentham & Hooker survive, with Davis & Heywood’s ‘Principles of Angiosperm Taxonomy’ having been culled. As it turns out, a pity – although if push comes to shove, Abebooks will do me a hardback for a tenner plus postage.

Then there is a slightly related matter of the number of segments in an orange, often ten, and whether that bears any relation to the number of petals of the flower, which is five. Gemini was quite firm that it did not, but I am not so sure and am poking about at something called merosity, for which see reference 15. But more about that another time.

Plant systematics has been around for at least a couple of thousand years and many eminent people have put their shoulder to this particular wheel. With an important thread in current work being something called cladistics – this despite some older botanists – for example Thorne of references 3 and 4 below – being rather irritated by the zeal and dogmatism of some of the earlier cladists, perhaps more interested in theory than in plants. A bit too keen on spreadsheets – a trait that I can relate to.

I find the language in general and the vocabulary in particular deployed by systematists rather daunting and got stuck on ‘outgroup’ – until I came across the helpful tutorial at reference 9. The search for her was not without interest, but I dare say I would have got there a lot quicker with a bit of expert help. There is a lot to be said for regular textbooks and regular tuition – but, sadly, I can’t manage tuition on YouTube, however good it may have got. So what follows is what I have now learned about outgroups.

With most of the stuff referenced below to be had for free from the Internet. Much of it only suitable for specialists, although the introductory material can be helpful, and some of it is mainly taken up with long lists.

Cladism

Cladistics is all about building trees of life, or of chunks of life, which reflect history, how species come to be. Not so much ‘is this species like that species’, in the way that a dolphin is like a shark, but where did the dolphin come from, which is not quite the same thing, and certainly not where the shark came from. The dolphin being a mammal and the shark not even being a bony fish, although as a cartilaginous fish it does count as a vertebrate, and so as a chordate. A discipline which might be regarded as having been invented by the author of reference 12, a chap who now has the society of reference 20 named in his honour.

In this, there is an underlying belief that there is a complete tree of life, starting at the beginning – if not right at the very beginning, then perhaps with the eukaryotes – and ending with all the species in the world, such that any tree of life that we might draw up will, give or take a bit of error and confusion here and there, be a sub-tree of that master tree. Reference 6 is a rather impressive attempt at same.

Each node of the tree defines a monophyletic clade, that is to say all the descendants of some one, putative ancestor. Polyphyletic and paraphyletic groups are not allowed and I have not noticed much in the way of cross-cutting hybrids.
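A small sketch of what monophyletic means in practice, on a made-up tree of my own: the names root, n1, n2, W, X, Y and Z are hypothetical, as are the two helper functions. A set of leaves counts as monophyletic here if some one node has exactly that set as its leaf descendants.

```python
# Hypothetical rooted tree: each parent maps to its children; leaves have no entry.
TREE = {
    "root": ["X", "n1"],
    "n1": ["n2", "W"],
    "n2": ["Y", "Z"],
}

def leaves_under(tree, node):
    """Return the set of leaves descended from node (a leaf returns itself)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found = set()
    for child in children:
        found |= leaves_under(tree, child)
    return found

def is_monophyletic(tree, taxa):
    """True if some node has exactly the given taxa as its leaf descendants."""
    taxa = set(taxa)
    nodes = set(tree) | {c for kids in tree.values() for c in kids}
    return any(leaves_under(tree, n) == taxa for n in nodes)

print(is_monophyletic(TREE, {"Y", "Z"}))  # True: everything under n2
print(is_monophyletic(TREE, {"Y", "W"}))  # False: leaves out Z, also under n1
```

The second case is the sort of grouping that the cladists do not allow: Y and W share the ancestor n1, but so does Z.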

There is also a belief in what is called parsimony, that evolution likes to keep things simple and that, given a number of trees satisfying the data to hand, the right tree will be the simplest one, the one involving the smallest amount of change as you move from the root to the leaves. Where the trees in question have just the one root (or trunk) and lots of leaves. Where change is not always forward to some new state; reversion to some prior state is also allowed. Cladistics sometimes seems to reduce to devising clever minimisation algorithms – less of a business with modern computers now than it was when all this kicked off back in the 1960s.
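To give a flavour of the counting involved, here is a sketch of one standard way of scoring a given tree, usually credited to Fitch, which counts the smallest number of state changes needed for each character on a rooted, binary tree. Nothing above says which method any particular program uses, so this is my choice of illustration. The two trees and the little data matrix are invented.

```python
# Two hypothetical rooted binary trees over the same four taxa.
TREE_1 = {"root": ["p", "q"], "p": ["W", "X"], "q": ["Y", "Z"]}
TREE_2 = {"root": ["p", "q"], "p": ["W", "Y"], "q": ["X", "Z"]}
# Hypothetical data: two binary characters per taxon.
MATRIX = {"W": "00", "X": "10", "Y": "11", "Z": "11"}

def fitch_changes(tree, states, node):
    """Return (possible states, minimum changes) for the subtree at node."""
    children = tree.get(node, [])
    if not children:
        return {states[node]}, 0
    left_set, left_changes = fitch_changes(tree, states, children[0])
    right_set, right_changes = fitch_changes(tree, states, children[1])
    common = left_set & right_set
    if common:
        # The children can agree, so no extra change is needed here.
        return common, left_changes + right_changes
    # The children disagree, which costs one change at this node.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, matrix):
    """Sum the minimum number of changes over all characters."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_changes(tree, states, "root")[1]
    return total

print(tree_length(TREE_1, MATRIX))  # 2
print(tree_length(TREE_2, MATRIX))  # 3
```

On this data parsimony prefers the first tree, which needs two changes against three for the second.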

Flat files

I now introduce flat files, as it happens pretty much my own starting point as a wannabee government statistician. The flat files which became the worksheets of Microsoft’s Excel; rectangular arrays of data where the rows are usually somethings and the columns are the properties of those somethings. Properties which range from the number of legs, through the colour of eyes to the value of the AZC_3892 gene – as it happens, a real gene if I have got the jargon right. For present purposes, hopefully without loss of generality, we stick to binary properties which take just two values: present or absent, true or false.

So from reference 11, which offers a cladistic analysis of 35 eukaryotic taxa – that is to say, excluding large plants and large animals – we have the following.

Most of the somethings, I think a mixture of phyla, classes and orders, the middle bands in the snap at the top. Bearing in mind that this paper is forty years old and things have probably moved on a bit. A job that is never finished.

Some of the properties.

Some of the columns. This particular table being transposed with the somethings across the page, the properties down the page.

In this, the convention is that all our characters take one of two states, a or b, often 0 for the primitive state, 1 for the derived state. There may also be provision for data which is missing for one reason or another, in this table ‘?’.
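For those who like to see such a table in a form a computer can use, a minimal sketch follows. The taxa and their states are invented and are not taken from reference 11. Rows are the somethings, each position in a row is a character, and ‘?’ stands for missing.

```python
MISSING = "?"
# Hypothetical taxa and characters, not taken from reference 11.
MATRIX = {
    "taxon_1": "01?0",
    "taxon_2": "0110",
    "taxon_3": "1?01",
}

def transpose(matrix):
    """Turn taxon -> row of states into a list of columns, one per character."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    return [{t: matrix[t][i] for t in taxa} for i in range(n_chars)]

def state_counts(column):
    """Count the 0s, 1s and missing entries in one character column."""
    counts = {"0": 0, "1": 0, MISSING: 0}
    for state in column.values():
        counts[state] += 1
    return counts

columns = transpose(MATRIX)
print(state_counts(columns[1]))  # {'0': 0, '1': 2, '?': 1}
```

Transposing gives the other orientation, with the properties down the page.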

There are then lots of more or less publicly available computer programs which can turn such data into trees, or at least networks. A business which is related to cluster analysis, which cropped up in the excursion reported at reference 16.

Networks and trees


We start with some motivation.

In panel 1, in the figure above, we have a network constructed from the six purple nodes with their four shared properties.

Nodes A and B are not distinguished by these properties, so we join them up with an additional node a. Ditto nodes D and E with node b.

Then, property 1 separates node C from node a.

Then, node C is two steps away from node F, so we need a node c between them, using properties 2 and 3.

Then, property 4 separates node b from node c.

Giving us a minimal if not minimum network connecting up our six nodes. 

In panel 2, we redesignate node F as the outgroup. We hypothesise that the outgroup provides us with the primitive states of our four properties. The outgroup acts as a present proxy for the now absent ancestral node, at least in so far as these four properties are concerned.

Then, moving out into the network from the outgroup, now serving as the root of our tree, we can give direction to all our edges, converting our network into a tree. This procedure is well defined because our network is minimal, without redundancy or cycles. Note that one of the remaining purple nodes is not terminal, does not include all its descendants and is not monophyletic – something which I feel is not quite right – and is changed in a second version below. A change which results in a bigger score, for which see below.

In panel 3, we turn things around a bit, giving our tree in more normal orientation.

Thus illustrating how an outgroup can be used to transform our flat, static data into an evolutionary tree.
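The rooting step can be sketched in a few lines. The edge list below is my reading of the description of panel 1 in the text (A and B joined at a, D and E joined at b, C between a and c, with F and b both hanging off c), so it should be treated as an assumption rather than a transcription of the figure. The function name root_at is made up.

```python
from collections import deque

# Undirected edges as read from the description of panel 1; an assumption.
EDGES = [("A", "a"), ("B", "a"), ("a", "C"), ("C", "c"),
         ("c", "F"), ("c", "b"), ("b", "D"), ("b", "E")]

def root_at(edges, outgroup):
    """Direct every edge away from the outgroup; return parent -> children."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    children = {outgroup: []}
    queue = deque([outgroup])
    while queue:
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in children:  # not yet reached from the root
                children[other] = []
                children[node].append(other)
                queue.append(other)
    return children

tree = root_at(EDGES, "F")
print(tree["F"])  # ['c']
print(tree["C"])  # ['a']
```

Because the network has no cycles, every node is reached by exactly one route from F, so every edge gets exactly one direction. Note that C comes out with a child, which is the non-terminal purple node mentioned above.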

Our network rules

We now provide a bit of formality, the sort of thing that computers like if they are to help us. 

In what follows, we use the diagramming convention suggested in the figure above. A very small part of the tree of life, with evolution running up the page.

Note: there are lots of ways in which one could do all this. It is more a question of what seems to work, rather than what is right. But it does seem right, nevertheless, that any set of rules should be well defined and consistently worked through. 

We suppose we have a collection of binary properties, in the first instance taking the value ‘a’ or ‘b’. Allowing more values would complicate the argument without changing what is important for present purposes.

A node is some setting of the property values: all the individuals at that node take those values for those properties. And we allow distinct nodes with the same values; nodes which are not resolved by the properties we have chosen. A node may be as small as a species, roughly speaking a group of sexed individuals capable of interbreeding. All our individuals are assumed to have sex, and to reproduce in that way, but sex is not considered to be a property for present purposes. An assumption which works better for animals than it does for plants.

Some nodes are real taxons – some well-defined group of taxonomic interest – in the here and now. These are given black edges in the figure above. Other nodes are ancestral taxons, taxons which might have existed once, but which do not exist now. So in the example above, we have the ancestral mammals, all with five fingers – while now we allow mammals with one finger. A slightly awkward distinction which reflects the tension between a static system of classification and evolutionary change, the desire to capture both views in the one diagram.

Our ingroup is a set of such real taxons, the mutually exclusive set of nodes which we wish to organise in time from an evolutionary point of view. 

Our outgroup is another real taxon which is apart from, is separated from, the ingroup. The choice of an appropriate outgroup for the work in hand is important as the outgroup drives the distinction between primitive and derived values of properties and can be used to impose direction, that is to say time, on any undirected network that we might make of the ingroup.

Rule: for each property, there is at least one node in the ingroup which takes the same value for that property as the outgroup. The ingroup spans the outgroup in that sense. There is also at least one node which takes some other value. The property is interesting in that sense.
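This rule is easy enough to check by machine. A sketch, with an invented outgroup and ingroup and a made-up function name, which returns the positions of any properties that break the rule, either because no ingroup node agrees with the outgroup or because none differs from it.

```python
# Hypothetical data: four properties, each taking the value 'a' or 'b'.
OUTGROUP = "aabb"
INGROUP = {"P": "abab", "Q": "bbbb", "R": "abbb"}

def check_rule(ingroup, outgroup):
    """Return the indices of the properties that break the rule."""
    bad = []
    for i, out_value in enumerate(outgroup):
        values = [row[i] for row in ingroup.values()]
        shares = any(v == out_value for v in values)   # spans the outgroup
        differs = any(v != out_value for v in values)  # is interesting
        if not (shares and differs):
            bad.append(i)
    return bad

print(check_rule(INGROUP, OUTGROUP))  # [1, 3]
```

Counting from zero, property 1 fails because nothing in the ingroup agrees with the outgroup, and property 3 fails because everything does.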

An edge links two nodes which differ in their value of at most one property. At this point, the edge is neutral as regards order in time, as regards which node is the precursor of the other, which node is a mutation of the other.
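A sketch of the edge condition, with nodes written as strings of property values. The function names are my own. The second function builds one possible chain of intermediate nodes when two nodes differ in more than one property, changing the differing properties in left to right order, which is just one of the orders available.

```python
def differences(node_1, node_2):
    """Count the properties on which two nodes take different values."""
    assert len(node_1) == len(node_2), "nodes must cover the same properties"
    return sum(1 for x, y in zip(node_1, node_2) if x != y)

def may_link(node_1, node_2):
    """An edge is allowed when the nodes differ in at most one property."""
    return differences(node_1, node_2) <= 1

def path_between(node_1, node_2):
    """One possible chain of single-property steps from node_1 to node_2."""
    path = [node_1]
    current = list(node_1)
    for i, target in enumerate(node_2):
        if current[i] != target:
            current[i] = target
            path.append("".join(current))
    return path

print(may_link("0000", "0100"))      # True
print(may_link("0000", "0110"))      # False
print(path_between("0000", "0110"))  # ['0000', '0100', '0110']
```

The last line is the situation of node C and node F in the motivating example: two steps apart, so an extra node is needed between them.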

Definition: a group is a minimal, connected, acyclic network of nodes and edges which spans the nodes of the ingroup and the outgroup. A node of the ingroup is connected to exactly one edge. A group will usually include nodes additional to the outgroup and to those already present in the ingroup. Such nodes will be used, inter alia, to group any duplicates there may be. A group will also include a node which corresponds to the ingroup, in the sense that the nodes of the ingroup are all descendants.

The outgroup is supposed to be primitive, with its values of the properties being the primitive values. For these purposes the outgroup is conflated with the most recent common ancestor. The other values are called the derived values. Conventionally, the primitive values are denoted by ‘0’, the derived values by ‘1’. Rather than the ‘a’ and ‘b’ above.
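The recoding from ‘a’ and ‘b’ to ‘0’ and ‘1’ can be sketched as follows, taking whatever value the outgroup has as primitive. The data and the name polarise are invented for the purpose.

```python
# Hypothetical data in 'a'/'b' coding; OUT is the outgroup.
RAW = {"OUT": "abba", "P": "abab", "Q": "bbba", "R": "aaab"}

def polarise(matrix, outgroup):
    """Recode each value as '0' if it matches the outgroup, '1' otherwise."""
    reference = matrix[outgroup]
    return {
        name: "".join("0" if v == ref else "1" for v, ref in zip(row, reference))
        for name, row in matrix.items()
    }

coded = polarise(RAW, "OUT")
print(coded["OUT"])  # 0000
print(coded["R"])    # 0111
```

The outgroup always comes out as all zeros, which is just the assumption that it is primitive throughout. A different choice of outgroup will generally give a different coding.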

Spanning the ingroup from the outgroup converts the group from a network to a tree. The edges are no longer neutral in time. In which spanning we do not exclude a value reverting from 1 to 0, the derived value reverting to the primitive value. The outgroup is the root of the tree and the nodes of the ingroup are its leaves.

The value of a group is the number of edges that it has. 

The task is to find the group with the smallest value. The rule of parsimony says that this is most likely what happened, this is how the ingroup evolved from its ancestor.

This minimisation will usually involve ties, so we need to add some rule for breaking ties.
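A sketch of the scoring and of the tie problem. The three candidate groups are invented edge lists over an outgroup O and an ingroup P, Q and R, with x, y and z as additional nodes. I have not checked them against any property data. The function returns all the winners rather than picking one, since the tie-breaking rule is left open above.

```python
# Hypothetical candidate groups, each given as a list of undirected edges.
CANDIDATES = {
    "g1": [("O", "x"), ("x", "P"), ("x", "y"), ("y", "Q"), ("y", "R")],
    "g2": [("O", "x"), ("x", "Q"), ("x", "y"), ("y", "P"), ("y", "R")],
    "g3": [("O", "x"), ("x", "y"), ("y", "P"), ("y", "z"), ("z", "Q"), ("z", "R")],
}

def value(group_edges):
    """The value of a group is the number of edges that it has."""
    return len(group_edges)

def cheapest(candidates):
    """Return (smallest value, names of all candidates achieving it)."""
    best = min(value(edges) for edges in candidates.values())
    winners = sorted(n for n, edges in candidates.items() if value(edges) == best)
    return best, winners

print(cheapest(CANDIDATES))  # (5, ['g1', 'g2'])
```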

Some commentary

Given the putative and unique tree of universal life, both the ingroup and the outgroup can be mapped onto it, giving us a unique common ancestor. There is a sub-tree which takes in both ingroup and outgroup. There is a point of attachment on the ingroup network to the tree, otherwise its root.

And in the absence of such a tree of life, a bit of minimisation over the possible points of attachment hopefully gives us the right one. Our sub-tree now describes, to the extent that the property data provided allows, the evolution of the ingroup.

Choice of outgroup is important, and different outgroups can give different answers. One way of testing all this is to run the process with a number of outgroups to see if they all give more or less the same answer for the ingroup.

Another issue is contradictory data, contradiction which may or may not be the result of error. One property may pull one way, another property another. The minimisation rules should cope here: one just makes the best available compromise, the one giving the minimum value.
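One simple way of spotting properties which pull in different directions, not something used above but a standard pairwise test for binary properties: two properties can both be fitted to some tree with one change apiece unless all four combinations of values turn up among the nodes. A sketch with invented columns, each listing the values for the outgroup followed by four ingroup nodes.

```python
# Hypothetical columns: values for OUT, P, Q, R, S in that order.
CHAR_1 = "01100"
CHAR_2 = "00110"
CHAR_3 = "01110"

def compatible(column_1, column_2):
    """Two binary characters conflict only if all four state pairs occur."""
    pairs = {(x, y) for x, y in zip(column_1, column_2) if "?" not in (x, y)}
    return len(pairs) < 4

print(compatible(CHAR_1, CHAR_2))  # False: 00, 10, 11 and 01 all occur
print(compatible(CHAR_1, CHAR_3))  # True
```

Where a pair fails the test, at least one of the two properties must change more than once on any tree, and it is then down to the minimisation to decide which.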

On another tack, I have been impressed by how difficult I found it to think about all this sort of thing while lying in bed, even with my eyes shut. I seemed to get on much better when I could think with my fingers – either tapping away on the keyboard or doodling on a piece of paper. Is it that working memory needs this sort of support if there is to be coherent, helpful thought or what? I associate now to a mathematics teacher who once explained that the great thing about mathematics was that you could do it while sitting on the beach with the children. All you needed was a stick and a bit of sand. The present point being that he needed the stick and the bit of sand.

Rather late in the day, taking a look at reference 7, I came across reference 17, a substantial looking book, with a title which appears to take an even more philosophical tone. Perhaps it is not a coincidence that one of the authors of reference 7 comes from the US museum of natural history while the author of reference 17 comes from the UK one. I resisted buying a copy of this last, despite there being one on eBay going for what looked like half the going rate. Settled for his more popular book on venom instead.

But I did learn of morphocline from reference 7, an earlier take on primitive and derived features, from where I associated to reference 18, a relic of a quite different time. Outgroups in one form or another have been the subject of taxonomic controversy for a long time.

While according to Wikipedia, the journalist author of the quote at the bottom of the snap above: ‘… Thompson was known for his lifelong use of alcohol and illegal drugs, his love of firearms, and his iconoclastic contempt for authority. He often remarked: ‘I hate to advocate drugs, alcohol, violence, or insanity to anyone, but they've always worked for me’…’. Rum sort of person to be mixed up with.

Conclusions

Work in progress. But at least I think I have got a grip of sorts on what outgroups are for. Four days it has taken me!

Which did not include resolving to my complete satisfaction the tension between a static classification of what we have now and the dynamic story of how we got there. More work in progress.

Gemini

As an afterthought, I tried asking Gemini what outgroups were for in cladism. His answer was pretty good – now that I know the answer. So I think that, had I read this before, I would have had the same trouble with it as I had above. I had to work through more of the details before I was comfortable with it. Rather as, in mathematics, it is no good just reading the words in the textbook, you have to do the exercises for the words to acquire real meaning, to acquire some life.

Parsimony seemed to be missing here, so I tried again with the following result.

Only very slightly spoiled by not understanding the colloquial form 'getting all mixed up with'. No sense of humour!

I pushed on with an appropriate outgroup for looking at the number of fingers in mammals, often but not always five. Here the first answer was pretty good and his responses to supplementaries helped to round things out a bit.

In sum, answers good, but no substitute for getting one’s own brain to do some work, particularly on the details of how the cladistic trees are built from the raw data.

References

Reference 1: Guide to flowering plant families - Wendy B. Zomlefer - 1994. University of North Carolina Press, Chapel Hill.

Reference 2: https://psmv5.blogspot.com/2024/04/senior-moments.html

Reference 3: Classification and geography of the flowering plants – Thorne R F – 1992.

Reference 4: An updated phylogenetic classification of the flowering plants – Robert F. Thorne – 1992. No freebie available.

Reference 5: An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG II – The Angiosperm Phylogeny Group – 2002. 

Reference 6: The tree of life: a phylogenetic classification – Lecointre and Le Guyader – 2006. 

Reference 7: On outgroups - Kevin C Nixon, James M Carpenter – 1993. 

Reference 8: Outgroup analysis and parsimony - Wayne P Maddison, Michael J Donoghue - 1984. 

Reference 9: Basics of cladistic analysis – Diana Lipscomb – 1998. 

Reference 10: https://biology.columbian.gwu.edu/weintraub-program. Lipscomb’s home at the time of writing reference 9.

Reference 11: The eukaryotic kingdoms - Diana Lipscomb – 1985. 

Reference 12: Phylogenetic systematics – Hennig, W – 1966. Available for browsing at https://archive.org/details/hennig-phylogenetic-systematics-1966

Reference 13: Phylogenetic systematics: A concise guide on the theory and practise – Hasan H. Basibuyuk, Robert Belshaw, Fevzi Bardakci, Donald L. J. Quicke – 2015. 

Reference 14: https://psmv5.blogspot.com/2021/08/cheese-time-again.html

Reference 15: https://en.wikipedia.org/wiki/Merosity.

Reference 16: https://psmv5.blogspot.com/2024/04/parcellation-theory.html

Reference 17: Ancestors in evolutionary biology: Linear thinking about branching trees – Ronald A Jenner – 2022.

Reference 18: The hand: Its mechanism and vital endowments, as evincing design, and illustrating the power, wisdom and goodness of God – Sir Charles Bell, KGH, FRS L&E etc – 1874.

Reference 19: https://psmv4.blogspot.com/2019/08/darwin.html. Previous notice of reference 18.

Reference 20: https://cladistics.org/.
