Friday, 17 March 2023

Matrix visualisation for economists

A few days – or perhaps weeks – ago, I was trying to make sense of reference 1. This led to reference 2, where puzzling about the point of maximal spanning trees led me to reference 3, which turned out to be quite helpful.

An important ingredient of references 1 and 2 is parcelling up the surface of the human brain into (say) 100 or even 1,000 parcels, using fMRI to measure the activity in all those parcels and then producing something like a square correlation (or covariance) matrix, where the number of rows and columns is the same as the number of parcels. One might, as in reference 1, then go on to look at how this matrix varies with time. There one is interested in changes from minute to minute, but one might well be interested in longer time frames, perhaps to demonstrate that such matrices can be used to identify individuals.

Reference 3, together with the supplementary material, references 4, 5 and 6, as it turns out, has been both interesting and helpful. Providing a useful bit of support for the impending return to reference 2.

We start reference 3 with the thought that the development and evolution of the countries of the world is driven, at least in part, by the pattern of their external trade, in particular their exports.


The dataset is that documented at reference 6 and describes the exports of 72 reporting countries in the period 1962-2000 in terms of the SITC classification of traded goods. The present work uses data from 1998-2000 for the 750 or so significantly exported products of the 1,000 or so products identified by the four digit version, the sub-groups, of this classification. A very small part of which classification is snapped above and which reminds me of the roughly contemporary CODOT classification of occupations, noticed from time to time in these pages. See, for example, reference 7. Both being fascinating classifications which must have provided work for statisticians for years.


Using something economists call revealed comparative advantage (RCA), this data can be converted into a square matrix, with around 1,000 rows and columns, the cells of which describe the extent to which countries export pairs of products in a significant way. If a country exports wheat, how likely is it that it also exports coal? I have tried to encapsulate this conversion in the snap above, drawn from references 3 and 4.

The cells are called proximities, varying between 0 and 1. Slightly confusingly given the name, taking low values for dissimilar pairs of products, high values for pairs of similar products. Metrics and distances usually being the other way around. On the other hand, high values of proximity do go with high values for the weights of links, important when we move to looking at the matrix as a graph. And while correlations might vary from -1 to +1, rather than from 0 to 1, at least 0 means uncorrelated, that is to say dissimilar.
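By way of illustration, a minimal sketch of this conversion as I read references 3 and 4. RCA is a country's share of its own exports taken up by a product, divided by that product's share of world exports. The proximity of two products is then the smaller of the two conditional probabilities that a country exporting one significantly also exports the other significantly. The function names, the layout of the `exports` table and the cut-off of 1 for 'significantly' are my assumptions for the sketch, not something lifted from the papers.

```python
def rca_matrix(exports):
    """exports[c][p] is the value of product p exported by country c
    (hypothetical layout). Assumes no country or product total is zero."""
    n_c, n_p = len(exports), len(exports[0])
    country_total = [sum(row) for row in exports]
    product_total = [sum(exports[c][p] for c in range(n_c)) for p in range(n_p)]
    world_total = sum(country_total)
    return [[(exports[c][p] / country_total[c]) / (product_total[p] / world_total)
             for p in range(n_p)] for c in range(n_c)]


def proximity_matrix(exports, threshold=1.0):
    """Proximity of products i and j: min of P(i | j) and P(j | i), where
    'exporting significantly' is taken to mean RCA >= threshold (assumed)."""
    rca = rca_matrix(exports)
    n_c, n_p = len(exports), len(exports[0])
    strong = [[rca[c][p] >= threshold for p in range(n_p)] for c in range(n_c)]
    count = [sum(strong[c][p] for c in range(n_c)) for p in range(n_p)]
    phi = [[0.0] * n_p for _ in range(n_p)]
    for i in range(n_p):
        for j in range(n_p):
            both = sum(1 for c in range(n_c) if strong[c][i] and strong[c][j])
            # Dividing by the larger count gives the smaller conditional probability.
            denom = max(count[i], count[j])
            phi[i][j] = both / denom if denom else 0.0
    return phi


# Toy data: three countries, three products (wheat, coal, cars).
toy_exports = [[10, 10, 0],
               [10, 0, 10],
               [0, 0, 10]]
toy_phi = proximity_matrix(toy_exports)
```

In the toy data, wheat and coal come out with a proximity of one half, because only one of the two significant wheat exporters is also a significant coal exporter, while coal and cars come out at zero.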

Conceptually all rather more complicated than the fMRI with which we started, where the corresponding cells, the corresponding proximities, are simply the correlations between two signals, one for the corresponding row, the other for the column. But at least the result is the same sort of thing.
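For comparison, a minimal sketch of the fMRI case, assuming that each parcel has been reduced to a list of activity values, one per time point. The names `pearson` and `correlation_matrix` and the layout of `signals` are made up for the purpose, and real pipelines do a good deal of cleaning up first.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length signals (neither constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(signals):
    """signals[p] is the time series for parcel p (hypothetical layout).
    Returns a square matrix with one row and one column per parcel."""
    n = len(signals)
    return [[pearson(signals[i], signals[j]) for j in range(n)] for i in range(n)]


# Three toy parcels: the second tracks the first, the third runs against it.
toy_signals = [[1, 2, 3, 4],
               [2, 4, 6, 8],
               [4, 3, 2, 1]]
toy_corr = correlation_matrix(toy_signals)
```

Unlike the proximities, the cells here run from -1 to +1, with the third toy parcel being perfectly anti-correlated with the other two.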
 

Figure S1 in the snap above, lifted from the supplementary material to reference 3, shows one way of visualising this matrix (left), with more structure being brought out by suitably sorting the rows and columns (right). Inter alia, motivating dropping down from around 1,000 products to around 750.

The sort of thing one might well see in references 1 or 2, describing the correlation between the activity observed in a 1,000 parcel parcellation of the surface of the brain. With one difference being that one might produce such an image from every 5 minute segment of an fMRI scan of a brain; millions if not billions of them worldwide every year. While the economists can only manage one such image every year or so. Dealing with a million images is apt to be a rather different business to dealing with just one, even if they are all of roughly the same form.

With another difference being, to my mind anyway, that parcels of brain are more or less convex bits of the surface of the brain in three dimensional space, not quite the same as being in two dimensional space, let alone the one dimension available to a matrix of this sort, but at least having a position in space. A position which products do not have, although thinking with my fingers, one could perhaps do something by mapping their density on the surface of the globe. Nevertheless, a bit of a sheet is topologically quite different from a material scattered over the globe, although both do have position, of one sort or another. You buy a particular apple, but you just buy a chunk of cheese, cut from some larger piece; the difference between a thing and a material. But I digress.

Reference 3 then moves onto a quite different sort of visualisation, derived from the weighted graph or network corresponding to the matrix. First, one reduces the originally rather large network to a much more manageable maximal spanning tree, then one adds back in some of the more heavily weighted links which have been left out. It seems that in this case you get quite a good result by adding back in all the left-over links with a proximity value of 0.55 or more. 

The idea of such a tree being that, being spanning, it includes every node; being maximal, it uses the links with the bigger weights; and being a tree, it does not include the more or less redundant smaller links providing additional routes between pairs of nodes. Intuitively, it is indeed the core or backbone of the network, with all the froth pruned away.

We are given a simple algorithm for generating one of these trees from a weighted network. Certainly maximal, in that it can’t be made any bigger while remaining a tree, but I worry about such a tree not being unique. And if not, does it matter very much?
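A minimal sketch of the two steps, using Kruskal's algorithm, which is one standard way of building such a tree and which may or may not be the algorithm given with reference 3. Links are taken in order of decreasing weight and a link is kept if it joins two parts of the network not already joined. The function names and the `(weight, node, node)` layout are my own. On the worry about uniqueness: if all the link weights are different, the maximal spanning tree is unique; with ties there can be several, but they all have the same total weight.

```python
def maximum_spanning_tree(n_nodes, edges):
    """edges is a list of (weight, u, v) tuples with nodes numbered from 0.
    Returns the links of a maximal spanning tree (Kruskal's algorithm)."""
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the link joins two separate pieces, so keep it
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


def backbone(n_nodes, edges, threshold=0.55):
    """The tree, plus every left-over link with weight >= threshold."""
    tree = maximum_spanning_tree(n_nodes, edges)
    kept = set(tree)
    extra = [e for e in edges if e not in kept and e[0] >= threshold]
    return tree + extra


# Toy network: four products, six proximities.
toy_edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2),
             (0.6, 2, 3), (0.3, 1, 3), (0.2, 0, 3)]
toy_tree = maximum_spanning_tree(4, toy_edges)
toy_backbone = backbone(4, toy_edges)
```

In the toy network the tree keeps the links weighted 0.9, 0.8 and 0.6, and the 0.55 cut-off then adds back the link weighted 0.7, leaving the two weakest links out.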

There is some material on spanning trees in general at reference 8.

We then apply something called a force spring algorithm to generate a nice looking, two dimensional visualisation of the product space which has been defined by our matrix and reduced to something a bit bigger than a maximal spanning tree. This algorithm comes in plenty of varieties, some suitable for large networks, some not.
 

Bing turns up the elegant description of such an algorithm snapped above, invented by Peter Eades, an Australian expert on such matters, in 1984. Included in a book chapter by Stephen Kobourov for Brown University of Rhode Island.
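A minimal sketch of a spring embedder in the style of Eades, as I understand the description: linked nodes are joined by logarithmic springs, unlinked nodes repel each other with an inverse square force, and every node is moved a small step in the direction of the net force, over and over. The constants are the ones I believe are usually quoted with this algorithm, but treat them, the random starting positions and the function name `spring_layout` as assumptions. This is not the variety used in reference 3, which I have not been able to pin down.

```python
import math
import random


def spring_layout(n_nodes, edges, iterations=100,
                  c1=2.0, c2=1.0, c3=1.0, c4=0.1, seed=0):
    """edges is a list of (u, v) pairs. Returns a list of [x, y] positions."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n_nodes)]
    linked = {frozenset(e) for e in edges}
    for _ in range(iterations):
        force = [[0.0, 0.0] for _ in range(n_nodes)]
        for u in range(n_nodes):
            for v in range(n_nodes):
                if u == v:
                    continue
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)  # guard against zero distance
                if frozenset((u, v)) in linked:
                    # Spring: pulls u towards v when d > c2, pushes when d < c2.
                    f = c1 * math.log(d / c2)
                else:
                    # Repulsion between unlinked nodes, pushing u away from v.
                    f = -c3 / (d * d)
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
        # Move all nodes together, once the forces have all been worked out.
        for u in range(n_nodes):
            pos[u][0] += c4 * force[u][0]
            pos[u][1] += c4 * force[u][1]
    return pos


# Two linked nodes settle at the natural spring length c2.
pair = spring_layout(2, [(0, 1)])
pair_distance = math.hypot(pair[0][0] - pair[1][0], pair[0][1] - pair[1][1])
```

With just two linked nodes there is nothing for the spring to fight against, so the pair ends up one unit apart. With hundreds of nodes the result depends on where one starts, which is perhaps part of the reason for the manual touching up.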

Note that these algorithms do not do all the work and the present authors admit to some manual touching up of the output. With the whole process striking me as rather ad hoc.
 

They then decorate the resultant visualisation using all kinds of icons, shapes, sizes and colours, with the result snapped above. I think the general idea is that there is a central core of high value products which developing countries should aspire to and a low value periphery where developing countries tend to start. With one of the arguments being that countries need paths from outside to inside which do not involve a lot of big, difficult jumps. Which might mean that some countries are, in effect, rather stuck out on the periphery.

To my mind, the authors of the present paper, reference 3, have gone rather too far. Their graphic is too complicated, although I dare say it would work better on a big screen. I associate to the large and heavy atlas at reference 9, an expensively produced book full of elaborately decorated maps of Ontario, a book which I once owned, but which ended up at the bottom of the compost heap of the allotment that I had at the time – which I now rather regret, despite the load on our bookshelves. See reference 10.

Rather more successfully, the authors go on to project various aspects of the data onto the undecorated version of the visualisation of the product space. Perhaps to look at how those projections vary over time.

And along the way we have the intriguingly named PRODY, derived from reference 12: ‘… the key novelty is a quantitative index that ranks traded goods in terms of their implied productivity. We construct this measure by taking a weighted average of the per-capita GDPs of the countries exporting a product, where the weights reflect the revealed comparative advantage of each country in that product. So for each good, we generate an associated income/productivity level (which we call PRODY). We then construct the income/productivity level that corresponds to a country’s export basket (which we call EXPY), by calculating the export-weighted average of the PRODY for that country. EXPY is our measure of the productivity level associated with a country’s specialization pattern…’. Reference 12 goes on to get a lot more complicated.
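A minimal sketch of the two measures as I understand the quoted passage. For PRODY, the weight for a country is the share of the product in that country's exports, divided by the sum of such shares across all countries. For EXPY, the weights are simply the shares of each product in the country's own exports. The function names and the layout of the inputs are mine, and the toy numbers are made up.

```python
def prody(exports, gdp_per_capita):
    """exports[c][p] is the value of product p exported by country c
    (hypothetical layout). Returns one income level per product.
    Assumes every country and every product has some exports."""
    n_c, n_p = len(exports), len(exports[0])
    totals = [sum(row) for row in exports]
    shares = [[exports[c][p] / totals[c] for p in range(n_p)] for c in range(n_c)]
    result = []
    for p in range(n_p):
        share_sum = sum(shares[c][p] for c in range(n_c))
        result.append(sum(shares[c][p] / share_sum * gdp_per_capita[c]
                          for c in range(n_c)))
    return result


def expy(exports, prody_values):
    """Export-weighted average of PRODY, one value per country."""
    result = []
    for row in exports:
        total = sum(row)
        result.append(sum(value / total * prody_values[p]
                          for p, value in enumerate(row)))
    return result


# Toy data: a poorer country exporting both products equally, a richer
# country exporting only the second.
toy_trade = [[10, 10],
             [0, 20]]
toy_gdp = [1000, 3000]
toy_prody = prody(toy_trade, toy_gdp)
toy_expy = expy(toy_trade, toy_prody)
```

In the toy data the first product, exported only by the poorer country, gets a PRODY of 1,000, while the second gets one nearer the richer country's income, because that product takes up the whole of the richer country's exports.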

Conclusions

An interesting excursion into visualisation in a field which is new to me. Economics is very different from neuroscience, but there are some techniques and issues in common, and there is something to be gained by looking at them together.

Furthermore, as intended, I am a bit more comfortable with the notion of a maximal spanning tree.

And lastly, more evidence, if any were needed, of our need to reduce complex problems to two dimensional images which the human brain can cope with. A limitation which I suspect to be related to the facts first, that our retinas generate two-dimensional images and, second, that the organisation of the human cerebral cortex is essentially two-dimensional.

References

Reference 1: The complexity of the stream of consciousness - Peter Coppola, Judith Allanson, Lorina Naci, Ram Adapa, Paola Finoia, Guy B. Williams, John D. Pickard, Adrian M. Owen, David K. Menon, Emmanuel A. Stamatakis – 2022.

Reference 2: Mapping the structural core of human cerebral cortex – Hagmann P, Cammoun L, Gigandet X, Meuli R, Honey CJ, Wedeen VJ, Sporns O – 2008.

Reference 3: The product space conditions the development of nations – Hidalgo CA, Klinger B, Barabasi A-L, Hausmann R – 2007. Plus supplementary material.

Reference 4: The Structure of the Product Space and the Evolution of Comparative Advantage – Ricardo Hausmann and Bailey Klinger – 2007.

Reference 5: Standard International Trade Classification Revision 4 – Department of Economic and Social Affairs Statistics Division, United Nations – 2006.

Reference 6: World Trade Flows: 1962-2000 – Feenstra, Lipsey, Deng, Ma, Mo – 2005. See https://www.nber.org/papers/w11040.



Reference 9: Economic Atlas of Ontario – W.G. Dean, G.J. Matthews, University of Toronto Press – 1969.


Reference 11: https://atlas.cid.harvard.edu/. A more modern version of reference 6. Even more complicated!

Reference 12: What you export matters – Ricardo Hausmann, Jason Hwang, Dani Rodrik, NBER working paper series – 2005.
