Wednesday, 15 February 2023

Fast matrix multiplication

This was prompted by the piece in the MIT Technology Review at reference 1.

When I was young, I learned something about the multiplication of two-dimensional matrices and about the formalisation of proofs – this last against the background of Gödel’s famous theorems about completeness and incompleteness. So this piece, which involved both, caught my eye.

Matrix multiplication is illustrated by the snap above, taken from the accessible reference 2, with the general idea being that a cell in the product is the scalar product of a row of the first matrix with a column of the second. One can write down a straightforward, if sometimes long-winded, formula for this, a formula which is easy enough to translate into something that a computer can deal with.
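By way of illustration, here is a minimal sketch of that formula in Python, with plain lists of rows standing in for matrices. The function name and the representation are mine, not taken from any of the references.

```python
def naive_multiply(A, B):
    """Multiply two square matrices, given as lists of rows.

    Each cell of the product is the scalar product of a row of the
    first matrix with a column of the second.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):          # row of the first matrix
        for j in range(n):      # column of the second
            for k in range(n):  # accumulate the scalar product
                C[i][j] += A[i][k] * B[k][j]
    return C

print(naive_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```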

The story from Wikipedia (reference 5) is reproduced above. For present purposes, what is important is the number of pairwise operations needed to evaluate this formula, a pairwise operation being something of the form ‘<number operator number>’, for example ‘2 + 4’. In the case of square matrices with n rows and n columns, and using the obvious approach, one gets n multiplications and (n - 1) additions for each of the n × n cells of the product, a total of n × n × (2n - 1) operations for the matrix as a whole. That is to say, for large n, of the order of twice n cubed.
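A trivial check of that count, again a sketch of my own rather than anything from the references:

```python
def naive_op_count(n: int) -> int:
    """Operations for the obvious method: each of the n * n cells of
    the product needs n multiplications and (n - 1) additions."""
    return n * n * (n + n - 1)

for n in (2, 3, 100):
    print(n, naive_op_count(n))
# 2 -> 12, 3 -> 45, 100 -> 1990000 (close to 2 * 100**3 = 2000000)
```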

And there the matter rested for a long time, until, rather to everyone’s surprise, a chap called Volker Strassen demonstrated in the late 1960s that one could do better, at least in the seemingly elementary case where n = 2, getting the number of multiplications down by one, from eight to seven, at the expense of rather more additions. But this was OK because, in the context, multiplications were far more expensive than additions. All this is detailed at reference 4 and illustrated in the snap below, again from reference 2.
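For the curious, here is a sketch of Strassen’s scheme for the 2×2 case in Python, with plain numbers standing in for the entries. The variable names are mine; the seven products themselves are the standard ones, as set out at reference 4.

```python
def strassen_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
    """Strassen's scheme for a 2x2 product: seven multiplications
    instead of the obvious eight, at the cost of eighteen additions
    and subtractions instead of four."""
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return c11, c12, c21, c22

print(strassen_2x2(1, 2, 3, 4, 5, 6, 7, 8))  # (19, 22, 43, 50)
```

And since the entries need not be numbers, but can themselves be matrices, the same scheme can be applied recursively to blocks of a larger matrix, which is where the saving for large n comes from.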

This sort of thing is important because a great deal of today’s computing involves a great deal of matrix multiplication, and a saving of, say, 20% in the computer time needed for all this multiplication would be very worthwhile.

However, the mathematics involved is tricky and we are a long way off knowing the best way to do larger matrices. Which brings us to a specialised form of proof, very much a subset of the set of proofs that Gödel was working with.

The left hand column of the snap above is such a proof, while the right hand column is a condensed version of such a proof.

So a proof is a sequence of elements of the form ‘<number operator number>’. Each number must either be a given, that is to say a cell in one of the two matrices we are multiplying together, or the outcome of some previous element in the sequence. And, in the present case at least, the operator must be either multiply (×) or add (+). All the cells of the product matrix must appear exactly once in the sequence, with the last such cell being the last element in the sequence. The order in which these cells appear is not material, provided only that the sequence as a whole adheres to the rules for proof.
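To make this concrete, such a proof might be represented as a straight-line program: a list of steps, each combining two earlier values with a single operator. A minimal sketch in Python, my own much-simplified encoding rather than the one used in the paper, computing one cell of a 2×2 product:

```python
# Givens: cells of the two matrices being multiplied (values invented).
givens = {"a11": 1, "a12": 2, "b11": 5, "b21": 7}

# The proof: each step names a result, two operands and an operator.
# Here it computes c11 = a11 * b11 + a12 * b21.
proof = [
    ("t1", "a11", "*", "b11"),
    ("t2", "a12", "*", "b21"),
    ("c11", "t1", "+", "t2"),
]

values = dict(givens)
for name, left, op, right in proof:
    # Each operand must be a given or the outcome of an earlier step.
    assert left in values and right in values, "not a valid proof"
    values[name] = (values[left] * values[right] if op == "*"
                    else values[left] + values[right])

print(values["c11"])  # 1 * 5 + 2 * 7 = 19
```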

We add a cost function, that is to say some function of the number of operations involved in the proof, with the cost of each operation being, in turn, a function of the computing environment on which it is proposed to do the matrix multiplication.
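By way of example, the cost function might be as simple as a weighted count of the steps, multiplication being weighted more heavily than addition, with the weights depending on the machine in question. Another sketch of my own, with invented weights:

```python
# Invented weights: multiplication dearer than addition on our
# hypothetical machine.
COST = {"*": 5.0, "+": 1.0}

def proof_cost(proof):
    """Total cost of a proof: the sum of the costs of its operators."""
    return sum(COST[op] for _, _, op, _ in proof)

# The three-step proof from the sketch above.
proof = [
    ("t1", "a11", "*", "b11"),
    ("t2", "a12", "*", "b21"),
    ("c11", "t1", "+", "t2"),
]
print(proof_cost(proof))  # 5 + 5 + 1 = 11.0
```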

What the DeepMind people appear to have done – and I have not done more than glance at their story at reference 3 – is to extend these ideas, mapping these proofs into a space which can be navigated by a version of the game-playing computer program AlphaZero, the program that plays world class chess and go. The result is a single-player game in which the player, that is to say the computer program, navigates the proof space with a view to finding a proof with a lower cost than the proofs that have gone before.

One of the difficulties is that the proof space is extremely large. One is looking for needles in very large haystacks. And part of the solution involves tensors, something with which I have never got to grips.

And it seems that this program has done pretty well, coming up with algorithms which are better than those which went before. And the authors are hopeful that this new technique will prove helpful for other applications, that is to say other than matrix multiplication.

Maybe it is not just accountants that computers are making redundant. Maybe it is mathematicians too.

References

Reference 1: DeepMind’s game-playing AI has beaten a 50-year-old record in computer science: The new version of AlphaZero discovered a faster way to do matrix multiplication, a core problem in computing that affects thousands of everyday computer tasks – Will Douglas Heaven, MIT Technology Review – 2022.

Reference 2: https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor. The story from Google.

Reference 3: Discovering faster matrix multiplication algorithms with reinforcement learning – Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis, Pushmeet Kohli – 2022. The paper in Nature.

Reference 4: Fast Matrix Multiplication – Markus Bläser – 2013. Some tutorial material.

Reference 5: https://en.wikipedia.org/wiki/Matrix_multiplication
