Friday 8 March 2024

Sums

I read a few days ago (reference 2) that the large language models which drive the likes of Microsoft’s Copilot and Google’s Gemini, models which are really about something quite different, used not to be much good at arithmetic but have now become much better. I have been focussing on the Google product (reference 1), so I thought I would stick with it for trying some sums.

I started with the prompt ‘When I was at school, I had to do lots of sums like “If coal costs 10/6 a cwt and I need 2 tons 15 cwt and 6 stone of the stuff, what will the bill be?” Can you do this sort of sum?’.

A good part of the challenge was dealing with all the units which were current when I was at primary school and which have rather moved on since: 12 pence to the shilling, 20 shillings to the pound, 8 stone to the hundredweight and 20 hundredweight to the ton. It also helps to know that farthings were around until 1961 and halfpennies until 1969, which is roughly when I was doing these kinds of sums.

Gemini gave me a detailed reply, setting down the steps he went through to get to what turned out to be the wrong answer. But setting down the steps made it easy to check. Further prompts ironed out some errors but introduced others, and I thought that his penultimate reply was worse than the one before it. In his last reply he more or less gave up. But it was a thoroughly well-mannered and interesting exchange: conversational, companionable even.

He got the right idea, the right strategy from the outset:

Convert the amount of coal into hundredweights (right answer: 55.75)

Convert the price into pennies per hundredweight (right answer: 126)

Do the multiplication (right answer: 7,024.5 pence)

Convert the pennies back into pounds, shillings and pence (right answer: £29 5/4½, that is, twenty-nine pounds, five and fourpence ha'penny).
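For anyone wanting to check the first three steps mechanically, here is a minimal Python sketch. It is my own illustration, not anything Gemini produced; the function names are hypothetical, and it assumes the old British units of 8 stone to the hundredweight and 20 hundredweight to the (long) ton. Using Python's `Fraction` keeps the halfpenny exact rather than leaving it to floating point.

```python
from fractions import Fraction

# Old British units (assumed here): 14 lb = 1 stone, 112 lb = 1 cwt, 20 cwt = 1 ton.
STONE_PER_CWT = 8
CWT_PER_TON = 20
# Pre-decimal money: 12 pence (d) to the shilling (s).
PENCE_PER_SHILLING = 12


def quantity_in_cwt(tons: int, cwt: int, stone: int) -> Fraction:
    """Step 1: express tons, hundredweights and stones as hundredweights."""
    return tons * CWT_PER_TON + cwt + Fraction(stone, STONE_PER_CWT)


def price_in_pence(shillings: int, pence: int) -> int:
    """Step 2: express a price written as s/d (e.g. 10/6) in pence."""
    return shillings * PENCE_PER_SHILLING + pence


quantity = quantity_in_cwt(2, 15, 6)   # 2 tons 15 cwt 6 stone
price = price_in_pence(10, 6)          # 10/6 a cwt
bill_pence = quantity * price          # Step 3: pence for the whole load

print(float(quantity), price, float(bill_pence))  # 55.75 126 7024.5
```

Keeping the quantity as an exact fraction (223/4 cwt) means the half penny in the product is carried through exactly, with no rounding until someone chooses to round.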

However, the execution was not so good.

In the first round, there was an elementary error in converting stones into hundredweights (cwt). And he did not twig that if you say ‘A cwt and B stone’, B is apt to be less than the number of stone in a hundredweight, that is, less than 8. Knowing that might have helped him spot the error.

Working from his wrong figures, he got his multiplication right but failed when it came to converting pennies into pounds, shillings and pence: his remainder on dividing by 240 was way out.

In the second round, he got the multiplication slightly wrong. He then got his remainder very wrong again and got into a complete muddle with shillings and pounds: ‘However, since 12 shillings is already more than a pound (which is 20 shillings), we have to carry over the extra shilling to the pounds amount’.
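The fourth step, including the business of carrying between shillings and pounds, can be sketched in the same spirit. Again this is an illustrative Python sketch with hypothetical function names; it assumes 12 pence to the shilling and 20 shillings to the pound, and it only knows how to print the fractions of a penny that were actually coined (farthings and halfpennies).

```python
from fractions import Fraction

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240

# Only quarter, half and three-quarter pennies are handled (an assumption).
PENNY_FRACTIONS = {Fraction(0): "", Fraction(1, 4): "¼", Fraction(1, 2): "½", Fraction(3, 4): "¾"}


def to_lsd(total_pence: Fraction) -> tuple[int, int, Fraction]:
    """Step 4: split a sum in pence into pounds, shillings and pence."""
    # Whole pounds come out first; the remainder on dividing by 240 is under a pound.
    pounds, rest = divmod(total_pence, PENCE_PER_POUND)
    # That remainder splits into shillings (always fewer than 20) and pence.
    shillings, pence = divmod(rest, PENCE_PER_SHILLING)
    return int(pounds), int(shillings), Fraction(pence)


def format_lsd(pounds: int, shillings: int, pence: Fraction) -> str:
    """Write the sum the old way, e.g. £29 5/4½."""
    whole = int(pence)  # pence is never negative here, so int() rounds down
    return f"£{pounds} {shillings}/{whole}{PENNY_FRACTIONS[pence - whole]}"


print(format_lsd(*to_lsd(Fraction(14049, 2))))  # £29 5/4½  (from 7,024½ pence)
print(to_lsd(Fraction(144)))                    # (0, 12, Fraction(0, 1))
```

Taking the whole pounds out first means the shillings left over can never reach 20, so there is nothing to carry: 144 pence comes out as plain 12 shillings, comfortably less than a pound.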

In the third round, we agreed on the answer to the multiplication, but he then started rounding halfpennies up. He continued to fail on the remainder and stayed in his muddle over shillings and pounds.

In the fourth round, Gemini gave up. He was pleased that we agreed about the answer to the multiplication and congratulated me on my mastery of the antique notation for sums of money. But he declined any further involvement in the sums themselves.

Neither he nor I seem to have started out by making a rough estimate. That is to say, at getting on for 3 tons @ £10 a ton (10/6 a cwt comes to £10 10s a ton), we should be looking at around £30. Probably worthwhile when doing this sort of thing.
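The rough estimate is simple enough to turn into a sanity check. A Python sketch, with hypothetical function names and assuming (my choice, not any standard) that an answer within 15 per cent of the estimate counts as plausible; the exact figure of 7,024½ pence is the one given in the steps above.

```python
from fractions import Fraction

PENCE_PER_POUND = 240
EXACT_PENCE = Fraction(14049, 2)  # 7,024½ pence, from the steps above


def rough_estimate(tons: float, pounds_per_ton: float) -> float:
    """Back-of-envelope bill from a rounded-off quantity and price."""
    return tons * pounds_per_ton


def is_plausible(answer: float, estimate: float, tolerance: float = 0.15) -> bool:
    """True if the answer lies within `tolerance` (assumed 15%) of the estimate."""
    return abs(answer - estimate) <= tolerance * estimate


# 'Getting on for 3 tons @ £10 a ton' (10/6 a cwt is really £10 10s a ton).
estimate = rough_estimate(3, 10)
exact = float(EXACT_PENCE / PENCE_PER_POUND)

print(estimate, exact, is_plausible(exact, estimate))  # 30 29.26875 True
print(is_plausible(2.5, estimate))                       # False
```

A check like this would catch a muddle that put the bill out by pounds, but not a remainder that was out by a few shillings, since the whole remainder on dividing by 240 is worth less than a pound.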

But all this involves rather more than just doing sums. When I went on to try him on straightforward multiplications, divisions and remainders, I failed to catch him out. Maybe I will think of some new cunning wheeze in the morning.

Conclusions

Doing sums may have been sorted out. But mixing that in with conundrums about antique units is still a work in progress. And all this against a background of very impressive competence with language generally: Gemini did seem to know what I was on about and did seem able to follow the thread of the conversation, at least up to a point.

I also wonder how homogeneous the model is. It started out as an engine to forecast the next element in a sequence of words, built by soaking up huge amounts of text from the Internet, all very clean and simple at that level. But how many special modules and special tweaks have been put in to deal with special cases, like doing sums, which are not obviously related to forecasting the next element of a sequence? Or, if not special modules, special activation functions and special back-propagation rules?

PS: the next morning. No wheeze, but it did occur to me that it might be reasonable to require all call centre operators, whether operating by voice or keyboard, to identify themselves: ‘Hello. My name is Alexa46. I am a robot’ or ‘Hello. My name is Alexa Evanko. I am a person’, as the case may be. OK, so there might be difficulties at the margin with operator-assisted computers, but that could be dealt with. In the meantime, maybe I will ask Gemini, whom for some reason I treat as male, whether he plays chess, something that very specialised computers got very good at a few years back.

References

Reference 1: https://gemini.google.com/

Reference 2: Large language models can do jaw-dropping things. But nobody knows exactly why: And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models – Will Douglas Heaven, MIT Technology Review, 2024.
