This being prompted by taking another look at Dehaene’s book at reference 1.
It struck me the other day that graph theory with its edges and nodes did not describe upper case letters very well. It seemed to require, for example, three edges and four nodes to describe a ‘T’, where two edges, that is to say two lines, seems both more natural and more informative. So I started to wonder if there was a better way.
Suppose we want to be able to analyse upper case letters, with a view to writing a conventional computer program – that is to say not some kind of neural network – to recognise them. We want to know what is going on, not just to build a black box which does the job.
For present purposes, we suppose the upper case letters to be presented to the computer in black on a white ground, using an array of pixels with 128 rows and 128 columns, which ought to be more than enough. The sort of in-train information display panels with moving orange lights manage with far fewer.
Elements. We allow straight lines, cups and circles. Lines and cups have one line and two nodes each. With a cup being a smoothly bent line. A circle has one line and no nodes. With a circle being a cup where the two nodes have been drawn together, annihilating each other.
Interactions. We allow joining (as in the two lines of an ‘L’), meeting (the vertical line of a ‘T’ meets the horizontal line above, roughly in the middle) and crossing (two lines cross, roughly at their middles. As in an X).
We require a join to involve a significant change of direction, mathematically a significant discontinuity of the first derivative. This makes it a lot easier for the computer to detect nodes.
We suppose that we can program our computer to find and recognise these three elements and these three interactions in our array of pixels.
If our data is reasonably noisy, we might want to put lower limits on the sizes of lines, cups and circles.
The hypothesis is that the computer can then be programmed to recognise the upper case letters of the standard English alphabet using simple properties of those letters expressed in terms of these three elements and three interactions.
The letters
So, adapting the letters shown above slightly, we get the following, marking the ambiguous cases with a ‘#’ and the special cases with a ‘*’.
Letter A. Two lines joined at a node. A third line meeting the other two.
Letter B. One base line. Two cups, on the same side of the base line, each with a join and a meet. One join to one node of the base line, the other join to the other. The two meets coincide.
(#) Letter C. A cup.
Letter D. One base line. One cup joining that line at both of its nodes.
Letter E. One base line. Two lines on joining the nodes of the base line. One line meeting the base line. The three lines on the same side of the base line.
Letter F. One base line. One line joining a node of the base line. One line meeting the base line. The two lines on the same side of the base line.
(*) Letter G. A specialised cup. Alternatively a sequence of cup, line, line.
Letter H. Two lines. A third line meeting the other two.
Letter I. A line.
(*) Letter J. A specialised line. Alternatively, a specialised cup. Alternatively, a line joined to a cup – although this would break the change of direction rule. Alternatively, a sequence of three or four lines – although this sort of approximation is not in the spirit of this model and would also introduce ambiguity.
Letter K. A base line. Two lines meeting that base line, from the same side. The two meets coincide.
(#) Letter L. Two lines joined at a node.
(#) Letter M. A sequence of four lines, joined at their nodes.
(#) Letter N. A sequence of three lines, joined at their nodes.
Letter O. A circle.
Letter P. A base line with a cup joining a node of that line and meeting the line.
Letter Q. A circle crossed by a line. No other points of contact.
Letter R. A base line with a cup, as the P. With an additional line meeting the base line, on the same side as, but outside of the cup. The two meets coincide.
(*) Letter S. A specialised line.
Letter T. Two lines, one meeting the other.
(#) Letter U. A cup.
(#) Letter V. Two lines joined at a node.
(#) Letter W. A sequence of four lines, joined at their nodes.
Letter X. One line crossing another.
Letter Y. Three lines, joined at a single node.
(#) Letter Z. A sequence of three lines.
Which gives us three special cases, not catered for by the model, and rather more ambiguities.
To sort out the ambiguities and otherwise tricky cases, we might appeal to orientation, to vertical or to horizontal. Or assume that the bottom of the image square is horizontal. Or we might introduce line length or angle at a node. Which last jar, being metrical rather than topological properties. The topological sort might be thought to be more robust, although that would have to be tested.
The test
We present a series of test images to the computer, most of which contain reasonable letters but some of which do not.
The computer responds in every case and we have some formula to score the test based on how many it gets wrong. A formula which might distinguish correct letter, wrong letter, null in response to letter, letter in response to null and correct null.
Additional information
Under this scheme, ‘P’ could be confused with both the lower case ‘b’ and the lower case ‘d’.
We do not make much use of circles or crossings, with just two of each. And with Q having both. Although the French use crossings, for example on the ‘7’, to distinguish it from a ‘1’, which is not at issue here.
Graph theory does joins in the sense used above, but does not do meetings or crossings.
Conclusions
There are probably lots of different ways of doing this. But I have no idea whether this algorithmic approach has been made more or less redundant by being able to train a neural network to do the job, with worrying too much about how exactly it does it. Which last might be cheap and effective, but is not terribly satisfying.
A rather different question would be whether a model of this sort bears any relation to what a human brain does. Maybe Dehaene has the answer to that one.
References
Reference 1: Reading in the brain – Stanislas Dehaene – 2009.
No comments:
Post a Comment