In May, at the close of a grueling semester of math, my mind was in a state of ferment. Having valiantly fought with the foundations of calculus and linear algebra, I wanted nothing more than to celebrate my victory (such as it was) by withdrawing from the field of battle. No such luck; I had theorems on the brain. The best I could do, in the immediate aftermath, was to subvert mathematical means to some other, less exalted end. So, naturally, I postulated a slightly silly theorem: “Everything is the same as something else, except for something.” In lieu of proof — which would have required precisely the sort of rigor from which I sought to escape — I provided several examples, mostly involving cheeseburgers. To cite two, “The cheeseburger is the same as the hamburger, except that it also contains cheese,” and “Talking about cheeseburgers to illustrate abstract concepts is like eating cheeseburgers, only the satisfaction is intellectual rather than gustatory.” In spite of its unjustified assertions and its author's somewhat nihilistic intent, the so-called theorem manages to make an interesting claim about the world: given any two things, they will always be similar, yet different. In a sense, the claim is merely a hand-wavy generalization of genus-differentia definition, but this need not be an indictment, for there is something deeply intuitive about thinking of concepts in terms of their similarities to, and differences from, other concepts. Another connection to previous work was found by a friend: my example of two otherwise identical cheeseburgers was mirrored in assertion 2.0233 of Wittgenstein's Tractatus (“Two objects of the same logical form are — apart from their external properties — only differentiated from one another in that they are different”).

Other than lending my work an undeserved air of academic respectability, the Wittgenstein connection meant little to me at the time. Now that I've read a bit more of him, I see my theorem as a claim about the scope and usefulness of family resemblances. The scholarly community has yet to respond meaningfully to my work, but judging from Words and Rules, Steven Pinker would disagree. In the last chapter, en route to his conclusion, he contrasts Wittgenstein's famous idea with the Aristotelian categories required for genus-differentia definitions (271). It turns out to be an interesting conclusion, but let's not jump the gun on how he gets there. Pinker begins by claiming that the human brain generates and comprehends language by means of “two tricks, words and rules” (2). By “words” he means a set of memorized entries in the mental dictionary (arbitrary ones, by way of Saussure); and by “rules” he means a set of general patterns in the mental grammar. These facts and patterns are applied combinatorially, says Pinker, accounting for the richness and variety of language. (Our current conscious understanding of the patterns is attributed to the generative grammar of Chomsky.) After rehearsing the debate between rationalism and empiricism, he argues that the best place to peer into the workings of human language is the English past tense (90). This becomes the focus of much of the book.
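Being a programmer, I find the two tricks easiest to appreciate in code. The sketch below is a toy of my own, not anything from the book: a few memorized dictionary entries plus one general pattern, which together produce forms that nobody ever memorized.

```python
# Trick 1: a memorized lexicon of arbitrary sign-meaning pairings.
# (These entries are my own illustrative picks, not Pinker's.)
LEXICON = {
    "dog": "domesticated canine",
    "walk": "move on foot",
}

# Trick 2: a general pattern that applies to any stem whatsoever,
# even a nonce verb the speaker has never heard before.
def regular_past(stem: str) -> str:
    return stem + "ed"

print(LEXICON["dog"])         # retrieved from memory, not computed
print(regular_past("walk"))   # walked
print(regular_past("wug"))    # wugged: the rule generalizes freely
```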

In particular, Pinker's interest is piqued by irregular verbs. Why do we have verbs whose preterites can't be generated by rule? How did they get that way? What happens in the brain while children are learning them? These are indeed fascinating questions. It would seem more elegant, if language is to have rules at all, not to deviate from them. Yet we do, and what's worse, deviations occur most frequently in the most common verbs — be, go, have, make — so we are bombarded regularly by irregularities. According to Pinker, the very frequency of these verbs earns them prime real estate in the memorized lexicon, thereby ensuring their faithful transmission across generations in their full irregular glory. This is to say, once a verb becomes irregular, there is a very powerful tendency for it to stay that way. But how does the process start? Take have/had and make/made: they were originally regular, “haved and maked, but enough lazy speakers swallowed the consonants that at some point in the Middle English period speakers didn't hear them and assumed that they were not there at all” (58). A similar process is cited in children, who sometimes spell find “fid”, not hearing the n. That English orthography is itself highly irregular (having long since departed from sensible rules to represent vowel sounds) surely doesn't help.

Having discussed at some length the prevalent Chomsky-centric theories of generative grammar and phonology, which he acknowledges as impressive though not necessarily correct, Pinker turns to examining an alternative model of language synthesis: the pattern associator memory of Rumelhart and McClelland. A kind of neural network, the pattern associator memory starts out empty of words, but arrives fitted with some preformed ideas about how to manufacture verbs. For instance, verbs are expected to have stems, and “speech sounds are represented in the [software] not as phonemes but as bundles of features such as 'voiced' and 'nasal'”, after Chomsky (105). (Of course, that the software was programmed with preformed ideas at all is itself a nod to Chomsky.) The model is primed with a series of verbs in present and past tense, from which it makes connections between sound patterns. The process is iterated, and at each iteration the model adjusts the strength of certain connections to improve its guesses for the past tense of each stored verb. Rumelhart and McClelland were able to show that, given the right set of input verbs and the right number of iterations, the model would not only correctly generate the past-tense forms of most of the verbs it had been trained on, but also predict the past-tense forms of a new set of input verbs with remarkable accuracy. It would even make the sort of overgeneralization errors seen in children, “such as catched and digged” (108). Pinker finds the pattern associator memory impressive as well, yet also incomplete; it generates verbs with some success, but doesn't address the problem of how we recognize them.
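To see what such a model amounts to, consider the drastically simplified associator below. Nearly everything in it is a stand-in of my own: Rumelhart and McClelland used Wickelfeature-style sound units and a far larger network, whereas my encoding is an arbitrary hash of letters. But the training loop (guess, compare, nudge the connection strengths) is the essential idea.

```python
import numpy as np

SIZE = 30  # number of "feature" units on each side of the network

def encode(word: str) -> np.ndarray:
    # Arbitrary stand-in for a sound-feature encoding: each letter
    # switches on one unit. The real model used phonological features
    # (voiced, nasal, and so on), not spelling.
    vec = np.zeros(SIZE)
    for i, ch in enumerate(word):
        vec[(ord(ch) * 7 + i) % SIZE] = 1.0
    return vec

# A tiny, hand-picked training set of stem/past-tense pairs.
pairs = [("walk", "walked"), ("talk", "talked"), ("jump", "jumped")]

W = np.zeros((SIZE, SIZE))      # connection strengths, initially empty
for _ in range(50):             # iterate over the training set
    for stem, past in pairs:
        x, t = encode(stem), encode(past)
        guess = (W @ x > 0.5).astype(float)
        W += 0.1 * np.outer(t - guess, x)   # strengthen or weaken links

# A novel stem now lights up a graded pattern of output features;
# generalization falls out of overlap between input patterns.
print(int((W @ encode("balk") > 0.5).sum()), "features active for 'balk'")
```

The stage is set for Pinker to synthesize his own theory: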

[Alan] Prince and I have proposed a hybrid in which Chomsky and Halle are basically right about regular inflection and Rumelhart and McClelland are basically right about irregular inflection. Our proposal is simply the traditional words-and-rules theory with a twist. Regular verbs are computed by a rule that combines a symbol for the verb stem with a symbol for the suffix. Irregular verbs are pairs of words retrieved from the mental dictionary, a part of memory. Here is the twist: Memory is not a list of unrelated slots, like RAM in a computer, but is associative. Not only are words linked to words, but bits of words are linked to bits of words…. like stems, onsets, rimes, vowels, consonants, and features. (117-118)
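The twist is easy to caricature in code, at least if we leave out the associative structure. In the toy below (mine, not Prince and Pinker's), retrieval from memory comes first and blocks the rule, which serves as the default for everything else.

```python
# Irregular pairs live in memory; the rule handles everything else.
IRREGULARS = {"be": "was", "go": "went", "have": "had", "make": "made"}

def past_tense(stem: str) -> str:
    if stem in IRREGULARS:     # successful retrieval blocks the rule
        return IRREGULARS[stem]
    return stem + "ed"         # the rule: stem symbol plus suffix symbol

print(past_tense("go"))       # went (retrieved)
print(past_tense("blick"))    # blicked (computed, never memorized)
```

A plain dictionary undersells the proposal, of course: in an associative memory, partial matches like string/strung and cling/clung can tug a similar-sounding novel verb toward an irregular pattern, which no exact-match lookup can do.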

Pinker then examines further experimental data (some of it his own) to test the validity of his theory. The chapters I found most interesting were about language acquisition in children and language loss in cases of brain damage or neurological disorder. For me, the most surprising finding was that, for a period of time, children appear to regress in their language development. Pinker explains this in terms of words and rules: in the beginning, lacking data from which to generalize, the child can only memorize every form of every word. Once he intuits rules, he then zealously applies them. Adults find the resulting instances of overgeneralization cute and amusing, but Pinker points out that we grownups do the same thing when presented with an unknown verb, challenging the reader to provide the correct past-tense form of to shend (198). The answer is probably shended, right? But of course it's shent, and “the only way you could have produced shent is if you had previously heard and remembered it.” For a child, all verbs are unknown. A related conclusion is that “irregular verbs tend to be high in frequency” (201). When an irregular verb becomes sufficiently rare, its irregularities cease to be reliably transmitted to the next generation of speakers, and the verb (assuming it is not entirely lost) reverts to regularity by default.
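That last claim has a simple statistical flavor, so here is a toy transmission model of my own. The rates are invented; the only point is that an irregular form survives a generation only if the child hears it often enough to remember it, the regular rule filling any gap.

```python
import random

random.seed(42)

def next_generation(form: str, stem: str, exposures: int) -> str:
    # Each exposure is remembered with some (invented) probability;
    # one remembered exposure preserves the irregular form. Once the
    # verb regularizes, the rule keeps it regular in every generation.
    remembered = any(random.random() < 0.05 for _ in range(exposures))
    return form if remembered else stem + "ed"

for freq in (500, 20):           # a common verb versus a rare one
    form = "shent"
    for _ in range(10):          # ten generations of speakers
        form = next_generation(form, "shend", freq)
    print(f"frequency {freq}: after ten generations, {form}")
```

With these numbers the common verb keeps its irregular form essentially forever, while the rare one tends to collapse to shended within a few generations, which is just the reversion by default described above.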

Fully aware that his argument is thus far rooted in English alone, Pinker branches out, first into German. (Encountering a quotation from Mark Twain's famous essay The Awful German Language, one of my all-time favorites, put a gigantic smile on my face.) German is notoriously less regular than English, but the theory holds up with respect to irregular noun plurals. He inspects Dutch, then jumps to French, Hungarian, Arabic, Hebrew, the inflectionless Chinese, and even the Arapesh language of New Guinea. None is treated anywhere near exhaustively (“every language… deserves a book of its own”), but each shows evidence of having both irregular behavior and rules which otherwise apply (239).

Here we finally return to my oddball theorem about things and their similarities and differences. Pinker says it is evolutionarily adaptive for our brains to be able to categorize in two ways: the classical Aristotelian strict kind, and the Wittgensteinian rough kind. The former allows us to reason about ideas and experiences in a rigorous way, and the latter is sometimes the only way to extrapolate from what we know. As a result of this adaptation, our brains are accustomed to dealing with word-like associations and rule-like abstractions. From this vantage point, Pinker concludes that we have learned to speak as we do because we have learned to process reality through an algorithm combining associations and abstractions. As a computer programmer, I find his argument and his conclusion far more convincing than either generative grammar or neural networks alone. It is this peculiar algorithm — words and rules, together — that enables the rich variety of expression we use every day.