If a million monkeys typed on a million typewriters for a million years, one of them would be sure to type out one of Shakespeare's plays.
Of course, you would need a million English teachers to read all the typed material, as well as thousands of zookeepers to tend to the monkeys, but that would create jobs. Perhaps this grand experiment could be funded by the Jobs Bill. It would surely stimulate the economy, and we could sell the poems, short stories, novels, and other works created by the monkeys. So it would be revenue-neutral.
I decided to do the experiment using the latest in computer tools, my laptop plus an Excel spreadsheet. You can download the Excel spreadsheet here. Every time you press the F9 key on your PC keyboard, a "monkey" produces a paragraph.
Now, to give the "monkey" all possible advantages, I created a special keyboard that has more "E" keys and "A" keys and so on, according to their frequency in English text. The keyboards have no numbers or special characters. Half the monkeys are working on keyboards with no punctuation, since any good English teacher can visually pick words out of a stream of characters and phrases and sentences out of a stream of words. The other half have keyboards that include some comma and period keys.
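For readers without Excel, the frequency-weighted keyboard can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet's actual formulas: the names LETTER_KEYS and type_paragraph are made up here, the weights are approximate published letter frequencies, and only the no-punctuation keyboard is modeled.

```python
import random

# Approximate relative frequencies (percent) of letters in English text.
# Each weight plays the role of "how many keys" that letter gets.
LETTER_KEYS = {
    "E": 12.70, "T": 9.06, "A": 8.17, "O": 7.51, "I": 6.97, "N": 6.75,
    "S": 6.33, "H": 6.09, "R": 5.99, "D": 4.25, "L": 4.03, "C": 2.78,
    "U": 2.76, "M": 2.41, "W": 2.36, "F": 2.23, "G": 2.02, "Y": 1.97,
    "P": 1.93, "B": 1.49, "V": 0.98, "K": 0.77, "J": 0.15, "X": 0.15,
    "Q": 0.10, "Z": 0.07,
}


def type_paragraph(n_keystrokes, rng=None):
    """Return n_keystrokes random letters, weighted by English letter frequency."""
    rng = rng or random.Random()
    letters = list(LETTER_KEYS)
    weights = list(LETTER_KEYS.values())
    return "".join(rng.choices(letters, weights=weights, k=n_keystrokes))


if __name__ == "__main__":
    # One "press of F9": a single monkey produces one paragraph.
    print(type_paragraph(300))
```

Each run plays the part of one press of the F9 key.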
Below is the BEST example I could find in quite a few tries. Words are outlined in red. [Click CTRL + repeatedly for larger view. CTRL - for smaller view.]
Download the Excel spreadsheet here and try it yourself! Perhaps your "monkeys" will have better luck.
WHY DID IT FAIL?
This is a good example of how the most compelling thought experiments can mislead even the most intelligent among us. Indeed, it would probably take an overly intelligent person to buy into this concept.
It turns out that Wikipedia has the explanation for the failure of our expectations:
"... If there are as many monkeys as there are particles in the observable universe (10^80), and each types 1,000 keystrokes per second for 100 times the life of the universe (10^20 seconds), the probability of the monkeys replicating even a short book is nearly zero."
THE "PARTS" PROBLEM
One of the many valuable things I learned about from Prof. Howard Pattee, when he was my teacher and Chairman of my PhD Committee, was the "parts problem".
For a partially random process of assembly to result in anything of value, the parts must be in the right proportion to the thing you will produce. In the case of the Million Monkeys, the parts are random letters and we are looking for a book-length result (a Shakespeare play).
All we got were a few English words sprinkled among lots of gibberish.
Had we started with words as the parts, and had they been in proportion to their frequency in the English language, we would have obtained better results.
For example, there are about 10,000 English words that constitute over 99% of all English writing. Say we had a "keyboard" with 100,000 keys, with each word having as many keys as justified by its frequency in the English language. A "monkey" typing thousands of keystrokes on such a "keyboard" would be quite likely to produce a number of grammatical phrases and even sentences, and perhaps several meaningful sentences. The "monkey" might even produce a unique, original thought.
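A word-level "keyboard" can be sketched the same way as a letter-level one. The tiny vocabulary and the key counts below are hypothetical stand-ins chosen for illustration; a real version would load the 10,000 or so words with measured frequencies.

```python
import random

# Hypothetical toy vocabulary. Each number is how many keys the word gets;
# the counts are illustrative, not measured word frequencies.
WORD_KEYS = {
    "the": 60, "of": 30, "and": 28, "to": 25, "a": 22, "in": 20,
    "boy": 3, "girl": 3, "ball": 2, "loves": 2, "hits": 2,
}


def monkey_types(n_words, rng=None):
    """Return n_words random words, each drawn in proportion to its key count."""
    rng = rng or random.Random()
    words = list(WORD_KEYS)
    weights = list(WORD_KEYS.values())
    return " ".join(rng.choices(words, weights=weights, k=n_words))


if __name__ == "__main__":
    print(monkey_types(40))
```

Every keystroke now produces a real English word, so the only thing left to chance is the order of the words.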
We'd be still further ahead if we used the computer to impose grammatical structure. The "monkey" would have three "keyboards". The first would have keys for SUBJECT, the second for VERB, and the third for OBJECT. Thus, each sentence would be of the form "The BOY LOVES the GIRL" or "JACK HITS the BALL", etc. Of course, most sentences, while grammatical, would not be meaningful: "The GIRL HITS the BEDROOM" or "The GRAPEFRUIT LOVES the SHOES", etc.
Of course, this system would create only the simplest of simple sentences.
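The three-keyboard scheme is easy to sketch. The word lists below reuse the examples above; the function name svo_sentence is just a label chosen for this sketch.

```python
import random

# One list per "keyboard", filled with the words from the examples above.
SUBJECTS = ["The BOY", "The GIRL", "JACK", "The GRAPEFRUIT"]
VERBS = ["LOVES", "HITS"]
OBJECTS = ["the GIRL", "the BALL", "the BEDROOM", "the SHOES"]


def svo_sentence(rng=None):
    """Press one key on each keyboard, in SUBJECT, VERB, OBJECT order."""
    rng = rng or random.Random()
    return f"{rng.choice(SUBJECTS)} {rng.choice(VERBS)} {rng.choice(OBJECTS)}"


if __name__ == "__main__":
    for _ in range(5):
        print(svo_sentence())
```

Every sentence comes out grammatical because the structure is fixed in advance. Only the choice of words is random, which is why the meaning is still hit or miss.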
For more natural sentence structure, we could use larger parts, such as phrases. Or perhaps ready-made sentences with fill-in-the-blanks like those we used to use as party games.
Like many things in the natural and artificial world, written language is a hierarchical set of structures. English consists of letter characters that, when taken in groups, form words of up to several letters. (But you can't just take random letters and get a word. You need a proper ordering of vowels and consonants, etc.) At the next level up are simple and compound sentences made up of groups of words. (But those cannot be random either. You need subject, verb, and object substructures.) At the next levels are paragraphs, then sections, then chapters, etc.
HOW THE GENETIC SYSTEM SOLVES THE "PARTS" PROBLEM
When we decode the genome of some animal, we express it as a series of four nucleotides: A, T, G, and C. Each of these letters stands for a molecular assemblage of a few dozen atoms. Sequences of these letters (in the "genotype") are read three at a time, with each triplet coding for an amino acid, and chains of amino acids make up proteins. A stretch of DNA that codes for a protein is what we call a "gene", and genes shape physical characteristics (in the "phenotype").
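The step from letters to amino acids can be sketched as a simple table lookup. The table below holds only six of the 64 three-letter codes in the standard genetic code, and the function name translate and the sample sequence are made up for this sketch.

```python
# A small excerpt of the standard genetic code (DNA letters, three per amino acid).
# The full table has 64 entries; only a handful are listed here.
CODON_TABLE = {
    "ATG": "Met",   # also the usual "start" signal
    "GCT": "Ala",
    "AAA": "Lys",
    "GAA": "Glu",
    "TGG": "Trp",
    "TAA": "STOP",
}


def translate(dna):
    """Read the sequence three letters at a time until a STOP code or the end."""
    amino_acids = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "?")  # "?" = not in this excerpt
        if residue == "STOP":
            break
        amino_acids.append(residue)
    return amino_acids


if __name__ == "__main__":
    print(translate("ATGGCTAAATAA"))
```

The point for the "parts" problem is that the cell never works with single letters on their own. The smallest meaningful part is already a three-letter group with a fixed meaning.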
The genetic system long ago settled on a really neat hierarchical system where the lowest levels are very stable and most are common between different species. When DNA is copied, there are multiple instances of codes for the really important proteins. There are correction mechanisms for many types of mutations (copying errors). The same is true at the next level, which we call "genes", and the level above that of multiple genes that work in concert, etc.
Notice how, in the genetic system that has been evolving over the past three or four billion years, the "parts" at each level are appropriately sized for their jobs. (My Optimal Span Hypothesis, http://iraknol.wordpress.com/article/optimal-span-3ncxde0rz8dtk-2/, provides a basis, founded in well-established information theory, for how hierarchical systems are most effectively organized.)
A TEXT GENERATOR THAT REALLY WORKS
Here are excerpts from a "Post-Modernist" academic paper I just generated:
Reinventing Modernism: Neosemioticist objectivism, capitalism and Derridaist reading
Department of Ontology, Massachusetts Institute of Technology
1. Derridaist reading and patriarchial conceptualism
In the works of Burroughs, a predominant concept is the distinction between figure and ground. Therefore, subtextual dialectic theory states that truth is unattainable.
The main theme of the works of Burroughs is the failure, and hence the futility, of precapitalist sexuality. In a sense, several discourses concerning patriarchial conceptualism may be discovered.
If Batailleist ‘powerful communication’ holds, we have to choose between subtextual dialectic theory and cultural narrative. But the creation/destruction distinction prevalent in Burroughs’s The Last Words of Dutch Schultz emerges again in Naked Lunch.
2. Expressions of dialectic
The characteristic theme of Hamburger’s analysis ...
1. Hamburger, O. ed. (1972) Subtextual dialectic theory and Derridaist reading. University of California Press ...
The above "scholarly paper", and as many more like it as you'd want to see, are available at Communications from Elsewhere.
The computer program behind this feat starts with parts that are very large. Indeed, each paper has a Title, Authors, Sections (with paragraphs and sentences), and Citations. Each of these has a set form. The only randomness is the insertion of words from certain lists into specified blanks. The results are quite compelling.
Indeed, if you gave one of these papers to a group of intelligent people who were not experts in post-modernism, many would accept it as peer-reviewed material. And a Post-Modernist journal might peer-review and accept the paper for publication! (See Sokal Affair)
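The fill-in-the-blanks idea behind such a generator can be sketched in a few lines. To be clear, this is not the code behind Communications from Elsewhere, which I have not seen. The templates and word lists are assumptions lifted from the excerpt above, and the names WORD_LISTS, fill, and make_paper are made up for this sketch.

```python
import random

# Fixed forms with named blanks, modeled on the excerpt above.
TITLE_TEMPLATE = "Reinventing {movement}: {theory}, {ideology} and {reading}"
SENTENCE_TEMPLATES = [
    "In the works of {author}, a predominant concept is the distinction "
    "between {first} and {second}.",
    "Therefore, {theory} states that truth is unattainable.",
    "The main theme of the works of {author} is the failure, and hence "
    "the futility, of {ideology}.",
]

# Word lists for each blank (illustrative, taken from the excerpt).
WORD_LISTS = {
    "movement": ["Modernism"],
    "theory": ["subtextual dialectic theory", "Neosemioticist objectivism"],
    "ideology": ["capitalism", "precapitalist sexuality"],
    "reading": ["Derridaist reading", "cultural narrative"],
    "author": ["Burroughs"],
    "first": ["figure", "creation"],
    "second": ["ground", "destruction"],
}


def fill(template, rng):
    """Pick one word per blank at random and drop it into the fixed form."""
    picks = {blank: rng.choice(words) for blank, words in WORD_LISTS.items()}
    return template.format(**picks)  # blanks a template does not use are ignored


def make_paper(n_sentences=3, rng=None):
    """Assemble a title plus one paragraph from randomly filled templates."""
    rng = rng or random.Random()
    title = fill(TITLE_TEMPLATE, rng)
    body = " ".join(fill(rng.choice(SENTENCE_TEMPLATES), rng)
                    for _ in range(n_sentences))
    return title + "\n" + body


if __name__ == "__main__":
    print(make_paper())
```

Because the parts are whole sentence frames, every output is grammatical and sounds scholarly, whatever words land in the blanks.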