Monday, March 8, 2010

Million Monkeys on a Million Typewriters and Other Failed Experiments

Many, many years ago I came across a most compelling thought experiment (you must have heard it as well). It goes like this:

If a million monkeys typed on a million typewriters for a million years, one of them would be sure to type out one of Shakespeare's plays.

Of course, you would need a million English teachers to read all the typed material, as well as thousands of zookeepers to tend to the monkeys - but that would create jobs. Perhaps this grand experiment could be funded by the Jobs Bill. It would surely stimulate the economy, and we could sell the poems, short stories, novels, and other works created by the monkeys. So, it would be revenue-neutral.

MY EXPERIMENT

I decided to do the experiment using the latest in computer tools: my laptop plus an Excel spreadsheet. You can download the Excel spreadsheet here. Every time you press the F9 key on your PC keyboard, a "monkey" produces a paragraph.

Now, to give the "monkey" every possible advantage, I created a special keyboard that has more "E" keys and "A" keys and so on, according to their frequency in English text. The keyboards have no numbers or special characters. Half the monkeys work on keyboards with no punctuation, since any good English teacher can visually pick words out of a stream of characters, and phrases and sentences out of a stream of words. The other half have keyboards that include some comma and period keys.
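For readers who would rather not open Excel, here is a minimal Python sketch of the same idea. The letter weights below are rough illustrative values, my own approximation, not the exact frequencies used in the spreadsheet:

```python
import random

# Approximate relative frequencies of English letters (illustrative values),
# plus a space key weighted heavily enough to produce word-sized gaps.
weights = {
    ' ': 180, 'e': 102, 't': 75, 'a': 65, 'o': 62, 'i': 57, 'n': 57,
    's': 51, 'h': 50, 'r': 48, 'd': 35, 'l': 33, 'u': 23, 'c': 22,
    'm': 20, 'w': 19, 'f': 18, 'g': 16, 'y': 16, 'p': 15, 'b': 12,
    'v': 8, 'k': 6, 'j': 1, 'x': 1, 'q': 1, 'z': 1,
}

keys = list(weights)
counts = list(weights.values())

def monkey_paragraph(n_keystrokes=500):
    """One 'monkey' hits n_keystrokes keys on the frequency-weighted keyboard."""
    return ''.join(random.choices(keys, weights=counts, k=n_keystrokes))

print(monkey_paragraph())
```

Each run plays the role of one press of F9: a fresh paragraph of weighted-random characters for the "English teacher" to scan for words.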

Below is the BEST example I could find in quite a few tries. Words are outlined in red.
I really expected to see more words, some phrases, and perhaps a sentence or two. NADA!
Download the Excel spreadsheet here and try it yourself! Perhaps your "monkeys" will have better luck.


WHY DID IT FAIL?


This is a good example of how the most compelling thought experiments can mislead even the most intelligent among us. Indeed, it would probably take an overly intelligent person to buy into this concept.

It turns out that Wikipedia has the explanation for the failure of our expectations:

"... If there are as many monkeys as there are particles in the observable universe (10^80), and each types 1,000 keystrokes per second for 100 times the life of the universe (10^20 seconds), the probability of the monkeys replicating even a short book is nearly zero."

THE "PARTS" PROBLEM

One of the many valuable things I learned about from Prof. Howard Pattee, when he was my teacher and Chairman of my PhD Committee, was the "parts problem".

For a partially random process of assembly to result in anything of value, the parts must be in the right proportion to the thing you intend to produce. In the case of the Million Monkeys, the parts are random letters and we are looking for a book-length result (a Shakespeare play).

All we got were a few English words sprinkled among lots of gibberish.

Had we started with words as the parts, and had they been in proportion to their frequency in the English language, we would have obtained better results.

For example, there are about 10,000 English words that constitute over 99% of all English writing. Say we had a "keyboard" with 100,000 keys, with each word having as many keys as justified by its frequency in the English language. A "monkey" typing thousands of keystrokes on such a "keyboard" would be quite likely to produce a number of grammatical phrases and even sentences, and perhaps several meaningful sentences. The "monkey" might even produce a unique, original thought.
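Here is what that word-level "keyboard" might look like in Python. The word list and counts are stand-ins I invented for illustration; a real run would use an actual frequency table derived from a large corpus:

```python
import random

# A toy stand-in for the 100,000-key word keyboard: each word gets a number
# of "keys" proportional to its (invented, illustrative) frequency count.
word_keys = {
    'the': 700, 'of': 360, 'and': 280, 'a': 230, 'to': 220, 'in': 210,
    'is': 110, 'was': 95, 'he': 90, 'for': 85, 'it': 80, 'with': 70,
    'as': 68, 'his': 65, 'on': 60, 'be': 58, 'at': 45, 'by': 45,
    'had': 43, 'not': 42, 'boy': 5, 'girl': 5, 'ball': 3, 'loves': 2,
}

words = list(word_keys)
counts = list(word_keys.values())

def word_monkey(n_keystrokes=30):
    """A 'monkey' typing on the word-frequency keyboard."""
    return ' '.join(random.choices(words, weights=counts, k=n_keystrokes))

print(word_monkey())
```

Every keystroke is now guaranteed to be a real English word, so the "English teachers" only have to hunt for grammatical runs, not for words.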

We'd be still further ahead if we used the computer to impose grammatical structure. The "monkey" would have three "keyboards": the first would have keys for SUBJECT, the second for VERB, and the third for OBJECT. Thus, each sentence would be of the form "The BOY LOVES the GIRL" or "JACK HITS the BALL", etc. Of course, most sentences, while grammatical, would not be meaningful: "The GIRL HITS the BEDROOM" or "The GRAPEFRUIT LOVES the SHOES", etc.
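A sketch of the three-keyboard scheme, with word lists of my own choosing:

```python
import random

# Three "keyboards," one per slot of the fixed Subject-Verb-Object frame.
# The word lists are illustrative picks, not a serious vocabulary.
subjects = ['The BOY', 'The GIRL', 'JACK', 'The GRAPEFRUIT']
verbs = ['LOVES', 'HITS', 'SEES', 'EATS']
objects = ['the GIRL', 'the BALL', 'the BEDROOM', 'the SHOES']

def svo_sentence():
    """One keystroke on each keyboard yields one grammatical sentence."""
    return f"{random.choice(subjects)} {random.choice(verbs)} {random.choice(objects)}."

for _ in range(5):
    print(svo_sentence())
```

Because the frame itself is fixed, every output is grammatical; randomness is confined to the slots, which is exactly why the hit rate for meaningful sentences jumps so dramatically.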

Of course, this system would create only the simplest of simple sentences.

For more natural sentence structure, we could use larger parts, such as phrases, or perhaps ready-made fill-in-the-blank sentences like those we used to play as party games.

Like many things in the natural and artificial world, written language is a hierarchical set of structures. English consists of letter characters that, taken in groups, form words. (But you can't just take random letters and get a word; you need a proper ordering of vowels and consonants, etc.) At the next level up are simple and compound sentences made up of groups of words. (But that cannot be random either; you need subject-verb-object substructures.) At the next level come paragraphs, then sections, then chapters, etc.

HOW THE GENETIC SYSTEM SOLVES THE "PARTS" PROBLEM

When we decode the genome of some animal, we express it in a series of four nucleotides: A, T, G, C. Each of these letters stands for a molecular assemblage containing a dozen or two atoms. Sequences of these letters (in the "genotype") code for various amino acids, and chains of amino acids form proteins. The stretches of DNA that code for proteins are what we call "genes", and these give rise to physical characteristics (in the "phenotype").

The genetic system long ago settled on a really neat hierarchical system where the lowest levels are very stable and most are common between different species. When DNA is copied, there are multiple instances of the codes for the really important proteins. There are correction mechanisms for many types of mutations (copying errors). The same is true at the next level, which we call "genes", and at the level above that, of multiple genes working in concert, etc.

Notice how, in the genetic system that has been evolving over the past three or four billion years, the "parts" at each level are appropriately sized for their jobs. My Optimal Span Hypothesis (http://iraknol.wordpress.com/article/optimal-span-3ncxde0rz8dtk-2/) provides a basis, founded in well-established information theory, for how hierarchical systems are most effectively organized.

A TEXT GENERATOR THAT REALLY WORKS

Here are excerpts from a "Post-Modernist" academic paper I just generated:

Reinventing Modernism: Neosemioticist objectivism, capitalism and Derridaist reading
Andreas Porter
Department of Ontology, Massachusetts Institute of Technology


1. Derridaist reading and patriarchial conceptualism
In the works of Burroughs, a predominant concept is the distinction between figure and ground. Therefore, subtextual dialectic theory states that truth is unattainable.

The main theme of the works of Burroughs is the failure, and hence the futility, of precapitalist sexuality. In a sense, several discourses concerning patriarchial conceptualism may be discovered.

If Batailleist 'powerful communication' holds, we have to choose between subtextual dialectic theory and cultural narrative. But the creation/destruction distinction prevalent in Burroughs's The Last Words of Dutch Schultz emerges again in Naked Lunch.

2. Expressions of dialectic
The characteristic theme of Hamburger's[1] analysis ...

1. Hamburger, O. ed. (1972) Subtextual dialectic theory and Derridaist reading. University of California Press ...


The above "scholarly paper" and as many as you'd want to see like it, are available at Communications from Elsewhere.

The computer program behind this feat starts with parts that are very large. Indeed, each paper has a Title, Authors, Sections (with paragraphs and sentences), and Citations. Each of these has a set form. The only randomness is the insertion of words from certain lists into specified blanks. The results are quite compelling.
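A stripped-down sketch of that approach, using toy templates and word lists of my own invention (not the actual grammar behind Communications from Elsewhere):

```python
import random

# Toy word lists; the spellings mirror the quoted generator output above.
thinkers = ['Burroughs', 'Derrida', 'Bataille', 'Foucault']
isms = ['subtextual dialectic theory', 'patriarchial conceptualism',
        'precapitalist sexuality', 'cultural narrative']

# Fixed sentence templates; only the blanks are filled in at random.
templates = [
    'In the works of {t}, a predominant concept is {i}.',
    'The main theme of the works of {t} is the failure of {i}.',
    'If {i} holds, we have to choose between {i2} and {i3}.',
]

def sentence():
    t = random.choice(thinkers)
    i, i2, i3 = random.sample(isms, 3)
    return random.choice(templates).format(t=t, i=i, i2=i2, i3=i3)

print(' '.join(sentence() for _ in range(3)))
```

The "parts" here are whole sentence frames, which is why the output hangs together so much better than anything the letter-level monkeys produced.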

Indeed, if you gave one of these papers to a group of intelligent people who were not experts in post-modernism, many would accept them as peer-reviewed material. And a Post-Modernist journal might peer-review and accept the paper for publication! (See the Sokal Affair.)
Ira Glickstein

5 comments:

Howard Pattee said...

The monkey theorem is conceptually stimulating and can be instructive for probability theorists, but it is of little interest to linguists, scientists, and artists because any random string of symbols is meaningless.

What we call a language must have meaning, by definition. Meaning in any sign or symbol system requires establishing a triadic relation between symbol, interpreter, and referent (see Wiki “C. S. Peirce” on signs). This largely arbitrary but fixed relation is partially codified in what we call a dictionary, but a dictionary is circular. It takes much more direct experience than reading a dictionary to ground a word’s meaning in the real world. The interpreter (cell, brain, or computer hardware) is what determines meaning, not the sequence of marks.

In other words, a language whether natural, mathematical, or artificial, is a set of arbitrary but fixed constraints on the order of symbol sequences, and because it is ordered it is not random, by definition. Such nonrandom constraints are necessary for meaning in any medium.

Igor Stravinsky said it in his Poetics of Music: “The more constraints one imposes, the more one frees oneself from the chains that shackle the spirit . . . and the arbitrariness of the constraint serves only to obtain precision of execution.”

Ira Glickstein said...

Correct Howard, "...any random string of symbols is meaningless."

Yet we believe that life originated on Earth via random processes. The best explanation you and I know started with randomly-generated groups of proteins that happened to form autocatalytic cycles that reproduced themselves. Some "lucky" autocatalytic cycle happened to generate primitive RNA-like strings that reproduced in what has been called "RNA World". There must have been zillions of different "lucky" RNA Worlds that came and went until one happened to generate "super-lucky" primitive DNA-like double strings.

Up to this point, we were depending upon random events, betting (as in the million monkey caper) something would come together with "meaning" - whatever that is in a world without sentient beings.

I believe you told me you were involved with, or around at the time of, the Stanley Miller and Harold Urey experiment that actually generated some amino acids from inorganic precursors.

So, it seems the Laws of Physics and Chemistry dictate that random processes will generate organic compounds. Though Miller-Urey did not succeed in generating more than the amino acids, do you believe such experiments could eventually generate primitive single-cell life?

Isn't this akin to the million monkeys eventually generating at least one coherent short story - perhaps not one that is exactly the same as any ever generated by a known writer like Shakespeare, but one that nevertheless would have meaning for any person who understands written English?

Once primitive single-cell, DNA-based, reproducible life was established on Earth, was it not more or less inevitable that it would evolve into more complex forms, including multi-cell life? Perhaps not exactly the same as the life we have now, but somewhat similar?

So, back to your observations about "meaning". In the story of the origin of life on Earth that you and I basically accept, when did "meaning" evolve? In the case of the million monkeys, when does "meaning" evolve? Is it when one of the English teachers happens upon a sentence or paragraph or short story that has meaning to him or her, or was it when that particular monkey happened to type it?

Ira Glickstein

Howard Pattee said...

Ira, we're back to our old argument. The view you like is that random events are just our ignorance of events that are all strictly determined. So in your view life was inevitably determined with the initial conditions at the big bang.

The other view is that nothing is deterministic. All events are just probabilistic, but some have very high (or low) probabilities that are experimentally indistinguishable from determinism.

Since no one has thought of any way to finally empirically test either assumption, they must be considered metaphysical faiths.

These two positions are actually only our models of reality and they are both useful but incompatible [complementary] models.

Max Planck emphasized that, "For it is clear to everybody that there must be an unfathomable gulf between a probability, however small, and an absolute impossibility . . . Thus [deterministic] dynamics and statistics cannot be regarded as interrelated.”

Ira Glickstein said...

Right, Howard, our old argument! (Just like an old married couple :^)

OK, I am willing to take the "nothing is deterministic" stance from now on in this thread.

Given that "All events are just probabilistic", when do you think "meaning" arose in the Miller-Urey experiment that produced amino acids from non-organic precursors? (And, am I misremembering your involvement?)

Do you think it is possible that some future Miller-Urey-like experiment, starting with non-organic precursors, will eventually yield primitive reproducing life-forms? I believe such an experiment (assuming success) would produce something like primitive RNA and DNA (but most likely with different code details). Do you agree?

Assuming the million monkeys are "just probabilistic", and one happens to turn out a paragraph that you and I find "meaningful" and perhaps even "poignant", when did that "meaning" happen? When we read it or when the monkey typed it?

Sorry for all these questions, but you are, after all, my most influential professor!

Ira Glickstein

Howard Pattee said...

Ira wants to know: “when did that "meaning" happen? When we read it or when the monkey typed it?” My short answer is: “When we read it.”

But Ira is not alone in his question! The long answer is that this is the basic question of epistemology, a case of what philosophers call the mind-matter problem or more generally the symbol-matter problem. That is, when does any collection or pattern of matter become more than just physical and chemical substance? When does a molecule become a message?

Many biologists and semioticians argue that the first message was at the origin of life. That required the cell’s genetic message. Certainly DNA has meaning for the cell. It is the heritable record of past selection events that controls reproduction. At the other end of the hierarchy of evolved meanings cognitive scientists wonder when brain matter becomes conscious.

In physics this is called the measurement problem: When does the material measuring instrument produce a symbolic result? In quantum theory they say it is when the wave function loses its entanglement (decoheres) and becomes a classical probability. When and how that happens is very mysterious. In any case, most everyone (e.g., Heisenberg, Pauli, Bohr, von Neumann) agrees that to measure anything one must make a sharp epistemic cut between the measuring device and the system being measured.

In artificial intelligence this is the problem of when the hardware and voltages in a computer can be called symbolic. Harnad calls this the symbol-grounding problem. Searle’s “Chinese room” is an example of the conceptual problem. The Turing Test is the classic example.

Excuse my lecturing, but this is the problem I have worried about for 50 years. Google “epistemic cut” if you want to worry about it further.