How long would it take a monkey to write "Hamlet"? | Coffee and theorems

The so-called infinite monkey theorem claims that a monkey pressing random keys on a typewriter would eventually write any literary work: Hamlet, Don Quixote or even a self-help bestseller. Even if it is not very applicable in practice (it is complicated, to say the least, to find an immortal monkey willing to type forever), this statement allows us to explore very interesting concepts such as randomness, behavior at infinity and computation based on the generation of pseudorandom numbers.

This is a direct consequence of the second Borel-Cantelli lemma. In the form relevant here, the lemma states that if each attempt to achieve a particular outcome is independent of the others and has the same probability of success, greater than zero, then over infinitely many attempts that outcome will occur infinitely often with probability one. In the case of the infinite monkey theorem, if a monkey presses random keys indefinitely, the probability that it will type a given text in a single attempt is very low, but not zero. Since the attempts are repeated without end and are independent of each other, according to the lemma the monkey will, with probability one, end up writing the desired text infinitely many times.

The theorem rests on several hypotheses. The first is that the monkey must type randomly. Colloquially, we understand a random phenomenon as one whose outcome cannot be determined with certainty before it occurs, even if the initial conditions are known. Examples of randomness are rolling a die or the Christmas lottery draw. In the case of the monkey it is assumed that, with each press of a key, all the letters of the alphabet have the same probability of appearing, regardless of the text already written.

This condition allows us to calculate the probability that the monkey writes a certain sequence. For example, the probability of writing “hola” (Spanish for “hello”) by typing four keys at random on a Spanish keyboard (considering only letters and spaces) is (1/27)^4, about 0.0000019. This small value, for such a short sequence, already shows how complicated the matter is.
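The calculation above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name sequence_probability and the default of 27 equally likely keys are assumptions taken from the figure quoted in the text, and the four-letter target "hola" is used as the example word.

```python
from fractions import Fraction


def sequence_probability(target: str, num_keys: int = 27) -> Fraction:
    """Probability that len(target) uniformly random keystrokes spell target.

    Assumes every key is equally likely and independent of the previous ones.
    """
    return Fraction(1, num_keys) ** len(target)


p = sequence_probability("hola")
print(p)          # exact value: 1/531441
print(float(p))   # roughly 1.88e-06
```

Each extra character divides the probability by 27 again, which is why longer texts become hopeless so quickly.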

Here the second assumption of the theorem comes in: there is infinite time and, therefore, an infinite number of attempts. After a number N of attempts, which for simplicity are assumed to be separate and independent, the probability that the four-letter sequence “hola” does not appear is (1-0.0000019)^N. Although (1-0.0000019) is very close to 1, when it is multiplied by itself N times, if N is large enough, the result is a value close to zero. Therefore, the monkey will write “hola” with a probability as close to one as we wish.
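To get a feel for how large the number of attempts has to be, the following sketch inverts the formula above. The helper names prob_at_least_once and attempts_needed are hypothetical, and the attempts are assumed to be independent blocks of four keystrokes, each succeeding with probability (1/27)^4.

```python
import math


def prob_at_least_once(p: float, n: int) -> float:
    """Return 1 - (1 - p)**n, computed in a numerically stable way."""
    return -math.expm1(n * math.log1p(-p))


def attempts_needed(p: float, confidence: float) -> int:
    """Smallest n for which prob_at_least_once(p, n) reaches the confidence."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


p = (1 / 27) ** 4                 # single-attempt probability from the text
n = attempts_needed(p, 0.99)      # attempts for a 99% chance of success
print(n, prob_at_least_once(p, n))
```

Under these assumptions a 99% chance of seeing the four-letter word already takes on the order of a few million attempts.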

The same thing happens with any other sequence, even one that includes all the words of Hamlet, in order, and this is what the statement of the infinite monkey theorem is based on. Now, can we roughly estimate how long it might take to obtain Shakespeare’s classic with high probability? A recent article calculated that, almost certainly, the entire current population of apes would not be able to write a text of more than a few words before the heat death of the universe.

Another curious experiment related to this theorem lets the user enter any sequence and simulates the random generation of text until the specified sequence is found. To produce the text, this page uses so-called pseudorandom number generators. Being rule-based, the calculations carried out by these programs are completely deterministic: if all the initial conditions are known, it is possible to anticipate the generated number. That is, pseudorandom numbers are not random. However, when the generator’s initial conditions are unknown, the values produced by a good generator are, in practice, indistinguishable from truly random numbers. There are several techniques for this purpose, such as generators based on modular arithmetic or those based on cryptography, among others.
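A simulation of this kind can be sketched in Python. This is not the code of the page mentioned above but an illustration under stated assumptions: the alphabet is taken to be the 26 lowercase English letters plus the space, the function name keystrokes_until and the cap max_keys are hypothetical, and the generator is the one in Python's standard random module (a Mersenne Twister), seeded explicitly to show that the result is deterministic.

```python
import random
import string
from typing import Optional

# Assumption: 26 lowercase letters plus the space, 27 equally likely keys.
ALPHABET = string.ascii_lowercase + " "


def keystrokes_until(target: str, seed: int,
                     max_keys: int = 10_000_000) -> Optional[int]:
    """Type pseudorandom keys until target appears; return the key count.

    Returns None if target has not appeared after max_keys keystrokes.
    """
    rng = random.Random(seed)      # same seed, same sequence of keys
    window = ""                    # last len(target) characters typed
    for count in range(1, max_keys + 1):
        window = (window + rng.choice(ALPHABET))[-len(target):]
        if window == target:
            return count
    return None


first = keystrokes_until("hi", seed=2024)
second = keystrokes_until("hi", seed=2024)
print(first, second)   # identical: the generator is deterministic
```

Running it with a different seed generally gives a different count, which is what makes the output look random to anyone who does not know the seed.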

Finally, given the rapid rise of large language models, could these be used as substitutes for monkeys in our experiment? Could ChatGPT or DeepSeek spontaneously write Don Quixote if asked to write for an infinite amount of time? The previous reasoning is not valid, since these models generate text based on the probability of words appearing within a context; their output is not the product of a uniformly random process. And since Don Quixote is among the texts on which they were trained, it might seem that the probability of them reproducing the complete work would be greater than in the previous case.

However, several factors make this extremely unlikely. First, these models are not trained to faithfully replicate Spanish Golden Age texts, but rather to write in modern language, which makes it difficult for them to accurately follow Cervantes’ style. Furthermore, these programs are designed not to copy verbatim large portions of the texts they learned from, further reducing the chances that they will reproduce entire works. This, combined with other limitations of the programs, means that although a model can get closer to some parts of the text than the monkeys, the likelihood of it reproducing the work in full is negligible.

Pablo Garcia Arce is a predoctoral researcher at the Spanish National Research Council (CSIC) at the Institute of Mathematical Sciences (ICMAT).

Coffee and theorems is a section dedicated to mathematics and the environment in which it is created, coordinated by the Institute of Mathematical Sciences (ICMAT), in which researchers and members of the center describe the latest advances in this discipline, share points of contact between mathematics and other social and cultural expressions, and remember those who marked its development and were able to transform coffee into theorems. The name evokes the definition of the Hungarian mathematician Alfred Rényi: “A mathematician is a machine that transforms coffee into theorems.”

Editing, translation and coordination: Agata Timón García-Longoria. She is coordinator of the Mathematical Culture Unit of the Institute of Mathematical Sciences (ICMAT).