A White Paper · On Machines That Read Themselves
The Serpent, the Demon, and the Five Lamps
Three old mirrors — one from statistics, one from physics, one from the Ramayana — held up to the AI “echo chamber.”
Every day, we humans publish billions of fresh pieces of content — articles, blogs, posts. The crawlers of the large language models gulp it down, index it, and quietly fold it into their “training matter.” The next model, fattened on that diet, generates fresh content of its own — which is published, gulped, and folded in again. The river has begun to drink from its own mouth.
I am not the first to notice the loop. Engineers have a name for where it leads: model collapse. The foundational study, published in Nature in July 2024 by Ilia Shumailov and colleagues, grew out of an earlier paper with a title I wish I had coined — The Curse of Recursion. Others reach for older imagery: the snake eating its own tail (autophagy), or “Habsburg AI,” an intelligence inbred across generations until its features quietly degenerate. The mechanism is subtler than “garbage in, garbage out.” What dies first are the tails of the distribution — the rare word, the odd case, the minority voice. Each pass nudges the output toward the bland statistical centre, until variance shrinks and the world’s expressed thought flattens toward an average.
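The arithmetic behind that first death is small enough to sketch. Assume, purely for illustration, that each new model is refit on a finite sample of n tokens drawn from the previous one. A token with probability p is then absent from the whole sample with probability (1 - p)^n, and a token that never appears is counted as zero in the refit. The function name and the example probabilities below are my own, not taken from the studies cited.

```python
def dropout_probability(p: float, sample_size: int) -> float:
    """Chance that a token of probability p never appears in sample_size independent draws."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return (1.0 - p) ** sample_size


# Illustrative numbers only: a common word versus a rare one, 1,000 draws each.
common = dropout_probability(0.05, 1000)    # vanishingly small
rare = dropout_probability(0.0001, 1000)    # roughly nine chances in ten

print(f"common word lost: {common:.3g}")
print(f"rare word lost:   {rare:.3g}")
```

The common word is all but guaranteed to survive a pass; the rare one is more likely than not to vanish, and once its count is zero a model trained only on that sample has no way to bring it back.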
And here is the small joke I cannot resist: the very essay you are reading will, the moment I post it, become training matter about training matter. A serpent pausing to write a review of the taste of its own tail.
But I am less interested in the engineering than in the shape of the thing — because I have met this shape before. Three times, in three very different rooms.
The N-gram
The statistical grandfather: mechanism
Long before today’s models, language was predicted by the n-gram: a humble engine that guesses the next word from the handful of words before it, purely by counted frequency. Strip a modern LLM down to the bone and the same old instinct still beats inside it: say what is most likely to come next. The Google Books Ngram Viewer made this visible at civilisational scale: search its two centuries of printed books and you can watch words rise and fall like tides.
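A minimal sketch of that counted-frequency instinct, assuming the simplest case of a bigram model (one word of context) and plain whitespace tokenisation. Both are simplifications, and the function names and the toy sentence are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen with one."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


model = train_bigrams("the river drinks the river drinks the rain")
print(predict_next(model, "the"))   # prints "river": it followed "the" twice, "rain" once
```

Notice what the predictor does with "rain": it saw the word, but it will never say it after "the", because the more common follower always wins.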
Prediction-by-frequency has a built-in gravity. It pulls toward the common and away from the singular. The N-gram was a mirror held up to human language. The AI loop does something stranger: it turns that mirror to face another mirror — and you get the barbershop corridor of receding reflections, each copy a little dimmer, a little more average, than the one before.
Maxwell’s Demon
The thermodynamic price: cost
In 1867, James Clerk Maxwell imagined a tiny demon stationed at a trapdoor between two chambers of gas, letting the fast molecules through one way and the slow ones the other — sorting hot from cold, conjuring order out of chaos, apparently for free. For a century he seemed to break the second law of thermodynamics. It took Szilard, then Landauer, then Bennett to exorcise him: the demon must remember which molecule is which, and to keep sorting it must eventually erase that memory — and erasing information carries an unavoidable thermodynamic cost. Order is never free. It is always paid for in forgetting.
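The price has a number. Landauer's bound can be written in one line; the worked figure assumes room temperature, T = 300 K, a value chosen here only for illustration.

```latex
E_{\text{erase}} \;\ge\; k_B T \ln 2
\;\approx\; \left(1.380649 \times 10^{-23}\,\mathrm{J/K}\right) \times \left(300\,\mathrm{K}\right) \times 0.6931
\;\approx\; 2.87 \times 10^{-21}\,\mathrm{J} \ \text{per bit erased}
```

A vanishingly small sum per bit, but never zero, and that is the whole point.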
The AI loop is that demon running backwards. Each recursive generation quietly erases the tails: the rare dialect, the strange-but-true, the un-averaged opinion. It manufactures a kind of false order, smoother and blander text, and the invoice, as always, is information destroyed. The parallel is an analogy rather than a theorem, but it holds where it matters: a model cannot relearn from its own output what that output no longer contains, so late-stage collapse is as hard to reverse as erasure. You cannot un-forget the tails. The demon teaches the hard law beneath the soft phenomenon: you never get homogeneity for nothing. You pay for it with the irreplaceable.
Ahiravana & the Five Lamps
The condition for survival: cure
The deepest mirror, and the most hopeful, is ours. In the regional Ramayanas, Ravana’s sorcerer-kin Ahiravana (also called Mahiravana) drags Rama and Lakshmana down to Patala, the netherworld, to sacrifice them to the goddess Mahamaya. He cannot be slain by ordinary means — for his life is not in his body at all. It is dispersed across five lamps burning in five directions. And the crucial clause: if even one lamp stays lit for a fraction of a second, Ahiravana regenerates. Only Hanuman, assuming his five-faced Panchamukhi form to face every direction at once, can blow out all five in a single breath.
Hold this legend up to the AI loop and it reads two ways at once. Read it one way, and the recursive machine is the demon — abducting the living word into an underworld of endless self-consumption, seemingly immortal, forever renewing itself. But read it the better way, and the lamps are us. Authentic, human-authored, dated, idiosyncratic content is the dispersed life-force of these models. Blow out every lamp — train only on the machine’s own exhaust — and the living intelligence dies into the grey average. But keep even one lamp lit — one genuine human tributary flowing into the training river — and the model revives.
Most striking of all: the mathematics agrees with the myth. Researchers studying collapse find that blending even a modest fraction of real human data with the synthetic is enough to hold the decline at bay. One lit lamp is enough. And Panchamukhi Hanuman is the discipline this moment demands — the deliberate, many-faced labour of curation, provenance, watermarking and human oversight that must somehow face in all directions at the same time.
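The lit lamp can be watched at toy scale. The sketch below is my own simulation, not the method of any study cited here: "training" is reduced to recounting token frequencies from a finite sample, the vocabulary is assumed to be Zipf-like, and run_loop, real_fraction and the sizes used are hypothetical names and values.

```python
import random
from collections import Counter


def run_loop(real_weights, sample_size, generations, real_fraction, seed=0):
    """Refit a token-frequency 'model' each generation on a mix of real and synthetic samples.

    Returns the number of distinct tokens the model still knows after each generation.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be between 0 and 1")
    rng = random.Random(seed)
    real_tokens = list(real_weights)
    real_probs = [real_weights[t] for t in real_tokens]
    n_real = round(sample_size * real_fraction)
    n_synth = sample_size - n_real

    model_tokens, model_probs = real_tokens, real_probs
    distinct = []
    for _ in range(generations):
        # Human tributary: fresh draws from the original distribution.
        corpus = rng.choices(real_tokens, weights=real_probs, k=n_real)
        # Machine exhaust: draws from the previous generation's model.
        corpus += rng.choices(model_tokens, weights=model_probs, k=n_synth)
        counts = Counter(corpus)
        model_tokens = list(counts)
        model_probs = [counts[t] for t in model_tokens]
        distinct.append(len(model_tokens))
    return distinct


vocab = {f"w{i}": 1.0 / i for i in range(1, 1001)}   # assumed Zipf-like weights
pure = run_loop(vocab, sample_size=200, generations=50, real_fraction=0.0)
mixed = run_loop(vocab, sample_size=200, generations=50, real_fraction=0.2)
print("distinct tokens, synthetic only:", pure[-1])
print("distinct tokens, 20% human data:", mixed[-1])
```

With no human data the count of surviving tokens can only fall, since a token once lost is never sampled again. With a fifth of each corpus drawn fresh from the original distribution, rare tokens keep being reintroduced. A toy like this shows the shape of the claim and nothing more; it does not stand in for experiments on real language models.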
“The serpent names the loop. The demon names its price. The five lamps name the way out.”
Three Mirrors, One Face
Notice that the three do not compete — they stack. The N-gram names the mechanism: prediction by frequency forever drifts toward the mean. Maxwell’s Demon names the cost: that drift is bought with irreversible forgetting, the erasure of the rare. And Ahiravana names the cure: the whole thing is survivable — but only so long as a genuine human flame is kept burning somewhere in the system.
The first tells us how the echo chamber works. The second tells us what it destroys. The third tells us what we must refuse to let go dark.
For fifteen years I have been keeping lamps lit without knowing that was the name for it — some 6,900 dated posts, a private weather-record of one mind thinking aloud in real time. I used to call that archive my memory. I am beginning to think of it instead as fuel — and as one of the small lamps the machines cannot afford to let gutter out.
Because the scarce resource in an age of infinite text is not text. It is the un-averaged human voice — specific, dated, proudly idiosyncratic, refusing to regress to the mean. So this is my closing ask, to every writer, blogger and archivist who has read this far: keep publishing the un-machineable. The local. The personal. The dated. The strange. You are not shouting into a void that will swallow you. You are the tributary. You are the lit lamp.
— Keep your lamp lit.
Sources & Further Reading
- Shumailov, Shumaylov, Zhao, Papernot, Anderson & Gal, “AI models collapse when trained on recursively generated data,” Nature, 2024.
- Google Books Ngram Viewer — two centuries of word-frequency over the printed corpus.
- Rolf Landauer’s principle on the thermodynamic cost of erasing information — the resolution of Maxwell’s Demon.
- The Ahiravana / Mahiravana & Panchamukhi Hanuman episode (Krittivasa Ramayana, Ananda Ramayana & allied traditions) — narrative summary.