AI Cracks Medieval Codes: What Historians Are Reading for the First Time

Historians long accepted gaps in the record. Encrypted letters, diaries and ledgers sat untouched in archives from the Vatican to Swedish castles. Roughly 1% of all archival material worldwide remains hidden behind ciphers. That figure, cited across multiple studies, represents lost diplomatic cables, medical recipes and personal betrayals.

Now machine learning changes the arithmetic. Researchers feed images of faded ink and odd symbols into models trained on historical scripts. The systems spot patterns humans miss after years of staring. Results arrive in minutes instead of months.

From substitution tables to neural networks

Beáta Megyesi, professor of computational linguistics at Stockholm University, has spent years on these puzzles. She helped crack the Copiale cipher in 2011, revealing rituals of an 18th-century German secret society. The work convinced her that scale demanded new methods. “It is like detective work where every symbol, pattern, and partial solution may bring us closer to someone’s secrets and to a lost historical world,” Megyesi told the BBC.

Her team targeted the Borg cipher, a 408-page manuscript (shelfmark Borg.lat.898) locked in the Vatican Library for over 400 years. The text mixed 34 strange symbols with occasional Roman letters and an Arabic title page. Its cover promised remedies “for affections of the human body.” Medical knowledge in the early 17th century often invited charges of witchcraft, so owners had reason to hide recipes.

AI broke it. The decoded pages list thousands of treatments. Drink several glasses of high-quality red wine. Ferment a nutmeg in dough to fight dysentery. The full translation sits online for anyone to inspect. Yet Megyesi stresses the effort stayed laborious even with assistance. Patterns emerged only after repeated human validation.

The DECRYPT project, which she leads, collects thousands of cipher examples. Its database now holds over 10,000 records, 6,000 keys and 3,600 ciphers. That volume feeds training data previous generations lacked. Models learn not just frequency analysis, the ninth-century technique pioneered by Arab mathematician Al-Kindi, but variations where one letter maps to eight different symbols.
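The frequency analysis mentioned above can be sketched in a few lines. This is a minimal illustration, not the DECRYPT project's code: the function names and the letter ordering "etaoin" (a commonly cited approximation for English) are assumptions chosen for the example, and a real attack on these documents would need statistics for the actual source language.

```python
from collections import Counter


def symbol_frequencies(ciphertext):
    """Return (symbol, relative frequency) pairs, most common first."""
    symbols = [s for s in ciphertext if not s.isspace()]
    counts = Counter(symbols)
    total = len(symbols)
    return [(sym, n / total) for sym, n in counts.most_common()]


def guess_mapping(ciphertext, language_order):
    """Pair the most frequent cipher symbols with the most frequent letters.

    language_order is an assumed ranking of plaintext letters,
    for example "etaoin" for English.
    """
    ranked = [sym for sym, _ in symbol_frequencies(ciphertext)]
    return dict(zip(ranked, language_order))


if __name__ == "__main__":
    sample = "xxxyyz"
    print(symbol_frequencies(sample))
    print(guess_mapping(sample, "etaoin"))
```

A first guess like this only works when each letter has exactly one symbol. Once a letter maps to several symbols, as in the eight-symbol case described above, the ranking no longer lines up with the language's letter frequencies.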

Take the 1637 letter from Sigismund Heusner von Wandersleben to Swedish Lord High Chancellor Axel Oxenstierna. Written at the height of the Thirty Years’ War, it used numbers separated by dots. Parts stayed in plain 17th-century German. Michelle Waldispühl, professor of German linguistics at the University of Oslo, turned to Transkribus, an AI platform trained on centuries of handwriting. The tool digitized the document quickly. Corrections followed. The decrypted warning described forced retreats after the discovery of a conspiracy among Protestant allies, including Lord Franz Heinrich of Saxony.

“It was very much back and forth between the machine and the human validator,” Waldispühl explained in the same BBC report. She wonders whether future systems will operate without that loop.

But simple frequency still fails against clever decoys. Extra meaningless symbols. Multiple signs for a single letter. Unknown source languages. These tricks defeated cryptologists for decades. Cecile Pierrot at INRIA in France spent six months on a three-page letter from Holy Roman Emperor Charles V. The 500-year-old document employed 120 distinct cipher symbols. It exposed the emperor’s terror of assassination by an Italian mercenary working for French King Francis I.
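To see why these decoys blunt frequency counts, consider a toy cipher that gives common letters several symbols and sprinkles in meaningless ones. Everything here is hypothetical: the key table, the null symbols and the 20% null rate are invented for illustration and do not come from any of the documents described.

```python
import random
from collections import Counter

# Hypothetical key: frequent letters get several numeric symbols (homophones).
HOMOPHONES = {
    "e": ["11", "12", "13", "14"],
    "t": ["21", "22"],
    "a": ["31", "32"],
    "s": ["41"],
    "r": ["51"],
}
NULLS = ["90", "91"]  # hypothetical meaningless decoy symbols


def encrypt(plaintext, rng, null_rate=0.2):
    """Encode letters with a randomly chosen homophone, adding occasional nulls."""
    out = []
    for ch in plaintext:
        if ch not in HOMOPHONES:
            continue  # this toy key only covers five letters
        out.append(rng.choice(HOMOPHONES[ch]))
        if rng.random() < null_rate:
            out.append(rng.choice(NULLS))
    return out


def decrypt(symbols):
    """Invert the key; symbols missing from it (the nulls) are dropped."""
    reverse = {sym: letter for letter, syms in HOMOPHONES.items() for sym in syms}
    return "".join(reverse[s] for s in symbols if s in reverse)


def top_share(items):
    """Share of the single most common item in a sequence."""
    counts = Counter(items)
    return counts.most_common(1)[0][1] / len(items)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    plaintext = "teatreeseat" * 30
    cipher = encrypt(plaintext, rng)
    print(".".join(cipher[:12]))  # numbers separated by dots
    print(top_share(plaintext), top_share(cipher))
```

In the plaintext, "e" dominates. In the ciphertext its occurrences are split across four symbols and diluted by nulls, so no single symbol stands out and a simple frequency ranking has little to hold on to.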

Modern tools accelerate transcription first. Bad handwriting, faded ink and invented symbols once required a full day per two-page letter. Neural networks now scan entire pages, detect lines, recognize characters. The DECRYPT team builds models adaptable to broad ranges of scripts and symbolic sets. Megyesi says the group trains on generic handwriting then refines with specific cipher images paired to known plaintext.

Recent tests skipped transcription altogether. The system analyzed photos directly and deciphered simple substitution ciphers it had never seen. One experiment used the already-solved Copiale manuscript. After training on partial lines, the model predicted unseen sections accurately. “This opens up exciting possibilities for rare and non-standard writing systems,” Megyesi noted. The ultimate aim merges transcription and decipherment into one process.

Her group also created a chatbot-style interface. Researchers upload an image. The tool transcribes, decrypts, translates to English and explains its reasoning. When tested on an extract from the Borg cipher, it processed 500 symbols in just over 29 minutes. It handled two other previously solved ciphers from different eras and complexities with similar speed. “AI helps most with scale, speed, pattern discovery and integration of tasks,” Megyesi said.

So far the project has examined 400 mysterious postcards from the late 1800s and early 1900s. A few decoded lines show German love letters. Other archives hold Mary Queen of Scots’ coded correspondence, which exposed her role in throne plots and strained relations with her son, the future James I of England.

These successes arrive against a backdrop of larger questions. Cryptographers built modern systems on assumptions about computational limits. Medieval pencil-and-paper ciphers seem quaint by comparison. Yet the same pattern-matching engines that read faded script could, in theory, hunt weaknesses elsewhere. Bruce Schneier highlighted the development on his security blog, simply noting that researchers now apply machine learning algorithms to historical pencil-and-paper ciphers. The Schneier on Security post from early June 2026 drew fresh attention across cybersecurity circles, as seen in recent discussions on X.

Historians gain the most immediate benefit. Missing medical knowledge. Diplomatic intelligence never factored into textbooks. Personal details that reshape biographies. The Vatican remedies alone expand views of Renaissance-era healing practices. Charles V’s fears humanize a ruler once defined by power. Von Wandersleben’s warnings add texture to the Thirty Years’ War’s chaos.

Challenges persist. Data scarcity hampers training. Large language models feast on trillions of tokens scraped from the modern internet. Historical ciphers offer fragments. The DECRYPT database helps close that gap, but experts still review outputs to guard against hallucinated solutions. The chatbot logs its logic precisely to allow verification.

Ancient undeciphered texts loom larger. The 4,000-year-old Phaistos Disc from Crete. Linear A, the script of the island's Minoan civilization. Neither yields to current methods. Yet the incremental progress on medieval material suggests pathways forward. Combine computer vision, historical language models and cryptanalysis in one pipeline. Feed it every known cipher. Let it generalize.

Megyesi’s vision stays measured. She speaks of assisting researchers across many cases rather than promising instant solutions. The work demands collaboration among linguists, cryptologists, historians and computer scientists. That cross-disciplinary habit already produced the open-access DECRYPT portal, available to anyone with an archive image and curiosity.

One percent of the world’s archives still waits. That percentage shrinks each quarter as models improve. Historians read texts unseen for centuries. They adjust narratives once considered settled. And the techniques refined on 400-year-old remedies may sharpen tools that guard today’s secrets. The feedback loop between past and present tightens.

Short-term gains look clear. A love letter decoded. A battlefield report restored. A medical manual expanded. Longer term, the same pattern engines test assumptions about what stays secret. Medieval scribes hid messages from rivals with pen and ink. Today’s defenders use mathematics and silicon. The machines learning from one era now scrutinize both.
