Unveiling Ancient Secrets: The Potential of AI Language Models in Decoding Mysterious Texts

Approximately 4,000 years ago, an ancient civilization living in the Indus River Valley (present-day India and Pakistan) accounted for 10 percent of the world’s population. Although few records remain of this group of people, archaeologists have discovered that they were advanced enough to have their own writing system, which has yet to be deciphered.

Known as the Indus script, the mysterious text has puzzled scientists, linguists and even cryptographers for decades. Only a few hundred symbols have been classified because scientists have not found a “Rosetta Stone,” or key, to decoding this unknown language. But recent advances in artificial intelligence, including large language models like ChatGPT, could change that, providing further insights into ancient civilizations.

Uncovering the Indus Valley Script

While the Indus Valley Civilization was officially discovered in the 1920s, it was not until 1999 that the first parts of its script were discovered. Seals, pottery and even bones are inscribed with strange symbols accompanied by animal figures. These elaborate inscriptions made the discovery all the more enticing, putting the secrets of this complex society just out of reach.

“[The script] will help us learn a lot about this ancient civilization, their way of life [and their] knowledge about the world,” says Satish Palaniappan, an applied machine learning scientist at Microsoft. “This is all locked information that we do not currently have access to.”

Unlocking the Indus Valley Script

Palaniappan is one of many researchers using AI algorithms to try to decode the script. Together with a colleague, he developed an algorithm that identifies similar characters in the text by looking for patterns in how often certain characters appear, according to a recently published journal article. Scientists can then use these character frequencies to create a decryption key.
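
The frequency counting described above can be illustrated with a minimal Python sketch. This is not the researchers' algorithm, only the basic idea of tallying how often each sign appears. The sign labels ("fish", "jar" and so on) and the three inscriptions are invented for illustration, and `sign_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def sign_frequencies(inscriptions):
    """Return each sign's share of all sign occurrences across the inscriptions."""
    counts = Counter(sign for inscription in inscriptions for sign in inscription)
    total = sum(counts.values())
    return {sign: n / total for sign, n in counts.items()}


# Hypothetical data: each inscription is a list of made-up sign labels.
inscriptions = [
    ["fish", "jar", "arrow"],
    ["fish", "jar"],
    ["man", "fish", "jar", "jar"],
]

freqs = sign_frequencies(inscriptions)

# Print the signs from most to least frequent.
for sign, share in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sign}: {share:.2f}")
```

In a real corpus, signs with unusually high or low shares, or signs that cluster at the start or end of inscriptions, would be the starting points for further analysis.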

Other ancient scripts, such as Egyptian hieroglyphs, were deciphered with a multilingual key: the Rosetta Stone. In this case, the stone connected an already understood script (Greek) to an undeciphered one (Egyptian hieroglyphs), allowing scholars to decode the unknown writing.

Because the Indus script lacks a multilingual key, researchers like Palaniappan must think creatively to find connections between it and other languages.

“With recent advances in natural language processing, especially with large language models like ChatGPT-3 and ChatGPT-4, we can try to refine or provide more context to the languages we think are derived from the Indus scripts, such as the Brahmi scripts,” he says. “And see if these generative models can get creative and figure out what each symbol means and how it fits into a linguistic structure.”

Other Efforts to Unlock the Indus Script

Similarly, Peter Revesz, a professor of computer science at the University of Nebraska-Lincoln, is trying to connect the Indus script with other languages. Like Palaniappan, Revesz, working with student Shruti Daggumati, grouped characters in the Indus Valley script and compared them to similar-looking characters in both the Brahmi script and the Phoenician alphabet, which had roots in the Minoan culture.

“You feel like an archaeologist mixed with a computer scientist,” says Daggumati in a YouTube video about the project. “You will be your own Indiana Jones.”

In a 2018 paper, Revesz and Daggumati found that the characters of the Indus script resembled some characters of the Phoenician alphabet with 90% certainty, according to the AI algorithm they used.

“We can think of it as a Bronze Age version of the Silk Road,” Revesz said, emphasizing the connection between the two cultures. “It is possible that the use of scales, weights and writing spread along these trade routes. Therefore, the Indus Valley and the linear script may be related. I am developing AI algorithms to help investigate this possibility, which would be the key to deciphering the Indus Valley Script.”

Deciphering the Voynich Manuscript

Unlike the Indus script, a mysterious late-medieval text known as the Voynich Manuscript offers a wealth of symbols for archaeologists and linguists to analyze. Written around 600 years ago, the 240-page text is composed of 25 to 30 unknown letters and signs. Alongside the text are 126 color illustrations of alien-looking plants, 124 of which have been botanically identified based on the structure of the plant's flower, leaf or root.

No similar identification has yet been achieved for the manuscript's language, which has stumped cryptographers and linguists since its discovery in 1912.

“Deciphering the Voynich manuscript may provide some historical insight into medieval life,” said Kevin Knight, a professor of computer science at the University of Southern California. “But that’s not what makes people try to decipher it. They do it for the intellectual challenge. It would be great to be the first person in 500 years to read and understand such a mysterious document.”

Can AI Decode These Ancient Texts?

Knight and other scholars believe the manuscript was written as a cipher, perhaps even an anagram, making its decoding even more of a mystery. For Knight, this is where an AI algorithm can come in handy.

“If I show you a long cipher, you might notice that the ‘P’ is always followed by a ‘D,’” says Knight. “You might guess that ‘P’ and ‘D’ stand for ‘Q’ and ‘U’ respectively, because that’s how QU works in English. Once you know that ‘D’ stands for ‘U,’ you can look for patterns related to ‘U.’ A computer can do this kind of reasoning faster and better than a human.”
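
The pattern Knight describes, one cipher letter that is always followed by the same other letter, is straightforward to check mechanically. The Python sketch below is a minimal illustration under stated assumptions, not a method Knight is reported to use. The ciphertext is invented, and `always_followed_by` is a hypothetical helper name.

```python
from collections import defaultdict


def always_followed_by(ciphertext):
    """Map each letter to its successor when that successor never varies.

    Only letter-to-letter pairs inside a word are counted, and a letter must
    be followed by another letter at least twice to be reported, since a
    single occurrence proves nothing.
    """
    successors = defaultdict(list)
    for current, nxt in zip(ciphertext, ciphertext[1:]):
        if current.isalpha() and nxt.isalpha():
            successors[current].append(nxt)
    return {
        letter: following[0]
        for letter, following in successors.items()
        if len(following) >= 2 and len(set(following)) == 1
    }


# Made-up ciphertext in which 'P' is always followed by 'D'.
ciphertext = "PDXAM ZPDQR TPDKL"
candidates = always_followed_by(ciphertext)
print(candidates)
```

Any pair this reports is only a candidate for a fixed pairing such as English "QU". A solver would still need to test the guess against the rest of the text.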

Yet the medieval language encoded in the Voynich manuscript may be an older version of English, French, or Latin, making decipherment more difficult. Knight continues to use AI algorithms to try to decode the Voynich manuscript, but is still determining whether it can be solved with current versions of AI models, such as ChatGPT.

“Generally speaking, GPT is good at doing simple tasks that don’t require trial and error with a pencil and eraser,” says Knight. “For example: adding numbers, translating a sentence, counting words, writing a paragraph on topic X, etc. It is less good at solving complex puzzles. But of course future versions of GPT may very well learn how to do things like these.”

The Voynich Manuscript and the Indus Valley Script are some of the most complex linguistic puzzles. As such, many scholars around the world will no doubt be eagerly awaiting AI advances that can help unlock the mysteries behind these ancient texts.

