Google DeepMind has collaborated with classical scholars to create a new AI tool that uses deep neural networks to help historians decipher the text of damaged inscriptions from ancient Greece. The new system, called Ithaca, builds on an earlier text recovery system called Pythia.
Ithaca doesn’t just help historians recover text; it can also identify a text’s place of origin and date of creation, according to a new paper the research team published in the journal Nature. Ithaca has even been used to settle an ongoing debate among historians about the correct dates for a group of ancient Athenian decrees. An interactive version of Ithaca is freely available, and the team has made the code open source.
Many ancient sources, whether written on scrolls, papyri, stone, metal, or pottery, are so damaged that large chunks of text are illegible. It can also be a challenge to determine where the texts come from, as they have likely been moved several times. As for accurately determining when they were produced, radiocarbon dating and similar methods cannot be used because they can damage the priceless artifacts. So the daunting and time-consuming task of interpreting these incomplete texts falls to epigraphers, the scholars who specialize in those skills.
As the folks at DeepMind wrote in 2019:
One of the problems with discerning the meaning of incomplete text fragments is that there are often multiple possible solutions. In many word games and puzzles, players guess letters to complete a word or phrase. The more letters that are specified, the more constrained the possible solutions become. But unlike these games, in which players have to guess a phrase in isolation, historians restoring a text can estimate the likelihood of several possible solutions based on other contextual clues in the inscription, such as grammatical and linguistic considerations, layout and form, textual parallels, and historical context.
To help speed up the process, DeepMind’s Yannis Assael, Thea Sommerschield and Jonathan Prag teamed up with researchers at the University of Oxford to develop Pythia, an ancient text recovery system named after the high priestess who served as the Oracle of Delphi and delivered the prophecies of the god Apollo.
The researchers’ first step was to convert the Packard Humanities Institute (PHI) database, the largest digital collection of ancient Greek inscriptions, into machine-actionable text they called PHI-ML. That amounted to about 35,000 inscriptions and more than 3 million words, dating from the 7th century BC to the 5th century AD. Next, the researchers trained Pythia (using both words and individual characters as input) to predict the missing letters of words in those inscriptions. Pythia relies on the pattern-recognition capabilities of deep neural networks to do so.
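To make the idea of training on missing letters concrete, here is a minimal sketch of how a training example could be built by hiding part of a known text and keeping the hidden letters as the answer. It is an illustration only, not the team's pipeline: the `mask_span` helper, the `-` placeholder, and the sample phrase are all assumptions made for this example.

```python
import random

MASK = "-"  # hypothetical placeholder standing in for an illegible character


def mask_span(text, span_len, rng):
    """Hide a contiguous run of characters.

    Returns (damaged_text, hidden_target), so a model could be trained
    to predict hidden_target from damaged_text.
    """
    if not 0 < span_len <= len(text):
        raise ValueError("span_len must be between 1 and len(text)")
    start = rng.randrange(len(text) - span_len + 1)
    target = text[start:start + span_len]
    damaged = text[:start] + MASK * span_len + text[start + span_len:]
    return damaged, target


# Illustrative sample text, not taken from PHI-ML.
original = "εδοξεν τηι βουληι και τωι δημωι"
rng = random.Random(0)  # fixed seed so the example is repeatable
damaged, target = mask_span(original, 4, rng)
print(damaged, "->", target)
```

Because the hidden letters are known in advance, each prediction can be checked automatically, which is what makes a large corpus of intact text usable as training data.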
When confronted with an incomplete inscription, Pythia produced as many as 20 different possible letters or words that could fill in the gaps, along with a confidence level for each possibility. It was then up to the historians (the ‘domain experts’) to explore those possibilities and make a final decision based on their subject-matter expertise.
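A ranked list of suggestions with confidence levels can be sketched as follows. The article does not say how Pythia computes its confidence values, so the softmax step, the `top_candidates` helper, and the example scores below are assumptions chosen purely for illustration.

```python
import math


def top_candidates(scores, k=20):
    """Turn raw candidate scores into probabilities and keep the k best.

    scores maps each candidate restoration to an arbitrary real-valued score.
    """
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {cand: math.exp(s - peak) for cand, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [(cand, value / total) for cand, value in ranked[:k]]


# Made-up scores for three hypothetical restorations of one gap.
scores = {"δημωι": 2.1, "βουληι": 1.3, "πολει": 0.2}
for candidate, confidence in top_candidates(scores, k=2):
    print(f"{candidate}: {confidence:.2f}")
```

The point of returning several options rather than one is that the final choice stays with the historian, who can weigh each suggestion against evidence the model never saw.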
The team tested the system by comparing Pythia’s results on completing 2,949 inscriptions with those of Oxford students in epigraphy. Pythia’s output had a 30.1 percent error rate, compared to a 57.3 percent error rate for the students. Pythia was also able to complete the task much faster, taking just a few seconds to decipher 50 inscriptions, compared to two hours for the students.
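One common way to score a restoration against the known answer is a character error rate: the number of single-character edits needed to turn the prediction into the reference, divided by the reference length. Whether this matches the exact metric behind the figures above is an assumption; the sketch below only shows how such a rate can be computed.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute if different
            ))
        prev = curr
    return prev[-1]


def char_error_rate(predicted, reference):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(predicted, reference) / len(reference)


# One wrong letter out of four gives a 25 percent error rate.
print(char_error_rate("abxd", "abcd"))
```

Under a measure like this, a lower number means fewer letters had to be corrected to reach the true text.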