Histoinformatics in India — Epigraphy

Srinidhi
Sep 21, 2022

Recently, I participated in the grand finale of Smart India Hackathon 2022. The problem statement our team worked on was one by Indian Knowledge Systems, “OCR of Temple Inscriptions”. Our team, named Vigrah, tried to develop an OCR system for Aśokan Brāhmī inscriptions.

The project was definitely difficult, as little work has been done in this field. Let me give a short sketch of the technicalities of our project. We first dug through all the resources online and milked all the available data to make a dataset of 3000 Brāhmī character images. We tried several image processing methods to clean inscription photographs and segment them into individual characters. We augmented our dataset to get a dataset of 2 lakh+ images. We then trained a Convolutional Neural Network on it. Several night-long training runs later, the accuracy reached a reasonable rate.
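To make the augmentation step concrete, here is a minimal sketch in plain Python. It is not our hackathon code: the function names, the toy 5×5 glyph, the shift range and the noise probability are all assumptions chosen for illustration. It only shows the general idea of producing many slightly shifted, slightly noisy copies of each character image. Mirroring is left out on the assumption that a flipped glyph may no longer be a valid character.

```python
import random

def shift(glyph, dx, dy, fill=0):
    """Translate a 2D pixel grid by (dx, dy), padding the exposed area with `fill`."""
    h, w = len(glyph), len(glyph[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = glyph[y][x]
    return out

def add_noise(glyph, flip_prob, rng):
    """Flip each binary pixel with probability `flip_prob` (a crude stand-in for weathering)."""
    return [[1 - p if rng.random() < flip_prob else p for p in row] for row in glyph]

def augment(glyph, n_variants, max_shift=2, flip_prob=0.02, seed=0):
    """Return `n_variants` randomly shifted and noised copies of one glyph."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        variants.append(add_noise(shift(glyph, dx, dy), flip_prob, rng))
    return variants

# Toy binary "glyph" (hypothetical, not a real Brāhmī character image).
glyph = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

variants = augment(glyph, n_variants=67)
print(len(variants), "variants per image")
```

As a matter of arithmetic, about 67 variants per image would take 3000 source images past the 2 lakh mark (3000 × 67 = 201,000). Real pipelines usually add rotation, scaling and blur on top of this, typically with an image library.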

Yet, I felt a sense of incompleteness. What is this OCR system for, and for whom? Is it really necessary? What is the bigger picture? What is the elephant in the room and what problem are we trying to address?

In this article, I shall elaborate on my understanding of these questions. I will first explain the major concerns for epigraphical research, epigraphical heritage preservation and promotion in 21st century India. Then, I shall move on to address the bigger picture and finally talk about what has been done and what needs to be done.

Epigraphy in 21st Century India

It would be superfluous for me to dwell on the importance of Indian epigraphy at this point. Scholars broadly agree that inscriptions are the most important source of Indian history, especially before 1000 AD. Indian inscriptions have been studied for more than 200 years now, and the late 19th century and early 20th century saw a massive peak in interest and research by the G.O.A.Ts of Indian Epigraphy. It was during this time that series like Epigraphia Indica, South Indian Inscriptions, Epigraphia Carnatica, Corpus Inscriptionum Indicarum, and Indian Antiquary flourished. To this day, there are no works that can substitute for them.

Today, it is variously estimated that there are 90,000 to 2,00,000 inscriptions of Indian origin. Yet, fewer than 60,000 have been edited and published. Even these are available only in journals like the ones mentioned above, making it nearly impossible for a layman to appreciate any of them.

In his talk “Indian Historiography at 75” at the Pondicherry Lit Fest of 2022, Sandeep Balakrishna says that distinguished scholars informed him that “in the next 25 or 30 years, there won’t be a single Indian scholar living in India who would be able to decipher valuable inscriptions and other primary sources of Indian history written in Indian languages.” [1] In an interview with The Hindu in 2010, Japanese scholar Noboru Karashima spoke on the condition of epigraphy in India: “These days most scholars, Indian and foreign, depend on summaries of the inscriptions that appear in the annual reports. They, therefore, don’t go into the material.” [2]

Further, there is an almost total lack of knowledge of epigraphy among the general public. One may fuzzily recall middle school history textbooks talking of Aśokan edicts, but that is generally all there is to it. Tourists throng famous heritage sites like Hampi or Ellora in large numbers. A tiny board may explain a line or two about the inscription(s) on the site, but hardly anyone can relate to or appreciate them. This is because doing so requires a comprehensive understanding of the historical period in question.

Conservation and preservation are also a major concern. Of course, weather and natural phenomena have caused a certain amount of damage to inscriptions, but have you not found tourists negligently writing and scribbling on the walls of monuments, sometimes even on top of inscriptions? In 2018, I visited Nāgḍā, an abandoned historical site off Udaipur, where I found an inscription in the Sahasrabahu Temple. Its condition was pitiable. No board or any other information explained what it was. To date, I have been unsuccessful in finding anything about it.

At this point, we may conjecture that the problem is threefold:

1. Epigraphical research within academic circles

2. Management of Epigraphical heritage — its preservation and protection

3. Democratizing the access to Epigraphy and its promotion among laymen

The Big Picture

As a student of engineering and design, I was led by this understanding of history to explore Histoinformatics. The field is still in its infancy in India, but globally, many top universities have excellent research groups and labs for fields like Histoinformatics, Digital Humanities or Computational Archeology.

Histoinformatics, Digital Humanities, Digital Archeology, Computational History/Archeology: I have used these terms interchangeably, but they are distinct, allied fields. For the purposes of this article, what applies to one applies to the rest. They are fields of study where computing and technology intersect with disciplines of the Humanities such as History, Archeology and Epigraphy.

Histoinformatics specifically deals with computational approaches to the processing, organization, and analysis of digital historical data.

In 2013, the first-ever workshop on Histoinformatics was held in Kyoto, Japan. Since then it has been held five times, most recently in 2021. Do take a look at the “Themes and Topics” to get a better understanding of what the field is about. [3]

The Digital Archaeology research group at Leiden University, for example, has some interesting projects. Dr. Alex Brandsen used deep learning to develop a smart search engine for archaeologists. Dr. Sarah Klassen uses LiDAR data to map previously concealed and undocumented urban landscapes in Angkor. [4]

The case at home

The avenues for research and application are ample. The sheer size of our epigraphic corpora provides a multitude of opportunities. It is impossible for any one scholar to know all the inscriptions thoroughly. Only a computer can handle such a large amount of data. For example, text mining and NLP on the inscription texts can yield many new and useful insights. A database of inscriptions would be very valuable for scholars and laymen alike. The massive amount of detailed information in Chōl̥a inscriptions can support complex network analyses and reveal a lot more about the cultural, social and political life of that period.
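As a rough illustration of what such a network analysis could look like, here is a minimal Python sketch. Everything in it is hypothetical: the records, the entity names and the function names are placeholders, and a real study would first need the inscription texts digitized and their entities (donors, temples, villages, officials) extracted. The sketch links two entities whenever they are mentioned in the same inscription and counts how often that happens.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each inscription reduced to the set of entities it mentions.
inscriptions = {
    "inscription_1": {"donor_a", "temple_x", "village_p"},
    "inscription_2": {"donor_a", "temple_x"},
    "inscription_3": {"donor_b", "temple_x", "village_p"},
}

def cooccurrence_edges(records):
    """Count how many inscriptions mention each pair of entities together."""
    edges = Counter()
    for entities in records.values():
        # Sorting makes (a, b) and (b, a) land on the same key.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum the edge weights touching each entity: a simple measure of centrality."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

edges = cooccurrence_edges(inscriptions)
degree = weighted_degree(edges)

print(edges.most_common())    # pairs of entities, strongest links first
print(degree.most_common(1))  # the most connected entity in this toy data
```

In this toy data the temple comes out as the most connected node, which is the kind of pattern one would then check against the historical context. With thousands of records, a graph library would be the more practical choice than hand-rolled counters.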

But we have a problem here. Histoinformatics deals with historical data, so where is our data?

It may be surprising, but even the Aśokan edicts, one of the most famous sets of inscriptions in India, do not have a digitized version in Brāhmī script. For numerous famous and important inscriptions (the Prayagraj Prashasti of Samudragupta, the Aihole inscription of Pulakesin, the Leiden copper plates of Rajaraja Chola, etc.), one may find a translation on the Wikipedia page, but the text itself is hard to get. One has to manually find the journal issue in which the inscription was published, check whether a scanned PDF is openly available, and then try to read a scanned page of mediocre quality. Say you want to copy a line: you have to type it all over again, as it is not available in text form and the scan quality rules out using OCR to extract the text. We do not even have a comprehensive catalog of the inscriptions.
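To give a sense of what a catalog entry with machine-readable text might hold, here is a minimal Python sketch. The field names, the record ID and the sample values are all assumptions made for illustration, not an existing schema. The one fixed fact it relies on is that Unicode has a dedicated Brahmi block (U+11000 to U+1107F), so the original script can be stored as text rather than as an image.

```python
from dataclasses import dataclass, field

# Unicode Brahmi block: U+11000..U+1107F
BRAHMI_BLOCK = range(0x11000, 0x11080)

def is_brahmi(text):
    """True if the text is non-empty and every non-space character is in the Brahmi block."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(ord(c) in BRAHMI_BLOCK for c in chars)

@dataclass
class InscriptionRecord:
    """Hypothetical catalog entry; the fields are illustrative, not a standard."""
    record_id: str
    title: str
    script: str
    language: str
    text_original: str = ""      # text in the original script, as Unicode
    transliteration: str = ""    # romanized reading
    translation: str = ""
    published_in: list = field(default_factory=list)  # journal references

    def has_machine_readable_text(self):
        return bool(self.text_original.strip())

record = InscriptionRecord(
    record_id="demo-0001",
    title="Placeholder edict",
    script="Brāhmī",
    language="Prakrit",
    # Two arbitrary code points from the Brahmi block, not a real reading.
    text_original="\U00011005\U00011013",
)

print(record.has_machine_readable_text(), is_brahmi(record.text_original))
```

Even a simple record like this would let a reader copy a line, search across inscriptions, or trace a text back to the journal issue it was published in.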

This problem has been highlighted more than once in the past. Nearly thirty years ago, Riccardo Garbini’s seminal paper in the Journal of the Epigraphical Society of India, titled “Software development in Epigraphy”, was among the first works along these lines. [5]

A word of caution

The scene is not empty: several attempts have been made to bring inscriptions to the digital space. Projects, some of them ongoing, have been working towards the same goal. In fact, earlier this year, the Epigraphy department of the ASI finally began a mammoth project of digitizing all the estampages, which will be stored as microfilms in the Arctic World Archive (AWA) [6]. But this is not something to be happy about, because nearly all of these projects are by research groups and companies outside India. Why is this a concern?

In the 21st century, data is not merely information but a powerful resource that needs to be harnessed. To quote Rajiv Malhotra, “Today, data is power but the Indian establishment hasn’t understood the value of power, considering the control it is allowing the foreign digital companies to have here.” This remains true in the context of historical data as well. [7]

It is important that the research and implementation are done by Indians in our labs and companies in India. The work has to be done keeping in mind a vision for the next several decades.

This is why a few ASI centers impulsively digitizing some inscriptions, a research group thousands of miles away from India cataloging the Gupta inscriptions, or a central ministry hosting competitions for student groups to build an OCR for an ancient script is far from what needs to be done. What we need is a rigorous national-level policy and vision for creating the necessary digital infrastructure for tangible and intangible Indian heritage.

In the past, India has made mistakes in the context of epigraphical research. Her inscriptions lay neglected and forgotten until Brāhmī had to be “deciphered” all over again, and we had to learn of Rājēndra Chōl̥a’s naval campaign to South East Asia from foreign historians reading his inscriptions. If we do not adopt and invent technology for our heritage, we may be making another mistake. To quote again, “That knowledge has already been safely stored in the institutions of the West who will once more come here to teach India to Indians.”

  1. Balakrishna, Sandeep. (2022). Indian Historiography at 75. [Website] The Dharma Dispatch. Available at: https://www.dharmadispatch.in/commentary/indian-historiography-at-75 [Accessed: 21 September 2022]
  2. Menon, P., 2010. Unless knowledge of epigraphy develops, no ancient or medieval history of this country can be studied. The Hindu, [online] Available at: <https://www.thehindu.com/opinion/interview/lsquoUnless-knowledge-of-epigraphy-develops-no-ancient-or-medieval-history-of-this-country-can-be-studied/article15576712.ece> [Accessed 21 September 2022].
  3. https://dh.virginia.edu/event/histoinformatics-2021-workshop
  4. https://www.universiteitleiden.nl/en/archaeology/archaeological-sciences/digital-archaeology
  5. Garbini, R., 1993. Software development in Epigraphy. Journal of the Epigraphical Society of India, Volume 19.
  6. https://www.pressreader.com/india/millennium-post-kolkata/20220717/281728388243652
  7. https://www.dailypioneer.com/2019/state-editions/data-is-power--india-doesn---t-understand-this--rajiv-malhotra.html
