AWCA began 20 years ago as a digitization project. Since 2019, the analytical workflow has been developed at the D-Lab by me and a team of DS-Discovery students. The goal is to build a citation network from any collection of PDFs. The project data-mines a large collection of sources from the disciplines of ancient Near Eastern Studies, Classics, Archaeology, and Middle Eastern Languages. By creating novel tools for computational text analysis, the project makes this collection more accessible to scholars in these fields around the world.
The site contains a series of Python Jupyter Notebooks that implement different NLP tools for computational text analysis. The results are visualized in a series of network graphs, which map the relationships across the large multilingual corpus of primary and secondary sources in the field of Near Eastern Studies. The methods and tools are intended to apply generally to any collection of documents, in many languages. The results can be used to visualize the output of different types of language models as a network, thereby mapping the contours of the research landscape described within a collection of scholarly works.
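To make the idea of a citation network concrete, here is a minimal sketch in plain Python. It is not the project's notebook code. It assumes that an upstream text-mining step has already extracted `(citing_document, cited_work)` pairs from the PDFs. The function names `build_citation_network` and `most_cited` and the sample identifiers are hypothetical. The graph is a directed adjacency map, and works are ranked by in-degree, meaning the number of distinct documents that cite them.

```python
from collections import defaultdict

# Hypothetical input format: (citing_document, cited_work) pairs, assumed to be
# produced by an earlier extraction step over the PDF collection.


def build_citation_network(pairs):
    """Return a directed adjacency map: citing document -> set of cited works."""
    graph = defaultdict(set)
    for citing, cited in pairs:
        if citing != cited:  # skip a document "citing" itself
            graph[citing].add(cited)  # set() removes duplicate citations
    return dict(graph)


def most_cited(graph, top_n=3):
    """Rank cited works by in-degree (number of distinct documents citing them)."""
    in_degree = defaultdict(int)
    for cited_works in graph.values():
        for cited in cited_works:
            in_degree[cited] += 1
    # Sort by descending count, then alphabetically so ties are stable.
    return sorted(in_degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    # Hypothetical sample pairs, for illustration only.
    pairs = [
        ("doc_A", "work_X"),
        ("doc_A", "work_Y"),
        ("doc_B", "work_X"),
        ("doc_B", "work_Z"),
        ("doc_C", "work_X"),
        ("doc_C", "work_Y"),
    ]
    graph = build_citation_network(pairs)
    print(most_cited(graph))  # [('work_X', 3), ('work_Y', 2), ('work_Z', 1)]
```

A plain adjacency map keeps the sketch dependency-free. In practice, a graph library would likely be used for the layout and visualization that the notebooks produce.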

Screenshots