Category: School of Information

Predicting Dates of First Publication in the HathiTrust

The rise of large-scale digitized book collections—such as those provided by Google Books, the HathiTrust and the
Internet Archive—is enabling a fundamentally new kind of text analysis that exploits the scale of collections to ask
questions not possible with smaller corpora. Many of these research questions are driven by historically deep textual
collections—corpora that span several decades or centuries in their publication. Moretti (2007) analyzes the changing


Marti Hearst

Marti Hearst is a Professor in the School of Information and EECS at UC Berkeley. She develops algorithms and tools that combine natural language processing and user interface design to support research efforts in the digital humanities. She is Vice President of the Association for Computational Linguistics, and is a member of the CHI Academy and a Fellow of the ACM.


David Bamman

David Bamman is an assistant professor in the School of Information at UC Berkeley, where he applies natural language processing and machine learning to empirical questions in the humanities and social sciences. His research involves adding linguistic structure (e.g., syntax, semantics, coreference) to statistical models of text. Before joining Berkeley, Bamman received his PhD from Carnegie Mellon (School of Computer Science, Language Technologies Institute) and was a senior researcher at the Perseus Project of Tufts University.
