Category: David Bamman

Predicting Dates of First Publication in the HathiTrust

The rise of large-scale digitized book collections—such as those provided by Google Books, the HathiTrust and the
Internet Archive—is enabling a fundamentally new kind of text analysis that exploits the scale of collections to ask
questions not possible with smaller corpora. Many of these research questions are driven by historically deep textual
collections: corpora whose publication dates span several decades or centuries. Moretti (2007) analyzes the changing
