August 21, 2015
4PM - reception opens, food and drink served
5PM - presentation begins
Social Science Matrix, Barrows Hall, 8th Floor

6PM - afterparty at the DLab, 356 Barrows Hall (3rd floor, opposite side of the building)

Open to the public

RSVP via Eventbrite

Presented by Digital Humanities at Berkeley, a project of the Division of Arts & Humanities

Natural language processing is a research area whose focus is the development of automatic methods that can reason about the internal structure of language, including part-of-speech tagging, syntactic parsing and named entity recognition--identifying the people and places in text and discovering the structure of who does what to whom. Over the past few years, NLP has become an increasingly important element in computational research in the humanities and social sciences, enabling sophisticated analyses that can go far beyond simple word counting. At the same time, however, there is a substantial gap between the quality of the NLP used by researchers in the humanities and the state of the art, since NLP research has overwhelmingly focused not only on one language (English) but also on one domain (newswire)--leaving many other languages, dialects and domains (such as literary text) underserved.

In this talk, I'll advocate for two things that I think are necessary to drive the next generation of textual work in the computational humanities. First, I'll argue for the importance of structured linguistic representations in computational models of text, surveying several recent projects that have leveraged that structure to good effect. Second, I'll advocate for the development of high-quality NLP for the long tail of languages, dialects and domains that humanists study--an area where humanists are in the best position to take the reins and make progress. By combining standard machine learning techniques with disciplinary expertise only humanists can provide, we can both dramatically expand the scope of NLP so that it can be applied to a much wider variety of texts in our cultural record and use the linguistic structure we infer to help define new tasks altogether.


David Bamman joins UC Berkeley this fall as an assistant professor in the School of Information, after receiving his PhD from the School of Computer Science at Carnegie Mellon University. His research uses natural language processing (NLP) and machine learning to extract meaning from text in order to answer empirical questions in the humanities and social sciences; he has published in collaboration with researchers whose home departments include English, Linguistics, Classics, and Near Eastern Studies. Prior to CMU, David was a senior researcher in computational linguistics at the Perseus Project of Tufts University.


Berkeley Center for New Media

UC Berkeley Social Science Matrix

Research IT at UC Berkeley



DH at Berkeley Summer Institute 2015