Go from Analog to Digital Texts with OCR

by Quinn Dombrowski and Stacy Reardon

A collection of digitized texts marks the start of a research project — or does it?

For many social science and humanities researchers, turning heaps of paper from archival boxes, or books painstakingly sourced from overlooked corners of the library, into searchable, editable, and machine-readable digital texts can be a tedious, time-consuming process.
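
As a minimal sketch of what that first OCR step looks like in code, assuming the Tesseract engine and the pytesseract and Pillow Python packages are installed (the filename here is a made-up example):

```python
# A minimal OCR sketch: turn a scanned page image into machine-readable text.
from PIL import Image
import pytesseract  # Python wrapper around the Tesseract OCR engine

page = Image.open("archival_scan.png")    # hypothetical scanned page
text = pytesseract.image_to_string(page)  # run OCR on the image
print(text)                               # searchable, editable plain text
```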

Read more

Digital Humanities for Tomorrow: Opening the Conversation about DH Project Preservation

by Rachael G. Samberg and Stacy Reardon

After intensive research, hard work, and maybe even fundraising, you launch your digital humanities (DH) project into the world. Researchers anywhere have instant access to your web app, digital archive, data set, or project website. But what will happen to your scholarly output in five years? In twenty-five? What happens if…

Read more

Event Recap: Christopher Ohge on Literary Editing and Preservation at the Mark Twain Project

On October 13th, the Lit+DH Working Group welcomed Christopher Ohge of the Mark Twain Project. He spoke about the challenges of archival work, recounting an instance of editorial conflict between Mark Twain and his editor that led to the omission of 56 pages of material from chapter 16 of The Adventures of Huckleberry Finn; the omitted pages were rediscovered in 1990. While restoring them would improve the flow of the chapter, doing so would be at odds with Mark Twain’s expressed editorial preferences.

Read more

Second Annual Summer Institute - Recap

The second annual Digital Humanities at Berkeley Summer Institute (DHBSI), August 15-19, 2016, grew to offer 6 courses to 100 participants. Geospatial Analysis, Data Workflows and Network Analysis, Database Development Using Drupal, and Computational Text Analysis were offered again, while Intro to Digital Humanities and Qualitative Data Analysis were offered for the first time.


In addition to individual courses, DHBSI included daily events, open to the campus community. On Monday, Eleanor Dickson and Peter Organisciak from the HathiTrust Research Center (HTRC) presented on computational text analysis and methods for accessing the data in the HTRC collections. That evening featured a keynote by noted digital humanities and new media scholar Tara McPherson (USC), entitled “DH by Design: Alternative Origin Stories for the Digital Humanities.” McPherson’s talk highlighted the importance of keeping theory and methods together in DH.

Read more

A Humanist Apologetic of Natural Language Processing; or A New Introduction to NLTK. A Guest Post by Teddy Roland, University of California, Berkeley

Computer reading can feel like a Faustian bargain. Sure, we can learn about linguistic patterns in literary texts, but it comes at the expense of their richness. At bottom, the computer simply doesn't know what or how words mean. Instead, it merely recognizes strings of characters and tallies them. Statistical models then try to identify relationships among the tallies. How could this begin to capture anything like irony or affect or subjectivity that we take as our entry point to interpretive study?
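
To make that "tallying" concrete, here is a minimal sketch of the bookkeeping the computer actually does, assuming NLTK is installed and its punkt tokenizer data has been downloaded (the sample sentence is illustrative):

```python
# A minimal sketch of reading-as-tallying: the computer sees only strings
# of characters, splits them into tokens, and counts the tokens.
import nltk
from nltk import FreqDist, word_tokenize

nltk.download("punkt", quiet=True)   # one-time tokenizer data download
# (recent NLTK releases may also need: nltk.download("punkt_tab"))

text = "It was the best of times, it was the worst of times."
tokens = word_tokenize(text.lower())                 # strings, not meanings
counts = FreqDist(t for t in tokens if t.isalpha())  # tally each word string

print(counts.most_common(5))
# [('it', 2), ('was', 2), ('the', 2), ('of', 2), ('times', 2)]
```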

Read more

"Topic Modeling: What Humanists Actually Do With It." A Guest Post by Teddy Roland, University of California, Berkeley

Image: Pennsylvania Gazette

One of the hardest questions we can pose to a computer is asking what a human-language text is about. Given an article, what are its keywords or subjects? What are some other texts on the same subjects? For us as human readers, these kinds of tasks may seem inseparable from the very act of reading: we direct our attention over a sequence of words in order to connect them to one another syntactically and interpret their semantic meanings. Reading a text, for us, is a process of unfolding its subject matter.
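
For a concrete toy of the kind of model at issue, here is a small LDA sketch using scikit-learn; this is an illustrative stand-in rather than the post's own code, and the four miniature "documents" are invented:

```python
# A toy topic-modeling sketch: infer two latent "topics" from word tallies
# (LDA via scikit-learn >= 1.0; data and settings are purely illustrative).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the river boat drifted down the muddy river",
    "the printing press published the weekly gazette",
    "steamboats carried cargo along the river trade routes",
    "editors set type for the gazette printing press",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)    # documents as bags of word tallies

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)                            # fit two latent topics

words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-4:][::-1]]   # top 4 words
    print(f"Topic {i}: {', '.join(top)}")  # e.g. river/boat vs. press/gazette
```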

Read more
