Computational Text Analysis (updated 6/16)

This series of workshops will focus on working with textual data. Participants will be introduced to tools for computational text analysis using a toy corpus of documents provided by the instructor. We will discuss some of the methodological concerns that accompany these computational methods.

Basic Syllabus / Topics Covered
  1. Acquiring and Preprocessing Texts
  2. Dictionary Methods: Measuring Weighted Word Usage
  3. Methods for Finding Discriminating Words
  4. The Vector Space Model and the Geometry of Text (Principal Components, Multi-dimensional Scaling, Most Similar Texts)
  5. Clustering Methods
  6. Topic Models
  7. Supervised Learning
  8. Quantifying Style: Grammar, Alliteration, and other Poetic Concerns
  9. Verification
  10. Regular Expressions and Word Searching

Please read the following to get a sense of what we'll be doing.

Students are strongly encouraged to complete this brief tutorial to learn the basic syntax of the R programming language.

Data Workflows and Network Analysis (updated 05/11) 

This workshop will discuss methods of data retrieval, data cleaning, and visualization. Participants will discuss how websites are structured and learn how to collect a data set through web scraping. They will then learn how to use tools like OpenRefine for cleaning and transforming data and how to visualize data using Gephi, an open-source tool for network analysis. Christopher Church, Assistant Professor of History at the University of Nevada, Reno, will return to UC Berkeley to lead this track.

Geospatial Analysis

Geospatial analysis is a key pillar of digital humanities methods. These workshops will cover the basics of ArcGIS, a geospatial tool used in industry and in academic research across the environmental and social sciences. Students will be introduced to georeferencing and geocoding data. Students will also meet with the GIS & Maps Librarian to discuss working with different geodata formats. This workshop will also preview tools and methods for publishing maps on the web.

Database Development Using Drupal

Databases form the backbone of many digital humanities projects, such as digital collections, interactive web maps, and computational text analyses. Before these projects can begin, data must be gathered, cleaned, and structured. Students will be introduced to data modeling and other considerations for structuring and storing data. We will discuss the advantages and disadvantages of various database tools and platforms. Individual teams will work with workshop instructors to develop a data model for their data. Students will then work with Drupal, an open-source content management system, to begin building a database for their project.

DH at Berkeley Summer Institute 2015