Event date
Tuesday, May 9, 2017
Event time
10:00am to 12:00pm
Barrows 371: D-Lab Breakout Room

This hands-on workshop walks through the common “preprocessing recipe” that serves as the foundation for a variety of other applications, along with some basic natural language processing techniques. These include: a) digitization (UTF-8), b) removal of stopwords, numbers, and punctuation, c) tokenization, d) calculation of word frequencies and proportions, e) part-of-speech tagging, and f) concordances.
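To give a sense of what the recipe looks like in practice, here is a minimal sketch of steps a) through d) and f) using only the Python standard library. It is not the workshop's own code: the workshop uses NLTK, whose tokenizers, stopword lists, and taggers would replace the simplified pieces shown here. The `STOPWORDS` set, the sample sentence, and the function names `preprocess`, `proportions`, and `concordance` are all hypothetical choices made for illustration. Part-of-speech tagging is left out because it needs a trained tagger.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real project would use a fuller list such as the one NLTK provides.
STOPWORDS = {"the", "a", "an", "and", "of", "on", "in", "to", "is"}


def preprocess(raw_bytes):
    """Decode UTF-8 bytes, lowercase, strip punctuation and digits,
    split on whitespace, and drop stopwords."""
    text = raw_bytes.decode("utf-8").lower()
    drop = str.maketrans("", "", string.punctuation + string.digits)
    tokens = text.translate(drop).split()
    return [t for t in tokens if t not in STOPWORDS]


def proportions(tokens):
    """Map each word to its share of all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def concordance(tokens, keyword, width=2):
    """Return each occurrence of keyword with up to `width` tokens of context
    on either side (a simple keyword-in-context view)."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            lines.append(" ".join(tokens[max(0, i - width): i + width + 1]))
    return lines


# Sample input invented for this sketch.
sample = "The cat sat on the mat. The cat saw 2 dogs, and the dogs saw the cat!".encode("utf-8")
tokens = preprocess(sample)

print(tokens)
print(Counter(tokens).most_common(3))
print(proportions(tokens))
print(concordance(tokens, "saw", width=1))
```

Splitting on whitespace after stripping punctuation is the crudest possible tokenizer; it mishandles contractions and hyphenated words, which is one reason a dedicated tokenizer is usually preferred.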
Prior knowledge: We will be using the NLTK Python package, so basic familiarity with Python is required if you wish to follow along with the tutorial. Completion of D-Lab's Python FUN!damentals workshop series will be sufficient.
This workshop is one part of a four-part series that prepares participants to move forward with text analysis research, with a special focus on humanities and social science applications. Please register for each workshop separately. The other workshops in the series are listed below:
Text Analysis Fundamentals: Methods and Approaches
Text Analysis Fundamentals: Unsupervised Approaches
Text Analysis Fundamentals: Supervised Methods