In spring 2016, the Dutch Studies Program in the German Department and The Bancroft Library partnered on a collaborative research grant through Digital Humanities to prepare a digital research collection from selected primary source materials in the library's Engel Sluiter Historical Documents Collection. This collection consists predominantly of copies and transcriptions of primary source materials on the seventeenth-century Atlantic. These typed transcriptions of archival materials were previously inaccessible to most researchers because of difficulties in reading seventeenth-century Dutch paleography. The project sought to design a web presentation for the “Colonial New Netherland” subset of documents, which focuses on the seventeenth-century Dutch colony of New Netherland, later New York. The goal was to digitize, extract, and clean the historical text in order to provide “research ready” text that supports natural language processing and other machine processing of these archival documents. A total of 823 documents from the collection were digitized as TIFF files, and the digitized versions were run through Optical Character Recognition (OCR) to generate text files. The OCR text files were manually reconciled and corrected using the OCR Virtual Desktop supported by BRC’s Analytic Environments on Demand service. The corrected texts were recombined into new PDF files, then loaded into “Voyant Tools,” a web-based text analysis environment, to explore the texts and determine whether they were research ready. The results were published on a website that presents the final research products: the corrected texts, provided as PDF files for researchers interested in performing text analysis on these archival documents. The text files can also be used with other natural language processing techniques, such as topic modeling, entity extraction, and keyword extraction, to explore and expand access to the documents.
In addition to being presented on the project website, the corrected texts are fully text-searchable and are published through Calisphere.

Project type