Introducing the Digital Humanities

This guide lists resources and briefly describes several areas within the digital humanities.

What is text analysis?

The first digital humanities project was a text analysis project. As more text becomes available in machine-readable digital formats, it becomes easier to apply text analysis methods to research questions. These techniques, often referred to as distant reading, let you examine a large body of texts, known as a corpus, to learn what they contain and to identify patterns across them. This contrasts with traditional close reading, which many people are more familiar with, where you closely scrutinize individual texts.

Specific methods include concordances, which list each occurrence of a word in a text along with its surrounding context, showing how often and where the word appears; topic modelling, which uses computational linguistics to identify recurring themes in texts from words that frequently occur together; and stylometry, the statistical analysis of writing style, which is useful in research such as authorship studies.
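To make the idea of a concordance concrete, here is a minimal sketch in Python using only the standard library. The function names (`tokenize`, `concordance`), the `width` parameter, and the sample sentence are illustrative assumptions rather than part of any particular tool, and the tokenizer is deliberately simple.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(text, keyword, width=3):
    """Return (position, context) pairs for each occurrence of keyword.

    position is the index of the token in the text; context is the keyword
    with up to `width` tokens on either side (keyword in context).
    """
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((i, " ".join(left + [token] + right)))
    return hits

# Hypothetical sample text, used only for illustration.
sample = "The whale surfaced. The crew watched the whale dive, and the whale was gone."

# Word frequencies across the whole sample.
frequencies = Counter(tokenize(sample))

# Every occurrence of "whale" with two words of context on each side.
for position, context in concordance(sample, "whale", width=2):
    print(position, context)
```

A real project would likely need a more careful tokenizer (for example, one that handles accented characters and punctuation inside words), but the same two questions, how often and where, drive the output.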

Before you can apply text analysis methods, you must assemble and clean your corpus. This is often the most laborious part of a text analysis project: you may need to find machine-readable copies of your texts, convert scanned materials to machine-readable text through OCR (optical character recognition), remove formatting and tags from XML or HTML, and correct OCR errors.
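As a rough illustration of two of these cleaning steps, the sketch below strips HTML tags and tidies some common line-break and spacing problems using Python's standard library. The helper names (`TextExtractor`, `strip_tags`, `clean_text`) and the sample markup are assumptions made for this example; it does not perform OCR, and real corpora usually need rules tailored to the source material.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Note: this also keeps the contents of <script> and <style> elements,
        # which a fuller cleaner would skip.
        self.parts.append(data)

def strip_tags(markup):
    """Return the text of an HTML string with tags removed and entities decoded."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)

def clean_text(raw):
    """Strip tags, rejoin words split across line breaks, and normalize spacing."""
    text = strip_tags(raw)
    # Rejoin words hyphenated at a line break ("digi-\ntal" -> "digital").
    # Assumption: this can wrongly merge genuinely hyphenated compounds.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

# Hypothetical input, used only for illustration.
raw_page = "<p>The digi-\ntal   humanities <b>use</b> text&amp;data.</p>"
cleaned = clean_text(raw_page)
print(cleaned)
```

Keeping each cleaning rule as a separate, commented step makes it easier to check what was changed and to switch off a rule that damages a particular corpus.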


Tools for Corpus Cleaning