This guide walks you through the steps of finding and cleaning your texts, the necessary preliminaries to any textual analysis project, and points you to resources for each step. This isn't the most glamorous part of digital scholarship, but with these resources, you can move quickly and avoid the most common pitfalls.
Why would you need to prepare a corpus? Most textual analysis projects rely on plain-text digital versions of documents, which a computer can easily read and analyze. For the results to be accurate, these documents need to be relatively free of spelling errors, clearly labeled, and collected in one place.
Use the tabs at the top to explore the most common steps of preparing a digital corpus: finding, cleaning, and analyzing.
Remember, at its most basic level, all textual analysis is counting and classifying words. Some of these tools are more advanced than others, of course!
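To make the "counting words" idea concrete, here is a minimal sketch in Python. The function name `count_words` and the sample sentence are made up for illustration, and the way the text is split into words (lowercase letters and apostrophes only) is a simplifying assumption; real tools handle punctuation, accents, and hyphenation more carefully.

```python
import re
from collections import Counter


def count_words(text):
    """Lowercase the text, split it into words, and count each word."""
    # Assumption: a "word" is any run of letters a-z or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, not drawn from any particular corpus.
sample = "The whale, the sea, and the ship."
counts = count_words(sample)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

Running the same function over every file in a folder of plain-text documents would give you a simple word-frequency table, which is the starting point for many of the more advanced tools.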
A few tips:
What does computer-assisted textual analysis mean?
Textual analysis means looking closely at a text to draw out a deeper meaning. Computer-assisted textual analysis just means that you use some type of computing to help.
The main reason to do this is that it can show you something you might not otherwise be able to see.
What does it help you do that you couldn’t otherwise do?
Tip: If you’ve got a question that can be answered by a small number of texts that you can realistically and easily read, it might not be the best question for computer-assisted textual analysis.
Tip: Textual analysis software is changing very quickly, so there may well be a new tool available today that wasn't there yesterday and that will let you do what you need, and more!