Computational Textual Analysis

This guide gives resources for finding, cleaning, and analyzing textual corpora.

Step 3: Analyzing Your Texts

Once you've got a reasonably clean corpus, you can run it through textual analysis programs to get results.

To learn how to do stylometry, a mode of textual analysis used to determine authorship, check out our Stylometry Libguide.

Tools to Play With

If you're just getting started and want to see what you can do, these tools are fun ways to get your feet wet, and can be valuable for teaching.

Advanced Toolboxes

These aren't the easiest tools to use, but they are good resources for more advanced users.
 

Topic Modeling

Word Embeddings

Word embeddings, also known as vector space models, offer insight into how the words in a corpus are associated, based on their proximity to one another. Unlike topic models, which give an overview of frequent words that appear across a series of documents, word embeddings show how likely words are to appear near each other.
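To make the idea of proximity concrete, here is a minimal Python sketch that uses only the standard library. It is not word2vec or any trained embedding model. It simply counts which words appear within a couple of tokens of each other and compares the resulting count vectors with cosine similarity. The toy corpus, the window size of 2, and the function names are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words found within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, invented for this example.
toy_corpus = [
    "the king rules the castle",
    "the queen rules the castle",
    "the dog chases the ball",
]

vectors = cooccurrence_vectors(toy_corpus)
king_queen = cosine(vectors["king"], vectors["queen"])
king_dog = cosine(vectors["king"], vectors["dog"])
print(f"king ~ queen: {king_queen:.2f}")
print(f"king ~ dog:   {king_dog:.2f}")
```

In this toy corpus, "king" and "queen" share the same neighbors, so they score higher than "king" and "dog". Trained embedding models learn dense vectors rather than raw counts, but they rest on the same intuition that words appearing in similar contexts end up close together.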

With recent advances in machine learning for natural language processing, word embeddings have grown from the original word2vec algorithm to doc2vec and universal language models like BERT.

R

If you're prepared to learn to code, R is one of the preferred platforms for textual analysis, with built-in visualization tools.