
Computational Textual Analysis

This guide gives resources for finding, cleaning, and analyzing textual corpora.

Visualizing Twitter Data

The Webscraping LibGuide will show you how to get Twitter data. This LibGuide will show you how to visualize it.

After your Twitter data is collected, there are two broad types of analysis that you can perform: quantitative and qualitative.

For the purposes of this guide, I will focus on two specific types of research. The first is the more quantitative method of network analysis. Network analysis explores tweeters and groups, mapping tweeters, who they have replied to, and who they have mentioned. This method allows you to see how communities are formed.
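To make the idea of mapping tweeters to the accounts they mention more concrete, here is a minimal Python sketch. It is an illustration only, not the method a network tool uses internally: the `mention_edges` function, the sample tweets, and the simple `@username` pattern are all assumptions made for this example.

```python
import re
from collections import Counter

# Simplified pattern for @mentions (an assumption for this sketch);
# it would also match things like the domain part of an email address.
MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count how often each author mentions each other account.

    `tweets` is an iterable of (author, text) pairs. The result maps
    (author, mentioned_account) to the number of mentions, which is the
    kind of edge list a network graph is drawn from.
    """
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            edges[(author.lower(), target.lower())] += 1
    return edges

# Hypothetical sample data, invented for illustration.
sample = [
    ("alice", "@bob have you seen this?"),
    ("bob", "RT @alice: look at this @carol"),
    ("alice", "thanks @bob"),
]

edges = mention_edges(sample)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

Each pair in the result is one connection in the network, and the count shows how strong that connection is. Accounts that mention each other often will cluster together when the edges are drawn as a graph.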

The second is the more qualitative method of textual analysis. Textual analysis investigates actual tweet content to see if themes emerge around a certain keyword or #hashtag.


Sample Dataset

On July 31, 2018, Clemson University's Social Media Listening Center published a dataset to GitHub of almost three million tweets from 2,848 different Twitter handles associated with a 'troll factory' known as the Internet Research Agency (IRA).

The 13 Russians recently indicted by Robert Mueller for interfering in the 2016 US presidential election worked for the Internet Research Agency, a private company with close ties to the Kremlin. The IRA engaged in a coordinated cyber attack, distributing misleading and false information to sow divisions and distractions with the aim of derailing Hillary Clinton's campaign and, ultimately, helping to gain support for then-candidate Donald Trump.

When Twitter disabled the accounts associated with the IRA, it removed them from public view. By publishing these tweets, Clemson University's SMLC gives fellow researchers and the public at large the opportunity to examine the contents of what we now know was part of a Russian-led disinformation campaign in the 2016 election.

Voyant Visualizations

To better understand the actual content of your mined tweets, we can move on to more qualitative methods, drawing out themes hiding in the tweet content. To do this, we will move from NodeXL to a textual visualization and analysis tool called "Voyant." First, go to Voyant's homepage, which is where you can paste all of your data.

Getting this data is very easy. Go to the column in your NodeXL workbook that contains the tweet content and select all of the cells that you want to analyze.
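If your tweets are saved as a CSV file rather than open in a workbook, the same step can be sketched in Python. This is only an illustration under stated assumptions: the column name "Tweet", the `tweet_column` helper, and the sample rows are hypothetical, so check the actual header in your own file.

```python
import csv
import io

def tweet_column(csv_file, column="Tweet"):
    """Return the non-empty values of one column from an open CSV file.

    The default column name "Tweet" is an assumption for this sketch.
    """
    texts = []
    for row in csv.DictReader(csv_file):
        text = (row.get(column) or "").strip()
        if text:  # skip rows with no tweet text
            texts.append(text)
    return texts

# Hypothetical sample standing in for an exported file.
sample = io.StringIO(
    "Author,Tweet\n"
    "alice,Proud to be #TempleMade\n"
    "bob,\n"
    "carol,RT great news\n"
)

# One tweet per line, ready to paste into a text box.
pasted = "\n".join(tweet_column(sample))
print(pasted)
```

With a real file, you would replace the `io.StringIO` sample with `open("your_file.csv", newline="", encoding="utf-8")`.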

highlighted tweets

Copy this data and paste it into the Voyant search box. Click "reveal."

voyant search box

Now, Voyant provides you with a few tools to understand the content of your tweets. My screen looks like this:

voyant search results

The top left box shows me a word cloud. A word cloud displays all of the words present in my data set, increasing the size of each word the more it appears in my set. Clearly, the largest word is "#templemade" since this was my search term, and thus every tweet contained it. Below the word cloud, Voyant provides a summary of your input data. On the right, all of your content is listed, allowing you to click on any word and see the specifics about that word.
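The counting behind a word cloud can be sketched in a few lines of Python. This is a rough illustration of the idea, not Voyant's own implementation: the `word_counts` function, the tokenizing pattern, and the sample tweets are assumptions made for this example.

```python
import re
from collections import Counter

# Simplified token pattern (an assumption): an optional leading "#",
# then word characters, optionally followed by an apostrophe part
# so that "it's" stays one word.
TOKEN = re.compile(r"#?\w+(?:'\w+)?")

def word_counts(tweets):
    """Count how many times each lowercased word appears across tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(TOKEN.findall(text.lower()))
    return counts

# Hypothetical sample tweets, invented for illustration.
tweets = [
    "Proud to be #TempleMade",
    "It's a great day #templemade",
    "RT great news #TempleMade",
]

counts = word_counts(tweets)
for word, n in counts.most_common(3):
    print(word, n)
```

A word cloud draws each word at a size based on its count, which is why a search term that appears in every tweet dominates the picture.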

For the purposes of this guide, I will quickly focus on the word cloud. One way to better analyze your data is to set a stop word list. Click on the little gear in the upper right hand corner of your word cloud and, for "Stop Words List" choose your language. Then click "OK." The tool automatically removes words that usually don't matter for analysis, such as "the" and "a."

If you go back into the Stop Words List tool, you can also add in your own stop words that are not relevant to your analysis. Just click on "Edit Stop Words" and add to the list already there. For example, for my data, I added "it's," "," "rt," and "http" to my stop words list. (These are also terms that you can remove in a spreadsheet in the data cleaning process.) Now my word cloud provides me with an even better space for analysis:
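The effect of a stop word list can also be sketched in Python. This is a minimal sketch under stated assumptions: the short base list stands in for a full language stop word list, and the word counts are invented for illustration. Only "it's," "rt," and "http" come from the custom list described above.

```python
from collections import Counter

# Tiny stand-in for a language stop word list (an assumption for this sketch).
BASE_STOP_WORDS = {"the", "a", "and", "to", "of"}

# Custom additions taken from the example in this guide.
CUSTOM_STOP_WORDS = {"it's", "rt", "http"}

def remove_stop_words(counts, stop_words):
    """Return a new Counter without any word found in `stop_words`."""
    return Counter({w: n for w, n in counts.items() if w not in stop_words})

# Hypothetical word counts, invented for illustration.
counts = Counter(
    {"#templemade": 6, "the": 5, "rt": 4, "http": 3, "great": 2, "it's": 1}
)

filtered = remove_stop_words(counts, BASE_STOP_WORDS | CUSTOM_STOP_WORDS)
for word, n in filtered.most_common():
    print(word, n)
```

Keeping the base list and the custom list separate makes it easy to reuse the same language list across projects while changing only the terms specific to one dataset.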

word cloud

Just as NodeXL offers many kinds of network analysis, Voyant allows you to conduct many kinds of textual analysis. To learn more, search for video tutorials that speak directly to your research goals.