Mining Twitter: Home

Scraping, Visualizing, and Analyzing Twitter Data

Mining Data

With the plethora of data available in digital spaces, it only makes sense that scholars want to access, visualize, and analyze it. Many software packages are available to aid in this pursuit, and programming languages paired with social media APIs make the scraping process even more customizable. This guide will focus on scraping, visualizing, and analyzing Twitter data.

Originally created by: Angela Cirucci

Currently updated & maintained by: Elizabeth Rodrigues

Some tips before you get started:

  1. Browse around for other studies that generally match your research ideas.
  2. Play with some of the popular tools. (I outline a few under the "Scraping" tab.)
  3. Chat with other people!
  4. Design your project before you jump in.

Why Twitter?

Twitter is a micro-blogging site where users can broadcast status updates of 140 characters or fewer. If you aren't familiar with the site, you can explore it here.

While there are many social networking sites that hold rich information for research, Twitter is an ideal space because:

  1. Most profiles are public: Other sites like Facebook and Instagram may have interesting data. However, depending on users' privacy settings, you will only be able to collect certain information, potentially skewing your findings.
  2. Tweets are more than just text: Each time you mine data from Twitter, you can collect a wide range of information. This may be tweet-based content such as photos, links, and geolocations, or user-based content such as profile picture, number of followers, and sign-up date.
  3. Tweets can only be 140 characters: The 140-character limit may seem restrictive at first, but it is actually quite helpful in the analysis process. Further, users are learning how to adapt to the smaller space, so you are not really missing out on any important data that 141+ characters would have allowed.
  4. Twitter users can have both friends and followers: Unlike Facebook, where friending is reciprocal, Twitter lets users gain followers without adding them to their own friends list. Because of this, potential audiences are easier to analyze and network maps can be more dynamic, revealing more information.

Why scrape Twitter?

Before getting started with your research, make sure that your research question matches the types of research best served by Twitter data. Social media scraping, in general, is best utilized when you are trying to understand some phenomenon that is taking place online.

In particular, Twitter data allows you to:

  • Understand your own Twitter network or the influence of your own tweets
  • Collect data about tweeters (followers, friends, signup date, favorites, profile picture, etc.)
  • Know who is mentioned through @usernames
  • See how information disseminates
  • See the influence/popularity of tweets and people
  • Examine networks and communities
  • Explore how trends develop and change over time
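
As a small illustration of the @username and network items above, the following Python sketch pulls mentions out of tweet text and turns them into a list of "who mentioned whom" pairs. It is only a sketch under stated assumptions: the sample records, the field names `user` and `text`, and the helper `mention_edges` are hypothetical, and data collected with a real tool will be structured differently.

```python
import re
from collections import Counter

# Hypothetical sample records; real scraped data will use different fields.
tweets = [
    {"user": "alice", "text": "Great talk by @bob and @carol today #dh2015"},
    {"user": "bob", "text": "Thanks @alice! #dh2015"},
    {"user": "carol", "text": "RT @bob: Thanks @alice! #dh2015"},
]

# An "@" not preceded by a word character (so e-mail addresses are skipped),
# followed by up to 15 letters, digits, or underscores.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")


def mention_edges(records):
    """Return (author, mentioned_user) pairs, one per mention."""
    edges = []
    for record in records:
        for name in MENTION_RE.findall(record["text"]):
            edges.append((record["user"], name.lower()))
    return edges


edges = mention_edges(tweets)

# Count how often each account is mentioned, a rough proxy for visibility.
mention_counts = Counter(target for _, target in edges)

for author, target in edges:
    print(f"{author} -> {target}")
print(mention_counts.most_common())
```

A list of pairs like this can be loaded into most network visualization tools as an edge list, while the counts give a first, rough sense of who is mentioned most often.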

After scraping

Scraping is only the first step. Once you have your data, there will be much cleaning and organizing to do.
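
To give a sense of what that cleaning can involve, here is a minimal Python sketch that drops duplicate tweets, strips links, and tidies whitespace. The sample records, the field names `id` and `text`, and the helper `clean_tweets` are assumptions made for illustration; the cleaning steps your own project needs will depend on your research question and on the tool you used to collect the data.

```python
import re

# Hypothetical raw records; the first two are the same tweet collected twice.
raw_tweets = [
    {"id": "1", "text": "Loving   the new exhibit! http://example.com/a "},
    {"id": "1", "text": "Loving   the new exhibit! http://example.com/a "},
    {"id": "2", "text": "\nSo crowded today  #museum"},
]

# Matches http and https links up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")


def clean_tweets(records):
    """Drop repeated tweet IDs, remove links, and collapse extra whitespace."""
    seen = set()
    cleaned = []
    for record in records:
        if record["id"] in seen:
            continue  # keep only the first copy of each tweet
        seen.add(record["id"])
        text = URL_RE.sub("", record["text"])
        text = " ".join(text.split())  # collapse runs of spaces and newlines
        cleaned.append({"id": record["id"], "text": text})
    return cleaned


cleaned = clean_tweets(raw_tweets)
for record in cleaned:
    print(record["id"], record["text"])
```

Whether to remove links at all is a design choice: if your analysis is about what people share, you would keep them, perhaps in a separate column.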

Mining the site only gives you raw information. Some scraping software packages come with visualization and analysis tools. You can also employ other methods and tools.

Some ways of visualizing and analyzing your data include comparing it to other online norms, to activity on other social networking sites, and to offline phenomena. For example, you may want to compare networks with quantitative methods, or you may want to compare the content of tweets from two different hashtags with qualitative methods.

The rest of this guide will introduce you to some of my preferred tools and methods for scraping, visualizing, and analyzing Twitter data.