Research Guides: Webscraping: Twitter Scraping

Twitter

Twitter is a micro-blogging site where users broadcast short status updates, originally capped at 140 characters (the limit was raised to 280 in 2017). If you aren't that familiar with the site, you can explore it here.

While there are many social networking sites that hold rich information for research, Twitter is an ideal space because:

1. Most profiles are public: Other sites like Facebook and Instagram may have interesting data, but depending on users' privacy settings you may be able to collect only some of it, potentially skewing your findings.

2. Tweets are more than just text: Each time you mine data from Twitter, you can collect much more than the text of a tweet. This includes tweet-based content such as photos, links, and geo-locations, as well as user-based content such as profile picture, number of followers, and sign-up date.

3. Tweets are short: Each tweet was originally capped at 140 characters (280 since 2017). The cap may seem limiting at first, but it is helpful in the analysis process because every entry is brief and comparable in size. Users have also adapted to the smaller space, so you are unlikely to miss important data that a longer format would have allowed.

4. Twitter users can have both friends and followers: Unlike a site like Facebook, where friending is reciprocal, following on Twitter is one-directional. Users can gain followers without adding them to their own friends list (the accounts they follow). Because of this, you can tell a user's audience apart from the accounts they follow, and network maps can be directed, revealing more information.
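
To make the friends/followers distinction concrete, here is a minimal sketch of a directed follow network in plain Python. The usernames and follow pairs are made up for illustration, and the helper names (`build_follow_index`, `mutual_follows`) are hypothetical; nothing here calls Twitter itself.

```python
from collections import defaultdict

# Hypothetical follow relationships as (follower, followed) pairs.
# Each pair is one directed edge: the first account follows the second.
follows = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "carol"),
]


def build_follow_index(edges):
    """Return (friends, followers): two dicts mapping a user to a set of accounts."""
    friends = defaultdict(set)    # accounts a user follows
    followers = defaultdict(set)  # accounts that follow a user
    for source, target in edges:
        friends[source].add(target)
        followers[target].add(source)
    return friends, followers


def mutual_follows(user, friends, followers):
    """Accounts that follow the user AND are followed back (reciprocal ties)."""
    return friends[user] & followers[user]


friends, followers = build_follow_index(follows)

print("alice's followers:", sorted(followers["alice"]))
print("alice's friends:  ", sorted(friends["alice"]))
print("reciprocal ties:  ", sorted(mutual_follows("alice", friends, followers)))
```

In this toy data, dave follows alice but alice does not follow dave back. A reciprocal "friending" model could not represent that tie at all, which is why a directed network carries more information.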

Why Scrape Twitter?

Before getting started, make sure that your research question matches the types of research best served by Twitter data. Social media scraping, in general, is most useful when you are trying to understand a phenomenon that is taking place online.

In particular, Twitter data allows you to:

Understand your own Twitter network or the influence of your own tweets
Collect data about tweeters (followers, friends, signup date, favorites, profile picture, etc.)
Know who is mentioned through @usernames
See how information disseminates
See the influence/popularity of tweets and people
Examine networks and communities
Explore how trends develop and change over time
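
As one small illustration of the list above, the following sketch shows how you might find who is mentioned through @usernames once you have collected tweet text. The sample tweets are invented. The pattern assumes usernames are 1 to 15 letters, digits, or underscores, which should be checked against Twitter's current rules, and `extract_mentions` is a hypothetical helper, not part of any Twitter library.

```python
import re
from collections import Counter

# Assumed username format: 1-15 letters, digits, or underscores.
# The lookbehind skips "@" inside e-mail addresses; the lookahead
# rejects names that run longer than 15 characters.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")


def extract_mentions(text):
    """Return the lower-cased usernames mentioned in one tweet's text."""
    return [name.lower() for name in MENTION_RE.findall(text)]


# Invented sample tweets, for illustration only.
tweets = [
    "Thanks @Alice and @bob for the data!",
    "RT @alice: new paper out",
    "Email me at carol@example.com",
    "@bob have you seen this?",
]

# Count how often each account is mentioned across all tweets.
mention_counts = Counter()
for tweet in tweets:
    mention_counts.update(extract_mentions(tweet))

for username, count in mention_counts.most_common():
    print(username, count)
```

Counts like these are a starting point for several of the questions above: frequently mentioned accounts hint at influence, and author-to-mention pairs can become the edges of a network map.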

Some Scraping Tools

There are a variety of tools for Twitter scraping that are easier to use than writing your own scraping code. These tools do not require any programming knowledge; they have been set up as ready-to-use websites for easy collection of tweets. I include a few of the best ones below.

For visualizing Twitter data, see our Temple guides on network analysis and text analysis.

Twarc

Truthy

Indiana University's Observatory on Social Media (OSoMe) has developed several tools for studying information diffusion on social media.

Track how different memes trend over time
Hoaxy visualizes the spread of claims versus fact-checked articles
Botometer checks the likelihood of a Twitter account being a bot
Networks uses network analysis to explore who is discussing a meme and how different memes are related