
Webscraping

Scraping, Visualizing, and Analyzing the Web

Mining Data

With the plethora of data available in digital spaces, it makes sense that scholars want to access, visualize, and analyze it. Many software packages can help with this work, and pairing a programming language with a social media platform's API allows even more customized scraping. This guide focuses on scraping, visualizing, and analyzing web data.

Downloading data through APIs, such as those offered by social media platforms like Twitter or YouTube, is the easiest way to gather data from online sources. You can also scrape HTML pages for the information they contain. In this LibGuide we review tools and methods for scraping a couple of social media platforms and website pages.
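As a minimal sketch of scraping an HTML page, the following Python example uses only the standard library's html.parser module to pull every link (its href and visible text) out of a page. The sample page, the LinkScraper class, and the scrape_links function are hypothetical illustrations. The HTML is embedded as a string so the sketch runs offline; a real project would first download the page, for example with urllib.request.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag in a page (illustrative)."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Hypothetical helper: parse an HTML string and return its links."""
    parser = LinkScraper()
    parser.feed(html)
    parser.close()
    return parser.links


# A hypothetical page. With a live site you would download the HTML first,
# e.g. urllib.request.urlopen(url).read().decode("utf-8").
sample_page = """
<html><body>
  <h1>Library news</h1>
  <a href="/events">Upcoming events</a>
  <p>See also <a href="https://example.org/guides">research guides</a>.</p>
</body></html>
"""

print(scrape_links(sample_page))
# [('/events', 'Upcoming events'), ('https://example.org/guides', 'research guides')]
```

A hand-written parser like this keeps the example dependency-free; for messier pages, many researchers reach for a dedicated parsing library instead.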


Currently updated & maintained by: Nicole Lemire Garlic, 2019-2020

Updated by: Elizabeth Rodrigues and Caroline Tynan, 2018

Originally created by: Angela Cirucci, 2017

Why scrape the web?

Explain why someone might webscrape. 

Add some disciplines that are typically on this subject. Cite AOIR

Tools

There is an ever-changing array of tools and platforms to mine web data. Listed below are some software packages that are free, or at least offer a free version, and that are a good place to start.

This guide also covers scraping with programming scripts. Scripting requires coding experience or a willingness to learn, but for more in-depth or customized studies it is often worth learning enough code to use one of these approaches.

After scraping

Scraping is only the first step. Once you have your data, there will be much cleaning and organizing to do.
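Cleaning often means collapsing stray whitespace, dropping empty or duplicate records, and converting text fields into proper types. The following sketch assumes hypothetical scraped posts stored as dictionaries with id, text, and date fields, with every date written as YYYY-MM-DD; the field names and the clean_posts function are illustrative, not part of any particular tool.

```python
from datetime import datetime


def clean_posts(raw_posts):
    """Tidy a list of scraped post records (hypothetical field names)."""
    seen_ids = set()
    cleaned = []
    for post in raw_posts:
        post_id = post.get("id")
        # Collapse runs of spaces and newlines left over from the HTML.
        text = " ".join((post.get("text") or "").split())
        if post_id in seen_ids or not text:
            continue  # skip duplicates and empty posts
        seen_ids.add(post_id)
        cleaned.append({
            "id": post_id,
            "text": text,
            # Assumes dates look like 2019-05-01; turn them into date objects.
            "date": datetime.strptime(post["date"], "%Y-%m-%d").date(),
        })
    # Return the posts in chronological order.
    return sorted(cleaned, key=lambda p: p["date"])


raw = [
    {"id": 2, "text": "  Second\n post ", "date": "2019-05-02"},
    {"id": 1, "text": "First post", "date": "2019-05-01"},
    {"id": 2, "text": "Second post", "date": "2019-05-02"},  # duplicate
    {"id": 3, "text": "   ", "date": "2019-05-03"},          # empty
]

for p in clean_posts(raw):
    print(p["id"], p["date"], p["text"])
# 1 2019-05-01 First post
# 2 2019-05-02 Second post
```

Converting dates early makes later steps, such as grouping posts by month, simpler than working with raw strings.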

Mining the site only gives you raw information. Some scraping software packages include their own visualization and analysis tools, and you can also apply separate methods and tools.

Some ways of visualizing and analyzing your data include comparing it to other online norms, to how other social networking sites perform, and to offline phenomena. For example, you may want to compare networks using quantitative methods, or conduct content or sentiment analysis of web texts.
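For a first pass at content analysis, counting word frequencies across a set of posts is a common starting point, and a very simple sentiment measure can be built by counting words from positive and negative word lists. The following sketch does both with Python's standard library. The stop-word list, the tiny POSITIVE and NEGATIVE lexicons, and the sample posts are hypothetical; published sentiment lexicons and dedicated analysis tools are far more thorough than this toy score.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real projects use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "this", "it", "for"}

# A toy sentiment lexicon (hypothetical); published lexicons are far larger.
POSITIVE = {"love", "great", "helpful"}
NEGATIVE = {"hate", "slow", "broken"}


def tokenize(text):
    """Lower-case the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts, top_n=5):
    """Count non-stop-words across all texts: a basic content analysis."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in tokenize(text) if w not in STOP_WORDS)
    return counts.most_common(top_n)


def sentiment_score(text):
    """Positive minus negative word hits: > 0 leans positive, < 0 negative."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


posts = [
    "I love this library, the staff are great",
    "The library wifi is slow and the printer is broken",
    "Great workshop on the library archives",
]

print(word_frequencies(posts, top_n=2))  # [('library', 3), ('great', 2)]
print([sentiment_score(p) for p in posts])  # [2, -2, 1]
```

Word counts like these can feed directly into a bar chart or word cloud, while the per-post scores can be averaged over time to compare with offline events.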

The rest of this guide will introduce you to some tools and methods for scraping, visualizing, and analyzing web data.