
Webscraping

Scraping, Visualizing, and Analyzing the Web

Scraping Scripts

I have found that some studies require more customized Twitter mining. For this, it becomes necessary to learn a programming language and write a script that scrapes exactly the information you need.

Popular programming languages for social media mining include Ruby, Python, and R. A quick internet search will turn up plenty of written walk-throughs and video tutorials to help you learn these languages for free.

Once you have an idea of how they work, it is easy to search for scraping scripts that others have shared and then adapt them to your own needs. Of course, if you learn the language well enough, you can write your own scripts!

Beyond learning your chosen programming language, you must also familiarize yourself with Twitter's API (Application Programming Interface). The API is not a programming language in itself. It is the set of rules for the requests your scripts send, so that Twitter knows exactly what information to send back to you when you ask for it. The scraping software packages that I summarize above already use the API, but they keep it invisible to the user.
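To make the idea of an API request more concrete, here is a minimal sketch in Python that builds (but does not send) a search request. It assumes the "recent search" endpoint of version 2 of Twitter's API and a bearer token for authentication; the endpoint, the parameter names, and the access rules may have changed, so check the current API documentation. The function name build_search_request and the token placeholder are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the v2 "recent search" endpoint. Verify against the current docs.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query, bearer_token, max_results=10):
    """Build a GET request asking the API for tweets that match `query`.

    Hypothetical helper for illustration only.
    """
    # urlencode escapes special characters such as '#' so the URL stays valid.
    params = urlencode({"query": query, "max_results": max_results})
    return Request(
        f"{SEARCH_URL}?{params}",
        # The token identifies your application to the API.
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# "YOUR_BEARER_TOKEN" is a placeholder, not a working credential.
request = build_search_request("#webscraping", "YOUR_BEARER_TOKEN")
print(request.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, since it only works with valid credentials. The point is that the API expects a specific URL, specific parameters, and proof of who is asking.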

 

Beautiful Soup

One web scraping package that works well for those with some familiarity with Python is Beautiful Soup. Rather than requesting data through an API, it parses the HTML (or XML) of a webpage and represents it as a structured parse tree that you can search and navigate. It requires some knowledge of how to read HTML tags, and the data it returns is generally messier than what an API provides, but Beautiful Soup can be used to collect information that an API limits or does not offer.
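As a rough illustration of what working with a parse tree looks like, the sketch below parses a small, made-up HTML snippet with Beautiful Soup (installed as the beautifulsoup4 package) and pulls out a heading and some links. The HTML, the "post" class name, and the variable names are invented for this example. A real page would first have to be downloaded, and its tags will look different.

```python
from bs4 import BeautifulSoup

# A made-up page standing in for HTML you would normally download first.
html = """
<html><body>
  <h1>Example page</h1>
  <ul>
    <li class="post"><a href="/posts/1">First post</a></li>
    <li class="post"><a href="/posts/2">Second post</a></li>
  </ul>
</body></html>
"""

# Build the parse tree using Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree: the text of the first <h1> tag.
title = soup.h1.get_text(strip=True)

# Search the tree: every <li class="post">, then the link inside each one.
links = []
for item in soup.find_all("li", class_="post"):
    anchor = item.find("a")
    links.append((anchor.get_text(strip=True), anchor["href"]))

print(title)
print(links)
```

Most of the effort in a real script goes into inspecting the page's HTML to find which tags and classes hold the information you want.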