Extracting Tweets With R

This article gives you a brief overview of how to extract tweets using R.

By Kritika Jalan

Twitter is a popular microblogging site that lets users tweet up to 140 characters and post pictures, videos and GIFs. What users tweet about gives away a lot of information about them: their surroundings, likes, dislikes and preferences.

Companies, organisations and individuals have found many ways to put this information to use.

The first step towards all this is extracting the tweets about the party or event of interest into a usable format. This article will help you get started with that!

Here are the steps we will take:

1. Create a Twitter application to extract tweets.

3. You will land on the application details page; move to the ‘Keys and Access Tokens’ tab, scroll down and click ‘Create my access token’. Note the values of the API Key and API Secret for future use. Thou shalt not share these with anyone: whoever gets the keys can access your account.

4. In order to extract tweets, you will need to establish a secure connection between R and Twitter as follows:

Load the necessary R packages and download the cURL CA certificate bundle. ROAuth is an R interface to OAuth, the open standard for token-based authorisation on the internet.

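# Clear the R environment
rm(list = ls())
# Install the required packages only if they are missing, then load them
if (!requireNamespace("twitteR", quietly = TRUE)) install.packages("twitteR")
if (!requireNamespace("ROAuth", quietly = TRUE)) install.packages("ROAuth")
library("twitteR")
library("ROAuth")
# Download the CA certificate bundle and store it in your working directory
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")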
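The download line fetches the certificate bundle again every time the script runs. Below is a minimal sketch of one way to skip that when the file is already present; ensure_cacert is a hypothetical helper name, not part of twitteR or ROAuth.

```r
# Hypothetical helper (not part of twitteR/ROAuth):
# download the CA bundle only if it is not already on disk
ensure_cacert <- function(destfile = "cacert.pem",
                          url = "http://curl.haxx.se/ca/cacert.pem") {
  if (!file.exists(destfile)) {
    download.file(url = url, destfile = destfile)
  }
  # Fail early if no file ended up on disk
  if (!file.exists(destfile)) stop("Could not obtain ", destfile)
  destfile
}
```

You could then pass its result straight to the handshake, for example credentials$handshake(cainfo = ensure_cacert()).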

Next, set up OAuth for Twitter by calling OAuthFactory$new() with your keys and Twitter's OAuth URLs:

# Insert your API Key (consumerKey) and API Secret (consumerSecret) below
credentials <- OAuthFactory$new(consumerKey='XXXXXXXXXXXXXXXXXX',
      consumerSecret='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
      requestURL='https://api.twitter.com/oauth/request_token',
      accessURL='https://api.twitter.com/oauth/access_token',
      authURL='https://api.twitter.com/oauth/authorize')
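Since anyone holding these keys can access your account, you may prefer not to paste them into a script you might share. One option is to read them at run time from environment variables. The sketch below assumes variables named TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET (names chosen for this example, not a Twitter convention), and get_twitter_keys is a hypothetical helper.

```r
# Hypothetical helper: read the API key and secret from environment variables.
# TWITTER_CONSUMER_KEY / TWITTER_CONSUMER_SECRET are assumed names for this example.
get_twitter_keys <- function(key_var = "TWITTER_CONSUMER_KEY",
                             secret_var = "TWITTER_CONSUMER_SECRET") {
  key <- Sys.getenv(key_var)
  secret <- Sys.getenv(secret_var)
  # Sys.getenv() returns "" for unset variables, so treat that as missing
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set ", key_var, " and ", secret_var, " before running this script")
  }
  list(consumerKey = key, consumerSecret = secret)
}
```

The variables can live in ~/.Renviron, which R reads at startup. With keys <- get_twitter_keys(), the OAuthFactory$new() call above takes consumerKey = keys$consumerKey and consumerSecret = keys$consumerSecret, keeping the same three URLs.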

Let’s now ask Twitter for access!

credentials$handshake(cainfo="cacert.pem")

5. After executing the code above, you will be directed to Twitter’s authorisation screen. Click Authorize App and note the PIN it generates. Go back to RStudio and enter the PIN. Note that you only need to do this once.

Save the credentials for later use:

save(credentials, file = "twitter authentication.Rdata")
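load() restores every object in a file under its saved name, so loading the credentials later silently overwrites any existing variable called credentials. A minimal sketch of a more explicit alternative, using a hypothetical load_credentials helper that loads into a private environment and returns just the expected object:

```r
# Hypothetical helper: load a saved .Rdata file into its own environment
# and return only the expected object, leaving the workspace untouched
load_credentials <- function(file = "twitter authentication.Rdata",
                             name = "credentials") {
  env <- new.env()
  # load() returns the names of the objects it restored
  loaded <- load(file, envir = env)
  if (!name %in% loaded) {
    stop("No object called '", name, "' in ", file)
  }
  get(name, envir = env)
}
```

In a later session you would write credentials <- load_credentials() instead of calling load() directly.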

Extract Tweets

Now that we are all done setting up the gateway to Twitter, let’s get our hands dirty with real data. The function searchTwitter searches Twitter and returns a list of tweets containing the search text.

Below is a piece of code that extracts tweets matching the search string #DataLove. The function has other parameters worth exploring, such as since, until and geocode, that let you filter by time period, geography and more.

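# Load the required libraries (needed if you start a new R session)
library("twitteR")
library("ROAuth")
# Load the saved authentication data
load("twitter authentication.Rdata")
# Register the Twitter authentication for this session
setup_twitter_oauth(credentials$consumerKey, credentials$consumerSecret, credentials$oauthKey, credentials$oauthSecret)
# Extract tweets matching the search string (first argument), followed by the number of tweets (n) and language (lang)
tweets <- searchTwitter('#DataLove', n=10, lang="en")
# Convert the list of tweets into a data frame, one row per tweet
tweets_df <- twListToDF(tweets)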
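searchTwitter also accepts since and until (dates written as "YYYY-MM-DD") and geocode ("latitude,longitude,radius", with the radius in km or mi). The sketch below assumes those parameter names as documented for twitteR and uses a hypothetical search_args helper to build the argument list. The default one-week window reflects that Twitter's standard search API only reached back roughly seven days.

```r
# Hypothetical helper: build arguments for twitteR's searchTwitter() that
# restrict results to a date window and, optionally, a circle around a point
search_args <- function(query, n = 100, lang = "en",
                        end_date = Sys.Date(), days = 7,
                        lat = NULL, lon = NULL, radius_km = NULL) {
  end_date <- as.Date(end_date)
  args <- list(searchString = query, n = n, lang = lang,
               # since/until take dates as "YYYY-MM-DD" strings
               since = format(end_date - days, "%Y-%m-%d"),
               until = format(end_date, "%Y-%m-%d"))
  if (!is.null(lat) && !is.null(lon) && !is.null(radius_km)) {
    # geocode is "latitude,longitude,radius"; here the radius is in km
    args$geocode <- paste0(lat, ",", lon, ",", radius_km, "km")
  }
  args
}
```

Building a plain list keeps the filters easy to inspect before any request is sent. You would then run tweets <- do.call(searchTwitter, search_args("#DataLove", lat = 51.5074, lon = -0.1278, radius_km = 25)) to restrict the #DataLove search to the past week and a 25 km radius around central London.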

Closing Note

Extracting tweets is just the beginning. The data becomes beautiful when you add visualisation, identify patterns, analyse relations and draw relevant insights. Check out one such fun analysis here!

I hope you enjoyed adding a new skill to your machine learning portfolio!

Kritika Jalan is an experienced business analyst working in management consulting. She is skilled in R, Python, SQL, and other data analysis tools and machine learning techniques.