Mining Twitter Data using Python: Getting Started

February 15, 2014

Data mining is a hot topic these days, and Twitter is heavily used as a data source in data mining applications. In this post I will show you how to start mining Twitter data with Python using the Tweepy module.
(I will not include examples that use the scientific modules for mining, analysis and so on. This is a basic guide to getting the Twitter API set up.)

Environment Setup

1. Install Python (macOS comes with Python installed)

2. Get a Twitter API key
Go to https://dev.twitter.com/ and sign in to Twitter (create an account if you don't already have one).
Click the profile icon (top left) -> My Applications -> Create New App.
Provide the necessary details and it will create an application.
Go to the application -> click on the API Keys tab.

This will show you the necessary keys to authenticate your application using OAuth.
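Since these keys are secrets, you may prefer not to paste them into a script that could be shared. Here is a minimal sketch of one alternative, reading the four values from environment variables. The variable names (TWITTER_CONSUMER_KEY and so on) and the load_keys helper are my own assumptions for illustration, not something Twitter or Tweepy requires.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
KEY_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_keys(environ=None):
    """Return the four credentials as a dict, raising if any is missing or empty."""
    if environ is None:
        environ = os.environ
    missing = [var for var in KEY_NAMES.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(sorted(missing)))
    return {name: environ[var] for name, var in KEY_NAMES.items()}


# Demonstration with a stand-in mapping; call load_keys() with no argument
# to read the real environment instead.
demo_env = {
    "TWITTER_CONSUMER_KEY": "a",
    "TWITTER_CONSUMER_SECRET": "b",
    "TWITTER_ACCESS_TOKEN": "c",
    "TWITTER_ACCESS_SECRET": "d",
}
keys = load_keys(demo_env)
print(sorted(keys))
```

The returned values can then be passed to the OAuth calls in place of hard-coded strings.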

3. Install Tweepy
Tweepy is a Python library that supports the Twitter API.

Install on Mac:

pip install tweepy

Ubuntu:

sudo apt-get install python-tweepy

Here's the GitHub project: https://github.com/tweepy/tweepy
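To confirm that the install worked, you can try importing the module. This is only a sketch: installed_version is a helper name I made up, and it simply reports the module's __version__ attribute if the import succeeds.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Some modules do not define __version__, so fall back to a placeholder.
    return getattr(module, "__version__", "unknown")


# Prints a version string if Tweepy is installed, otherwise None.
print(installed_version("tweepy"))
```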

Now you are ready to read some tweets!

The code to read the Twitter stream (insert your keys into this file):

#imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

#setting up the keys
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

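class TweetListener(StreamListener):
    # A listener handles tweets as they are received from the stream.
    # This basic listener just prints each received tweet to standard output.

    def on_data(self, data):
        # data is the raw JSON string for one stream message
        print(data)
        return True  # returning True keeps the stream open

    def on_error(self, status):
        # status is the HTTP status code returned by Twitter
        print(status)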

#printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())
stream.filter(track=['nba'])

This prints every tweet from the Twitter stream that matches the keyword "nba".
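Each item the listener receives is a raw JSON string, so the printed output is hard to read. As a rough sketch, here is how one such string could be parsed with the standard json module to pull out the author and the text. The summarize_tweet helper and the sample payload are made up for illustration, and I am assuming the usual tweet layout with a "text" field and a nested "user" object; the stream can also send other notices that have no "text" field, which this sketch skips.

```python
import json


def summarize_tweet(raw):
    """Return (screen_name, text) for a tweet payload, or None for other messages."""
    message = json.loads(raw)
    if "text" not in message:
        return None  # not a tweet, e.g. a notice from the stream
    return message["user"]["screen_name"], message["text"]


# Hand-written sample payload, far smaller than a real tweet.
sample = '{"text": "Great game tonight #nba", "user": {"screen_name": "example_user"}}'
print(summarize_tweet(sample))
```

Inside the listener, the same function could be called on the data argument of on_data instead of printing the raw string.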

Getting user info:

import tweepy

# authenticate first (reusing the keys defined above), then create the API client
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# look up a user by screen name and print it
user = api.get_user('sachithwithana')
print(user.screen_name)

This is a basic example to get you set up. Now you are ready to explore the Twitter API.

I would recommend the scikit-learn library for machine learning with Python:
http://scikit-learn.org/stable/

Here's the Tweepy Documentation:
http://pythonhosted.org/tweepy/html/

Comments

Unknown, April 11, 2014 at 7:18 AM
Sachith,

I just started harvesting a Twitter stream. Thank you! I am still learning about computation on graphs, and also considering what kind of statistical models might be cool to implement. Will let you know if/when something comes of it.

Again, thank you.

Chris
Sachith Withana, April 11, 2014 at 9:02 AM
Thanks mate!
Yeah, try them out and please let me know if you can :)
You can use scikit-learn if you are going to do any machine learning stuff :)
Thilina Thanthriwatta, April 12, 2014 at 7:39 PM
Nice guidance, thank you very much for saving lots of time
Unknown, January 24, 2015 at 7:47 AM
Hi, I was wondering if tweepy could be installed on chrome OS through the python app?

Thank you for this guide, I will use it on my ubuntu box.
Magesh, February 14, 2015 at 11:14 PM
Hi,
I have downloaded twitter data and saved them as json in a .txt file. Just wondering if there is any online help to understand how to clean it up, convert it to a database and use it in R for data mining. I am new to python.
Magesh