Mining Twitter Data using Python: Getting Started

Data mining is a hot topic these days, and Twitter is heavily used as a data source in many data mining applications. In this post I will show you how to start mining Twitter data with Python using the Tweepy module.
(I will not include examples of the scientific modules used for mining and analysis here. This is a basic guide to getting the Twitter API set up.)

Environment Setup

1. Install Python (macOS comes with Python installed)

2. Get a Twitter API key
    Go to the Twitter developer site and sign in (create an account if you don't already have one).
    Click the profile icon (top left) -> My Applications -> Create New App.
    Provide the necessary details and it will create an application.
    Go to the application and click on the API Keys tab.
    This shows the keys you need to authenticate your application using OAuth.

3. Install Tweepy
   Tweepy is a Python library that supports the Twitter API.
   Install on Mac:
pip install tweepy
   Install on Debian/Ubuntu:
sudo apt-get install python-tweepy
   Here's the GitHub project:

Now you are ready to read some tweets!!
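
A side note on the keys from step 2 before the code: rather than pasting them directly into a script, you may prefer to keep them in environment variables. Below is a minimal sketch of that idea. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper load_twitter_keys are my own assumptions for illustration, not something Twitter or Tweepy defines.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
KEY_NAMES = (
    'TWITTER_CONSUMER_KEY',
    'TWITTER_CONSUMER_SECRET',
    'TWITTER_ACCESS_TOKEN',
    'TWITTER_ACCESS_SECRET',
)

def load_twitter_keys(env=None):
    """Return the four keys as a tuple, or raise if any is missing or empty."""
    if env is None:
        env = os.environ
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return tuple(env[name] for name in KEY_NAMES)
```

With the variables set in your shell, the four values can be unpacked in one line: consumer_key, consumer_secret, access_token, access_secret = load_twitter_keys()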

Here is the code to read the Twitter stream (insert your keys into this file):

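# Note: StreamListener belongs to the Tweepy 3.x API; it was removed in Tweepy 4.0
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

# Set up the keys (from the API Keys tab of your application)
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

class TweetListener(StreamListener):
    # A listener handles tweets as they are received from the stream.
    # This basic listener just prints received tweets to standard output.

    def on_data(self, data):
        # data is the raw JSON string of one stream message
        print(data)
        return True  # keep the stream open

    def on_error(self, status):
        # status is the HTTP status code returned by Twitter
        print(status)

# Authenticate, then print all matching tweets to standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())
# Start streaming tweets that contain the keyword "nba"
stream.filter(track=['nba'])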

This prints every tweet from the Twitter stream that matches the filter text "nba".
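
Each item passed to on_data is a raw JSON string, so a natural next step is to decode it and pick out the fields you care about. Here is a minimal sketch using the standard json module. The helper name summarize_tweet and the sample payload are made up for illustration. The text field and the nested user.screen_name field are part of the tweet objects the streaming API sends, but not every message on the stream is a tweet, so the helper returns None when those fields are absent.

```python
import json

def summarize_tweet(data):
    """Return '@screen_name: text' for a tweet message, or None otherwise."""
    try:
        message = json.loads(data)
    except ValueError:
        # Not valid JSON (for example, a blank keep-alive line)
        return None
    if not isinstance(message, dict):
        return None
    text = message.get('text')
    user = message.get('user')
    if text is None or not isinstance(user, dict):
        return None
    screen_name = user.get('screen_name')
    if screen_name is None:
        return None
    return '@{0}: {1}'.format(screen_name, text)

# Made-up sample payload, trimmed to the two fields used here
sample = '{"text": "What a game!", "user": {"screen_name": "example_fan"}}'
print(summarize_tweet(sample))
```

Inside TweetListener.on_data you could call summarize_tweet(data) and print the result only when it is not None.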

Getting user info:

import tweepy

# Reuses the keys defined in the streaming example
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# Look up a user by screen name and print it
user = api.get_user(screen_name='sachithwithana')
print(user.screen_name)

This is a basic example to get you set up. Now you are ready to explore the Twitter API.

I recommend the scikit-learn library for machine learning with Python.

Here's the Tweepy Documentation:


