Aggregating activity from Twitter
Interested in building a real-time aggregator for Twitter? Who isn't? You have lots of options:
Just the vanilla API
Simply call user_timeline for each user that you are interested in every x minutes.
The standard rate limit on the Twitter API is 100 requests per hour, so checking 25 users every 15 minutes (4 polls × 25 users = 100 requests) is pretty much the best you'll be able to do. If you're a lazy chancer you can try to get your application whitelisted, which removes the rate limits.
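The arithmetic behind that figure, plus a single polling pass, can be sketched in Python. `fetch_timeline` is a hypothetical stand-in for whatever HTTP client you use to call user_timeline; it is assumed to return only tweets newer than the supplied `since_id`, each as a dict with an integer `"id"`.

```python
from typing import Any, Callable, Dict, List, Optional

Tweet = Dict[str, Any]
# Hypothetical client wrapper: (screen_name, since_id) -> tweets newer than since_id.
FetchTimeline = Callable[[str, Optional[int]], List[Tweet]]


def max_users(requests_per_hour: int, interval_minutes: float) -> int:
    """How many users you can poll once per interval within the hourly limit."""
    polls_per_hour = 60 / interval_minutes
    return int(requests_per_hour // polls_per_hour)


def poll_once(users: List[str], fetch_timeline: FetchTimeline,
              since_ids: Dict[str, int]) -> List[Tweet]:
    """One polling pass: one user_timeline request per user.

    since_ids remembers the newest tweet id seen per user, so repeat
    passes only keep tweets we haven't seen yet.
    """
    new_tweets: List[Tweet] = []
    for user in users:
        tweets = fetch_timeline(user, since_ids.get(user))
        fresh = [t for t in tweets if t["id"] > since_ids.get(user, 0)]
        if fresh:
            since_ids[user] = max(t["id"] for t in fresh)
        new_tweets.extend(fresh)
    return new_tweets


print(max_users(100, 15))  # -> 25
```

Every pass costs one request per user whether or not they tweeted, which is exactly why this approach doesn't scale.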
Good:
- Very simple
Not so good:
- Too simple - won't scale.
- Slow update time (because the number of calls you can make per hour is limited)
- Seeing so much redundant data returned for each call makes the internet cry.
Vanilla API + robot
Create a new Twitter account, log in and follow the people you're interested in aggregating tweets from. You don't have to follow people manually - you could do it programmatically using the friendships/create API call.
Now just check the friends_timeline for that user as often as you like (up to the hourly rate limit, obviously). Page through results if necessary.
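Paging until you reach tweets you've already seen might look like this. `fetch_page` is a hypothetical wrapper around friends_timeline that takes a 1-based page number and returns tweets newest first (that ordering is an assumption); `max_pages` is an arbitrary safety cap so one check can't burn through the hourly budget.

```python
from typing import Any, Callable, Dict, List

Tweet = Dict[str, Any]


def collect_since(fetch_page: Callable[[int], List[Tweet]], since_id: int,
                  max_pages: int = 10) -> List[Tweet]:
    """Page through a newest-first timeline, stopping at since_id.

    Each page fetched costs one request against the hourly limit.
    """
    collected: List[Tweet] = []
    for page in range(1, max_pages + 1):
        tweets = fetch_page(page)
        if not tweets:  # ran off the end of the timeline
            break
        for tweet in tweets:
            if tweet["id"] <= since_id:  # everything from here on is old
                return collected
            collected.append(tweet)
    return collected
```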
Twitter has some (sensible) rules about follower / following ratios. Once you're following ~800 people, further follow requests will be blocked; you have to wait until you have more followers before adding anybody else. You can't whitelist your way out of this.
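Because of that ceiling it's worth planning follows before calling friendships/create. The hypothetical helper below just splits a target list into follows to send now and ones to defer; ~800 is the approximate limit observed above rather than a documented constant, so it's a parameter.

```python
from typing import Iterable, List, Set, Tuple


def plan_follows(targets: Iterable[str], following: Set[str],
                 cap: int = 800) -> Tuple[List[str], List[str]]:
    """Split targets into (follow_now, deferred) without exceeding cap.

    Accounts already followed and duplicates are skipped; order is preserved.
    """
    room = max(cap - len(following), 0)
    follow_now: List[str] = []
    deferred: List[str] = []
    seen: Set[str] = set()
    for name in targets:
        if name in following or name in seen:
            continue  # already followed, or repeated in the input
        seen.add(name)
        (follow_now if len(follow_now) < room else deferred).append(name)
    return follow_now, deferred
```

Sending each follow once, and deferring rather than retrying, also avoids the follow / unfollow churn that spams users with notification emails.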
Good:
- Again, pretty simple.
- Better update time (aggregation within a couple of minutes of a tweet)
Not so good:
- Can only follow ~ 800 people before Twitter starts blocking your follow requests.
- Users will know that you're aggregating them (is this a bug or a feature?). You can't keep following / unfollowing people either: they'll get spammed by emails telling them about it.
GNIP
GNIP aggregates activity streams from a bunch of different Web 2.0 sites. Here's how it works in a nutshell:
- you set up a GNIP account
- you add rules to your account ("give me all tweets by @twalf" "give me all tweets by @ianmulvany") and set up a web hook (a script on your server). You can have up to 25k rules per site for free.
- GNIP receives data in real time from Twitter
- If any data matches your rule set then GNIP POSTs to your web hook with some metadata about the matching tweet (a unique id, the tweeter's username, a URI for the actual message)
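On your side, the web hook only has to accept the POST and pull out the fields it needs. GNIP's exact payload format isn't reproduced here: the sketch assumes a hypothetical XML body of `<activity>` elements carrying `id`, `actor` and `url` attributes, so adapt `parse_activities` to whatever your account actually delivers. It uses only the standard library.

```python
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler
from typing import Dict, List


def parse_activities(body: bytes) -> List[Dict[str, str]]:
    """Extract id, actor and url from an ASSUMED payload shape:
    <activities><activity id="..." actor="..." url="..."/>...</activities>
    """
    root = ET.fromstring(body)
    return [
        {"id": a.get("id", ""), "actor": a.get("actor", ""), "url": a.get("url", "")}
        for a in root.iter("activity")  # includes root itself if it is an <activity>
    ]


class WebHook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        for activity in parse_activities(self.rfile.read(length)):
            # Only metadata arrives here; the tweet text must be fetched from url.
            print(activity["actor"], activity["url"])
        self.send_response(200)
        self.end_headers()
```

To run it, pass the handler to `http.server.HTTPServer(("", 8080), WebHook).serve_forever()`; the port is an arbitrary choice.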
Now you'll get pinged whenever anybody in your rules tweets - in close to real time.
Rules can be added programmatically or by hand. GNIP's API docs are pretty opaque but it's actually a fairly simple, efficient system once you've gotten to grips with it.
Unfortunately the metadata that gets POSTed to you doesn't contain the actual tweet. For that you have to go back to Twitter using the supplied URI, which points to the message in XML format. Remember that there's a rate limit on the Twitter API so by default you won't be able to aggregate more than a hundred messages per hour. This sucks. Whitelisting is pretty much the only way you're going to overcome this.
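Since every notification costs a Twitter API request, the fetcher has to queue URIs and spend the hourly budget carefully. A minimal sketch, assuming a simple fixed one-hour window (Twitter's real accounting may differ); `now` is passed in as seconds so the logic can be tested without waiting.

```python
from collections import deque
from typing import Deque, List, Optional


class HourlyBudget:
    """Allow at most `limit` requests per fixed one-hour window."""

    def __init__(self, limit: int = 100) -> None:
        self.limit = limit
        self.window_start: Optional[float] = None
        self.used = 0

    def try_spend(self, now: float) -> bool:
        if self.window_start is None or now - self.window_start >= 3600:
            self.window_start, self.used = now, 0  # start a fresh window
        if self.used < self.limit:
            self.used += 1
            return True
        return False


def drain(queue: Deque[str], budget: HourlyBudget, now: float) -> List[str]:
    """Pop as many tweet URIs as the budget allows; the rest stay queued."""
    ready: List[str] = []
    while queue and budget.try_spend(now):
        ready.append(queue.popleft())
    return ready
```

With more than a hundred notifications an hour the queue simply grows without bound, which is the problem whitelisting solves.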
Twitter on GNIP is unique in this respect; none of the other services require you to call the originating site to get messages. It's especially annoying as tweets are only 140 characters long - it's definitely not a space / bandwidth issue!
Good:
- Fast update time (pretty close to real time)
- GNIP infrastructure can help you aggregate from other sites (Digg, Delicious...) in the future.
- Follow up to 25k people for free and without scaling issues.
Not so good:
- Relatively complex.
- GNIP can be a bit flaky - occasionally it goes down and you lose updates for a few hours.
- Requires whitelisting by Twitter once you're collecting more than a hundred tweets per hour.
Twitter streaming API
Twitter has a streaming API in alpha.
You can follow up to 200k users by POSTing their ids to http://stream.twitter.com/birddog.json - after you've been approved by Twitter and signed a usage agreement.
You can follow up to 200 users for free using http://stream.twitter.com/follow.json which is similar.
Once you've opened a connection to one of the streaming endpoints (follow, birddog or shadow) it never closes. When a followed user tweets, the tweet comes down the wire as a line of JSON (ending with a carriage return). Think Comet.
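Consuming such a stream amounts to reading line by line and decoding each non-empty line as JSON. The sketch below works on any iterable of byte lines, such as an open HTTP response; it assumes blank lines may appear (for example as keep-alives) and skips them. `follow_body` builds the form-encoded POST body under the assumption that the endpoint takes a comma-separated `follow` parameter of user ids; since the API is in alpha, treat that name as provisional.

```python
import json
from typing import Any, Dict, Iterable, Iterator
from urllib.parse import urlencode


def follow_body(user_ids: Iterable[int]) -> bytes:
    """Form-encoded POST body; the `follow` parameter name is an assumption."""
    return urlencode({"follow": ",".join(str(u) for u in user_ids)}).encode("ascii")


def iter_statuses(lines: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per non-blank line of the stream."""
    for raw in lines:
        line = raw.strip()  # drops the trailing \r\n
        if not line:
            continue  # blank line, nothing to decode
        yield json.loads(line)
```

In use you would iterate over the open HTTP response (which yields lines) and hand it to `iter_statuses`; authentication is left out.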
Good:
- As fast an update as you're ever going to get.
- Don't need to rely on third parties (like GNIP)
Not so good:
- Still in alpha.
- Need an agreement from Twitter to follow more than 200 users.
- Complex (it requires you to move away from short-lived, reactive scripts towards an app that can keep an HTTP connection open for hours)
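Much of that long-running complexity is reconnecting when the connection does eventually drop. One common approach, not something prescribed here, is exponential backoff; the base delay and cap below are arbitrary placeholders, and `connect_and_consume` is a hypothetical function that opens the stream and returns when it closes.

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 240.0) -> Iterator[float]:
    """Yield base, 2*base, 4*base, ... never exceeding cap."""
    delay = min(base, cap)
    while True:
        yield delay
        delay = min(delay * 2, cap)


def reconnect_loop(connect_and_consume: Callable[[], None],
                   sleep: Callable[[float], None] = time.sleep,
                   max_attempts: Optional[int] = None) -> int:
    """Keep (re)opening the stream and return how many attempts were made.

    Errors raised as OSError (socket errors, urllib's URLError) back off
    exponentially; a clean close resets the backoff and reconnects at once.
    max_attempts only exists to bound the loop, e.g. in tests.
    """
    delays = backoff_delays()
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            connect_and_consume()  # blocks for as long as the stream stays up
        except OSError:
            sleep(next(delays))  # wait progressively longer after each failure
        else:
            delays = backoff_delays()  # clean close: start again from base
    return attempts
```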