Flags and Lollipops

Saturday, June 13, 2009

Aggregating activity from Twitter

Update: you can't follow a specific set of users using GNIP any more - their feed is equivalent to the 'spritzer' method in the official Twitter API.

Interested in building a real time aggregator for Twitter? Who isn't? You have lots of options:

Just the vanilla API

Simply call user_timeline for each user that you are interested in every x minutes.

The standard rate limit on the Twitter API is 100 requests per hour, so checking 25 users every 15 minutes is pretty much the best you'll be able to do. If you're a lazy chancer you can try to get your application whitelisted, which removes the rate limit.
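To make the arithmetic concrete, here's a rough sketch of the trade-off between how many users you poll and how often. The function names are made up for illustration, and it assumes one user_timeline call per user per polling round.

```python
def max_users(rate_limit_per_hour: int, poll_interval_minutes: float) -> int:
    """Users you can poll once per interval without exceeding the hourly limit."""
    rounds_per_hour = 60 / poll_interval_minutes
    return int(rate_limit_per_hour // rounds_per_hour)


def min_interval_minutes(rate_limit_per_hour: int, n_users: int) -> float:
    """Shortest polling interval (in minutes) that keeps n_users within the limit."""
    return 60 * n_users / rate_limit_per_hour


if __name__ == "__main__":
    print(max_users(100, 15))              # 25 users at a 15 minute interval
    print(min_interval_minutes(100, 100))  # 100 users means one check per hour
```

Double the number of users and you double the delay before you see their tweets, which is why this approach runs out of road so quickly.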

Good:
  • Very simple

Not so good:
  • Too simple - won't scale.
  • Slow update time (because the number of calls you can make per hour is limited)
  • Seeing so much redundant data returned for each call makes the internet cry.

Vanilla API + robot

Create a new Twitter account, log in and follow the people you're interested in aggregating tweets from. You don't have to follow people manually - you could do it programmatically using the friendships/create API call.

Now just check the friends_timeline for that user as often as you like (up to the hourly rate limit, obviously). Page through results if necessary.
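A minimal sketch of the paging logic, assuming the timeline call accepts since_id and page parameters. The fetch_page callable is hypothetical: it stands in for whatever HTTP client you use to call friends_timeline, and should return a list of tweet dicts (newest first) with an "id" key.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Tweet = Dict[str, Any]


def collect_new_tweets(
    fetch_page: Callable[[Optional[int], int], List[Tweet]],
    since_id: Optional[int],
    max_pages: int = 5,
) -> Tuple[List[Tweet], Optional[int]]:
    """Page through the timeline and return (new tweets oldest first, new since_id)."""
    new: List[Tweet] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(since_id, page)  # one API call per page
        if not batch:
            break  # an empty page means there is nothing older to collect
        new.extend(batch)
    if new:
        # Remember the newest id so the next poll only asks for later tweets.
        since_id = max(tweet["id"] for tweet in new)
    new.sort(key=lambda tweet: tweet["id"])
    return new, since_id
```

Keeping track of since_id between polls means each check only returns tweets you haven't seen yet, so most polls cost a single request. The max_pages cap is there so a busy timeline can't eat your whole hourly allowance in one go.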

Twitter has some (sensible) rules about follower / following ratios. Once you're following ~ 800 people further follow requests will be blocked; you have to wait until you have more followers before adding anybody else. You can't whitelist your way out of this.

Good:
  • Again, pretty simple.
  • Better update time (aggregation within a couple of minutes of a tweet)

Not so good:
  • Can only follow ~ 800 people before Twitter starts blocking your follow requests.
  • Users will know that you're aggregating them (is this a bug or a feature?). You can't keep following / unfollowing people - they'll get spammed by emails telling them about it.

GNIP

GNIP works with activity streams from a bunch of different web 2.0 sites. Here's how it works in a nutshell:

  1. you set up a GNIP account
  2. you add rules to your account ("give me all tweets by @twalf" "give me all tweets by @ianmulvany") and set up a web hook (a script on your server). You can have up to 25k rules per site for free.
  3. GNIP receives data in real time from Twitter
  4. If any data matches your rule set then GNIP POSTs to your web hook with some metadata about the matching tweet (a unique id, the tweeter's username, a URI for the actual message)

Now you'll get pinged whenever anybody in your rules tweets - in close to real time.

Rules can be added programmatically or by hand. GNIP's API docs are pretty opaque but it's actually a fairly simple, efficient system once you've gotten to grips with it.

Unfortunately the metadata that gets POSTed to you doesn't contain the actual tweet. For that you have to go back to Twitter using the supplied URI, which points to the message in XML format. Remember that there's a rate limit on the Twitter API so by default you won't be able to aggregate more than a hundred messages per hour. This sucks. Whitelisting is pretty much the only way you're going to overcome this.
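If you aren't whitelisted, one way to cope is to queue the URIs that GNIP sends you and only fetch as many as the rate limit allows, leaving the rest for later. Here's a rough sketch; the HourlyBudget class and drain function are names invented for illustration, and fetch is whatever function you use to retrieve a tweet from its URI.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List


class HourlyBudget:
    """Sliding-window limiter: at most `limit` calls in any `window_seconds` span."""

    def __init__(self, limit: int = 100, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


def drain(pending: Deque[str], budget: HourlyBudget,
          fetch: Callable[[str], Any]) -> List[Any]:
    """Fetch queued URIs until the queue is empty or the budget runs out."""
    fetched = []
    while pending and budget.try_acquire():
        fetched.append(fetch(pending.popleft()))
    return fetched
```

Your web hook appends each incoming URI to the pending queue and something (a cron job, a worker loop) calls drain every so often. Bear in mind that this only smooths out bursts: if the people you follow tweet more than a hundred times an hour on average, the queue will just keep growing.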

Twitter on GNIP is unique in this respect; none of the other services require you to call the originating site to get messages. It's especially annoying as tweets are only 140 characters long - it's definitely not a space / bandwidth issue!

Good:
  • Fast update time (pretty close to real time)
  • GNIP infrastructure can help you aggregate from other sites (Digg, Delicious...) in the future.
  • Follow up to 25k people for free and without scaling issues.

Not so good:
  • Relatively complex.
  • GNIP can be a bit flaky - occasionally it goes down and you lose updates for a few hours.
  • Requires whitelisting by Twitter once you're collecting more than a hundred tweets per hour.

Twitter streaming API

Twitter has a streaming API in alpha.

You can follow up to 200k users by POSTing their ids to http://stream.twitter.com/birddog.json - after you've been approved by Twitter and signed a usage agreement.

You can follow up to 2k users for free using http://stream.twitter.com/shadow.json which is similar.

You can follow up to 200 users for free using http://stream.twitter.com/follow.json which is similar.

Once you've opened a connection to shadow or birddog it'll never close. When a followed user tweets it'll come down the wire as a line of JSON (ending with a carriage return). Think Comet.
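Reading that kind of stream boils down to buffering bytes and cutting them up at each line ending. Here's a minimal sketch that works on any file-like object; the iter_statuses name is invented for illustration, opening the HTTP connection and authenticating are left out, and it treats "\r", "\n" and "\r\n" all as delimiters since I'm only assuming the exact line ending.

```python
import io
import json
from typing import Any, BinaryIO, Dict, Iterator


def iter_statuses(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per complete line read from the stream."""
    # Prefer read1, which returns whatever bytes are available rather than
    # waiting for a full chunk; fall back to read if the object lacks it.
    read = getattr(stream, "read1", stream.read)
    buffer = b""
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break  # the connection was closed
        buffer += chunk
        # Valid JSON never contains a raw CR or LF, so both are safe delimiters.
        *lines, buffer = buffer.replace(b"\r", b"\n").split(b"\n")
        for line in lines:
            if line.strip():  # skip blank lines
                yield json.loads(line)
    # Anything left in the buffer is an incomplete line, so it is dropped.


if __name__ == "__main__":
    fake = io.BytesIO(b'{"id": 1, "text": "hello"}\r{"id": 2, "text": "world"}\r')
    for status in iter_statuses(fake):
        print(status["id"], status["text"])
```

In a real app the loop never finishes, so you'd run it in its own long-lived process and hand each status to a queue or database as it arrives, plus reconnect if the connection drops.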

Good:
  • As fast an update as you're ever going to get.
  • Don't need to rely on third parties (like GNIP)

Not so good:
  • Still in alpha.
  • Need an agreement from Twitter to follow more than 2k users.
  • Complex (in that it requires you to move away from reactive, asynchronous scripts towards an app that can keep an HTTP connection open for hours)
