Flags and Lollipops

Saturday, December 22, 2007

Userscripts for the life sciences

Egon and Noel have a paper in BMC Bioinformatics this month describing userscripts for the life sciences... nice work, guys.

Last year there was a discussion over at Pedro's of the merits of publishing individual userscripts after Ben Good's paper about a Greasemonkey-based iHOP enhancement appeared in BMC. This new paper is more of a review.

We discussed the possibility of hosting a science mashups / web services wiki at NPG - sort of like ProgrammableWeb, but listing only the APIs, databases and tools relevant to science. This sort of ties in with the post over at Nodalpoint that Alf wrote about documenting bioinformatics APIs. There's enough stuff available nowadays for it to be a useful resource, I think.


Incidentally: I started writing this post BEFORE I read the paper properly and realised that I got a namecheck for Postgenomic. Now I definitely recommend it. ;p


Monday, November 05, 2007

First steps with Opensocial

I got access to the Orkut Developer's sandbox this morning (at the moment Orkut is the only place where you can test Opensocial apps - Ning also has an implementation but it's far too buggy to work with).

I like the idea behind Opensocial a lot. It had to happen, sooner or later.

The problem is that in this case, IMHO, Google have gone for 'sooner' a bit too soon. My Opensocial experience has been rubbish. I'm all for release early, release often but the whole thing seems half-baked to me.

I'm not talking about the API itself, just the implementation, though the API also has issues (for example: no way of identifying app visitors who aren't logged-in members of the social network you're hosted on).

First off: the Orkut sandbox sucks, big time. It becomes marginally more usable once you discover (by scouring the Google Group as there's no relevant documentation) that by appending the magic '&bpc=1' incantation to each Orkut url you can turn off caching and test your apps without reinstalling them under a different name each time you make a change.
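A minimal sketch of that trick in JavaScript, assuming nothing about Orkut beyond the 'bpc=1' parameter mentioned above. The sandbox address is a made-up placeholder and `withCacheBypass` is just an illustrative name:

```javascript
// Add the cache-bypass flag to a sandbox URL.
// Uses the standard URL API, so it picks "?" or "&" as needed
// and does not add the flag twice.
function withCacheBypass(url) {
  const parsed = new URL(url);
  parsed.searchParams.set("bpc", "1");
  return parsed.toString();
}

// Placeholder address, not a real Orkut URL.
console.log(withCacheBypass("http://sandbox.example/Application.aspx?appId=42"));
```

Setting the parameter rather than pasting '&bpc=1' on the end means the same helper also works on URLs that have no query string yet.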

Then there's the 'opensocial is not defined' issue: something happened during a rollout of new code across Google's servers and now the opensocial .js file can't be loaded; so nobody's apps are working unless you're on the west coast.

Again, I only know this from speculation on the Google Group where (complaint #3) there aren't any actual Googlers, just confused developers wondering why none of the tutorial code works. Facebook did the same thing - set up a developer help board! Let them help themselves! Ignore them, focus on pushing out new code! - and it's annoying. Take shifts, assign an intern, I dunno, just please put us out of our misery and tell us about typos in the specs, servers going down and known bugs as soon as you know about them.

Again, I like the idea and I'm definitely keen to build some Opensocial apps.

Might wait for 1.0, though.


Tuesday, July 03, 2007

CrossRef metadata for all!

(via Code4Lib) CrossRef currently has a competition running. Submit a proposal for an innovative service that uses CrossRef data and if you're selected you can get an account for free (giving you access to the CrossRef database, which normally costs $$$).

In case you haven't heard of CrossRef:

CrossRef's specific mandate is to be the citation linking backbone for all scholarly information in electronic form. CrossRef is a collaborative reference linking service that functions as a sort of digital switchboard. It holds no full text content, but rather effects linkages through Digital Object Identifiers (DOI), which are tagged to article metadata supplied by the participating publishers. The end result is an efficient, scalable linking system through which a researcher can click on a reference citation in a journal and access the cited article.


Simplistically, CrossRef can supply the basic metadata (title, authors, journal details) associated with DOIs from the scientific literature.

In case you haven't heard of DOIs: DOIs are unique, resolvable identifiers for digital content (papers or figures, for example). This paper has DOI 10.1186/1743-422X-4-67. If you go to the CrossRef homepage and enter that string into the DOI resolver you'll be redirected to wherever the owner of the DOI (BioMedCentral, in this case) says the paper currently resides.

An example of how this data is used: when a paper is published the references at the bottom of the page can be hyperlinked by doing a reverse lookup on the CrossRef database i.e. asking 'what's the DOI of the paper with this title and author list'?

Another example: imagine a scholarly bookmarking system like Connotea or Zotero. When somebody bookmarks a paper you could scrape the title, authors, etc. directly from the HTML and hope that it doesn't break, or you could just scrape the DOI and then get all of the other metadata from CrossRef.
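To make the 'just scrape the DOI' option concrete, here's a rough JavaScript sketch. It assumes the DOI appears somewhere in the page's HTML (the `citation_doi` meta tag below is only an illustrative guess at what a page might contain), and it stops at building a resolver link: the metadata lookup itself needs CrossRef access, so it isn't shown. The function names are mine, not part of any library.

```javascript
// Loose match for the usual "10.<registrant>/<suffix>" DOI shape.
// This is a heuristic for scraping, not a complete DOI validator.
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"'<>]+/;

// Pull the first DOI-looking string out of a page's HTML, or return null.
function extractDoi(html) {
  const match = html.match(DOI_PATTERN);
  if (!match) return null;
  // Drop trailing punctuation that belongs to the sentence, not the DOI.
  return match[0].replace(/[.,;)]+$/, "");
}

// Build a link to the public DOI resolver for a given DOI.
function resolverUrl(doi) {
  return "https://doi.org/" + encodeURI(doi);
}

// Hypothetical page fragment using the DOI quoted in this post.
const page = '<meta name="citation_doi" content="10.1186/1743-422X-4-67">';
const doi = extractDoi(page);
console.log(doi, resolverUrl(doi));
```

The point of the design is that the scraper only has to find one short, well-structured string. Everything else (title, authors, journal) comes from a single authoritative source instead of fragile per-site HTML parsing.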

I think the competition is a good idea. My only problem with it is that IMHO CrossRef data should be free for non-commercial use anyway. At the moment it sorta kinda is; there's a 'demo' interface which you can use to try the service out.

If CrossRef really want to encourage innovative uses of their data then they should open up the database to anybody who wants to build (free, publicly accessible) applications on top of it.

Sure, CrossRef costs money to run but surely more people and open systems using DOIs in turn make it more worthwhile for publishers or software vendors to sign up as commercial members?

In any case if you've got a brilliant biomedical mashup in mind that might benefit you should apply. The deadline for proposals is July 15th.


Monday, June 18, 2007

Publishers, trackbacks and shared data

The elevator pitch version of this post: if you're a science publisher interested in the web then let's talk about collaborating on a shared system that will stimulate online discussion, kickstart commenting and recognize the sometimes valuable contributions already being made every day by science blogs.


I'm a strong believer in allowing commenting on online papers. This is something under serious discussion at Nature (the question is how to do it properly). The vast majority of researchers read, organize and discover papers online; we should give them the tools and opportunity to discuss papers online, too.

It's easy to be dispirited by the lack of comments at the journals that adopted commenting early - though what would an appropriate number of comments on a paper be? Is one comment pointing out a critical error worth more than a hundred saying 'nice paper'?

In the relatively near future two things will happen to help push commenting forward:

  • We'll (scientists in general) develop systems that track and credit scientific contributions - including relatively minor ones like wiki edits and comments - that aren't in manuscript form.

  • We'll make it easy enough to leave comments and for content stakeholders to be alerted so that they can reply for a positive feedback loop to kick in - more authors responding means commenting is seen to be more useful, so more comments are left... etc.


Until then, though, there is a way of supplementing comments submitted directly to journals: science blogs.

I think it's fairly safe to say that the number of blog posts discussing papers is much, much larger than the number of online comments left on papers from all STM publishers combined. Prove me wrong and I'll take you out for cocktails.

Some specific examples of papers discussed in blog posts:

This recent paper in Cell has no comments but three blog posts written about it. This paper in PLoS One has two blog citations but only one comment (which is a link to one of the blog posts - this has been discussed previously on the PLoS One blog).

So how can publishers use blog content to supplement commenting systems? I think Postgenomic is the answer, or at least a good starting point.

Postgenomic is a science blog aggregation site with an open source codebase. The data it collects is accessible via a REST-based API.

Postgenomic follows several hundred science blogs and tracks the papers that they link to. Publishers can easily - and should, IMHO - access this data and display blog trackbacks next to the papers that they publish online.

Technorati or a homegrown system could possibly be used to do the same thing. Here's why STM publishers should use Postgenomic instead:

  • Postgenomic was written specifically to deal with scientific literature. It handles tricky things like disambiguation: a single paper X might be linked to at different URIs by different blogs (imagine that one blogger links to the abstract on PubMed, another to the PDF and a third to the fulltext view). It understands DOIs and PMIDs. We have a lot of experience with this sort of thing at Nature - see Connotea.

  • As the list of aggregated blogs is strictly controlled there's no need for publishers to manually curate each and every trackback on their papers.

  • Postgenomic has been running for more than a year and is recognized by the community - at least to the extent that new blogs are submitted regularly. If somebody starts a new blog and wants to be included on paper trackback whitelists, or a blog changes address or an archive is deleted then it makes sense for there to be one, central place for this to be dealt with. The science blog community is relatively small already, why fragment it further?
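To illustrate the disambiguation point from the first bullet, here's a small JavaScript sketch of the general idea: reduce each link to an identifier-based key so that different URLs for the same paper end up grouped together. This is not Postgenomic's actual code. The URL shapes, the regular expressions and the function names are all assumptions made for the example.

```javascript
// Decode percent-escapes, falling back to the raw string if it is malformed.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    return text;
  }
}

// Reduce a link to a key based on the identifier it carries, if any.
function canonicalKey(link) {
  const doi = safeDecode(link).match(/10\.\d{4,9}\/[^\s?#&"'<>]+/);
  // DOIs are case-insensitive, so lowercase them before comparing.
  if (doi) return "doi:" + doi[0].toLowerCase();
  // Assumed shape: the word "pubmed" followed somewhere by a numeric ID.
  const pmid = link.match(/pubmed[^\d]*(\d+)/i);
  if (pmid) return "pmid:" + pmid[1];
  // No identifier found: fall back to the URL itself.
  return "url:" + link;
}

// Group links so that every link to the same paper shares one entry.
function groupByPaper(links) {
  const groups = new Map();
  for (const link of links) {
    const key = canonicalKey(link);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(link);
  }
  return groups;
}
```

Note what this sketch can't do: it has no way of knowing that a given PMID and a given DOI refer to the same paper. That needs a lookup against a metadata source, which is exactly the kind of work it makes sense to do once, centrally.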


My suggestion is that wherever you'd allow comments on papers you also collect trackbacks, displaying the title and excerpt of blog posts citing the paper in question.
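As a rough idea of how little is needed on the publisher side, here's a JavaScript sketch that turns a list of blog posts into a trackback list showing title and excerpt. The shape of each post object (`url`, `title`, `excerpt`) is an assumption for the example rather than the format the Postgenomic API actually returns.

```javascript
// Escape text so blog-supplied content can't inject markup into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render trackbacks as an HTML list; returns "" when there are none.
function renderTrackbacks(posts) {
  if (posts.length === 0) return "";
  const items = posts.map(
    (p) =>
      `<li><a href="${escapeHtml(p.url)}">${escapeHtml(p.title)}</a>: ${escapeHtml(p.excerpt)}</li>`
  );
  return `<ul class="trackbacks">${items.join("")}</ul>`;
}
```

Escaping matters here because titles and excerpts come from third-party blogs, even if the list of blogs is a curated one.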

Blog trackbacks on papers are a winning proposition for everybody involved. Bloggers get recognition and increased exposure, readers get more relevant content, publishers get papers worth coming back to after you've downloaded the PDF, authors see more discussion surrounding their research.

If you're interested in talking about this further then please get in touch.


Wednesday, June 13, 2007

Facebook as a platform, pt2

I thought that it might be useful to blog my experiences with the new Facebook Platform API. For the past two weeks I've been working on Bookshare in my spare time. Bookshare is a book review site with social networking features (feel free to sign up and let me know what you think).

Anyway....

Facebook makes you think about scalability

Here was my first harsh Facebook lesson: apps can spread fast. Not necessarily exponentially but always faster than you can iterate over your code optimizing it. Quick hacks come back to bite you in the ass hours after you implement them, rather than months, because your userbase can double overnight.

Here's a fictional but in no way implausible scenario: imagine that you wrote an application that shows lolcats on people's profile pages to try out the Facebook API. You make the app public and then forget about it. A week later it has 1.7 million users.

A good problem to have? Maybe. Until you get your bandwidth bill (not a problem for images, luckily, as Facebook caches them locally, but still) or you want to use your server for anything else... ever.

Usually if your site slowed down a bit you'd be OK, but unfortunately:

Facebook times out. A lot.

If you visit

http://apps.facebook.com/bookshare/dashboard.php

then Facebook fetches the output of dashboard.php from my server and renders it inside of the Facebook page template (containing the Facebook logo, sidebars, footer etc.).

The problem is that there is an awfully short timeout on the Facebook server fetching remote pages. If your site is being slower than usual (because of an unoptimized SQL query, perhaps?) then this is reflected on Facebook by all of your application's canvas pages being totally unavailable - they get replaced with a non-customizable 'could not retrieve page, try again in a couple of days' type error. A couple of days? Way to drive off users. What's wrong with suggesting that they hit 'reload' instead?

Imagine getting 1.7 million messages of complaint from angry lolcatters who can't access your app any more.
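One common way to stay under a short timeout is to serve an expensive result from a short-lived cache rather than recomputing it on every canvas request. Here's a rough sketch of that idea in JavaScript (Bookshare itself is PHP, and this isn't its code). The `cached` helper, the 60 second lifetime and the pretend query are all illustrative assumptions.

```javascript
// Wrap an expensive function so its result is reused for ttlMs milliseconds.
// The clock is injectable so the behaviour can be checked without waiting.
function cached(compute, ttlMs, now = Date.now) {
  let value;
  let expires = -Infinity; // forces a compute on the first call
  return function () {
    const t = now();
    if (t >= expires) {
      value = compute();
      expires = t + ttlMs;
    }
    return value;
  };
}

// Stand-in for a slow SQL query; counts how often it really runs.
let queryRuns = 0;
const slowTopBooks = () => {
  queryRuns += 1;
  return ["book-a", "book-b"];
};

const topBooks = cached(slowTopBooks, 60 * 1000);
topBooks();
topBooks();
console.log(queryRuns); // 1: the second call was served from the cache
```

The trade-off is slightly stale data in exchange for pages that come back quickly, which seems like the right call when the alternative is an error page telling users to come back in a couple of days.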

Official support is a bit lacking

I can kind of understand this as they've presumably been inundated by requests and bug reports, but for a company that's done right by developers in so many other respects their direct support sucks. I emailed off a report about a specific bug and got back an intensely annoying 'thanks for trying the Facebook Platform, all the documentation you need to get started can be found online' form letter reply.

The documentation was written by developers

The official client library is in PHP (as is the UI of Facebook itself). Unfortunately the online documentation only ever refers to the underlying REST calls to the server, so you have to actually delve into the code to see what the naming conventions are.

Despite all this, it's brilliant

You don't have to worry about persuading users to come visit. You don't have to write login or session handlers. Or friends lists. Or a messaging system. Your images get cached. The audience is large. The API is well designed.

I think they're on to a winner.


Saturday, May 26, 2007

Facebook as a platform, pt 1

(The next couple of posts have very little to do with bioinformatics - sorry).

I've been trying out a whole bunch of online social networks this year in the name of research (no, really). LinkedIn is the most boring. Bebo, which has a widget that suggested that my celebrity lookie-likie is Whoopi Goldberg, is the weirdest. MySpace is just scary.

Facebook is pretty good, though. The design is grown up and so is the userbase (relatively speaking). Nature has a group there to help point young scientists towards some of the less obvious services that NPG provides (like free drinks at Nature Network meetups - if you're in London then come along). Facebook were the first big social network to release an API and now they've gone a step further and opened up Facebook as a platform.

I was pretty excited by this announcement. Facebook works brilliantly as a social platform - will 3rd parties add features that'll enable it to compete in the 'professional' social networking arena too?


Friday, April 27, 2007

Blog trackback bookmarklet

Update: now returns HTML instead of atom :)

Bookmarklet to retrieve science blog trackbacks for the current page (a blog post permapage, for example?), courtesy of Alf:


javascript:location.href='http://www.postgenomic.com/page_trackback.php?url='+encodeURIComponent(location.href)

And the link (just drag it up to your bookmarks bar): PgTrack

Want to try it out? Bora's seminal 'science on science blogs' post has a lot of incoming links.

Note that you can check for trackbacks on any page - BBC news stories, papers, whatever - it's just that they'll all be from blogs indexed in Postgenomic, and only from posts that link to that exact URL.
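All the bookmarklet does is build a query URL from the current address. Here's the same construction as a plain JavaScript function, in case you want to call it from a script rather than a bookmarks bar. The endpoint is the one from the bookmarklet; `trackbackUrl` is just an illustrative name.

```javascript
// Build the Postgenomic trackback lookup URL for a given page address.
function trackbackUrl(pageUrl) {
  return (
    "http://www.postgenomic.com/page_trackback.php?url=" +
    encodeURIComponent(pageUrl)
  );
}

// Placeholder address for demonstration.
console.log(trackbackUrl("http://example.org/a?b=1"));
```

Since the whole address, query string included, is what gets passed along, two different URLs for the same page count as two different lookups.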


Tuesday, March 20, 2007

Dapper

So Andrew has been talking about Dapper and Pedro about kapow, both screen scraping (sort of) services that let you extract data from websites. It's interesting and potentially useful stuff.

Here's my Dapped contribution: an RSS feed of advance access articles from Bioinformatics and NAR. I tried putting Nature advance publications in there but shamefully the nature.com markup isn't up to scratch or something and Dapper refuses to pull out the correct pieces of data consistently. Neh.

There's a little bit of custom PHP code involved to create links for each paper from the doi, to extract publication dates and to merge the two dap RSS feeds - this seems like the kind of thing Pipes was designed for but unfortunately it's not quite flexible enough yet.
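For the curious, the glue logic is roughly this shape. This is a JavaScript sketch of the idea, not the PHP I actually used, and it assumes each feed has already been parsed into items with `title`, `doi` and `published` fields (those field names, and the DOIs in the usage example, are made up for illustration).

```javascript
// Turn a DOI into a resolvable link via the public DOI resolver.
function doiLink(doi) {
  return "https://doi.org/" + doi;
}

// Merge any number of parsed feeds into one list, newest item first,
// adding a link derived from each item's DOI.
function mergeFeeds(...feeds) {
  return []
    .concat(...feeds)
    .map((item) => ({ ...item, link: doiLink(item.doi) }))
    .sort((a, b) => Date.parse(b.published) - Date.parse(a.published));
}

// Made-up sample items standing in for the two dap feeds.
const bioinformatics = [
  { title: "Paper A", doi: "10.1234/example.a", published: "2007-03-18" },
];
const nar = [
  { title: "Paper B", doi: "10.1234/example.b", published: "2007-03-20" },
];
console.log(mergeFeeds(bioinformatics, nar).map((item) => item.title));
```

Sorting on the parsed date rather than on feed order means the merged feed stays chronological however the two sources interleave.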

I made the original bioinformatics and NAR daps public if you want to tweak them.


Tuesday, February 13, 2007

Connotea: basic integration with Word

Word 2007 has some built-in support for bibliographies and citations. It's pretty basic and has some, um, issues, but by using it you can get your Connotea bookmarks into Word fairly quickly.

Here's how:
  1. Enter your Connotea username in the form at the bottom of this post, press 'get bookmarks' and wait for a couple of minutes while a script fetches your bookmarks via the API (if you have lots of bookmarks then this could take some time).

  2. Save the resulting file (as .XML - "sources.xml" would be a good name).

  3. Open up Word 2007.

  4. Click on the References tab. In the "Citations and Bibliography" panel click on "Manage Sources".

  5. Click the Browse button and select the XML file you saved in step 2.

  6. Any bookmarks in your Connotea library that had citation information attached to them should appear in the 'Sources Available' listbox. Select those appropriate to the document that you're writing and copy them across to 'Current List'.

  7. You can now use the 'insert citation' and 'bibliography' buttons to insert citations and generate formatted bibliographies, respectively.

Con2Word XMLatron

Connotea username:



Monday, February 12, 2007

PLoS One / Postgenomic mashup

Chris Surridge has an interesting post over at the PLoS blog about the comments (or the lack thereof) on PLoS One papers. He mentions one paper in particular that has a long discussion thread associated with it on Gene Expression but no real comments on the actual PLoS One site.

As a temporary solution (?) to the problem of blog comments not being immediately accessible from the paper, summaries of notable manuscripts are going to be posted to the PLoS publishing blog with open comment threads. Based on the three posts already up I think this is a terrible idea.

Partly this is personal preference - I hate blogs that just replicate tables of contents - but more importantly I think that it misses the point.

People like the GNXP folks have taken the time and trouble to build up a loyal community that fosters debate and to create an environment in which visitors enjoy interacting with the site and with each other. Sticking up an abstract or two on your own blog just isn't going to compete with that, doesn't matter how much traffic you get.

Blog properly - engage your audience - or don't blog at all. It's a personal communication medium, that's one of the reasons why people feel more comfortable commenting in a blogging environment. A link and an abstract on a publisher's blog isn't personal, it's an advert. The PLoS One blogs are generally a good read at the moment, don't ruin them.

I'm not just PLoS bashing here: I like the ideas behind PLoS One and we do the same 'if we blog the abstract then people will comment!' thing at Nature on some blogs (the ones I don't read any more). The intention is good, it's just misguided, IMHO.

Anyway, I think that a better solution would be to embrace the existing science blogosphere and to explore ways of working with it more closely. As a proof of concept, here's a Greasemonkey script that adds science blog trackbacks to PLoS One.

It doesn't look particularly nice, mainly because I didn't have time to style things very well. Feel free to do with it as you will, though (you could get it working with PLoS Two, for a start).
