Flags and Lollipops

Tuesday, July 03, 2007

CrossRef metadata for all!

(via Code4Lib) CrossRef currently has a competition running. Submit a proposal for an innovative service that uses CrossRef data and if you're selected you can get an account for free (giving you access to the CrossRef database, which normally costs $$$).

In case you haven't heard of CrossRef:

CrossRef's specific mandate is to be the citation linking backbone for all scholarly information in electronic form. CrossRef is a collaborative reference linking service that functions as a sort of digital switchboard. It holds no full text content, but rather effects linkages through Digital Object Identifiers (DOI), which are tagged to article metadata supplied by the participating publishers. The end result is an efficient, scalable linking system through which a researcher can click on a reference citation in a journal and access the cited article.


Put simply, CrossRef can supply the basic metadata (title, authors, journal details) associated with DOIs from the scientific literature.

In case you haven't heard of DOIs: DOIs are unique, resolvable identifiers for digital content (papers or figures, for example). This paper has the DOI 10.1186/1743-422X-4-67. If you go to the CrossRef homepage and enter that string into the DOI resolver you'll be redirected to wherever the owner of the DOI (BioMedCentral, in this case) says the paper currently resides.

An example of how this data is used: when a paper is published, the references at the bottom of the page can be hyperlinked by doing a reverse lookup on the CrossRef database, i.e. asking 'what's the DOI of the paper with this title and author list?'

Another example: imagine a scholarly bookmarking system like Connotea or Zotero. When somebody bookmarks a paper you could scrape the title, authors, etc. directly from the HTML and hope that it doesn't break, or you could just scrape the DOI and then get all of the other metadata from CrossRef.
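A rough sketch of the 'just scrape the DOI' approach, in Python. The regex is a heuristic of mine rather than an official DOI grammar, the sample HTML is made up around the example DOI from earlier in the post, and the https://doi.org/ prefix is the general DOI proxy rather than anything CrossRef-specific. Fetching the metadata itself is left out, since that part depends on what access you have to CrossRef.

```python
import re
from urllib.parse import quote

# Heuristic DOI pattern (an assumption, not an official grammar):
# "10.", a 4-9 digit registrant prefix, "/", then a suffix that runs
# until whitespace, a quote or an angle bracket.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"\'<>]+')


def find_dois(html):
    """Return the distinct DOI-like strings in html, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(html):
        # Trim punctuation that usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


def resolver_url(doi):
    """Build a link to the general DOI proxy for the given DOI."""
    return "https://doi.org/" + quote(doi, safe="/")


# Made-up page fragment for illustration only.
SAMPLE_HTML = (
    '<p>Full text: <a href="http://dx.doi.org/10.1186/1743-422X-4-67">'
    "doi:10.1186/1743-422X-4-67</a>.</p>"
)

for doi in find_dois(SAMPLE_HTML):
    print(doi, "->", resolver_url(doi))
```

The point is that the bookmarking tool only has to get one short string right; everything else (title, authors, journal) can come from a lookup keyed on that string instead of from brittle HTML scraping.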

I think the competition is a good idea. My only problem with it is that IMHO CrossRef data should be free for non-commercial use anyway. At the moment it sorta kinda is; there's a 'demo' interface which you can use to try the service out.

If CrossRef really want to encourage innovative uses of their data then they should open up the database to anybody who wants to build (free, publicly accessible) applications on top of it.

Sure, CrossRef costs money to run but surely more people and open systems using DOIs in turn make it more worthwhile for publishers or software vendors to sign up as commercial members?

In any case, if you've got a brilliant biomedical mashup in mind that might benefit, you should apply. The deadline for proposals is July 15th.


Thursday, May 17, 2007

Pg10k

OK, it's cheating because the figure includes books as well as papers, but Postgenomic has now tracked more than ten thousand citations in blog posts. As the majority of blogs either (a) don't supply fulltext RSS feeds, just excerpts, or (b) strip out HTML, and thus the links, from their feeds, there must be a sizeable dark figure too. How many citations are being missed by Postgenomic and Chemical Blogspace, I wonder?

Anyway, paper #10,000 was Mauro Costa-Mattioli's paper in Cell about stress induced translation regulation (conveniently the citing post from Gene Expression explains what that is and then goes into some interesting detail - an excellent advert for science blogging).

I'm pleased. Scientists write blogs and put science in them. They talk about recent papers. Their numbers are growing. Might blog trackbacks be a good or even necessary supplement to comments on a paper on a journal website?

It'd be interesting to take, say, BioMedCentral papers from the past twelve months and compare the number of comments on each to the number of citations from posts. I think that BMC does comments quite well, possibly better than any other STM publisher - PLoS included - not that that's necessarily saying much (also there's still no comment RSS feed, boo). Using comment data from PLoS One would be another option (was speaking about this with a colleague earlier today) but considering how new PLoS One is perhaps there isn't enough data in Postgenomic yet for any results to be meaningful.

Actually, it'd also be interesting to compare the number of blog citations to the number of 'real' citations received by each paper in the index. Is blog buzz a good indicator of impact?
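One way to put a number on that comparison would be a rank correlation between the two counts. Here's a minimal sketch in Python using Spearman's coefficient, which I've picked because citation counts tend to be skewed; it is not something Postgenomic does today. The counts at the bottom are invented toy numbers, not real data.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented toy numbers, one entry per (hypothetical) paper.
blog_citations = [12, 3, 7, 1, 5]
journal_citations = [40, 9, 33, 2, 4]
print(round(spearman(blog_citations, journal_citations), 2))
```

A value near 1 would suggest that the papers bloggers talk about most are also the ones cited most; a value near 0 would suggest blog buzz is measuring something else.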

A brief stats update: the site has been running for about fourteen months. The most popular book has been The God Delusion, with relevant posts from 15 different blogs. The most popular 'proper' paper (anything with a DOI in PubMed gets tracked, which includes some opinion pieces) was Ben Voight's 'A Map of Recent Positive Selection in the Human Genome', from PLoS Biology.

There are 735 blogs in the index, of which 341 were active in the past week. Usually ~2,500 posts are aggregated each week (a major exception being the last two weeks of December, when this number falls to 1,400). There are ~120,000 blog posts in the database.

I've been busy with other projects at NPG recently but plan on spending some more time on Postgenomic over the next few months. If you've got any ideas (or you'd like to help out with coding, documentation, design - it's an open source project, born from discussions in the comment threads of bioinformatics blogs) then please let me know. If you're interested in using data from Postgenomic in some way then that's cool too, I'm keen to help.

I was going to reiterate my thanks to people who have contributed so far but the list is too long and I'd forget people. You know who you all are - ta muchly. Science bloggers rock.


Thursday, March 01, 2007

Publish or Perish

Publish or Perish is a Windows app that generates your h-index (amongst other metrics) for you, based on citation data from Google Scholar. No NSPNAS, yet, unfortunately.

Personally I think that the simple number that makes up an h-index is a little dry. Besides, people who've never heard of it before don't have a frame of reference. What's the scale? Does it go from low to high or the other way round?
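For anyone missing that frame of reference: an h-index of h means that h of your papers have at least h citations each, so higher is better. A minimal sketch of the calculation in Python follows. This is the standard definition, not necessarily how Publish or Perish implements it, and the citation counts in the example are made up.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # the top `position` papers all have >= position citations
        else:
            break  # counts only get smaller from here on
    return h


# Made-up citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))
```

Note that one blockbuster paper doesn't help much: in the second test case below, a paper with 25 citations still leaves the author at 3.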

More to the point, "I have an h-index of 30" won't impress the opposite sex. No, for that you need D&D references (hotties dig D&D). How about "I'm a level 30 biobarbarian"? Now we're talking. Behold my +1 Pipettes of Power. When you collect enough citations you level up. I'm a level 1 gnome, myself.

Go on, it'll look much better on your grant application.

