Software availability: a quick survey of OUP Bioinformatics
I figured that I'd do some empirical research and check out all of the Application Notes published in the March issues of Bioinformatics from the past four years.
Some "this study isn't very scientific" disclaimers: It's not a huge dataset. I'm lumping databases, software and web services together to talk about 'resources' in general. There's only one resource per paper, and it's whatever is referred to in the abstract 'availability' section. I started off going through every paper in each issue to see if they mentioned resources but it rapidly became tiresome, so for 2005, 2004 and 2003 I just looked at the Application Notes.
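I checked each resource by hand, but a first pass could be automated. The following is only a rough sketch of that idea, not what I actually ran: `is_available` is a made-up helper name, the 10 second timeout is arbitrary, and "answers with an HTTP status below 400" is an assumed definition of "available". It would count an ad-filled domain parking page as available, so a human still needs to look at the hits.

```python
import urllib.error
import urllib.request


def is_available(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers with an HTTP status below 400.

    `opener` is injectable so the function can be exercised without
    touching the network; by default it is urllib's own urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers DNS failures, refused connections, timeouts,
        # HTTP 4xx/5xx responses (HTTPError) and malformed URLs.
        return False
```

Calling `is_available("http://whatever.edu/~username/")` on each URL from the 'availability' sections would give a rough list of dead links to double-check manually.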
So on to the results - the raw data is at the end of the post, but briefly:
- 12% of resources from the March 2006 issues are no longer available.
- 17% of the resources from 2005 and 2004 are no longer available.
- 11% of the resources from 2003 are no longer available.
- Only one of the resources I looked at was hosted on SourceForge. It's still available.
- Many, many resources were hosted in home directories (i.e. whatever.edu/~username/ ).
- A couple of resources that were available 'upon request' made clear that they were free for non-profit use only - is holding the software back a way of screening potential customers?
Two other things I noticed:
- OUP Bioinformatics used to have lots of original research and now it's all applications and databases (not necessarily a bad thing, I'm just saying; Neil has mentioned this before, too).
- People writing bioinformatics web services love frames. Stop using frames, please.
Perhaps there's a compromise between making software open source and keeping it locked up until you / your technology transfer officer can become fantastically rich by selling it to big pharma: upload a tarball of the software executable (one that runs on a reference platform: Windows, OS X, Linux?) and some documentation to, say, WebCite. No mailing lists, CVS access or anything fancy is necessary, after all: just a permanent snapshot of the software that you used to write your paper.
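Making that snapshot is not much work. As a minimal sketch, assuming Python is to hand: the `snapshot` function and the file names in the usage note are hypothetical, and uploading the result to an archive is left out entirely.

```python
import tarfile
from pathlib import Path


def snapshot(paths, archive_name):
    """Bundle the given files or directories into one gzip-compressed tarball."""
    with tarfile.open(archive_name, "w:gz") as archive:
        for path in map(Path, paths):
            # arcname keeps only the final path component, so the archive
            # does not reproduce the author's local directory layout.
            archive.add(path, arcname=path.name)
    return archive_name
```

Something like `snapshot(["mytool", "README.txt"], "mytool-paper-snapshot.tar.gz")` would produce a single file that can be deposited alongside the paper.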
Anyway, the raw data:
March 2006
27 resources
3 available on request (11%)
3 unavailable (of all resources: 11% / of freely available resources: 12%)
1 in SourceForge
March 2005
33 resources
4 available on request (12%)
5 unavailable (15% / 17%)
1 unavailable site redirects to an ad-filled domain parking page. How rude.
March 2004
29 resources
all freely available (i.e. not 'on request')
5 unavailable (17% / 17%)
March 2003
22 resources
5 available on request (22%)
2 unavailable (9% / 11%)
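For anyone who wants to recompute the percentages from the counts above, here is a small sketch. The function and dictionary names are my own invention; "freely available" is taken to mean total resources minus those available on request, which is how the second figure in each "unavailable" line is defined.

```python
def percentages(total, on_request, unavailable):
    """Return the three percentages reported for one year's resources."""
    # Resources that were not 'available on request'.
    freely_available = total - on_request
    return {
        "on_request": 100 * on_request / total,
        "unavailable_of_all": 100 * unavailable / total,
        "unavailable_of_free": 100 * unavailable / freely_available,
    }


# (total resources, available on request, unavailable) per March issue set.
survey = {
    2006: (27, 3, 3),
    2005: (33, 4, 5),
    2004: (29, 0, 5),
    2003: (22, 5, 2),
}

for year, counts in survey.items():
    p = percentages(*counts)
    print(
        f"{year}: on request {p['on_request']:.1f}%, "
        f"unavailable {p['unavailable_of_all']:.1f}% of all / "
        f"{p['unavailable_of_free']:.1f}% of freely available"
    )
```

The script prints one decimal place, so its output can differ slightly from the whole-number figures quoted in the post.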
