Flags and Lollipops

Wednesday, October 12, 2005

Software "releases"

There's an interesting post over at Neil's blog about Methods vs Discovery (I started off adding this as a comment there but decided it was too long and ranty):
One of my pet issues in bioinformatics is that of methodology versus discovery. In other words - is your new, cute and clever piece of software any use if biologists are not using it to discover interesting things?
Reading it reminded me of being frustrated in my efforts to find software relating to chromatin structure for a project (not the S/MARs thing, but related) a wee while back. In the last year there've been two potentially really useful tools "released" which could have helped speed my work along. When I say "released", though, I mean that papers describing them were published; neither piece of software was ever actually made available for download (as you'll be aware if you've written an application note recently, editors "strongly suggest" that the software is released under an open source licence or made publicly available, but it's just a suggestion).

This isn't too bad; I mean, sometimes code is messy or it's got some peculiar prerequisites or there's some other reason that you want to deal with potential downloaders in person. So I emailed the relevant authors to ask nicely if there was any possibility of obtaining the software, it'd be very useful, all due credit to be given etc.

I never got an answer from one group; the other responded that I'd have to wait until they'd published results from their own analyses with the software in question (presumably in case I had the cheek to try and use their software to discover something significant enough to be publishable before them).

Part of me understands the whole "I did the work, why should you reap the benefits?" idea but frankly it's a bit sneaky to publish a pointless software application note to pad out your CV with when the software is of no use to anybody (though not for lack of them trying). What's the point of a paper announcing that you've got this piece of software that, realistically, nobody else is ever going to use and you know it?

Isn't it short-sighted to keep a piece of software back so that you can get a few more papers out of it when the field is moving so quickly that your code will be obsolete within the next twelve months? Wouldn't it be better for everybody concerned - other scientists, who don't have to reinvent the wheel and implement their own tools; you, who will get cited more frequently and pick up more collaborations; and the journal, who won't be publishing vanity pieces any longer - if we all shared?

There are sometimes other complications, of course. There's the possibility that the software in question is actually a bit crap, which will become obvious the moment anybody applies it to something other than the carefully selected test set in the paper (roll on transparent reviewing and reader comments). Perhaps you're using some sort of commercial or proprietary database.

Or perhaps the university technology transfer people got to you...


4 Comments:

At October 12, 2005 7:43 PM, Anonymous Spitshine said...

This seems connected to one of the frustrations of this week: Looking for a piece of software to detect low complexity regions, I realized that a lot of software that I recall from three years back is no longer available.
So looks like the student left the lab and no one is interested in maintaining just the silly web site for download. I wish the publisher would take better care of these issues.

 
At October 13, 2005 1:45 AM, Anonymous Neil said...

There was an article about software maintenance in The Scientist last year, which was discussed at Nodalpoint.

It is a problem, and it's what happens when essentially amateur, part-time software people (I don't mean that in any negative sense) move on and universities pay no attention to what their people generate, other than money and publications, and so have no concept of a software archive.

Maybe some large institution could take up this cause? Years ago, the IUBio Archive was the place to go for software (often via gopher - who remembers that?). Some sort of global, distributed OSS for biology repository? With bittorrents and so on?

 
At October 13, 2005 2:30 AM, Blogger Stew said...

There's bioinformatics.org, which you (well, I, anyway) don't really hear of all that much (despite it being second on the Google search results for "bioinformatics"). Can't think of any big software projects kept there, anyway. OK, it's not really a big institution, but it's trying to be, which should be worth points.

Something puts me off hosting my stuff there: not anything in particular, it just doesn't really exude the professionalism you'd want from the people who're going to look after your hard-written code. That and the front page news is always a bit crap.

The thing about hosting an archive at an institution is that you've still got a problem with the people who were doing it in their spare time moving on, nobody outside being allowed to fiddle with institution webpages and nobody else in the group wanting to take over.

Maybe Greg would be willing to host simple software archives (rather than CVS / bugzilla / forum superdoodahs) on Nodalpoint. At least everybody there is already a volunteer.

Then again, it's easy to suggest and maybe not so easy to be landed with the work and bandwidth bill. :)

 
At October 13, 2005 3:08 AM, Anonymous Neil said...

I suspect that the points you raise - commercial considerations, technology transfer (yuck) and fears over software quality - all play some part. It took me years to put any of my stuff online. Much of it was written for very specific instances which I thought would not apply for other people and, I'll admit, I was embarrassed by the code quality and was in fear of ridicule.

Eventually I got it into CVS and online and it was a really positive decision because (1) it forced me to improve my code quality, (2) it forced me to write more generic code and (3) in cases where I had made glaring, embarrassing errors, people have contacted me with very polite and constructive corrections and improvements. So the message is - get your code out there.

 
