Flags and Lollipops

Thursday, August 18, 2005

Reusable Code

One of the things that I like about working in bioinformatics is that a lot of the software I write is situational - that is to say, it's suited to one particular purpose. It gets written to do a job (typically some sort of analysis or data collation) and is then discarded. Sure, there are a couple of major scripts or little applications I've written that I use fairly frequently, but most of my coding time is spent on throwaway scripts. It's the same for many of my colleagues.

In Computer Science classes we were always taught that this is not a good thing (that was in the mid-nineties - nowadays you'd hope that other methodologies get a look in too).

Personally, I think throwaway code is great - I dislike documenting code in detail, I selectively ignore object orientated design principles wherever possible and UML gives me the heebie-jeebies. After all, skipping the extra effort needed for an outsider to come in and reuse all of your code doesn't mean that your program works any less well - in fact, you have more time to find and fix bugs.

So we've got a Masters student (with a life sciences background) working on a project with us over the summer. The MSc in bioinformatics here is taught by the Computer Science department and his head has been filled with crazy J2EE-talk. He's doing everything by the book; drawing class diagrams, writing test harnesses - he even set up his own little local CVS repository. I should point out that I think this is a good thing. I mean, he's being graded by CS professors, or at least the coding part of his project is. If I were him I'd stick to what had been recommended to me in the lectures, too. It doesn't really bear any relation to the kind of code he'll be writing once he's actually out in the field, though. I just hope he doesn't end up believing that the only good code is reusable code.

Don't get me wrong - there's certainly a time and place for sticking to your object orientated, reusable code module guns. Bioperl is a good example of that. If you've spent some time implementing a tricksy algorithm it's worth setting things up so that you can share it with other people. If you're going to be distributing a script or program make sure that it's high quality, readable code.

To be brutally honest, though, does "everyday" code really ever get reused? What's the ratio of extra effort expended to work saved later on?

For much of bioinformatics I think that disposable software is perfectly acceptable. Part of the reason is that the focus isn't usually on producing fully-fledged application software for use by others but on processing data (or providing tools to process data) in the short term. What's important isn't the knowledge encapsulated in the code but the knowledge created and then published as a paper. It's horses for courses, I guess; we should be able to accept that no single software methodology or mindset necessarily suits all situations.

Or maybe I'm just lazy.

(footnote: for more on this, check out the comments and posts at Propeller Twist and Inforbiomatica)

8 Comments:

At August 19, 2005 6:38 AM, Anonymous Spitshine said...

Sure, people who have heard of the big software design systems want to play with them and lose themselves in there. But if you know what to use and what not to, those tools are helpful and offer reusable solutions (not usually code) for larger problems.
Btw, not using CVS sounds entirely reckless to me...

 
At August 19, 2005 9:37 AM, Blogger Stew said...

>> Btw, not using CVS sounds entirely reckless to me...

That's fair enough! I run CVS on my own machine but just to keep track of where everything is - for us it's rare to have two people working on the same code so other aspects of CVS don't really get a look in.

 
At August 19, 2005 12:06 PM, Anonymous Fabrice said...

Interesting point of view. I'd like to discuss this problem. I have given a complete answer on my blog.

 
At August 19, 2005 3:36 PM, Anonymous Mauricio said...

You're right, we're not doing "software design", we're just "getting the job done".

Btw, for me CVS is mandatory, even for my daily throwaway code.

Regards.

 
At August 24, 2005 5:15 AM, Anonymous neil said...

Think you hit the nail on the head when you point out that a lot of so-called bioinformatics is actually data processing and therefore dependent on the source data. I'd say 80% or more of my time is spent writing scripts that simply parse and reformat data.

As an example, I'm currently collaborating with a genome sequencing facility who have supplied us with various files related to a microbial genome sequence. Their fasta files are illegal - there is a space in the header between the '>' and the ID, and the IDs are not unique. So, a Bioperl script to read in the file, strip the header and rewrite it. Similarly they have supplied GenBank files lacking an ACCESSION field. So more Perl to read in a file and use the LOCUS field to supply a dummy accession. So it goes on, a lot of work before you get anywhere near actual research. Is this reusable code? Of course not, it's specific to the files from these people.
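To make the FASTA part of that concrete, here is a minimal sketch of such a one-off cleanup. It is not the script described above (that one used Bioperl); it is plain Python, the function name `clean_fasta_headers` is made up for illustration, and the way duplicate IDs are renamed (appending `_2`, `_3` and so on) is an assumption, since the comment doesn't say how the IDs were made unique.

```python
from collections import Counter


def clean_fasta_headers(lines):
    """Yield FASTA lines with '> id' headers normalised and repeated IDs renamed.

    Hypothetical helper for illustration only; sequence lines pass through unchanged.
    """
    seen = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Remove the stray whitespace between '>' and the ID,
            # then separate the ID from any free-text description.
            header = line[1:].strip()
            parts = header.split(None, 1)
            seq_id = parts[0] if parts else "unnamed"
            desc = parts[1] if len(parts) > 1 else ""
            seen[seq_id] += 1
            # Assumed renaming scheme: second and later copies get _2, _3, ...
            # (this could still clash with an ID that already ends that way).
            if seen[seq_id] > 1:
                seq_id = f"{seq_id}_{seen[seq_id]}"
            yield f">{seq_id} {desc}".rstrip()
        else:
            yield line


if __name__ == "__main__":
    example = ["> contig1 sample A\n", "ACGT\n", "> contig1 sample B\n", "GGCC\n"]
    for cleaned in clean_fasta_headers(example):
        print(cleaned)
```

Like the original, this is throwaway code: it only knows about the two problems named above and makes no attempt to validate anything else in the file.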

Even in research projects, it's hard to be generic. I'm currently working on large scale homology modelling of proteins from complete genomes. Goes something like (1) BLAST proteins v. PDB, (2) choose "good" hits (by identity and length), (3) download appropriate templates, (4) write some Perl that will process the template, thread the query sequence to it, output files for modelling and write out a shell script that does all these things via a PBS batch queue. Multiple ways to do this...I've chosen one, it works but generic and reusable? No.
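Step (2) is the easiest part of that pipeline to sketch. The following is an illustration only, not the Perl described above: it assumes the BLAST results are in the standard 12-column tab-separated format (query, subject, percent identity, alignment length, and so on), and both the function name `good_hits` and the cut-off values are invented for the example, since the comment doesn't give the thresholds actually used.

```python
def good_hits(tabular_lines, min_identity=30.0, min_length=100):
    """Return (query, subject, identity, length) for hits passing both cut-offs.

    Assumes tab-separated BLAST output with percent identity in column 3
    and alignment length in column 4. The default thresholds are placeholders.
    """
    hits = []
    for line in tabular_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if identity >= min_identity and length >= min_length:
            hits.append((query, subject, identity, length))
    return hits


if __name__ == "__main__":
    rows = [
        "q1\t1abc_A\t45.2\t210\t100\t3\t1\t210\t5\t214\t1e-50\t180",
        "q1\t2xyz_B\t22.0\t250\t180\t9\t1\t250\t3\t252\t0.002\t40",
        "q2\t3def_A\t60.0\t40\t16\t0\t1\t40\t1\t40\t1e-10\t60",
    ]
    for hit in good_hits(rows):
        print(hit)
```

The second row fails on identity and the third on length, which is exactly the kind of project-specific judgement that makes this code hard to reuse elsewhere.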

So no, we are not software developers. Having said that, I'm all for good practice - objects and modules, commented code and CVS, where practicable.

 