Flags and Lollipops

Monday, April 10, 2006

Standardized testing

There's an interesting, wide-ranging review paper in Biology Direct this month that looks at methods for motif discovery. At the end of the introduction the authors point out how difficult it is to measure the performance of the different algorithms quantitatively, and suggest that the field needs more standardized testing. They cite Tompa et al.'s excellent paper in Nature Biotechnology as a step in the right direction (Tompa et al. compared several modern motif-finding algorithms on a carefully selected reference set of transcription factor binding sites).
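To make "quantitative performance measurement" a little more concrete, here's a minimal sketch of one way to score a motif finder's predicted sites against a reference set of known binding sites, nucleotide by nucleotide. The interval representation, the function names (`covered`, `nucleotide_stats`) and the toy coordinates are my own hypothetical choices. The three statistics (sensitivity, positive predictive value and a "performance coefficient") are loosely in the spirit of the per-nucleotide measures used in assessments like Tompa et al.'s, but this isn't their code.

```python
# Hypothetical sketch: nucleotide-level scoring of predicted binding sites.
# Assumption: each site is a half-open (start, end) interval on one sequence.

def covered(sites):
    """Return the set of nucleotide positions covered by a list of (start, end) sites."""
    positions = set()
    for start, end in sites:
        positions.update(range(start, end))
    return positions


def nucleotide_stats(known, predicted):
    """Return (sensitivity, ppv, performance_coefficient) at the nucleotide level."""
    k = covered(known)
    p = covered(predicted)
    tp = len(k & p)   # known-site nucleotides that were predicted
    fn = len(k - p)   # known-site nucleotides that were missed
    fp = len(p - k)   # predicted nucleotides outside any known site
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    pc = tp / (tp + fn + fp) if tp + fn + fp else 0.0
    return sensitivity, ppv, pc


if __name__ == "__main__":
    known = [(10, 20), (50, 58)]       # toy reference sites (hypothetical)
    predicted = [(12, 22), (70, 78)]   # toy predictions (hypothetical)
    print(tuple(round(x, 3) for x in nucleotide_stats(known, predicted)))
```

With these toy intervals, 8 of the 18 known nucleotides are recovered and 10 predicted nucleotides fall outside any known site, so sensitivity and PPV both come out at 8/18 and the performance coefficient at 8/28. Numbers like these are exactly what would go on a top-ten list; what they can't capture is which kind of input each program needs.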

Broadly speaking, I'm a fan of different research groups getting together to establish standardized testing. It can drive innovation (look at all the methods created specifically for CASP or BioCreAtIvE), and there's no doubt that it's handy for biologists coming in from the cold. Imagine that all you want to know is whether a particular program you use has been superseded by newer, fancier algorithms: would you rather compare two statistics against the top-ten list on the assessment web page, or go off on a tangent chasing references?

Of course, it's sometimes easier said than done. I'm interested in finding candidate disease genes in large regions of interest, and there are at least a dozen different algorithms that can help with this. Comparing them is difficult, though, because each is suited to slightly different circumstances. For example, one works well when expression data is available but cannot operate without it: is that method better or worse than an algorithm that doesn't require expression data but does need, say, GO annotation? Which algorithm will work best on the data you're interested in becomes clear if you read all of the relevant papers, but that's exactly the kind of information that could easily be lost in a standardized study.

As a brief aside, check out the reviewer comments from Eugene Koonin on the Biology Direct paper: why don't reviewers keep it short n' sweet like that when commenting on my manuscripts?

On second thoughts, don't answer that.


1 Comment:

At May 26, 2006 12:27 AM, Blogger maximilian said...

Yes, right, you should read all the relevant papers. Alas, in bioinformatics there are sometimes quite a few. Take motif discovery: just look at a list of motif discovery algorithms. Erm... well, I stopped counting at 80. OK, the list isn't well annotated, and maybe you could eliminate every second one because the program isn't available on the internet. It's still simply too much to look at, let alone benchmark or parse. You would have to narrow it down somehow: choose, read...
It's much simpler to write another algorithm than to benchmark this mess. :-)

 
