Flags and Lollipops

Monday, December 31, 2007

Open notebook - what's a disease again?

Is there a super-semantic-web-enabled phenotype database out there*? I want to ask a question like 'give me a list of monogenic disorders whose locus has been confirmed by at least two labs, broken down by type of causative mutation' and get an answer.

(* on a tangent: is 23andMe's gene book thing freely available?)
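Here's a rough sketch of the sort of query I mean, run over a hypothetical in-memory record type. The `DisorderRecord` fields (`confirming_labs`, `mutation_type`) and the records themselves are made up for illustration. No database I know of exposes them like this, which is rather the point.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DisorderRecord:
    # Hypothetical schema: these fields illustrate the query, they are not OMIM's.
    name: str
    genes: list[str]
    confirming_labs: int
    mutation_type: str  # e.g. "missense", "nonsense", "deletion"


def monogenic_by_mutation_type(records, min_labs=2):
    """Group single-gene disorders confirmed by >= min_labs labs by mutation type."""
    grouped = defaultdict(list)
    for rec in records:
        if len(rec.genes) == 1 and rec.confirming_labs >= min_labs:
            grouped[rec.mutation_type].append(rec.name)
    return dict(grouped)


# Made-up records, purely to show the shape of the answer.
records = [
    DisorderRecord("disorder_a", ["GENE1"], 3, "missense"),
    DisorderRecord("disorder_b", ["GENE2"], 1, "nonsense"),
    DisorderRecord("disorder_c", ["GENE3", "GENE4"], 4, "missense"),
    DisorderRecord("disorder_d", ["GENE5"], 2, "deletion"),
]
print(monogenic_by_mutation_type(records))
# {'missense': ['disorder_a'], 'deletion': ['disorder_d']}
```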

OMIM falls quite a long way short of that kind of resource, but it never set out to be one for programmatic access, so you can't really blame them. The morbid map is available for download and contains all of the gene -> disorder mappings in the database.

A couple of issues:

  • OMIM's weird entry categorization system (#, *, %, +...) is very confusing. There are apparently 2229 'phenotypes' (note: not 'Mendelian phenotypes') with a known molecular basis in the database, but only 386 genes with a phenotype associated with them? Some of those phenotypes will be caused by gross insertions, deletions or other rearrangements rather than small mutations in single genes, and multiple phenotypes might arise from different mutations in the same gene, but even so... what's with the disparity?
  • It contains polygenic disorders (diabetes, schizophrenia) as well as monogenic ones
  • You can't tell which is which - you could count the number of genes associated with the disorder but a 'monogenic' disorder might be a complex one whose OMIM entry hasn't been updated yet
  • It's not a disease database - it has other phenotypes in it too. Longevity? Wet or dry ear wax? Novelty seeking personality?
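A minimal sketch of the "count the genes per disorder" approach. It assumes morbid map lines are pipe-delimited as `disorder text (mapping key)|gene symbols|gene MIM number|cytogenetic location`, and that mapping key (3) means "molecular basis known". Check both assumptions against the current download before trusting the output. It counts distinct gene MIM numbers rather than symbols, because the symbol column can list several aliases for one gene. As the third bullet says, a count of one doesn't prove a disorder is monogenic. It just narrows the list.

```python
import re
from collections import defaultdict

# Assumed layout of a morbid map line (verify against the current file):
#   disorder text (mapping key)|gene symbols|gene MIM number|cytogenetic location
MAPPING_KEY = re.compile(r"\((\d)\)\s*$")


def genes_per_disorder(lines, required_key="3"):
    """Count distinct gene MIM numbers per disorder, keeping one mapping key only."""
    genes = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        disorder, gene_mim = fields[0].strip(), fields[2].strip()
        match = MAPPING_KEY.search(disorder)
        if match is None or match.group(1) != required_key:
            continue  # (3) is assumed to mean "molecular basis known"
        # Use the gene MIM number, not the symbol column, which can hold aliases.
        genes[disorder].add(gene_mim)
    return {disorder: len(mims) for disorder, mims in genes.items()}


def single_gene_disorders(counts):
    """Disorders with exactly one associated gene: candidates, not proof, of monogenicity."""
    return sorted(d for d, n in counts.items() if n == 1)


# Toy lines with made-up names and identifiers.
sample = [
    "Disorder X, 900001 (3)|GENEA, ALIAS1|910001|1p36",
    "Disorder Y, 900002 (3)|GENEB|910002|2q11",
    "Disorder Y, 900002 (3)|GENEC|910003|7q31",
    "Disorder Z, 900003 (2)|GENED|910004|Xq28",
]
counts = genes_per_disorder(sample)
print(counts)                         # {'Disorder X, 900001 (3)': 1, 'Disorder Y, 900002 (3)': 2}
print(single_gene_disorders(counts))  # ['Disorder X, 900001 (3)']
```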


The last point is interesting, really. When is a phenotype a disease? If you have a novelty seeking personality and so are relatively impulsive and prone to climbing mountains, swimming with sharks, cycling without a helmet etc. then are you ill?

Well, no, is the obvious answer. But where do you draw the line? Is autism a disease?

Neh. Beyond our remit. For us, a monogenic disease = a clinically recognized disorder with a single genetic cause.


Saturday, December 29, 2007

Open notebook pt2 - question, theories, approach

The question

Monogenic (classically Mendelian) disorders are caused by mutations or errors in a single gene. Many of these gene -> disease mappings have been discovered and are listed in OMIM, the Online Mendelian Inheritance in Man database.

There are ~1,000 'disease' genes (genes that give rise to a particular monogenic disorder when mutated in a particular way) listed in OMIM. If you compare this set to other genes, some interesting differences become apparent [1, 2] (only two paragraphs before a reference to my own paper; this is just like a real writeup!). Check out the table below from [1]:



Median gene length (*) is particularly interesting: disease genes have a median length of 27 kb, while the control set sits at 19 kb. Why?

(* the longest known transcript of each gene was used)
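A sketch of the length comparison. It assumes you already have (gene ID, transcript length in bp) pairs from some annotation source. Whether "length" means genomic span or mRNA length is left to that source. The gene IDs and lengths below are toy values, not the data from [1].

```python
from statistics import median


def longest_transcript_lengths(transcripts):
    """Map each gene to the length of its longest known transcript.

    `transcripts` is an iterable of (gene_id, length_in_bp) pairs.
    """
    longest = {}
    for gene, length in transcripts:
        if length > longest.get(gene, 0):
            longest[gene] = length
    return longest


def median_length(longest, gene_set):
    """Median longest-transcript length over the genes in gene_set that have data."""
    return median(longest[g] for g in gene_set if g in longest)


# Toy data: several transcripts per gene, only the longest one counts.
transcripts = [
    ("g1", 12_000), ("g1", 30_000), ("g2", 27_000),
    ("g3", 19_000), ("g4", 15_000), ("g4", 22_000),
]
longest = longest_transcript_lengths(transcripts)
print(median_length(longest, {"g1", "g2"}))  # disease set: 28500.0
print(median_length(longest, {"g3", "g4"}))  # control set: 20500.0
```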

Some plausible-sounding explanations


  • Study bias: genes known to be responsible for disease have by definition been studied in a lab. Gene finding is an inexact science; perhaps automated systems tend to miss the last few exons and it takes a human in a wet lab to find the longer transcripts?
  • Larger genes are 'less important': older, more conserved genes tend to be smaller [ref needed]. Mutations in newer, larger genes may be more likely to have no effect or to give rise to a new phenotype (like a monogenic disease), while mutations in older, presumably more important genes might be fatal at a very early stage.
  • Correlation with some other feature: larger gene sizes are correlated (to different extents) with things like larger numbers of exons, longer 3' and 5' UTRs and expression patterns [3]. Could it be, for example, that monogenic diseases tend to affect one particular tissue rather than being systemic? If so, maybe disease genes are larger because larger genes tend to be more tissue-specific.
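To test the 'correlation with some other feature' idea, one option is a Spearman rank correlation between gene length and a per-gene tissue-specificity score. The score itself is an assumption here, and how to compute it is left open. This stdlib-only version gives tied values their average rank. `scipy.stats.spearmanr` would do the same job if SciPy is available.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # positions i..j share one 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values: gene lengths (bp) and a hypothetical tissue-specificity score.
lengths = [10_000, 20_000, 30_000, 40_000]
specificity = [0.1, 0.3, 0.2, 0.9]
print(round(spearman(lengths, specificity), 3))  # 0.8
```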


Our approach

Let's start off by revisiting the data from [1] to make sure that the gene size / disease correlation still holds up, throw in a few more features to look at (back in 2005 it was difficult to get normalized expression data for the control set), then search the literature for any other theories or related findings.

After that we can test a few possible explanations.
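One way to check that the length difference still holds up is a permutation test on the difference in medians. This is my choice of test, not necessarily the method used in [1]. You shuffle the disease/control labels many times and see how often a random split gives a gap at least as large as the observed one. The lengths below are toy values.

```python
import random
from statistics import median


def median_gap(a, b):
    return median(a) - median(b)


def permutation_test(disease, control, n_perm=10_000, seed=0):
    """One-sided test: is the disease-set median larger than label shuffling would give?

    Returns (observed gap, p-value); the +1 terms keep p from being exactly zero.
    """
    rng = random.Random(seed)
    observed = median_gap(disease, control)
    pooled = list(disease) + list(control)
    n = len(disease)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of genes as disease / control
        if median_gap(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy lengths in bp, not real data.
disease = [27_000, 31_000, 45_000, 26_000, 52_000]
control = [19_000, 15_000, 21_000, 18_000, 24_000, 17_000]
gap, p = permutation_test(disease, control)
print(gap, round(p, 3))
```

Medians rather than means keep a handful of enormous genes from dominating the comparison. A rank-based test such as Mann-Whitney U would be a reasonable alternative.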

[1] Speeding disease gene discovery by sequence based candidate prioritization

[2] Human disease genes: patterns and predictions

[3] Elevated rates of protein secretion, evolution, and disease among tissue-specific genes


Open notebook pt1

I've decided to get back into 'proper' science. For a week, anyway; I'm not stupid (well, stupid enough to do this in my spare time, but yeah...).

Here's the plan:


  1. ask an interesting yet niche and relatively simple question
  2. use bioinformatics tools and awesome science 2.0 websites to find the answer
  3. keep track of progress on this blog
  4. put together a manuscript and submit it to Precedings
  5. use backdoor into Nature Genetics manuscript tracking system to get paper accepted


This may not make for exciting reading - we'll see.


Saturday, December 22, 2007

Come work for Web Publishing

NPG is recruiting (a publishing / managerial role):

Head of Community Business Development

This person will play a central role in NPG's evolution as a scientific communication company. They will be based in London or New York and will report to the Publishing Director, Nature.com. This role will focus on using online approaches to develop a better understanding of, and deeper relationships with, each of our users. By serving them better we intend ultimately to attract attention and usage from all professional scientists, and by using these services as the foundation for new businesses we intend to continue NPG's rapid evolution as an online scientific communication company. This role will involve line management responsibility for our existing social software teams, as well as the appointment of further staff in the areas of online marketing and web statistics. We are seeking someone with a clear strategic sense of how the web is evolving, sufficient technical knowledge to work closely with software developers, a clear strategic vision for the future of communities on Nature.com, and experience in developing, promoting and running successful participative websites.


Userscripts for the life sciences

Egon and Noel have a paper in BMC Bioinformatics this month describing userscripts for the life sciences... nice work, guys.

Last year there was a discussion over at Pedro's about the merits of publishing individual userscripts, after Ben Good's paper on a Greasemonkey-based iHOP enhancement appeared in BMC Bioinformatics. Egon and Noel's paper is more of a review.

We discussed the possibility of hosting a science mashups / web services wiki at NPG - sort of like ProgrammableWeb, but listing only the APIs, databases and tools relevant to science. This sort of ties in with the post over at Nodalpoint that Alf wrote about documenting bioinformatics APIs. There's enough stuff available nowadays for it to be a useful resource, I think.


Incidentally: I started writing this post BEFORE I read the paper properly and realised that I got a namecheck for Postgenomic. Now I definitely recommend it. ;p


Friday, December 07, 2007

Second Life


via Alf's delicious bookmarks - Linden Lab has released a beta version of the Second Life client that uses Windmark's 'atmospheric rendering technology'.

Big difference, no?


