Flags and Lollipops

Wednesday, July 27, 2005

Modelling Phenotypes with Bayesian Networks

Read an interesting paper this morning in Nature Genetics (via a commentary article in EJHG) by Paola Sebastiani et al. at the Boston University School of Public Health.

The abstract and supplementary data are available here. Essentially, Sebastiani used Bayesian networks to analyse a set of ~100 SNPs in candidate genes for sickle cell anaemia (SCA) to see whether any of them modulated the risk of overt stroke, a severe complication that affects around 1 in 14 SCA patients.

SCA is classed as a monogenic disease; that is to say, a fault in a single gene gives rise to the disease phenotype. Of course, things are never that simple in human genetics, and it turns out that many monogenic diseases, SCA included, are affected by mutations in other genes that alter things like the age of onset, the types and frequency of complications, disease severity and response to treatment. The SNPs that Sebastiani looked at, some previously associated with increased stroke risk and some with reduced risk, were spread over 39 different genes.

The Bayesian approach allowed Sebastiani to look simultaneously at many mutations in genes suspected of modifying the SCA phenotype, whereas most statistical methods used to analyse the effect of SNPs on disease phenotypes deal with the mutations one at a time. The network was trained on markers from 92 SCA patients who had suffered strokes and 1306 who had not, and was then tested to see whether it could predict a patient's likelihood of suffering a stroke, given their genotype.
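To make the contrast with one-SNP-at-a-time testing a bit more concrete, here's a minimal sketch of the simplest possible Bayesian network, a naive Bayes classifier, in which stroke status is the only parent of every SNP node and the SNPs are assumed to be independent given stroke status. This is not Sebastiani's model (their network was learned from the data and can capture dependencies between SNPs); the function names, the 0/1/2 genotype coding and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical sketch: a naive Bayes classifier, i.e. a Bayesian network in
# which stroke status is the single parent of every SNP node. This is NOT the
# network learned by Sebastiani et al.; it only shows how evidence from many
# SNPs is combined at once instead of testing each SNP separately.

N_GENOTYPES = 3  # assumed coding: 0, 1 or 2 copies of the minor allele


def train_naive_bayes(genotypes, labels):
    """Count genotype frequencies per SNP within each class (1 = stroke)."""
    n_snps = len(genotypes[0])
    class_counts = defaultdict(int)
    # counts[c][j][g]: number of patients in class c with genotype g at SNP j
    counts = defaultdict(lambda: [defaultdict(int) for _ in range(n_snps)])
    for genotype, label in zip(genotypes, labels):
        class_counts[label] += 1
        for j, g in enumerate(genotype):
            counts[label][j][g] += 1
    return dict(class_counts), counts


def predict_proba(model, genotype, alpha=1.0):
    """Posterior P(class | genotype), with Laplace smoothing controlled by alpha."""
    class_counts, counts = model
    total = sum(class_counts.values())
    log_post = {}
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)  # log prior for this class
        for j, g in enumerate(genotype):
            # every SNP adds its own likelihood term; all SNPs count jointly
            lp += math.log((counts[c][j][g] + alpha) / (n_c + N_GENOTYPES * alpha))
        log_post[c] = lp
    # normalise in log space to avoid underflow when there are many SNPs
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {c: math.exp(v - m) / z for c, v in log_post.items()}


# Toy, made-up data: three SNPs per patient, label 1 = stroke, 0 = no stroke.
genotypes = [(2, 0, 1), (2, 1, 1), (1, 0, 2), (0, 2, 0), (0, 1, 0), (1, 2, 0)]
labels = [1, 1, 1, 0, 0, 0]

model = train_naive_bayes(genotypes, labels)
print(predict_proba(model, (2, 0, 1)))  # most of the probability goes to class 1
```

No single SNP decides the prediction here; each one nudges the posterior, which is roughly the advantage a multi-SNP model has over testing each mutation in isolation.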

Rather promisingly, they report a success rate of 98.2% on an independent test set of patients, correctly identifying 100% of the patients who went on to have a stroke and 98% of those who did not. Twenty-five of the SNPs, in 11 different genes, were found to directly modulate stroke risk.
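The headline 98.2% is an overall accuracy figure, while the 100% and 98% figures are the sensitivity and specificity. Here's a short sketch of how the three relate, using a trivial baseline to show why accuracy alone can flatter a model when the outcome is uncommon. The function name is hypothetical, and the only counts used are the training-set class sizes mentioned earlier (92 stroke, 1306 stroke-free), not the paper's test-set results.

```python
# Sensitivity, specificity and accuracy from confusion-matrix counts.
# The counts below are NOT from the paper's test set; the baseline uses the
# training-set class sizes (92 stroke, 1306 no stroke).


def classification_metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)  # share of stroke patients correctly flagged
    specificity = tn / (tn + fp)  # share of stroke-free patients correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall success rate
    return sensitivity, specificity, accuracy


# Trivial baseline: predict "no stroke" for every patient.
sens, spec, acc = classification_metrics(tp=0, fn=92, tn=1306, fp=0)
print(sens, spec, round(acc, 3))  # 0.0 1.0 0.934
```

A model that never predicts stroke would still be right about 93.4% of the time on a set with the training data's proportions while catching none of the strokes, which is why the sensitivity and specificity figures matter as much as the headline number.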

Those numbers are great, but there's a caveat that's perhaps not immediately apparent: the success rate might be population specific. Both the training set and the independent test set were made up of African American patients, and other populations might differ subtly in the way particular mutations affect phenotype.

It also seems strange that environmental factors contribute so little to stroke risk in these patients. And the 108 SNPs chosen for the study presumably don't make up an exhaustive list of potential disease-modifying mutations, so the 98.2% success rate based on genotype alone isn't even an upper bound: there may well be SNPs that weren't considered for this study, for whatever reason, that would complete the picture even further.

It's nice to see research into modelling phenotypes like this produce good results. This kind of thing isn't really my area, so I don't know if anybody has done similar studies with Bayesian approaches that include environmental factors; maybe one day it'll be possible to create predictive tests and model disease outcomes for schizophrenia, or autism, or heart disease.

Still seems very far away, though.
