Was pootling around on Google Video and found this presentation about BioKnoppix (the bootable Linux CD with a whole bunch of bioinformatics tools on it) by Humberto Ortiz-Zuazaga and Carlos Rodríguez, who were the originators of the project.
I've never actually used BioKnoppix (have used plain old Knoppix for sysadmin stuff, though). Seems like a good idea, especially in conjunction with a 1GB flash drive where you can store all of your scripts and analyses.
On a different but related note, Google Video also picks up all of these live surgery webcasts, which I didn't expect. Is it for the educational value to other surgeons and medical students? To sell the equipment used..?
Eric Snowdeal recently posted an article about the use of .NET in bioinformatics, which is something that got me thinking. I used Visual Basic .NET a year or two ago to code up the front end to a simplified LIMS (every bioinformatician has to code at least one LIMS in their lifetime) and it was actually quite a pleasant experience. I chose Visual Basic, incidentally, because FOR...NEXT loops bring me back to halcyon days spent writing half-assed text adventures and trying to hack Granny's Garden to print out rude words.
(quick disclaimer: in theory .NET is platform independent and lots of different programming languages can use it. In reality when most people talk about .NET what they mean is Microsoft's implementation on Windows and one of their big three 'Visual' programming languages - Visual C++, Visual Basic and Visual C#. I'm no exception)
In case you haven't really been paying much attention to Windows programming, .NET is a strategy around which Microsoft has been busy recasting its APIs, programming languages and operating systems. Their web page about it has all the marketing crapola that you might expect but .NET basically boils down to: you write programs made up of components that can talk to one another, even over the network.
The .NET framework contains a lot of nice classes that handle XML and connecting to databases, and something like connecting to a web service and printing out a result is actually pretty simple. Visual Studio is the rapid application development (RAD) / code editor that comes with all of the Microsoft programming languages, and it makes designing graphical user interfaces quick and easy.
In the Sys-Con article that Eric links to (server was down when I last checked, Google cache version here though) the author - Christopher Frenz - anticipates a growing need for Windows desktop based bioinformatics applications and suggests that an open source library similar to Bioperl would help .NET programmers meet this need.
I can see an argument that for bioinformatics programs to be more widely used they have to be accessible, and "double click on setup.exe then run the program" is, I guess, a lot more accessible than installing ActiveState Perl and tackling the command line. It's also more intuitive and familiar for most users to press buttons and tick checkboxes than to write command line arguments.
But all that is window dressing (perhaps very useful window dressing, but window dressing nonetheless). I don't see any advantage in using .NET behind the scenes to do any actual analysis or data munging, while there are certainly obvious disadvantages: Perl, Python and Java are free, already have widespread support in the bioinformatics community and will run almost anywhere.
Why not keep the computation separate from the presentation and leverage .NET's other strength - its compartmentalized, web service friendly architecture - to access existing scripts written in a more bioinformatician friendly language using SOAP? As an alternative or supplement to web based interfaces .NET applications have something to offer. Accessing a remote web service is literally a one line operation (well, one line after a few clicks). Adding buttons and text boxes is a drag n' drop affair. User preferences can be stored in the registry rather than with a transient cookie.
Admittedly this relies on applications being suitable for web services. If there's a lot of computation involved maybe you don't want it all done on your server, especially if there's somebody at the other end of the connection blithely pressing the "Go" button again and again and wondering why nothing is happening...
There's a new issue of PLoS Computational Biology out. I didn't realize that it's now the ISCB's official journal, though I do vaguely remember filling in an ISCB survey form about the importance (or lack thereof) of having an open access official journal a while back.
Anyway, I'm glad. I always thought that it was strange that, despite being affiliated with the ISCB, OUP's Bioinformatics only gave you access to PDF versions of papers (aargh!) up until fairly recently. That's when you got access at all. Not much good for text mining or semantic markup or... well, anything much.
There's an article in there about how to get published, aimed mostly at students. The takeaway message seemed to be listen to your reviewers and be objective, which is fair enough.
A recent paper by Noble et al. (watch out, PDF) in Bioinformatics spurred me on to finally getting to grips with support vector machines (SVMs). Afterwards I figured that it might be nice if there was a brief tutorial specifically geared towards sequence-based SVMs on the web: so here we are. I'm going to go through the steps involved in replicating (sort of) the results from that paper; given a PC and the ability to install some basic software packages you should be able to follow easily.
Some background
DNA is packaged into chromatin rather than floating around free and easy, so most of the time much of the DNA sequence isn't accessible to regulatory processes in the cell. Some bits of the sequence, however, are associated with "openings" in the chromatin structure. You can detect them in vivo as DNaseI hypersensitive sites (HSs). These hypersensitive sites are very strongly associated with cis regulatory elements in that almost every cis regulatory element is also an HS.
Up until recently it was fairly difficult to detect HSs in vivo. More high-throughput methods have since been developed, but wouldn't it be even better if there was a way to locate all HSs (and thus cis-regulatory regions...) in silico?
The Noble et al. paper talks about how they trained an SVM to differentiate between 280 known hypersensitive sites and 737 controls, achieving a cross-validation accuracy of ~ 85%. This seems awfully high for a genomics machine learning problem. Then again, Bill Noble, the first author, wrote GIST, a well known SVM package, so he obviously knows what he's doing. Anyway, to prove that it's possible we're going to do a similar experiment.
Prerequisites
You'll need to download and install libsvm (the latest version) to follow this tutorial.
Libsvm comes as a .zip file which includes Windows binaries: on other platforms you'll need to compile it which is a bit outwith the scope of this tutorial (as is the Java version, which is also in the download package).
Optionally you'll need Python and Gnuplot, but I'll also provide the output of any scripts that use them along the way.
Getting Started
The first thing to do is to put our data into a format suitable for Libsvm. That format looks something like this:
Each line represents an example (in our case, a sequence). The first number is the class of the example, either positive (1) or negative (-1). Then there's some whitespace. Each pair of numbers separated by a colon (e.g. 1:0.42, 2:0.4 ...) after that represents a feature. Features are just numbers that describe the example.
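To make that layout a bit more concrete, here's a minimal Python sketch that turns a class label and a list of feature values into one line of libsvm-style text. The function name format_example is made up for this example (it isn't part of libsvm), and the feature values are arbitrary.

```python
def format_example(label, features):
    """Build one line of libsvm-style input: '<class> <index>:<value> ...'.

    format_example is a hypothetical helper, not part of libsvm.
    Feature indices start at 1 and go up in order.
    """
    pairs = ["%d:%g" % (index, value)
             for index, value in enumerate(features, start=1)]
    return " ".join([str(label)] + pairs)


# One positive and one negative example, three made-up features each.
print(format_example(1, [0.42, 0.4, 0.18]))
print(format_example(-1, [0.1, 0.55, 0.35]))
```

The first call prints a line starting with the class (1), followed by the index:value pairs 1:0.42, 2:0.4 and 3:0.18.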
Imagine that we were building an SVM that would classify genes as being either housekeeping genes (we'll make that the positive class) or not (the negative class). We would collect, say, 100 examples of housekeeping genes and 100 non-housekeeping genes. As features we could pick gene length, the number of exons, tissue specificity based on some high-throughput set and so on.
There's a nice introduction to how SVMs actually work available at the DTREG homepage, but for the impatient: to create an SVM the relevant software plots the training set of examples we give it as vectors (points) on a graph with a high number of dimensions. It then mathemagically works out the hyperplane (i.e. a plane in those dimensions) which best separates the positive class of examples from the negative. After that, to classify a new object with the same kind of features we just need to plot it on the graph and see which side of the hyperplane it's on. The support vectors are vectors which are on the "edge" of the space in-between the two classifications on the graph and which help define the hyperplane.
Before we can construct a training set, we need to decide what features we should use to describe our examples. The number of times that certain motifs appear in each sequence? What about sequences of different length?
Noble et al. use the spectrum kernel, which is a fancy name for a feature set representing the distribution of every possible k-mer in a sequence. For example, the spectrum kernel for K=2 would have 16 features (AA, AC, AT, AG, CA, CC, CT, CG, etc.). The value for each feature is the number of times that particular feature appears in the sequence divided by the number of times any feature of the same length appears in the sequence. The Noble et al. paper uses K=6, but we're going to go with K=3 to keep file sizes down and because it's quicker to train and optimize the SVM that way (extending things to K=6 is left as an exercise to you, the reader). Well, I say K=3, but actually we're going to use all 3mers, 2mers and 1mers as our features.
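The "which side of the hyperplane" step can be sketched in a few lines of Python for the simplest (linear) case. This is only an illustration: the weights and bias below are invented numbers, whereas a real SVM works them out from the training set, and libsvm's default kernel isn't linear, so don't read this as what libsvm does internally.

```python
def classify(weights, bias, features):
    """Return 1 or -1 depending on which side of the hyperplane a point is on.

    The hyperplane is the set of points x where dot(weights, x) + bias == 0.
    weights and bias are hypothetical values supplied by the caller.
    """
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


# A made-up hyperplane in two dimensions: the line x1 == x2.
weights, bias = [1.0, -1.0], 0.0
print(classify(weights, bias, [2.0, 1.0]))  # below the line, scores positive
print(classify(weights, bias, [1.0, 2.0]))  # above the line, scores negative
```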
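Here's a rough Python sketch of that feature calculation for 1mers, 2mers and 3mers. It isn't the Perl script used to prepare the files for this tutorial; the function name, the A/C/G/T feature ordering and the decision to skip windows containing other letters (such as N) are all assumptions made for the sake of the example.

```python
from itertools import product


def spectrum_features(sequence, max_k=3):
    """Return k-mer frequencies for k = 1..max_k, in a fixed order.

    Each value is the count of one k-mer divided by the total number of
    k-mers of that length counted in the sequence. Overlapping windows are
    counted, and windows containing anything other than A, C, G or T are
    skipped (an assumption of this sketch).
    """
    sequence = sequence.upper()
    features = []
    for k in range(1, max_k + 1):
        kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = 0
        for start in range(len(sequence) - k + 1):
            window = sequence[start:start + k]
            if window in counts:
                counts[window] += 1
                total += 1
        # Avoid dividing by zero for sequences shorter than k.
        features.extend(counts[kmer] / total if total else 0.0
                        for kmer in kmers)
    return features


features = spectrum_features("ACGTTGCAACGT")
print(len(features))  # 4 + 16 + 64 features
```

With max_k=3 you get 4 + 16 + 64 = 84 features per sequence, and the values for each k-mer length add up to 1 (as long as the sequence is at least that long).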
Download this file, which is the HS and non-HS examples from the Noble Lab homepage converted into libsvm format using the 3mer - 1mer spectrum kernel by a quick Perl script.
Then download this file which is 1000 HS clusters and 1000 non-HS examples from a high-throughput dataset, similarly converted. We'll use it for further testing.
Scaling Features
Preferably, the features of the training and testing sets of sequences that we supply the SVM will be scaled between -1 and 1. Furthermore, we should scale the test set in exactly the same way that we scaled the training set: so if 10 becomes 1 in the training set we need to scale 11 to 1.1 in the test set. Why? Because otherwise features with much larger variations than the others skew the results.
Luckily libsvm has a program that does all the scaling stuff for us. Copy the two files that you downloaded above into the same directory as the svmscale program and then type:
svmscale -s noble.range 3mer_noble.svm > noble.train
This will scale the contents of 3mer_noble.svm (our training file) and output the results in noble.train. It'll also save the max and min values for each feature in the noble.range file, which we'll now use to scale the testing data in the same way:
svmscale -r noble.range 3mer_ht.svm > noble.test
The -r flag tells svmscale to use the max / min range we saved from the training set. You should now have three new files in the directory - noble.range, noble.train and noble.test.
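If you're curious about what's going on when the saved range gets reused, the idea can be sketched in Python like this. It's a simplified stand-in rather than svmscale's actual code: the function names are made up, the data is invented, and mapping constant features to 0.0 is just a choice made for this sketch.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature column in the training rows."""
    columns = list(zip(*rows))
    return [(min(column), max(column)) for column in columns]


def scale_rows(rows, ranges, lower=-1.0, upper=1.0):
    """Map each feature so that the *training* min/max land on lower/upper."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (low, high) in zip(row, ranges):
            if high == low:
                # Constant feature: this sketch simply maps it to 0.0.
                new_row.append(0.0)
            else:
                fraction = (value - low) / (high - low)
                new_row.append(lower + (upper - lower) * fraction)
        scaled.append(new_row)
    return scaled


# Invented single-feature data: training values run from -10 to 10.
training = [[-10.0], [0.0], [10.0]]
testing = [[11.0]]

ranges = fit_ranges(training)          # plays the role of noble.range
print(scale_rows(training, ranges))    # 10 becomes 1
print(scale_rows(testing, ranges))     # 11 ends up a little above 1
```

The important bit is that fit_ranges only ever sees the training data; the test set is scaled with the ranges that come out of it, so test values are allowed to stray outside -1 to 1.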
Training the SVM
To train an SVM you use the svmtrain command. There are a couple of options which we may want to play around with to optimize performance later on, but for now let's see what happens with the defaults. Type in:
svmtrain -v 10 noble.train
The -v option tells svmtrain to perform 10-fold cross validation on the training set. What's 10-fold cross validation? It's a way of predicting how well a classifier might perform on unseen data. The training set is split up into 10 sections that have the same distribution of positive and negative examples. Then an SVM is trained on 9/10 of the sections and the section left out is tested with it. This is repeated so that each section is the one tested against the other nine and the average accuracy is reported.
This is important in machine learning because of the danger of overfitting to the training data - producing a classifier that is too specific to your training set. Ideally we want a classifier to be able to generalize what it has learned from the training set when it encounters novel data.
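For the curious, the cross validation procedure described above looks roughly like this in Python. It's a sketch of the general technique and not libsvm's implementation: the function names are invented, and train and predict are placeholders that you'd supply yourself (here a silly "always guess the majority class" classifier stands in for the SVM).

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Split example indices into n_folds groups with similar class balance."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the examples of each class out to the folds like cards.
        for index in indices:
            folds[position % n_folds].append(index)
            position += 1
    return folds


def cross_validate(examples, labels, train, predict, n_folds=10):
    """Return the fraction of held-out examples classified correctly.

    train(examples, labels) and predict(model, example) are caller-supplied.
    """
    correct = 0
    for held_out in stratified_folds(labels, n_folds):
        held = set(held_out)
        kept = [i for i in range(len(labels)) if i not in held]
        model = train([examples[i] for i in kept], [labels[i] for i in kept])
        correct += sum(1 for i in held_out
                       if predict(model, examples[i]) == labels[i])
    return correct / len(labels)


def train_majority(examples, labels):
    """Toy 'classifier': remember whichever class is more common."""
    return max(set(labels), key=labels.count)


def predict_majority(model, example):
    return model


labels = [1] * 20 + [-1] * 30
examples = [[0.0]] * len(labels)
print(cross_validate(examples, labels, train_majority, predict_majority))
```

Note that the score here is pooled over all of the held-out examples; with folds of equal size that's the same as averaging the accuracy of each fold.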
In this case the cross validation accuracy is reported as being 84.75%. Cool. But how is performance on our test set? To find out, first we need to train an SVM with the entire training set:
svmtrain noble.train noble.model
This saves the SVM trained from the training set into the noble.model file. You can then use the svmpredict program to classify new data.
svmpredict noble.test noble.model noble.test.predictions
The first argument to svmpredict is the name of the test file. The second is the SVM to use to classify the test file and the third is where to put the predicted classifications that result. As you'll see from running the above program, accuracy on our testing set is 73.65%. Considering that the high-throughput set is pretty noisy - not everything marked as a positive class is positive and not all the negatives are negative - that's pretty good going. But can it get better?
Optimizing
One of the nice things about libsvm is that it tries to be friendly to people who've never trained an SVM before. To that end there are two python scripts - grid.py and easy.py - in the "tools" directory which do some useful things.
grid.py is an automated way of searching for the best gamma and cost options to use with svmtrain. Setting the cost option might be particularly important when the training set is imbalanced; that is to say that you have more positive examples than negative, or vice versa.
If you've got Python and Gnuplot installed, copy noble.train to the tools directory and run grid.py now.
grid.py -gnuplot [path to your gnuplot binaries] noble.train
The script may take a while to run (it'd be a lot worse with 6mers, which is why we opted for K=3 when preparing the data above). Once it finishes, you can get the values of C (cost) and gamma from the final line that it outputs along with the cross validation accuracy rate that those values achieve on the training set you supplied:
32.0 0.0078125 85.8407
The first value is the suggested cost, the second the suggested gamma and the third the cross validation accuracy, which has increased to 85.84% with the new parameters (from 84.75%).
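The kind of search that grid.py performs can be sketched in Python as follows. This is not grid.py's code: the function name is invented, the exponent ranges are assumptions picked so that 32 (2 to the power 5) and 0.0078125 (2 to the power -7) fall on the grid, and score is a placeholder for "run cross validation with this cost and gamma and return the accuracy".

```python
import math


def grid_search(score,
                log2c_values=range(-5, 16, 2),
                log2g_values=range(3, -16, -2)):
    """Try every (cost, gamma) pair on a grid and return the best one.

    score(cost, gamma) is a caller-supplied function returning a number
    where bigger is better (e.g. cross validation accuracy). The default
    exponent ranges are assumptions made for this sketch.
    """
    best = None
    for log2c in log2c_values:
        for log2g in log2g_values:
            cost, gamma = 2.0 ** log2c, 2.0 ** log2g
            accuracy = score(cost, gamma)
            if best is None or accuracy > best[2]:
                best = (cost, gamma, accuracy)
    return best


def toy_score(cost, gamma):
    """Stand-in for cross validation: a made-up surface peaking at 32, 2**-7."""
    return 85.0 - abs(math.log2(cost) - 5) - abs(math.log2(gamma) + 7)


print(grid_search(toy_score))
```

Searching over powers of two rather than evenly spaced values makes sense because useful values of cost and gamma can differ by orders of magnitude.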
This isn't necessarily the whole story, though. Go back to the directory with the svmtrain and svmpredict binaries and give the new values a go:
svmtrain -c 32 -g 0.0078125 noble.train noble.model
svmpredict noble.test noble.model noble.test.predictions
Notice that accuracy on the test set has fallen to 72.1% (from 73.65%). Och well - you win some, you lose some. Perhaps we're overfitting to the training set, or maybe the test data is just noisy (a distinct possibility).
Conclusion & Further Exploration
So it looks like cross validation rates of ~ 85% are definitely possible on the Noble et al. HSs training data. This tutorial was a little rough but the best way to learn is by doing. It's not all that difficult to train an SVM, really: most of the work goes into preparing the data.
Libsvm has a lot of nifty features. Check out the -b option, which trains the model to output probability estimates instead of just positive / negative classes. It's a bad idea to rely on "accuracy" as your only performance measurement; unfortunately to get more detailed statistics you'll have to write your own scripts to interpret the predictions output by svmpredict (to determine precision and recall or the Kappa statistic, for example).
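As a starting point for that sort of script, here's a minimal Python sketch that works out precision, recall and the Kappa statistic from two lists of labels. It assumes that the predictions file holds one predicted class per line (i.e. svmpredict was run without -b) and that the true class is the first thing on each line of the test file; the function names are made up.

```python
def read_labels(path):
    """Read the first whitespace-separated token of each non-empty line."""
    labels = []
    with open(path) as handle:
        for line in handle:
            tokens = line.split()
            if tokens:
                labels.append(int(float(tokens[0])))
    return labels


def summarize(actual, predicted):
    """Return accuracy, precision, recall and Cohen's kappa for classes 1 / -1."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == 1 and truth == 1:
            tp += 1
        elif guess == 1:
            fp += 1
        elif truth == 1:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    observed = (tp + tn) / total
    # Agreement expected by chance, given how often each class turns up.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (total * total)
    return {
        "accuracy": observed,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "kappa": (observed - expected) / (1 - expected) if expected != 1 else 0.0,
    }


# Invented labels, just to show the call.
actual = [1, 1, 1, -1, -1, -1, -1, -1]
predicted = [1, 1, -1, 1, -1, -1, -1, -1]
print(summarize(actual, predicted))
```

For the files from this tutorial you'd call summarize(read_labels("noble.test"), read_labels("noble.test.predictions")).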
If you're looking for a good book that introduces other machine learning concepts, you could do a lot worse than Data Mining: Practical Machine Learning Tools & Techniques written by two of the authors of Weka.
Corrections and amendments welcome.
I dig SOAP (once it's been abstracted out for me by a helper library: maybe I should say that I dig SOAP::Lite). It's good to see that bioinformatics centres are providing web service interfaces to their data.
One thing, though - quite often I want to retrieve DNA sequences that aren't centered around genes (gene-centric genomics? how very last century). And that's strangely difficult with existing bioinformatics web services (it's easy enough with the Ensembl API or with wget and the right URL for UCSC, but not quick).
For some reason they're all based around retrieving sequences by identifier, a la RefSeq. Doesn't seem much good if you're not looking at a feature already defined somewhere. What I'd really like to see is a server which takes in SOAP requests (or, even simpler, uses REST). It'd take four parameters: and return the relevant base sequence. If you really wanted to be fancy you could put repeatmasked bases in lowercase or something. Simple, no?
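To show how little there'd be to it, here's a toy Python sketch of the lookup such a server might do. Everything in it is hypothetical: the four parameters used (assembly, chromosome, start and end) are guesses made for the sake of the example, the query string format is invented, and the "genome" is a tiny in-memory dictionary rather than a real sequence store.

```python
from urllib.parse import parse_qs


def fetch_region(genomes, assembly, chromosome, start, end):
    """Return bases start..end (1-based, inclusive) of one chromosome.

    genomes is a hypothetical in-memory dict: {assembly: {chromosome: seq}}.
    Lowercase (repeatmasked) bases in the stored sequence come back as-is.
    """
    sequence = genomes[assembly][chromosome]
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("region out of range")
    return sequence[start - 1:end]


def handle_query(genomes, query_string):
    """Answer a REST-style query such as 'assembly=toy&chromosome=chr1&...'."""
    params = parse_qs(query_string)
    return fetch_region(genomes,
                        params["assembly"][0],
                        params["chromosome"][0],
                        int(params["start"][0]),
                        int(params["end"][0]))


# A made-up twelve base "genome" with a repeatmasked stretch in the middle.
toy_genomes = {"toy": {"chr1": "ACGTacgtACGT"}}
print(handle_query(toy_genomes, "assembly=toy&chromosome=chr1&start=3&end=8"))
```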
Does such a server already exist? If not, why not?
Ray Kurzweil (entrepreneur and comp sci "celebrity") and Bill Joy (founder of Sun Microsystems) have written an op-ed piece in the New York Times in which they criticize the recent release of the sequence of the 1918 H1N1 influenza virus, which accompanied the paper announcing that the sequencing - or rather, the assembly of the sequence - was finished, in Science.
Bill n' Ray decry the decision to deposit the sequence in Genbank; they say that it's worse than putting up precise instructions on how to build an atomic bomb, since you don't need plutonium or enriched uranium to unleash biological terrors. Continuing on that theme, they also suggest that we need a new Manhattan Project dedicated to developing technologies to help counter any future bioterrorist threat.
The first thing that struck me about this is: why did the New York Times publish an op-ed about genomics written by two computer scientists? (yeah, Ray Kurzweil dabbles in systems biology now, but if you check out that Edge of Computation article from a while back you'll notice that he's more of a futurist commentator than a proper researcher).
The second thing was: isn't this a fairly standard kneejerk reaction? This information could plausibly be dangerous - let's ban it! It's an old story. If it weren't for the heightened awareness (well, fear) brought about by the new avian flu scares then they'd be dismissed as conservative crackpots. There's a tradeoff between potential risk and potential benefit here - a tradeoff that was carefully considered before the sequence was published. Realistically, it's more likely that sharing the opportunity to study the virus will help prevent an impending disaster than cause a new one.
Thirdly: how do they propose to control the knowledge represented by the H1N1 sequence? They suggest getting "suitable security assurances" but is that feasible and would it be enough of a deterrent? If a motivated terrorist really wanted to get hold of the flu sequence then surely it wouldn't be that difficult to obtain it through underhand means (lying, bribing, hacking, whatever). What's a "suitable security assurance", anyway? Would you have to be vetted by the FBI? If the lab is in Pakistan, would that pose a problem?
Finally: how does all the stuff about keeping back the influenza sequence fit with the "new Manhattan Project" idea? I'm guessing that they just mean a massive funding boost rather than the US embarking on a huge new bioweapons research program, but isn't it a rather unfortunate comparison? For it to be worth the investment, what would the return have to be? Some sort of targeted ultra vaccine for any possible threat? How do you protect against pathogens that don't even exist yet?
We know that avian flu will mutate into a form capable of spreading from human to human, we just don't know how yet. Any concentrated research that goes into examining what makes some flu strains particularly virulent - particularly if the relevant outbreaks have parallels to our current situation - gets the thumbs up from me. That research wouldn't be nearly so concentrated were the relevant information restricted to a chosen few.
There's an interesting post over at Neil's blog about Methods vs Discovery (I started off adding this as a comment there but decided it was too long and ranty):
One of my pet issues in bioinformatics is that of methodology versus discovery. In other words - is your new, cute and clever piece of software any use if biologists are not using it to discover interesting things?
Reading it reminded me of being frustrated in my efforts to find software relating to chromatin structure for a project (not the S/MARs thing, but related) a wee while back. In the last year there've been two potentially really useful tools "released" which could have helped speed my work along. When I say "released", though, I mean that papers describing them were published; neither piece of software was ever actually made available for download (as you'll be aware if you've written an application note recently, editors "strongly suggest" that the software is released under an open source licence or made publicly available, but it's just a suggestion).
This isn't too bad; I mean, sometimes code is messy or it's got some peculiar prerequisites or there's some other reason that you want to deal with potential downloaders in person. So I emailed the relevant authors to ask nicely if there was any possibility of obtaining the software, it'd be very useful, all due credit to be given etc.
I never got an answer from one group; the other responded that I'd have to wait until they'd published results from their own analyses with the software in question (presumably in case I had the cheek to try and use their software to discover something significant enough to be publishable before them).
Part of me understands the whole "I did the work, why should you reap the benefits?" idea but frankly it's a bit sneaky to publish a pointless software application note to pad out your CV with when the software is of no use to anybody (though not for lack of them trying). What's the point of a paper announcing that you've got this piece of software that, realistically, nobody else is ever going to use and you know it?
Isn't it short-sighted to keep a piece of software back so that you can get a few more papers out of it when the field is moving so quickly that your code will be obsolete within the next twelve months? Wouldn't it be better for everybody concerned - other scientists, who don't have to reinvent the wheel and implement their own tools, you, who will get cited more frequently and pick up more collaborations and the journal, who won't be publishing vanity pieces any longer - if we all shared?
There are sometimes other complications, of course. There's the possibility that the software in question is actually a bit crap; which will become obvious the moment anybody applies it to something other than the carefully selected test set in the paper (roll on transparent reviewing and reader comments). Perhaps you're using some sort of commercial or proprietary database.
Or perhaps the university technology transfer people got to you...
Biomedcentral are publishing a new open access journal called Biology Direct, which will "encompass all aspects of biology" but is focusing at first on genomics, bioinformatics and systems biology.
Interestingly they're embracing an open peer review model where authors pick their reviewers, who aren't anonymous and can choose to have their comments published alongside the paper. This is similar to David Kaplan's suggestions for reforming peer review, except that to be published in Biology Direct you'll have to pick reviewers from the journal's editorial board. Kaplan suggested that it wouldn't matter if you picked your cronies to review your paper for you because if the reviewers weren't anonymous the editors would notice and could reject the paper on that basis.
As a side note, the blurb says that everything will be on a tight timeframe: reviewers will have only 72 hours to agree to review a manuscript and three weeks after that to deliver their review... no more waiting months for that last reviewer to submit their comments?
If you've never visited Tangled Bank before, this week's issue at Living the Scientific Life has a good crop of science related blog entries for your entertainment and delight including essays on the protein folding problem, the potential new mouse model for Downs Syndrome and evil, scheming lower organisms. Check it out.
To quote the website, Tangled Bank is:
a version of the "Carnival of the Vanities" for science bloggers. A Carnival is a weekly showcase of good weblog writing, selected by the authors themselves (that's the vanity part). Every other week, one of our crew will highlight a collection of interesting weblog articles in one convenient place, making it easy for everyone to find the good stuff.
If you're particularly proud of some science-related writing on your blog then why not nominate it? The email address to contact is host@tangledbank.net - alternatively you can contact the blog hosting Tangled Bank the coming week.
Speaking of which, Tangled Bank is coming here on November 16th - so get writing!
Introduction
In the bath the other day I was wondering: if you downloaded all of the titles and abstracts of papers published in bioinformatics journals over the last eight years and then did a little bit of text mining to look for certain patterns, would anything interesting emerge? Could you track fads and fashions? Would you see hype-cycles?
I'll present the full results below, but if you're impatient the answers to the questions above are "sort of" "kind of" and "no", respectively. Otherwise, prepare yourself.... for a lot of dodgy graphs... for wild speculation (and that's just the abstracts, ha ha).... for the Bioinformatics Zeitgeist 2005!
Methodology
All of the abstracts from the last 2,920 days published in Bioinformatics or BMC Bioinformatics were retrieved from PubMed (at first I also included NAR, Genome Research and Genome Biology, but there was too much noise from non-bioinformatics papers). They were grouped by year of publication and counted.
Subject areas were chosen and exemplar abstracts for each were selected. These exemplars were scanned for key phrases which were then used to count the rough number of abstracts in each year dealing with a particular subject.
Note that when we talk about "Bioinformatics" it means the journal of that name; "bioinformatics" refers to the discipline.
Results
First thing: in terms of raw numbers, the amount of work published in the two journals has grown tremendously. This isn't really surprising when you think about it: journals have a harder time attracting authors when they're just starting out. Bioinformatics has grown from 127 papers per year in 1999 to more than 782 ppy in 2005. Similarly BMC Bioinformatics has gone from publishing 90 papers in 2003 to 299 so far this year. Without a more exhaustive look at other bioinformatics journals unfortunately we can't say how much of this growth is down to an explosion in bioinformatics research and how much is down to these two journals simply becoming better known.
Anyway, what does all that mean? Why, that we'll be dealing in percentages of all available abstracts from now on so that we can look at trends. If you just look at the raw numbers almost every conceivable topic will have experienced growth partly as a side effect of us only looking at these two journals.
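The counting itself boils down to something like the Python sketch below. It's an illustration of the approach rather than the script actually used for these graphs: the function name and the example abstracts are invented, and it uses plain case-insensitive substring matching, which will happily over-count short phrases that turn up inside longer words.

```python
def percent_by_year(abstracts, phrases):
    """Return {year: percentage of that year's abstracts mentioning a phrase}.

    abstracts is a list of (year, text) pairs. An abstract counts once if it
    contains any of the key phrases, ignoring case.
    """
    lowered = [phrase.lower() for phrase in phrases]
    totals, hits = {}, {}
    for year, text in abstracts:
        totals[year] = totals.get(year, 0) + 1
        text = text.lower()
        if any(phrase in text for phrase in lowered):
            hits[year] = hits.get(year, 0) + 1
    return {year: 100.0 * hits.get(year, 0) / totals[year] for year in totals}


# Invented abstracts, just to show the shape of the input and output.
abstracts = [
    (2001, "A novel database of yeast promoters."),
    (2001, "An open source tool for multiple alignment."),
    (2001, "Predicting protein structure with neural networks."),
    (2001, "Motif discovery in upstream regions."),
    (2005, "The software is released under the GPL."),
    (2005, "Analysis of microarray data."),
]
print(percent_by_year(abstracts, ["open source", "GPL"]))
```

Dividing by the number of abstracts in each year is what stops the overall growth of the two journals from swamping everything else.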
Some Trends
Open source is pretty hot right now. It was pretty hot everywhere else in IT a decade ago but you'd expect things to take a while to percolate here in the patent strewn biotech world. And look! The data sort of bears this out. This year ~ 4.5% of abstracts contained the words "open source" or "GPL", up from ~ 1.5% in 2001. Presumably bioinformatics software was being released under open source licences before 2001 but that wasn't considered important enough to be mentioned in the abstract (or perhaps authors thought that their audience didn't care).
Speaking of bioinformatics software, authors handily tend to stick to the same basic title structure when presenting new systems and databases. That structure looks something like this:
SUPERDOODAH [title]: A Novel Tool for ... etc. [description]
The percentage of abstracts whose title matches that pattern stays pretty steady - after a quick drop off in the early years - at a little under 15%. I'd postulate that the drop off represents the switch from bioinformatics being all about software to accomplish something specific (like multiple alignments) to more "pure" research (like modelling interaction networks) but who knows?
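One way to spot titles of that shape is with a regular expression, as in the Python sketch below. The pattern is a guess at a reasonable rule (a single word, then a colon, then a description) and not necessarily the one used to produce the numbers here, and the example titles are made up.

```python
import re

# A single "name" token, optional spaces, a colon, then some description.
# This pattern is an assumption made for the sake of the example.
TOOL_TITLE = re.compile(r"^\s*([A-Za-z][\w.+\-]*)\s*:\s+\S")


def looks_like_tool_title(title):
    """Return True if the title has the 'NAME: description' structure."""
    return TOOL_TITLE.match(title) is not None


titles = [
    "SUPERDOODAH: A Novel Tool for Something or Other",
    "A survey of multiple alignment methods",
    "Gene networks: a review of recent methods",
]
for title in titles:
    print(looks_like_tool_title(title), title)
```

It's a blunt instrument: a one word topic followed by a colon ("Microarrays: a review") would be counted as a tool too, so treat the percentages as rough.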
The "same" drop off is seen in the number of new databases although they sometimes also match the title structure outlined above (so perhaps there's no drop off in software after all, just in databases). Apparently 2001 is the turning point for databases - suddenly we start seeing growth again after a big drop. One semi-plausible sounding theory: locus or task specific databases stop appearing as the genome sequence is completed and the big public repositories and viewers become available, hence the drop: once there's enough data to work with new databases are started up to hold the research coming out of genome-wide studies.
Some Things Never Change
Take four pretty big areas in bioinformatics research: detecting regulatory regions, multiple alignments, predicting protein structure and motif discovery. What has happened to them over the years?
First of all, contrary to my belief it turns out that motif discovery isn't such a big part of bioinformatics research after all; either that or the key phrases that we're looking for aren't very good. Leaving that aside you'll notice that regulatory region detection and protein structure prediction are fairly stable - though perhaps protein structure prediction has tailed off a little (now that it's become much harder?).
Mention of multiple-alignments, though, has gone way down. One explanation might be that the golden days of sequence alignment happened when there was a push to put together the draft human genome (in the late nineties); now interest has flagged.
Newer Ideas
What about new-fangled systems like the Gene Ontology, microarray technology and large scale protein interaction networks? GO started off as a collaboration between Flybase, MGD (mouse) and SGD (yeast) in 1998: at that point 2% of papers mentioned it in their abstracts. Now we're up to 8%, which is actually lower than I'd expected.
There's a quite nice peak in protein interaction network and microarray analysis papers - hurray! Hurray in that the peak has already passed, I mean. I'll be happy to see more analyses of network topology once there's more data available, but until then... enough with the "is it scale-free or isn't it and what does that mean" question.
AI Smackdown
I have a soft spot for prediction and classification in bioinformatics, hence this (slightly more complicated) graph of machine learning techniques. Machine learning techniques and Markov Chains, anyway.
Mining the literature is getting more popular - just over 5% of papers this year mentioned it. "Text mining" in this case means everything from simple entity extraction to inferring protein interactions. As with GO I expected a higher percentage here: there are a lot of papers that don't deal specifically with text mining but that describe systems that leverage the data in PubMed abstracts one way or another.
Neural Networks have gotten less popular over time and Support Vector Machines have gotten more popular, which is perhaps what you'd expect. Apart from anything else SVMs are more popular in text mining nowadays and text mining abstracts are up, so there's a correlation there.
Discussion
Q. Would anything interesting emerge? A. Sort of. Depends on your point of view.
Q. Could you track fads and fashions? A. Kind of - neural networks are down, SVMs and GO are up, microarray analysis and protein interaction networks were popular a year or two ago and are now dipping.
Q. Would you see hype-cycles? A. No. Perhaps the editors reject papers on the basis that they've seen too many dealing with the same topic that month. Perhaps the time-lag between reading about a new technique, implementing it and writing something up about it is too long.
If you'd like the source data, feel free to email me.