Restricted Access & the HGMD
The drawback is that there's no easy way to get at the data. On the website, your only option is to search by gene; you then get a list of the mutations recorded for that gene. There's no advanced search and no way to bulk download the contents of the database (via a condoned channel, that is; shadily, there's always wget configured with a time delay).
The authors have obviously considered this. In their paper in NAR they explain that:
Since HGMD is partly dependent upon industrial funding and involves considerable editorial work over and above mere literature screening (e.g. to ensure the consistency of nucleotide sequence information, amino acid residue numbering and gene symbol usage), unsolved copyright problems have so far precluded HGMD from being downloadable in its entirety.
It disturbs me slightly that this sort of thing is an issue. I think that's because, unlike lab-based genetics, bioinformatics usually runs on free resources: programming languages (Java, Perl), libraries (Bio*, NCBI's API, SeqHound) and data (Ensembl, PubMed abstracts, GNF expression data...). Free is a tricky concept nowadays, of course, but I mean free in the sense that these resources are usually free to obtain and use in an academic environment.
Just to be clear, I'm not disparaging the work of the people involved in the HGMD, just the politics behind some of their policies. The fact remains that the HGMD is a good database. It has the potential to be even better, though.
Why not release the copyright on this kind of data, or let researchers use the relevant information after signing an agreement that binds them to your terms and conditions? Restricting access in this way (especially without explanation, unless you've read the relevant part of the paper) surely just annoys scientists. There's no corporate peer pressure anymore; even Celera has given up trying to hoard genomic data that goes out of date by the time you've worked out how to charge for it.
Let open access work for you. The mutations in HGMD are often culled from the literature, and relating them to reference sequences is remarkably difficult. An internal database identifier and "Asn351 to Asp" are only useful if people know which transcript is meant. Make it the first condition of using HGMD data that any derived analyses be made publicly available too. Presumably the first thing some people will do is take a Perl script and dbSNP and start mapping. Then start including the annotation that places like SNPs3D derive from HGMD.
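To make the transcript problem concrete, here is a minimal Python sketch (not anything HGMD itself provides). It parses a literature-style description such as "Asn351 to Asp" and reports which candidate protein isoforms actually carry the stated reference residue at that position. The isoform names and sequences are invented toys, and the parser only handles the exact "Xxx123 to Yyy" phrasing used above; real descriptions come in many more forms (HGVS p. notation, one-letter codes and so on).

```python
import re

# Standard three-letter to one-letter amino acid codes (20 canonical residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed format: three-letter residue, position, " to ", three-letter residue.
PATTERN = re.compile(r"^([A-Z][a-z]{2})(\d+)\s+to\s+([A-Z][a-z]{2})$")


def parse_change(text):
    """Parse 'Asn351 to Asp' into (ref, position, alt) using one-letter codes."""
    m = PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised change: {text!r}")
    ref, pos, alt = m.groups()
    return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


def consistent_isoforms(text, isoforms):
    """Return names of isoforms whose residue at the 1-based position matches ref."""
    ref, pos, _alt = parse_change(text)
    return [
        name for name, seq in isoforms.items()
        if 1 <= pos <= len(seq) and seq[pos - 1] == ref
    ]


# Hypothetical toy isoforms: isoform_b skips residues 2-11 (e.g. an alternative
# exon), so every downstream residue is renumbered 10 positions lower.
isoform_a = "M" + "A" * 349 + "N" + "G" * 20   # Asn at position 351
isoform_b = isoform_a[:1] + isoform_a[11:]     # same Asn now at position 341
isoforms = {"isoform_a": isoform_a, "isoform_b": isoform_b}

print(consistent_isoforms("Asn351 to Asp", isoforms))  # ['isoform_a']
print(consistent_isoforms("Asn341 to Asp", isoforms))  # ['isoform_b']
```

With the toy data, the same asparagine is residue 351 in one isoform and residue 341 in the other, so a bare "Asn351 to Asp" only pins down a site once you know whose numbering it follows.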
There's a note of hope in the next paragraph of the paper in NAR.
Once the closer cooperation with publically funded bioinformatics institutions currently envisaged has been put in place, unrestricted access to the database will become possible.
Publicly funded bioinformatics institutions? In the UK?
Back to OMIM and dbSNP it is.
