Flags and Lollipops

Tuesday, February 07, 2006

Biomedical PDFs

Alf @ Hublog did a quick survey of the PDFs available from a variety of different biomedical publishers. He looked at things like authentication methods, whether or not simple metadata was included in the PDF document properties and what the default filename for downloaded PDFs was. The results are quite interesting.

None of the publishers got everything right, even though none of the things Alf was looking for is particularly difficult to implement. BioMed Central came quite close, but then they're an internet-based publisher, so perhaps you'd expect that.
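To give a feel for how small these checks are, here is a rough sketch of one of them: working out the default filename a browser would offer for a downloaded PDF. This is not Alf's actual method, just an illustration using the Python standard library. The function name `default_pdf_filename` and the example.org URLs are made up for the example, and it assumes you already have the HTTP response headers as a plain dictionary.

```python
from email.message import Message
from urllib.parse import urlsplit, unquote


def default_pdf_filename(headers, url):
    """Guess the filename a browser would suggest for a download.

    `headers` is a dict of HTTP response headers (hypothetical input);
    `url` is the address the PDF was fetched from.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    disposition = lowered.get("content-disposition")

    if disposition:
        # Reuse the stdlib MIME parser to pull out the filename parameter.
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            return name

    # No usable header: fall back to the last segment of the URL path.
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return unquote(last_segment) or None


if __name__ == "__main__":
    print(default_pdf_filename(
        {"Content-Disposition": 'attachment; filename="smith2006.pdf"'},
        "http://example.org/get?id=1",
    ))
    print(default_pdf_filename({}, "http://example.org/content/pdf/article-7-1.pdf"))
```

A publisher that sends a sensible Content-Disposition header gives you a meaningful filename; without one you are left with whatever the end of the URL happens to be, which is often just an opaque ID.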

I have an inherent hatred of all things PDF, mainly because my PC at work has a strange problem with embedding Acrobat in Firefox (it hangs for up to a minute whenever I click on a link that leads to a PDF). I've always preferred to just capture the relevant fulltext HTML with the ScrapBook extension, set to capture the appropriate depth of links.

7 Comments:

At February 07, 2006 4:34 PM, Anonymous alf said...

It's interesting that you capture the fulltext HTML - my next survey will be the markup structure of the HTML pages, with the aim of making it easier to capture using scrapers. Do you do any parsing on the sections of the paper, or just save the file as a block?

 
At February 07, 2006 5:34 PM, Blogger Stew said...

Hi Alf. Nope, I don't do any parsing. Having the fulltext makes it easier to just grep for what you're looking for, though.

 
At February 14, 2006 5:28 PM, Anonymous Deepak said...

Strangely enough, I am in the "no PDF, will not read paper" camp. I agree with you, though, that the quality and standards are all over the map.

 
At March 26, 2006 11:38 PM, Blogger Jared Ryan Clemence said...

I agree with Deepak. PDFs are much more convenient for me both to read and to store for future reference. This makes my research process much simpler and more organized.

-Jared Clemence
Biomedical Engineer

 
