Flags and Lollipops

Friday, December 05, 2008

Strip HTML tags from a string, Ruby edition

Get Hpricot (gem install hpricot).


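# On Ruby 1.8, add require 'rubygems' first if the gem is not found.
require 'hpricot'
# Parse the markup into an Hpricot document.
page = Hpricot("<b>some marked up <i>text</i></b>")
# Print the document's text content with the tags removed.
puts page.to_plain_text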


Interestingly, the Hpricot FAQ says:


Q: How do I strip all HTML tags from a page?
A: Use regex replace!
A2: The regex is ok, but will break in some cases, even with valid html. Try the to_plain_text or inner_text methods instead.
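To see the kind of breakage A2 is talking about, here is a minimal sketch using only core Ruby. The helper name strip_tags_naive and the sample strings are my own assumptions for illustration; they are not taken from Hpricot or its FAQ. A ">" inside a quoted attribute value is valid HTML, but a simple tag-matching regex treats it as the end of the tag.

```ruby
# Hypothetical helper (not part of Hpricot): the naive "regex replace" approach.
# It deletes everything from a "<" up to the next ">".
def strip_tags_naive(html)
  html.gsub(/<[^>]*>/, '')
end

simple = '<b>some marked up <i>text</i></b>'
tricky = '<a title="1 > 0">link</a>' # valid HTML: ">" inside a quoted attribute

puts strip_tags_naive(simple) # prints: some marked up text
puts strip_tags_naive(tricky) # prints:  0">link  (attribute debris left behind)
```

The regex ends the opening tag at the first ">", so the rest of the attribute leaks into the output. A parser-based method such as to_plain_text or inner_text works from the parsed document rather than from the raw characters, which is presumably why the FAQ recommends them.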

4 Comments:

At December 05, 2008 11:23 AM, Anonymous Michael Barton said...

You're using Ruby/Rails at Nature?

 
At December 05, 2008 11:25 AM, Blogger Stew said...

Yeah, Precedings, Network, Nature.com Blogs and a bunch of other social sites use Rails.

The journal platform and other 'business critical' stuff is Java, though.

 
At February 19, 2009 10:55 AM, Anonymous Rohit said...

Bioinformatics is a very nice subject, but in India there is not much opportunity. If you have completed your MSc or PhD in bioinformatics or life science then you can join this field, but the main thing is that you should have a good reference.

 
At April 06, 2009 4:11 AM, Blogger Crystal P. said...

great post thanks!!

 
