Obtain Beautiful Soup.
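# Beautiful Soup 3 import; with Beautiful Soup 4 use: from bs4 import BeautifulSoup
from BeautifulSoup import BeautifulSoup
# Collect every text node in the document and join them, dropping all tags
''.join(BeautifulSoup(page).findAll(text=True))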
where 'page' is your string of text and HTML.
I'm not a Pythonista, so there might be a nicer way of doing it (Beautiful Soup is a lot of overhead). You might want to expand on this a bit to make sure spacing is handled OK, to keep certain tags, and so on. Feel free to post corrections or better suggestions in the comments.
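If the overhead is a concern, one lighter option might be the standard library's html.parser module. What follows is only a rough sketch, not a drop-in replacement for Beautiful Soup: the names TextExtractor and strip_tags, the keep parameter, and the SKIP and BLOCK tag lists are all made up for illustration. It assumes you want script and style contents dropped, a space wherever a block-level tag sat, and the option to pass a few tags through (without their attributes).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes from an HTML string."""

    SKIP = {"script", "style"}  # contents of these tags are discarded
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3"}  # replaced by a space

    def __init__(self, keep=()):
        super().__init__(convert_charrefs=True)
        self.keep = set(keep)  # tag names to pass through, e.g. {"b"}
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.keep:
            self.parts.append("<%s>" % tag)  # attributes are dropped
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.keep:
            self.parts.append("</%s>" % tag)
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags(page, keep=()):
    parser = TextExtractor(keep)
    parser.feed(page)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace so the inserted spaces do not pile up
    return " ".join("".join(parser.parts).split())


print(strip_tags("<p>Hello <b>world</b></p><p>Bye &amp; thanks</p>"))
print(strip_tags("<p>Hello <b>world</b></p>", keep={"b"}))
```

Collapsing whitespace at the end is a design choice that suits plain prose; it would mangle preformatted text, so adjust it if your pages rely on pre blocks.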
Just don't use one line <(?:.*?)> regular expressions. No, really.
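To make the warning concrete, here is a small demonstration of where that pattern goes wrong. The helper name naive_strip and the sample inputs are made up for illustration; these are just a few of the ways it can fail, not a complete list.

```python
import re

NAIVE = re.compile(r"<(?:.*?)>")


def naive_strip(page):
    """Hypothetical helper: the one-line regex approach, for comparison only."""
    return NAIVE.sub("", page)


# A '>' inside an attribute value ends the match too early,
# so part of the tag leaks into the output.
print(repr(naive_strip('<a title="a > b">link</a>')))

# '.' does not match a newline by default, so a tag split
# across two lines is left untouched.
print(repr(naive_strip('<a\nhref="x">link</a>')))

# The script tags are removed but the script source is kept as "text".
print(repr(naive_strip('<script>var x = 1;</script>text')))
```

A real parser handles all three cases because it tracks quoting and element boundaries instead of scanning for the next angle bracket.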