news-please - an integrated web crawler and information extractor for news that just works
Extract the main article content (and optionally comments) from a web page
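
To illustrate what "extracting the main article content" can mean in practice, here is a minimal sketch that uses only the Python standard library. It is not how news-please or any other tool listed here works; it assumes a simple heuristic (the container holding the most paragraph text is the article body), and the names `ParagraphCollector` and `extract_main_text` are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of <p> elements, grouped by their enclosing container."""

    # Tags treated as candidate article containers (an assumption of this sketch).
    CONTAINERS = {"article", "main", "section", "div"}

    def __init__(self):
        super().__init__()
        self.stack = []      # ids of the containers that are currently open
        self.next_id = 0     # id to assign to the next container seen
        self.in_p = False    # True while inside a <p> element
        self.buffer = []     # text chunks of the current paragraph
        self.blocks = {}     # container id -> list of paragraph strings

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self.stack.append(self.next_id)
            self.next_id += 1
        elif tag == "p":
            self.in_p = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.CONTAINERS and self.stack:
            self.stack.pop()
        elif tag == "p" and self.in_p:
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self.buffer).split())
            if text:
                key = self.stack[-1] if self.stack else -1
                self.blocks.setdefault(key, []).append(text)
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)


def extract_main_text(html):
    """Return the paragraphs of the container with the most paragraph text."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""
    best = max(parser.blocks.values(), key=lambda ps: sum(len(p) for p in ps))
    return "\n\n".join(best)


page = (
    "<html><body>"
    "<div id='nav'><p>Home</p></div>"
    "<article><p>First paragraph of the story.</p>"
    "<p>Second   paragraph.</p></article>"
    "<div class='footer'><p>Copyright</p></div>"
    "</body></html>"
)
print(extract_main_text(page))
```

Real pages are far messier than this example (nested layout containers, ads, comment threads, malformed markup), which is why dedicated extractors exist. The sketch only shows the basic idea of separating article text from navigation and footer text.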