Module for automatic summarization of text documents and HTML pages.
Reworked https://www.readability.com/ parsing library (now https://mercury.postlight.com/ is living alternative)
A module and command-line utility to extract structured data from HTML