Interface for easier topic modelling.
Define models to represent a textual document, e.g. a PDF, preserving the hierarchy of the content.
Wrapper for the PageXML C++ library to ease handling of Page XML files within Python.
Python implementation of bag-of-concepts.
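
As a rough illustration of the bag-of-concepts idea, the sketch below counts how often each concept occurs in a tokenized document. It is not taken from the project itself: the names `WORD_TO_CONCEPT`, `NUM_CONCEPTS`, and `bag_of_concepts` are hypothetical, and the word-to-concept mapping is hard-coded here, whereas the technique as commonly described derives it by clustering word embeddings. Any weighting of the counts is omitted.

```python
from collections import Counter

# Hypothetical word-to-concept assignment. In practice this would come from
# clustering word embeddings (for example with k-means); that step is omitted.
WORD_TO_CONCEPT = {
    "cat": 0, "dog": 0, "horse": 0,   # concept 0: animals
    "car": 1, "truck": 1, "bus": 1,   # concept 1: vehicles
}
NUM_CONCEPTS = 2


def bag_of_concepts(tokens, word_to_concept=WORD_TO_CONCEPT, num_concepts=NUM_CONCEPTS):
    """Return a list of raw concept counts for one tokenized document."""
    # Words without a concept assignment are ignored.
    counts = Counter(word_to_concept[t] for t in tokens if t in word_to_concept)
    return [counts.get(c, 0) for c in range(num_concepts)]


doc = "the cat chased the dog past a parked truck".split()
vector = bag_of_concepts(doc)
print(vector)
```

Each document becomes a fixed-length vector with one entry per concept, so documents that use different words from the same cluster still end up with similar representations.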