Converts an entire subdirectory of PDF documents, whether it holds a few files or a large volume, into a dataset (pandas DataFrame), with error tracking and a configurable choice of features
Ports pdf2image to a command-line (CLI) version
Generates images and thumbnails based on bitmap transformations of rendered prose