79 dependents
| Package description | Downloads/month |
|---|---|
| Convert PDF to structured text using MistralOCR | 11K |
| OpenAICompletion processor | 9K |
| Convert PDF to structured text using MinerU | 9K |
| Sherpa sentence chunking processor | 9K |
| Annotator based on Facebook's Acronyms | 8K |
| OpenAIVision converter | 8K |
| Sherpa Consolidation processor | 7K |
| Convert HTML to text using inscriptis | 7K |
| Groupe RF XML formatter | 6K |
| Convert PDF to structured text using Grobid | 6K |
| Annotator based on Facebook Duckling | 6K |
| Convert PPTX to text using python-pptx | 6K |
| NewsML converter (AFP news) | 6K |
| Tabular formatter for Sherpa | 6K |
| Rule-based segmenter | 5K |
| Annotator based on Spacy NER | 5K |
| Sherpa AFP Quality formatter | 5K |
| AFPEntities annotations coming from different annotators | 5K |
| Convert DOCX to Markdown using [mammoth](https://github.com/mwilliamson/python-m... | 5K |
| Segmenter based on BlingFire | 5K |
| Fetch and convert Pubmed articles | 5K |
| syntok segmenter | 5K |
| Rule-based segmenter | 5K |
| Create segments from annotations | 5K |
| DeepL processor plugin for pymultirole | 5K |
| Sherpa Consolidation processor | 5K |
| Annotator based on Presidio pattern recognizer | 4K |
| Cairn.info XML converter | 4K |
| Processor based on Presidio anonymizer | 4K |
| WhisperX converter for audio transcription with speaker diarization support. | 4K |
| OpenAIAudio converter | 4K |
| Convert XLSX to 1-segment per row document | 4K |
| Create segments from annotations based on Renseignor document structure | 4K |
| SpacyMatcher annotator using the spacy rule-matching engine | 4K |
| Sherpa reconciliation processor | 4K |
| Convert PDF to structured text using PaddleOCR | 4K |
| Sherpa transform annotations to categories processor | 4K |
| Replace document text with capitalized annotations | 4K |
| Markdown splitter segmenter | 4K |
| Sherpa IPTC category mapper | 4K |
| Processor based on Nameparser | 4K |
| Processor based on AFP keywords extraction | 4K |
| RFConsolidate annotations coming from different annotators | 4K |
| Sherpa transform annotations to categories processor | 4K |
| Annotator based on entity-fishing | 3K |
| Json formatter for Sherpa | 3K |
| Annotator based on Huggingface transformers zero-shot classification pipeline | 2K |
| Processor based on Huggingface transformers zero-shot classification pipeline | 2K |
| Formatter based on Huggingface transformers summarization pipeline | 2K |
| Annotator based on Trankit NER | 2K |