79 dependents
| Description | Downloads/month |
|---|---|
| Convert PDF to structured text using MistralOCR | 11K |
| OpenAICompletion processor | 9K |
| Convert PDF to structured text using MinerU | 9K |
| Sherpa sentence chunking processor | 9K |
| Annotator based on Facebook's Acronyms | 8K |
| OpenAIVision converter | 8K |
| Sherpa Consolidation processor | 7K |
| Convert HTML to text using inscriptis | 7K |
| Groupe RF XML formatter | 6K |
| Convert PDF to structured text using Grobid | 6K |
| Annotator based on Facebook Duckling | 6K |
| Convert PPTX to text using python-pptx | 6K |
| NewsML converter (AFP news) | 6K |
| Tabular formatter for Sherpa | 6K |
| Rule-based segmenter | 5K |
| Annotator based on Spacy NER | 5K |
| Sherpa AFP Quality formatter | 5K |
| AFPEntities annotations coming from different annotators | 5K |
| Convert DOCX to Markdown using [mammoth](https://github.com/mwilliamson/python-m... | 5K |
| Segmenter based on BlingFire | 5K |
| Fetch and convert Pubmed articles | 5K |
| syntok segmenter | 5K |
| Rule-based segmenter | 5K |
| Create segments from annotations | 5K |
| DeepL processor plugin for pymultirole | 5K |
| Sherpa Consolidation processor | 5K |
| Annotator based on Presidio pattern recognizer | 4K |
| Cairn.info XML converter | 4K |
| Processor based on Presidio anonymizer | 4K |
| WhisperX converter for audio transcription with speaker diarization support. | 4K |
| OpenAIAudio converter | 4K |
| Convert XLSX to 1-segment per row document | 4K |
| Create segments from annotations based on Renseignor document structure | 4K |
| SpacyMatcher annotator using the spacy rule-matching engine | 4K |
| Sherpa reconciliation processor | 4K |
| Convert PDF to structured text using PaddleOCR | 4K |
| Sherpa transform annotations to categories processor | 4K |
| Replace document text with capitalized annotations | 4K |
| Markdown splitter segmenter | 4K |
| Sherpa IPTC category mapper | 4K |
| Processor based on Nameparser | 4K |
| Processor based on AFP keywords extraction | 4K |
| RFConsolidate annotations coming from different annotators | 4K |
| Sherpa transform annotations to categories processor | 4K |
| Annotator based on entity-fishing | 3K |
| Json formatter for Sherpa | 3K |
| Annotator based on Huggingface transformers zero-shot classification pipeline | 2K |
| Processor based on Huggingface transformers zero-shot classification pipeline | 2K |
| Formatter based on Huggingface transformers summarization pipeline | 2K |
| Annotator based on Trankit NER | 2K |