46 dependents
| Description | Downloads/month | |
|---|---|---|
| Toolkit for linearizing PDFs for LLM datasets/training | 41K | |
| The robust European language model benchmark. | 13K | |
| A multilingual phonemizer combining lexica, NLP, and probabilistic scoring for i... | 8K | |
| Fast profanity word, curse word, swear word, bad word filtering tool for English... | 7K | |
| Convert scientific posters (PDF/images) to structured JSON metadata using Large ... | 6K | |
| Comprehensive LLM evaluation at scale: A production-ready framework for evaluati... | 6K | |
| The robust European language model benchmark. | 5K | |
| Captions read and write | 4K | |
| Play ChatGPT and other LLM with Xiaomi AI Speaker | 4K | |
| Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language... | 2K | |
| | 2K | |
| GBD XML ETL package | 2K | |
| | 2K | |
| Transcribe and translate voice into LRC file using Whisper and LLMs (GPT, Claude... | 2K | |
| A collection of all our phonemizers for dataset construction and inference | 1K | |
| Impactu utils for kahi plugins | 1K | |
| Package for creating synthetic datasets while preserving privacy. | 1K | |
| A python script to iterate over a list of PDF in a directory and try to guess th... | 1K | |
| OpusFilter - Parallel corpus processing toolkit | 1K | |
| A simple language detection library for short texts. | 1K | |
| The InvenioRDM GitHub Archiver (IGA) automatically archives GitHub releases in a... | 1K | |
| CoLRev: An open-source environment for collaborative reviews | 997 | |
| Data anonymization package, supporting different anonymization strategies | 957 | |
| | 842 | |
| Extension to bpm-ai for local AI inference | 815 | |
| | 754 | |
| LangEvals lingua evaluator for language detection. | 700 | |
| Visual Element-based Saliency Toolkit for multimodal webpage saliency extraction... | 673 | |
| Emacs Annotation and Language Learning tool. | 577 | |
| Detect temporal expressions in Slack messages ("tomorrow at 5 pm") and translate... | 536 | |
| Repository for Multilingual Generation, RAG evaluations, and surrogate judge tr... | 447 | |
| A package for translating text and detecting languages | 436 | |
| telegram bot with various functions | 348 | |
| ... | 299 | |
| Semantic subtitle aligner and merger for bilingual subtitle syncing. | 296 | |
| Convert text to IPA | 238 | |
| Python package that provides tokenization of multilingual texts using language-s... | 214 | |
| Convert Android & iOS strings files to any supported file type and vice versa. | 190 | |
| A keyword analyser which uses YAKE and Lingua | 188 | |
| Browser-integrated LinkedIn companion offering intelligent job filtering alongsi... | 163 | |
| Sentiment analysis pipeline for texts in multiple languages. | 122 | |
| CLI to turn Markdown notes into SEO briefs, drafts, metadata, and translations u... | 118 | |
| A simple language detection library for short texts. | 116 | |
| A utility for normalizing persian, arabic and english texts | 79 | |
| Convert Android & iOS string files to any supported file type and vice versa. | 11 | |
| Second level google search | 3 |