46 dependents
Package Description Downloads/month
Toolkit for linearizing PDFs for LLM datasets/training 41K
The robust European language model benchmark. 13K
A multilingual phonemizer combining lexica, NLP, and probabilistic scoring for i... 8K
Fast profanity word, curse word, swear word, bad word filtering tool for English... 7K
Convert scientific posters (PDF/images) to structured JSON metadata using Large ... 6K
Comprehensive LLM evaluation at scale: A production-ready framework for evaluati... 6K
The robust European language model benchmark. 5K
Captions read and write 4K
Play ChatGPT and other LLMs with the Xiaomi AI Speaker 4K
Text preprocessing and PII anonymisation for NLP/ML. ONNX NER ensemble, language... 2K
2K
GBD XML ETL package 2K
2K
Transcribe and translate voice into LRC file using Whisper and LLMs (GPT, Claude... 2K
A collection of all our phonemizers for dataset construction and inference 1K
Impactu utils for kahi plugins 1K
Package for creating synthetic datasets while preserving privacy. 1K
A python script to iterate over a list of PDF in a directory and try to guess th... 1K
OpusFilter - Parallel corpus processing toolkit 1K
A simple language detection library for short texts. 1K
caltechlibrary iga: The InvenioRDM GitHub Archiver (IGA) automatically archives GitHub releases in a... 1K
CoLRev: An open-source environment for collaborative reviews 997
Data anonymization package, supporting different anonymization strategies 957
842
Extension to bpm-ai for local AI inference 815
754
LangEvals lingua evaluator for language detection. 700
Visual Element-based Saliency Toolkit for multimodal webpage saliency extraction... 673
Emacs Annotation and Language Learning tool. 577
Detect temporal expressions in Slack messages ("tomorrow at 5 pm") and translate... 536
Repository for Multilingual Generation, RAG evaluations, and surrogate judge tr... 447
A package for translating text and detecting languages 436
Telegram bot with various functions 348
... 299
Semantic subtitle aligner and merger for bilingual subtitle syncing. 296
Convert text to IPA 238
Python package that provides tokenization of multilingual texts using language-s... 214
Convert Android & iOS strings files to any supported file type and vice versa. 190
A keyword analyser which uses YAKE and Lingua 188
Browser-integrated LinkedIn companion offering intelligent job filtering alongsi... 163
Sentiment analysis pipeline for texts in multiple languages. 122
CLI to turn Markdown notes into SEO briefs, drafts, metadata, and translations u... 118
A simple language detection library for short texts. 116
A utility for normalizing Persian, Arabic and English texts 79
Convert Android & iOS string files to any supported file type and vice versa. 11
Second level Google search 3