75 dependents
| Package | Description | Downloads/month |
|---|---|---|
| | String-to-String Algorithms for Natural Language Processing | 30K |
| | An Agentic Framework for Reflective PowerPoint Generation | 7K |
| | | 7K |
| | Modern Data-Centric AI System for Large Language Models | 3K |
| | ActiveTigger in Python | 2K |
| | Average plugin classifications for language detection | 2K |
| | Search Providers package for Mediacloud | 2K |
| | Ingest sources with proper citation: PDF, URL, media, Office, DJVU | 1K |
| | mmf: a modular framework for vision and language multimodal research | 1K |
| | Aamraz, written "ئامراز" in Kurdish script, means "instrument". This proj... | 1K |
| | A simple tool to make the video, audio, subtitle and video-url (especially youtu... | 1K |
| | Repository for CARTE: Context-Aware Representation of Table Entries | 929 |
| | NLU model inference service | 899 |
| | X-Voice | 571 |
| | AutoML library for solving only text -> label tasks | 548 |
| | Axiomatic constraints for information retrieval and retrieval-augmented generati... | 534 |
| | TozaText is a cleaning library for preprocessing raw Uzbek and multilingual text... | 518 |
| | Stylistic Device Detection Tool | 488 |
| | Pandas extension with NLP functionalities | 455 |
| | | 437 |
| | A package for translating text and detecting languages | 417 |
| | Position-aware, cross-lingually aligned word embeddings built on FastText | 414 |
| | A collection of Orange3 widgets to perform natural language processing | 394 |
| | | 387 |
| | SISTER (SImple SenTence EmbeddeR) | 380 |
| | Extracts citations from PDF, URLs and local media files in CSL-JSON | 369 |
| | Quantize fastText and test its performance | 363 |
| | Template for AI chatbots & document management using Retrieval-Augmented Generat... | 356 |
| | Repository for TARTE: Transformer Augmented Representation of Table Entries | 352 |
| | One Line To Build Any Classifier Without Data | 323 |
| | FastText_Shop is a short-text classification tool based on FastText and Jieba word segmentation. It is efficient and easy to use, and supports both Chinese and English corpora. Basic usage and inspiration come from TextGroce... | 322 |
| | | 305 |
| | Python implementation of Discriminative Lexicon Model / Linear Discriminative Le... | 278 |
| | A Python package for analyzing multilingual text | 272 |
| | Utilities for cleaning up a text corpus | 255 |
| | dualtext alignment making use of a remote API for embedding | 253 |
| | Code for WECHSEL: Effective initialization of subword embeddings for cross-lingu... | 235 |
| | Official Python implementation of "FOCUS: Effective Embedding Initialization for ... | 231 |
| | LMOps Tool for Korean | 223 |
| | Look up and/or predict the gender of a given first name | 213 |
| | | 211 |
| | Awesome document classification: implementation of major techniques | 202 |
| | This project is built on top of whatthelang and langid | 194 |
| | Ingestion (web/PDF/DOCX/TXT), cleaning, paragraph-level LID (PT/EN/ES), and spaC... | 193 |
| | Scalable Data Preprocessing Tool for Training Large Language Models | 185 |
| | A library for detecting verbatim-duplicated contents within a vast amount of doc... | 179 |
| | A small and fast language identification model powered by fastText | 178 |
| | | 172 |
| | Browser-integrated LinkedIn companion offering intelligent job filtering alongsi... | 161 |
| | SyGra: Graph-oriented synthetic data generation pipeline | 158 |