75 dependents
Package Description Downloads/month
String-to-String Algorithms for Natural Language Processing 30K
An Agentic Framework for Reflective PowerPoint Generation 7K
7K
Modern Data Centric AI system for Large Language Models 3K
ActiveTigger in Python 2K
average plugin classifications for language detection 2K
Search Providers package for Mediacloud 2K
Ingest sources with proper citation — PDF, URL, media, Office, DJVU 1K
mmf: a modular framework for vision and language multimodal research. 1K
Aamraz, which is written "ئامراز" in Kurdish script, means "instrument". This proj... 1K
A simple tool to make the video, audio, subtitle and video-url (especially youtu... 1K
Repository for CARTE: Context-Aware Representation of Table Entries 929
NLU model inference service 899
X-Voice 571
AutoML library for solving only text -> label tasks 548
Axiomatic constraints for information retrieval and retrieval-augmented generati... 534
TozaText is a cleaning library for preprocessing raw Uzbek and multilingual text... 518
Stylistic Device Detection Tool 488
Pandas extension with NLP functionalities 455
437
A package for translating text and detecting languages 417
Position-aware, cross-lingually aligned word embeddings built on FastText 414
A collection of Orange3 widgets to perform natural language processing 394
387
SISTER (SImple SenTence EmbeddeR) 380
Extracts citations from PDF, URLs and local media files in CSL-JSON. 369
Quantize fasttext and test its performance 363
Template for AI chatbots & document management using Retrieval-Augmented Generat... 356
Repository for TARTE: Transformer Augmented Representation of Table Entries 352
One Line To Build Any Classifier Without Data 323
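FastText_Shop is a short-text classification tool based on FastText and Jieba word segmentation. It is efficient and easy to use, and supports both Chinese and English corpora. Basic usage and inspiration come from TextGroce... 322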
305
Python implementation of Discriminative Lexicon Model / Linear Discriminative Le... 278
A Python package for analyzing multilingual text. 272
Utilities for cleaning up text corpus 255
dualtext alignment making use of a remote API for embedding 253
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingu... 235
Official Python implementation of "FOCUS: Effective Embedding Initialization for ... 231
LMOps Tool for Korean 223
Look up and/or predict the gender of a given first name. 213
211
Awesome document classification - Implementation of major techniques 202
This project is built on top of whatthelang and langid 194
Ingestion (web/PDF/DOCX/TXT), cleaning, paragraph-level LID (PT/EN/ES), and spaC... 193
Scalable Data Preprocessing Tool for Training Large Language Models 185
A library for detecting verbatim-duplicated contents within a vast amount of doc... 179
A small and fast language identification model powered by fastText 178
172
Browser-integrated LinkedIn companion offering intelligent job filtering alongsi... 161
SyGra - Graph-oriented Synthetic data generation Pipeline 158