Dependents of fasttext-wheel

49 dependents

Package	Description	Downloads/month
dolma	Data and tools for generating and inspecting OLMo pre-training data.	44K
spacy-fastlang	Language detection using Spacy and Fasttext	43K
opennmt-py	Open Source Neural Machine Translation and (Large) Language Models in PyTorch	23K
fasttext-langdetect-wheel	80x faster and 95% accurate language identification with Fasttext	3K
stringsifter	A machine learning tool that ranks strings based on their relevance for malware ...	3K
open-dataflow	Modern Data Centric AI system for Large Language Models	3K
s2and	S2AND	2K
text-tagging-model	Here we collected some online and offline models for text tagging.	1K
fastspell	Targeted language identifier, based on FastText and Hunspell.	1K
pelican-nlp	Preprocessing and Extraction of Linguistic Information for Computational Analysi...	1K
open-dataflow-adp	Easy Data Preparation with latest LLMs-based Operators and Pipelines.	1K
bicleaner-hardrules	Pre-filtering step for bicleaner	1K
python-po-lint	Lint .po translation files for contamination, wrong languages, shifts, and garbl...	1K
short-language-detection	A simple language detection library for short texts.	899
ads-bib	Pipeline for querying and turning NASA's ADS publications metadata into curated,...	890
llmkira	⚡️ Build Your Own chatgpt Bot|🧀 Discord/Slack/Kook/Telegram |⛓ ToolCall|🔖 Plugin...	847
eole	Open language modeling toolkit based on PyTorch	752
pamola-core	Pamola Core library for data anonymization, privacy models, metrics, and utiliti...	737
luga	Blazing fast language detection using fastText model	693
dataverse	An open-source simplifies ETL workflow with Python based on Spark	682
sapientml-preprocess	A SapientML plugin of preprocess CodeBlockGenerator	643
fastlid	Detect language of a given text, fast	577
saujana-nlp	Saujana NLP for World Embedding	564
text-machina	Text Machina: Seamless Generation of Machine-Generated Text Datasets	513
mosaic-model	Here I collected some online and offline models for text tagging.	444
saujana	Saujana NLP for World Embedding	435
turkic-translit	Deterministic Latin and IPA transliteration for Kazakh, Kyrgyz, Uzbek, Turkish, ...	422
fasttext-reducer	A tiny package (and standalone script) for downloading any pretrained fasttext w...	402
openla-feature-representation	A Python module that adds features to OpenLA data to make it easier to use for M...	398
robust-lid	Robust Language Identification using an ensemble of 5-7 LID backends	354
turkic-transliterate	Deterministic Latin and IPA transliteration for Kazakh, Kyrgyz, plus tokenizer/g...	249
data-analysis-tools-fdx	A data processor package	237
textlangid	Detects the language of text	226
data-modori	LMOps Tool for Korean	223
dreamml	Framework for creating, running and validation of ML models on tabular data	213
text-quality	Detect quality of (digitized) text.	207
yvestest	An open-source simplifies ETL workflow with Python based on Spark	193
intelli3text	Ingestion (web/PDF/DOCX/TXT), cleaning, paragraph-level LID (PT/EN/ES), and spaC...	193
data-analysis-similarity-tool-jms	A data processor package	173
simre	Requirements Similarity tool for Software Product Lines	163
sygra	SyGra - Graph-oriented Synthetic data generation Pipeline	158
studcamp-yandex-hse	Text-tagging project within Yandex x HSE StudCamp event	155
izihawa-langdetect		107
gunertensors	Advanced AI Optimization Toolkit	87
ggd-py-utils	A collection of utility functions for my projects.	79
llm-kira	chatbot client for llm	77
langcure	Detects and fixes AI word hallucinations in multilingual text	69
dingo-client	Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool	49
llm-web-kit	LLM Web Kit for processing web content	16