Dependents of sacrebleu

141 dependents

Package	Description	Downloads/month
lm-eval	A framework for few-shot evaluation of language models.	1.4M
unbabel-comet	A Neural Framework for MT Evaluation	272K
tf-models-nightly	Models and examples built with TensorFlow	145K
tf-models-official	Models and examples built with TensorFlow	68K
tensorflow-model-analysis	Model analysis tools for TensorFlow	64K
evalscope	A streamlined and customizable framework for efficient large model (LLM, VLM, AI...	45K
string2string	String-to-String Algorithms for Natural Language Processing	30K
t5	Code for the paper "Exploring the Limits of Transfer Learning with a Unified Tex...	28K
fairseq2	FAIR Sequence Modeling Toolkit 2	23K
opennmt-py	Open Source Neural Machine Translation and (Large) Language Models in PyTorch	23K
lighteval	Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backend...	22K
lit-nlp	The Learning Interpretability Tool: Interactively analyze ML models to understan...	21K
supervertaler	Open-source, AI-enhanced CAT tool with multi-LLM support, translation memory, gl...	18K
lmms-eval	One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio T...	16K
nvidia-lm-eval	A framework for evaluating language models - packaged by NVIDIA	14K
scandeval	The robust European language model benchmark.	13K
subset2evaluate	Find informative examples to efficiently (human)-evaluate NLG models.	12K
py-openjudge	OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards	11K
ms-opencompass	OpenCompass is an LLM evaluation platform, supporting a wide range of models (LL...	10K
autorag	AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evalu...	9K
orcheo	A full-stack, agentic workflow programming platform. Built for vibe-coding and w...	7K
eval-framework	Comprehensive LLM evaluation at scale: A production-ready framework for evaluati...	6K
llm-eval-toolkit	LLM Evaluation Framework	6K
flexeval		6K
opencompass	OpenCompass is an LLM evaluation platform, supporting a wide range of models (Ll...	5K
euroeval	The robust European language model benchmark.	5K
indictranstoolkit	A simple, consistent, and extendable module for IndicTrans2 compatible with HF m...	5K
multimetriceval	Multi-metric evaluation toolkit supporting MT, ASR, TTS, SimulST, VC, and Parali...	5K
dynamiq	Dynamiq is an orchestration framework for agentic AI and LLM applications	5K
flexrag	Flexible RAG (Retrieval-Augmented Generation) framework for building AI applicat...	4K
paddlespeech	Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Stream...	3K
dashai	DashAI: an interactive platform for training, evaluating and deploying AI models	3K
evals	Evals is a framework for evaluating LLMs and LLM systems, and an open-source reg...	3K
holodeck-ai	HoloDeck - Experimentation-driven agent experimentation and deployment	3K
ellora	Automated Hyperparameter Optimization Platform for Efficient LLM Fine-Tuning	3K
auto-lora	ADyFT(Auto Dynamic Fine Tuning) automates parameter-efficient fine-tuning of Lar...	3K
openstbench	OpenSTBench is an evaluation toolkit centered on translation and speech translat...	2K
simuleval	SimulEval: A Flexible Toolkit for Automated Machine Translation Evaluation	2K
lm-polygraph	Uncertainty Estimation Toolkit for Transformer Language Models	2K
evalution	Modern LLM model evaluation for Transformers, SGLang, vLLM, TensorRT-LLM, llama....	2K
translate-package	Contain functions and classes to efficiently train a sequence to sequence to tra...	2K
flagai	FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensibl...	1K
autorag-research	Automate your RAG research.	1K
pharia-studio-sdk		1K
vec2text	convert embedding vectors back to text	1K
whisper-eval-serbian	An evaluation framework for Serbian Whisper models.	1K
omnisteval	Evaluate simultaneous speech/text translation systems (shortform and longform) w...	1K
janus-llm	A transcoding library using LLMs.	1K
comprehensive-rag-evaluation-metrics	This library provides a comprehensive suite of metrics to evaluate the performan...	1K
sotabencheval	Easily benchmark Machine Learning models on selected tasks and datasets	1K