82 dependents
| Description | Downloads/month |
|---|---|
| String-to-String Algorithms for Natural Language Processing | 30K |
| TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, ... | 14K |
| The robust European language model benchmark. | 13K |
| UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ... | 6K |
| Document QA (RAG) package using langchain and chromadb | 6K |
| LLM Evaluation Framework | 6K |
| The robust European language model benchmark. | 5K |
| Evaluation and adaption method for the UNICORN Challenge | 2K |
| Monocle is a framework for tracing GenAI app code. This repo contains implementa... | 2K |
| A RAG framework for evaluation | 2K |
| Uncertainty Estimation Toolkit for Transformer Language Models | 2K |
| 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exp... | 2K |
| Evaluation framework for CoreAI. | 1K |
| Automate your RAG research. | 1K |
| convert embedding vectors back to text | 1K |
| RadEval: A framework for radiology text evaluation | 1K |
| metric with langchain | 1K |
| This library provides a comprehensive suite of metrics to evaluate the performan... | 1K |
| ENCOURAGE | 1K |
| A summarization pipeline using T5 with evaluation metrics and utilities. | 1K |
| A collection of functions for use in the statistics and machine learning pipelin... | 984 |
| A Python Commonsense Knowledge Inference Toolkit | 881 |
| Hallucination detection package | 856 |
| Amazon Foundation Model Evaluations | 838 |
| A library for distilling models from prompts. | 662 |
| A tool for evaluating LLMs | 652 |
| Versatile Python library designed for evaluating the performance of large langua... | 595 |
| Production RL environment for training LLMs on hallucination avoidance — 1M+ exa... | 576 |
| A factuality evaluation metric for evaluating plain language summaries using que... | 468 |
| A comprehensive evaluation toolkit for RAG and LLMs. | 467 |
| Tools and Techniques for Consistency Benchmarking | 447 |
| Sorbonne University Master MIND - Large Language Models course plugin | 445 |
| A Plug-and-Play and Comprehensive evaluation framework specifically for long-con... | 431 |
| A library for minimum Bayes risk (MBR) decoding. | 424 |
| Moonshot - A simple and modular tool to evaluate and red-team any LLM applicatio... | 400 |
| Dependency bundle for testing AI Agents | 385 |
| Quick Command Line Tools for Model Deployment | 377 |
| RAG Evaluator is a Python library for evaluating Retrieval-Augmented Generation ... | 328 |
| Benchmark of Generative Large Language Models in Danish | 326 |
| YiVal is an open-source project designed to revolutionize the way developers and... | 321 |
| Automatic Prompt Optimization Framework | 314 |
| Multi-Agent Large Language Models for Collaborative Task-Solving. | 291 |
| Metrics for radiology eval | 277 |
| Open source framework for evaluating AI Agents | 261 |
| Lightweight, modular library for evaluating LLM outputs with built-in metrics, c... | 255 |
| Compositional framework for building and orchestrating Compound AI Systems and N... | 250 |
| Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Eval... | 237 |
| FactSumm: Factual Consistency Scorer for Abstractive Summarization | 211 |
| LangRAGEval is a library for evaluating responses based on faithfulness, context... | 208 |
| XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanc... | 204 |