82 dependents
| Description | Downloads/month |
| --- | --- |
| String-to-String Algorithms for Natural Language Processing | 30K |
| TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, ... | 14K |
| The robust European language model benchmark. | 13K |
| UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ... | 6K |
| document QA (RAG) package using langchain and chromadb | 6K |
| LLM Evaluation Framework | 6K |
| The robust European language model benchmark. | 5K |
| Evaluation and adaptation method for the UNICORN Challenge | 2K |
| Monocle is a framework for tracing GenAI app code. This repo contains implementa... | 2K |
| A RAG framework for evaluation | 2K |
| Uncertainty Estimation Toolkit for Transformer Language Models | 2K |
| 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exp... | 2K |
| Evaluation framework for CoreAI. | 1K |
| Automate your RAG research. | 1K |
| Convert embedding vectors back to text | 1K |
| RadEval: A framework for radiology text evaluation | 1K |
| Metric with langchain | 1K |
| This library provides a comprehensive suite of metrics to evaluate the performan... | 1K |
| ENCOURAGE | 1K |
| A summarization pipeline using T5 with evaluation metrics and utilities. | 1K |
| A collection of functions for use in the statistics and machine learning pipelin... | 984 |
| A Python Commonsense Knowledge Inference Toolkit | 881 |
| Hallucination detection package | 856 |
| Amazon Foundation Model Evaluations | 838 |
| A library for distilling models from prompts. | 662 |
| A tool for evaluating LLMs | 652 |
| Versatile Python library designed for evaluating the performance of large langua... | 595 |
| Production RL environment for training LLMs on hallucination avoidance — 1M+ exa... | 576 |
| A factuality evaluation metric for evaluating plain language summaries using que... | 468 |
| A comprehensive evaluation toolkit for RAG and LLMs. | 467 |
| Tools and Techniques for Consistency Benchmarking | 447 |
| Sorbonne University Master MIND - Large Language Models course plugin | 445 |
| A Plug-and-Play and Comprehensive evaluation framework specifically for long-con... | 431 |
| A library for minimum Bayes risk (MBR) decoding. | 424 |
| Moonshot - A simple and modular tool to evaluate and red-team any LLM applicatio... | 400 |
| Dependency bundle for testing AI Agents | 385 |
| Quick Command Line Tools for Model Deployment | 377 |
| RAG Evaluator is a Python library for evaluating Retrieval-Augmented Generation ... | 328 |
| Benchmark of Generative Large Language Models in Danish | 326 |
| YiVal is an open-source project designed to revolutionize the way developers and... | 321 |
| Automatic Prompt Optimization Framework | 314 |
| Multi-Agent Large Language Models for Collaborative Task-Solving. | 291 |
| Metrics for radiology eval | 277 |
| Open source framework for evaluating AI Agents | 261 |
| Lightweight, modular library for evaluating LLM outputs with built-in metrics, c... | 255 |
| Compositional framework for building and orchestrating Compound AI Systems and N... | 250 |
| Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Eval... | 237 |
| FactSumm: Factual Consistency Scorer for Abstractive Summarization | 211 |
| LangRAGEval is a library for evaluating responses based on faithfulness, context... | 208 |
| XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanc... | 204 |