82 dependents
| Description | Downloads/month |
|---|---|
| String-to-String Algorithms for Natural Language Processing | 30K |
| TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, ... | 14K |
| The robust European language model benchmark. | 13K |
| UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ... | 6K |
| Document QA (RAG) package using langchain and chromadb | 6K |
| LLM Evaluation Framework | 6K |
| The robust European language model benchmark. | 5K |
| Evaluation and adaption method for the UNICORN Challenge | 2K |
| Monocle is a framework for tracing GenAI app code. This repo contains implementa... | 2K |
| A RAG framework for evaluation | 2K |
| Uncertainty Estimation Toolkit for Transformer Language Models | 2K |
| 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exp... | 2K |
| Evaluation framework for CoreAI. | 1K |
| Automate your RAG research. | 1K |
| convert embedding vectors back to text | 1K |
| RadEval: A framework for radiology text evaluation | 1K |
| metric with langchain | 1K |
| This library provides a comprehensive suite of metrics to evaluate the performan... | 1K |
| ENCOURAGE | 1K |
| A summarization pipeline using T5 with evaluation metrics and utilities. | 1K |
| A collection of functions for use in the statistics and machine learning pipelin... | 984 |
| A Python Commonsense Knowledge Inference Toolkit | 881 |
| Hallucination detection package | 856 |
| Amazon Foundation Model Evaluations | 838 |
| A library for distilling models from prompts. | 662 |
| A tool for evaluating LLMs | 652 |
| Versatile Python library designed for evaluating the performance of large langua... | 595 |
| Production RL environment for training LLMs on hallucination avoidance — 1M+ exa... | 576 |
| A factuality evaluation metric for evaluating plain language summaries using que... | 468 |
| A comprehensive evaluation toolkit for RAG and LLMs. | 467 |
| Tools and Techniques for Consistency Benchmarking | 447 |
| Sorbonne University Master MIND - Large Language Models course plugin | 445 |
| A Plug-and-Play and Comprehensive evaluation framework specifically for long-con... | 431 |
| A library for minimum Bayes risk (MBR) decoding. | 424 |
| Moonshot - A simple and modular tool to evaluate and red-team any LLM applicatio... | 400 |
| Dependency bundle for testing AI Agents | 385 |
| Quick Command Line Tools for Model Deployment | 377 |
| RAG Evaluator is a Python library for evaluating Retrieval-Augmented Generation ... | 328 |
| Benchmark of Generative Large Language Models in Danish | 326 |
| YiVal is an open-source project designed to revolutionize the way developers and... | 321 |
| Automatic Prompt Optimization Framework | 314 |
| Multi-Agent Large Language Models for Collaborative Task-Solving. | 291 |
| Metrics for radiology eval | 277 |
| Open source framework for evaluating AI Agents | 261 |
| Lightweight, modular library for evaluating LLM outputs with built-in metrics, c... | 255 |
| Compositional framework for building and orchestrating Compound AI Systems and N... | 250 |
| Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Eval... | 237 |
| FactSumm: Factual Consistency Scorer for Abstractive Summarization | 211 |
| LangRAGEval is a library for evaluating responses based on faithfulness, context... | 208 |
| XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanc... | 204 |