47 dependents
| Description | Downloads/month |
|---|---|
| DeepTeam is a framework to red team LLMs and LLM systems. | 56K |
| RESTAI, so many 'A's and 'I's, so little time... | 17K |
| A multi-backend evaluation framework for LLM, RAG, and agentic systems. | 4K |
| IaNL: Infrastructure as Natural Language — AI agent that provisions AWS resource... | 4K |
| Framework for prototyping of LLM-based applications | 4K |
| PNNL Auto Multi-Agent AI: Dynamic multi-agent system for building applications | 3K |
| Benchmark framework for AI coding assistants | 3K |
| HoloDeck - Experimentation-driven agent experimentation and deployment | 3K |
| The testing platform for AI teams. Bring engineers, PMs, and domain experts toge... | 2K |
| A powerful and flexible framework for building, orchestrating, and deploying mul... | 2K |
| Open source observability and auto-evals for AI agents | 1K |
| Evaluation framework for CoreAI. | 1K |
| OpenTelemetry GenAI Utils | 1K |
| Service to examine data processing pipelines (e.g., machine learning or deep lea... | 1K |
| Additional packages (components, document stores and the likes) to extend the ca... | 902 |
| A comprehensive evaluation framework for AI systems | 803 |
| RAG Evaluation Framework using Ragas metrics and MLflow tracking | 698 |
| llama-index callbacks deepeval integration | 572 |
| DeepEval integration adapter for Metrics Computation Engine | 440 |
| Building DB, asking natural language questions through agents, and evaluate | 413 |
| Project GenEval: A Unified Evaluation Framework for Generative AI Applications | 376 |
| | 374 |
| Probing Stylistic Appropriation in Large Language Models (PSALM): An LLM-as-a-Ju... | 361 |
| Automatic Prompt Optimization Framework | 314 |
| A Python framework designed for both generating and evaluating hints. | 301 |
| free google results | 255 |
| This library is to search the best parameters across different steps of the RAG ... | 221 |
| Pre-deploy CI/CD QA dashboard for evaluating LLM and AI-agent outputs | 213 |
| XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanc... | 204 |
| Toolkit for Active Learning in Generative Tasks | 203 |
| A comprehensive framework for evaluating Large Language Models with built-in sup... | 148 |
| MetaBeeAI LLM Pipeline for PDF processing and data extraction | 132 |
| HuGME is an easy-to-use LLM assessment framework, which explicitly rates the Hun... | 129 |
| | 125 |
| A powerful web content fetcher and processor | 117 |
| Python framework for building AI agents with tool integration, multi-agent workf... | 111 |
| A MCP server project | 103 |
| Uniform access layer for LLMs | 102 |
| Similarity evaluator using Deepevalve | 99 |
| RAGMeter is a universal evaluation toolkit designed to assess the performance of... | 93 |
| Add your description here | 85 |
| Validation framework with feedback loop | 80 |
| CLI tool to evaluate LLM factuality on MMLU benchmark. | 67 |
| My first RAGAAS package and test | 62 |
| A package for evaluating text files. | 5 |
| Nordlys is an AI lab building a Mixture of Models. This repository contains the ... | 4 |
| Python framework for building AI agents with tool integration, multi-agent workf... | 1 |