47 dependents
Package Description Downloads/month
DeepTeam is a framework to red team LLMs and LLM systems. 56K
RESTAI, so many 'A's and 'I's, so little time... 17K
A multi-backend evaluation framework for LLM, RAG, and agentic systems. 4K
IaNL: Infrastructure as Natural Language — AI agent that provisions AWS resource... 4K
Framework for prototyping of LLM-based applications 4K
PNNL Auto Multi-Agent AI: Dynamic multi-agent system for building applications 3K
Benchmark framework for AI coding assistants 3K
HoloDeck - Experimentation-driven agent experimentation and deployment 3K
The testing platform for AI teams. Bring engineers, PMs, and domain experts toge... 2K
A powerful and flexible framework for building, orchestrating, and deploying mul... 2K
Open source observability and auto-evals for AI agents 1K
Evaluation framework for CoreAI. 1K
OpenTelemetry GenAI Utils 1K
Service to examine data processing pipelines (e.g., machine learning or deep lea... 1K
Additional packages (components, document stores and the likes) to extend the ca... 902
A comprehensive evaluation framework for AI systems 803
RAG Evaluation Framework using Ragas metrics and MLflow tracking 698
llama-index callbacks deepeval integration 572
DeepEval integration adapter for Metrics Computation Engine 440
Building DB, asking natural language questions through agents, and evaluate 413
Project GenEval: A Unified Evaluation Framework for Generative AI Applications 376
(no description) 374
Probing Stylistic Appropriation in Large Language Models (PSALM): An LLM-as-a-Ju... 361
Automatic Prompt Optimization Framework 314
A Python framework designed for both generating and evaluating hints. 301
free google results 255
This library is to search the best parameters across different steps of the RAG ... 221
Pre-deploy CI/CD QA dashboard for evaluating LLM and AI-agent outputs 213
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanc... 204
Toolkit for Active Learning in Generative Tasks 203
A comprehensive framework for evaluating Large Language Models with built-in sup... 148
MetaBeeAI LLM Pipeline for PDF processing and data extraction 132
HuGME is an easy-to-use LLM assessment framework, which explicitly rates the Hun... 129
(no description) 125
A powerful web content fetcher and processor 117
Python framework for building AI agents with tool integration, multi-agent workf... 111
A MCP server project 103
Uniform access layer for LLMs 102
Similarity evaluator using DeepEval 99
RAGMeter is a universal evaluation toolkit designed to assess the performance of... 93
Add your description here 85
Validation framework with feedback loop 80
CLI tool to evaluate LLM factuality on MMLU benchmark. 67
My first RAGAAS package and test 62
A package for evaluating text files. 5
Nordlys is an AI lab building a Mixture of Models. This repository contains the ... 4
Python framework for building AI agents with tool integration, multi-agent workf... 1