LLM Evaluation Toolkit Python Packages

qa-metrics

An easy-to-use Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: exact match, F1 score, PEDANT semantic match, transformer match, and prompting-based evaluation with black-box and open-source large language models. The package also supports prompting through the OpenAI and Anthropic APIs.

11K 61 6

parea-ai

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

10K 82 11

langtest

Pacific AI provides a library for delivering safe & effective NLP models.

3K 556 49

nlptest

Deliver safe & effective language models

2K 556 49

scalexi

scalexi is a versatile open-source Python library, optimized for Python 3.11+, that focuses on facilitating low-code development and fine-tuning of diverse Large Language Models (LLMs).

2K 13 2

evalsense

Tools for systematic large language model evaluations

724 4 1
