Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Multinear platform
GenderBench - Evaluation suite for gender biases in LLMs
Needle in a haystack for LLMs