CYaRon: Yet Another Random Olympic-iNformatics test data generator
Production-grade Python framework for evaluating LLM and agentic systems with traditional scorers, LLM judges (OpenAI, Anthropic, Ollama, 100+ models via LiteLLM), ensemble aggregation, and smart caching for cost-effective testing.
Given source code, a Makefile (or build commands), input files, and answer files, judges the program locally.
BOJ-Offline-Judge is an API for using Baekjoon Online Judge from the CLI or from Python scripts.
Judge and evaluate your chunk quality with Judie, the Owl! Quick, easy, and effective!
🤔 Wondering if your chunks are good? 🦉 Judie is here to judge and evaluate your chunks! ✨