Universal evaluation layer for OpenEnv agentic RL environments. Measures what an agent learned, not just how much reward it accumulated.