Reward Shaping Python Packages | PyPI Stats

host-pytorch

Implementation of Humanoid Standing Up, from the paper "Learning Humanoid Standing-up Control across Diverse Postures" out of Shanghai, in PyTorch

3K 45 5

verdict

Inference-time scaling for LLMs-as-a-judge.

2K 339 25

opfgym

Reinforcement Learning environments for learning the Optimal Power Flow

281 29 3

shaner

A package for shaping RL agents.

116 3 1