DPO Python Packages | PyPI Stats

soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

11K 53 7

afterimage

Generate conversational, tool-calling, structured-output, and preference datasets — easily and at scale

3K 36 1

mlx-lm-lora

Train LLMs on Apple silicon with MLX and the Hugging Face Hub

3K 335 42

oumi

Easily fine-tune, evaluate, and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open-source LLM / VLM!

2K 9K 760

openpo

Build high-quality synthetic datasets with AI feedback from 200+ LLMs

1K 27 0

oat-llm

🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

788 652 64

sillm-mlx

Running and training LLMs on Apple Silicon via MLX

334 286 26

oxrl

A lightweight post-training framework for LLMs and VLMs

306 16 1

knowlyr-sandbox

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

252 3 0

knowlyr-hub

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

246 3 0

knowlyr-core

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

244 3 0

knowlyr-recorder

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

240 3 0

mlora-cli

The CLI tools for the mLoRA system.

228 376 66

knowlyr-reward

Gymnasium-style RL framework for LLM agent training — MDP environments, three-layer process reward & SFT/DPO/GRPO policy optimization. CLI + MCP ready.

222 3 0