Llama Python Packages

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

298.9M 27K 6K

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

9.2M 79K 16K

strands-agents

A model-driven approach to building AI agents in just a few lines of code.

5.6M 6K 816

torchao

PyTorch native quantization and sparsity for training and inference

3.5M 3K 502

strands-agents-tools

A set of tools that gives agents powerful capabilities.

3.1M 1K 293

unsloth

Web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

1.9M 64K 6K

unsloth-zoo

Web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

1.4M 64K 6K

aider-chat

aider is AI pair programming in your terminal

887K 44K 4K

liger-kernel

Efficient Triton Kernels for LLM Training

793K 6K 526

curated-transformers

🤖 A PyTorch library of curated Transformer models and their composable components

529K 895 35

gptcache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.

480K 8K 580

sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

269K 27K 6K

sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

257K 27K 6K

ms-swift

Apply PEFT or full-parameter training (CPT, SFT, DPO, GRPO) to 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

174K 14K 1K

strands-agents-builder

An example agent demonstrating streaming, tool use, and interactivity from your terminal. This agent builder can help you to build your own agents and tools.

148K 407 86

vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

144K 79K 16K

tensorzero

TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.

78K 11K 821

transformerlab

The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters.

63K 5K 510

liger-kernel-nightly

Efficient Triton Kernels for LLM Training

63K 6K 526

xinference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source LLMs, speech models, and multimodal models on cloud, on-prem, or your laptop, all through one unified, production-ready inference API.

45K 9K 824