PyPI Stats

Sglang Python Packages

Python packages with the GitHub topic sglang. Sorted by relevance, with stars and monthly downloads.
lightseekorg
smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

868K 206 62
lightseekorg
smg-grpc-servicer

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

593K 206 62
kvcache-ai
mooncake-transfer-engine

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

348K 5K 720
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

73K 1K 125
ModelCloud
gptqmodel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

39K 1K 185
intel
auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 125
intel
auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

9K 1K 125
kvcache-ai
mooncake-transfer-engine-cuda13

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

8K 5K 720
vroomfondel
dgxarley

Ansible playbooks for a 3-node K3s cluster with NVIDIA DGX Spark nodes for distributed LLM inference

5K 1 0
theoddden
terradev-cli

Cross-Cloud Compute Optimization Platform with Migration & Evaluation - v4.0.12

3K 10 1
lightseekorg
smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

3K 206 62
horizon-rl
strands-sglang

SGLang model provider for Strands Agents for on-policy agentic RL training.

3K 52 8
kvcache-ai
mooncake-transfer-engine-non-cuda

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

3K 5K 720
coconut-labs
infergrid

Tenant-fair LLM inference orchestration on a single GPU. No Kubernetes.

2K 1 1
intel
auto-round-hpu

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

2K 1K 125
swarmone
agentic-coding-bench

Open-source benchmark for LLM inference on agentic coding workloads

2K 0 0
FlyTOmeLight
llm-cal

LLM inference hardware calculator — architecture-aware (MLA/NSA/MoE), engine-aware (vLLM/SGLang), honest-labeled. Reads real safetensors bytes; supports 53 GPUs (NVIDIA / AMD / Huawei Ascend / 沐曦 / 昆仑芯 / 壁仞 / 寒武纪 / 海光).

2K 1 0
coconut-labs
kvwarden

Tenant-fair LLM inference orchestration on a single GPU. No Kubernetes.

1K 2 1
ovg-project
kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

1K 902 107
intel
auto-round-kernel

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

714 1K 125
recursia-lab
anchor-vision

Python client for Anchor — PaliGemma2 multi-LoRA vision inference

631 0 0
fahmiaziz98
docvision

Production-ready document parsing with Vision Language Models

575 1 0
HuiResearch
flashtts

Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.

294 601 76
theoddden
terradev-mcp

Complete Agentic GPU Infrastructure for Claude Code

120 10 2
Data from PyPI, GitHub, ClickHouse, and BigQuery