PyPI Stats
  • Insights
  • PyPI
  • GitHub
  • Search
  • Compare
  • Advisories
  • Ecosystem
  • About
Home

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
lightseekorg
smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

829K 206 62
lightseekorg
smg-grpc-servicer

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

560K 206 62
kvcache-ai
mooncake-transfer-engine

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

338K 5K 720
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

71K 1K 125
ModelCloud
gptqmodel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

38K 1K 185
intel
auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 125
intel
auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

9K 1K 125
kvcache-ai
mooncake-transfer-engine-cuda13

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

7K 5K 720
vroomfondel
dgxarley

Ansible playbooks for a 3-node K3s cluster with NVIDIA DGX Spark nodes for distributed LLM inference

7K 1 0
theoddden
terradev-cli

Cross-Cloud Compute Optimization Platform with Migration & Evaluation - v4.0.12

3K 10 1
horizon-rl
strands-sglang

SGLang model provider for Strands Agents for on-policy agentic RL training.

3K 52 8
lightseekorg
smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

3K 206 62
kvcache-ai
mooncake-transfer-engine-non-cuda

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

3K 5K 720
coconut-labs
infergrid

Tenant-fair LLM inference orchestration on a single GPU. No Kubernetes.

2K 1 1
intel
auto-round-hpu

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

2K 1K 125
swarmone
agentic-coding-bench

Open-source benchmark for LLM inference on agentic coding workloads

2K 0 0
FlyTOmeLight
llm-cal

LLM inference hardware calculator — architecture-aware (MLA/NSA/MoE), engine-aware (vLLM/SGLang), honest-labeled. Reads real safetensors bytes; supports 53 GPUs (NVIDIA / AMD / Huawei Ascend / 沐曦 / 昆仑芯 / 壁仞 / 寒武纪 / 海光).

1K 1 0
ovg-project
kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

1K 902 107
coconut-labs
kvwarden

Tenant-fair LLM inference orchestration on a single GPU. No Kubernetes.

996 2 1
intel
auto-round-kernel

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

744 1K 125
recursia-lab
anchor-vision

Python client for Anchor — PaliGemma2 multi-LoRA vision inference

559 0 0
fahmiaziz98
docvision

Production-ready document parsing with Vision Language Models

489 1 0
HuiResearch
flashtts

基于SparkTTS、OrpheusTTS等模型,提供高质量中文语音合成与声音克隆服务。

303 601 76
theoddden
terradev-mcp

Complete Agentic GPU Infrastructure for Claude Code

117 10 2
    • Data from PyPI, GitHub, ClickHouse, and BigQuery