PyPI Stats

vLLM Python Packages

Python packages with the GitHub topic vllm, sorted by relevance; monthly downloads and GitHub stars are shown for each entry.
lightseekorg / smg-grpc-proto

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

868K 206 62
lightseekorg / smg-grpc-servicer

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

593K 206 62
kvcache-ai / mooncake-transfer-engine

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

348K 5K 720
kserve / kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

116K 5K 1K
LMCache / lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

112K 8K 1K
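LMCache's pitch above is reusing KV cache across requests. As a toy, engine-agnostic sketch (hypothetical, not LMCache's actual API), reuse keyed by the longest matching token prefix might look like:

```python
# Toy illustration of prefix-keyed KV-cache reuse (hypothetical sketch,
# not LMCache's API). A real engine stores per-layer attention tensors;
# here the "KV state" is an opaque stub value.

class PrefixKVCache:
    def __init__(self):
        self._store = {}  # token-prefix tuple -> opaque KV state

    def put(self, tokens, kv_state):
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        """Return (matched_len, kv_state) for the longest cached prefix."""
        for n in range(len(tokens), 0, -1):
            state = self._store.get(tuple(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None

cache = PrefixKVCache()
cache.put([1, 2, 3], "kv-for-123")  # e.g. a shared system prompt
hit, state = cache.longest_prefix([1, 2, 3, 4, 5])
# Only tokens[hit:] need fresh prefill; the first `hit` tokens reuse `state`.
```

The point of the sketch is only the lookup discipline: a request pays prefill cost solely for the suffix past the longest cached prefix.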
intel / auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

73K 1K 125
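The quantization packages listed here compress weights to low bit-widths. As a generic illustration of plain round-to-nearest asymmetric quantization (deliberately not auto-round's algorithm, which additionally tunes the rounding), 4-bit quantization of a weight list might look like:

```python
# Generic round-to-nearest asymmetric quantization to `bits` bits
# (illustrative sketch; names and layout are hypothetical).

def quantize_rtn(weights, bits=4):
    qmax = (1 << bits) - 1                    # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0           # avoid div-by-zero for constant weights
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

w = [-0.8, -0.1, 0.0, 0.35, 1.2]
q, s, z = quantize_rtn(w)
w_hat = dequantize(q, s, z)
# Round-to-nearest keeps reconstruction error within half a step (scale / 2).
```

Learned-rounding methods such as the one these packages advertise aim to beat that per-weight error bound on end-to-end model accuracy, not just per-weight error.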
stackav-oss / conch-triton-kernels

A "standard library" of Triton kernels.

45K 24 3
xorbitsai / xinference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

44K 9K 824
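Several servers and gateways in this list (xinference, smg, vLLM itself) expose OpenAI-compatible HTTP endpoints, which is what makes the "change a single line of code" swap work. A minimal stdlib sketch of building a chat-completion request against such an endpoint — the URL and model name below are placeholders, not defaults of any specific package:

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages):
    """Build an OpenAI-style /v1/chat/completions request (nothing is sent here)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:9997",             # placeholder address
    "my-local-model",                    # placeholder model name
    [{"role": "user", "content": "Hello"}],
)
# urllib.request.urlopen(req) would send it once a compatible server is running.
```

Because only `base_url` and `model` vary, the same client code targets any of these backends.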
ModelCloud / gptqmodel

LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

39K 1K 185
MekayelAnik / vllm-cpu

Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets

29K 6 0
intel / auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 125
containers / ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

11K 3K 337
intel / auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

9K 1K 125
project-david-ai / projectdavid-platform

A single pip-installed package that orchestrates a production-ready instance of the AI stack in any environment

9K 1 0
kvcache-ai / mooncake-transfer-engine-cuda13

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

8K 5K 720
vllm-project / vllm-ascend

Community-maintained hardware plugin for vLLM on Ascend

7K 2K 1K
Alberto-Codes / turboquant-vllm

TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs

7K 46 5
ModelEngine-Group / uc-manager

Persist and reuse the KV cache to speed up your LLM.

7K 274 73
katanaml / sparrow-parse

Structured data extraction and instruction calling with ML, LLM and Vision LLM

6K 5K 515
call518 / logsentinelai

LLM-powered security log analyzer: detect threats & anomalies with zero regex — just declare a Pydantic schema. Real-time Telegram alerts, SIEM-ready with Elasticsearch/Kibana. Supports OpenAI, Ollama, vLLM.

5K 46 9
Embedl / flash-head

FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference

4K 6 1
MekayelAnik / vllm-cpu-avx512bf16

Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets

3K 6 0
theoddden / terradev-cli

Cross-Cloud Compute Optimization Platform with Migration & Evaluation - v4.0.12

3K 10 1
lightseekorg / smg

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.

3K 206 62
    • Data from PyPI, GitHub, ClickHouse, and BigQuery