PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics. Each result shows the maintainer, the package, its description, and its downloads, GitHub stars, and forks.
lightseekorg / smg-grpc-proto
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.
829K downloads · 206 stars · 62 forks

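The blurb above claims Anthropic (and OpenAI) API compatibility. As a minimal sketch of what that implies, the stock anthropic Python client can be repointed at a self-hosted gateway; the base URL, port, API key, and model name below are assumptions for illustration, not documented smg defaults.

```python
# Sketch: pointing the stock Anthropic SDK at an Anthropic-compatible gateway.
# The URL, key, and model name are placeholders, not documented smg defaults.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",  # assumed gateway address
    api_key="tenant-key",              # placeholder for the gateway's multi-tenant auth
)

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # whichever backend model the gateway routes to
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello."}],
)
print(message.content[0].text)
```
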
lightseekorg / smg-grpc-servicer
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.
560K downloads · 206 stars · 62 forks

kvcache-ai / mooncake-transfer-engine
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
338K downloads · 5K stars · 720 forks

LMCache / lmcache
Supercharge Your LLM with the Fastest KV Cache Layer
120K downloads · 8K stars · 1K forks

kserve / kserve
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
114K downloads · 5K stars · 1K forks

intel / auto-round
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
71K downloads · 1K stars · 125 forks

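As a rough sketch of how a quantizer like this is typically driven, the entry points below follow auto-round's README at the time of writing; treat the exact signatures as assumptions subject to version drift.

```python
# Hedged sketch of auto-round usage; argument names may differ between versions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # small model, purely for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weight quantization with group size 128, per the README's example.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize_and_save("./opt-125m-int4", format="auto_round")
```
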
stackav-oss / conch-triton-kernels
A "standard library" of Triton kernels.
44K downloads · 24 stars · 3 forks

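conch's own API isn't shown here, but for readers unfamiliar with what a "Triton kernel" is, a minimal standalone example in standard Triton (not conch code) looks like this:

```python
# A minimal standard Triton kernel (illustrative; not part of conch's API).
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the tensor
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```
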
xorbitsai / xinference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
43K downloads · 9K stars · 824 forks

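The "single line of code" claim refers to repointing an OpenAI-style client at a local server. A sketch, assuming Xinference's default port of 9997 and a model that has already been launched:

```python
# The one-line swap: same OpenAI client, different base_url.
# Port 9997 and the model name are assumptions for illustration.
from openai import OpenAI

# client = OpenAI()  # before: talks to api.openai.com
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen2.5-instruct",  # a model previously launched in Xinference
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```
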
ModelCloud / gptqmodel
LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.
38K downloads · 1K stars · 185 forks

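A hedged sketch of the load-quantize-save flow this kind of toolkit exposes; the GPTQModel.load / quantize / save entry points follow the project README, but the exact signatures are an assumption here:

```python
# Hedged sketch; method names follow gptqmodel's README, signatures may drift.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load("facebook/opt-125m", quant_config)

# A real run needs a few hundred calibration texts; two strings only
# keep this sketch self-contained.
calibration = ["gptqmodel is a quantization toolkit.", "The sky is blue."]
model.quantize(calibration)
model.save("./opt-125m-gptq-4bit")
```
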
MekayelAnik / vllm-cpu
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
31K downloads · 6 stars · 0 forks

intel / auto-round-nightly
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
18K downloads · 1K stars · 125 forks

containers / ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
11K downloads · 3K stars · 337 forks

project-david-ai / projectdavid-platform
A single pip-installed package that orchestrates a production-ready instance of the AI stack in any environment
11K downloads · 1 star · 0 forks

intel / auto-round-lib
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.
9K downloads · 1K stars · 125 forks

Alberto-Codes / turboquant-vllm
TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models validated, consumer GPUs
8K downloads · 46 stars · 5 forks

ModelEngine-Group / uc-manager
Persist and reuse KV Cache to speed up your LLM.
7K downloads · 274 stars · 73 forks

vllm-project / vllm-ascend
Community-maintained hardware plugin for vLLM on Ascend
7K downloads · 2K stars · 1K forks

kvcache-ai / mooncake-transfer-engine-cuda13
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
7K downloads · 5K stars · 720 forks

katanaml / sparrow-parse
Structured data extraction and instruction calling with ML, LLM and Vision LLM
6K downloads · 5K stars · 515 forks

call518 / logsentinelai
LLM-powered security log analyzer: detect threats & anomalies with zero regex — just declare a Pydantic schema. Real-time Telegram alerts, SIEM-ready with Elasticsearch/Kibana. Supports OpenAI, Ollama, vLLM.
5K downloads · 46 stars · 9 forks

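The "zero regex, just declare a Pydantic schema" approach means the analyzer asks the LLM to emit structured objects instead of pattern-matching log lines. An illustrative schema of that kind follows; the field names are hypothetical, not logsentinelai's actual API.

```python
# Hypothetical schema of the kind a declare-a-Pydantic-model analyzer consumes.
# Field names are illustrative, not logsentinelai's real API.
from enum import Enum
from pydantic import BaseModel, Field


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class LogEvent(BaseModel):
    """Structure the LLM is asked to extract from each raw log line."""

    source_ip: str = Field(description="IP address that produced the event")
    action: str = Field(description="What happened, e.g. 'failed SSH login'")
    is_anomalous: bool = Field(description="Whether the event looks like a threat")
    severity: Severity
```
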
Embedl / flash-head
FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference
4K downloads · 6 stars · 1 fork

theoddden / terradev-cli
Cross-Cloud Compute Optimization Platform with Migration & Evaluation - v4.0.12
3K downloads · 10 stars · 1 fork

lightseekorg / smg
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat history, tokenization caching, Responses API, embeddings, WASM plugins, MCP, and multi-tenant auth.
3K downloads · 206 stars · 62 forks

MekayelAnik / vllm-cpu-avx512bf16
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
3K downloads · 6 stars · 0 forks

    • Data from PyPI, GitHub, ClickHouse, and BigQuery