PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter results by metrics.
Metrics in brackets for each result: PyPI downloads · GitHub stars · GitHub forks.

  • sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models. [287.7M · 27K · 6K]
  • vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs. [9.4M · 79K · 16K]
  • sgl-project/sglang-kernel: SGLang is a high-performance serving framework for large language models and multimodal models. [264K · 27K · 6K]
  • sgl-project/sgl-kernel: SGLang is a high-performance serving framework for large language models and multimodal models. [256K · 27K · 6K]
  • vllm-project/vllm-tpu: A high-throughput and memory-efficient inference and serving engine for LLMs. [143K · 79K · 16K]
  • NVIDIA/tensorrt-llm: TensorRT LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components for building Python and C++ runtimes that orchestrate inference execution. [16K · 14K · 2K]
  • sgl-project/sglang-kt: SGLang is a high-performance serving framework for large language models and multimodal models. [4K · 27K · 6K]
  • patrick-toulme/pyptx: A Python DSL for writing NVIDIA PTX for Hopper and Blackwell in JAX and PyTorch. [3K · 265 · 15]
  • m96-chan/pygpukit: Minimal GPU runtime for Python: high-performance CUDA kernels, memory management, and LLM inference without heavy dependencies. [3K · 2 · 0]
  • jpietek/penguin-burner: An NVIDIA undervolting companion for Linux, now with a UI. Supports MSI Afterburner profile imports and LACT profile exports, and can automatically scan for the optimal GPU voltage-frequency curve and generate silent fan curves. [2K · 31 · 0]
  • sgl-project/dblcsgen: DBLC Fast Structured Generation. [570 · 27K · 6K]
  • vllm-project/vllm-hust: A high-throughput and memory-efficient inference and serving engine for LLMs. [437 · 79K · 16K]
  • vllm-project/wxy-test: A high-throughput and memory-efficient inference and serving engine for LLMs. [375 · 2K · 1K]
  • vllm-project/vllm-xft: A high-throughput and memory-efficient inference and serving engine for LLMs. [345 · 79K · 16K]
  • vllm-project/ai-dynamo-vllm: A high-throughput and memory-efficient inference and serving engine for LLMs. [344 · 79K · 16K]
  • vllm-project/vllm-acc: A high-throughput and memory-efficient inference and serving engine for LLMs. [342 · 79K · 16K]
  • vllm-project/nextai-vllm: A high-throughput and memory-efficient inference and serving engine for LLMs. [273 · 79K · 16K]
  • vllm-project/vllm-consul: A high-throughput and memory-efficient inference and serving engine for LLMs. [219 · 79K · 16K]
  • vllm-project/vllm-npu: A high-throughput and memory-efficient inference and serving engine for LLMs. [209 · 79K · 16K]
  • vllm-project/vllm-musa: A high-throughput and memory-efficient inference and serving engine for LLMs. [194 · 79K · 16K]
  • vllm-project/vllm-rocm: A high-throughput and memory-efficient inference and serving engine for LLMs. [176 · 79K · 16K]
  • egaoharu-kensei/flash-attention-triton: Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with a custom configuration mode. [154 · 26 · 0]
  • sgl-project/sgltmp: SGLang is yet another fast serving framework for large language models and vision language models. [147 · 27K · 6K]
  • vllm-project/vllm-emissary: A high-throughput and memory-efficient inference and serving engine for LLMs. [132 · 79K · 16K]
    • Data from PyPI, GitHub, ClickHouse, and BigQuery