PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter results by metrics.
Metrics in brackets for each result: PyPI downloads · GitHub stars · GitHub forks.

  • sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models. [287.7M · 27K · 6K]
  • vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs. [9.4M · 79K · 16K]
  • sgl-project/sglang-kernel: SGLang is a high-performance serving framework for large language models and multimodal models. [264K · 27K · 6K]
  • sgl-project/sgl-kernel: SGLang is a high-performance serving framework for large language models and multimodal models. [256K · 27K · 6K]
  • vllm-project/vllm-tpu: A high-throughput and memory-efficient inference and serving engine for LLMs. [143K · 79K · 16K]
  • NVIDIA/tensorrt-llm: TensorRT LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also contains components for building Python and C++ runtimes that orchestrate inference execution. [16K · 14K · 2K]
  • sgl-project/sglang-kt: SGLang is a high-performance serving framework for large language models and multimodal models. [4K · 27K · 6K]
  • patrick-toulme/pyptx: A Python DSL for writing NVIDIA PTX for Hopper and Blackwell in JAX and PyTorch. [3K · 265 · 15]
  • m96-chan/pygpukit: Minimal GPU runtime for Python: high-performance CUDA kernels, memory management, and LLM inference without heavy dependencies. [3K · 2 · 0]
  • jpietek/penguin-burner: An NVIDIA undervolting companion for Linux, now with a UI. Supports MSI Afterburner profile imports and LACT profile exports, and can automatically scan for the optimal GPU voltage-frequency curve and generate silent fan curves. [2K · 31 · 0]
  • sgl-project/dblcsgen: DBLC Fast Structured Generation. [570 · 27K · 6K]
  • vllm-project/vllm-hust: A high-throughput and memory-efficient inference and serving engine for LLMs. [437 · 79K · 16K]
  • vllm-project/wxy-test: A high-throughput and memory-efficient inference and serving engine for LLMs. [375 · 2K · 1K]
  • vllm-project/vllm-xft: A high-throughput and memory-efficient inference and serving engine for LLMs. [345 · 79K · 16K]
  • vllm-project/ai-dynamo-vllm: A high-throughput and memory-efficient inference and serving engine for LLMs. [344 · 79K · 16K]
  • vllm-project/vllm-acc: A high-throughput and memory-efficient inference and serving engine for LLMs. [342 · 79K · 16K]
  • vllm-project/nextai-vllm: A high-throughput and memory-efficient inference and serving engine for LLMs. [273 · 79K · 16K]
  • vllm-project/vllm-consul: A high-throughput and memory-efficient inference and serving engine for LLMs. [219 · 79K · 16K]
  • vllm-project/vllm-npu: A high-throughput and memory-efficient inference and serving engine for LLMs. [209 · 79K · 16K]
  • vllm-project/vllm-musa: A high-throughput and memory-efficient inference and serving engine for LLMs. [194 · 79K · 16K]
  • vllm-project/vllm-rocm: A high-throughput and memory-efficient inference and serving engine for LLMs. [176 · 79K · 16K]
  • egaoharu-kensei/flash-attention-triton: Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with a custom configuration mode. [154 · 26 · 0]
  • sgl-project/sgltmp: SGLang is yet another fast serving framework for large language models and vision language models. [147 · 27K · 6K]
  • vllm-project/vllm-emissary: A high-throughput and memory-efficient inference and serving engine for LLMs. [132 · 79K · 16K]
    • Data from PyPI, GitHub, ClickHouse, and BigQuery