Blackwell Python Packages

sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

306.7M 27K 6K

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

8.9M 79K 16K

sglang-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

273K 27K 6K

sgl-kernel

SGLang is a high-performance serving framework for large language models and multimodal models.

254K 27K 6K

vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

145K 79K 16K

tensorrt-llm

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

16K 14K 2K

sglang-kt

SGLang is a high-performance serving framework for large language models and multimodal models.

4K 27K 6K

pyptx

A Python DSL for writing NVIDIA PTX for Hopper and Blackwell in JAX and PyTorch

3K 265 15

pygpukit

Minimal GPU runtime for Python: high-performance CUDA kernels, memory management, and LLM inference without heavy dependencies

3K 2 0

penguin-burner

The ultimate NVIDIA undervolting companion on Linux, now with a nice UI. Supports MSI Afterburner profile imports and LACT profile exports. Can automatically scan for the optimal GPU VF curve and generate silent fan curves.

2K 31 0

taiwan-asr-toolkit

Production-grade Traditional Chinese / Taiwan Mandarin speech-to-text. Qwen3-ASR + MediaTek Breeze-ASR-25, hot-word injection, LLM polish, speaker diarization. RTF up to 1554x on RTX 5090, 56 TDD tests.

759 1 0

dblcsgen

SGLang is a high-performance serving framework for large language models and multimodal models.

613 27K 6K

vllm-xft

A high-throughput and memory-efficient inference and serving engine for LLMs

485 79K 16K

vllm-acc

A high-throughput and memory-efficient inference and serving engine for LLMs

484 79K 16K

vllm-hust

A high-throughput and memory-efficient inference and serving engine for LLMs

480 79K 16K

ai-dynamo-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

437 79K 16K

wxy-test

A high-throughput and memory-efficient inference and serving engine for LLMs

394 2K 1K

nextai-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

378 79K 16K

vllm-consul

A high-throughput and memory-efficient inference and serving engine for LLMs

306 79K 16K

vllm-npu

A high-throughput and memory-efficient inference and serving engine for LLMs

281 79K 16K

vllm-musa

A high-throughput and memory-efficient inference and serving engine for LLMs

279 79K 16K

vllm-emissary

A high-throughput and memory-efficient inference and serving engine for LLMs

188 79K 16K

vllm-usf

A high-throughput and memory-efficient inference and serving engine for LLMs

166 79K 16K

flash-attention-triton

Cross-platform FlashAttention-2 Triton implementation for Turing+ GPUs with custom configuration mode

146 26 0