PyPI Stats

Inference Python Packages

Python packages tagged with the GitHub topic "inference", sorted by relevance. Each entry lists monthly downloads, GitHub stars, and forks.
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
298.9M downloads · 27K stars · 6K forks
h2non/filetype
Small, fast, dependency-free Python package that infers binary file types by checking magic-number signatures.
32.6M downloads · 761 stars · 120 forks
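The magic-number technique that filetype's description refers to can be sketched in a few lines of stdlib-only Python. The signature table and the `guess_type` helper below are illustrative stand-ins, not filetype's actual API.

```python
from typing import Optional

# A few well-known magic-number signatures (illustrative subset).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF-": "application/pdf",
}

def guess_type(data: bytes) -> Optional[str]:
    """Infer a MIME type by matching the file's leading bytes."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None

# PNG files start with the 8-byte signature \x89PNG\r\n\x1a\n
print(guess_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))  # image/png
```

Real implementations keep a much larger matcher table and only need to read the first few hundred bytes of a file, which is why this approach is fast and dependency-free.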
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs.
9.2M downloads · 79K stars · 16K forks

OpenNMT/ctranslate2
Fast inference engine for Transformer models.
8.4M downloads · 4K stars · 478 forks

SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2.
7.6M downloads · 23K stars · 2K forks

pytorch/torchao
PyTorch-native quantization and sparsity for training and inference.
3.5M downloads · 3K stars · 502 forks

google/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
3.1M downloads · 35K stars · 6K forks

huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools.
1.7M downloads · 3K stars · 639 forks
openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference.
1.4M downloads · 10K stars · 3K forks

deepspeedai/deepspeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
1.3M downloads · 42K stars · 5K forks

roboflow/inference-gpu
Turn any computer or edge device into a command center for your computer vision projects.
1.1M downloads · 2K stars · 260 forks

roboflow/inference-cli
Turn any computer or edge device into a command center for your computer vision projects.
836K downloads · 2K stars · 260 forks

NVIDIA/onnx-graphsurgeon
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
772K downloads · 13K stars · 2K forks

vllm-project/vllm-omni
A framework for efficient model inference with omni-modality models.
476K downloads · 5K stars · 867 forks
xaviviro/python-toon
🐍 TOON (Token-Oriented Object Notation) encoder/decoder for Python; reduces LLM token costs by 30-60% for structured data.
361K downloads · 341 stars · 13 forks

nvidia/tensorrt-cu12-bindings
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
347K downloads · 13K stars · 2K forks

kvcache-ai/mooncake-transfer-engine
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
344K downloads · 5K stars · 720 forks
mozilla-ai/any-llm-sdk
Communicate with an LLM provider using a single interface.
334K downloads · 2K stars · 167 forks
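The "single interface over many providers" idea that any-llm-sdk's description names can be sketched with structural typing in plain Python. The `LLMProvider` protocol, `EchoProvider` class, and `ask` helper below are hypothetical illustrations of the pattern, not the any-llm-sdk API.

```python
from typing import Protocol

class LLMProvider(Protocol):
    """Shared interface every provider adapter must satisfy."""
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Stand-in provider that just echoes the prompt (for demonstration)."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def ask(provider: LLMProvider, prompt: str) -> str:
    # Application code depends only on the shared interface,
    # never on a specific vendor SDK.
    return provider.complete(prompt)

print(ask(EchoProvider(), "hi"))  # echo: hi
```

Swapping providers then means swapping one adapter object; the calling code is unchanged, which is the point of a unified interface.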
nvidia/tensorrt-cu12
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
283K downloads · 13K stars · 2K forks

sgl-project/sglang-kernel
SGLang is a high-performance serving framework for large language models and multimodal models.
269K downloads · 27K stars · 6K forks

awslabs/multi-model-server
Multi Model Server is a tool for serving neural-net models for inference.
258K downloads · 1K stars · 229 forks

sgl-project/sgl-kernel
SGLang is a high-performance serving framework for large language models and multimodal models.
257K downloads · 27K stars · 6K forks

nvidia/tensorrt-cu12-libs
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
256K downloads · 13K stars · 2K forks

awslabs/model-archiver
Multi Model Server is a tool for serving neural-net models for inference.
247K downloads · 1K stars · 229 forks
Data from PyPI, GitHub, ClickHouse, and BigQuery.