PyPI Stats

Inference Python Packages

Python packages tagged with the GitHub topic "inference", sorted by relevance. Each entry lists monthly downloads, GitHub stars, and forks.
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
298.9M downloads · 27K stars · 6K forks
h2non/filetype
Small, fast, dependency-free Python package that infers binary file types by checking magic-number signatures.
32.6M downloads · 761 stars · 120 forks
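The magic-number technique that filetype's description refers to can be sketched in a few lines of stdlib-only Python. The signature table and the `guess_type` helper below are illustrative stand-ins, not filetype's actual API.

```python
from typing import Optional

# A few well-known magic-number signatures (illustrative subset).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"%PDF-": "application/pdf",
}

def guess_type(data: bytes) -> Optional[str]:
    """Infer a MIME type by matching the file's leading bytes."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None

# PNG files start with the 8-byte signature \x89PNG\r\n\x1a\n
print(guess_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))  # image/png
```

Real implementations keep a much larger matcher table and only need to read the first few hundred bytes of a file, which is why this approach is fast and dependency-free.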
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs.
9.2M downloads · 79K stars · 16K forks

OpenNMT/ctranslate2
Fast inference engine for Transformer models.
8.4M downloads · 4K stars · 478 forks

SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2.
7.6M downloads · 23K stars · 2K forks

pytorch/torchao
PyTorch-native quantization and sparsity for training and inference.
3.5M downloads · 3K stars · 502 forks

google/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
3.1M downloads · 35K stars · 6K forks

huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools.
1.7M downloads · 3K stars · 639 forks
openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference.
1.4M downloads · 10K stars · 3K forks

deepspeedai/deepspeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
1.3M downloads · 42K stars · 5K forks

roboflow/inference-gpu
Turn any computer or edge device into a command center for your computer vision projects.
1.1M downloads · 2K stars · 260 forks

roboflow/inference-cli
Turn any computer or edge device into a command center for your computer vision projects.
836K downloads · 2K stars · 260 forks

NVIDIA/onnx-graphsurgeon
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
772K downloads · 13K stars · 2K forks

vllm-project/vllm-omni
A framework for efficient model inference with omni-modality models.
476K downloads · 5K stars · 867 forks
xaviviro/python-toon
🐍 TOON (Token-Oriented Object Notation) encoder/decoder for Python; reduces LLM token costs by 30-60% for structured data.
361K downloads · 341 stars · 13 forks

nvidia/tensorrt-cu12-bindings
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
347K downloads · 13K stars · 2K forks

kvcache-ai/mooncake-transfer-engine
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
344K downloads · 5K stars · 720 forks
mozilla-ai/any-llm-sdk
Communicate with an LLM provider using a single interface.
334K downloads · 2K stars · 167 forks
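The "single interface over many providers" idea that any-llm-sdk's description names can be sketched with structural typing in plain Python. The `LLMProvider` protocol, `EchoProvider` class, and `ask` helper below are hypothetical illustrations of the pattern, not the any-llm-sdk API.

```python
from typing import Protocol

class LLMProvider(Protocol):
    """Shared interface every provider adapter must satisfy."""
    def complete(self, prompt: str) -> str: ...

class EchoProvider:
    """Stand-in provider that just echoes the prompt (for demonstration)."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def ask(provider: LLMProvider, prompt: str) -> str:
    # Application code depends only on the shared interface,
    # never on a specific vendor SDK.
    return provider.complete(prompt)

print(ask(EchoProvider(), "hi"))  # echo: hi
```

Swapping providers then means swapping one adapter object; the calling code is unchanged, which is the point of a unified interface.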
nvidia/tensorrt-cu12
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
283K downloads · 13K stars · 2K forks

sgl-project/sglang-kernel
SGLang is a high-performance serving framework for large language models and multimodal models.
269K downloads · 27K stars · 6K forks

awslabs/multi-model-server
Multi Model Server is a tool for serving neural-net models for inference.
258K downloads · 1K stars · 229 forks

sgl-project/sgl-kernel
SGLang is a high-performance serving framework for large language models and multimodal models.
257K downloads · 27K stars · 6K forks

nvidia/tensorrt-cu12-libs
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
256K downloads · 13K stars · 2K forks

awslabs/model-archiver
Multi Model Server is a tool for serving neural-net models for inference.
247K downloads · 1K stars · 229 forks
Data from PyPI, GitHub, ClickHouse, and BigQuery.