PyPI Stats

Search Packages

ctranslate2 (OpenNMT)
  Fast inference engine for Transformer models
  8.3M downloads · 4K stars · 478 forks

faster-whisper (SYSTRAN)
  Faster Whisper transcription with CTranslate2
  7.4M downloads · 23K stars · 2K forks

bitsandbytes (bitsandbytes-foundation)
  Accessible large language models via k-bit quantization for PyTorch
  6.5M downloads · 8K stars · 847 forks

torchao (pytorch)
  PyTorch-native quantization and sparsity for training and inference
  3.4M downloads · 3K stars · 502 forks

optimum (huggingface)
  🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM, and Sentence Transformers with easy-to-use hardware optimization tools
  1.7M downloads · 3K stars · 639 forks

onnx2tf (PINTO0309)
  A tool for converting ONNX files to LiteRT/TFLite/TensorFlow, PyTorch native code (nn.Module), TorchScript (.pt), state_dict (.pt), Exported Program (.pt2), and Dynamo ONNX. Also supports direct conversion from LiteRT to PyTorch
  1.5M downloads · 953 stars · 99 forks

nncf (openvinotoolkit)
  Neural Network Compression Framework for enhanced OpenVINO™ inference
  456K downloads · 1K stars · 293 forks

llmcompressor (vllm-project)
  Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
  285K downloads · 3K stars · 498 forks

optimum-quanto (huggingface)
  A PyTorch quantization backend for Optimum
  267K downloads · 1K stars · 86 forks

sageattention (thu-ml)
  [ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention achieving a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models
  156K downloads · 3K stars · 403 forks

tensorflow-model-optimization (tensorflow)
  A toolkit to optimize ML models for deployment with Keras and TensorFlow, including quantization and pruning
  105K downloads · 2K stars · 347 forks

auto-gptq (PanQiWei)
  An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm
  75K downloads · 5K stars · 540 forks

auto-round (intel)
  A SOTA quantization algorithm for high-accuracy, low-bit LLM inference, optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers
  71K downloads · 1K stars · 125 forks

navec (natasha)
  Compact, high-quality word embeddings for the Russian language
  51K downloads · 218 stars · 19 forks

gptqmodel (ModelCloud)
  LLM quantization (compression) toolkit with hardware acceleration for NVIDIA, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang
  38K downloads · 1K stars · 185 forks

quantcpp (quantumaikr)
  LLM inference with 7x longer context: lossless KV-cache compression in a single-header library, written in pure C with zero dependencies
  38K downloads · 386 stars · 42 forks

intel-extension-for-pytorch (intel)
  A Python package that extends official PyTorch for improved performance on Intel platforms
  36K downloads · 2K stars · 315 forks

tqdb (jyunming)
  Embedded vector database in Rust with Python bindings: TurboQuant algorithm (arXiv:2504.19874), zero training, 2-4 bit compression, HNSW ANN search, WAL persistence
  31K downloads · 2 stars · 0 forks

aimet-torch (quic)
  AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
  29K downloads · 3K stars · 450 forks

llamafactory (hiyouga)
  Unified efficient fine-tuning of 100+ LLMs and VLMs (ACL 2024)
  29K downloads · 71K stars · 9K forks

neural-compressor (intel)
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) and sparsity; leading model compression techniques for PyTorch, TensorFlow, and ONNX Runtime
  22K downloads · 3K stars · 304 forks

qonnx (fastmachinelearning)
  QONNX: arbitrary-precision quantized neural networks in ONNX
  21K downloads · 184 stars · 57 forks

brevitas (Xilinx)
  Brevitas: neural network quantization in PyTorch
  21K downloads · 2K stars · 243 forks

aimet-onnx (quic)
  AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
  19K downloads · 3K stars · 450 forks
Data from PyPI, GitHub, ClickHouse, and BigQuery