PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
intel / neural-compressor
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  22K · 3K · 304

666DZY666 / micronet
  A model compression and deployment library.
  2K · 2K · 477

intel / neural-compressor-pt
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  1K · 3K · 304

intel / neural-compressor-tf
  Repository of Intel® Neural Compressor
  739 · 3K · 304

intel / neural-compressor-full
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  476 · 3K · 304

intel / neural-solution
  Repository of Intel® Neural Compressor
  384 · 3K · 304

intel / lpot
  Repository of Intel® Low Precision Optimization Tool
  384 · 3K · 302

Tfloow / auto-adpq
  This repo aims to replicate "AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs" (https://arxiv.org/abs/2405.13358)
  360 · 1 · 0

intel / neural-insights
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  340 · 3K · 304

intel / ilit
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  94 · 3K · 304

intel / neural-compressor-3x-tf
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  14 · 3K · 304

intel / neural-compressor-3x-pt
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  13 · 3K · 304

intel / neural-compressor-3x-ort
  SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
  6 · 3K · 304
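Most of the packages listed above center on low-bit quantization. As a rough illustration of what "INT8 quantization" means, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. This is not the Intel Neural Compressor API, just the core arithmetic these tools build on; the function names and the example values are made up for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127].

    The scale is chosen so the largest-magnitude weight maps to +/-127;
    every weight is then rounded to the nearest integer step.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]


# Toy example: four float weights, quantized and restored.
weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Real libraries add per-channel scales, calibration over activation statistics, and fused quantize/dequantize kernels, but the round-trip error here (at most half a quantization step per weight) is the quantity they all trade against model accuracy.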
Data from PyPI, GitHub, ClickHouse, and BigQuery