PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.

ModelCloud / gptqmodel
LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, and Intel GPUs and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
38K · 1K · 185
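
For context on what this top result does in practice, here is a minimal post-training quantization sketch using gptqmodel's documented GPTQModel.load / quantize / save flow. The model ID, the toy calibration text, and the output path are placeholders, not recommendations; real calibration uses a few hundred representative samples.

    from gptqmodel import GPTQModel, QuantizeConfig

    # 4-bit GPTQ with per-group scales; group_size=128 is a common default.
    quant_config = QuantizeConfig(bits=4, group_size=128)

    # Placeholder model ID; any Hugging Face causal LM is the intended input.
    model = GPTQModel.load("meta-llama/Llama-3.2-1B-Instruct", quant_config)

    # GPTQ reconstructs weights against calibration text; a toy sentence
    # stands in here for a real calibration corpus such as C4 excerpts.
    calibration = ["Quantization trades precision for memory and speed."] * 256
    model.quantize(calibration, batch_size=2)

    # Placeholder output directory for the quantized checkpoint.
    model.save("llama-3.2-1b-gptq-4bit")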

intel / neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
22K · 3K · 304
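
Similarly, a minimal sketch of Intel Neural Compressor's documented 2.x post-training INT8 flow on PyTorch via quantization.fit. The tiny Sequential model, the random calibration data, and the save path are stand-ins for illustration only.

    import torch
    from neural_compressor.config import PostTrainingQuantConfig
    from neural_compressor.quantization import fit

    # Stand-in FP32 model; real use targets a trained network.
    fp32_model = torch.nn.Sequential(
        torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
    )

    # Calibration loader yielding (input, label) batches; random data here.
    calib_loader = torch.utils.data.DataLoader(
        [(torch.randn(64), 0) for _ in range(32)], batch_size=8
    )

    # The default PostTrainingQuantConfig requests static INT8 quantization.
    q_model = fit(
        model=fp32_model,
        conf=PostTrainingQuantConfig(),
        calib_dataloader=calib_loader,
    )
    q_model.save("./int8-model")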

intel / neural-compressor-pt
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
1K · 3K · 304

intel / neural-compressor-tf
Repository of Intel® Neural Compressor
739 · 3K · 304

stef41 / quantbenchx
Quantization quality analyzer: benchmarks GGUF/GPTQ/AWQ quantization accuracy.
716 · 1 · 0

lpalbou / model-quantizer
A tool for quantizing large language models
530 · 2 · 0

intel / neural-compressor-full
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
476 · 3K · 304

matlok-ai / bampe-weights
Tools for profiling, extracting, visualizing, and reusing generative AI weights, aimed at building more accurate AI models and at auditing/scanning weights at rest to identify knowledge domains for risk.
463 · 10 · 0

bobazooba / xllm
Simple & cutting-edge LLM fine-tuning
458 · 408 · 21

intel / neural-solution
Repository of Intel® Neural Compressor
384 · 3K · 304

intel / lpot
Repository of Intel® Low Precision Optimization Tool
384 · 3K · 302

intel / neural-insights
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
340 · 3K · 304

ShipItAndPray / quantcrush
Crush any LLM to 6x smaller in one command. Supports GGUF, GPTQ, and AWQ.
121 · 4 · 1

intel / ilit
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
94 · 3K · 304

intel / neural-compressor-3x-tf
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
14 · 3K · 304

intel / neural-compressor-3x-pt
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
13 · 3K · 304

intel / neural-compressor-3x-ort
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
6 · 3K · 304

    • Data from PyPI, GitHub, ClickHouse, and BigQuery