PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

71K 1K 125
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

38K 386 42
AlexsJones
llmfit

Hundreds of models & providers. One command to find what runs on your hardware.

23K 25K 1K
calcuis
gguf-connector

GGUF (GPT-Generated Unified Format) connector

19K 55 11
intel
auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 125
alvarobartt
hf-mem

A CLI to estimate inference memory requirements for Hugging Face models, written in Python.

16K 914 83
OEvortex
webscout

Webscout is the all-in-one search and AI toolkit you need. Discover insights with Yep.com, DuckDuckGo, and Phind; access cutting-edge AI models; transcribe YouTube videos; generate temporary emails and phone numbers; perform text-to-speech conversions; and much more!

13K 344 63
MakazhanAlpamys
soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

11K 53 7
jjang-ai
jang

JANG: GGUF for MLX. Requires the JANG_Q runtime. Adaptive mixed-precision quantization and runtime for Apple Silicon.

9K 142 20
intel
auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

9K 1K 125
FarisZahrani
llama-cpp-py-sync

Auto-synced CFFI ABI python bindings for llama.cpp with prebuilt wheels (CPU/CUDA/Vulkan/Metal).

8K 4 1
edwko
outetts

Interface for OuteTTS models.

7K 1K 116
thilomichael
llama-buddy

CLI wrapper for llama.cpp providing an ollama-like experience

3K 8 0
calcuis
llama-core

solo connector core built on llama.cpp

3K 1 1
calcuis
gguf-core

A simple way to interact with llama via GGUF

3K 5 1
intel
auto-round-hpu

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

2K 1K 125
calcuis
gguf-node

GGUF node for ComfyUI

2K 217 15
LoSealL
onnxifier

Convert ANY IR to ONNX format

2K 27 4
calcuis
cgg

Call a GGUF model

1K 7 1
sl4m3
ledgermind

LedgerMind: an autonomous living memory for AI agents. It self-heals, resolves conflicts, distills experience into rules, and evolves without human intervention. SQLite + Git + reasoning layer. Well suited to multi-agent systems and on-device deployment.

1K 13 1
notolog
notolog

Notolog Markdown Editor

1K 25 6
LoSealL
openvino2onnx

Convert ANY IR to ONNX format

1K 27 4
calcuis
gguf-comfy

Run FLUX.1/SD3 models on a low-cost entry-level GPU, or even a CPU

939 12 4
TigreGotico
ovos-gguf-embeddings-plugin

A gguf embeddings plugin for OVOS

932 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery