PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
intel
auto-round

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

71K 1K 125
quantumaikr
quantcpp

LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

38K 386 42
AlexsJones
llmfit

Hundreds of models & providers. One command to find what runs on your hardware.

23K 25K 1K
calcuis
gguf-connector

GGUF (GPT-Generated Unified Format) connector

19K 55 11
intel
auto-round-nightly

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

18K 1K 125
alvarobartt
hf-mem

A CLI to estimate inference memory requirements for Hugging Face models, written in Python.

16K 914 83
OEvortex
webscout

Webscout is the all-in-one search and AI toolkit you need. Discover insights with Yep.com, DuckDuckGo, and Phind; access cutting-edge AI models; transcribe YouTube videos; generate temporary emails and phone numbers; perform text-to-speech conversions; and much more!

13K 344 63
MakazhanAlpamys
soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

11K 53 7
jjang-ai
jang

JANG: GGUF for MLX. Requires the JANG_Q runtime. Adaptive mixed-precision quantization and runtime for Apple Silicon.

9K 142 20
intel
auto-round-lib

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

9K 1K 125
FarisZahrani
llama-cpp-py-sync

Auto-synced CFFI ABI python bindings for llama.cpp with prebuilt wheels (CPU/CUDA/Vulkan/Metal).

8K 4 1
edwko
outetts

Interface for OuteTTS models.

7K 1K 116
thilomichael
llama-buddy

CLI wrapper for llama.cpp providing an ollama-like experience

3K 8 0
calcuis
llama-core

solo connector core built on llama.cpp

3K 1 1
calcuis
gguf-core

A simple way to interact with llama via GGUF

3K 5 1
intel
auto-round-hpu

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

2K 1K 125
calcuis
gguf-node

GGUF node for ComfyUI

2K 217 15
LoSealL
onnxifier

Convert ANY IR to ONNX format

2K 27 4
calcuis
cgg

Call a GGUF model

1K 7 1
sl4m3
ledgermind

LedgerMind: an autonomous living memory for AI agents. It self-heals, resolves conflicts, distills experience into rules, and evolves without human intervention. SQLite + Git + reasoning layer. Well suited to multi-agent systems and on-device deployment.

1K 13 1
notolog
notolog

Notolog Markdown Editor

1K 25 6
LoSealL
openvino2onnx

Convert ANY IR to ONNX format

1K 27 4
calcuis
gguf-comfy

Run FLUX.1/SD3 models on a low-cost entry-level GPU, or even a CPU

939 12 4
TigreGotico
ovos-gguf-embeddings-plugin

A gguf embeddings plugin for OVOS

932 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery