65 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| Invoke is a leading creative engine for Stable Diffusion models, empowering prof... | 77K |
| Large Language Model Development Toolkit | 45K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| Terminal-first local search and AI chat over your documents, code, and crawled w... | 20K |
| Download and quantize the HyperNix PyTorch model to GGUF (fp32/fp16/Q8_0/Q6_K/Q4... | 9K |
| Python library for Synthetic Data Generation | 7K |
| Large-scale LLM inference engine | 7K |
| InstructLab Core package. Use this to chat with a model and execute the Instruc... | 7K |
| Modular GGUF patching tool for Ollama multimodal monoliths | 5K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
|  | 4K |
| KT-Kernel: High-performance kernel operations for KTransformers (AMX/AVX/KML opt... | 4K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| Create your own local code autocomplete model, fine-tuned on your custom code re... | 1K |
| INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K |
| Add vision to any local LLM, no training | 1K |
| The Asynchronous Data Dynamo and Graph Neural Network Catalyst | 1K |
| GGUF conversion utility | 1K |
| Extension tools for akasha-terminal | 1K |
| T5-Gemma 2 statement extraction demo - extract structured statements from text | 1K |
| Generate and edit images with Qwen models on Apple Silicon (MPS) and other devic... | 948 |
| A simple tool to convert an Ollama GGUF model to HF model.safetensors and back | 800 |
|  | 713 |
| General information, model certifications, and benchmarks for nm-vllm enterprise... | 666 |
| Generate a llama-quantize command to copy the quantization parameters of any GGU... | 609 |
| An easier, safer way to write AI apps | 595 |
| Representation engineering / control vectors | 572 |
| Project VAIL Model Registry | 495 |
| vLLM Kunlun3 backend plugin | 464 |
| A PDF document RAG MCP that is easy to set up and supports completely local parsing ... | 439 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| LLM-based neural network generator | 391 |
| AI image generation without the UI tax - Z-Image-Turbo + Qwen3-4B | 380 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| AMD-SHARK inference modeling and serving | 373 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 345 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| Atomic Model Fragmentation (AMF) - molecular inference engine for resource-const... | 315 |
| Tool for auto-generating README documentation for code repositories | 229 |
| A modern GUI tool to convert HuggingFace safetensors to GGUF | 214 |
| Hardware-aware CLI that selects the best runtime and quantization for efficient ... | 187 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 176 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |