65 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| Invoke is a leading creative engine for Stable Diffusion models, empowering prof... | 77K |
| Large Language Model Development Toolkit | 45K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| Terminal-first local search and AI chat over your documents, code, and crawled w... | 20K |
| Download and quantize the HyperNix PyTorch model to GGUF (fp32/fp16/Q8_0/Q6_K/Q4... | 9K |
| Python library for Synthetic Data Generation | 7K |
| Large-scale LLM inference engine | 7K |
| InstructLab Core package. Use this to chat with a model and execute the Instruc... | 7K |
| Modular GGUF patching tool for Ollama multimodal monoliths | 5K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
|  | 4K |
| KT-Kernel: High-performance kernel operations for KTransformers (AMX/AVX/KML opt... | 4K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| Create your own local code autocomplete model, fine-tuned on your custom code re... | 1K |
| INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K |
| Add vision to any local LLM, no training | 1K |
| The Asynchronous Data Dynamo and Graph Neural Network Catalyst | 1K |
| GGUF conversion utility | 1K |
| Extension tools for akasha-terminal | 1K |
| T5-Gemma 2 statement extraction demo - extract structured statements from text | 1K |
| Generate and edit images with Qwen models on Apple Silicon (MPS) and other devic... | 948 |
| A simple tool to convert an Ollama GGUF model to HF model.safetensors and back | 800 |
|  | 713 |
| General information, model certifications, and benchmarks for nm-vllm enterprise... | 666 |
| Generate a llama-quantize command to copy the quantization parameters of any GGU... | 609 |
| An easier, safer way to write AI apps | 595 |
| Representation engineering / control vectors | 572 |
| Project VAIL Model Registry | 495 |
| vLLM Kunlun3 backend plugin | 464 |
| A PDF document RAG MCP that is easy to set up and supports completely local parsing ... | 439 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| LLM-based neural network generator | 391 |
| AI image generation without the UI tax - Z-Image-Turbo + Qwen3-4B | 380 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| AMD-SHARK inference modeling and serving | 373 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 345 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| Atomic Model Fragmentation (AMF) - molecular inference engine for resource-const... | 315 |
| Tool for auto-generating README documentation for code repositories | 229 |
| A modern GUI tool to convert HuggingFace safetensors to GGUF | 214 |
| Hardware-aware CLI that selects the best runtime and quantization for efficient ... | 187 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 176 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |