65 dependents
Package Description Downloads/month
SGLang is a high-performance serving framework for large language models and mul... 287.7M
A high-throughput and memory-efficient inference and serving engine for LLMs 9.4M
A high-throughput and memory-efficient inference and serving engine for LLMs 143K
Invoke is a leading creative engine for Stable Diffusion models, empowering prof... 77K
Large Language Model Develop Toolkit 45K
Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... 31K
Terminal-first local search and AI chat over your documents, code, and crawled w... 20K
Download and quantize the HyperNix PyTorch model to GGUF (fp32/fp16/Q8_0/Q6_K/Q4... 9K
Python library for Synthetic Data Generation 7K
Large-scale LLM inference engine 7K
InstructLab Core package. Use this to chat with a model and execute the Instruc... 7K
Modular GGUF patching tool for Ollama multimodal monoliths 5K
SGLang is a high-performance serving framework for large language models and mul... 4K
(no description) 4K
KT-Kernel: High-performance kernel operations for KTransformers (AMX/AVX/KML opt... 4K
Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... 3K
vLLM CPU inference engine (AVX512 + VNNI optimized) 3K
Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... 3K
vLLM CPU inference engine (AVX512 optimized) 2K
Create your own local code autocomplete model, fine-tuned on your custom code re... 1K
INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... 1K
Add vision to any local LLM, no training. 1K
The Asynchronous Data Dynamo and Graph Neural Network Catalyst 1K
gguf conversion util 1K
Extension tools for akasha-terminal 1K
T5-Gemma 2 statement extraction demo - Extract structured statements from text 1K
Generate and edit images with Qwen models on Apple Silicon (MPS) and other devic... 948
A simple tool to convert an Ollama GGUF to HF model.safetensors and vice versa. 800
(no description) 713
General Information, model certifications, and benchmarks for nm-vllm enterprise... 666
Generate a llama-quantize command to copy the quantization parameters of any GGU... 609
An easier, safer way to write AI apps 595
representation engineering / control vectors 572
Project VAIL Model Registry 495
vLLM Kunlun3 backend plugin 464
A PDF document RAG MCP that is easy to set up, supports completely local parsing ... 439
A high-throughput and memory-efficient inference and serving engine for LLMs 437
LLM-Based Neural Network Generator 391
AI Image Generation Without the UI Tax - Z-Image-Turbo + Qwen3-4B 380
A high-throughput and memory-efficient inference and serving engine for LLMs 375
AMD-SHARK Inference Modeling and Serving 373
A high-throughput and memory-efficient inference and serving engine for LLMs 345
A high-throughput and memory-efficient inference and serving engine for LLMs 344
Atomic Model Fragmentation (AMF) — Molecular Inference Engine for resource-const... 315
Tool for auto-generating README documentation for code repositories 229
A modern GUI tool to convert HuggingFace safetensors to GGUF 214
Hardware-aware CLI that selects the best runtime and quantization for efficient ... 187
SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support 186
A high-throughput and memory-efficient inference and serving engine for LLMs 176
A high-throughput and memory-efficient inference and serving engine for LLMs 132