49 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| A toolset for compressing, deploying and serving LLM | 123K |
| A simple and powerful tool to get things done with AI | 79K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| MindRoot AI Agent Framework | 14K |
| Accelerate, Optimize performance with streamlined training and serving options w... | 9K |
| Large-scale LLM inference engine | 7K |
| The official zero-trust, high-throughput kinetic execution engine for the coreas... | 4K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| Python component of using Briton | 4K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| Offline voice agent framework for robots. | 3K |
| Open-source framework for building AI-powered apps in JavaScript, Go, and Python... | 3K |
| fastllm is a high-performance large-model inference library with no backend dependencies. It supports tensor-parallel inference of dense models and mixed-mode inference of MoE models; any GPU with 10 GB+ of memory can run full DeepSeek. Dual-socket 900... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| FuriosaAI SDK | 2K |
| INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K |
| A complete terminal implementation of Anthropic's Claude. | 1K |
| Useful utilities for prompt engineering | 1K |
| A toolset for compressing, deploying and serving LLM | 887 |
| Siada CLI is an AI pair programming tool in the terminal | 689 |
| General Information, model certifications, and benchmarks for nm-vllm enterprise... | 666 |
| vLLM Kunlun3 backend plugin | 464 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| | 353 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 344 |
| JAX backend for SGL | 295 |
| Modular Multimodal Intelligent Reformatting and Augmentation Generation Engine -... | 260 |
| A tool for LLM agent conversations | 219 |
| SGLang is yet another fast serving framework for large language models and visio... | 209 |
| SkillEngine — framework-agnostic skills engine for LLM agents. Claude Code-like ... | 192 |
| A minimal wrapper for the Google Gemini (google-genai) API | 189 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 176 |
| Convert infrastructure scans into various output formats such as Markdown tables... | 151 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 132 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 115 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 80 |
| Inferencing and Training Large Language Model Tasks | 73 |
| Genkit AI Framework | 70 |
| An agent framework using LLMs | 56 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 42 |
| SGLang is a fast serving framework for large language models and vision language... | 2 |