35 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| A collection of reference inference implementations for gpt-oss by OpenAI | 129K |
| A toolset for compressing, deploying and serving LLM | 123K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| A high-performance API server that provides OpenAI-compatible endpoints for MLX ... | 22K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| An open platform for training, serving, and evaluating large language model base... | 15K |
| Large-scale LLM inference engine | 7K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| A flexible command-line chat loop framework for building AI agents with support ... | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| rnow CLI - Reinforcement Learning platform command-line interface | 2K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| A fully local GPU poor, multimodal Retrieval-Augmented Generation (RAG) system ... | 2K |
| FuriosaAI SDK | 2K |
| INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K |
| A toolset for compressing, deploying and serving LLM | 887 |
| A fully local GPU poor, multimodal Retrieval-Augmented Generation (RAG) system ... | 496 |
| vLLM Kunlun3 backend plugin | 464 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| A local/offline-capable voice assistant with speech recognition, LLM processing,... | 324 |
| Minimal OpenAI-compatible server for GPT-OSS models on Apple Silicon | 281 |
| LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization us... | 243 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| Any Message Could be a String, for LLM Usage | 180 |
| AI Agent with intelligent planning and reasoning | 144 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 115 |
| Add your description here | 94 |
| | 71 |
| SGLang is a fast serving framework for large language models and vision language... | 2 |