35 dependents

| Description | Downloads/month |
| --- | --- |
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 143K |
| A collection of reference inference implementations for gpt-oss by OpenAI | 129K |
| A toolset for compressing, deploying and serving LLM | 123K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 31K |
| A high-performance API server that provides OpenAI-compatible endpoints for MLX ... | 22K |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| An open platform for training, serving, and evaluating large language model base... | 15K |
| Large-scale LLM inference engine | 7K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| A flexible command-line chat loop framework for building AI agents with support ... | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| vLLM CPU inference engine (AVX512 + VNNI optimized) | 3K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 3K |
| rnow CLI - Reinforcement Learning platform command-line interface | 2K |
| vLLM CPU inference engine (AVX512 optimized) | 2K |
| A fully local GPU poor, multimodal Retrieval-Augmented Generation (RAG) system ... | 2K |
| FuriosaAI SDK | 2K |
| INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... | 1K |
| A toolset for compressing, deploying and serving LLM | 887 |
| A fully local GPU poor, multimodal Retrieval-Augmented Generation (RAG) system ... | 496 |
| vLLM Kunlun3 backend plugin | 464 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 437 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 375 |
| A local/offline-capable voice assistant with speech recognition, LLM processing,... | 324 |
| Minimal OpenAI-compatible server for GPT-OSS models on Apple Silicon | 281 |
| LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization us... | 243 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| Any Message Could be a String, for LLM Usage | 180 |
| AI Agent with intelligent planning and reasoning | 144 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 115 |
| Add your description here | 94 |
|  | 71 |
| SGLang is a fast serving framework for large language models and vision language... | 2 |