16 dependents
Description | Downloads/month
SGLang is a high-performance serving framework for large language models and mul... | 287.7M
A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M
TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K
Large-scale LLM inference engine | 7K
Smash your AI models - Pro Version | 6K
SGLang is a high-performance serving framework for large language models and mul... | 4K
Building the Virtuous Cycle for AI-driven LLM Systems | 2K
NeMo Export and Deploy - a library to export and deploy LLMs and MMs | 1K
Single-user optimized inference wrapper for ExLlamaV3 | 260
The little (LLM) engine that could! | 206
A multi-dimensional information extraction system that uses LLMs to extract temp... | 187
SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186
Kimi K2.5 (1.1T) optimization suite for RTX 3090 with aggressive RAM optimizatio... | 138
A high-throughput and low-latency LLM inference system | 125
A serving system for speech language models. | 109
A high-performance inference framework for large language models, focusing on ef... | 75