16 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.4M |
| TensorRT LLM provides users with an easy-to-use Python API to define Large Langu... | 16K |
| Large-scale LLM inference engine | 7K |
| Smash your AI models - Pro Version | 6K |
| SGLang is a high-performance serving framework for large language models and mul... | 4K |
| Building the Virtuous Cycle for AI-driven LLM Systems | 2K |
| NeMo Export and Deploy - a library to export and deploy LLMs and MMs | 1K |
| Single-user optimized inference wrapper for ExLlamaV3 | 260 |
| The little (LLM) engine that could! | 206 |
| A multi-dimensional information extraction system that uses LLMs to extract temp... | 187 |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |
| Kimi K2.5 (1.1T) optimization suite for RTX 3090 with aggressive RAM optimizatio... | 138 |
| A high-throughput and low-latency LLM inference system | 125 |
| A serving system for speech language models. | 109 |
| A high-performance inference framework for large language models, focusing on ef... | 75 |