175 dependents
| Package | Description | Downloads/month |
|---|---|---|
| | TurboQuant KV cache compression plugin for vLLM — asymmetric K/V, 8 models valid... | 8K |
| | [ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI | 6K |
| | Text2Text Language Modeling Toolkit | 6K |
| | RankLLM is a Python toolkit for reproducible information retrieval research usin... | 5K |
| | [NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memor... | 5K |
| | The official zero-trust, high-throughput kinetic execution engine for the coreas... | 4K |
| | vLLM plugin for RBLN NPU | 4K |
| | vLLM plugin for Qwerky AI MambaInLlama hybrid models | 4K |
| | Serve MESA models locally | 3K |
| | A Python package for simulating hospital administrative tasks. | 3K |
| | vLLM plugin for Spyre hardware support | 3K |
| | Towards Human-Sounding Speech | 3K |
| | factory SDK | 3K |
| | SEC filings and Earnings call transcripts data | 3K |
| | Crilla is a simple way to introduce optimized single-GPU training into your proj... | 3K |
| | | 2K |
| | The easiest way to serve AI apps and models - Build Model Inference APIs, Job qu... | 2K |
| | | 2K |
| | Roboreason package | 2K |
| | Benchmarking the guided infilling models. | 2K |
| | A framework for optimizing DSPy programs with RL | 2K |
| | Agent-as-Annotators: Structured Distillation of Web Agent Capabilities | 1K |
| | Qwen3 ASR model for fasr | 1K |
| | vLLM adapter for a TGIS-compatible grpc server | 1K |
| | vLLM plugin for Spyre hardware support | 1K |
| | | 1K |
| | llama-index embeddings vllm integration | 1K |
| | This is a faster implementation for TTS models, to be used in highly async envir... | 1K |
| | An official repository for PatientSim package. | 1K |
| | A unified inference engine for large language models (LLMs) including open-sourc... | 1K |
| | An infinitely scalable text world for evaluating long-term memory in LLM agents | 983 |
| | Official implementation of TopicGPT: A Prompt-based Topic Modeling Framework (NA... | 947 |
| | Graph Foundation Model for Retrieval Augmented Generation | 945 |
| | | 915 |
| | An easy-to-extend LLM annotator for robust, resumable data annotation. | 901 |
| | happy_vllm is a REST API for vLLM, production ready | 894 |
| | Automatic configuration planner for vLLM - Eliminate the guesswork of configurin... | 891 |
| | A library helps to chat with all kinds of LLMs consistently. | 884 |
| | Converting longitudinal patient data into text for LLM-based event prediction an... | 869 |
| | Hackable RL post-training for LLMs | 808 |
| | 🌾 OAT: A research-friendly framework for LLM online alignment, including reinfor... | 788 |
| | LLAMP - Large Language Model for Planning | 736 |
| | vLLM plugin for interacting with activations during inference | 715 |
| | An asynchronous chat engine using vLLM with a async producer-consumer pattern. | 708 |
| | A Unified, Customizable, and High-Performing Open-Source Toolkit for Prompt Opti... | 705 |
| | ThinkBooster: a unified framework for test-time compute scaling of LLM reasoning | 683 |
| | A package for creating ML research assistant models through paper dataset creati... | 673 |
| | Efficient deep learning models for the edge. | 615 |
| | vLLM plugin: out-of-tree registration of canon-layer architectures (e.g. LlamaCa... | 576 |
| | | 568 |