12 dependents
| Description | Downloads/month |
|---|---|
| Community maintained hardware plugin for vLLM on Ascend | 7K |
| DeepLink Inference Extension | 1K |
| SiliconDiff-NPU | 580 |
| High-performance FlashAttention implementation for Ascend NPU | 486 |
| Ascend End-to-End Large Model Training Adaptation Framework Based on torchtitan | 481 |
| Ascend rapid migration adaptation package | 387 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 180 |
| A lightweight vLLM implementation built from scratch that runs on NPU. | 137 |
| triton for dsa | 72 |
| The openmind-accelerate is a product which allows you to use NVIDIA Megatron-LM ... | 63 |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 51 |
| vLLM Ascend backend plugin | 36 |