12 dependents
Description                                                                          Downloads/month
Community maintained hardware plugin for vLLM on Ascend                              7K
DeepLink Inference Extension                                                         1K
SiliconDiff-NPU                                                                      580
High-performance FlashAttention implementation for Ascend NPU                        486
Ascend end-to-end large-model training adaptation framework based on torchtitan      481
Ascend quick migration adaptation package                                            387
A high-throughput and memory-efficient inference and serving engine for LLMs         180
A lightweight vLLM implementation built from scratch that runs on NPU                137
Triton for DSA                                                                       72
The openmind-accelerate is a product which allows you to use NVIDIA Megatron-LM ...  63
A high-throughput and memory-efficient inference and serving engine for LLMs         51
vLLM Ascend backend plugin                                                           36