Actually-sparse dynamic training for PyTorch. CPU-native, Apple Silicon first. Pluggable routers, drop-in SparseLinear.
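To make "actually-sparse dynamic training" concrete, here is a minimal sketch of one dynamic-sparsity step in the RigL style (drop the smallest-magnitude active weights, regrow connections where the gradient is largest). This is an illustrative NumPy toy, not the project's `SparseLinear` API; all function names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_mask(scores: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask keeping the k largest entries of `scores`."""
    flat = scores.ravel()
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[idx] = True
    return mask.reshape(scores.shape)

def prune_and_regrow(w, mask, grad, update_frac=0.3):
    """One dynamic-sparsity step (RigL-style, illustrative):
    drop the smallest-magnitude active weights, then regrow the same
    number of inactive connections where |grad| is largest."""
    k_active = int(mask.sum())
    n_update = int(update_frac * k_active)
    # Keep only the largest-magnitude currently-active weights.
    keep = topk_mask(np.where(mask, np.abs(w), -np.inf), k_active - n_update)
    # Regrow where the gradient is largest among non-kept positions.
    grow = topk_mask(np.where(~keep, np.abs(grad), -np.inf), n_update)
    new_mask = keep | grow
    w = np.where(new_mask & ~mask, 0.0, w)  # regrown weights start at zero
    return w * new_mask, new_mask

w = rng.normal(size=(64, 64)).astype(np.float32)
grad = rng.normal(size=w.shape).astype(np.float32)
mask = topk_mask(np.abs(w), k=410)            # ~10% density
w, mask = prune_and_regrow(w * mask, mask, grad)
print(mask.mean())  # density is unchanged: same number of active weights
```

The key property of dynamic (as opposed to static) sparsity is visible in the last line: the number of active weights stays fixed while *which* weights are active changes over training.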
High-performance CPU KV-cache quantization engine for LLM inference (~10× speedup, 4× memory reduction) with Python & PyTorch support.
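The memory-reduction figure is consistent with storing cache entries at a quarter of their original width (e.g. fp32 values as int8, or fp16 as 4-bit). The sketch below shows one such scheme, symmetric per-token int8 quantization of a fake KV tensor, purely as an assumed illustration; the engine's actual quantization format and kernels are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int8(x: np.ndarray, axis: int = -1):
    """Symmetric int8 quantization with one scale per slice along `axis`
    (an illustrative scheme, not the engine's actual format)."""
    scale = np.abs(x).max(axis=axis, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

# Fake KV cache: (heads, seq_len, head_dim) in fp32.
kv = rng.normal(size=(8, 128, 64)).astype(np.float32)
q, scale = quantize_int8(kv)

ratio = kv.nbytes / (q.nbytes + scale.nbytes)   # ~3.8x, close to 4x
err = np.abs(dequantize(q, scale) - kv).max()   # small reconstruction error
print(round(ratio, 2), float(err))
```

Note the ratio is slightly under 4× because the per-token fp32 scales must be stored alongside the int8 payload; coarser scale granularity trades accuracy for a ratio closer to the ideal.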