[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
On-device LLM Inference Powered by X-Bit Quantization
Explorations of recent techniques for speculative decoding
picoLLM Inference Engine demos