[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
On-device LLM Inference Powered by X-Bit Quantization
Explorations of recent techniques for speculative decoding
picoLLM Inference Engine demos