Mixed-vendor GPU inference cluster manager with speculative decoding
Local AI load balancer for Ollama fleets: auto-discovery, smart routing, an OpenAI-compatible API, and zero config. Perfect for Mac minis and Mac Studios.
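Because the balancer exposes an OpenAI-compatible API, any standard chat-completions client can talk to it without knowing which node serves the request. A minimal sketch, assuming the balancer listens on `localhost:8080` and serves the usual `/v1/chat/completions` route (the URL and model name here are illustrative, not part of the project description):

```python
import json
import urllib.request

# Hypothetical balancer address: with auto-discovery, nodes register
# themselves, so clients only ever need this single endpoint (assumption).
BALANCER_URL = "http://localhost:8080/v1/chat/completions"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request aimed at the balancer."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BALANCER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request (urllib.request.urlopen(req)) would hit whichever
# Ollama node the balancer routes to; the response follows the same
# OpenAI chat-completions schema.
req = build_request("llama3.2", "Hello!")
print(req.full_url)
```

Zero config here means the client code is identical whether one Mac mini or a whole fleet sits behind the endpoint.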
Small Language Model inference, fine-tuning, and observability.