Local inference server for Apple Silicon — hot-swaps MLX models (LLM, vision, embeddings, TTS, STT) via OpenAI API