PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
Blaizzy
mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

349K 5K 506
Blaizzy
mlx-audio

A text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech processing on Apple Silicon.

78K 7K 578
filipstrand
mflux

MLX native implementations of state-of-the-art generative image models

40K 2K 141
cubist38
mlx-openai-server

A high-performance API server exposing OpenAI-compatible endpoints for MLX models. Built in Python on FastAPI, it offers an efficient, scalable, and user-friendly way to run MLX-based vision and language models locally behind an OpenAI-compatible interface.

22K 325 58
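"OpenAI-compatible endpoints" means standard chat-completions clients work against the local server unchanged. A minimal sketch of what such a request looks like; the port, path, and model name below are assumptions for illustration, not taken from mlx-openai-server's docs:

```python
import json

def chat_request(model, prompt, base_url="http://localhost:8000/v1"):
    """Build the URL and JSON body for an OpenAI-style chat completion.
    The default port and the /v1 path here are assumptions -- check the
    server's README for the real values."""
    url = f"{base_url}/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body)

url, payload = chat_request("mlx-community/Llama-3.2-3B-Instruct-4bit",
                            "Describe this repo in one line.")
```

Because the wire format matches OpenAI's, existing SDKs only need their base URL pointed at the local server.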
raullenchai
rapid-mlx

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

18K 635 74
ARahim3
mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

14K 1K 79
jjang-ai
jang

JANG — GGUF for MLX (requires the JANG_Q runtime). Adaptive mixed-precision quantization and runtime for Apple Silicon

9K 142 20
tlkh
asitop

Perf monitoring CLI tool for Apple Silicon

8K 5K 205
wst24365888
libstreamvbyte

A C++ implementation of StreamVByte, with Python bindings.

7K 10 1
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, cuts wall time for local agents by 46%, and cuts RAM usage to roughly a third.

7K 24 1
wnsgus00114-droid
lightning-core

Lightning Core: macOS-first CUDA-style runtime with Metal backend

7K 1 0
hellobertrand
zxc-compress

High-performance asymmetric lossless compression. 40%+ faster decompression than LZ4 on ARM64 with better compression ratios. Optimized for game assets, firmware, and app bundles.

6K 333 7
arizawan
vidlizer

Point it at a video, image, or PDF — get structured JSON. uvx vidlizer[mcp]. Runs local (Ollama/gemma4, LM Studio, oMLX) or cloud (OpenRouter). CLI + MCP server for Claude Code, Cursor, and Claude Desktop.

5K 1 0
tillahoffmann
jax-mps

A JAX backend for Apple Metal Performance Shaders (MPS), enabling GPU-accelerated JAX computations on Apple Silicon.

5K 122 13
simonsysun
seeklink

SeekLink — hybrid semantic search for markdown vaults. Four-channel RRF fusion, MLX reranker, native CJK support. Fully local.

3K 6 0
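The "RRF fusion" in seeklink's blurb refers to Reciprocal Rank Fusion, a standard way to merge ranked result lists from multiple retrieval channels. A generic sketch of the algorithm (not seeklink's code):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked result lists.
    Each document scores sum(1 / (k + rank)) across the lists it
    appears in; k=60 is the commonly used default constant."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "c" appears in every channel, so it rises to the top even though
# no single channel ranked it first.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"], ["a", "c"]])
```

RRF needs only ranks, not comparable scores, which is why it works well for fusing heterogeneous channels such as lexical and semantic search.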
manjunathshiva
turboquant-mlx-full

Extreme weight and KV cache compression for LLMs on Apple Silicon (MLX implementation of Google's TurboQuant)

3K 8 1
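Setting TurboQuant's specifics aside, the core idea behind weight and KV-cache compression is mapping floats to low-bit integers plus a shared scale. A toy symmetric 4-bit sketch of that idea (explicitly not TurboQuant's actual scheme):

```python
def quantize_int4(values):
    """Symmetric per-tensor 4-bit quantization: map each float to an
    integer in [-8, 7] using one shared scale. A toy illustration of
    the compression idea, not TurboQuant's algorithm."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # avoid div-by-zero
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate floats; error is bounded by about scale / 2."""
    return [x * scale for x in q]

q, scale = quantize_int4([0.9, -0.3, 0.05, -1.4])
approx = dequantize_int4(q, scale)
```

Storing 4-bit codes plus one scale in place of 32-bit floats is what shrinks weights and KV caches enough to fit larger models in unified memory.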
DarshanFofadiya
sparsecore

Actually-sparse dynamic training for PyTorch. CPU-native, Apple Silicon first. Pluggable routers, drop-in SparseLinear.

3K 8 2
geeks-accelerator
ollama-herd

Local AI load balancer for Ollama fleets — auto-discovery, smart routing, OpenAI-compatible API, zero config. Perfect for Mac Minis & Studios.

3K 7 0
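The load-balancing idea behind a tool like this can be reduced to routing each request to the next backend in a fleet. A minimal round-robin sketch (ollama-herd itself adds auto-discovery and smarter routing; the hostnames below are made up):

```python
import itertools

class RoundRobinBalancer:
    """Cycle requests across a fleet of backend URLs. A minimal sketch
    of the load-balancing idea, not ollama-herd's implementation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        # Each call returns the next URL in the fleet, wrapping around.
        return next(self._cycle)

lb = RoundRobinBalancer(["http://mac-mini-1:11434", "http://mac-mini-2:11434"])
targets = [lb.next_backend() for _ in range(4)]
```

A real balancer would layer health checks and per-backend load on top, but the OpenAI-compatible front end means clients never see which machine served them.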
vikranthreddimasu
macfleet

Pool Apple Silicon Macs for distributed compute and ML training

2K 0 0
mordechaipotash
brain-mcp

Your AI has amnesia. Persistent memory and cognitive context for AI. 25 MCP tools. 12ms recall.

2K 44 13
dualform-labs
m5-infer

Extraordinary speed, extraordinary quality — an MLX-based inference engine for Apple Silicon.

2K 0 1
weklund
mlx-stack

CLI control plane for local LLM infrastructure on Apple Silicon

2K 4 0
AlphaWaveSystems
tqai

TurboQuant KV cache compression for local LLM inference

2K 1 0
szibis
mlx-flash

Run AI models too large for your Mac's memory — expert caching, speculative execution, and 15+ research techniques for MoE inference on Apple Silicon

1K 2 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery