PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
cafitac
cafitac-hermit-agent

Quiet MCP executor for Claude Code and Codex. Offload edits, tests, refactors, and commits to cheaper local or flat-rate models.

24K 3 1
tobocop2
lilbee

Terminal-first local search and AI chat over your documents, code, and crawled websites. Semantic + hybrid search, vision OCR, auto-built wiki, browsable GGUF model catalog. Works as CLI, TUI, MCP server, REST API, or Python library. Offline by default, no sidecar services.

20K 16 3
raullenchai
rapid-mlx

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.

18K 635 74
LearningCircuit
local-deep-research

Local Deep Research achieves ~95% on SimpleQA benchmark (tested with Qwen 3.6). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.

15K 5K 427
ARahim3
mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

14K 1K 79
MakazhanAlpamys
soup-cli

Soup turns the pain of LLM fine-tuning into a simple workflow. One config, one command, done.

11K 53 7
skynetcmd
m3-memory

Local-first Agentic Memory Layer Framework for MCP Agents and Multiple Computers • Over 60 tools • Hybrid search (FTS5 + vector + MMR) • GDPR • 100% local

9K 11 1
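Several entries above (lilbee, m3-memory) advertise hybrid search with MMR re-ranking. As a rough illustration of the general technique, not these packages' actual implementations, here is a minimal dependency-free sketch that blends a keyword-overlap score with vector cosine similarity, then applies maximal marginal relevance to favor relevant but non-redundant results:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    # Fraction of query tokens that appear in the document text.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_search(query, q_vec, docs, alpha=0.5, k=2, mmr_lambda=0.7):
    # docs: list of (text, embedding) pairs.
    # Blend keyword and vector scores with weight alpha.
    scored = [(alpha * keyword_score(query, text) +
               (1 - alpha) * cosine(q_vec, vec), text, vec)
              for text, vec in docs]
    # MMR: greedily pick results that score high but are dissimilar
    # to results already selected.
    selected, rest = [], list(scored)
    while rest and len(selected) < k:
        best = max(rest, key=lambda s: mmr_lambda * s[0] -
                   (1 - mmr_lambda) * max((cosine(s[2], t[2])
                                           for t in selected), default=0.0))
        selected.append(best)
        rest = [r for r in rest if r is not best]
    return [text for _, text, _ in selected]
```

Real systems replace `keyword_score` with a BM25/FTS5 index and use learned embeddings, but the blend-then-diversify shape is the same.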
jonigl
mcp-client-for-ollama

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-loop, thinking mode, model params config, MCP prompts, custom system prompt and saved preferences. Built for developers working with local LLMs.

8K 684 92
0x-auth
bazinga-indeed

BAZINGA - Distributed AI that belongs to everyone

7K 1 0
tanavc1
llm-autotune

Zero-config local LLM optimization for Ollama, LM Studio, and Apple Silicon MLX. Reduces TTFT by 40%, wall time for local agents by 46%, and RAM usage by 3x.

7K 24 1
jonigl
ollmcp

A text-based user interface (TUI) client for interacting with MCP servers using Ollama. Features include agent mode, multi-server, model switching, streaming responses, tool management, human-in-the-loop, thinking mode, model params config, MCP prompts, custom system prompt and saved preferences. Built for developers working with local LLMs.

7K 684 92
kytmanov
obsidian-llm-wiki

Karpathy’s LLM Wiki, 100% local with Ollama. Drop Markdown notes → AI extracts concepts → your Obsidian wiki auto-links and grows. Zero sharing. Your notes stay yours.

7K 458 80
rahulsiiitm
vidchain

A lightweight video RAG framework for multimodal reasoning

5K 1 0
fajknli
palacelite

Lightweight local AI memory system

4K 3 0
day50-dev
llcat

/usr/bin/cat for LLMs

4K 33 1
tolitius
cupel

Separates precious LLMs from base LLMs. Works with any OpenAI- or Anthropic-compatible API

4K 45 0
thilomichael
llama-buddy

CLI wrapper for llama.cpp providing an ollama-like experience

3K 8 0
youngharold
tightwad

Mixed-vendor GPU inference cluster manager with speculative decoding

3K 20 2
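tightwad's speculative decoding refers to a standard inference trick: a cheap draft model proposes a block of tokens, and the expensive target model verifies them, keeping the longest agreeing prefix. A toy greedy-decoding sketch of that loop, with models represented as plain context-to-token functions (not tightwad's actual code):

```python
def speculative_decode(target, draft, prompt, num_draft=4, max_new=8):
    """Greedy speculative decoding: the draft model proposes num_draft
    tokens; the target model verifies them one by one, accepting while
    it agrees and substituting its own token at the first mismatch."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft proposes a block of tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(num_draft):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # Target verifies the block; on a mismatch, its token wins.
        for tok in proposal:
            expected = target(out)
            if tok == expected:
                out.append(tok)
            else:
                out.append(expected)
                break
        else:
            # Whole block accepted: target contributes one bonus token.
            out.append(target(out))
        out = out[:len(prompt) + max_new]
    return out
```

The output is identical to running the target model alone; the draft model only changes how many cheap proposals get accepted per expensive verification round.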
geeks-accelerator
ollama-herd

Local AI load balancer for Ollama fleets — auto-discovery, smart routing, OpenAI-compatible API, zero config. Perfect for Mac Minis & Studios.

3K 7 0
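Several of these projects (rapid-mlx, ollama-herd, ollama-mcp-bridge) describe themselves as OpenAI-compatible, meaning a standard chat-completions client can simply be pointed at a local base URL. A stdlib-only sketch of such a request; the URL (Ollama's default port) and model name are placeholders to adapt to whatever your local server exposes:

```python
import json
import urllib.request

# Placeholder endpoint and model -- substitute what your local
# server (Ollama, LM Studio, a load balancer, ...) actually serves.
BASE_URL = "http://localhost:11434/v1"
MODEL = "llama3"

def chat_request(messages, model=MODEL, base_url=BASE_URL):
    """Build a standard OpenAI-style /chat/completions request.
    Sending it (urllib.request.urlopen) requires a running server."""
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 # Many local servers ignore the key, but OpenAI
                 # clients always send one.
                 "Authorization": "Bearer not-needed"},
        method="POST",
    )

req = chat_request([{"role": "user", "content": "hello"}])
```

Because the wire format is the same, tools built against the OpenAI API (Claude Code, Cursor, Aider, the `openai` Python client via `base_url=`) work against these servers unchanged.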
HZYAI
ragscore

⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.

2K 31 5
MRWillisT
pullnexus

Pull-on-demand skill registry for local LLMs. Search, install, and contribute skills, tools, datasets, and playbooks for Ollama, LM Studio, and any local AI setup.

2K 1 0
jonigl
ollama-mcp-bridge

Extend the Ollama API with dynamic AI tool integration from multiple MCP (Model Context Protocol) servers. Fully compatible, transparent, and developer-friendly; ideal for building powerful local LLM applications, AI agents, and custom chatbots.

2K 83 25
KEYHAN-A
local-ai-agent-orchestrator

Multi-agent local LLM orchestrator for LM Studio: planner→coder→reviewer, SQLite queue, memory-safe model swaps. GPL-3.0 · CLI: lao

2K 2 0
dualform-labs
m5-infer

Extraordinary speed, extraordinary quality — an MLX-based inference engine for Apple Silicon.

2K 0 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery