Llama Cpp Python Packages

cyllama

A thin Cython wrapper around llama.cpp, whisper.cpp, and stable-diffusion.cpp

35K 25 19

lilbee

Terminal-first local search and AI chat over your documents, code, and crawled websites. Semantic + hybrid search, vision OCR, auto-built wiki, browsable GGUF model catalog. Works as CLI, TUI, MCP server, REST API, or Python library. Offline by default, no sidecar services.

20K 16 3

cyllama-rocm

A thin Cython wrapper around llama.cpp, whisper.cpp, and stable-diffusion.cpp

15K 25 19

cyllama-cuda12

A thin Cython wrapper around llama.cpp, whisper.cpp, and stable-diffusion.cpp

15K 25 19

cyllama-vulkan

A thin Cython wrapper around llama.cpp, whisper.cpp, and stable-diffusion.cpp

14K 25 19

cyllama-sycl

A thin Cython wrapper around llama.cpp, whisper.cpp, and stable-diffusion.cpp

13K 25 19

llama-cpp-py-sync

Auto-synced CFFI ABI Python bindings for llama.cpp with prebuilt wheels (CPU/CUDA/Vulkan/Metal).

8K 4 1

palacelite

Lightweight local AI memory system

4K 3 0

egovault

Local-first personal data vault with LLM enrichment and RAG chat

3K 0 0

llama-buddy

CLI wrapper for llama.cpp providing an ollama-like experience

3K 8 0

edgevox

Offline voice agent framework for robots.

3K 4 0

tightwad

Mixed-vendor GPU inference cluster manager with speculative decoding

3K 20 2

sollol

Super Ollama Load Balancer - Performance-aware routing for distributed Ollama deployments with Ray, Dask, and adaptive metrics

2K 4 2

mycellm

Distributed LLM inference across heterogeneous hardware. Pool GPUs into a P2P network with QUIC transport, Ed25519 identity, and an OpenAI-compatible API.

2K 4 1

forge-guardrails

A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows

2K 1 0

inzen-bot

Multi-provider AI chatbot for the terminal. Chat with Claude, GPT-4, Ollama, or llama.cpp without leaving your shell. Full conversation history. Install with: pip install inzen-bot

1K 0 0

openai-agents-redis

Native OpenAI Agents SDK session management implementation using Redis as the persistence layer.

1K 16 4

notolog

Notolog Markdown Editor

1K 25 6

grammarflow

Powering Agent Chains by Constraining LLM Outputs

699 9 0

llama-server

LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.

695 135 14

langport

A large language model serving platform.

597 94 13

inferna

An early-stage experimental nanobind wrapper around llama.cpp

531 0 0

mcp-llama-swap

MCP server for hot-swapping llama.cpp models in Claude Code - launchctl (macOS) + systemd (Linux)

463 5 1

cognitive-cache

Optimal context window selection for LLM coding tools. Treats context as a constrained optimization problem, not retrieval. Beats RAG, grep, and LLM-triage baselines on real GitHub issues.

351 2 0
