PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
shakfu
cyllama

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

35K 25 19
tobocop2
lilbee

Terminal-first local search and AI chat over your documents, code, and crawled websites. Semantic + hybrid search, vision OCR, auto-built wiki, browsable GGUF model catalog. Works as CLI, TUI, MCP server, REST API, or Python library. Offline by default, no sidecar services.

20K 16 3
shakfu
cyllama-rocm

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

15K 25 19
shakfu
cyllama-cuda12

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

15K 25 19
shakfu
cyllama-vulkan

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

14K 25 19
shakfu
cyllama-sycl

A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp

13K 25 19
FarisZahrani
llama-cpp-py-sync

Auto-synced CFFI ABI python bindings for llama.cpp with prebuilt wheels (CPU/CUDA/Vulkan/Metal).

8K 4 1
fajknli
palacelite

Lightweight local AI memory system

4K 3 0
milika
egovault

Local-first personal data vault with LLM enrichment and RAG chat

3K 0 0
thilomichael
llama-buddy

CLI wrapper for llama.cpp providing an ollama-like experience

3K 8 0
nrl-ai
edgevox

Offline voice agent framework for robots.

3K 4 0
youngharold
tightwad

Mixed-vendor GPU inference cluster manager with speculative decoding

3K 20 2
BenevolentJoker-JohnL
sollol

Super Ollama Load Balancer - Performance-aware routing for distributed Ollama deployments with Ray, Dask, and adaptive metrics

2K 4 2
mycellm
mycellm

Distributed LLM inference across heterogeneous hardware. Pool GPUs into a P2P network with QUIC transport, Ed25519 identity, and an OpenAI-compatible API.

2K 4 1
antoinezambelli
forge-guardrails

A Python framework for self-hosted LLM tool-calling and multi-step agentic workflows

2K 1 0
benwalkerai
inzen-bot

Multi-provider AI chatbot for the terminal. Chat with Claude, GPT-4, Ollama or llama.cpp without leaving your shell. Full conversation history. pip install inzen-bot

1K 0 0
rafaelpierre
openai-agents-redis

Native OpenAI Agents SDK session management implementation using Redis as the persistence layer.

1K 16 4
notolog
notolog

Notolog Markdown Editor

1K 25 6
e-lab
grammarflow

Powering Agent Chains by Constraining LLM Outputs

699 9 0
nuance1979
llama-server

LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.

695 135 14
vtuber-plan
langport

A large language model serving platform.

597 94 13
shakfu
inferna

An early-stage experimental nanobind wrapper around llama.cpp

531 0 0
oussama-kh
mcp-llama-swap

MCP server for hot-swapping llama.cpp models in Claude Code - launchctl (macOS) + systemd (Linux)

463 5 1
Anyesh
cognitive-cache

Optimal context window selection for LLM coding tools. Treats context as a constrained optimization problem, not retrieval. Beats RAG, grep, and LLM-triage baselines on real GitHub issues.

351 2 0
Data from PyPI, GitHub, ClickHouse, and BigQuery