PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.

165K 8K 472
PaddlePaddle
paddlenlp

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

36K 13K 3K
PaddlePaddle
tool-helpers

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

10K 13K 3K
shcherbak-ai
contextgem

ContextGem: Effortless LLM extraction from documents

9K 2K 155
PaddlePaddle
fast-dataindex

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

8K 13K 3K
vectorlessflow
vectorless

Knowing by reasoning, not vectors. ⭐ Star this repo if you find it useful.

6K 29 2
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
ENDEVSOLS
longparser

Privacy-first document intelligence engine — converts PDFs, DOCX, PPTX, XLSX, and CSV into AI-ready Markdown + structured JSON for RAG pipelines.

2K 15 1
infly-ai
infinity-parser2

INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced document intelligence.

1K 141 13
PaddlePaddle
fast-tokenizer-python

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

1K 13K 3K
PaddlePaddle
faster-tokenizer

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

956 13K 3K
AiAgentKarl
document-intelligence-mcp

Local document intelligence MCP server — extract text, tables, metadata from PDF and DOCX. No API key needed.

595 0 0
Orbifold
knwler

Knwler is a lightweight, single-file Python tool that extracts structured knowledge graphs from documents using AI. Feed it a PDF or text file and receive a richly connected network of entities, relationships, and topics — complete with an interactive HTML report and exports ready for your favorite graph analytics platform.

561 123 10
kreuzberg-dev
langchain-kreuzberg

Kreuzberg document loader for LangChain — extract text from 88+ file formats with true async and rich metadata

461 4 0
PaddlePaddle
paddle-pipelines

Paddle-Pipelines: An End-to-End Natural Language Processing Development Kit Based on PaddleNLP

430 13K 3K
arnav2
ks-xlsx-parser

XLSX parser for LLMs, RAG, LangChain, LangGraph, CrewAI, Claude, MCP — turns Excel (.xlsx) into citation-ready JSON with formulas, charts, dependency graphs, and token-counted chunks. Open-source Python library (MIT).

372 17 2
PaddlePaddle
faster-tokenizers

PaddleNLP Faster Tokenizer Library written in C++

248 13K 3K
echology-io
decompose-mcp

The missing cognitive primitive for AI agents. Structured intelligence from any text.

186 9 2
Goldziher
mseep-kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via the CLI, REST API, or MCP server.

162 8K 477
Data from PyPI, GitHub, ClickHouse, and BigQuery