PyPI Stats

Unstructured Data Python Packages

Python packages with the GitHub topic unstructured-data. Sorted by relevance, with stars and monthly downloads.
treeverse
dvc

🦉 Data Versioning and ML Experiments

3M 16K 1K
nuclia
nucliadb-utils

NucliaDB, The AI Search database for RAG

196K 720 58
voxel51
fiftyone

Refine high-quality datasets and visual AI models

178K 11K 752
voxel51
fiftyone-db

Refine high-quality datasets and visual AI models

171K 11K 752
nuclia
nucliadb-models

NucliaDB, The AI Search database for RAG

144K 720 58
nuclia
nucliadb-protos

NucliaDB, The AI Search database for RAG

143K 720 58
nuclia
nucliadb-telemetry

NucliaDB, The AI Search database for RAG

129K 720 58
nuclia
nucliadb-dataset

NucliaDB, The AI Search database for RAG

125K 720 58
nuclia
nucliadb

NucliaDB, The AI Search database for RAG

109K 720 58
garyelephant
pygrok

Python implementation of jordansissel's grok regular expression library

104K 284 74
kodexa-ai
kodexa

Kodexa Python Client

99K 5 1
nuclia
nucliadb-sdk

NucliaDB, The AI Search database for RAG

61K 720 58
yobix-ai
extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

58K 2K 96
nomic-ai
nomic

Nomic Developer API SDK

51K 2K 197
datachain-ai
datachain

Data Memory: the operational data context layer for AI agents - typed, versioned datasets over images, video, docs and tables

49K 3K 140
nuclia
nidx-protos

NucliaDB, The AI Search database for RAG

36K 720 58
nuclia
nidx-binding

Bindings for nidx (part of nucliadb)

11K 720 58
shcherbak-ai
contextgem

ContextGem: Effortless LLM extraction from documents

9K 2K 155
mitdbg
palimpzest

A System for Optimized Semantic Computation

6K 214 44
Zipstack
unstract-sdk

A framework for writing Unstract Tools/Apps

5K 23 1
amphi-ai
jupyterlab-amphi

Visual data prep powered by Python

3K 1K 106
voxel51
fiftyone-db-ubuntu2204

Refine high-quality datasets and visual AI models

3K 11K 752
emcf
thepipe-api

Get clean data from tricky documents, powered by vision-language models ⚡

3K 2K 99
towhee-io
towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

2K 3K 261
Data from PyPI, GitHub, ClickHouse, and BigQuery