PyPI Stats

Data Python Packages

Python packages tagged with the GitHub topic "data", sorted by relevance. Each entry shows monthly downloads and GitHub stars.
fatiando
pooch

A friend to fetch your data files

18.1M 722 87
mahmoud
glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

16.1M 2K 72
PrefectHQ
prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

12.6M 22K 2K
kayak
pypika

PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

12.5M 3K 330
run-llama
llama-index

LlamaIndex is the leading document agent and OCR platform

10.5M 49K 7K
dlt-hub
dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

7.3M 5K 498
run-llama
llama-index-core

LlamaIndex is the leading document agent and OCR platform

7.3M 49K 7K
PrefectHQ
prefect-aws

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

6M 22K 2K
run-llama
llama-index-instrumentation

LlamaIndex is the leading document agent and OCR platform

3.9M 49K 7K
thombashi
dataproperty

A Python library for extracting properties from data.

3.1M 16 5
akfamily
akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

2.7M 19K 3K
capitalone
datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

2.6M 639 160
iterative
dvc-data

DVC's data management subsystem

2.2M 18 28
lk-geimfari
mimesis

Mimesis is a fast Python library for generating fake data in multiple languages.

1.9M 5K 359
PrefectHQ
prefect-docker

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

1.6M 22K 2K
tensorflow
tensorflow-datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

1.6M 5K 2K
octoenergy
tentaclio

Single repository regrouping IO connectors used in the data world.

1.5M 30 2
foxglove
mcap

MCAP is a modular, performant, and serialization-agnostic container file format, useful for pub/sub and robotics applications.

1.5M 915 200
PrefectHQ
prefect-dbt

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

1.2M 22K 2K
PrefectHQ
prefect-gcp

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

1.2M 22K 2K
meltano
meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

1.2M 2K 235
smarie
pytest-cases

Separate test code from test cases in pytest.

939K 373 41
datafold
collate-data-diff

Compare tables within or across databases

936K 3K 305
PrefectHQ
prefect-sqlalchemy

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

884K 22K 2K
Data from PyPI, GitHub, ClickHouse, and BigQuery