PyPI Stats

Datasets Python Packages

Python packages tagged with the GitHub topic "datasets", sorted by relevance, with stars and monthly downloads.
huggingface
datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

118.9M 21K 3K
akfamily
akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

2.7M 19K 3K
Arize-ai
arize-phoenix

AI Observability & Evaluation

2.2M 10K 850
Arize-ai
arize-phoenix-otel

AI Observability & Evaluation

1.7M 10K 850
tensorflow
tensorflow-datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

1.6M 5K 2K
Arize-ai
arize-phoenix-client

AI Observability & Evaluation

895K 10K 850
Arize-ai
arize-phoenix-evals

AI Observability & Evaluation

770K 10K 850
colour-science
colour-science

Colour Science for Python

447K 3K 290
tensorflow
tfds-nightly

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

286K 5K 2K
torchgeo
torchgeo

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

232K 4K 549
Mozilla-Data-Collective
datacollective

Python library for easily accessing Mozilla Data Collective datasets

170K 20 6
mims-harvard
pytdc

Therapeutics Commons (TDC): Multimodal Foundation for Therapeutic Science

131K 1K 211
simonw
datasette

An open source multi-tool for exploring and publishing data

121K 11K 829
Coloquinte
torchsr

Super Resolution datasets and models in PyTorch

118K 213 22
HumanSignal
label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

114K 27K 4K
snap-stanford
ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

92K 2K 407
Nixtla
datasetsforecast

Datasets for time series forecasting

86K 123 12
cleanlab
cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

58K 11K 890
JovianML
opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

53K 347 143
MinishLab
semhash

Fast Multimodal Semantic Deduplication & Filtering

53K 919 56
ibm
unitxt

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data for end-to-end AI benchmarking

34K 212 67
open-edge-platform
datumaro

Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.

28K 668 154
Farama-Foundation
minari

A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities

28K 510 63
autogluon
fev

Forecast evaluation library

26K 158 16
Data from PyPI, GitHub, ClickHouse, and BigQuery