PyPI Stats

Dataset Python Packages

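Python packages with the GitHub topic "dataset", sorted by relevance. Each entry lists the GitHub owner, the PyPI package name, and the repository description. These are followed by three figures: monthly downloads, GitHub stars, and GitHub forks.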
joke2k
faker

Faker is a Python package that generates fake data for you.

69.9M 19K 2K
tensorflow
tensorflow-io-gcs-filesystem

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO

6.7M 735 308
ashvardanian
stringzilla

Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖

3.2M 3K 124
tensorflow
tensorflow-datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

1.6M 5K 2K
pytorch
torchtext

Models, data loaders and abstractions for language processing, powered by PyTorch

948K 4K 812
smarie
pytest-cases

Separate test code from test cases in pytest.

939K 373 41
pydata
pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

843K 3K 692
fastai
fastdownload

Easily download, verify, and extract archives

766K 47 12
allenai
ir-datasets

Provides a common interface to many IR ranking datasets.

580K 389 52
mosaicml
mosaicml-streaming

A Data Streaming Library for Efficient Neural Network Training

517K 2K 188
colour-science
colour-science

Colour Science for Python

447K 3K 290
tensorflow
tensorflow-io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO

391K 735 308
quandl
quandl

Package for Quandl API access

309K 1K 338
tensorflow
tfds-nightly

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

286K 5K 2K
scipp
scipp

Multi-dimensional data arrays with labeled dimensions

136K 143 22
HumanSignal
label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

114K 27K 4K
palewire
cpi

Quickly adjust U.S. dollars for inflation using the Consumer Price Index (CPI)

105K 142 23
mlmed
torchxrayvision

TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segmentation, and autoencoders.

94K 1K 248
segments-ai
segments-ai

Segments.ai Python SDK

86K 27 10
datalad
datalad

Keep code, data, and containers under control with git and git-annex

86K 637 127
cvat-ai
cvat-sdk

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

85K 16K 4K
joke2k
fake-factory

Faker is a Python package that generates fake data for you.

74K 19K 2K
tensorflow
tensorflow-io-nightly

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO

72K 735 308
neuspell
neuspell

NeuSpell: A Neural Spelling Correction Toolkit

52K 712 106
Data from PyPI, GitHub, ClickHouse, and BigQuery