PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter them by metrics.
voxel51
fiftyone

Refine high-quality datasets and visual AI models

179K 11K 752
voxel51
fiftyone-db

Refine high-quality datasets and visual AI models

169K 11K 752
cleanlab
cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

58K 11K 890
visualdatabase
fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

11K 2K 87
cleanlab
cleanlab-studio

Client interface to Cleanlab Studio

4K 31 10
voxel51
fiftyone-db-ubuntu2204

Refine high-quality datasets and visual AI models

3K 11K 752
Digital-Dermatology
selfclean

[NeurIPS 2024] 🧼🔎 A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates and label errors.

2K 37 2
voxel51
fiftyone-desktop

FiftyOne Desktop

2K 11K 752
Renumics
sliceguard

A library for detecting problematic data segments in structured and unstructured data with few lines of code.

1K 63 3
TieuLongPhan
synrbl

Rebalancing chemical reaction

1K 29 2
KenObata
distributed-curator

Partition-aware MinHash LSH deduplication library for large-scale text data curation on Apache Spark.

848 1 0
PennLINC
cubids

Curation of BIDS (CuBIDS): A sanity-preserving software package for processing BIDS datasets.

712 30 13
aminnaghdloo
annotate-ez

High-throughput curation and visualization of large-scale single-cell microscopy images, in a lightweight GUI.

593 1 0
voxel51
fiftyone-db-ubuntu2004

Refine high-quality datasets and visual AI models

539 11K 752
cleanlab
cleanlab-cli

Client interface to Cleanlab Studio

459 31 10
bluestero
urlgenie

Python package to make URL extraction, generalization, validation, and filtration easy.

391 4 1
UAL-RE
ldcoolp-figshare

Python tool using the Figshare API for data curation

358 3 1
cleanlab
example-package-elisno

The standard package for data-centric AI, machine learning with label errors, and automatically finding and fixing dataset issues in Python.

300 11K 890
Docta-ai
docta-ai

Docta.ai

198 3K 256
NVIDIA
invisible-rabbit

Scalable Data Preprocessing Tool for Training Large Language Models

185 2K 264
voxel51
fiftyone-db-debian9

FiftyOne DB

185 11K 752
voxel51
fiftyone-db-ubuntu1604

Project FiftyOne database

139 11K 752
voxel51
fiftyone-db-rhel7

Refine high-quality datasets and visual AI models

135 11K 752
LaureBerti
learn2clean

Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning

91 54 20
Data from PyPI, GitHub, ClickHouse, and BigQuery