PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
open-metadata
openmetadata-ingestion

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

394K 14K 2K
polyaxon
traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

142K 530 47
polyaxon
datatile

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

113K 530 47
mouradmourafiq
pandas-summary

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

113K 530 47
canimus
cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

106K 243 22
open-metadata
openmetadata-managed-apis

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

39K 14K 2K
re-data
re-data

re_data - fix data issues before your users & CEO would discover them 😊

6K 2K 125
open-metadata
openmetadata-airflow-managed-apis

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

3K 14K 2K
socialpoint-labs
sqlbucket

Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.

2K 74 9
dqops
dqops

DQOps Data Quality Operations Center

1K 192 36
open-metadata
openmetadata-ingestion-core

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

1K 14K 2K
scienxlab
redflag

Safety net for machine learning pipelines. Plays nice with sklearn and pandas.

1K 21 6
maltzsama
sumeh

Sumeh: a unified data quality validation framework supporting multiple backends (PySpark, Dask, Polars, DuckDB, Pandas) with centralized rule configuration.

957 4 0
ecmwf
grib-check

A tool that validates project-specific conventions of GRIB files

815 0 2
weiser-ai
weiser-ai

Enterprise-grade data quality framework with YAML configuration, LLM-friendly design, and advanced statistical validation

667 2 0
arpitg1304
forge-robotics

Convert between robotics dataset formats (RLDS, LeRobot v2/v3, Zarr, HDF5, Rosbag). Inspect, visualize, and analyze datasets. Works with HuggingFace Hub. Built for OpenVLA, Octo, LeRobot, and Diffusion Policy workflows.

427 112 12
sumanthprabhu
dqc-toolkit

Quality Checks for Training Data in Machine Learning

374 7 0
open-metadata
openmetadata-sqlalchemy-bigquery

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

266 14K 2K
acracker
data-watchtower

Data quality inspection tool. Identify issues before your CTO detects them!

266 0 1
Ezzaldin97
qprofiler

Profile tabular datasets, manage automatic validation for new datasets, and automatically handle quality issues.

243 0 0
litedatum
validatelite

ValidateLite: A lightweight CLI for database schema validation and data quality checks. Ideal for CI/CD, ETL, and data pipelines.

212 3 0
realdatadriven
etlx-wrapper

Python wrapper for ETLX CLI to run ETL workflows from Python

174 40 3
Ygor-J
sql-guard

A small package for data quality rules using SQL

148 5 0
mfcabrera
hooqu

Data unit testing for your Python DataFrames

88 29 1
    • Data from PyPI, GitHub, ClickHouse, and BigQuery