PyPI Stats

Data Quality Checks Python Packages

Python packages tagged with the GitHub topic data-quality-checks, sorted by relevance and shown with their stars and monthly downloads.
open-metadata
openmetadata-ingestion

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

391K 14K 2K
polyaxon
traceml

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

139K 530 47
mouradmourafiq
pandas-summary

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

111K 530 47
polyaxon
datatile

Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.

111K 530 47
canimus
cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

104K 243 22
open-metadata
openmetadata-managed-apis

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

39K 14K 2K
re-data
re-data

re_data - fix data issues before your users & CEO would discover them 😊

6K 2K 125
open-metadata
openmetadata-airflow-managed-apis

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

3K 14K 2K
socialpoint-labs
sqlbucket

Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.

2K 74 9
dqops
dqops

DQOps Data Quality Operations Center

1K 192 36
open-metadata
openmetadata-ingestion-core

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

1K 14K 2K
scienxlab
redflag

Safety net for machine learning pipelines. Plays nice with sklearn and pandas.

1K 21 6
maltzsama
sumeh

Sumeh: Unified Data Quality Framework. Sumeh is a unified data quality validation framework supporting multiple backends (PySpark, Dask, Polars, DuckDB, Pandas) with centralized rule configuration.

1K 4 0
ecmwf
grib-check

A tool that validates project-specific conventions of GRIB files

829 0 2
weiser-ai
weiser-ai

Enterprise-grade data quality framework with YAML configuration, LLM-friendly design, and advanced statistical validation

718 2 0
arpitg1304
forge-robotics

Convert between robotics dataset formats (RLDS, LeRobot v2/v3, Zarr, HDF5, Rosbag). Inspect, visualize, and analyze datasets. Works with HuggingFace Hub. Built for OpenVLA, Octo, LeRobot, and Diffusion Policy workflows.

459 112 12
sumanthprabhu
dqc-toolkit

Quality Checks for Training Data in Machine Learning

366 7 0
Ezzaldin97
qprofiler

Profile tabular datasets, manage automatic validation for new datasets, and automatically handle quality issues.

288 0 0
open-metadata
openmetadata-sqlalchemy-bigquery

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.

285 14K 2K
acracker
data-watchtower

Data quality inspection tool. Identify issues before your CTO detects them!

279 0 1
litedatum
validatelite

ValidateLite: A lightweight CLI for database schema validation and data quality checks. Ideal for CI/CD, ETL, and data pipelines.

221 3 0
realdatadriven
etlx-wrapper

Python wrapper for ETLX CLI to run ETL workflows from Python

170 40 3
Ygor-J
sql-guard

A small package for data quality rules using SQL

149 5 0
AmmarYasser455
geoqa

GeoQA: A Python package for geospatial data quality assessment and interactive profiling

99 3 0
Data from PyPI, GitHub, ClickHouse, and BigQuery