PyPI Stats

Data Validation Python Packages

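Python packages with the GitHub topic data-validation, sorted by relevance. Each entry gives the GitHub owner, the package name, and its description; the first two figures on the line below are monthly downloads and GitHub stars.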
pandera-dev
pandera

A light-weight, flexible, and expressive statistical data testing library

8.9M 4K 395
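pandera's own API applies schemas to DataFrames. The sketch below only illustrates the general idea of statistical data testing using the standard library. It assumes a table is a plain dict of column lists, and `ColumnSpec`, `validate`, and `mean_between_0_and_100` are hypothetical names. Each column gets a type, row-level checks, and column-level (statistical) checks such as a bound on the mean.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable


@dataclass
class ColumnSpec:
    """Hypothetical rule set for one column: a type, row checks, and column-level checks."""
    dtype: type
    row_checks: list[Callable[[Any], bool]] = field(default_factory=list)
    column_checks: list[Callable[[list], bool]] = field(default_factory=list)


def validate(table: dict[str, list], schema: dict[str, ColumnSpec]) -> list[str]:
    """Return readable failure messages; an empty list means the table passed."""
    failures = []
    for name, spec in schema.items():
        if name not in table:
            failures.append(f"{name}: missing column")
            continue
        values = table[name]
        type_ok = True
        for i, value in enumerate(values):
            if not isinstance(value, spec.dtype):
                failures.append(f"{name}[{i}]: expected {spec.dtype.__name__}, got {type(value).__name__}")
                type_ok = False
            elif not all(check(value) for check in spec.row_checks):
                failures.append(f"{name}[{i}]: row check failed for {value!r}")
        # Statistical checks only run once every value in the column has the right type
        if type_ok and values:
            for check in spec.column_checks:
                if not check(values):
                    failures.append(f"{name}: column check {check.__name__} failed")
    return failures


def mean_between_0_and_100(values: list) -> bool:
    return 0 <= mean(values) <= 100


schema = {
    "score": ColumnSpec(float, row_checks=[lambda v: v >= 0], column_checks=[mean_between_0_and_100]),
}
print(validate({"score": [12.5, 48.0, -3.0]}, schema))  # → ['score[2]: row check failed for -3.0']
```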
pyeve
cerberus

Lightweight, extensible data validation library for Python

8.7M 3K 242
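Cerberus validates Python dicts against a schema written as a dict of rules. The following standard-library sketch imitates that style for flat documents. The rule names (`type`, `required`, `min`, `max`, `allowed`) echo common Cerberus rules, but the `validate` function and its error strings are hypothetical and do not reproduce Cerberus's behavior.

```python
from typing import Any

TYPES = {"string": str, "integer": int, "float": float, "boolean": bool}


def validate(document: dict[str, Any], schema: dict[str, dict]) -> dict[str, list[str]]:
    """Check a flat dict against a schema of rule dicts and return {field: [errors]}."""
    errors: dict[str, list[str]] = {}
    for name, rules in schema.items():
        problems = []
        if name not in document:
            if rules.get("required", False):
                problems.append("required field")
        else:
            value = document[name]
            expected = TYPES.get(rules.get("type", ""), object)
            # bool is a subclass of int, so reject it explicitly for "integer"
            if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                problems.append(f"must be of {rules['type']} type")
            else:
                if "min" in rules and value < rules["min"]:
                    problems.append(f"min value is {rules['min']}")
                if "max" in rules and value > rules["max"]:
                    problems.append(f"max value is {rules['max']}")
                if "allowed" in rules and value not in rules["allowed"]:
                    problems.append(f"unallowed value {value!r}")
        if problems:
            errors[name] = problems
    # Keys not described by the schema are reported as errors in this sketch
    for name in document.keys() - schema.keys():
        errors[name] = ["unknown field"]
    return errors


schema = {
    "name": {"type": "string", "required": True},
    "age": {"type": "integer", "min": 0, "max": 150},
    "role": {"type": "string", "allowed": ["admin", "user"]},
}
print(validate({"age": -1, "role": "guest", "extra": 1}, schema))
# → {'name': ['required field'], 'age': ['min value is 0'], 'role': ["unallowed value 'guest'"], 'extra': ['unknown field']}
```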
databrickslabs
databricks-labs-remorph

Accelerates migrations to Databricks by automating key migration activities

1.5M 140 101
evidentlyai
evidently

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

1.2M 7K 836
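One check a monitoring tool like this can run on a data pipeline is distribution drift between a reference batch and a current batch. As an illustration only, not Evidently's implementation, here is a standard-library sketch of the Population Stability Index (PSI) using equal-width bins over the reference range. The bin count, the `eps` floor for empty bins, and the function name `psi` are assumptions.

```python
import math


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples, binned over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            # Values outside the reference range are clamped into the edge bins
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays finite
        return [max(c / len(sample), eps) for c in counts]

    p_ref, p_cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


reference = [i / 100 for i in range(1000)]       # evenly spread over [0, 10)
shifted = [i / 100 + 5 for i in range(1000)]     # same shape, moved up by 5
print(psi(reference, reference), psi(reference, shifted) > 0.25)  # → 0.0 True
```

A PSI near 0 means the binned distributions match; cutoffs of roughly 0.1 and 0.25 are common rules of thumb for moderate and large drift, not thresholds taken from Evidently.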
open-metadata
openmetadata-ingestion

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column-level lineage, and seamless team collaboration.

390K 14K 2K
deepchecks
deepchecks

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

58K 4K 294
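A typical data-integrity test in this space is leakage between the training and test splits. The sketch below is not Deepchecks' API. It measures the fraction of test rows that also appear verbatim in the training set and turns that into a pass/fail result. `train_test_overlap`, `check_overlap`, and the 5% default threshold are hypothetical.

```python
from typing import Hashable, Sequence


def train_test_overlap(train: Sequence[Sequence[Hashable]], test: Sequence[Sequence[Hashable]]) -> float:
    """Fraction of test rows that also appear verbatim in the training set."""
    if not test:
        return 0.0
    train_rows = {tuple(row) for row in train}
    leaked = sum(1 for row in test if tuple(row) in train_rows)
    return leaked / len(test)


def check_overlap(train, test, max_fraction: float = 0.05) -> tuple[bool, float]:
    """Pass/fail wrapper: fail when more than max_fraction of test rows leak from train."""
    fraction = train_test_overlap(train, test)
    return fraction <= max_fraction, fraction


train = [(1, "a"), (2, "b"), (3, "c")]
test = [(3, "c"), (4, "d")]
print(check_overlap(train, test))  # → (False, 0.5)
```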
cleanlab
cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

58K 11K 890
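Cleanlab builds on confident learning, which flags examples whose given label disagrees with what a model confidently predicts. Below is a simplified standard-library sketch of that idea, not cleanlab's implementation. The per-class threshold is the mean predicted probability of that class among examples carrying that label, and an example is flagged when the most probable class above its threshold differs from the given label. `pred_probs` is assumed to hold out-of-sample (for example, cross-validated) predicted probabilities, and `find_label_issues_sketch` is a hypothetical name.

```python
def find_label_issues_sketch(labels: list[int], pred_probs: list[list[float]]) -> list[int]:
    """Return indices whose given label disagrees with the confidently predicted class."""
    n_classes = len(pred_probs[0])
    # Per-class threshold: average probability of class j among examples labeled j
    thresholds = []
    for j in range(n_classes):
        own = [probs[j] for probs, y in zip(pred_probs, labels) if y == j]
        thresholds.append(sum(own) / len(own) if own else 1.0)

    issues = []
    for i, (probs, y) in enumerate(zip(pred_probs, labels)):
        confident = [j for j in range(n_classes) if probs[j] >= thresholds[j]]
        if not confident:
            continue  # no class clears its threshold, so the example is left alone
        best = max(confident, key=lambda j: probs[j])
        if best != y:
            issues.append(i)
    return issues


labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.25, 0.75]]
print(find_label_issues_sketch(labels, pred_probs))  # → [2]
```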
posit-dev
pointblank

Data validation toolkit for assessing and monitoring data quality.

53K 430 27
InfuseAI
recce

The data-validation toolkit for enhanced dbt (data build tool) PR review

50K 454 26
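Reviewing a dbt change often comes down to comparing a model's output before and after the PR. A hypothetical standard-library sketch of a keyed row diff follows; it is not Recce's API, and it assumes each version of the model is a list of dicts sharing a primary-key column.

```python
from typing import Any

Row = dict[str, Any]


def diff_by_key(base: list[Row], current: list[Row], key: str) -> dict[str, list]:
    """Compare two versions of a model's output keyed by a primary-key column."""
    base_map = {row[key]: row for row in base}
    curr_map = {row[key]: row for row in current}
    added = sorted(curr_map.keys() - base_map.keys())
    removed = sorted(base_map.keys() - curr_map.keys())
    changed = []
    for k in sorted(base_map.keys() & curr_map.keys()):
        old, new = base_map[k], curr_map[k]
        # Columns whose values differ (a column missing on one side counts as a change)
        cols = sorted(c for c in old.keys() | new.keys() if old.get(c) != new.get(c))
        if cols:
            changed.append((k, cols))
    return {"added": added, "removed": removed, "changed": changed}


base = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
current = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
print(diff_by_key(base, current, "id"))
# → {'added': [4], 'removed': [3], 'changed': [(2, ['amount'])]}
```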
InfuseAI
recce-nightly

The data-validation toolkit for enhanced dbt (data build tool) PR review

42K 454 26
open-metadata
openmetadata-managed-apis

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column-level lineage, and seamless team collaboration.

38K 14K 2K
databrickslabs
databricks-switch-plugin

Accelerates migrations to Databricks by automating key migration activities

18K 140 101
OpenDQV
opendqv

Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.

15K 10 2
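Enforcing a contract at the point of write means each record is checked before it is stored, so invalid data never reaches downstream steps. Here is a minimal standard-library sketch of that pattern, assuming a contract that maps field names to a type and a nullable flag. `FieldContract`, `GuardedSink`, and `ContractViolation` are hypothetical and not part of OpenDQV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldContract:
    """One field in a hypothetical data contract."""
    dtype: type
    nullable: bool = False


class ContractViolation(ValueError):
    pass


class GuardedSink:
    """Append-only store that validates each record against a contract before accepting it."""

    def __init__(self, contract: dict[str, FieldContract]):
        self.contract = contract
        self.rows: list[dict] = []

    def write(self, record: dict) -> None:
        problems = []
        for name, spec in self.contract.items():
            value = record.get(name)
            if value is None:
                if not spec.nullable:
                    problems.append(f"{name} is required")
            elif not isinstance(value, spec.dtype):
                problems.append(f"{name} must be {spec.dtype.__name__}")
        extra = record.keys() - self.contract.keys()
        if extra:
            problems.append(f"unexpected fields: {sorted(extra)}")
        if problems:
            # Reject at the point of write: nothing invalid is appended to self.rows
            raise ContractViolation("; ".join(problems))
        self.rows.append(record)


sink = GuardedSink({"order_id": FieldContract(int), "email": FieldContract(str, nullable=True)})
sink.write({"order_id": 1, "email": None})
try:
    sink.write({"order_id": "x"})
except ContractViolation as err:
    print(err)  # → order_id must be int
```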
DataRecce
recce-cloud-nightly

The data-validation toolkit for enhanced dbt (data build tool) PR review

14K 454 26
databrickslabs
databricks-labs-lakebridge

Accelerates migrations to Databricks by automating key migration activities

11K 140 101
cleanlab
cleanvision

Automatically find issues in image datasets and practice data-centric computer vision.

10K 1K 80
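Two simple kinds of image-dataset issue are exact duplicates and images that are too dark. The standard-library sketch below shows both checks; it is not CleanVision's implementation. Duplicates are grouped by the SHA-256 hash of the raw file bytes. An image given as a grid of 0-255 grayscale values is flagged as dark when its mean falls below an assumed threshold of 40.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(images: dict[str, bytes]) -> list[list[str]]:
    """Group image files whose raw bytes are identical (SHA-256 of the contents)."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for name, data in images.items():
        by_hash[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(sorted(names) for names in by_hash.values() if len(names) > 1)


def is_dark(pixels: list[list[int]], threshold: float = 40.0) -> bool:
    """Flag a grayscale image (values 0-255) whose mean brightness is below threshold."""
    flat = [p for row in pixels for p in row]
    return sum(flat) / len(flat) < threshold


images = {"a.png": b"img-1", "b.png": b"img-2", "c.png": b"img-1"}
print(exact_duplicates(images))             # → [['a.png', 'c.png']]
print(is_dark([[10, 20], [30, 40]]))        # → True
```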
json-structure
json-structure

Official SDKs for JSON Structure schema and instance validation

10K 22 1
DataRecce
recce-cloud

The data-validation toolkit for enhanced dbt (data build tool) PR review

7K 454 26
shopnilsazal
validus

A dead simple Python string validation library.

6K 259 13
vertti
daffy

Lightweight DataFrame validation decorators for Pandas, Polars, Modin, and PyArrow. No custom types required.

5K 58 5
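The decorator approach checks a function's input and output frames without introducing custom DataFrame types. Below is a hypothetical standard-library sketch of such a decorator; `checks_columns` is not daffy's API. It reads column names from a `.columns` attribute, which pandas and Polars frames provide, and falls back to `keys()` so a plain dict of lists works in the example.

```python
import functools
from typing import Any, Callable, Iterable


def _columns_of(frame: Any) -> set:
    # pandas/Polars frames expose .columns; a plain dict of lists exposes keys()
    cols = getattr(frame, "columns", None)
    return set(cols) if cols is not None else set(frame.keys())


def checks_columns(inputs: Iterable[str] = (), outputs: Iterable[str] = ()) -> Callable:
    """Verify the first argument has the `inputs` columns and the result has the `outputs` columns."""
    required_in, required_out = set(inputs), set(outputs)

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(frame, *args, **kwargs):
            missing = required_in - _columns_of(frame)
            if missing:
                raise ValueError(f"{func.__name__}: input missing columns {sorted(missing)}")
            result = func(frame, *args, **kwargs)
            missing = required_out - _columns_of(result)
            if missing:
                raise ValueError(f"{func.__name__}: output missing columns {sorted(missing)}")
            return result
        return wrapper
    return decorator


@checks_columns(inputs=["price", "qty"], outputs=["total"])
def add_total(frame: dict) -> dict:
    return {**frame, "total": [p * q for p, q in zip(frame["price"], frame["qty"])]}


print(add_total({"price": [2.0, 3.0], "qty": [1, 4]})["total"])  # → [2.0, 12.0]
```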
RayCarterLab
excelalchemy

Schema-driven Python library for typed Excel import/export workflows with Pydantic, locale-aware workbooks, pluggable storage, and contract-tested architecture.

4K 10 1
seadonggyun4
truthound

"Sniffs out bad data"

4K 18 1
cleanlab
cleanlab-studio

Client interface to Cleanlab Studio

4K 31 10
open-metadata
openmetadata-airflow-managed-apis

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column-level lineage, and seamless team collaboration.

3K 14K 2K
Data from PyPI, GitHub, ClickHouse, and BigQuery.