PyPI Stats

Data Transformation Python Packages

Python packages tagged with the GitHub topic data-transformation, sorted by relevance and listed with their stars and monthly downloads.
mahmoud
glom

☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️

16.2M 2K 72
daq-tools
commons-codec

Data decoding, encoding, conversion, and translation utilities.

8K 2 2
ironmussa
optimuspyspark

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

6K 2K 232
bruin-data
bruin-sdk

Bruin Python SDK — eliminate boilerplate in Bruin Python assets

6K 5 0
panodata
tikray

A compact data transformation engine.

5K 1 0
kmatarese
glide

Easy ETL

5K 17 2
productml
blurr-dev

Data aggregation pipeline for running real-time predictive models

4K 4 0
azukds
tubular

Python package implementing ML feature engineering and pre-processing for polars or pandas dataframes.

4K 100 27
scottroberts140
dsr-feature-eng-ml

Machine learning model evaluation and feature engineering framework with hyperparameter tuning, data balancing, and feature importance analysis.

3K 1 0
Org-EthereaLogic
etherealogic-aetheriaforge

Databricks-native intelligent data transformation engine — coherence-scored Bronze/Silver/Gold with entity resolution and temporal reconciliation in a single deployable product.

2K 1 0
jhd3197
tukuy

A flexible data transformation library with a plugin system

1K 3 0
globaldothealth
adtl

Another data transformation language

1K 2 1
benzsevern
goldenflow

Data transformation toolkit — standardize, reshape, and normalize messy data. Python & TypeScript. 83 transforms, zero-config mode, MCP server, edge-safe. DQBench 100/100.

1K 1 0
hi-primus
pyoptimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

1K 2K 232
Cydra-Tech
smelt-ai

LLM-powered structured data transformation. Batch process rows through any LLM, get back strictly typed Pydantic models.

884 2 0
MatheusGiacomo
dataforge-dfg

Data Forge is a high-performance, CLI-first data integration tool designed to streamline the lifecycle of data from ingestion to transformation. Built with Python, it provides a robust framework for handling both ETL and ELT workflows with a focus on automation, reliability, and developer experience.

813 1 0
mikeAdamss
tidychef

Python framework for transforming tabulated data with visual relationships into tidy data

428 1 1
ityutin
df-and-order

Using df-and-order, your interactions with dataframes become very clean and predictable.

400 3 2
artemlops
customer-segmentation-toolkit

Data transformations for the Engineering Lab2 Feature-Store-for-ML

367 1 0
productml
blurr

Data aggregation pipeline for running real-time predictive models

367 4 0
enram
vptstools

Python library to transfer and convert vertical profile time series data

318 4 1
amadou-6e
pymdt2json

pymdt2json is a Python CLI and library for converting markdown tables into structured JSON: ideal for data pipelines, LLM preprocessing, and web/API integration.

208 1 0
brotherzhafif
pythistic

Frequency Table Conversion, Descriptive Statistics and Data Transformation Calculation Tool in Python

207 3 0
jameshuh
dataflow-converter

A powerful data format conversion tool - CSV/JSON/XML/Excel/YAML/TSV converter with batch processing and field mapping

128 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery