PyPI Stats

Data Generation Python Packages

Python packages with the GitHub topic data-generation. Sorted by relevance, with stars and monthly downloads.
Stranger6667
hypothesis-graphql

Generate arbitrary queries matching your GraphQL schema, and use them to verify your backend implementation.

1.6M 48 4
databrickslabs
dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

275K 460 93
sdv-dev
copulas

A library to model multivariate data using copulas.

202K 643 120
sdv-dev
sdv

Synthetic data generation for tabular data

142K 3K 417
sdv-dev
ctgan

Conditional GAN for generating synthetic tabular data.

134K 2K 330
sdv-dev
deepecho

Synthetic Data Generation for mixed-type, multivariate time series.

118K 123 17
avsolatorio
realtabformer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

9K 243 30
tabularis-ai
be-great

A novel approach for synthesizing tabular data using pretrained large language models

5K 355 59
Mukhopadhyay
pyfake

A Flexible and Extensible fake data generator based on Pydantic models.

3K 4 0
jaehyeon-kim
dynamic-des

Real-time SimPy control plane to dynamically update parameters and stream outputs via external systems like Kafka, Redis, or Postgres. Built for event-driven digital twins.

3K 3 0
rasinmuhammed
misata

Python synthetic data generator for realistic multi-table test data, database seeding, and scenario simulation

2K 54 3
DexForce
embodichain

An end-to-end, GPU-accelerated, and modular platform for building generalized Embodied Intelligence.

2K 153 15
wwhenxuan
s2generator

A series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations.

1K 17 3
dmey
synthia

📈 🐍 Multidimensional synthetic data generation with Copula and fPCA models in Python

991 66 10
eriknovak
anonipy

Data anonymization package, supporting different anonymization strategies

957 8 4
agonzalezla
pydni

Python library for validating and generating Spanish identity documents (DNI, NIE, CIF, NIF), as well as creating valid fictitious names and people for development, QA, and automated testing environments.

863 0 0
stef41
castwright

Generate high-quality synthetic instruction-tuning data from seed examples. Simple API, built-in quality filtering, cost-aware.

806 1 0
Buba98
regex-enumerator

Enumerate all strings that match a given regex

779 1 0
Infineon
streamgen

Python framework for generating streams of labeled data.

718 15 1
PDBeurope
mmcif-gen

This application is designed to create mmcif files from facilities data.

606 5 0
hypervectorio
hypervector-wrapper

Python wrapper to use the Hypervector API. Better data tests

564 9 7
apiverve
apiverve-colorpalettegenerator

Color Palette is a powerful tool for generating harmonious color palettes. Generate color schemes (mono, contrast, triade, tetrade, analogic) with accessibility data, CSS exports, and palette images.

545 0 0
burning-cost
insurance-datasets

Synthetic UK motor insurance datasets with known DGP for model validation

537 0 0
matousc89
signalz

Synthetic data generators in Python

530 14 3
Data from PyPI, GitHub, ClickHouse, and BigQuery.