PyPI Stats

Synthetic Data Python Packages

Python packages with the GitHub topic synthetic-data. Sorted by relevance, with stars and monthly downloads.
lk-geimfari
mimesis

Mimesis is a fast Python library for generating fake data in multiple languages.

1.9M 5K 359
pgmpy
pgmpy

Python Toolkit for Causal and Probabilistic Reasoning

447K 3K 1K
databrickslabs
dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated or synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.

275K 460 93
sdv-dev
copulas

A library to model multivariate data using copulas.

202K 643 120
sdv-dev
sdmetrics

Metrics to evaluate quality and efficacy of synthetic datasets.

145K 258 52
sdv-dev
sdv

Synthetic data generation for tabular data

142K 3K 417
sdv-dev
ctgan

Conditional GAN for generating synthetic tabular data.

134K 2K 330
sdv-dev
deepecho

Synthetic Data Generation for mixed-type, multivariate time series.

118K 123 17
unrealcv
unrealcv

UnrealCV: Connecting Computer Vision to Unreal Engine

66K 2K 460
bespokelabsai
bespokelabs-curator

Synthetic data curation for post-training and structured data extraction

46K 2K 140
barseghyanartur
faker-file

Create files with fake data. In many formats. With no effort.

16K 104 10
tdspora
syngen

Open-source version of the TDspora synthetic data generation algorithm.

14K 18 12
nickkunz
smogn

Synthetic Minority Over-Sampling Technique for Regression

13K 348 84
gretelai
gretel-client

The Gretel Python Client allows you to interact with the Gretel REST API.

11K 63 20
privateai
privateai-client

A Python client for interacting with the Private AI API.

11K 23 3
ydataai
ydata-synthetic

Synthetic data generators for tabular and time-series data

9K 2K 260
mostly-ai
mostlyai

Synthetic Data SDK ✨

9K 769 64
opendsr-std
seedfaker

Deterministic synthetic data generator for realistic, correlated, and noisy test records across 68 locales. Rust CLI/Python/Node.js/Browser WASM/Go/PHP/Ruby/MCP

9K 23 0
avsolatorio
realtabformer

A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and relational synthetic data generation.

9K 243 30
gretelai
gretel-synthetics

Synthetic data generators for structured and unstructured text, featuring differentially private learning.

8K 677 100
mostly-ai
mostlyai-engine

Synthetic Data Engine 💎

7K 76 19
mostly-ai
mostlyai-qa

Synthetic Data Quality Assurance 🔎

7K 66 13
lightning-rod-labs
lightningrod-ai

Python SDK for dataset generation on LightningRod platform ⚡

6K 44 3
sparkfish
augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

6K 527 60
Data from PyPI, GitHub, ClickHouse, and BigQuery.