PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
nfstream
nfstream

NFStream: a Flexible Network Data Analysis Framework.

18K 1K 143
hearmeneigh
datasetrising

Toolchain for creating custom datasets and training Stable Diffusion (1.x, 2.x, XL) models and LoRAs

9K 18 1
lightning-rod-labs
lightningrod-ai

Python SDK for dataset generation on LightningRod platform ⚡

6K 44 3
Kiln-AI
kiln-ai

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

3K 5K 361
HZYAI
ragscore

⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.

2K 31 5
scalexi
scalexi

scalexi is a versatile open-source Python library, optimized for Python 3.11+, that focuses on facilitating low-code development and fine-tuning of diverse Large Language Models (LLMs).

2K 13 2
Kiln-AI
kiln-server

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

1K 5K 361
colddsam
modeyolo

ModeYOLO is a versatile Python package designed for efficient color space transformations and simplified dataset modification for deep learning applications. Seamlessly integrating into your workflow, this package empowers users to effortlessly perform diverse color operations and streamline the creation of modified datasets, enhancing the flexibility and convenience of machine learning model training processes.

1K 0 0
DIYer22
bpycv

Computer vision utils for Blender.

1K 501 60
Superuser666-Sigil
human-eval-rust

SigilDERG Data Production is an enterprise-grade Rust pipeline that crawls crates, runs rigorous scans (Clippy, Geiger, license checks), and generates instruction-style JSONL shards. It features semantic chunking, configurable splits, observability, and seamless SigilDERG ecosystem integration.

1K 0 1
MatteoGuadrini
pyreports

pyreports is a Python library that allows you to create complex reports from various sources

885 113 9
facebookresearch
stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

752 303 46
JaonHax
scpscraper

A Python library designed for scraping data from the SCP wiki.

738 16 4
StarlangSoftware
nlptoolkit-datagenerator

Classification dataset generator library for high-level NLP tasks

720 3 0
SimGus
chatette

A powerful dataset generator for Rasa NLU, inspired by Chatito

688 315 53
christiangarcia0311
data-seed-ph

A Python library for generating realistic, synthetic Philippine-based datasets.

672 8 0
OmarSamirz
iftg

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layouts. IFTG supports all languages and offers endless noise combinations, including custom noise creation.

666 21 2
OOXXXXOO
d-arth

DATASETS FOR WHOLE E-ARTH

631 9 7
4thel00z
ccdown

A Rust-based, resumable downloader CLI and Python library for Common Crawl data

599 0 0
ElementAI
synbols

The Synbols dataset generator is a ServiceNow Research project that was started at Element AI.

551 45 6
TimeEval
timeeval-gutentag

A good Timeseries Anomaly Generator.

491 95 17
johnazedo
financial-scraper

A Python-based web scraping tool for collecting financial data from multiple sources

444 1 0
timcera
pyslice

Data set templating library for model dataset creation and model running.

424 1 1
radi-cho
datasetgpt

Generate textual and conversational datasets with LLMs.

417 298 19
Data from PyPI, GitHub, ClickHouse, and BigQuery