290 dependents
Package Description Downloads/month
🦦 weasel: A small and easy workflow system 27.3M
simple, flexible, offline capable, cloud storage with a Python path-like interfa... 4.1M
s3path is a pathlib extension for AWS S3 Service 3.4M
Meltano: the declarative code-first data integration engine that powers your wil... 1.2M
Command Line Interface for Anyscale 1.1M
s3pathlib is a Python package that provides the Pythonic object-oriented program... 985K
ghandic jsf: Creates fake JSON files from a JSON schema 716K
Streaming (and fast!) parser for multipart/form-data written in Cython 404K
Disaster recovery solution for Amazon Managed Workflows for Apache Airflow (MWAA... 290K
This is the development home of the workflow management system Snakemake. For ge... 254K
The Privacy Engineering & Compliance Framework 90K
Schema, functions and a python library for storing and accessing STAC collection... 78K
A streaming audio reader, processor, and writer built on top of soundfile, and P... 66K
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workf... 56K
ccflow is a collection of tools for workflow configuration, orchestration, and d... 53K
Bigeye SDK offers developer tools and clients to interact with Bigeye programmat... 51K
A collection of Python-based 'connectors' that extract metadata from various sou... 51K
Data and tools for generating and inspecting OLMo pre-training data. 44K
Contains the ML and non-Azure specific common code associated with running A... 42K
Toolkit for linearizing PDFs for LLM datasets/training 40K
Contains ML models, featurizers and scoring code which can either be used with A... 32K
Used for automatically finding the best machine learning model and its parameter... 28K
flūmine - Betting trading framework 26K
Framework for simpler Spark Pipelines 24K
3D molecular fingerprints 21K
The leading data integration platform for ETL / ELT data pipelines from APIs, da... 19K
Google Ads API Report Fetcher (gaarf) 18K
Common API for all "second gen" AutoML APIs: Auger.AI, Google Cloud AutoML and A... 17K
Command Line Interface (CLI) for bulk processing/loading data into RegScale 13K
Some data analysis tools for working with historical PV solar time-series data s... 12K
GeoNode is an open source platform that facilitates the creation, sharing, and c... 11K
Tetrascience Python SDK 10K
Handles reading queries and writing GarfReport from garf-core package 10K
Synthetic Data SDK ✨ 9K
Extracts adhoc queries from the Looker API to S3 8K
A multilingual phonemizer combining lexica, NLP, and probabilistic scoring for i... 8K
Simple functions shared across fsai apps. 8K
The leading data integration platform for ETL / ELT data pipelines from APIs, da... 7K
CsvPath Framework is a data preboarding automation library for receiving, valida... 6K
Pipeline for generating data used in text analytics notebooks. Used by Welfare ... 6K
The leading data integration platform for ETL / ELT data pipelines from APIs, da... 6K
6K
Python library for working with Music Information Retrieval datasets 6K
Utilities for analysis of adaptive immune receptor repertoire (AIRR) data 5K
The leading data integration platform for ETL / ELT data pipelines from APIs, da... 5K
LOCI static analysis service 5K
Conforms pandas to "correct" datatypes to ensure data in/out using CSV, JSONL an... 4K
Clodius is a tool for breaking up large data sets into smaller tiles that can su... 4K
Genropy framework repository 4K
ALCF Inference Gateway SDK 4K