27 dependents
Package Description Downloads/month
Apache Airflow - A platform to programmatically author, schedule, and monitor wo... 138K
DBND is an agile pipeline framework that helps data engineering teams track and ... 15K
Utilities to handle BP/RP (XP) Gaia low-resolution spectra as delivered via the ... 4K
QALITA Platform Command Line Interface 3K
Use bacpipe to streamline the process of generating embeddings and analysing you... 3K
A scheduler-driven data transfer platform 2K
Test Command tool 2K
QALITA Platform Core lib for common function used in pack 1K
Spark-based distribution version of fast and customizable framework for automati... 848
Pure Numpy ON Scala3 726
NLP framework in python for entity recognition and relationship extraction 399
354
logrotate in minutes 342
Add your description here 295
A powerful command-line tool for querying and manipulating Parquet datasets dire... 218
Allows importing zip-compressed Python packages by URL (http, hdfs). 200
Write Singer data to JSONL files via webhdfs 181
A PySpark implementation of the Blue Brain Project Functionalizer 171
etl_ml is a tool that can ETL dirty source Excel or CSV data and send the data to ftp... 145
Spark-based distribution version of fast and customizable framework for automati... 136
MLflow WebHDFS Plugins 81
73
A customized version of KerberosClient (from hdfs.ext.kerberos import KerberosClient) from the hdfs package that ... 71
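Core components of WeBank's multi-party big data privacy computing platform, including private set intersection, multi-party LGBM training/prediction, multi-party LR training/prediction, secure multi-party computation, a unified gateway, and more 41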
A distributed, parallel data replication engine that caters to various source/tar... 14
A library for analyzing files from HDFS and saving results to MongoDB 5
FastAPI framework, high performance, easy to learn, fast to code, ready for prod... 2