PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
spotify
luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

1.2M 19K 2K
CODAIT
yarn-api-client

Python client for Hadoop® YARN API

319K 109 49
h2oai
h2o

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

250K 7K 2K
jcrist
skein

A tool and library for easily deploying applications on Apache YARN

56K 146 39
iterative
dvc-hdfs

HDFS/WebHDFS plugin for dvc

33K 2 1
SneaksAndData
hadoop-fs-wrapper

Python Wrappers for Hadoop FileSystem

26K 4 0
jingw
pyhdfs

Python HDFS client

6K 97 23
Breaka84
spooq

Spooq is a PySpark-based helper library for ETL data ingestion pipelines in data lakes.

4K 10 2
h2oai
h2o-client

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

2K 7K 2K
developer-sdk
hadoop-yarn-rest-api

A Python library for the YARN REST API

2K 0 0
h2oai
h2o-mlflow-flavor

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

1K 7K 2K
eecs485staff
madoop

A lightweight MapReduce framework for education.

1K 10 5
szilard-nemeth
yarn-dev-tools

Various scripts to automate and ease Apache Hadoop YARN development.

934 2 0
splitlog
splitlog

Utility to split aggregated logs from Apache Hadoop YARN applications into a folder hierarchy

874 0 0
BROADSoftware
hadeploy

A Hadoop application deployment tool

869 9 4
IBMStreams
streamsx-hdfs

HDFS integration for IBM Streams

574 9 20
SvenskaSpel
cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

480 16 6
criteo
tf-yarn

Train TensorFlow models on YARN in just a few lines of code!

453 93 28
ab2dridi
lakekeeper

A configurable PySpark package to identify fragmented external tables and perform safe in-place compaction

402 0 0
clusterdock
clusterdock

clusterdock is a framework for creating Docker-based container clusters

370 30 8
dask
knit

Deprecated; please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead

272 54 10
deeplearning4j
jumpy

Numpy and nd4j interop

271 14K 4K
MariaDukmak
hadopy

Easy parallel map-reduce command line tool

238 7 0
canimus
aiowebhdfs

A modern, async implementation of the WebHDFS API in Python

190 7 1
Data from PyPI, GitHub, ClickHouse, and BigQuery