PyPI Stats

Hadoop Python Packages

Python packages with the GitHub topic hadoop. Sorted by relevance, with stars and monthly downloads.
spotify
luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

1.5M 19K 2K
CODAIT
yarn-api-client

Python client for Hadoop® YARN API

319K 109 49
h2oai
h2o

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

244K 7K 2K
jcrist
skein

A tool and library for easily deploying applications on Apache YARN

56K 146 39
iterative
dvc-hdfs

HDFS/WebHDFS plugin for dvc

33K 2 1
SneaksAndData
hadoop-fs-wrapper

Python Wrappers for Hadoop FileSystem

26K 4 0
jingw
pyhdfs

Python HDFS client

6K 97 23
Breaka84
spooq

Spooq is a PySpark-based helper library for ETL data ingestion pipelines in data lakes.

4K 10 2
h2oai
h2o-client

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

2K 7K 2K
developer-sdk
hadoop-yarn-rest-api

A Python library for the YARN REST API

2K 0 0
h2oai
h2o-mlflow-flavor

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

1K 7K 2K
eecs485staff
madoop

A lightweight MapReduce framework for education.

1K 10 5
szilard-nemeth
yarn-dev-tools

Various scripts to automate and ease Apache Hadoop YARN development.

941 2 0
BROADSoftware
hadeploy

A Hadoop application deployment tool

877 9 4
splitlog
splitlog

Utility to split aggregated logs from Apache Hadoop YARN applications into a folder hierarchy

838 0 0
IBMStreams
streamsx-hdfs

HDFS integration for IBM Streams

614 9 20
SvenskaSpel
cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

557 16 6
criteo
tf-yarn

Train TensorFlow models on YARN in just a few lines of code!

471 93 28
ab2dridi
lakekeeper

A configurable PySpark package to identify fragmented external tables and perform safe in-place compaction

433 0 0
clusterdock
clusterdock

clusterdock is a framework for creating Docker-based container clusters

385 30 8
dask
knit

Deprecated; please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead

262 54 10
deeplearning4j
jumpy

Numpy and nd4j interop

259 14K 4K
MariaDukmak
hadopy

Easy parallel map-reduce command line tool

238 7 0
Orhideous
python3-lzo-indexer

Python library for indexing block offsets within LZO compressed files

204 2 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery