PyPI Stats

Search Packages

Each result lists the GitHub owner, the package name, and the package description, followed by three figures: PyPI downloads, GitHub stars, and GitHub forks.
piskvorky
smart-open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

70.6M 3K 385
TileDB-Inc
tiledb

Python interface to the TileDB storage engine

83K 202 38
jcrist
skein

A tool and library for easily deploying applications on Apache YARN

56K 146 39
iterative
dvc-hdfs

HDFS/WebHDFS plugin for dvc

33K 2 1
megvii-research
megfile

Megvii FILE Library - Working with files in Python the same way as with the standard library

30K 174 20
wradlib
wradlib

weather radar data processing - python package

14K 308 88
jingw
pyhdfs

Python HDFS client

6K 97 23
spotify
snakebite

A pure python HDFS client

5K 859 213
criteo
cluster-pack

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

1K 46 23
tks18
pyquery-polars

PyQuery is a local-first data operating system built on lazy execution that processes 100GB+ files while you doomscroll. No cap. 🧢

1K 1 0
BROADSoftware
hadeploy

An Hadoop Application Deployment tool

869 9 4
fasouto
webhdfspy

Python wrapper to access Hadoop HDFS REST API

720 8 5
IBMStreams
streamsx-hdfs

HDFS integration for IBM Streams

574 9 20
canimus
alphareader

A reader for large files with custom delimiters and encodings

437 6 1
ab2dridi
lakekeeper

A configurable PySpark package to identify fragmented external tables and perform safe in-place compaction

402 0 0
tks18
pyquery-core

PyQuery is a local-first data operating system built on lazy execution that processes 100GB+ files while you doomscroll. No cap. 🧢

107 1 0
yassineazzouz
pydistcp

pydistcp: python WebHDFS inter/intra-cluster data copy tool.

102 9 3
ceph
test-cephadm

Ceph is a distributed object, block, and file storage platform

98 17K 6K
qiyangduan
schemaindex

SchemaIndex is designed for data scientists to index and search metadata more efficiently.

89 3 1
silkway-ai
dfspy

Distributed File System written in Python

84 14 0
yassineazzouz
kraken-pyds

Kraken - A distributed data transfer tool.

78 2 1
piskvorky
srcd-smart-open

Utils for streaming large files (S3, HDFS, gzip, bz2...) - temporary source{d} fork

61 3K 385
marco-gallegos
sqoopit

A simple package to let you Sqoop into HDFS/Hive/HBase with python

61 0 0
yassineazzouz
tanit

Kraken - A distributed data transfer tool.

42 2 1
Data from PyPI, GitHub, ClickHouse, and BigQuery.