PyPI Stats

Clustering Python Packages

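Python packages tagged with the GitHub topic "clustering", sorted by relevance. Each entry shows the repository owner, the package name, and its description, followed by a line of counts that includes monthly downloads and GitHub stars.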
embeddings-benchmark
mteb

MTEB: Massive Text Embedding Benchmark

2.8M 3K 608
scikit-learn-contrib
hdbscan

A high performance implementation of HDBSCAN clustering.

2.4M 3K 532
IBM
drain3

A robust streaming log template miner based on the Drain algorithm

628K 798 169
unum-cloud
usearch

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

570K 4K 311
pycaret
pycaret

Open-source, low-code AutoML platform for Python. PyCaret 4.0: sklearn-native engine + React control plane.

467K 10K 2K
wannesm
dtaidistance

Time series distances: Dynamic Time Warping (fast DTW implementation in C)

244K 1K 191
rapidsai
pylibraft-cu12

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

226K 1K 231
rapidsai
libraft-cu12

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

223K 1K 231
DynamicTimeWarping
dtw-python

Python port of R's Comprehensive Dynamic Time Warp algorithms package

178K 339 39
joshlk
k-means-constrained

K-Means clustering - constrained with minimum and maximum cluster size. Documentation: https://joshlk.github.io/k-means-constrained

149K 233 44
hazelcast
hazelcast-python-client

Hazelcast Python Client

139K 116 78
WenjieDu
pypots

A Python toolkit/library for reality-centric machine/deep learning & data mining on partially-observed time series, with 50+ SOTA neural network models for scientific analysis tasks (imputation, classification, clustering, forecasting, anomaly detection, cleaning) on incomplete industrial irregularly-sampled multivariate TS with NaN missing values

122K 2K 184
rapidsai
raft-dask-cu12

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

119K 1K 231
biolab
orange3

🍊 📊 💡 Orange: Interactive data analysis

100K 6K 1K
dedupeio
dedupe

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

100K 4K 569
nomic-ai
nomic

Nomic Developer API SDK

53K 2K 197
jokofa
torch-kmeans

PyTorch implementations of KMeans, Soft-KMeans and Constrained-KMeans which can be run on GPU and work on (mini-)batches of data.

50K 83 7
VlachosGroup
aimsim-core

A Python toolbox to work with molecular similarity

50K 45 5
JustGlowing
minisom

🔴 MiniSom is a minimalistic implementation of the Self Organizing Maps

46K 2K 444
rapidsai
libcuvs-cu12

cuVS - a library for vector search and clustering on the GPU

43K 743 184
GAA-UAM
scikit-fda

Functional Data Analysis Python package

43K 344 63
motiwari
banditpam

BanditPAM C++ implementation and Python package

25K 657 51
wangyiqiu
dbscan

Theoretically Efficient and Practical Parallel DBSCAN

22K 96 20
rapidsai
cuvs-cu12

cuVS - a library for vector search and clustering on the GPU

22K 743 184
    • Data from PyPI, GitHub, ClickHouse, and BigQuery