PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
nomic-ai
nomic

Nomic Developer API SDK

54K 2K 197
akamhy
videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.

12K 371 62
sophiaconsulting
fast-suffix-array

O(n) suffix array construction (SA-IS) with LCP arrays, BWT, FM-index, and pattern search. Rust-powered Python bindings.

8K 0 0
AI-team-UoA
pyjedai

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

2K 93 13
nmavail
nmavail

Async Python CLI tool to check name availability across domains, GitHub, NPM, PyPI, Crates.io, and system packages. Zero-config tool for startups and developers.

2K 0 0
justinshenk
simages

Find similar images in a dataset

821 23 3
akcarsten
duplicate-finder

This Python package identifies duplicate files in a folder of interest.

778 24 5
erikreed
pydupes

A duplicate file finder like rdfind/fdupes et al. that may be faster in environments with millions of files and terabytes of data or over high-latency filesystems (e.g. NFS).

474 4 2
deplicate
deplicate

Advanced Duplicate File Finder for Python. Nothing is impossible to solve.

436 79 17
KeyWeeUsr
thebear

Bear - the decluttering deduplicator

274 4 1
exponential-decay
demystify-digipres

Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes: break the export into its components and store them in a SQLite database; create additional columns to augment the output where useful; and query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists in digital preservation departments of memory institutions. The tool will find duplicates, unidentified files, blacklisted objects, character encoding issues, and more.

262 33 5
drogers0
clonehunter

CloneHunter finds duplicate code across mixed-language repositories. It runs a semantic retrieval pipeline with model inference and indexing to catch harder, non-trivial duplicate patterns.

252 2 0
ChuckNorrison
imgdups

Very fast two-folder image duplicate finder built with pickle and cv2.

226 2 0
markusressel
py-image-dedup

A library to find duplicate images and delete unwanted ones

220 170 18
zeronyk
imageduplicatefinder

Simple duplicate finder for images; matches on names and then compares image hashes.

194 0 1
MarcinOrlowski
dhunter

Fast, content based duplicate file detector with cache and more!

190 0 0
elcorto
findsame

Find duplicate files and directories based on file hashes.

185 6 1
vuolter
deplicate-cli

Command Line Interface for deplicate.

112 3 1
giosali
dupeutil

A command-line program written in Python for detecting and removing duplicate files

75 0 0
NicolasBi
dupe-eraser

A command-line tool that automates the deletion of duplicate files based on their hash or perceptual hash.

51 13 0
callforpapers-source
doc2term

A fast sentence/word tokenizer and punctuation remover.

38 2 1
dealfonso
searchdups

Search for duplicate files

34 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery