How PyPI Stats Works
PyPI shows download counts. GitHub shows stars and forks. This site combines the two to surface insights that neither shows alone: packages with millions of downloads but few stars, projects gaining momentum before they trend, and popular libraries that may need new maintainers.
Data Sources
Download statistics come from Google BigQuery's public PyPI dataset, maintained by the Python Software Foundation. Daily, weekly, and monthly download counts are queried.
Historical download data is powered by ClickHouse, which PyPI uses to store and serve download statistics.
Repository metadata (stars, forks, issues, topics, last commit) comes from the GitHub REST API. GitHub URLs are extracted from PyPI package metadata.
Package details (description, version, release dates) come from the PyPI JSON API.
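As an illustration of how a GitHub URL might be pulled out of PyPI metadata, here is a minimal sketch. It assumes the `info.project_urls` and `info.home_page` fields of a PyPI JSON API response; the function name and the regular expression are hypothetical and not necessarily what this site runs.

```python
import re
from typing import Optional

# Matches https://github.com/<owner>/<repo>, tolerating a ".git" suffix
# or a trailing path, query, or fragment. (Illustrative pattern only.)
_GITHUB_REPO = re.compile(
    r"^https?://(?:www\.)?github\.com/"
    r"([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+?)"
    r"(?:\.git)?(?:[/#?].*)?$"
)


def extract_github_repo(pypi_json: dict) -> Optional[str]:
    """Return "owner/repo" from a PyPI JSON API response, or None."""
    info = pypi_json.get("info") or {}
    # project_urls can be null in the response; home_page is a legacy field.
    candidates = list((info.get("project_urls") or {}).values())
    candidates.append(info.get("home_page"))
    for url in candidates:
        match = _GITHUB_REPO.match((url or "").strip())
        if match:
            return f"{match.group(1)}/{match.group(2)}"
    return None
```

Packages for which this returns `None` are the ones that would lack repository metrics.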
Health scores, momentum indicators, and insight categories are calculated from this raw data.
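The exact formulas are not documented here. As a purely hypothetical sketch, one insight category (many downloads, few stars) could be derived from the raw counts like this; the function names and thresholds below are assumptions chosen for illustration, not the site's actual values.

```python
def downloads_per_star(monthly_downloads: int, stars: int) -> float:
    """Ratio of monthly downloads to GitHub stars (stars floored at 1)."""
    return monthly_downloads / max(stars, 1)


def is_heavily_used_but_unstarred(
    monthly_downloads: int,
    stars: int,
    min_downloads: int = 1_000_000,  # hypothetical threshold
    max_stars: int = 100,            # hypothetical threshold
) -> bool:
    """True for packages with many downloads but few stars."""
    return monthly_downloads >= min_downloads and stars <= max_stars
```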
Collection Schedule
Nightly: Download statistics are fetched from ClickHouse for all tracked packages.
Weekly: GitHub metadata (stars, forks, issues, topics) is updated for all packages.
Packages not included in the latest collection run are marked as "stale" (currently ~3.5% of packages).
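The stale check amounts to a set difference between tracked packages and those seen in the latest run. A minimal sketch, with hypothetical function names and in-memory sets standing in for database rows:

```python
def find_stale(tracked: set[str], collected: set[str]) -> set[str]:
    """Packages that are tracked but were absent from the latest run."""
    return tracked - collected


def stale_percentage(tracked: set[str], collected: set[str]) -> float:
    """Share of tracked packages that are stale, as a percentage."""
    if not tracked:
        return 0.0
    return 100.0 * len(find_stale(tracked, collected)) / len(tracked)
```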
Architecture
The API is built with FastAPI and asyncpg for non-blocking database queries. Data lives in PostgreSQL with a materialized view for fast leaderboard and search queries.
The cache is warmed after each collection run: when a run completes, the cache is cleared and popular pages are pre-populated. Cache entries expire at midnight UTC, giving ~19 hours of cache hits before the next potential update.
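An expiry at midnight UTC can be computed as the number of seconds remaining until the next midnight. The sketch below is one way to do that; the function name is hypothetical, and the 05:00 UTC completion time in the usage note is an assumption picked only to reproduce the ~19 hour figure.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_midnight_utc(now: datetime) -> int:
    """Seconds from `now` (timezone-aware) until the next midnight UTC."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())
```

For a run that finishes at 05:00 UTC, this yields 68,400 seconds, which is 19 hours.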
The frontend is static HTML, CSS, and JavaScript with no build step. Cloudflare provides edge caching.
The API has read-only database access, separate from the collector, and is rate-limited.
Privacy
Cloudflare Analytics is used for basic traffic insights. There is no Google Analytics and no other third-party tracking. A single cookie remembers your light/dark mode preference. All links to PyPI and GitHub go directly to those sites.
Limitations
- Download stats update nightly; GitHub metadata updates weekly
- Not all packages have GitHub URLs, so some lack repository metrics
- Packages deleted from PyPI may persist with stale data
- Packages sharing a GitHub repo (monorepos) show identical GitHub stats
Security
Packages are cross-referenced against the OSV database and other sources to identify malicious, vulnerable, or typosquatting packages. Flagged packages are excluded from leaderboards and lists, but remain searchable so users can find information about them. Detail pages display a prominent warning banner.
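The split between leaderboards and search can be pictured as two queries over the same data, only one of which filters on the flag. A minimal in-memory sketch; the `Package` class, its `flagged` field, and both function names are hypothetical stand-ins for the real schema:

```python
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    downloads: int
    flagged: bool = False  # set when a security source reports the package


def leaderboard(packages: list[Package], limit: int = 10) -> list[Package]:
    """Top packages by downloads, with flagged packages excluded."""
    visible = [p for p in packages if not p.flagged]
    return sorted(visible, key=lambda p: p.downloads, reverse=True)[:limit]


def search(packages: list[Package], query: str) -> list[Package]:
    """Name search that still returns flagged packages."""
    needle = query.lower()
    return [p for p in packages if needle in p.name.lower()]
```

A detail page would then check `flagged` on the result to decide whether to render the warning banner.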