How PyPI Stats Works

PyPI shows download counts. GitHub shows stars and forks. Data from both is combined to surface new insights: packages with millions of downloads but few stars, projects gaining momentum before they trend, and popular libraries that may need new maintainers.

Data Sources

BigQuery: Download statistics come from Google BigQuery's public PyPI dataset, maintained by the Python Software Foundation. Daily, weekly, and monthly download counts are queried.

ClickHouse: Historical download data is powered by ClickHouse, which PyPI uses to store and serve download statistics.

GitHub: Repository metadata (stars, forks, issues, topics, last commit) comes from the GitHub REST API. GitHub URLs are extracted from PyPI package metadata.

PyPI: Package details (description, version, release dates) come from the PyPI JSON API.

Health scores, momentum indicators, and insight categories are calculated from this raw data.
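
As an illustration of how a GitHub URL might be pulled out of PyPI package metadata, here is a minimal sketch. It assumes the `info` object returned by the PyPI JSON API, with its `project_urls` and `home_page` fields. The function name and the regular expression are hypothetical, not the site's actual code, and a real collector would likely also filter out non-repository paths such as sponsor links.

```python
import re
from typing import Optional

# Matches github.com/<owner>/<repo>; anything after the repo name is ignored.
_GITHUB_REPO = re.compile(
    r"https?://(?:www\.)?github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)",
    re.IGNORECASE,
)


def extract_github_repo(info: dict) -> Optional[str]:
    """Return "owner/repo" from a PyPI JSON API `info` object, or None."""
    # project_urls may be null in the API response, so fall back to {}.
    candidates = list((info.get("project_urls") or {}).values())
    candidates.append(info.get("home_page") or "")
    for url in candidates:
        match = _GITHUB_REPO.search(url or "")
        if match:
            owner, repo = match.groups()
            if repo.endswith(".git"):
                repo = repo[:-4]  # drop a trailing ".git" from clone URLs
            return f"{owner}/{repo}"
    return None


info = {
    "project_urls": {
        "Documentation": "https://requests.readthedocs.io",
        "Source": "https://github.com/psf/requests",
    },
    "home_page": "",
}
print(extract_github_repo(info))
```

Packages for which this kind of lookup finds nothing are the ones that end up without repository metrics.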

Collection Schedule

Nightly: Download statistics are fetched from ClickHouse for all tracked packages.

Weekly: GitHub metadata (stars, forks, issues, topics) is updated for all packages.

Packages not included in the latest collection run are marked as "stale" (currently ~3.5% of packages).
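
The stale check can be thought of as a set difference between the tracked packages and those seen in the latest run. The sketch below shows that idea only. The function names and package names are hypothetical, and the real collector may track staleness differently (for example, with a timestamp column).

```python
from typing import Set


def find_stale(tracked: Set[str], collected: Set[str]) -> Set[str]:
    """Packages that are tracked but were absent from the latest run."""
    return tracked - collected


def stale_percentage(tracked: Set[str], collected: Set[str]) -> float:
    """Share of tracked packages that are stale, as a percentage."""
    if not tracked:
        return 0.0
    return 100.0 * len(find_stale(tracked, collected)) / len(tracked)


# Hypothetical package names, for illustration only.
tracked = {"pkg-a", "pkg-b", "pkg-c", "pkg-d"}
collected = {"pkg-a", "pkg-b", "pkg-c"}
print(find_stale(tracked, collected), stale_percentage(tracked, collected))
```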

Architecture

The API is built with FastAPI and asyncpg for non-blocking database queries. Data lives in PostgreSQL with a materialized view for fast leaderboard and search queries.

Smart cache warming is used: when a collection run completes, the cache is cleared and popular pages are pre-populated. Cache TTL is set until midnight UTC, giving ~19 hours of cache hits before the next potential update.
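
To make the TTL arithmetic concrete, the sketch below computes the number of seconds from a given moment until the next midnight UTC. The function name is hypothetical, and the 05:00 UTC completion time in the example is an assumption chosen only because it reproduces the ~19 hour figure. The actual run time is not stated here.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_midnight_utc(now: datetime) -> int:
    """Seconds from `now` (timezone-aware) until the next 00:00 UTC."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())


# Assumed example: a collection run finishing at 05:00 UTC.
finished = datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc)
ttl = seconds_until_midnight_utc(finished)
print(ttl, ttl / 3600)
```

Anchoring the expiry to a fixed wall-clock time rather than a fixed duration means every cached page expires together, so no page can outlive the day's data.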

The frontend is static HTML, CSS, and JavaScript with no build step. Cloudflare provides edge caching.

The API has read-only database access, separate from the collector, and is rate-limited.

Privacy

Cloudflare Analytics is used for basic traffic insights. There is no Google Analytics or other third-party tracking. A single cookie is used to remember your light/dark mode preference. All links to PyPI and GitHub go directly to those sites.

Limitations

  • Download stats update nightly; GitHub metadata updates weekly
  • Not all packages have GitHub URLs, so some lack repository metrics
  • Packages deleted from PyPI may persist with stale data
  • Packages sharing a GitHub repo (monorepos) show identical GitHub stats
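
The monorepo limitation follows from GitHub stats being keyed by repository rather than by package. A minimal sketch of how shared repositories could be detected is shown below. The function name, package names, and repository name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def shared_repos(package_repos: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Map each repo used by more than one package to its package names."""
    by_repo = defaultdict(list)
    for package, repo in package_repos.items():
        if repo is not None:  # packages without a GitHub URL are skipped
            # GitHub owner/repo names are case-insensitive, so normalize.
            by_repo[repo.lower()].append(package)
    return {repo: sorted(pkgs) for repo, pkgs in by_repo.items() if len(pkgs) > 1}


# Hypothetical data, for illustration only.
package_repos = {
    "example-core": "Example/monorepo",
    "example-cli": "example/monorepo",
    "standalone": "someone/standalone",
    "no-github": None,
}
print(shared_repos(package_repos))
```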

Security

Packages are cross-referenced against the OSV database and other sources to identify malicious, vulnerable, or typosquatting packages. Flagged packages are excluded from leaderboards and lists, but remain searchable so users can find information about them. Detail pages display a prominent warning banner.
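
The difference between leaderboard and search visibility can be sketched as two queries over the same data, only one of which filters on the flag. The `Package` class, its fields, and both function names are hypothetical stand-ins. In the real system this filtering presumably happens in SQL.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Package:
    name: str
    downloads: int
    flagged: bool = False  # set when a security source flags the package


def leaderboard(packages: List[Package], limit: int = 10) -> List[Package]:
    """Top packages by downloads, excluding flagged ones."""
    visible = [p for p in packages if not p.flagged]
    return sorted(visible, key=lambda p: p.downloads, reverse=True)[:limit]


def search(packages: List[Package], query: str) -> List[Package]:
    """Name search that still returns flagged packages."""
    q = query.lower()
    return [p for p in packages if q in p.name.lower()]


# Hypothetical data, for illustration only.
packages = [
    Package("goodlib", 1_000),
    Package("go0dlib", 5_000, flagged=True),
    Package("otherlib", 2_000),
]
print([p.name for p in leaderboard(packages)])
print([p.name for p in search(packages, "go0d")])
```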