PyPI Stats

Data Science Python Packages

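Python packages whose GitHub repositories carry the topic data-science, sorted by relevance. Each entry lists the repository owner, the package name, a description, and a line of counts: monthly downloads, GitHub stars, and a third figure that appears to be forks.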
pandas-dev
pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

684.3M 49K 20K
matplotlib
matplotlib

matplotlib: plotting with Python

220.4M 23K 8K
scikit-learn
scikit-learn

scikit-learn: machine learning in Python

207.1M 66K 27K
ipython
ipython

Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

164.7M 17K 4K
aws
awswrangler

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

86.3M 4K 727
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

78.5M 10K 718
mwaskom
seaborn

Statistical data visualization in Python

55.8M 14K 2K
modal-labs
modal

SDK libraries for Modal

53.1M 472 93
ray-project
ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

52.9M 42K 8K
snowflakedb
snowflake-snowpark-python

Snowflake Snowpark Python API

52.2M 333 148
apache
apache-airflow-providers-common-sql

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

50.3M 45K 17K
aws
redshift-connector

Redshift Python Connector. It supports Python Database API Specification v2.0.

49.6M 218 87
statsmodels
statsmodels

Statsmodels: statistical modeling and econometrics in Python

36.7M 11K 3K
great-expectations
great-expectations

Always know what to expect from your data.

31.5M 11K 2K
apache
apache-airflow-providers-fab

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

30.2M 45K 17K
streamlit
streamlit

Streamlit — A faster way to build and share data apps.

27.7M 44K 4K
wandb
wandb

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.

25M 11K 864
supabase
realtime

Python Client for Supabase. Query Postgres from Flask, Django, FastAPI. Python user authentication, security policies, edge functions, file storage, and realtime data streaming. Good first issue.

25M 3K 479
apache
apache-airflow-providers-http

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

24M 45K 17K
apache
apache-airflow-providers-common-compat

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

22.2M 45K 17K
explosion
spacy

💫 Industrial-strength Natural Language Processing (NLP) in Python

22.1M 34K 5K
apache
apache-airflow-providers-databricks

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

22.1M 45K 17K
apache
apache-airflow-providers-cncf-kubernetes

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

21.4M 45K 17K
keras-team
keras

Deep Learning for humans

21.1M 64K 20K
Data from PyPI, GitHub, ClickHouse, and BigQuery