PyPI Stats

NLP Python Packages

Python packages with the GitHub topic nlp. Sorted by relevance, with stars and monthly downloads.
huggingface
tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

163.2M 11K 1K
huggingface
transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

143.5M 160K 33K
huggingface
datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

118.9M 21K 3K
nltk
nltk

NLTK Source

60.5M 15K 3K
explosion
thinc

🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

24.8M 3K 294
explosion
spacy

💫 Industrial-strength Natural Language Processing (NLP) in Python

22.1M 34K 5K
explosion
spacy-loggers

📟 Logging utilities for spaCy

17.7M 12 17
adbar
htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

9.6M 148 30
adbar
trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

7.4M 6K 363
datamade
usaddress

🇺🇸 A Python library for parsing unstructured United States address strings into address components

6.5M 2K 308
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

5.3M 15K 1K
RaRe-Technologies
gensim

Topic Modelling for Humans

5.1M 16K 4K
Microsoft
presidio-analyzer

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

4.6M 8K 1K
masci
banks

LLM prompt language based on Jinja. Banks provides tools and functions to build prompt text and chat messages from generic blueprints. It allows attaching metadata to prompts to ease their management, and versioning is a first-class citizen. Banks provides ways to store prompts on disk along with their metadata.

4.3M 126 20
modelscope
modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

4.1M 9K 934
akoumjian
datefinder

Find dates inside text using Python and get back datetime objects

4.1M 662 170
Microsoft
presidio-anonymizer

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

3.4M 8K 1K
isaacus-dev
semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

3.3M 619 40
hplt-project
sacremoses

Python port of Moses tokenizer, truecaser and normalizer

2.6M 495 59
vi3k6i5
flashtext

Extract keywords from sentences or replace keywords in sentences.

2.3M 6K 598
sloria
textblob

Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

2.2M 10K 1K
pemistahl
lingua-language-detector

The most accurate natural language detection library for Python, suitable for short text and mixed-language text

1.7M 2K 59
openvinotoolkit
openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

1.4M 10K 3K
akshaynagpal
word2number

Convert number words (e.g. twenty one) to numeric digits (21)

1.4M 179 77
Data from PyPI, GitHub, ClickHouse, and BigQuery