PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
explosion
spacy

💫 Industrial-strength Natural Language Processing (NLP) in Python

21.9M 34K 5K
WorksApplications
sudachipy

Sudachi in Rust 🦀 and new generation of SudachiPy

1.9M 442 50
AgentOps-AI
tokencost

Easy token price estimates for 400+ LLMs. TokenOps.

384K 2K 104
mysto
ff3

FPE - Format Preserving Encryption with FF3 in Python

251K 104 20
ScrapeGraphAI
toonify

Toonify: Compact data format reducing LLM token usage by 30-60%

187K 337 24
natasha
razdel

Rule-based token, sentence segmentation for Russian language

99K 281 34
adbar
simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

90K 195 15
OpenNMT
pyonmttok

Fast and customizable text tokenization library with BPE and SentencePiece support

61K 333 82
izikeros
count-tokens

Count tokens in a text file.

46K 13 0
davidpirogov
toon-llm

Token-Oriented Object Notation (TOON) is an LLM-optimized data serialization format implemented in Python.

29K 9 3
nlpcloud
nlpcloud

NLP Cloud serves high performance pre-trained or custom models for NER, sentiment-analysis, classification, summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, code generation, and more...

16K 86 8
OpenVoiceOS
quebra-frases

chunks strings into byte sized pieces

13K 1 3
vkcom
youtokentome

Unsupervised text tokenizer focused on computational efficiency

11K 977 108
AlgoBrother
mayatok

MayaTok is a Byte Pair Encoding based Tokenizer.

7K 1 0
PyThaiNLP
attacut

A Fast and Accurate Neural Thai Word Segmenter

7K 94 18
THUDM
icetk

A unified tokenization tool for Images, Chinese and English.

4K 153 17
explosion
spacy-streamlit

👑 spaCy building blocks and visualizers for Streamlit apps

4K 854 117
BitBadges
bitbadgespy-sdk

The most feature-rich tokenization standard ever built — TypeScript SDK for the BitBadges tokenization Cosmos SDK module

3K 0 0
rosette-api
rosette-api

Babel Street Analytics Client Library for Python

3K 38 37
lucidrains
h-net-dynamic-chunking

H-Net Dynamic Chunking Modules

3K 71 2
cedricrupb
code-tokenize

Fast tokenization and structural analysis of any programming language

2K 62 10
daac-tools
vaporetto

🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.

2K 21 1
cbaziotis
ekphrasis

Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction.

2K 675 94
TI-Toolkit
tivars

A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files

2K 26 1
Data from PyPI, GitHub, ClickHouse, and BigQuery