PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
OpenNMT
pyonmttok

Fast and customizable text tokenization library with BPE and SentencePiece support

61K 333 82
stef41
toksight

Tokenizer analysis toolkit. Compare vocabulary coverage, compression ratios, and token boundaries across GPT-4o, Llama 3, Mistral, and any HuggingFace tokenizer.

695 1 0
Systemcluster
kitoken

Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

646 49 2
himkt
tiny-tokenizer

No description available

471 261 25
Okramjimmy
meitei-senter

Neural sentence boundary detection for Meitei Mayek (Manipuri) using SentencePiece tokenization and a CNN-based spaCy pipeline.

292 0 0
ZJaume
escape-unk

Escape unknown symbols in SentencePiece vocabularies

277 0 0
Data from PyPI, GitHub, ClickHouse, and BigQuery