PyPI Stats

Search Packages

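Find Python packages by name, description, GitHub topic, or filter by metrics.

Each result shows the GitHub owner, the package name, a short description, and three figures: a download count, GitHub stars, and GitHub forks.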
OpenNMT
pyonmttok

Fast and customizable text tokenization library with BPE and SentencePiece support

61K 333 82
akretion
nfelib

nfelib - Python bindings to read and generate XML for NF-e, national NFS-e, CT-e, MDF-e, and BP-e

36K 191 69
rsennrich
subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

25K 2K 472
gweidart
rs-bpe

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

24K 38 5
vkcom
youtokentome

Unsupervised text tokenizer focused on computational efficiency

11K 977 108
soaxelbrooke
bpe

Byte Pair Encoding for Python!

2K 232 39
neluca
tinybpe

This is an ultra-fast, lightweight and clean code implementation of the Byte Pair Encoding (BPE) algorithm.

1K 4 0
VihangaFTW
bytetok

A fast, modular and light-weight BPE tokenizer for NLP research and prototyping.

776 2 0
Systemcluster
kitoken

Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

654 49 2
zouharvi
tokenization-scorer

Simple-to-use scoring function for arbitrarily tokenized texts.

542 48 6
Hk669
bpetokenizer

Train your own tokenizer for LLMs based on the BPE algorithm (supports regex patterns and special tokens)

511 3 1
Thibault00
runtoken

A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster.

502 1 0
crodriguez1a
bpe-summarizer

Automatic summarization based on BPE tokenization

373 3 1
thammegowda
nlcodec

Natural Language EnCoder-Decoder: word, char, BPE, etc.

372 5 4
sefineh-ai
amharic-tokenizer

Amharic tokenizer with BPE-like merges over decomposed fidel (Cython)

288 99 14
DVDAGames
pgn-tokenizer

A byte pair encoding (BPE) tokenizer for chess portable game notation (PGN)

263 0 0
TnsaAi
tokenize2

Official Repository of Tokenize2 Tokenizers by TNSA

178 0 0
shiningsunnyday
geobpe

Protein Structure Tokenization via Geometric Byte Pair Encoding (GeoBPE)

146 24 4
akretion
nfelib-xsdata

nfelib - Python bindings to read and generate XML for NF-e, national NFS-e, CT-e, MDF-e, and BP-e

3 191 69
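Many of the packages listed above implement byte pair encoding (BPE). As a rough illustration of the core idea (repeatedly merging the most frequent adjacent pair of symbols), here is a minimal pure-Python sketch. It is not the API of any package above: the function names `get_pair_counts`, `merge_pair`, and `learn_bpe` are hypothetical, and the sketch omits details real tokenizers handle, such as end-of-word markers, byte-level fallback, and pre-tokenization regexes.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs across a {word-as-tuple: frequency} vocabulary."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` in each word with the concatenated symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(words, num_merges):
    """Hypothetical helper: learn up to `num_merges` merge rules from a word list."""
    # Start with every word split into single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # max() returns the first maximal pair, so ties go to the first-seen pair.
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


merges, vocab = learn_bpe(["low", "low", "lower"], 2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
print(vocab)   # {('low',): 2, ('low', 'e', 'r'): 1}
```

The learned merge list is the "model": encoding new text means applying the same merges in the same order. Production libraries typically replace this quadratic loop with incremental pair-count updates and priority queues for speed.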
Data from PyPI, GitHub, ClickHouse, and BigQuery