PyPI Stats

BPE Python Packages

Python packages with the GitHub topic bpe. Sorted by relevance, with stars and monthly downloads.
OpenNMT
pyonmttok

Fast and customizable text tokenization library with BPE and SentencePiece support

62K 333 82
akretion
nfelib

nfelib - Python bindings for reading and generating XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e

37K 191 69
rsennrich
subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

27K 2K 472
gweidart
rs-bpe

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

24K 38 5
vkcom
youtokentome

Unsupervised text tokenizer focused on computational efficiency

11K 977 108
soaxelbrooke
bpe

Byte Pair Encoding for Python!

2K 232 39
neluca
tinybpe

An ultra-fast, lightweight, clean-code implementation of the Byte Pair Encoding (BPE) algorithm.

1K 4 0
VihangaFTW
bytetok

A fast, modular and light-weight BPE tokenizer for NLP research and prototyping.

833 2 0
Systemcluster
kitoken

Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

621 49 2
zouharvi
tokenization-scorer

Simple-to-use scoring function for arbitrarily tokenized texts.

540 48 6
Thibault00
runtoken

A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster.

494 1 0
Hk669
bpetokenizer

(py package) Train your own BPE-based tokenizer for LLMs (supports regex patterns and special tokens)

484 3 1
thammegowda
nlcodec

Natural Language EnCoder-Decoder: word, char, bpe etc

363 5 4
crodriguez1a
bpe-summarizer

Auto summarization from BPE tokenization

345 3 1
sefineh-ai
amharic-tokenizer

Amharic tokenizer with BPE-like merges over decomposed fidel (Cython)

300 99 14
DVDAGames
pgn-tokenizer

A byte pair encoding (BPE) tokenizer for chess portable game notation (PGN)

261 0 0
TnsaAi
tokenize2

Official Repository of Tokenize2 Tokenizers by TNSA

176 0 0
shiningsunnyday
geobpe

Protein Structure Tokenization via Geometric Byte Pair Encoding (GeoBPE)

140 24 4
akretion
nfelib-xsdata

nfelib - Python bindings for reading and generating XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e

3 191 69
    • Data from PyPI, GitHub, ClickHouse, and BigQuery