MTEB: Massive Text Embedding Benchmark
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
A Scandinavian benchmark for sentence embeddings
NLP pipelines for Tagalog using spaCy
R-BPE: Improving BPE-Tokenizers with Token Reuse