PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
chrismattmann
tika

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

410K 2K 251
bjascob
lemminflect

A Python module for English lemmatization and inflection.

109K 278 26
pdrm83
sent2vec

How to encode sentences in a high-dimensional vector space, a.k.a., sentence embedding.

12K 135 12
katanaml
sparrow-parse

Structured data extraction and instruction calling with ML, LLM and Vision LLM

6K 5K 515
StabRise
scaledp

ScaleDP is an Open-Source extension of Apache Spark for Document Processing

5K 18 1
deeppavlov
deeppavlov

An open source library for deep learning end-to-end dialog systems and chatbots.

5K 7K 1K
Jasonsey
fern2

A model development structure control for NLP

4K 3 1
howl-anderson
microtokenizer

MicroTokenizer: A lightweight yet full-featured Chinese tokenizer designed for educational and research purposes, helping students understand how tokenizers work. Provides a practical, hands-on approach to understanding NLP concepts, featuring multiple tokenization algorithms and customizable models. Ideal for students, researchers, and NLP enthusiasts.

4K 159 22
Ars-Linguistica
mlconjug3

A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.

3K 80 13
CLARIN-PL
clarinpl-embeddings

Embeddings: State-of-the-art text representations for natural language processing tasks. The initial version of the library focuses on the Polish language.

3K 37 3
piteren
torchness

PyTorch tools

3K 0 0
MilaNLProc
contextualized-topic-models

A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

3K 1K 151
microsoft
autobrewml

The AutoBrewML Framework greatly reduces the time it takes to get production-ready ML models, with ease and efficiency.

3K 25 31
StatguyUser
textfeatureselection

Python library for feature selection on text features. It provides a filter method, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.

2K 53 5
stonybrooknlp
appworld

🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource Paper.

2K 410 68
bobxwu
topmost

A Topic Modeling System Toolkit (ACL 2024 Demo)

2K 288 26
MAIF
melusine

📧 Melusine: Use Python to automate your email processing workflow

1K 363 58
NorskRegnesentral
skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

1K 926 77
ahmetozdemirrr
turkish-syllable

A library that provides a spelling function for Turkish. The core is written in C for NLP projects with large data sets and is made usable from Python through a wrapper.

1K 1 0
maximtrp
bitermplus

Biterm Topic Model (BTM): modeling topics in short texts

1K 85 15
SekouD
mlconjug

A Python library to conjugate French, English, Spanish, Italian, Portuguese and Romanian verbs using Machine Learning techniques.

1K 74 8
brightertiger
pygarble

A Python package for detecting garbled text using multiple detection strategies with a scikit-learn-like interface

1K 14 4
skblaz
rakun2

RaKUn 2.0 - A fast keyword detection algorithm

1K 73 7
praekelt
feersum-nlu

FeersumNLU API

962 9 3
Data from PyPI, GitHub, ClickHouse, and BigQuery