PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
neocl
speach

🐍🍑 Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)

29K 21 6
gunthercox
chatterbot-corpus

A multilingual dialog corpus

9K 1K 1K
flairNLP
fundus

A very simple news crawler with a funny name

5K 452 108
johentsch
ms3

A parser for annotated MuseScore 3 files.

3K 55 6
MozillaSecurity
corpus-replicator

A corpus generation tool

2K 27 3
gambolputty
german-nouns

A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

2K 169 22
ko-nlp
korpora

Korean corpus repository

2K 748 79
GlobalMaksimum
sadedegel

A General Purpose NLP library for Turkish

2K 95 14
NationalLibraryOfNorway
maalfrid-toolkit

Toolkit for the Målfrid project

1K 2 1
affjljoo3581
expanda

The universal integrated corpus-building environment.

1K 33 7
tanloong
neosca

L2SCA & LCA fork: cross-platform, GUI, without Java dependency

1K 43 14
NetherlandsForensicInstitute
demeuk

Demeuk is a simple tool to clean up corpora (like dictionaries) or any dataset containing plain text strings.

1K 22 4
entelecheia
ekorpkit

eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization.

1K 6 2
yonkornilov
opus-api

OPUS (opus.nlpl.eu) Python3 API

852 18 5
eggplants
aovec

Make Word2Vec from aozorabunko/aozorabunko

846 3 0
lovit
krwordrank

Korean Corpus Downloader

836 747 79
kunansy
rnc

API for Russian National Corpus

819 9 1
asshatter
keywords

This is a simple library for extracting keywords from data with/without using a corpus.

453 8 3
tarepan
npvcc2016

npvcc2016: Python loader of npVCC2016 speech corpus

450 0 0
CLUEBenchmark
pyclue

Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark

442 133 15
IngoKl
textdirectory

TextDirectory allows you to filter, transform, and combine multiple text files into one aggregated file.

413 11 2
letuananh
texttaglib

Python library for managing and annotating text corpuses in different formats (ELAN, TIG, TTL, et cetera)

375 0 0
GateNLP
wpextract

Create datasets from WordPress sites

333 6 0
grammarly
ua-gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

322 270 23
Data from PyPI, GitHub, ClickHouse, and BigQuery