PyPI Stats

Corpus Python Packages

Python packages with the GitHub topic corpus. Sorted by relevance, with stars and monthly downloads.
neocl
speach

🐍🍑 Python 3 library for managing, annotating, and converting natural language corpora using popular formats (CoNLL, ELAN, Praat, CSV, JSON, SQLite, VTT, Audacity, TTL, TIG, ISF, etc.)

30K 21 6
gunthercox
chatterbot-corpus

A multilingual dialog corpus

8K 1K 1K
flairNLP
fundus

A very simple news crawler with a funny name

6K 452 108
johentsch
ms3

A parser for annotated MuseScore 3 files.

3K 55 6
gambolputty
german-nouns

A list of ~100,000 German nouns and their grammatical properties, compiled from WiktionaryDE as a CSV file, plus a module to look up the data and parse compound words.

3K 169 22
MozillaSecurity
corpus-replicator

A corpus generation tool

2K 27 3
ko-nlp
korpora

Korean corpus repository

2K 748 79
GlobalMaksimum
sadedegel

A general-purpose NLP library for Turkish

2K 95 14
NationalLibraryOfNorway
maalfrid-toolkit

Toolkit for the Målfrid project

1K 2 1
tanloong
neosca

L2SCA & LCA fork: cross-platform, GUI, without Java dependency

1K 43 14
NetherlandsForensicInstitute
demeuk

Demeuk is a simple tool to clean up corpora (like dictionaries) or any dataset containing plain text strings.

1K 22 4
affjljoo3581
expanda

The universal integrated corpus-building environment.

1K 33 7
entelecheia
ekorpkit

eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization.

984 6 2
eggplants
aovec

Make Word2Vec from aozorabunko/aozorabunko

918 3 0
yonkornilov
opus-api

OPUS (opus.nlpl.eu) Python3 API

916 18 5
lovit
krwordrank

Korean Corpus Downloader

902 747 79
kunansy
rnc

API for Russian National Corpus

842 9 1
tarepan
npvcc2016

Python loader of npVCC2016 corpus

500 0 0
asshatter
keywords

This is a simple library for extracting keywords from data with/without using a corpus.

459 8 3
CLUEBenchmark
pyclue

Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark

455 133 15
IngoKl
textdirectory

TextDirectory allows you to filter, transform, and combine multiple text files into one aggregated file.

423 11 2
GateNLP
wpextract

Create datasets from WordPress sites

372 6 0
letuananh
texttaglib

Python library for managing and annotating text corpora in different formats (ELAN, TIG, TTL, etc.)

358 0 0
grammarly
ua-gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

342 270 23
Data from PyPI, GitHub, ClickHouse, and BigQuery