Python package for working with MediaWiki XML content dumps
A Python toolkit to generate a tokenized dump of Wikipedia for NLP
Flexible Wikipedia dataset builder with sampling and pretraining support. Built on top of wikipedia-monthly, which provides fresh, clean Wikipedia dumps updated monthly.
Read Wikipedia dumps
A library that assists in traversing and downloading from Wikimedia Data Dumps and their mirrors.