28 dependents
Package Description Downloads/month
Now included in rigour 148K
Data cleaning and validation functions for names, languages, identifiers, etc. 87K
ScanCode detects licenses, copyrights, dependencies by "scanning code" ...... 74K
Data model and processing tools for investigative entity graphs. Used by OpenSan... 64K
Utility library to turn country names into ISO two-letter codes 38K
A Python library for defining rule-based overrides on messy data 9K
FOSSLight Source Scanner 7K
Common interface definitions for aleph toolkit services and applications 6K
ScanCode detects licenses, copyrights, dependencies by "scanning code" ...... 5K
Lightweight web scraping toolkit for documents and structured data. 4K
Repository scanner for the identification of effective licenses and copyright in... 2K
API for OpenSanctions with support for entity search and bulk matching of data c... 1K
A simple interface written in python for reproducible i/o workflows around tabul... 1K
Fragment storage/database layer for FollowTheMoney entities 845
Content-addressable storage for files across S3 and local file systems 780
Extract names of places from text and determine which country they may refer to 546
Common interface definitions for aleph toolkit services and applications 528
A library with real-world data parsers. 528
A reconciliation service for OpenRefine serving data from a given CSV file. 487
Python package installation for OU module TM351 307
Scrape HTML to dictionaries 260
Translate FollowTheMoney Document entities via Apertium or Argos 248
followthemoney query dsl and io helpers 244
Reusable Python components to be shared with Python projects. 229
Utility functions for scrapers hosted on morph.io 215
CSV to Postgres data puncher. 185
Tool to rename Greenhouse Gas tables to a unified scheme 155
A minimalistic, recursive web crawling library for Python. 93