34 dependents
Description Downloads/month
Clean US addresses following USPS pub 28 and RESO guidelines 1.7M
Thoughtful OCR Package 28K
Framework for simpler Spark Pipelines 24K
The usaddress library made easy with Pandas. 9K
A verified version of the WebArena Benchmark 6K
Address variable type for dedupe 6K
helix.personmatching 3K
Fully automated entry of job application data into Connecticut's DOL ReEmployCT ... 934
Offline address-to-Census data mapping for Python with PL 94-171 and ACS support 744
A Library for using in our CRM 672
List Massachusetts Courts in Docassemble 587
Swiss Army Kit 476
(no description) 446
A powerful, flexible framework for entity resolution and record linkage. 424
LabPaper - A Jupyter notebook extension for exporting notebooks to academic pape... 413
A package for emulating common data entry errors 316
DealStat Utilities 308
(no description) 262
A 🐍 package for translating raw US address strings into the OSM tagging sc... 251
Python toolkit for Medicaid claims data analysis — preprocessing, cleaning, risk... 213
This module defines classes and functions for converting a CSV file of points of... 197
Scalable Data Preprocessing Tool for Training Large Language Models 192
Secure Infrastructure for Research with Administrative Data 180
Hank AI API Client for Hank AI Services. 168
(no description) 156
Utilities for parsing and understanding building-related data to support covered... 146
A script to extract US-style street addresses from a text file. 142
Python wrapper for njactb. 102
Christopher Todd's Python Library For Dealing With Locations 95
A short description of your package 91
fetch, munge, and parse résumés and job postings 83
Scalable data pre processing and curation toolkit for LLMs 77
A short description of your package 65
Scalable Data Preprocessing Tool for Training Large Language Models 1