The Python Data Valuation Library
DataCull is a modular, lightweight data pruning library that contains many dataset pruning (coreset selection) algorithms, including the official implementation of the paper "RCAP: Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning".
A lightweight, extensible Python library for data pruning with Hugging Face datasets and transformers