Text preprocessing and PII anonymisation for NLP/ML pipelines: an ONNX NER ensemble, language detection, and stopword removal. Built for both statistical ML and language models.
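The sketch below shows one way these steps can fit together. It assumes the NER stage (stubbed out here) has already produced non-overlapping (start, end, label) character spans. It masks those spans with type placeholders and then drops stopwords. The function names and the tiny STOPWORDS set are hypothetical illustrations, not this project's API, and the ONNX ensemble and language detection are not reproduced.

```python
# Hypothetical stopword set for illustration only; a real pipeline would pick
# a language-specific list after language detection.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "on", "at", "is"}


def anonymise(text, spans):
    """Replace non-overlapping (start, end, label) character spans with [LABEL].

    Spans are applied right to left so that earlier offsets stay valid
    after each replacement changes the string length.
    """
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def remove_stopwords(text, stopwords=STOPWORDS):
    """Drop whitespace-separated tokens whose lowercase form is a stopword."""
    return " ".join(tok for tok in text.split() if tok.lower() not in stopwords)


def preprocess(text, spans):
    # Anonymise first, while the NER offsets still refer to the raw text;
    # removing stopwords first would shift the character offsets.
    return remove_stopwords(anonymise(text, spans))


if __name__ == "__main__":
    raw = "Alice moved to Paris in 2020."
    ner_spans = [(0, 5, "PER"), (15, 20, "LOC")]  # stand-in for NER output
    print(preprocess(raw, ner_spans))  # [PER] moved [LOC] 2020.
```

Ordering is the main design point: masking on raw offsets before any token-level filtering keeps the entity spans aligned with the text they were detected in.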
This repository contains the code and experimental data for the Text Anonymization Evaluator (TAE), a tool for evaluating anonymized documents. It includes multiple state-of-the-art metrics that assess both utility preservation and privacy protection.