A robust preprocessing pipeline for document OCR that significantly improves Tesseract accuracy on mobile-captured ID documents