mirror of
https://github.com/ocrmypdf/OCRmyPDF.git
synced 2026-05-07 06:07:58 -04:00
fileinput is supposed to save time in these cases, but it cannot do both in-place rewrites and handle a non-ASCII encoding. This went unnoticed until Tesseract picked up characters outside ASCII and saved them in an hOCR file. Also rework some surrounding code and add multilingual test cases.
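
For illustration only, here is a minimal sketch of one way to rewrite a text file in place with an explicit encoding and without fileinput. It is not the code from this commit: the helper name `rewrite_in_place` and its `transform` parameter are hypothetical, and UTF-8 is assumed as the encoding of the hOCR file.

```python
import os
import tempfile
from pathlib import Path


def rewrite_in_place(path, transform, encoding="utf-8"):
    """Rewrite a text file line by line using an explicit encoding.

    Hypothetical helper: `transform` is any callable mapping one line
    (str) to its replacement (str).
    """
    path = Path(path)
    # Create the temporary file next to the target so the final
    # os.replace() does not cross filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched.
        with open(fd, "w", encoding=encoding, newline="") as dst, \
                open(path, "r", encoding=encoding, newline="") as src:
            for line in src:
                dst.write(transform(line))
        # Swap the rewritten file into place only after both files are closed.
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, remove the temporary file and leave the original as-is.
        os.unlink(tmp_name)
        raise
```

A call such as `rewrite_in_place("page.hocr", lambda line: line.replace("old", "new"))` would then edit the file without depending on the platform's default encoding; the file name here is made up for the example.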