mirror of
https://github.com/ocrmypdf/OCRmyPDF.git
synced 2026-05-05 21:27:37 -04:00
For sanity's sake, handle tesseract streams as binary, without transcoding (via universal_newlines, etc.). The only differences are in the printed messages regarding spoofing. Also hash the source file, so that any change to the cache mechanism automatically invalidates the old cache. That is probably more aggressive than necessary, but it is simpler and safer than the previous approach.
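
The two ideas above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the helper names `run_binary`, `source_digest` and `cache_key` are hypothetical, and it assumes that "the source file" means the source of the script that implements the cache.

```python
import hashlib
import subprocess
import sys
from pathlib import Path


def run_binary(args):
    """Run a command and return (returncode, stdout, stderr) as raw bytes."""
    # No universal_newlines/text/encoding arguments: the pipes stay in
    # binary mode, so no decoding or newline translation takes place.
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, proc.stdout, proc.stderr


def source_digest(source_path):
    """SHA-256 of a source file's bytes (assumed here: the cache script)."""
    return hashlib.sha256(Path(source_path).read_bytes()).hexdigest()


def cache_key(args, source_path):
    """Hypothetical cache key: the script's own digest plus the arguments.

    Any edit to the file at source_path changes every key, so entries
    written by an older version of the cache logic are never matched.
    """
    h = hashlib.sha256()
    h.update(source_digest(source_path).encode("ascii"))
    for arg in args:
        h.update(b"\0")  # separator so ["ab", "c"] differs from ["a", "bc"]
        h.update(str(arg).encode("utf-8"))
    return h.hexdigest()
```

A caching script could call `cache_key(sys.argv[1:], __file__)` to name its cache entry. Hashing the whole file means even a comment change discards the cache, which is the "too aggressive" trade-off mentioned above.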