Ghostscript 10.6 has a bug that truncates JPEG data by 1-15 bytes.
This adds detection and repair by comparing output images to input
images and restoring the original bytes when truncation is detected.
- Add warning when GS 10.6+ is used with PDF/A output
- Add _repair_gs106_jpeg_corruption() to fix damaged JPEGs after
Ghostscript processing
- Add unit tests for the repair function
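
The detect-and-restore idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of `_repair_gs106_jpeg_corruption()`: the helper name `repair_truncated_jpeg` and the constant `MAX_TRUNCATION` are hypothetical, and it assumes the raw JPEG bytes of the input and output images are already available as `bytes`.

```python
MAX_TRUNCATION = 15  # assumption: taken from the "1-15 bytes" in the description


def repair_truncated_jpeg(original: bytes, processed: bytes) -> bytes:
    """Return the original bytes if `processed` looks like a truncated copy.

    `processed` counts as truncated when it is a strict prefix of `original`
    that is missing between 1 and MAX_TRUNCATION trailing bytes. In every
    other case the processed bytes are returned unchanged.
    """
    missing = len(original) - len(processed)
    if 1 <= missing <= MAX_TRUNCATION and original.startswith(processed):
        return original
    return processed
```

Requiring an exact prefix match keeps the repair conservative: an image that was genuinely re-encoded or otherwise changed is left alone.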
Instead the standard executor will fall back to threads.
semfree caused test failures with Py3.14:
https://github.com/ocrmypdf/OCRmyPDF/issues/1558
In retrospect, and with emerging Python technology such as free-threading, semfree is becoming less necessary. We can use threads for the time being.
One consequence is that performance may be lower on Lambda and Termux, where we now use threads, whenever the work is not shelled out to a subprocess.
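
As a rough illustration of a thread fallback (not the project's actual executor code), the sketch below picks a process pool only when multiprocessing semaphores can be created and otherwise uses threads. The function names `semaphores_available` and `choose_executor_class` are hypothetical, and the assumption that a missing semaphore implementation surfaces as `ImportError` or `OSError` is mine.

```python
import multiprocessing
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def semaphores_available() -> bool:
    """Return True if this platform can create a multiprocessing semaphore."""
    try:
        multiprocessing.Semaphore(1)
    except (ImportError, OSError):
        # Assumption: restricted platforms fail here with one of these errors.
        return False
    return True


def choose_executor_class(prefer_processes: bool = True) -> type[Executor]:
    """Pick a process pool when possible, otherwise fall back to threads."""
    if prefer_processes and semaphores_available():
        return ProcessPoolExecutor
    return ThreadPoolExecutor
```

A caller would use the result like any other executor, for example `with choose_executor_class()(max_workers=4) as pool: ...`.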
`strip_invisible_text` resets the text render mode on each `BT` (begin text) command. However, the text state is not actually reset for each text object, only for each page.
The PDF reference says:
> The text state operators can appear outside text objects, and the values they set
> are retained across text objects in a single content stream. Like other graphics
> state parameters, these parameters are initialized to their default values at the
> beginning of each page.
>
> -- https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf#page=397
With the current implementation, a text object is deleted only if it contains a `3 Tr` command (which sets the text rendering mode to invisible). However, the rendering mode may be set once and then left unchanged across multiple text objects, or it may be set outside of a text object.
In that case, only the first text object (the one containing the `3 Tr` command) is removed. This not only leaves the other text objects in the PDF but also makes them visible, because the `3 Tr` command is removed along with the text object that contained it.
This PR updates `strip_invisible_text` so that it no longer resets the rendering mode for each text object and keeps track of the rendering mode when the graphics state is pushed or popped.
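
The tracking described here can be sketched on a simplified content stream. This is a hypothetical illustration, not the PR's code: instructions are modelled as plain `(operands, operator)` tuples rather than parsed PDF objects, the function name `strip_invisible_text_sketch` is invented, and a text object is dropped whenever it shows any text while the mode is 3, which is a simplification.

```python
TEXT_SHOWING = {"Tj", "TJ", "'", '"'}  # PDF text-showing operators
INVISIBLE = 3                          # text rendering mode 3: neither fill nor stroke


def strip_invisible_text_sketch(instructions):
    """Drop text objects that show text while the render mode is invisible.

    The render mode starts at 0 for the page, persists across text objects,
    and is saved and restored by the q/Q graphics state operators.
    """
    kept = []
    text_object = []
    in_text = False
    drop = False
    render_mode = 0
    saved_modes = []

    for operands, operator in instructions:
        # Update the tracked state first, wherever the operator appears.
        if operator == "Tr":
            render_mode = int(operands[0])
        elif operator == "q":
            saved_modes.append(render_mode)
        elif operator == "Q" and saved_modes:
            render_mode = saved_modes.pop()

        if operator == "BT":
            in_text, drop, text_object = True, False, []

        if not in_text:
            kept.append((operands, operator))
            continue

        # Buffer the text object until ET, then keep or discard it whole.
        text_object.append((operands, operator))
        if operator in TEXT_SHOWING and render_mode == INVISIBLE:
            drop = True
        if operator == "ET":
            if not drop:
                kept.extend(text_object)
            in_text = False
    return kept
```

With a stream in which `3 Tr` appears only in the first of two text objects, both objects are removed, because the mode carries over instead of being reset at the second `BT`.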
Before 42ff7fc842, `make_rotate_test`
always used `resources / 'typewriter.png'`, but after the change the
second call accidentally used just `resources`, which is a directory
and therefore fails to open.
Full message
eportlab/lib/rl_safe_eval.py:11: DeprecationWarning: ast.NameConstant is deprecated and will be removed in Python 3.14; use ast.Constant instead
haveNameConstant = hasattr(ast,'NameConstant')
Warning triggered by reportlab-4.0.7 and Python 3.12