James R. Barlow
4fc0c3a0d5
Add watcher test, such as it is
2025-08-13 01:04:58 -07:00
James R. Barlow
175b743ffe
Fix version test
2025-07-03 11:30:05 -07:00
James R. Barlow
45cf92f40b
xfail Python logging bug in 3.13.3/4
2025-07-03 09:21:31 -07:00
James R. Barlow
3beabf55e7
Skip optimizing images with pre-blended soft masks
...
Fixes issue [Bug]: Optimized pdf not rendering with Quartz / Core Graphics #1536
2025-06-12 23:58:43 -07:00
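A minimal sketch of the kind of check this commit implies. It assumes image dictionaries are modeled as plain Python dicts keyed by PDF names, which is a hypothetical representation and not OCRmyPDF's or pikepdf's API. In the PDF specification, a soft-mask image may carry a `/Matte` array, which means the parent image's colour values were pre-blended with that matte colour. A plausible reason to leave such images alone is that re-encoding the parent image independently of its mask can break that relationship. `has_preblended_smask` and `should_optimize` are hypothetical names.

```python
from __future__ import annotations

from typing import Any


def has_preblended_smask(image_dict: dict[str, Any]) -> bool:
    """True if the image's soft mask declares a /Matte colour (pre-blended alpha)."""
    smask = image_dict.get("/SMask")
    # /Matte lives in the soft mask's own dictionary, not in the parent image's.
    return isinstance(smask, dict) and "/Matte" in smask


def should_optimize(image_dict: dict[str, Any]) -> bool:
    """Hypothetical gate: skip images whose colour data depends on a matte."""
    return not has_preblended_smask(image_dict)
```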
James R. Barlow
6851ea7f11
Remove test since ghostscript error handling changed
2025-04-21 12:23:34 -07:00
James R. Barlow
32322a9fe9
Fix broken test_hocrtransform_matches_sandwich
...
Expect word similarity rather than exact match. Difference appears to be due to quote styles.
Thanks @QuLogic for reporting.
2025-02-09 13:57:50 -08:00
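One way to express "word similarity rather than exact match" is sketched below. This is not the actual test code: `normalize_words`, `word_similarity` and the quote mapping are hypothetical. The sketch folds curly quotes to straight ones (the difference the commit mentions), applies NFKC normalization so ligatures such as "ﬁ" compare equal to "fi", and scores the two word sequences with `difflib.SequenceMatcher`.

```python
import difflib
import unicodedata

# Hypothetical mapping of typographic quotes to their ASCII equivalents.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize_words(text: str) -> list[str]:
    """Split text into words after NFKC normalization and quote folding."""
    return unicodedata.normalize("NFKC", text).translate(_QUOTE_MAP).split()


def word_similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio of matching words between two texts."""
    return difflib.SequenceMatcher(None, normalize_words(a), normalize_words(b)).ratio()
```

With this, `word_similarity("It\u2019s a \u201ctest\u201d page", "It's a \"test\" page")` is 1.0, so a test can assert a threshold such as `>= 0.95` (an arbitrary value chosen for illustration) instead of string equality.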
James R. Barlow
137b054f43
Adjust test again for older Ghostscript
2025-01-27 23:44:37 -08:00
James R. Barlow
65df44f670
Modify tests to deal with variety of Ghostscript versions
2025-01-09 02:14:29 -08:00
James R. Barlow
6edc749023
Fix error handling when PDF contains an invalid image with both ImageMask and ColorSpace set
...
Fixes #1453
2025-01-07 00:27:07 -08:00
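The PDF specification says that when `/ImageMask` is true, `/ColorSpace` shall not be present and `/BitsPerComponent`, if given, shall be 1. A minimal sketch of handling such invalid images gracefully (warn and skip rather than crash) follows. It assumes image dictionaries are plain Python dicts; `image_problem` and `images_safe_to_process` are hypothetical names, not OCRmyPDF functions.

```python
from __future__ import annotations

import logging
from typing import Any

log = logging.getLogger("sketch.images")


def image_problem(image_dict: dict[str, Any]) -> str | None:
    """Describe why an image dictionary is invalid, or return None if it looks fine."""
    if image_dict.get("/ImageMask") is True:
        if "/ColorSpace" in image_dict:
            return "/ImageMask is true but /ColorSpace is also set"
        if image_dict.get("/BitsPerComponent", 1) != 1:
            return "/ImageMask is true but /BitsPerComponent is not 1"
    return None


def images_safe_to_process(images: dict[str, dict[str, Any]]) -> list[str]:
    """Return names of images to process, logging and skipping invalid ones."""
    ok = []
    for name, image_dict in images.items():
        problem = image_problem(image_dict)
        if problem:
            log.warning("Skipping image %s: %s", name, problem)
        else:
            ok.append(name)
    return ok
```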
Kara Engelhardt
636623ab49
graft: fix invisible text appearing after strip_invisible_text
...
`strip_invisible_text` resets the text rendering mode on each `BT` (begin text) operator. However, the text state is not actually reset for each text object, only at the start of each page.
The PDF Reference says:
> The text state operators can appear outside text objects, and the values they set
> are retained across text objects in a single content stream. Like other graphics
> state parameters, these parameters are initialized to their default values at the
> beginning of each page.
>
> -- https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf#page=397
With the current implementation, a text object is only deleted if it contains a `3 Tr` command (setting the text rendering mode to invisible). However, the rendering mode may be set once and then left unchanged for several text objects, or it may be set outside any text object.
In that case, only the first text object (the one containing the `3 Tr` command) is removed. This not only leaves the other text objects in the PDF but also makes them visible, because the `3 Tr` command that made them invisible is removed along with the first object.
This PR updates `strip_invisible_text` to stop resetting the rendering mode for each text object and to track the rendering mode as the graphics state is pushed and popped (`q`/`Q`).
2024-12-11 18:01:12 +01:00
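The approach described in this PR can be sketched as follows. This is a simplified, hypothetical model and not the code from the PR: the content stream is a plain list of `(operands, operator)` tuples rather than parsed PDF objects, and `strip_invisible_text_sketch` is an invented name. The render mode starts at the page default, is not reset by `BT`, is saved and restored by `q`/`Q`, and a text object is dropped only if none of its text-showing operators ran in mode 3. One choice here is the sketch's own: when a text object is dropped, its state-setting operators (such as `Tr` or `Tf`) are kept outside `BT..ET`, which the quoted reference permits, so later text objects inherit the same state.

```python
from __future__ import annotations

# Hypothetical model: each instruction is an (operands, operator) pair.
Instruction = tuple[list, str]

INVISIBLE = 3  # rendering mode 3 = neither fill nor stroke
SHOW_OPS = {"Tj", "TJ", "'", '"'}
# Operators that draw text or only make sense inside BT..ET.
TEXT_ONLY_OPS = SHOW_OPS | {"BT", "ET", "Td", "TD", "Tm", "T*"}


def strip_invisible_text_sketch(stream: list[Instruction]) -> list[Instruction]:
    out: list[Instruction] = []
    mode = 0               # page default; deliberately NOT reset by BT
    saved: list[int] = []  # render modes saved by q, restored by Q
    text_obj: list[Instruction] | None = None
    visible = False

    for operands, op in stream:
        # Track the render mode wherever it changes, inside or outside BT..ET.
        if op == "q":
            saved.append(mode)
        elif op == "Q" and saved:
            mode = saved.pop()
        elif op == "Tr":
            mode = int(operands[0])

        if op == "BT":
            text_obj, visible = [(operands, op)], False
            continue
        if text_obj is None:
            out.append((operands, op))  # outside any text object: keep as-is
            continue

        text_obj.append((operands, op))
        if op in SHOW_OPS and mode != INVISIBLE:
            visible = True
        if op == "ET":
            if visible:
                out.extend(text_obj)
            else:
                # Drop the text but keep its state changes (Tr, Tf, colour...),
                # since they persist into later text objects and are legal
                # outside BT..ET.
                for args, name in text_obj:
                    if name == '"':  # aw ac string " also sets Tw and Tc
                        out += [([args[0]], "Tw"), ([args[1]], "Tc")]
                    elif name not in TEXT_ONLY_OPS:
                        out.append((args, name))
            text_obj = None
    return out
```

Like the description, the sketch treats only mode 3 as invisible; mode 7 (clip only) is left alone.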
James R. Barlow
9a075039b5
Remove empty test file
2024-12-02 11:23:35 -08:00
James R. Barlow
fe89be5dc0
Fix test broken in commit 85d6fb8c
2024-11-27 15:44:12 -08:00
James R. Barlow
a659f83d67
Remove invalid hyperlink annotations to satisfy Ghostscript 10.x during PDF/A conversion
...
Closes #1425
2024-11-16 19:02:10 -08:00
James R. Barlow
dbd3c93757
Fix issue with unpickling HOCRResult
...
Fixes [Bug]: HOCRResult.from_json() not unpickling correctly #1427
2024-11-10 02:05:57 -08:00
James R. Barlow
f77f701a50
Fix quadratic time performance regression on scanning pages
2024-10-27 21:57:53 -07:00
James R. Barlow
6f755321b8
Ignore unpaper warning message when checking version
...
Fixes #1409
2024-10-27 13:06:30 -07:00
James R. Barlow
18b59c57b4
Refactor our tests that check if we are in a container
2024-10-27 11:55:22 -07:00
Elliott Sales de Andrade
bb4c47e707
Fix broken test_rotate_page_level (#1382)
...
Before 42ff7fc842, `make_rotate_test`
always used `resources / 'typewriter.png'`, but after the change the
second call accidentally used just `resources`, which is a directory,
and fails to open.
2024-08-21 01:25:07 -07:00
James R. Barlow
59f6bc8306
Move Tesseract-specific language checks to its plugin
2024-06-01 00:15:50 -07:00
James R. Barlow
abf9729c61
Semfree test: accept pdfa conversion failed as a valid return code
...
Fixes #1316
2024-05-21 01:26:11 -07:00
James R. Barlow
7a8cc21e31
Add support for sidecar output to io.BytesIO
...
Closes #1252
2024-04-07 01:38:55 -07:00
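Supporting an `io.BytesIO` sidecar generally means accepting either a filesystem path or a writable binary stream. A minimal sketch of that pattern, assuming UTF-8 text; `write_sidecar` is a hypothetical helper and does not reproduce OCRmyPDF's internals.

```python
from __future__ import annotations

import os
from typing import BinaryIO


def write_sidecar(text: str, dest: str | os.PathLike[str] | BinaryIO) -> None:
    """Write sidecar text as UTF-8 to a filesystem path or a binary stream."""
    data = text.encode("utf-8")
    if hasattr(dest, "write"):
        # File-like object such as io.BytesIO: write in place and leave it open
        dest.write(data)
    else:
        with open(dest, "wb") as f:
            f.write(data)
```

Because the helper does not close a caller-supplied stream, `buf.getvalue()` is still available afterwards.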
James R. Barlow
065bddbc6c
Reformat with ruff format
2024-04-07 00:25:32 -07:00
James Barlow
855de287b2
Fix test suite failure with Ghostscript >= 10.3
...
Ghostscript is pickier about a specific case involving an SMask that cannot be converted to PDF/A.
Details here: 4dcfae36bb
2024-03-19 17:20:33 -07:00
James R. Barlow
6a746a1cbb
ruff linting/Python 3.10 cleanup
2024-02-14 12:41:51 -08:00
James R. Barlow
42ff7fc842
Fix handling of pages that are restored to correct orientation with /Rotate
...
It appears the inversion of the CTM was incorrect; the error was introduced in commit 9898904.
2024-02-12 01:32:26 -08:00
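For reference, a sketch of a correct inverse for a PDF transformation matrix `[a b c d e f]`, which maps a point as x' = a·x + c·y + e and y' = b·x + d·y + f. The function names are hypothetical and this is not the code changed in the commit; it only shows the arithmetic, checked by composing a matrix with its inverse.

```python
from __future__ import annotations

Matrix = tuple[float, float, float, float, float, float]  # PDF [a b c d e f]


def multiply(m: Matrix, n: Matrix) -> Matrix:
    """Concatenate in PDF row-vector convention: apply m first, then n."""
    a, b, c, d, e, f = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a * a2 + b * c2,
        a * b2 + b * d2,
        c * a2 + d * c2,
        c * b2 + d * d2,
        e * a2 + f * c2 + e2,
        e * b2 + f * d2 + f2,
    )


def invert(m: Matrix) -> Matrix:
    """Inverse of an affine PDF matrix; raises if it is singular."""
    a, b, c, d, e, f = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    # Invert the 2x2 linear part, then undo the translation with it.
    return (d / det, -b / det, -c / det, a / det,
            (c * f - d * e) / det, (b * e - a * f) / det)


def apply(m: Matrix, x: float, y: float) -> tuple[float, float]:
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)
```

For a 90° rotation with translation, `(0, 1, -1, 0, 612, 0)`, `invert` returns `(0, -1, 1, 0, 0, 612)` (possibly with signed zeros), and composing the two gives the identity.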
James R. Barlow
26470fe16a
Suppress reportlab deprecation warning
2024-02-12 01:17:08 -08:00
James R. Barlow
74d2a156c4
Update cache
2024-01-07 01:35:05 -08:00
James R. Barlow
14365d10b8
Skip testing oom killer on Python 3.12
...
Need to investigate further if there's a safe way to do this test.
2024-01-02 16:28:22 -08:00
James R. Barlow
9489c01259
Skip test_encrypted on Py3.12 + macOS
2023-12-08 00:12:24 -08:00
James R. Barlow
a4987733c4
Filter rl_safe_eval deprecation warning
...
Full message:
eportlab/lib/rl_safe_eval.py:11: DeprecationWarning: ast.NameConstant is deprecated and will be removed in Python 3.14; use ast.Constant instead
haveNameConstant = hasattr(ast,'NameConstant')
Warning triggered by reportlab-4.0.7 and Python 3.12
2023-12-07 23:40:23 -08:00
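A narrow way to filter this one warning with the standard `warnings` module is sketched below; OCRmyPDF may instead configure it through pytest settings, and `ignore_rl_safe_eval_warning` is a hypothetical name. The `message` argument is a regular expression matched against the start of the warning text, so other `DeprecationWarning`s still surface. Passing `module=` as well could narrow the filter further to the module the warning is attributed to.

```python
import warnings


def ignore_rl_safe_eval_warning() -> None:
    """Ignore only the ast.NameConstant DeprecationWarning (triggered via reportlab)."""
    # `message` is a regex matched against the start of the warning text;
    # unrelated DeprecationWarnings are left untouched.
    warnings.filterwarnings(
        "ignore",
        message=r"ast\.NameConstant is deprecated",
        category=DeprecationWarning,
    )
```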
James R. Barlow
445617a1a5
Rebuild cache for hocr default case
2023-12-03 15:16:18 -08:00
James R. Barlow
f6e90a5934
hOCR renderer is now default
2023-12-02 19:58:00 -08:00
James R. Barlow
11d3e32f1e
Fix hocrtransform CLI
2023-12-02 08:08:29 -08:00
James R. Barlow
03669183d7
Rationalize canvas interface
2023-11-20 15:54:13 -08:00
James R. Barlow
db2e5132e6
Remove some obsolete parameters
2023-11-20 00:10:55 -08:00
James R. Barlow
c591f9601a
Remove Latin hOCR test
2023-11-19 23:51:27 -08:00
James R. Barlow
27d5229842
Make logger names unique
2023-11-09 23:03:39 -08:00
James R. Barlow
a596ccf844
Raise exception if resulting PDF might appear blank in some PDF viewers
...
Fixes #1187
2023-11-09 22:33:22 -08:00
James R. Barlow
e7fa97731f
ghostscript duplicate filter: filter within a window of previous messages
2023-11-09 22:32:39 -08:00
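"Filter within a window of previous messages" can be sketched as a `logging.Filter` that remembers the last few messages in a bounded `deque` and suppresses a record whose text is already in that window. The class name and default window size are hypothetical, and OCRmyPDF's actual Ghostscript filter may compare messages differently.

```python
import logging
from collections import deque


class WindowedDuplicateFilter(logging.Filter):
    """Suppress a record if the same message appeared among the last `window` emitted."""

    def __init__(self, window: int = 5) -> None:
        super().__init__()
        # Oldest messages fall out automatically once the window is full.
        self._recent: deque = deque(maxlen=window)

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        if msg in self._recent:
            return False  # duplicate within the window: drop it
        self._recent.append(msg)
        return True
```

With a window of 2, the sequence `a, a, b, c, a` passes `a, b, c, a`: the second `a` is an immediate repeat, but by the last `a` it has scrolled out of the window.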
James R. Barlow
290aa28108
Fix error on attempt to write to debug log after removing debug log handler
2023-11-09 16:02:41 -08:00
James R. Barlow
916106733c
Skip semfree unless on Linux
2023-10-30 00:33:21 -07:00
James R. Barlow
71166f7be8
Make hocr API experimental for now
...
This commit can be reverted when we are ready to release a new version.
2023-10-30 00:07:10 -07:00
James R. Barlow
580252a1a0
Merge branch 'feature/gscan2pdf'
...
Reconcile release notes and copy_final() with new pipeline.
2023-10-30 00:01:28 -07:00
James R. Barlow
b5e73ac4e4
Drop check for obsolete .dockerinit file
2023-10-24 13:49:46 -07:00
James R. Barlow
db3df13e95
Remove ocrmypdf._sync
2023-10-24 00:54:31 -07:00
James R. Barlow
9ffb45f283
Remove public domain congress.jpg and replace with baiona_color.jpg
...
For REUSE compliance, we are phasing out public domain licenses.
2023-10-24 00:54:31 -07:00
James R. Barlow
a06ab2a1c5
unpaper: Remove format conversion
...
Code is no longer reachable since we rasterize a 1/L/RGB image prior to this point.
2023-10-24 00:54:31 -07:00
James R. Barlow
dfa4ebf1a6
Simplify function signature of extract_image_filter
2023-10-24 00:54:31 -07:00
James R. Barlow
58f388c69d
optimize: better coverage
2023-10-24 00:54:31 -07:00
James R. Barlow
990b462a94
Fix coverage settings and cover semfree
2023-10-24 00:54:31 -07:00