Commit Graph

795 Commits

Author SHA1 Message Date
James R. Barlow
74d2a156c4 Update cache 2024-01-07 01:35:05 -08:00
James R. Barlow
14365d10b8 Skip testing oom killer on Python 3.12
Need to investigate further if there's a safe way to do this test.
2024-01-02 16:28:22 -08:00
James R. Barlow
9489c01259 Skip test_encrypted on Py3.12 + macOS 2023-12-08 00:12:24 -08:00
James R. Barlow
a4987733c4 Filter rl_safe_eval deprecation warning
Full message
eportlab/lib/rl_safe_eval.py:11: DeprecationWarning: ast.NameConstant is deprecated and will be removed in Python 3.14; use ast.Constant instead
    haveNameConstant = hasattr(ast,'NameConstant')

Warning triggered by reportlab-4.0.7 and Python 3.12
2023-12-07 23:40:23 -08:00
James R. Barlow
445617a1a5 Rebuild cache for hocr default case 2023-12-03 15:16:18 -08:00
James R. Barlow
f6e90a5934 hOCR renderer is now default 2023-12-02 19:58:00 -08:00
James R. Barlow
11d3e32f1e Fix hocrtransform CLI 2023-12-02 08:08:29 -08:00
James R. Barlow
03669183d7 Rationalize canvas interface 2023-11-20 15:54:13 -08:00
James R. Barlow
db2e5132e6 Remove some obsolete parameters 2023-11-20 00:10:55 -08:00
James R. Barlow
c591f9601a Remove Latin hOCR test 2023-11-19 23:51:27 -08:00
James R. Barlow
27d5229842 Make logger names unique 2023-11-09 23:03:39 -08:00
James R. Barlow
a596ccf844 Raise exception if resulting PDF might appear blank in some PDF viewers
Fixes #1187
2023-11-09 22:33:22 -08:00
James R. Barlow
e7fa97731f ghostscript duplicate filter: filter within a window of previous messages 2023-11-09 22:32:39 -08:00
James R. Barlow
290aa28108 Fix error on attempt to write to debug log after removing debug log handler 2023-11-09 16:02:41 -08:00
James R. Barlow
916106733c Skip semfree unless on Linux 2023-10-30 00:33:21 -07:00
James R. Barlow
71166f7be8 Make hocr API experimental for now
This commit can be reverted when we are ready to release a new version.
2023-10-30 00:07:10 -07:00
James R. Barlow
580252a1a0 Merge branch 'feature/gscan2pdf'
Reconcile release notes and copy_final() with new pipeline.
2023-10-30 00:01:28 -07:00
James R. Barlow
b5e73ac4e4 Drop check for obsolete .dockerinit file 2023-10-24 13:49:46 -07:00
James R. Barlow
db3df13e95 Remove ocrmypdf._sync 2023-10-24 00:54:31 -07:00
James R. Barlow
9ffb45f283 Remove public domain congress.jpg and replace with baiona_color.jpg
For reuse compliance we are phasing out public domain licenses
2023-10-24 00:54:31 -07:00
James R. Barlow
a06ab2a1c5 unpaper: Remove format conversion
Code is no longer reachable since we rasterize a 1/L/RGB image prior to this point.
2023-10-24 00:54:31 -07:00
James R. Barlow
dfa4ebf1a6 Simplify function signature of extract_image_filter 2023-10-24 00:54:31 -07:00
James R. Barlow
58f388c69d optimize: better coverage 2023-10-24 00:54:31 -07:00
James R. Barlow
990b462a94 Fix coverage settings and cover semfree 2023-10-24 00:54:31 -07:00
James R. Barlow
b928dc0808 Skip fewer tests 2023-10-24 00:54:31 -07:00
James R. Barlow
8916955f45 Convert many run_ocrmypdf -> run_ocrmypdf_api 2023-10-24 00:54:31 -07:00
James R. Barlow
82bef40aa6 Eliminate more run_ocrmypdf calls 2023-10-24 00:54:31 -07:00
James R. Barlow
1c45f32941 tests: replace many run_ocrmypdf -> run_ocrmypdf_api 2023-10-24 00:54:31 -07:00
James R. Barlow
fadc0cf69b Replace cryptic test error messages with more informative ones 2023-10-24 00:54:31 -07:00
James R. Barlow
eb3a51e33a Prefer pikepdf's newer Page.mediabox accessor over .MediaBox 2023-10-24 00:54:31 -07:00
James R. Barlow
a4059762e6 Fix hocrtransform test to generate blank hocr 2023-10-24 00:54:31 -07:00
James R. Barlow
16eb5627a7 Fix unused imports and other trivia 2023-10-24 00:54:31 -07:00
James R. Barlow
fbf0674189 hocr_to_ocr_pdf: handle missing hocr json file 2023-10-24 00:54:31 -07:00
James R. Barlow
7935914f55 Use empty .hocr file instead of dummy template for symmetry with sandwich 2023-10-24 00:54:31 -07:00
James R. Barlow
23951c9e38 Working HOCR folder to PDF converter 2023-10-24 00:54:30 -07:00
James R. Barlow
e8ae370ceb Eliminate api= kwarg and implicit creation of pluginmanager 2023-10-24 00:54:30 -07:00
James R. Barlow
1a7738a925 Refactor -migrate metadata repair to new module 2023-10-24 00:54:30 -07:00
James R. Barlow
68bb38d0ad pdf_to_hocr: improve plugin handling 2023-10-24 00:52:31 -07:00
James R. Barlow
0443e87345 Introduce pdf_to_hocr API 2023-10-24 00:52:31 -07:00
James R. Barlow
95b14ee282 Refactor lossless reconstruction setter into separate function
Still messy but good enough as a start.
2023-10-24 00:52:31 -07:00
James R. Barlow
93fda0dd00 Detect and warn about Tagged PDFs 2023-10-12 01:03:09 -07:00
James R. Barlow
91a14660b3 Require Pillow >= 10.0.1 and drop shims for older versions 2023-10-04 00:04:28 -07:00
James R. Barlow
10530a8698 Change Ghostscript version skip to fail
Reported to fail on earlier versions than the check tested for.
2023-09-26 20:02:52 -07:00
James R. Barlow
d5128c5cf5 Further improvements to image DPI calculation 2023-09-26 00:28:54 -07:00
James R. Barlow
ea36aedb5f Overhaul version checkers to prefer Version to str 2023-09-25 00:59:44 -07:00
James R. Barlow
8fcf358934 Rename pike local variable to pdf for consistency 2023-09-25 00:22:26 -07:00
James R. Barlow
7018e2b247 Refactor ghostscript error message deduplicating 2023-09-24 20:22:04 -07:00
James R. Barlow
d855f63985 Remove single dispatch version of calculate_downsample 2023-09-23 15:05:04 -07:00
James R. Barlow
146da79c00 Regenerate test cache 2023-09-21 00:24:55 -07:00
James R. Barlow
0388c23ae7 Merge branch 'feature/jbig2thresh' into v15 2023-09-21 00:07:05 -07:00