James R. Barlow
74d2a156c4
Update cache
2024-01-07 01:35:05 -08:00
James R. Barlow
14365d10b8
Skip testing oom killer on Python 3.12
...
Need to investigate further if there's a safe way to do this test.
2024-01-02 16:28:22 -08:00
James R. Barlow
9489c01259
Skip test_encrypted on Py3.12 + macOS
2023-12-08 00:12:24 -08:00
James R. Barlow
a4987733c4
Filter rl_safe_eval deprecation warning
...
Full message:
eportlab/lib/rl_safe_eval.py:11: DeprecationWarning: ast.NameConstant is deprecated and will be removed in Python 3.14; use ast.Constant instead
haveNameConstant = hasattr(ast,'NameConstant')
Warning triggered by reportlab-4.0.7 and Python 3.12
2023-12-07 23:40:23 -08:00
James R. Barlow
445617a1a5
Rebuild cache for hocr default case
2023-12-03 15:16:18 -08:00
James R. Barlow
f6e90a5934
hOCR renderer is now default
2023-12-02 19:58:00 -08:00
James R. Barlow
11d3e32f1e
Fix hocrtransform CLI
2023-12-02 08:08:29 -08:00
James R. Barlow
03669183d7
Rationalize canvas interface
2023-11-20 15:54:13 -08:00
James R. Barlow
db2e5132e6
Remove some obsolete parameters
2023-11-20 00:10:55 -08:00
James R. Barlow
c591f9601a
Remove Latin hOCR test
2023-11-19 23:51:27 -08:00
James R. Barlow
27d5229842
Make logger names unique
2023-11-09 23:03:39 -08:00
James R. Barlow
a596ccf844
Raise exception if resulting PDF might appear blank in some PDF viewers
...
Fixes #1187
2023-11-09 22:33:22 -08:00
James R. Barlow
e7fa97731f
ghostscript duplicate filter: filter within a window of previous messages
2023-11-09 22:32:39 -08:00
James R. Barlow
290aa28108
Fix error on attempt to write to debug log after removing debug log handler
2023-11-09 16:02:41 -08:00
James R. Barlow
916106733c
Skip semfree unless on Linux
2023-10-30 00:33:21 -07:00
James R. Barlow
71166f7be8
Make hocr API experimental for now
...
This commit can be reverted when we are ready to release a new version.
2023-10-30 00:07:10 -07:00
James R. Barlow
580252a1a0
Merge branch 'feature/gscan2pdf'
...
Reconcile release notes and copy_final() with new pipeline.
2023-10-30 00:01:28 -07:00
James R. Barlow
b5e73ac4e4
Drop check for obsolete .dockerinit file
2023-10-24 13:49:46 -07:00
James R. Barlow
db3df13e95
Remove ocrmypdf._sync
2023-10-24 00:54:31 -07:00
James R. Barlow
9ffb45f283
Remove public domain congress.jpg and replace with baiona_color.jpg
...
For reuse compliance we are phasing out public domain licenses
2023-10-24 00:54:31 -07:00
James R. Barlow
a06ab2a1c5
unpaper: Remove format conversion
...
Code is no longer reachable since we rasterize a 1/L/RGB image prior to this point.
2023-10-24 00:54:31 -07:00
James R. Barlow
dfa4ebf1a6
Simplify function signature of extract_image_filter
2023-10-24 00:54:31 -07:00
James R. Barlow
58f388c69d
optimize: better coverage
2023-10-24 00:54:31 -07:00
James R. Barlow
990b462a94
Fix coverage settings and cover semfree
2023-10-24 00:54:31 -07:00
James R. Barlow
b928dc0808
Skip fewer tests
2023-10-24 00:54:31 -07:00
James R. Barlow
8916955f45
Convert many run_ocrmypdf -> run_ocrmypdf_api
2023-10-24 00:54:31 -07:00
James R. Barlow
82bef40aa6
Eliminate more run_ocrmypdf calls
2023-10-24 00:54:31 -07:00
James R. Barlow
1c45f32941
tests: replace many run_ocrmypdf -> run_ocrmypdf_api
2023-10-24 00:54:31 -07:00
James R. Barlow
fadc0cf69b
Replace cryptic test error messages with more informative ones
2023-10-24 00:54:31 -07:00
James R. Barlow
eb3a51e33a
Prefer pikepdf's newer Page.mediabox accessor over .MediaBox
2023-10-24 00:54:31 -07:00
James R. Barlow
a4059762e6
Fix hocrtransform test to generate blank hocr
2023-10-24 00:54:31 -07:00
James R. Barlow
16eb5627a7
Fix unused imports and other trivia
2023-10-24 00:54:31 -07:00
James R. Barlow
fbf0674189
hocr_to_ocr_pdf: handle missing hocr json file
2023-10-24 00:54:31 -07:00
James R. Barlow
7935914f55
Use empty .hocr file instead of dummy template for symmetry with sandwich
2023-10-24 00:54:31 -07:00
James R. Barlow
23951c9e38
Working HOCR folder to PDF converter
2023-10-24 00:54:30 -07:00
James R. Barlow
e8ae370ceb
Eliminate api= kwarg and implicit creation of pluginmanager
2023-10-24 00:54:30 -07:00
James R. Barlow
1a7738a925
Refactor -migrate metadata repair to new module
2023-10-24 00:54:30 -07:00
James R. Barlow
68bb38d0ad
pdf_to_hocr: improve plugin handling
2023-10-24 00:52:31 -07:00
James R. Barlow
0443e87345
Introduce pdf_to_hocr API
2023-10-24 00:52:31 -07:00
James R. Barlow
95b14ee282
Refactor lossless reconstruction setter into separate function
...
Still messy but good enough as a start.
2023-10-24 00:52:31 -07:00
James R. Barlow
93fda0dd00
Detect and warn about Tagged PDFs
2023-10-12 01:03:09 -07:00
James R. Barlow
91a14660b3
Require Pillow >= 10.0.1 and drop shims for older versions
2023-10-04 00:04:28 -07:00
James R. Barlow
10530a8698
Change Ghostscript version skip to fail
...
Reported to fail on earlier versions than the check tested for.
2023-09-26 20:02:52 -07:00
James R. Barlow
d5128c5cf5
Further improvements to image DPI calculation
2023-09-26 00:28:54 -07:00
James R. Barlow
ea36aedb5f
Overhaul version checkers to prefer Version to str
2023-09-25 00:59:44 -07:00
James R. Barlow
8fcf358934
Rename pike local variable to pdf for consistency
2023-09-25 00:22:26 -07:00
James R. Barlow
7018e2b247
Refactor ghostscript error message deduplicating
2023-09-24 20:22:04 -07:00
James R. Barlow
d855f63985
Remove single dispatch version of calculate_downsample
2023-09-23 15:05:04 -07:00
James R. Barlow
146da79c00
Regenerate test cache
2023-09-21 00:24:55 -07:00
James R. Barlow
0388c23ae7
Merge branch 'feature/jbig2thresh' into v15
2023-09-21 00:07:05 -07:00