Commit Graph

3641 Commits

Author SHA1 Message Date
James R. Barlow
44bcafd3aa Add missing file header 2023-10-30 00:28:09 -07:00
James R. Barlow
71166f7be8 Make hocr API experimental for now
This commit can be reverted when we are ready to release a new version.
2023-10-30 00:07:10 -07:00
James R. Barlow
580252a1a0 Merge branch 'feature/gscan2pdf'
Reconcile release notes and copy_final() with new pipeline.
2023-10-30 00:01:28 -07:00
James R. Barlow
c0b60dae6a build: add repository -y 2023-10-29 23:41:36 -07:00
James R. Barlow
ae123fd209 Try to retain/copy xattrs v15.3.1 2023-10-28 01:42:06 -07:00
James R. Barlow
454ad0acc5 build/macos: add openssl 2023-10-28 01:41:28 -07:00
James R. Barlow
0c306ac328 v15.3.1 release notes 2023-10-28 00:49:37 -07:00
James R. Barlow
52d99732b1 Fix mistakes with watcher loglevel handling 2023-10-28 00:47:40 -07:00
James R. Barlow
5b5827983b Tweak documentation of --output-type 2023-10-26 23:57:08 -07:00
James R. Barlow
56f9bc311d Improve verbosity of colorspace selection 2023-10-25 00:38:56 -07:00
James R. Barlow
eb17dc1ecf Fix pdf save settings at metadata_fixup 2023-10-25 00:13:17 -07:00
James R. Barlow
6f8115a052 Fix import of metadata_fixup 2023-10-25 00:06:26 -07:00
James R. Barlow
aac913c666 tesseract: EAFP 2023-10-24 13:50:04 -07:00
James R. Barlow
b5e73ac4e4 Drop check for obsolete .dockerinit file 2023-10-24 13:49:46 -07:00
James R. Barlow
9e98c90891 docs: note on docker performance 2023-10-24 13:34:26 -07:00
James R. Barlow
ca2592c1d9 Update draft release notes 2023-10-24 00:56:00 -07:00
James R. Barlow
a31f17bb9d Update comments and make worker functions private 2023-10-24 00:56:00 -07:00
James R. Barlow
1cb46afa94 Update pluginspec docs 2023-10-24 00:56:00 -07:00
James R. Barlow
5a759947dd Update release notes so far 2023-10-24 00:56:00 -07:00
James R. Barlow
db3df13e95 Remove ocrmypdf._sync 2023-10-24 00:54:31 -07:00
James R. Barlow
2a8bc03167 optimize: typing 2023-10-24 00:54:31 -07:00
James R. Barlow
d2297b39d0 info: clarify ICC -> components checking 2023-10-24 00:54:31 -07:00
James R. Barlow
e4cd081d4d info: clarify pageinfo context management 2023-10-24 00:54:31 -07:00
James R. Barlow
d2dbea6cf8 Reorganize progress bars so they can be typed properly 2023-10-24 00:54:31 -07:00
James R. Barlow
46a279a49a Improve passing of arguments to workers
The executor system was built around passing only a single argument to
workers, which was always PageContext. For other tasks, all actual
arguments were packed in a tuple, which meant we needed intermediate
functions to unpack the tuple.

The situation is now rationalized and resembles how Python handles
argument passing to familiar multiprocessing tools.
2023-10-24 00:54:31 -07:00
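The change described in commit 46a279a49a can be illustrated with a minimal sketch. This is not OCRmyPDF's executor code: it uses only the standard library's `concurrent.futures`, and every name in it (`scale`, `_scale_packed`, `run_packed`, `run_direct`) is hypothetical. It contrasts packing arguments into a tuple, which needs an intermediate unpacking function, with passing them directly in the style of `Executor.submit(fn, *args)`.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(value, factor):
    """Hypothetical worker task that takes two ordinary arguments."""
    return value * factor


# Old style: each task receives a single packed tuple, so an
# intermediate function is needed just to unpack it.
def _scale_packed(packed):
    value, factor = packed
    return scale(value, factor)


def run_packed(jobs):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # map() hands each tuple to the shim as one argument.
        return list(pool.map(_scale_packed, jobs))


# New style: arguments are forwarded positionally, the way
# Executor.submit(fn, *args, **kwargs) does it, so no shim is required.
def run_direct(jobs):
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(scale, value, factor) for value, factor in jobs]
        # Collect results in submission order.
        return [future.result() for future in futures]


jobs = [(1, 10), (2, 10), (3, 10)]
packed_results = run_packed(jobs)
direct_results = run_direct(jobs)
```

Both functions return the same results; the second simply drops the unpacking shim, which is the kind of simplification the commit message describes.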
James R. Barlow
299f0c4003 Update dep5 2023-10-24 00:54:31 -07:00
James R. Barlow
9ffb45f283 Remove public domain congress.jpg and replace with baiona_color.jpg
For reuse compliance we are phasing out public domain licenses
2023-10-24 00:54:31 -07:00
James R. Barlow
cd61c4efd9 pngquant: remove unused ability to quantize a non-PNG
Coverage testing showed this branch was never used, and when tested it didn't work.
2023-10-24 00:54:31 -07:00
James R. Barlow
a06ab2a1c5 unpaper: Remove format conversion
Code is no longer reachable since we rasterize a 1/L/RGB image prior to this point.
2023-10-24 00:54:31 -07:00
James R. Barlow
dfa4ebf1a6 Simplify function signature of extract_image_filter 2023-10-24 00:54:31 -07:00
James R. Barlow
58f388c69d optimize: better coverage 2023-10-24 00:54:31 -07:00
James R. Barlow
990b462a94 Fix coverage settings and cover semfree 2023-10-24 00:54:31 -07:00
James R. Barlow
b928dc0808 Skip fewer tests 2023-10-24 00:54:31 -07:00
James R. Barlow
8916955f45 Convert many run_ocrmypdf -> run_ocrmypdf_api 2023-10-24 00:54:31 -07:00
James R. Barlow
82bef40aa6 Eliminate more run_ocrmypdf calls 2023-10-24 00:54:31 -07:00
James R. Barlow
1c45f32941 tests: replace many run_ocrmypdf -> run_ocrmypdf_api 2023-10-24 00:54:31 -07:00
James R. Barlow
fadc0cf69b Replace cryptic test error messages with more informative ones 2023-10-24 00:54:31 -07:00
James R. Barlow
7ce9d08b2d Define progress bar plugins formally instead of "tqdm-like" 2023-10-24 00:54:31 -07:00
James R. Barlow
eb3a51e33a Prefer pikepdf's newer Page.mediabox accessor over .MediaBox 2023-10-24 00:54:31 -07:00
James R. Barlow
f3dd733773 optimize: explore page container as objects instead of page helpers 2023-10-24 00:54:31 -07:00
James R. Barlow
4dbc5e1dba Fix some typing issues 2023-10-24 00:54:31 -07:00
James R. Barlow
c0637c287e vscode isn't ready for black py312, revert 2023-10-24 00:54:31 -07:00
James R. Barlow
6127f7abd6 tqdm_kwargs to progress_kwargs 2023-10-24 00:54:31 -07:00
James R. Barlow
a4059762e6 Fix hocrtransform test to generate blank hocr 2023-10-24 00:54:31 -07:00
James R. Barlow
40afcd68a7 pluginspec: spacing 2023-10-24 00:54:31 -07:00
James R. Barlow
f238e721ed Improve documentation of new public hOCR APIs 2023-10-24 00:54:31 -07:00
James R. Barlow
16eb5627a7 Fix unused imports and other trivia 2023-10-24 00:54:31 -07:00
James R. Barlow
fbf0674189 hocr_to_ocr_pdf: handle missing hocr json file 2023-10-24 00:54:31 -07:00
James R. Barlow
62c4f65fc3 Remove duplicate thread local storage of page numbers 2023-10-24 00:54:31 -07:00
James R. Barlow
e400112f32 Improve ._pipelines naming 2023-10-24 00:54:31 -07:00