James R. Barlow
44bcafd3aa
Add missing file header
2023-10-30 00:28:09 -07:00
James R. Barlow
71166f7be8
Make hocr API experimental for now
...
This commit can be reverted when we are ready to release a new version.
2023-10-30 00:07:10 -07:00
James R. Barlow
580252a1a0
Merge branch 'feature/gscan2pdf'
...
Reconcile release notes and copy_final() with new pipeline.
2023-10-30 00:01:28 -07:00
James R. Barlow
c0b60dae6a
build: add repository -y
2023-10-29 23:41:36 -07:00
James R. Barlow
ae123fd209
Try to retain/copy xattrs
v15.3.1
2023-10-28 01:42:06 -07:00
James R. Barlow
454ad0acc5
build/macos: add openssl
2023-10-28 01:41:28 -07:00
James R. Barlow
0c306ac328
v15.3.1 release notes
2023-10-28 00:49:37 -07:00
James R. Barlow
52d99732b1
Fix mistakes with watcher loglevel handling
2023-10-28 00:47:40 -07:00
James R. Barlow
5b5827983b
Tweak documentation of --output-type
2023-10-26 23:57:08 -07:00
James R. Barlow
56f9bc311d
Improve verbosity of colorspace selection
2023-10-25 00:38:56 -07:00
James R. Barlow
eb17dc1ecf
Fix pdf save settings at metadata_fixup
2023-10-25 00:13:17 -07:00
James R. Barlow
6f8115a052
Fix import of metadata_fixup
2023-10-25 00:06:26 -07:00
James R. Barlow
aac913c666
tesseract: EAFP
2023-10-24 13:50:04 -07:00
James R. Barlow
b5e73ac4e4
Drop check for obsolete .dockerinit file
2023-10-24 13:49:46 -07:00
James R. Barlow
9e98c90891
docs: note on docker performance
2023-10-24 13:34:26 -07:00
James R. Barlow
ca2592c1d9
Update draft release notes
2023-10-24 00:56:00 -07:00
James R. Barlow
a31f17bb9d
Update comments and make worker functions private
2023-10-24 00:56:00 -07:00
James R. Barlow
1cb46afa94
Update pluginspec docs
2023-10-24 00:56:00 -07:00
James R. Barlow
5a759947dd
Update release notes so far
2023-10-24 00:56:00 -07:00
James R. Barlow
db3df13e95
Remove ocrmypdf._sync
2023-10-24 00:54:31 -07:00
James R. Barlow
2a8bc03167
optimize: typing
2023-10-24 00:54:31 -07:00
James R. Barlow
d2297b39d0
info: clarify ICC -> components checking
2023-10-24 00:54:31 -07:00
James R. Barlow
e4cd081d4d
info: clarify pageinfo context management
2023-10-24 00:54:31 -07:00
James R. Barlow
d2dbea6cf8
Reorganize progress bars so they can be typed properly
2023-10-24 00:54:31 -07:00
James R. Barlow
46a279a49a
Improve passing of arguments to workers
...
The executor system was built around passing only a single argument to
workers, which was always PageContext. For other tasks, all actual arguments
were packed in a tuple, which meant we needed intermediate functions to
unpack the tuple.
The situation is now rationalized and resembles how Python handles
argument passing to familiar multiprocessing tools.
2023-10-24 00:54:31 -07:00
James R. Barlow
299f0c4003
Update dep5
2023-10-24 00:54:31 -07:00
James R. Barlow
9ffb45f283
Remove public domain congress.jpg and replace with baiona_color.jpg
...
For REUSE compliance we are phasing out public domain licenses.
2023-10-24 00:54:31 -07:00
James R. Barlow
cd61c4efd9
pngquant: remove unused ability to quantize a non-PNG
...
Coverage testing showed this branch was never used, and when tested it didn't work.
2023-10-24 00:54:31 -07:00
James R. Barlow
a06ab2a1c5
unpaper: Remove format conversion
...
Code is no longer reachable since we rasterize a 1/L/RGB image prior to this point.
2023-10-24 00:54:31 -07:00
James R. Barlow
dfa4ebf1a6
Simplify function signature of extract_image_filter
2023-10-24 00:54:31 -07:00
James R. Barlow
58f388c69d
optimize: better coverage
2023-10-24 00:54:31 -07:00
James R. Barlow
990b462a94
Fix coverage settings and cover semfree
2023-10-24 00:54:31 -07:00
James R. Barlow
b928dc0808
Skip fewer tests
2023-10-24 00:54:31 -07:00
James R. Barlow
8916955f45
Convert many run_ocrmypdf -> run_ocrmypdf_api
2023-10-24 00:54:31 -07:00
James R. Barlow
82bef40aa6
Eliminate more run_ocrmypdf calls
2023-10-24 00:54:31 -07:00
James R. Barlow
1c45f32941
tests: replace many run_ocrmypdf -> run_ocrmypdf_api
2023-10-24 00:54:31 -07:00
James R. Barlow
fadc0cf69b
Replace cryptic test error messages with more informative ones
2023-10-24 00:54:31 -07:00
James R. Barlow
7ce9d08b2d
Define progress bar plugins formally instead of "tqdm-like"
2023-10-24 00:54:31 -07:00
James R. Barlow
eb3a51e33a
Prefer pikepdf's newer Page.mediabox accessor over .MediaBox
2023-10-24 00:54:31 -07:00
James R. Barlow
f3dd733773
optimize: explore page container as objects instead of page helpers
2023-10-24 00:54:31 -07:00
James R. Barlow
4dbc5e1dba
Fix some typing issues
2023-10-24 00:54:31 -07:00
James R. Barlow
c0637c287e
vscode isn't ready for black py312, revert
2023-10-24 00:54:31 -07:00
James R. Barlow
6127f7abd6
tqdm_kwargs to progress_kwargs
2023-10-24 00:54:31 -07:00
James R. Barlow
a4059762e6
Fix hocrtransform test to generate blank hocr
2023-10-24 00:54:31 -07:00
James R. Barlow
40afcd68a7
pluginspec: spacing
2023-10-24 00:54:31 -07:00
James R. Barlow
f238e721ed
Improve documentation of new public hOCR APIs
2023-10-24 00:54:31 -07:00
James R. Barlow
16eb5627a7
Fix unused imports and other trivia
2023-10-24 00:54:31 -07:00
James R. Barlow
fbf0674189
hocr_to_ocr_pdf: handle missing hocr json file
2023-10-24 00:54:31 -07:00
James R. Barlow
62c4f65fc3
Remove duplicate thread local storage of page numbers
2023-10-24 00:54:31 -07:00
James R. Barlow
e400112f32
Improve ._pipelines naming
2023-10-24 00:54:31 -07:00