Commit Graph

2477 Commits

Author SHA1 Message Date
James R. Barlow
5dbc080fa0 Rename PDFContext->PdfContext 2020-05-02 04:32:46 -07:00
James R. Barlow
e02f6c1e97 Support plugin invocation with API 2020-05-02 03:34:31 -07:00
James R. Barlow
8c9a8fc85c pluginspec: avoid circular reference 2020-05-02 03:32:55 -07:00
James R. Barlow
23d558ad8c Allow plugins to add command line arguments 2020-05-02 01:37:24 -07:00
James R. Barlow
be107b4fed Set up filter_ocr_image hook 2020-05-01 02:56:41 -07:00
James R. Barlow
8d2535e327 Get pluggy to work with forking workers 2020-05-01 02:39:50 -07:00
James R. Barlow
5eb4fe0052 Refactor plugin setup to get_plugin_manager 2020-05-01 02:18:31 -07:00
James R. Barlow
d8ff4485f8 Move samefile to helpers 2020-05-01 02:18:11 -07:00
James R. Barlow
82bce463ae Start pluggy-based plugin system 2020-05-01 02:15:23 -07:00
James R. Barlow
016dfd420c Add warning if problematic --tesseract-pagesegmode is selected
Fixes #549
2020-04-30 04:12:11 -07:00
James R. Barlow
8f5c95f0f4 Remove last vestiges of command line usage of qpdf - change to check_pdf 2020-04-26 05:33:26 -07:00
James R. Barlow
168fc60774 Update release notes with v10 changes 2020-04-26 05:14:59 -07:00
James R. Barlow
c84d0f606d ghostscript: remove deprecated argument from generate_pdfa 2020-04-26 05:11:11 -07:00
James R. Barlow
8b54ce338f setup: remove deprecated message about removal of --force parameter 2020-04-26 05:09:42 -07:00
James R. Barlow
18c4aa10bf Adjust number of workers for concurrent page scanning 2020-04-26 04:21:15 -07:00
James R. Barlow
991db17fde Remove Ghostscript-based text extraction
While faster than Python based methods, we've outgrown the limited
amount of information Ghostscript provides with this feature, and it
repeats an analysis we have to do anyway to learn what images are
present.
2020-04-26 04:02:07 -07:00
James R. Barlow
2c07515907 macOS - use spawn for multiprocessing
See bpo-33725. This is the default for 3.8, opt-in for 3.7 and older.
2020-04-26 03:49:40 -07:00
James R. Barlow
27a3b80376 Use once-per-worker pikepdf init 2020-04-26 03:49:20 -07:00
James R. Barlow
8c381a0227 Replace task_initargs with use of partial() 2020-04-26 03:49:20 -07:00
James R. Barlow
86145a8c76 Something wrong with forking worker_pdf, just open it once per page for now 2020-04-26 03:49:20 -07:00
James R. Barlow
7513f5425c Fix some broken tests 2020-04-26 03:49:20 -07:00
James R. Barlow
af3c3c6466 Further refactoring of concurrency concerns 2020-04-26 03:49:20 -07:00
James R. Barlow
db3e75e33e Refactor multiprocessing pool 2020-04-26 03:49:13 -07:00
James R. Barlow
ce49fc26dd Do pikepdf.open() once instead of per worker 2020-04-26 03:42:13 -07:00
James R. Barlow
d0d0a98dca First cut at concurrent page scan
Improvement appears on 168 page file. Needs refactoring
2020-04-26 03:42:13 -07:00
James R. Barlow
94c52a6fa3 Refactor 'xyres' into Resolution 2020-04-24 04:12:05 -07:00
James R. Barlow
57771f06a3 Refactor xy-pair for resolution to tuple 2020-04-16 15:38:33 -07:00
James R. Barlow
4581027246 Drop support for pdfminer.six 20181108
This version required a patch that has since been mainlined, and also did not
declare its dependency on chardet correctly. We can remove both hacks now.
2020-04-15 02:50:36 -07:00
James R. Barlow
31b5f63f85 hocrtransform: cleanup/PEP8
Some API breaking changes.
2020-04-15 02:48:56 -07:00
James R. Barlow
957fb1494e pytest picky about list vs tuple 2020-04-15 02:26:20 -07:00
James R. Barlow
9e3e4f2687 Improve help text about aborting due to text 2020-04-15 02:17:55 -07:00
James R. Barlow
2155bcacb4 Loosen test language requirements - eng/deu 2020-04-15 00:30:38 -07:00
James R. Barlow
346da95899 Suppress loglevel since we have color now 2020-04-15 00:09:36 -07:00
James R. Barlow
f4f7946a0c Add colored logs 2020-04-15 00:05:38 -07:00
James R. Barlow
c2919f2e1c Reinstate logging of page numbers 2020-04-15 00:05:23 -07:00
James R. Barlow
a63d624052 Improve logging of subprocess output 2020-04-15 00:04:43 -07:00
James R. Barlow
af91489376 Remove safe_symlink log= warning 2020-04-14 23:59:33 -07:00
James R. Barlow
d146d2b65c The Great Logging Refactor
Remove all instances of logger object being passed as parameters.
This was a holdover from ruffus, and complicated a lot of simple things.
2020-04-14 23:59:33 -07:00
James R. Barlow
4ff4ed24a8 Refactor Windows executable shims 2020-04-14 23:59:33 -07:00
James R. Barlow
c38ff90081 Merge branch 'master' of github.com:jbarlow83/OCRmyPDF 2020-04-14 23:55:01 -07:00
James R. Barlow
4c029e973f Fix isinstance(..,str) 2020-04-14 23:53:52 -07:00
Lars K.W. Gohlke
21cf9029e8 docs: Set ownership when using docker image (#518) 2020-04-14 23:32:01 -07:00
James R. Barlow
4a640b8dcd Fix language argument not working as list
Fixes #523
2020-04-14 23:18:52 -07:00
James R. Barlow
9471bc8921 Fix versions with leading v, e.g. v5.0 v9.7.1 2020-04-10 13:42:33 -07:00
James R. Barlow
7fe06c64fc v9.7.1 release notes 2020-04-10 13:00:19 -07:00
James R. Barlow
d13d70fd56 Fix version checker failing for qpdf 10.0.0
Fixes #527
2020-04-10 13:00:19 -07:00
James R. Barlow
58ec56180a Add a few more type annotations to public APIs 2020-04-10 13:00:19 -07:00
James R. Barlow
32a88f1bad docs: warn that AWS Lambda doesn't work 2020-04-10 13:00:19 -07:00
James R. Barlow
99ef42940c docs: warn that Windows users should use an ifmain guard 2020-04-10 13:00:19 -07:00
jbarlow83
c152710617 Update issue templates 2020-04-04 15:41:53 -07:00