Commit Graph

566 Commits

Author SHA1 Message Date
James R. Barlow
f4cb424451 Support input/output streams at API level 2020-06-22 02:02:18 -07:00
James R. Barlow
fef14778d5 Fix missing f-string in log message 2020-06-22 01:17:16 -07:00
James R. Barlow
48e2750551 Fix some tests that were failing in Docker 2020-06-21 01:48:13 -07:00
James R. Barlow
ebfe4f0d29 Fix issue #582 - PDF/A acquires title "Untitled" after conversion 2020-06-20 02:01:16 -07:00
James R. Barlow
892db88f0e test_two_languages: use narrower test 2020-06-12 14:33:02 -07:00
James R. Barlow
eeb44f78cc Fix tests that failed on other platforms from previous fix 2020-06-12 12:59:46 -07:00
James R. Barlow
393c5a9ea4 Fix error on -l lang1+lang2 2020-06-12 12:10:29 -07:00
James R. Barlow
c6b9a49cbb Fix tests that fail in CI 2020-06-10 17:08:00 -07:00
James R. Barlow
872bafad4b Reinstate quick test for text/no text
Partial revert of commit 991db17
2020-06-10 12:00:52 -07:00
James R. Barlow
64891c2fc3 Pre-release delinting 2020-06-09 15:27:14 -07:00
James R. Barlow
fe156db41d Merge branch 'release/v10' into trialmerge 2020-06-09 15:12:56 -07:00
James R. Barlow
0f942fb714 Rename ocrmypdf.exec -> ocrmypdf._exec 2020-06-09 14:59:09 -07:00
James R. Barlow
be8ca589d4 Move ocrmypdf.exec.run and friends to ocrmypdf.subprocess 2020-06-09 14:53:10 -07:00
James R. Barlow
3b6f6782f0 Remove tesseract_env, --tesseract-env 2020-06-09 00:39:53 -07:00
James R. Barlow
21c0e045cb Remove _OCRMYPDF_TEST_PATH environment variable 2020-06-09 00:30:13 -07:00
James R. Barlow
ebbf68bd08 The big payoff: abolishing spoofing machinery 2020-06-09 00:08:20 -07:00
James R. Barlow
2059e916da Convert all ghostscript spoofs to test plugins 2020-06-09 00:00:25 -07:00
James R. Barlow
7b9025f397 Convert generate_pdfa to plugin 2020-06-08 22:28:38 -07:00
James R. Barlow
b109445215 Move Ghostscript rasterize_pdf to plugin 2020-06-08 17:10:27 -07:00
James R. Barlow
a9a473f2e5 Convert all tesseract cache usages to plugin 2020-06-05 17:55:18 -07:00
James R. Barlow
6268e2faff Begin replacing tests/spoof/tesseract_cache with plugin 2020-06-05 17:27:10 -07:00
James R. Barlow
ec3f506500 Convert tesseract_badutf8 to plugin 2020-06-05 16:38:19 -07:00
James R. Barlow
5e14d5b0dd Fix test_report_file_size
Use more realistic test data
2020-06-03 13:24:55 -07:00
James R. Barlow
c6b2fa8851 Remove unpaper spoof; no plugin needed 2020-06-02 02:42:14 -07:00
James R. Barlow
1b92f447c3 Convert tesseract_crash to plugin 2020-06-02 02:36:41 -07:00
James R. Barlow
82e7eb91d2 Tidy tesseract_noop 2020-06-02 01:50:02 -07:00
James R. Barlow
4f4ad0fb76 Convert tesseract_big_image_error to plugin 2020-06-02 01:49:47 -07:00
James R. Barlow
1598f2f0e5 Abolish spoof_tesseract_noop 2020-06-01 03:07:53 -07:00
James R. Barlow
2b23f7ec73 tesseract_noop: begin implementing with plugin 2020-06-01 02:45:49 -07:00
James R. Barlow
642ebc6098 Fix test that failed on Windows 2020-05-28 15:52:00 -07:00
James R. Barlow
df9f5157bd Fix shim_paths to account for unexpected files in Program Files\gs
Fixes #565
2020-05-28 14:58:41 -07:00
James R. Barlow
aa060db5bc Refactor tesseract_env variable into the plugin
Removed all cases except one in api.py, which isn't worth solving because
it should be removed anyway.

This also fixes a logic error in the OMP_THREAD_LIMIT decision: api.py
did not pass kwargs correctly, so they never worked before.
2020-05-26 02:14:06 -07:00
James R. Barlow
d43212d30b Refactor --language argument into set 2020-05-25 03:20:10 -07:00
James R. Barlow
a0f9ca3a30 Move Tesseract options validation into plugin 2020-05-25 01:31:46 -07:00
James R. Barlow
9bccff4f88 Move Tesseract specific arguments to plugin 2020-05-16 03:24:31 -07:00
James R. Barlow
2bd586e093 Compare requested languages to OCR engine instead of tesseract directly
Also refactoring to facilitate validation that needs the plugin manager.
2020-05-16 01:50:37 -07:00
James R. Barlow
9af94ac9b7 pipeline: use OCR engine abstraction instead of Tesseract 2020-05-16 01:28:56 -07:00
James R. Barlow
41eb54cc0a Standardize tesseract.generate_hocr and _pdf parameters 2020-05-14 03:23:25 -07:00
James R. Barlow
12a2f78c4d Fix validation of languages not using tesseract_env
And some related issues.
2020-05-14 03:19:22 -07:00
James R. Barlow
d372f1f7fa Remove "skip page" from tesseract interface
Breaks tests/test_main.py::test_tesseract_missing_tessdata because
conftest.py does not update options.tesseract_env before testing options
for some reason, and tesseract.has_textonly_pdf raises an exception
instead of returning False as the test assumes.
2020-05-12 04:09:42 -07:00
James R. Barlow
2541f6cf89 Fix missing jbig2enc reported as error with -O3 instead of warning
Fixes #558
2020-05-12 01:05:57 -07:00
James R. Barlow
977665d2b6 Delint some tests 2020-05-08 03:49:33 -07:00
James R. Barlow
fd7497f00d Remove old function tesseract.v4() 2020-05-08 03:44:39 -07:00
James R. Barlow
1b086f60a9 tesseract.py: api cleanup 2020-05-06 12:37:44 -07:00
James R. Barlow
85cbf94a6e Convert many uses of str paths to Path 2020-05-06 02:53:47 -07:00
James R. Barlow
c85278b31d Delinting 2020-05-03 00:53:29 -07:00
James R. Barlow
5dbc080fa0 Rename PDFContext->PdfContext 2020-05-02 04:32:46 -07:00
James R. Barlow
e02f6c1e97 Support plugin invocation with API 2020-05-02 03:34:31 -07:00
James R. Barlow
016dfd420c Add warning if problematic --tesseract-pagesegmode is selected
Fixes #549
2020-04-30 04:12:11 -07:00
James R. Barlow
b840b16c82 Remove tesseract_badutf8.py
Should have been removed in 9db01c7
2020-04-28 02:35:23 -07:00