Commit Graph

268 Commits

Author SHA1 Message Date
Hugo
d761d80750 Use more standard __version__ rather than PILLOW_VERSION (#257) 2018-04-19 23:35:32 -07:00
James R. Barlow
0b10db91be Fix regression: Disable Ghostscript JPEG passthrough entirely 2018-04-17 17:00:24 -07:00
James R. Barlow
1a516b2af9 Fix regression: time stamp test suite failures 2018-04-17 16:59:21 -07:00
James R. Barlow
7368399f8b Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254 2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
10aa59f674 v6.1.4 fix test suite regression with Ghostscript 9.23 2018-04-12 15:16:54 -07:00
James R. Barlow
ba0535e3fb Update test cache to account for unpaper --layout none change 2018-04-12 00:48:21 -07:00
James R. Barlow
49fa7f6b5c tesseract_cache: don't reveal host system file paths in manifest file 2018-04-12 00:47:28 -07:00
James R. Barlow
7a1cd39b21 Fix creation date metadata lost from input
Closes #247
2018-04-02 17:53:39 -07:00
James R. Barlow
4f6bffb477 Update copyrights 2018-03-31 11:54:38 -07:00
James R. Barlow
8d9be43c60 test_bookmarks_preserved won't raise ImportError any more
Due to trapping this in ocrmypdf.lib
2018-03-28 23:22:55 -07:00
James R. Barlow
40ef4f0bbe Add new argument --skip-repair to skip the repair step 2018-03-28 00:54:58 -07:00
James R. Barlow
5becfcf8ea Refactor fitz ImportError trap 2018-03-27 21:38:02 -07:00
James R. Barlow
a9bd494cc0 Merge branch 'optional-fitz' 2018-03-27 13:36:33 -07:00
James R. Barlow
6a4df78bc0 Add _naive_find_text to search for text when fitz is not available 2018-03-27 13:36:17 -07:00
James R. Barlow
530eae3898 Fix test_main missing file_claims_pdfa 2018-03-26 15:33:53 -07:00
James R. Barlow
3e444f6a90 Make fitz optional 2018-03-26 13:22:09 -07:00
James R. Barlow
45dbff6401 Fix table of contents not preserved in PDF/A 2018-03-26 02:23:19 -07:00
James R. Barlow
bc56b8e058 Move metadata tests to new test_metadata 2018-03-26 01:49:25 -07:00
James R. Barlow
746969207a Remove deprecated --pdf-renderer tess4, which was renamed to sandwich
Should have been cut in v6.0.0
2018-03-26 01:17:22 -07:00
James R. Barlow
230d301268 conftest: py3.5 path issue 2018-03-25 00:52:45 -07:00
James R. Barlow
a2d00f5f1d tess cache: fix tess3 error for -psm instead of --psm 2018-03-25 00:43:02 -07:00
James R. Barlow
8c1c61f207 test cache: fix Path + str error 2018-03-25 00:02:03 -07:00
James R. Barlow
77476965ae test cache: use .bin extension, fix .gitignore .gitattributes 2018-03-24 23:54:16 -07:00
James R. Barlow
ca51514046 Add test cache 2018-03-24 23:50:41 -07:00
James R. Barlow
8975b72a01 Fix test_testonly_pdf generating an output file in pwd 2018-03-24 22:34:35 -07:00
James R. Barlow
874ec6a87f Add missing fixture to test_unpaper 2018-03-24 22:24:14 -07:00
James R. Barlow
909eaeeead spoof: Allow tesseract cache to share cache
Previous incarnation was only suitable for generating a local cache
where the suite was executed repeatedly. Now the cache ignores
differences, so it can be checked into GitHub and shared.
2018-03-24 22:17:36 -07:00
James R. Barlow
c138161fae Tests: more cleanup 2018-03-24 15:35:57 -07:00
James R. Barlow
e48590d66c Refactor out unpaper-specific tests 2018-03-24 15:21:44 -07:00
James R. Barlow
5b1c8541fc Review some skipped tests to make sure reasons still valid 2018-03-24 15:13:23 -07:00
James R. Barlow
e5e011021b Remove the OCRMYPDF_program environment variables
Really, this was just replicating the functionality of the PATH
environment variable, and users probably do that anyway.
2018-03-24 15:09:08 -07:00
James R. Barlow
11d74dea09 Remove the OCRMYPDF_program environment variables
Really, this was just replicating the functionality of the PATH
environment variable, and users probably do that anyway.
2018-03-24 15:07:02 -07:00
James R. Barlow
6756016572 Add license notice to all files
Source files to GPL3

Exceptions:
-tests/spoof/* to MIT
-hocrtransform.py
-_unicodefun.py

Test resources to CC BY-SA 4.0 except when otherwise noted.

Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
d700154e0e Fix regressions after --skip-text improvements 2018-03-24 02:24:45 -07:00
James R. Barlow
8159cc6b88 Skip one test that fails for qpdf 8.0.[0,1], due to qpdf regression 2018-03-09 07:57:22 -08:00
James R. Barlow
4046766ca5 Fix Python 3.5 test suite failure on symlinks
Did not account for API difference in pathlib
2018-03-02 16:57:46 -08:00
James R. Barlow
74ca736333 Issue #223: improve text of encrypted PDF error message 2018-02-27 15:08:22 -08:00
James R. Barlow
8ab8132411 lint: unused variables, wildcard imports 2018-02-24 12:48:52 -08:00
James R. Barlow
45c7bd9a60 lint: Remove shebangs from non-executable files 2018-02-24 12:38:58 -08:00
James R. Barlow
e7bcb95635 Fix pylint errors 2018-02-24 11:59:01 -08:00
James R. Barlow
3de83627a9 Handle output to /dev/null or directory (#219)
Previously we threw an exception if the output name was a directory (only after doing OCR), and we would trigger a PermissionError when trying to flip the permission bits of /dev/null, due to the shutil.copyfile implementation. Instead of copying the file, use shutil.copyfileobj, which should also respect umask etc.
2018-02-19 22:15:07 -08:00
James R. Barlow
a9da839c39 Add vector-only PDF test case 2018-02-08 00:17:35 -08:00
James R. Barlow
1dfc32d7e6 Preserve "text as curves" vector content
Never updated the checking logic to deal with a pure vector file with no text that needs an OCR layer. This is doable, so allow it.
2018-02-07 16:05:48 -08:00
James R. Barlow
019513696b Ghostscript spoof scripts did not report their --version correctly 2018-01-10 17:08:14 -08:00
James R. Barlow
ad7a4476db hugemono.pdf needs --max-image-mpixels to pass with Pillow 5.0 2018-01-10 16:55:18 -08:00
James R. Barlow
4812b20fb2 Fix tesseract_noop.py generating wrong size of output PDF in tests
This caused trouble before with test_deskew
2018-01-10 16:35:31 -08:00
James R. Barlow
882fc2257c Add --max-image-mpixels argument to support Pillow 5.0 2018-01-10 15:43:59 -08:00
James R. Barlow
91b42cbfa8 Fix issue in sandwich renderer when skipping OCR on a rotated and deskewed page
If OCR is skipped due to --tesseract-timeout or similar, and the skip page is rotated with /Rotate, and the skip page was deskewed or had other image processing, then the skip page was created with the wrong dimensions causing the output page to be cropped.
2018-01-09 00:17:53 -08:00
James R. Barlow
da11fd17ee qpdf dummy: needs to return version now 2017-11-29 14:35:37 -08:00