Hugo
d761d80750
Use more standard __version__ rather than PILLOW_VERSION ( #257 )
2018-04-19 23:35:32 -07:00
James R. Barlow
0b10db91be
Fix regression: Disable Ghostscript JPEG passthrough entirely
2018-04-17 17:00:24 -07:00
James R. Barlow
1a516b2af9
Fix regression: time stamp test suite failures
2018-04-17 16:59:21 -07:00
James R. Barlow
7368399f8b
Clarify license of two test files - https://github.com/jbarlow83/OCRmyPDF/issues/254
2018-04-17 11:56:36 -07:00
James R. Barlow
34c78a892a
Fix list table for tests/resources
[ci skip]
2018-04-15 23:52:19 -07:00
James R. Barlow
10aa59f674
v6.1.4 fix test suite regression with Ghostscript 9.23
2018-04-12 15:16:54 -07:00
James R. Barlow
ba0535e3fb
Update test cache to account for unpaper --layout none change
2018-04-12 00:48:21 -07:00
James R. Barlow
49fa7f6b5c
tesseract_cache: don't reveal host system file paths in manifest file
2018-04-12 00:47:28 -07:00
James R. Barlow
7a1cd39b21
Fix creation date metadata lost from input
Closes #247
2018-04-02 17:53:39 -07:00
James R. Barlow
4f6bffb477
Update copyrights
2018-03-31 11:54:38 -07:00
James R. Barlow
8d9be43c60
test_bookmarks_preserved won't raise ImportError any more
Due to trapping this in ocrmypdf.lib
2018-03-28 23:22:55 -07:00
James R. Barlow
40ef4f0bbe
Add new argument --skip-repair to skip the repair step
2018-03-28 00:54:58 -07:00
James R. Barlow
5becfcf8ea
Refactor fitz ImportError trap
2018-03-27 21:38:02 -07:00
James R. Barlow
a9bd494cc0
Merge branch 'optional-fitz'
2018-03-27 13:36:33 -07:00
James R. Barlow
6a4df78bc0
Add _naive_find_text to search for text when fitz is not available
2018-03-27 13:36:17 -07:00
James R. Barlow
530eae3898
Fix test_main missing file_claims_pdfa
2018-03-26 15:33:53 -07:00
James R. Barlow
3e444f6a90
Make fitz optional
2018-03-26 13:22:09 -07:00
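The commits above make fitz an optional dependency and add a naive text search for when it is missing. A minimal sketch of that pattern follows. The names `naive_has_text` and `text_search_backend` are hypothetical, and the regular expression is an assumed heuristic for illustration, not the project's actual `_naive_find_text` logic.

```python
import re

# Optional dependency: fall back gracefully when fitz (PyMuPDF) is absent.
try:
    import fitz
except ImportError:
    fitz = None

# Assumed heuristic: a PDF text object is bracketed by the BT and ET operators.
_TEXT_OBJECT = re.compile(rb"\bBT\s.*?\sET\b", re.DOTALL)


def naive_has_text(content_stream: bytes) -> bool:
    """Return True if an uncompressed content stream appears to draw text."""
    return _TEXT_OBJECT.search(content_stream) is not None


def text_search_backend() -> str:
    """Report which text search would be used in this environment."""
    return "fitz" if fitz is not None else "naive"
```

The fallback only inspects bytes it is handed, so it would need decompressed content streams to be useful; that limitation is why it is called naive here.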
James R. Barlow
45dbff6401
Fix table of contents not preserved in PDF/A
2018-03-26 02:23:19 -07:00
James R. Barlow
bc56b8e058
Move metadata tests to new test_metadata
2018-03-26 01:49:25 -07:00
James R. Barlow
746969207a
Remove deprecated --pdf-renderer tess4, which was renamed to sandwich
Should have been cut in v6.0.0
2018-03-26 01:17:22 -07:00
James R. Barlow
230d301268
conftest: py3.5 path issue
2018-03-25 00:52:45 -07:00
James R. Barlow
a2d00f5f1d
tess cache: fix tess3 error for -psm instead of --psm
2018-03-25 00:43:02 -07:00
James R. Barlow
8c1c61f207
test cache: fix Path + str error
2018-03-25 00:02:03 -07:00
James R. Barlow
77476965ae
test cache: use .bin extension, fix .gitignore .gitattributes
2018-03-24 23:54:16 -07:00
James R. Barlow
ca51514046
Add test cache
2018-03-24 23:50:41 -07:00
James R. Barlow
8975b72a01
Fix test_testonly_pdf generating an output file in pwd
2018-03-24 22:34:35 -07:00
James R. Barlow
874ec6a87f
Add missing fixture to test_unpaper
2018-03-24 22:24:14 -07:00
James R. Barlow
909eaeeead
spoof: Allow tesseract cache to share cache
The previous incarnation was only suitable for generating a local cache
where the suite was executed repeatedly. Now the cache ignores
differences, so it can be checked into GitHub and shared.
2018-03-24 22:17:36 -07:00
James R. Barlow
c138161fae
Tests: more cleanup
2018-03-24 15:35:57 -07:00
James R. Barlow
e48590d66c
Refactor out unpaper-specific tests
2018-03-24 15:21:44 -07:00
James R. Barlow
5b1c8541fc
Review some skipped tests to make sure reasons still valid
2018-03-24 15:13:23 -07:00
James R. Barlow
e5e011021b
Remove the OCRMYPDF_program environment variables
Really, this was just replicating the functionality of the PATH
environment variable, and users probably do that anyway.
2018-03-24 15:09:08 -07:00
James R. Barlow
11d74dea09
Remove the OCRMYPDF_program environment variables
Really, this was just replicating the functionality of the PATH
environment variable, and users probably do that anyway.
2018-03-24 15:07:02 -07:00
James R. Barlow
6756016572
Add license notice to all files
Source files to GPL3
Exceptions:
-tests/spoof/* to MIT
-hocrtransform.py
-_unicodefun.py
Test resources to CC BY-SA 4.0 except when otherwise noted.
Add GPL license.
2018-03-24 02:33:24 -07:00
James R. Barlow
d700154e0e
Fix regressions after --skip-text improvements
2018-03-24 02:24:45 -07:00
James R. Barlow
8159cc6b88
Skip one test that fails for qpdf 8.0.[0,1], due to qpdf regression
2018-03-09 07:57:22 -08:00
James R. Barlow
4046766ca5
Fix Python 3.5 test suite failure on symlinks
Did not account for API difference in pathlib
2018-03-02 16:57:46 -08:00
James R. Barlow
74ca736333
Issue #223 : improve text of encrypted PDF error message
2018-02-27 15:08:22 -08:00
James R. Barlow
8ab8132411
lint: unused variables, wildcard imports
2018-02-24 12:48:52 -08:00
James R. Barlow
45c7bd9a60
lint: Remove shebangs from non-executable files
2018-02-24 12:38:58 -08:00
James R. Barlow
e7bcb95635
Fix pylint errors
2018-02-24 11:59:01 -08:00
James R. Barlow
3de83627a9
Handle output to /dev/null or directory ( #219 )
Previously we threw an exception if the output name was a directory (only after doing OCR), and trying to flip the permission bits of /dev/null triggered a PermissionError due to the shutil.copyfile implementation. Instead of copying the file, use shutil.copyfileobj, which should also respect umask etc.
2018-02-19 22:15:07 -08:00
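The approach described in the commit above can be sketched as follows. This is an illustration under assumptions, not the project's code: `write_output` is a hypothetical helper, and the up-front directory check is one possible way to fail before any OCR work is done.

```python
import shutil
from pathlib import Path


def write_output(src_path, dest_path):
    """Stream src_path into dest_path (hypothetical helper).

    A directory destination is rejected immediately. Only bytes are
    written, so the destination's permission bits are never modified,
    which lets special files such as /dev/null be used as targets.
    """
    dest = Path(dest_path)
    if dest.is_dir():
        raise IsADirectoryError(f"Output is a directory: {dest}")
    with open(src_path, "rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Because the destination is created with a plain `open()`, a newly created output file gets its mode from the process umask rather than from the source file.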
James R. Barlow
a9da839c39
Add vector-only PDF test case
2018-02-08 00:17:35 -08:00
James R. Barlow
1dfc32d7e6
Preserve "text as curves" vector content
The checking logic was never updated to deal with a pure vector file with no text that needs an OCR layer. This is doable, so allow it.
2018-02-07 16:05:48 -08:00
James R. Barlow
019513696b
Ghostscript spoof scripts did not report their --version correctly
2018-01-10 17:08:14 -08:00
James R. Barlow
ad7a4476db
hugemono.pdf needs --max-image-mpixels to pass with Pillow 5.0
2018-01-10 16:55:18 -08:00
James R. Barlow
4812b20fb2
Fix tesseract_noop.py generating wrong size of output PDF in tests
This caused trouble before with test_deskew
2018-01-10 16:35:31 -08:00
James R. Barlow
882fc2257c
Add --max-image-mpixels argument to support Pillow 5.0
2018-01-10 15:43:59 -08:00
James R. Barlow
91b42cbfa8
Fix issue in sandwich renderer when skipping OCR on a rotated and deskewed page
If OCR is skipped due to --tesseract-timeout or similar, and the skip page is rotated with /Rotate, and the skip page was deskewed or had other image processing, then the skip page was created with the wrong dimensions causing the output page to be cropped.
2018-01-09 00:17:53 -08:00
James R. Barlow
da11fd17ee
qpdf dummy: needs to return version now
2017-11-29 14:35:37 -08:00