Introduce --pdf-renderer auto

Tesseract 3.03 has various quality problems, such as wrong DPI, that are
fixed in Tesseract 3.04. The idea here is to introduce an option that lets
OCRmyPDF select the rendering backend based on the options and system.

However, we're not ready to use tesseract as the main renderer yet.
Setting --pdf-renderer to tesseract does not pass all test cases, most
notably the one where --tesseract-timeout is triggered, so for now auto
selects hocr.
James R. Barlow
2015-12-02 23:20:31 -08:00
parent df1fda7438
commit 4f964a3c8a


@@ -174,7 +174,7 @@ advanced.add_argument(
     '--tesseract-config', default=[], type=list, action='append',
     help="additional Tesseract configuration files")
 advanced.add_argument(
-    '--pdf-renderer', choices=['tesseract', 'hocr'], default='hocr',
+    '--pdf-renderer', choices=['auto', 'tesseract', 'hocr'], default='auto',
     help='choose OCR PDF renderer')
 advanced.add_argument(
     '--tesseract-timeout', default=180.0, type=float,
@@ -216,6 +216,8 @@ if not set(options.language).issubset(tesseract.languages()):
 # ----------
 # Arguments
+if options.pdf_renderer == 'auto':
+    options.pdf_renderer = 'hocr'
 if any((options.deskew, options.clean, options.clean_final)):
     try:
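A minimal standalone sketch of the pattern in this diff, assuming plain argparse: the option takes the sentinel value auto as its default, and a later step replaces it with a concrete renderer. The helper names build_parser and resolve_pdf_renderer are hypothetical. In the commit itself the resolution happens inline at module level, and auto always becomes hocr.

```python
import argparse


def build_parser():
    # Hypothetical helper: mirrors only the --pdf-renderer option from the diff.
    parser = argparse.ArgumentParser()
    advanced = parser.add_argument_group('Advanced')
    advanced.add_argument(
        '--pdf-renderer', choices=['auto', 'tesseract', 'hocr'], default='auto',
        help='choose OCR PDF renderer')
    return parser


def resolve_pdf_renderer(options):
    """Replace 'auto' with a concrete renderer; explicit choices pass through."""
    if options.pdf_renderer == 'auto':
        # As of this commit, 'auto' always resolves to 'hocr', because the
        # tesseract renderer does not yet pass the full test suite.
        options.pdf_renderer = 'hocr'
    return options


if __name__ == '__main__':
    opts = resolve_pdf_renderer(build_parser().parse_args([]))
    print(opts.pdf_renderer)
```

Keeping auto as a separate choice, rather than changing the default straight to a concrete renderer, means the selection logic can later consider the Tesseract version or other options without changing what users type on the command line.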