mirror of
https://github.com/ocrmypdf/OCRmyPDF.git
synced 2026-05-19 03:58:06 -04:00
Introduce --pdf-renderer auto
Tesseract 3.03 has various quality problems, such as wrong DPI, that are fixed in Tesseract 3.04. The idea here is to introduce an option that lets OCRmyPDF select the rendering backend based on the options and the system. However, we're not ready for tesseract as the main renderer: setting pdf-renderer to tesseract does not pass all test cases, mainly the one where --tesseract-timeout is triggered, and some others. For now, auto always resolves to hocr.
@@ -174,7 +174,7 @@ advanced.add_argument(
     '--tesseract-config', default=[], type=list, action='append',
     help="additional Tesseract configuration files")
 advanced.add_argument(
-    '--pdf-renderer', choices=['tesseract', 'hocr'], default='hocr',
+    '--pdf-renderer', choices=['auto', 'tesseract', 'hocr'], default='auto',
     help='choose OCR PDF renderer')
 advanced.add_argument(
     '--tesseract-timeout', default=180.0, type=float,
@@ -216,6 +216,8 @@ if not set(options.language).issubset(tesseract.languages()):
 # ----------
 # Arguments
 
+if options.pdf_renderer == 'auto':
+    options.pdf_renderer = 'hocr'
 
 if any((options.deskew, options.clean, options.clean_final)):
     try:
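The behavior this change introduces can be tried in isolation with a small sketch. It assumes a standalone `argparse` parser rather than OCRmyPDF's real one; the names `build_parser`, `resolve_pdf_renderer`, and `parse` are hypothetical and exist only for illustration. Only the `--pdf-renderer` argument and the `auto` to `hocr` fallback are taken from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical standalone parser; only --pdf-renderer mirrors the diff.
    parser = argparse.ArgumentParser(prog='renderer-demo')
    parser.add_argument(
        '--pdf-renderer', choices=['auto', 'tesseract', 'hocr'], default='auto',
        help='choose OCR PDF renderer')
    return parser


def resolve_pdf_renderer(requested: str) -> str:
    # As in the diff, 'auto' currently always falls back to 'hocr'.
    # Explicit choices are passed through unchanged.
    if requested == 'auto':
        return 'hocr'
    return requested


def parse(argv):
    # Parse the arguments, then replace 'auto' with a concrete renderer
    # so that later code only ever sees 'tesseract' or 'hocr'.
    options = build_parser().parse_args(argv)
    options.pdf_renderer = resolve_pdf_renderer(options.pdf_renderer)
    return options


if __name__ == '__main__':
    print(parse([]).pdf_renderer)
    print(parse(['--pdf-renderer', 'tesseract']).pdf_renderer)
```

Resolving `auto` once, right after parsing, keeps the rest of the program free of a third case. If the selection later depends on the installed Tesseract version, as the commit message suggests, only the resolving step would need to change.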