OCRmyPDF/tests/test_ocr_engine_interface.py at main

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2026-02-07 21:03:59 -05:00

James R. Barlow 0c3745a1a4 Add OCR engine selection framework and null OCR engine

Introduce --ocr-engine option to select between OCR engines:
- 'auto' (default): Uses Tesseract
- 'tesseract': Explicit Tesseract selection
- 'none': Skip OCR entirely (for PDF processing only)
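
As a rough illustration of the selection rule above, the following sketch parses the option and resolves 'auto' to Tesseract. Only the option name and its three values come from the commit message; `build_parser`, `resolve_engine`, and `parse_engine` are hypothetical names for this example and are not OCRmyPDF's actual code.

```python
import argparse

# The three values listed in the commit message.
ENGINE_CHOICES = ("auto", "tesseract", "none")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the real CLI; defines only --ocr-engine."""
    parser = argparse.ArgumentParser(prog="ocr-engine-demo")
    parser.add_argument("--ocr-engine", choices=ENGINE_CHOICES, default="auto")
    return parser


def resolve_engine(choice: str) -> str:
    """Map the user-facing choice to a concrete engine name.

    'auto' currently means Tesseract; the other values pass through.
    """
    if choice == "auto":
        return "tesseract"
    return choice


def parse_engine(argv: list) -> str:
    """Parse an argument list and return the resolved engine name."""
    args = build_parser().parse_args(argv)
    return resolve_engine(args.ocr_engine)
```

Keeping 'auto' as a separate value from 'tesseract' leaves room to change the default engine later without breaking users who asked for Tesseract explicitly.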

Key changes:
- Extend OcrEngine ABC with generate_ocr() and supports_generate_ocr()
  for direct OcrElement tree output (bypasses hOCR)
- Add get_ocr_engine(options) hook parameter for engine selection
- Implement NullOcrEngine for --ocr-engine none
- Export OcrElement, OcrClass, BoundingBox from ocrmypdf package
- Add ocr_tree support to grafting pipeline
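
The shape of the extended interface might look something like the sketch below. The method names `generate_ocr()` and `supports_generate_ocr()` come from the commit message, but their signatures are not given there, so everything else is an assumption: `SketchElement`, `SketchOcrEngine`, `SketchNullEngine`, and `page_tree` are hypothetical stand-ins, not the real `OcrElement`, `OcrEngine`, or `NullOcrEngine` classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SketchElement:
    """Hypothetical OCR tree node; the real OcrElement may differ."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


class SketchOcrEngine(ABC):
    """Hypothetical engine base class with an opt-in direct-tree path."""

    @abstractmethod
    def name(self) -> str:
        """Return a short identifier for the engine."""

    def supports_generate_ocr(self) -> bool:
        # Engines opt in; by default the caller uses the hOCR path.
        return False

    def generate_ocr(self, image_path: str) -> SketchElement:
        # Assumed signature: the commit message does not specify one.
        raise NotImplementedError("engine does not produce an OCR tree")


class SketchNullEngine(SketchOcrEngine):
    """Engine that performs no OCR and returns an empty page tree."""

    def name(self) -> str:
        return "none"

    def supports_generate_ocr(self) -> bool:
        return True

    def generate_ocr(self, image_path: str) -> SketchElement:
        return SketchElement(kind="page")


def page_tree(engine: SketchOcrEngine, image_path: str):
    """Return the engine's tree, or None if the caller must fall back to hOCR."""
    if engine.supports_generate_ocr():
        return engine.generate_ocr(image_path)
    return None
```

A capability check such as `supports_generate_ocr()` lets existing engines keep working unchanged, which fits the backward-compatibility goal stated in the commit message.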

This prepares the foundation for pluggable OCR engines while maintaining
full backward compatibility with existing Tesseract-based workflows.

2026-01-12 10:11:14 -08:00

3.9 KiB
