mirror of
https://github.com/ocrmypdf/OCRmyPDF.git
synced 2026-02-07 21:03:59 -05:00
Introduce --ocr-engine option to select between OCR engines:
- 'auto' (default): Uses Tesseract
- 'tesseract': Explicit Tesseract selection
- 'none': Skip OCR entirely (for PDF processing only)

Key changes:
- Extend OcrEngine ABC with generate_ocr() and supports_generate_ocr()
  for direct OcrElement tree output (bypasses hOCR)
- Add get_ocr_engine(options) hook parameter for engine selection
- Implement NullOcrEngine for --ocr-engine none
- Export OcrElement, OcrClass, BoundingBox from ocrmypdf package
- Add ocr_tree support to grafting pipeline

This prepares the foundation for pluggable OCR engines while maintaining
full backward compatibility with existing Tesseract-based workflows.
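
The three option values described above can be illustrated with a small standalone sketch. It is not OCRmyPDF's implementation: `build_parser` and `resolve_engine` are hypothetical names invented for this example, and only the `--ocr-engine` flag and its three values ('auto', 'tesseract', 'none') come from the commit message. It assumes 'auto' resolves to Tesseract, as stated, and represents "skip OCR" as `None`.

```python
import argparse
from typing import Optional

# The three values named in the commit message.
ENGINE_CHOICES = ("auto", "tesseract", "none")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the --ocr-engine option."""
    parser = argparse.ArgumentParser(prog="ocr-engine-sketch")
    parser.add_argument(
        "--ocr-engine",
        choices=ENGINE_CHOICES,
        default="auto",
        help="OCR engine to use; 'none' skips OCR entirely",
    )
    return parser


def resolve_engine(choice: str) -> Optional[str]:
    """Map an option value to an engine name, or None when OCR is skipped.

    Hypothetical helper: 'auto' currently falls through to Tesseract,
    mirroring the behavior described in the commit message.
    """
    if choice in ("auto", "tesseract"):
        return "tesseract"
    if choice == "none":
        return None
    raise ValueError(f"unknown OCR engine: {choice!r}")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_engine(args.ocr_engine))
```

Keeping 'auto' distinct from 'tesseract' at the option level leaves room for the default to change later without breaking callers who requested Tesseract explicitly.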