OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2026-05-03 20:24:33 -04:00

Files

James R. Barlow 5be368fe75 Fix RTL text extraction order in fpdf2 renderer (#1655)

fpdf2's shape_text() produces RTL ligature glyphs (e.g. lam-alef) with
multi-character CMap entries whose character order gets reversed by the
bidi algorithm during text extraction, producing garbled output like
"سالح" instead of "سلاح".

For invisible text (the production OCR overlay path), bypass text shaping
and use encode_text() with pre-reversed strings. encode_text() maps
characters 1:1 in logical order, avoiding the ligature CMap issue. The
pre-reversal compensates for bidi reversal by text extractors. Since the
text is invisible (Tr=3), the lack of joining forms is harmless.

Add RTL text extraction tests that verify glyph stream order, ToUnicode
CMap 1:1 mappings, and correct logical order for Arabic (including
lam-alef ligature) and Hebrew scripts.
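
The ligature problem and the pre-reversal fix described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `extract_rtl`, the glyph names, and the dictionary-based CMaps are hypothetical stand-ins, not fpdf2's API or the behaviour of any particular text extractor. It assumes an extractor that maps each glyph through ToUnicode and then reverses the RTL run character by character.

```python
# Toy model only: none of these names come from fpdf2 or OCRmyPDF.
SEEN, LAM, ALEF, HAH = "\u0633", "\u0644", "\u0627", "\u062d"
LOGICAL = SEEN + LAM + ALEF + HAH  # the example word from the commit message


def extract_rtl(glyphs, to_unicode):
    """Hypothetical extractor: look up each glyph in the ToUnicode map,
    then reverse the resulting RTL run character by character."""
    visual = "".join(to_unicode[g] for g in glyphs)
    return visual[::-1]


# Shaped path: the lam-alef ligature is a single glyph whose CMap entry
# holds two characters, so the reversal also flips lam and alef.
shaped_glyphs = ["hah", "lam_alef", "seen"]  # visual order, left to right
shaped_cmap = {"hah": HAH, "lam_alef": LAM + ALEF, "seen": SEEN}
garbled = extract_rtl(shaped_glyphs, shaped_cmap)

# Unshaped path: one glyph per character (1:1 CMap), emitted pre-reversed
# so that the extractor's reversal restores logical order.
prereversed = LOGICAL[::-1]
identity_cmap = {ch: ch for ch in prereversed}
recovered = extract_rtl(list(prereversed), identity_cmap)

assert garbled == SEEN + ALEF + LAM + HAH  # lam and alef swapped
assert recovered == LOGICAL                # logical order preserved
```

In this model the shaped path reproduces the swapped lam and alef reported in the commit message, while the 1:1 pre-reversed path round-trips to the logical string.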

2026-04-04 01:40:38 -07:00

cache

…

plugins

…

resources

…

__init__.py

…

conftest.py

…

test_acroform.py

…

test_annots.py

…

test_api.py

Fix Python API ignoring language parameter (fixes #1640)

2026-02-20 17:10:57 -08:00

test_check_pdf.py

…

test_completion.py

…

test_concurrency.py

…

test_fpdf_renderer.py

…

test_ghostscript.py

…

test_graft.py

Fix OCR text displacement on PDFs with non-zero MediaBox origins

2026-02-17 23:34:33 -08:00

test_helpers.py

…

test_hocr_parser.py

…

test_hocrtransform.py

…

test_image_input.py

…

test_imageops.py

…

test_json_serialization.py

Fix Python API producing empty OCR due to tesseract_timeout defaulting to 0

2026-02-17 21:55:49 -08:00

test_logging.py

…

test_main.py

…

test_metadata.py

…

test_multi_font_manager.py

…

test_multilingual_direct.py

…

test_null_ocr_engine.py

…

test_ocr_element.py

…

test_ocr_engine_interface.py

…

test_ocr_engine_selection.py

…

test_optimize.py

…

test_page_boxes.py

…

test_page_numbers.py

…

test_pdf_renderer.py

Fix RTL text extraction order in fpdf2 renderer (#1655)

2026-04-04 01:40:38 -07:00

test_pdfa.py

…

test_pdfinfo.py

…

test_pipeline_generate_ocr.py

…

test_pipeline.py

…

test_preprocessing.py

…

test_quality.py

…

test_rasterizer.py

…

test_rotation.py

…

test_semfree.py

…

test_soft_error.py

…

test_stdio.py

…

test_system_font_provider.py

…

test_tagged.py

…

test_tesseract.py

…

test_unpaper.py

…

test_userunit.py

…

test_validation.py

…

test_verapdf.py

…

test_watcher.py

…