OCRmyPDF/tests at v17.2.0

mirror/OCRmyPDF

mirror of https://github.com/ocrmypdf/OCRmyPDF.git synced 2026-04-18 05:00:01 -04:00

James R. Barlow d68e2f6e34 Fix OCR text layer misalignment with non-zero mediabox origins

Fixes #1630 where --redo-ocr would shift OCR text vertically on PDFs
with non-zero mediabox origins (e.g., [0, 100, width, height+100]).

The bug occurred in _graft_fpdf2_text_layer where the Form XObject BBox
was set to the text layer's mediabox [0, 0, w, h] instead of the base
page's mediabox [0, 100, w, h+100]. This caused a coordinate mismatch
between the BBox and the transformation matrix, resulting in text being
positioned incorrectly.

The fix changes line 450 in _graft.py to use base_mediabox instead of
mediabox, making the fpdf2 renderer consistent with the sandwich renderer,
which already used base_mediabox correctly.
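
The mismatch described above can be illustrated with a small pure-Python sketch. This is not the code in _graft.py: the names Rect, transform_rect, and vertical_misalignment are hypothetical, the page size is made up, and the identity matrix stands in for a matrix that was computed for the base page's coordinates. It only shows where a Form XObject's BBox lands once the placement matrix is applied, assuming a matrix without rotation.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A PDF rectangle as [llx, lly, urx, ury] (hypothetical helper type)."""
    llx: float
    lly: float
    urx: float
    ury: float


def transform_rect(rect: Rect, matrix: tuple) -> Rect:
    """Apply a PDF matrix (a, b, c, d, e, f) to the corners of a rectangle.

    Only meaningful for matrices without rotation or skew, which is all
    this illustration needs.
    """
    a, b, c, d, e, f = matrix

    def point(x, y):
        return (a * x + c * y + e, b * x + d * y + f)

    x0, y0 = point(rect.llx, rect.lly)
    x1, y1 = point(rect.urx, rect.ury)
    return Rect(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def vertical_misalignment(bbox: Rect, matrix: tuple, page_box: Rect) -> float:
    """Vertical distance between the placed BBox and the page's mediabox."""
    placed = transform_rect(bbox, matrix)
    return placed.lly - page_box.lly


# Assumed example values: a US Letter page whose mediabox origin is
# offset by 100 units, as in the [0, 100, width, height+100] case.
base_mediabox = Rect(0, 100, 612, 892)
text_layer_mediabox = Rect(0, 0, 612, 792)

# Assumption: the placement matrix was computed for the base page's
# coordinates, so it applies no further translation.
IDENTITY = (1, 0, 0, 1, 0, 0)

# BBox taken from the text layer: the form sits 100 units too low.
before_fix = vertical_misalignment(text_layer_mediabox, IDENTITY, base_mediabox)

# BBox taken from the base page: the form coincides with the page.
after_fix = vertical_misalignment(base_mediabox, IDENTITY, base_mediabox)
```

Under these assumptions, before_fix is the full mediabox offset and after_fix is zero, which matches the vertical shift reported in #1630 in direction of reasoning, though the real renderer involves more than a single rectangle transform.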

This issue commonly affected:
- JSTOR PDFs (generated by iText with cropping)
- Cropped PDFs from various tools
- PDFs with non-standard coordinate systems

Added a regression test that creates a PDF with an offset mediabox origin
and verifies that --redo-ocr preserves coordinates correctly.

2026-02-08 23:55:26 -08:00

…

ruff lint and format

2026-01-13 01:50:57 -08:00

…

__init__.py

…

conftest.py

Tighten ruff rules and modernize style

2026-01-27 14:04:52 -08:00

test_acroform.py

ruff lint and format

2026-01-13 01:50:57 -08:00

test_annots.py

…

test_api.py

Rename OCROptions to OcrOptions for consistency

2026-01-12 23:37:54 -08:00

test_check_pdf.py

…

test_completion.py

…

test_concurrency.py

…

test_fpdf_renderer.py

Refactor: move ocr_element to a better location

2026-01-27 14:01:30 -08:00

test_ghostscript.py

Handle Ghostscript rasterization with DPI below 10

2026-01-31 13:01:04 -08:00

test_graft.py

Fix OCR text layer misalignment with non-zero mediabox origins

2026-02-08 23:55:26 -08:00

test_helpers.py

…

test_hocr_parser.py

test: For Windows, ensure outputs are UTF-8

2026-01-20 21:37:13 -08:00

test_hocrtransform.py

…

test_image_input.py

…

test_imageops.py

…

test_json_serialization.py

Tighten ruff rules and modernize style

2026-01-27 14:04:52 -08:00

test_logging.py

…

test_main.py

…

test_metadata.py

Tighten ruff rules and modernize style

2026-01-27 14:04:52 -08:00

test_multi_font_manager.py

…

test_multilingual_direct.py

Tighten ruff rules and modernize style

2026-01-27 14:04:52 -08:00

test_null_ocr_engine.py

Add OCR engine selection framework and null OCR engine

2026-01-12 10:11:14 -08:00

test_ocr_element.py

…

test_ocr_engine_interface.py

Add OCR engine selection framework and null OCR engine

2026-01-12 10:11:14 -08:00

test_ocr_engine_selection.py

Rename OCROptions to OcrOptions for consistency

2026-01-12 23:37:54 -08:00

test_optimize.py

Replace magic Ghostscript raster device strings with StrEnum

2026-01-20 10:44:25 -08:00

test_page_boxes.py

tests: fix test_page_boxes when verapdf unavailable

2026-01-21 00:22:26 -08:00

test_page_numbers.py

…

test_pdf_renderer.py

…

test_pdfa.py

Tidy long lines and unnested with blocks

2026-01-27 15:28:27 -08:00

test_pdfinfo.py

Add Encoding.flate_jpeg to recognize deflated JPEG images

2026-01-30 12:53:59 -08:00

test_pipeline_generate_ocr.py

Various test fixes, mainly Windows issues

2026-01-20 22:28:06 -08:00

test_pipeline.py

Add Encoding.flate_jpeg to recognize deflated JPEG images

2026-01-30 12:53:59 -08:00

test_preprocessing.py

Replace magic Ghostscript raster device strings with StrEnum

2026-01-20 10:44:25 -08:00

test_quality.py

…

test_rasterizer.py

Rename OCROptions to OcrOptions for consistency

2026-01-12 23:37:54 -08:00

test_rotation.py

Replace magic Ghostscript raster device strings with StrEnum

2026-01-20 10:44:25 -08:00

test_semfree.py

…

test_soft_error.py

…

test_stdio.py

…

test_system_font_provider.py

Tidy long lines and unnested with blocks

2026-01-27 15:28:27 -08:00

test_tagged.py

Add --tagged-pdf-mode option to control Tagged PDF handling

2026-01-30 16:15:43 -08:00

test_tesseract.py

…

test_unpaper.py

Normalize unpaper_args to list at construction time

2026-01-31 12:05:37 -08:00

test_userunit.py

…

test_validation.py

Tidy long lines and unnested with blocks

2026-01-27 15:28:27 -08:00

test_verapdf.py

Tidy long lines and unnested with blocks

2026-01-27 15:28:27 -08:00

test_watcher.py

Tidy long lines and unnested with blocks

2026-01-27 15:28:27 -08:00