mirror of
https://github.com/ocrmypdf/OCRmyPDF.git
synced 2026-04-18 05:00:01 -04:00
poppler's pdftotext does not carry Tz (horizontal scaling) across BT/ET boundaries, so words emitted in separate text objects appear on separate lines in the extracted text. Replace the per-word BT blocks (generated via fpdf2's cell/set_stretching API) with a single BT block per line, written with raw PDF operators. Each word except the last gets a trailing space, with Tz calculated so that the word plus its space spans exactly to the next word's start position.
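The Tz calculation can be illustrated with a minimal sketch. This is not the code from the commit: the `Word` class, `natural_width`, `pdf_escape`, and `line_to_operators` are hypothetical names, and the width model (every glyph 0.5 em wide) is an assumption standing in for real font metrics. It only shows how a single BT block per line might be assembled so that each word's scaled advance ends where the next word begins.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: float  # left edge of the word, in PDF points
    x1: float  # right edge of the word, in PDF points


def natural_width(text: str, font_size: float) -> float:
    """Unscaled width of text. ASSUMPTION: every glyph is 0.5 em wide."""
    return 0.5 * font_size * len(text)


def pdf_escape(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")


def line_to_operators(words, baseline_y, font_size, font_name="F1"):
    """Build one BT/ET text object for a whole line of words."""
    ops = [
        "BT",
        f"/{font_name} {font_size:g} Tf",
        # Position the text matrix once, at the start of the first word.
        f"1 0 0 1 {words[0].x0:g} {baseline_y:g} Tm",
    ]
    for i, word in enumerate(words):
        if i + 1 < len(words):
            # Non-last word: append a space and stretch word + space
            # so the advance ends at the next word's start position.
            text = word.text + " "
            target = words[i + 1].x0 - word.x0
        else:
            # Last word: no trailing space, stretch to its own right edge.
            text = word.text
            target = word.x1 - word.x0
        width = natural_width(text, font_size)
        # Fall back to 100% (no scaling) for degenerate widths.
        tz = 100.0 * target / width if width > 0 and target > 0 else 100.0
        ops.append(f"{tz:.2f} Tz")
        ops.append(f"({pdf_escape(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    line = [Word("Hello", 72, 100), Word("world", 110, 140)]
    print(line_to_operators(line, baseline_y=100, font_size=10))
```

Because every Tz change happens inside the same text object, the extractor sees one continuous run of text per line, and the trailing space supplies the word separator.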