Commit Graph

James R. Barlow
b4f9673364 Add unit tests for HocrParser, PdfTextRenderer, and OcrElement

Comprehensive test coverage for the new hocrtransform components:

- test_ocr_element.py: Tests for BoundingBox, Baseline, FontInfo,
  OcrElement dataclass methods (iter_by_class, find_by_class,
  get_text_recursive, words/lines/paragraphs properties)

- test_hocr_parser.py: Tests for parsing hOCR files including
  page/paragraph/line/word extraction, RTL text, rotated text,
  different line types (header, caption), font info, and edge cases

- test_pdf_renderer.py: Tests for PDF rendering including text
  extraction verification, page sizing, multi-line content,
  text direction, baseline handling, textangle rotation, word breaks,
  debug options, and image overlay
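
As a rough illustration of the kind of tree-walking test described for test_ocr_element.py, here is a minimal sketch. The `Node` class is a hypothetical stand-in written for this example, not the project's `OcrElement`. Only the method names `iter_by_class` and `get_text_recursive` come from the commit message; their signatures and behavior here are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for an OCR element; NOT the project's OcrElement."""

    ocr_class: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def iter_by_class(self, ocr_class: str) -> Iterator["Node"]:
        # Depth-first walk that yields every node whose class matches.
        if self.ocr_class == ocr_class:
            yield self
        for child in self.children:
            yield from child.iter_by_class(ocr_class)

    def get_text_recursive(self) -> str:
        # Leaf nodes carry text; containers join their children's text.
        if self.text:
            return self.text
        parts = (child.get_text_recursive() for child in self.children)
        return " ".join(part for part in parts if part)


def test_iter_by_class_finds_nested_words() -> None:
    line = Node(
        "ocr_line",
        children=[Node("ocrx_word", "Hello"), Node("ocrx_word", "world")],
    )
    page = Node("ocr_page", children=[Node("ocr_par", children=[line])])

    words = [word.text for word in page.iter_by_class("ocrx_word")]
    assert words == ["Hello", "world"]
    assert page.get_text_recursive() == "Hello world"
```

The class names `ocr_page`, `ocr_par`, `ocr_line`, and `ocrx_word` are the standard hOCR element classes. The real tests may build their fixtures differently.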

Also fixes x_font regex pattern to not capture trailing semicolons.
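
To illustrate the kind of bug this describes, the sketch below compares two patterns applied to an hOCR-style `title` string. Both patterns and the sample string are assumptions made up for this example. They are not the patterns used in the project, whose actual regex is not shown in this commit message.

```python
import re

# Hypothetical patterns for illustration only; not taken from the project.
# \S+ matches any non-whitespace, so it swallows the ';' property separator.
LOOSE_X_FONT = re.compile(r"x_font\s+(\S+)")
# Excluding ';' from the character class stops the capture at the separator.
STRICT_X_FONT = re.compile(r"x_font\s+([^;\s]+)")

# Made-up title attribute in the "name value; name value" style hOCR uses.
title = "bbox 10 20 110 60; x_font DejaVuSans; x_fsize 12"

loose = LOOSE_X_FONT.search(title).group(1)
strict = STRICT_X_FONT.search(title).group(1)

print(repr(loose))   # 'DejaVuSans;'
print(repr(strict))  # 'DejaVuSans'
```

This simplified strict pattern also stops at whitespace, so it would truncate a font name that contains spaces. A real fix would need to decide how to handle that case.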

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-21 17:05:49 -08:00