89 Commits

Author SHA1 Message Date
James R. Barlow
37e7131a01 Drop support for Python 3.10, require Python 3.11+
Python 3.11 is now the minimum supported version. This aligns with
the codebase's use of StrEnum (introduced in 3.11) and removes
compatibility shims that were only needed for older versions.
2026-01-20 11:54:55 -08:00
James R. Barlow
2f4280b66c Comprehensive documentation update in preparation for v17 2026-01-16 01:38:47 -08:00
James R. Barlow
3f328785f0 Fix pypdfium rasterizer to match Ghostscript dimensions
The pypdfium rasterizer was producing output images that differed by 1
pixel compared to Ghostscript due to floating-point precision issues in
dimension calculations.

Root cause:
- pypdfium used harmonic mean of x/y DPI to calculate a single scale
  factor, losing the distinction between x and y DPI
- No DPI rounding like Ghostscript's 6-decimal precision
- Compound rounding errors when converting points to pixels

Solution:
1. Round DPI to 6 decimals to match Ghostscript's precision
2. Calculate expected output dimensions using separate x/y DPI values
3. Handle dimension swapping for 90°/270° rotations
4. Resize output image if off by 1-2 pixels (graceful correction)

This ensures pixel-perfect matching with Ghostscript while being
minimally invasive and only resizing when necessary.

Changes:
- Modified _render_page_to_bitmap() to calculate expected dimensions
- Modified _process_image_for_output() to correct small discrepancies
- Updated rasterize_pdf_page() to pass dimensions through pipeline
- Parametrized rotation tests to run with both rasterizers

All 45 rotation tests now pass with both pypdfium and ghostscript.

Fixes test_rotated_skew_timeout with pypdfium rasterizer.
2026-01-14 14:37:24 -08:00
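
Note on 3f328785f0: the dimension logic described in the message can be sketched roughly as follows. This is not the code from the commit. The function names, the 72 points-per-inch conversion, the use of Python's built-in `round()`, and the point at which the rotation swap happens are all assumptions made for illustration.

```python
def expected_pixel_size(width_pt, height_pt, xdpi, ydpi, rotation=0):
    """Estimate output size in pixels from page size in points (hypothetical helper).

    Keeps x and y DPI separate instead of collapsing them into one scale factor.
    """
    # Round DPI to 6 decimals, as the commit says Ghostscript does.
    xdpi = round(xdpi, 6)
    ydpi = round(ydpi, 6)

    # One PDF point is 1/72 inch; round once per axis to avoid compounding.
    width_px = round(width_pt * xdpi / 72.0)
    height_px = round(height_pt * ydpi / 72.0)

    # A 90 or 270 degree rotation swaps width and height.
    if rotation % 360 in (90, 270):
        width_px, height_px = height_px, width_px
    return width_px, height_px


def needs_correction(actual, expected, tolerance=2):
    """True when the rendered size is off by a small amount on either axis."""
    dx = abs(actual[0] - expected[0])
    dy = abs(actual[1] - expected[1])
    return (dx, dy) != (0, 0) and dx <= tolerance and dy <= tolerance


# US Letter page (612 x 792 pt) at 300 DPI, upright and rotated.
upright = expected_pixel_size(612, 792, 300, 300)
rotated = expected_pixel_size(612, 792, 300, 300, rotation=90)
```

Under these assumptions, a rendered bitmap is resized only when `needs_correction` returns true, which corresponds to the "off by 1-2 pixels" case in the message; larger differences are left alone.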
James R. Barlow
5acf21651f ruff lint and format 2026-01-13 01:50:57 -08:00
James R. Barlow
4c7086c609 Replace typer with cyclopts CLI library in misc scripts
Migrate watcher.py and pdf_text_diff.py from typer to cyclopts for
CLI argument parsing. Update pyproject.toml to reflect the dependency
change in the watcher optional feature.
2026-01-13 00:43:14 -08:00
James R. Barlow
bf76c8270c Rationalize optional dependencies vs dependency groups
Establish clear separation between user-facing optional dependencies
and developer-only dependency groups:

**Optional Dependencies (user features):**
- watcher: File watching service for batch processing
- webservice: Streamlit-based web UI
- Installable via: uv sync --extra <name> or pip install ocrmypdf[name]

**Dependency Groups (developer tools):**
- test: Testing infrastructure (merged from test + extended_test)
- docs: Documentation building tools
- streamlit-dev: Enhanced Streamlit development tools
- dev: General development tools (mypy, ipykernel)
- Installable via: uv sync --group <name> (uv only, NOT pip)

Breaking changes for developers:
- pip install -e .[test] no longer works → use uv sync --group test
- pip install -e .[docs] no longer works → use uv sync --group docs
- pip install -e .[extended_test] removed → merged into test group

No breaking changes for end users:
- pip install ocrmypdf[watcher] still works
- pip install ocrmypdf[webservice] still works

Updated:
- CI/CD workflows to use uv sync --group test
- Docker images to exclude test dependencies
- Documentation to recommend uv with pip as fallback
- pyproject.toml with clear comments explaining both systems
2026-01-13 00:34:55 -08:00
James R. Barlow
7a4b98974c Integrate fpdf2 renderer and remove legacy hOCR renderer
- Update pipeline to use fpdf2 renderer as default
- Remove legacy hocrtransform PDF renderer (_font.py, _hocr.py,
  pdf_renderer.py)
- Update CLI and options for fpdf2 renderer
- Add fpdf2 dependency to pyproject.toml
- Update graft module for fpdf2 multi-page rendering
2026-01-06 13:45:44 -08:00
James R. Barlow
cf3fb6e89b Fix raster_device settings for pypdfium rasterizer 2025-12-21 12:29:17 -08:00
James R. Barlow
d556014185 Remove language warning 2025-12-13 11:41:58 -08:00
Chris Mayo
9dbce33ee6 Update Changelog URL (#1597)
Renamed in:
d1a45e4a ("Convert remaining rst -> md", 2025-04-17)
2025-11-16 23:10:48 -08:00
James R. Barlow
abc2d41e2d Require recent pikepdf to fix check_pdf_syntax issue 2025-10-29 11:40:51 -07:00
James R. Barlow
8a784d6052 Drop explicit norecursedirs setting, which we no longer need 2025-06-13 00:03:24 -07:00
James R. Barlow
6f6448f286 Update dependency lockfile 2025-05-27 14:19:09 -07:00
James R. Barlow
9f6e5a48ad Deny use of pikepdf 9.8.0 due to GlyphlessFont error 2025-05-27 12:16:19 -07:00
James R. Barlow
92a78f611e rst -> md migration in progress 2025-04-17 02:10:40 -07:00
James R. Barlow
15a77c9d69 Modernize pyproject license specification 2025-04-06 01:07:25 -07:00
James R. Barlow
b5bc1d209c Remove ttyd 2025-02-26 14:53:13 -08:00
James R. Barlow
073a434ab3 Fix webservice interactions with Docker 2025-01-04 12:09:32 -08:00
James R. Barlow
dd6ed4c5f8 Switch to streamlit based web app 2025-01-01 17:26:22 -08:00
James R. Barlow
3c4b099cb1 Don't try to install dev dependencies in build 2024-12-08 14:03:04 -08:00
James R. Barlow
15df9c370c Update notes 2024-12-08 12:20:40 -08:00
James R. Barlow
a659f83d67 Remove invalid hyperlink annotations to satisfy Ghostscript 10.x during PDF/A conversion
Closes #1425
2024-11-16 19:02:10 -08:00
James R. Barlow
b9dd0a5e3c Use hatchling and hatch-vcs as build backend 2024-08-31 01:08:23 -07:00
James R. Barlow
6949ad2c5d pyproject: link changelog 2024-08-31 00:45:46 -07:00
James R. Barlow
b38cac6931 Drop wheel from build requires
Not usually needed anymore
2024-08-28 01:31:56 -07:00
James R. Barlow
0674829d8f Remove tool.black config 2024-04-07 00:36:52 -07:00
James R. Barlow
315aa0474b Merge branch 'main' of github.com:ocrmypdf/OCRmyPDF 2024-04-07 00:34:51 -07:00
Ben Beasley
df3451e779 Update the typer[all] dependency to typer-slim[standard] (#1287)
In 0.12.1, Typer was significantly reorganized.

- `typer-slim` is the library (for `import typer`)
- `typer-slim[standard]` adds optional dependencies (currently `rich`
  and `shellingham`, basically equivalent to the old `typer[all]`)
- `typer-cli` is the `typer` command-line tool
- `typer` is now basically a metapackage that brings in *all of the
  above*, and it no longer has an `all` extra

Pip will warn about this and proceed,

```
WARNING: typer 0.12.1 does not provide the extra 'all'
```

but there are other tools that will fail hard when asked to resolve a
(now) nonexistent extra.

Since this project doesn’t need the `typer` command-line tool, it looks
like changing the dependency to `typer-slim[standard]` is the best way
forward.

See https://typer.tiangolo.com/release-notes/#0121 and tiangolo/typer#785
for further discussion and details.
2024-04-07 00:34:34 -07:00
James R. Barlow
d6342cb8c2 Add heif/heic input image support 2024-04-07 00:33:13 -07:00
James R. Barlow
065bddbc6c Reformat with ruff format 2024-04-07 00:25:32 -07:00
James R. Barlow
8d30cff4ef Undo future annotations from watcher.py till Typer fixes its issue
Fixes #1258
2024-02-20 19:14:39 -08:00
James R. Barlow
906c130f96 Update rust toml settings 2024-02-14 12:32:26 -08:00
James R. Barlow
fddf3ce2f4 Clarify warnings filter 2024-02-12 01:43:47 -08:00
James R. Barlow
26470fe16a Suppress reportlab deprecation warning 2024-02-12 01:17:08 -08:00
James R. Barlow
73ed33a086 Tighten dependencies 2023-12-20 12:33:18 -08:00
James R. Barlow
30d92ad83f Fix build settings to adjust for dropping py39 2023-12-07 23:40:45 -08:00
James R. Barlow
43618e6b3f Move canvas API to pikepdf and import it 2023-12-02 19:42:35 -08:00
James R. Barlow
3f7b540f76 Drop Python 3.9 support 2023-11-21 00:46:00 -08:00
James R. Barlow
8d1e75017e Remove reportlab backend and make reportlab a test-only dependency 2023-11-19 23:51:27 -08:00
James R. Barlow
9898904be7 Fix pikepdf PdfMatrix deprecation warning; v15.4.3 release notes 2023-11-15 20:27:16 -08:00
James R. Barlow
990b462a94 Fix coverage settings and cover semfree 2023-10-24 00:54:31 -07:00
James R. Barlow
c0637c287e vscode isn't ready for black py312, revert 2023-10-24 00:54:31 -07:00
James R. Barlow
ad3a1dbbad deps: update PyMuPDF req 2023-10-24 00:54:31 -07:00
James R. Barlow
0655f8e7ae Add py312 to black coverage 2023-10-24 00:54:30 -07:00
James R. Barlow
0565cb0b10 misc/watcher.py: use Typer and dotenv to improve ease of use 2023-10-20 19:56:39 -07:00
James R. Barlow
91a14660b3 Require Pillow >= 10.0.1 and drop shims for older versions 2023-10-04 00:04:28 -07:00
James R. Barlow
3829af16fb Fix bdist_wheel tag set to py38 2023-09-26 15:46:47 -07:00
James R. Barlow
8132a4ae10 Update release notes and files 2023-09-26 00:37:04 -07:00
James R. Barlow
0239f69912 Tighten Python dependencies 2023-09-21 00:05:39 -07:00
James R. Barlow
e8c82ee4b6 Remove shim for img2pdf < 0.4.4 2023-09-21 00:05:39 -07:00