Commit Graph

4 Commits

Author SHA1 Message Date
James R. Barlow
efb83ad64f Add --ghostscript-jpeg-quality and --ghostscript-jpeg-maxdpi
Expose Ghostscript's -dJPEGQ and image downsampling switches as
advanced, plugin-scoped options for tuning PDF/A output, without
polluting the central OcrOptions registry. The optimizer's existing
--jpeg-quality remains the recommended JPEG quality control.

- GhostscriptOptions gains jpeg_quality and jpeg_maxdpi fields and CLI
  args (advanced help text). jpeg_quality=0 is honored as Ghostscript's
  maximum compression rather than being silently coerced to the default.
- _exec.ghostscript.generate_pdfa() forwards both values; when
  jpeg_maxdpi is set, the downsample threshold is pinned at 1.0 so that
  every image above jpeg_maxdpi is downsampled.
- _get_plugin_options falls back to extra_attrs for namespaced fields
  so plugins can own their options without registering them centrally.
- Documentation explains the rationale: Ghostscript is the legacy path
  (pypdfium + verapdf is preferred in v17+), the optimizer is the
  supported file-size lever, and lowering quality is almost always a
  better trade than downsampling.
2026-05-25 10:20:54 -07:00
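A minimal Python sketch of the argument handling this commit describes, assuming both options end up as Ghostscript command-line switches. The helper name ghostscript_jpeg_args, the range check, and the choice of Color and Gray downsampling switches are hypothetical and not taken from OCRmyPDF's generate_pdfa(). It shows why the quality check compares against None (so that 0 survives as a real value) and what pinning the downsample threshold at 1.0 does.

```python
from __future__ import annotations


def ghostscript_jpeg_args(
    jpeg_quality: int | None = None,
    jpeg_maxdpi: int | None = None,
) -> list[str]:
    """Build Ghostscript switches for JPEG quality and downsampling.

    Hypothetical helper for illustration only; not OCRmyPDF's actual code.
    """
    args: list[str] = []
    # Compare against None rather than testing truthiness: jpeg_quality=0
    # is a valid request for maximum compression and must not be treated
    # as "unset" and replaced by a default.
    if jpeg_quality is not None:
        if not 0 <= jpeg_quality <= 100:
            raise ValueError("jpeg_quality must be between 0 and 100")
        args.append(f"-dJPEGQ={jpeg_quality}")
    if jpeg_maxdpi is not None:
        # Assumption: apply the same cap to color and grayscale images.
        for kind in ("Color", "Gray"):
            args += [
                f"-dDownsample{kind}Images=true",
                f"-d{kind}ImageResolution={jpeg_maxdpi}",
                # Threshold 1.0 downsamples any image above jpeg_maxdpi;
                # Ghostscript's default of 1.5 would leave images up to
                # 1.5x the target resolution untouched.
                f"-d{kind}ImageDownsampleThreshold=1.0",
            ]
    return args
```

For example, ghostscript_jpeg_args(jpeg_quality=0) yields only "-dJPEGQ=0", while calling it with no arguments adds no switches at all, leaving Ghostscript's own defaults in effect.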
James R. Barlow
16c2604a07 Remove lossy JBIG2 support, retain lossless JBIG2 only
Lossy JBIG2 has been removed due to well-documented risks of character
substitution errors (e.g., 6/8 confusion). The --jbig2-lossy and
--jbig2-page-group-size arguments are now deprecated and ignored with
a warning.

Changes:
- Remove jbig2_lossy and jbig2_page_group_size from OCROptions
- Simplify optimize.py to use single-image JBIG2 encoding only
  (no symbol dictionaries/JBIG2Globals)
- Remove convert_group() from jbig2enc.py
- Deprecate CLI args with warnings for backward compatibility
- Update documentation to explain lossless-only JBIG2
2025-12-23 02:45:07 -08:00
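A minimal sketch of one way to keep retired flags parseable while ignoring them with a warning, using a custom argparse.Action. The class name _IgnoredDeprecatedArg, the prog name, and the warning text are hypothetical; this is not necessarily how OCRmyPDF implements the deprecation. FutureWarning is used because, unlike DeprecationWarning, Python shows it to end users by default.

```python
import argparse
import warnings


class _IgnoredDeprecatedArg(argparse.Action):
    """Accept a retired option, emit a warning, and discard its value.

    Hypothetical class for illustration only.
    """

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated and ignored; "
            "JBIG2 output is now lossless only",
            FutureWarning,
        )
        # Deliberately store nothing on the namespace.


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ocr-sketch")  # hypothetical prog
    # default=argparse.SUPPRESS keeps argparse from adding a None
    # attribute for these options; help=SUPPRESS hides them from --help.
    parser.add_argument(
        "--jbig2-lossy",
        nargs=0,
        action=_IgnoredDeprecatedArg,
        default=argparse.SUPPRESS,
        help=argparse.SUPPRESS,
    )
    parser.add_argument(
        "--jbig2-page-group-size",
        type=int,
        action=_IgnoredDeprecatedArg,
        default=argparse.SUPPRESS,
        help=argparse.SUPPRESS,
    )
    return parser
```

Old command lines that pass --jbig2-lossy or --jbig2-page-group-size 10 still parse, but each flag only produces a warning and leaves no trace in the parsed options.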
James R. Barlow
e4a8f7a354 Remove redundant optimizer content
2025-04-17 15:10:59 -07:00
James R. Barlow
3b9367fc69 Continuing rst -> md
2025-04-17 02:27:59 -07:00