OCRmyPDF/docs/jbig2.md
James R. Barlow 16c2604a07 Remove lossy JBIG2 support, retain lossless JBIG2 only
Lossy JBIG2 has been removed due to well-documented risks of character
substitution errors (e.g., 6/8 confusion). The --jbig2-lossy and
--jbig2-page-group-size arguments are now deprecated and ignored with
a warning.

Changes:
- Remove jbig2_lossy and jbig2_page_group_size from OCROptions
- Simplify optimize.py to use single-image JBIG2 encoding only
  (no symbol dictionaries/JBIG2Globals)
- Remove convert_group() from jbig2enc.py
- Deprecate CLI args with warnings for backward compatibility
- Update documentation to explain lossless-only JBIG2
2025-12-23 02:45:07 -08:00


% SPDX-FileCopyrightText: 2022 James R. Barlow
% SPDX-License-Identifier: CC-BY-SA-4.0
{#jbig2}
# Installing the JBIG2 encoder
Most Linux distributions do not include a JBIG2 encoder since JBIG2
encoding was patented for a long time. All known JBIG2 US patents have
expired as of 2017, but it is possible that unknown patents exist.
JBIG2 encoding is recommended for OCRmyPDF and is used to losslessly
create smaller PDFs. If JBIG2 encoding is not available, the less
efficient CCITT encoding will be used for monochrome images.
JBIG2 decoding is not patented and is performed automatically by most
PDF viewers. It is widely supported and has been part of the PDF
specification since 2001.
JBIG2 encoding is automatically provided by these OCRmyPDF packages:

- Docker image (both Ubuntu and Alpine)
- Snap package
- ArchLinux AUR package
- Alpine Linux package
- Homebrew on macOS

For all other platforms, you need to build the JBIG2 encoder from
source:
:::{code} bash
git clone https://github.com/agl/jbig2enc
cd jbig2enc
./autogen.sh
./configure && make
[sudo] make install
:::
Dependencies include libtoolize and libleptonica. Ubuntu packages them
as libtool and libleptonica-dev; Fedora (35) packages them as libtool
and leptonica-devel. Before building, make sure `autotools`, `automake`,
`libtool`, `pkg-config` and `leptonica` are installed. Other
dependencies might be required depending on your system.
:::{code} bash
[sudo] apt install autotools-dev automake libtool libleptonica-dev pkg-config
:::
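After `make install`, it can be useful to confirm that the encoder is actually discoverable. The following is a minimal sketch, assuming the build installed an executable named `jbig2` somewhere on the `PATH`; `check_jbig2` is a hypothetical helper name and not part of OCRmyPDF or jbig2enc.

```bash
# Hypothetical helper: report whether a `jbig2` executable is on the PATH.
# Assumes the jbig2enc build installed its binary under the name `jbig2`.
check_jbig2() {
    if command -v jbig2 >/dev/null 2>&1; then
        echo "jbig2 found at: $(command -v jbig2)"
    else
        echo "jbig2 not found on PATH" >&2
        return 1
    fi
}

# Print a reminder of the documented fallback if the encoder is missing.
check_jbig2 || echo "without jbig2, CCITT encoding is used for monochrome images"
```

If the helper reports that `jbig2` is missing, check that the install prefix used by `make install` is on the `PATH`.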
## JBIG2 Compression
OCRmyPDF uses JBIG2 lossless compression for bitonal (black and white)
images. This provides excellent compression ratios compared to the older
CCITT G4 standard, while preserving the exact pixel content of the
original image.
You can adjust the threshold for JBIG2 compression with
`--jbig2-threshold`. The default is 0.85.
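For scripted use, the option can be passed explicitly. This is a sketch only: `ocr_with_threshold` is a hypothetical wrapper, and the file names are placeholders. It assumes `ocrmypdf` is installed and on the `PATH`.

```bash
# Hypothetical wrapper: run OCRmyPDF with an explicit JBIG2 threshold.
# Arguments: threshold, input PDF, output PDF (file names are placeholders).
ocr_with_threshold() {
    local threshold="${1:-0.85}"   # 0.85 is the documented default
    ocrmypdf --jbig2-threshold "$threshold" "$2" "$3"
}

# Example call (commented out so the snippet has no side effects):
# ocr_with_threshold 0.85 input.pdf output.pdf
```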
:::{note}
Previous versions of OCRmyPDF supported a lossy JBIG2 mode
(`--jbig2-lossy`). This feature has been removed due to the well-known
risk of character substitution errors (e.g., 6/8 confusion). See
[JBIG2 disadvantages](https://en.wikipedia.org/wiki/JBIG2#Disadvantages)
for more information on why lossy JBIG2 is problematic. The `--jbig2-lossy`
and `--jbig2-page-group-size` arguments are now ignored with a warning.
:::
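Because the removed options only produce a warning, existing scripts keep working but may still carry them. One way to find leftover uses is a recursive search. This is a sketch: `find_deprecated_jbig2_flags` is a hypothetical helper name, and the directory argument is whatever tree holds your scripts.

```bash
# Hypothetical helper: list lines under a directory tree that still pass
# the removed lossy-JBIG2 options (--jbig2-lossy, --jbig2-page-group-size).
# Prints matches as path:line:content; returns non-zero when none are found.
find_deprecated_jbig2_flags() {
    grep -rnE -e '--jbig2-(lossy|page-group-size)' "${1:-.}"
}

# Example call (commented out so the snippet has no side effects):
# find_deprecated_jbig2_flags ./scripts
```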