I'm not fully happy with this arrangement, as it effectively downloads
OCRmyPDF twice, not to mention the lengthy overall setup time.
Will need to try separate build/run images in the future, but for now just
get it working again.

setuptools_scm barfs because it can't find the version: Docker Hub
retrieves the application from GitHub in a way that omits the git
metadata setuptools_scm needs to work out the version.
I suppose there is a certain logic to Docker only using tagged
release versions from PyPI, so go with that. The other attractive option
is to nix setuptools_scm.

Debian and Ubuntu both install unpaper 0.4.2 or so. No .deb packages are
available at higher version numbers, although Arch Linux had something.
Considered making a separate image to handle building and installing, but
decided that was a premature optimization at this point, so just build
a version of unpaper that works. All tests pass.

Switched from Ubuntu to debian:stretch because stretch has more recent
versions of our binary packages and a smaller base image. In particular,
stretch has both pillow==2.9.0 and reportlab==3.2.0 available as system
packages, which saves the considerable hassle of installing a compiler
toolchain. Instead, a pyvenv is set up with access to the system's
site-packages (note: this needs two steps), making the binary-dependent
packages available. Then the remaining packages are installed into the
pyvenv with --no-cache-dir so pip's download cache is not left behind in
the image. And there we are.
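A minimal sketch of the "venv with access to system site-packages" idea, assuming Python's standard `venv` module as a stand-in for the `pyvenv` script used in the image. The directory name `appenv` and both helper functions are hypothetical. `system_site_packages=True` is what lets the environment import Debian-packaged libraries such as pillow and reportlab; `with_pip=False` only keeps the sketch offline. It does the setup in a single call and does not reproduce the two-step workaround the note above refers to.

```python
import tempfile
import venv
from pathlib import Path


def make_env_with_system_packages(env_dir: Path) -> Path:
    """Create a venv that can also import the system's site-packages.

    Assumption: roughly what `pyvenv --system-site-packages` does. pip is
    not installed here so the sketch runs without network access.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(env_dir)
    return env_dir / "pyvenv.cfg"


def read_cfg(cfg_path: Path) -> dict:
    """Parse the simple `key = value` lines of pyvenv.cfg into a dict."""
    settings = {}
    for line in cfg_path.read_text().splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_env_with_system_packages(Path(tmp) / "appenv")
        # The flag that exposes system packages to the venv.
        print(read_cfg(cfg)["include-system-site-packages"])  # true
```

Inside such an environment, the remaining pure-Python dependencies would then be installed with the venv's own pip using `--no-cache-dir`, as described above.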

The image is still very large (>500 MB), but libraries like reportlab require
font rendering, so they pull in large portions of the Linux
graphics stack. Not much will shrink that.