home-information/Dockerfile
Thiago Trautwein Santos e2c0867cb0 Feature/82 Implement Media Thumbnail Previews for File Attributes (#267)
* Implement media thumbnail previews for file attributes

* Replace PyMuPDF with pdf2image for PDF thumbnail generation

* Create utility file for thumbnail generation

* Refactor thumbnail generation logic

* Add backfill command for missing file thumbnails and update docker_entrypoint script

* Added poppler dependency (needed for pdf2image) to dev setup docs.

* Fixed file card grid alignment with uniform thumbnail heights.

* Fix thumbnail backfill command test: filename desync via direct path

The test saved a file at a hardcoded source_path, then created an
EntityAttribute with file_value=source_path. AttributeModel.save() calls
generate_unique_filename() to add a timestamp suffix to file_value.name,
which desyncs the attribute's stored path from the on-disk path —
file_value.size then raises FileNotFoundError inside the backfill command.

Pass the file bytes via ContentFile(..., name='...') instead. The model
now owns the full filename-generation lifecycle and the on-disk and
in-database paths stay in sync.

Applied symmetrically to the dry-run test for consistency, even though
its assertions happened to pass without seeing the underlying mismatch.
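
The desync can be reproduced outside Django with a small simulation. This is
a sketch only: ``FakeStorage`` and this ``generate_unique_filename`` are
hypothetical stand-ins, and the timestamp format is an assumption, not the
project's actual implementation.

```python
import tempfile
import time
from pathlib import Path


def generate_unique_filename(name: str) -> str:
    """Hypothetical stand-in: append a timestamp suffix before the extension."""
    path = Path(name)
    return f"{path.stem}-{int(time.time())}{path.suffix}"


class FakeStorage:
    """Hypothetical stand-in for a storage backend rooted at one directory."""

    def __init__(self, root: str):
        self.root = Path(root)

    def save(self, name: str, content: bytes) -> str:
        (self.root / name).write_bytes(content)
        return name

    def exists(self, name: str) -> bool:
        return (self.root / name).exists()


with tempfile.TemporaryDirectory() as tmp:
    storage = FakeStorage(tmp)

    # Broken pattern: the file is written under a hardcoded name, then only
    # the stored name is rewritten, so it points at a file that is not there.
    storage.save("photo.png", b"data")
    stored_name = generate_unique_filename("photo.png")
    broken_in_sync = storage.exists(stored_name)

    # Fixed pattern: hand over bytes plus a name, so the rename happens
    # before the write and both sides agree on the final path.
    stored_name = storage.save(generate_unique_filename("report.pdf"), b"data")
    fixed_in_sync = storage.exists(stored_name)

    print(broken_in_sync, fixed_in_sync)  # False True
```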

* Switch thumbnail backfill from Docker entrypoint to render-time lazy gen

The Docker entrypoint's call to ``backfill_attribute_thumbnails`` had
unbounded startup cost for users with many file attributes, added
overhead on every container restart even when nothing needed
generation, and was a permanent entry point for what is really a
transient migration need.

Replace with synchronous lazy generation triggered the first time a
file card is rendered for an attribute that lacks a thumbnail. Each
file attribute pays the generation cost once, on first view, spread
across actual usage instead of forced into startup.

- ``package/docker_entrypoint.sh``: drop the backfill invocation.
- ``AttributeModel.ensure_thumbnail()``: model method that generates
  if missing, no-op otherwise.
- ``{% ensure_thumbnail attribute %}``: template tag wrapper for
  call-from-template ergonomics.
- ``file_card.html``: invoke the tag once at the top, before any
  ``has_thumbnail`` / ``thumbnail_url`` reads.

The ``backfill_attribute_thumbnails`` management command stays
available for users who want to warm the cache explicitly. Its
counters remain accurate because the new render-time path is
isolated to ``ensure_thumbnail()`` — ``has_thumbnail`` stays a
pure existence check.
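
A minimal sketch of the split between the pure check and the generating
call, assuming a file-backed thumbnail. ``FileAttribute`` and ``_generate``
are hypothetical names; only ``has_thumbnail`` and ``ensure_thumbnail()``
come from the description above.

```python
import tempfile
from pathlib import Path


class FileAttribute:
    """Hypothetical stand-in for the model; not the project's real class."""

    def __init__(self, thumbnail_path: Path):
        self.thumbnail_path = thumbnail_path
        self.generation_count = 0

    @property
    def has_thumbnail(self) -> bool:
        # Pure existence check: never triggers generation.
        return self.thumbnail_path.exists()

    def ensure_thumbnail(self) -> bool:
        """Generate the thumbnail if missing; return True only if work was done."""
        if self.has_thumbnail:
            return False
        self._generate()
        return True

    def _generate(self) -> None:
        # Placeholder for real rendering: writes a marker file.
        self.thumbnail_path.write_bytes(b"thumbnail")
        self.generation_count += 1


with tempfile.TemporaryDirectory() as tmp:
    attr = FileAttribute(Path(tmp) / "thumb.jpg")
    first = attr.ensure_thumbnail()   # generates on first view
    second = attr.ensure_thumbnail()  # no-op afterwards
    print(first, second, attr.generation_count)  # True False 1
```

Keeping generation out of ``has_thumbnail`` means a caller that only counts
missing thumbnails never changes what it is counting.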

* Harden PDF thumbnail generation against pathological input

The 20MB pre-render byte cap was the only protection against expensive
thumbnail generation, but PDF rendering cost doesn't correlate well
with file size — a small PDF can produce a multi-GB pixel buffer at
pdf2image's 200-DPI default, and a crafted PDF can hang the underlying
pdftoppm subprocess indefinitely.
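
To make the size mismatch concrete, here is a rough estimate of the
uncompressed RGB buffer for one page. The 200-inch page edge is an
illustrative assumption, not a measurement from this project.

```python
def raster_bytes(width_in: float, height_in: float, dpi: int, channels: int = 3) -> int:
    """Bytes needed for an 8-bit-per-channel raster of a page at the given DPI."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels


letter = raster_bytes(8.5, 11, dpi=200)   # ordinary US Letter page
huge = raster_bytes(200, 200, dpi=200)    # hypothetical oversized page

print(f"Letter at 200 DPI: {letter / 1e6:.1f} MB")         # 11.2 MB
print(f"200in x 200in at 200 DPI: {huge / 1e9:.1f} GB")    # 4.8 GB
```

The page dimensions live in a few bytes of the PDF, so the source file can
stay tiny while the rendered buffer grows with the square of the page edge.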

Three protections, all on the PDF path:

- ``size=(640, 640)`` passed to ``convert_from_bytes()``: caps the
  rasterized output dimensions, keeping pre-resize memory bounded.
  640 gives roughly a 2x oversample of the 320x320 thumbnail for
  LANCZOS quality without runaway buffers.
- ``timeout=30`` threaded through to the pdftoppm subprocess so
  pathological content can't hang generation.
- 10MB per-mime-type source-byte cap for PDFs, separate from the
  existing 20MB cap for images.

Constants exposed as class attributes on ``AttributeThumbnail`` so
they're discoverable and tunable in one place.
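
The three protections can be sketched together as follows. The constant
and function names are hypothetical (the real ones live on
``AttributeThumbnail``), and the broad failure handling is an assumption
about what "best effort" means here. ``convert_from_bytes()`` and its
``size`` and ``timeout`` arguments are pdf2image's own.

```python
# Hypothetical names mirroring the limits described above.
PDF_MAX_SOURCE_BYTES = 10 * 1024 * 1024   # 10MB source-byte cap for PDFs
PDF_RENDER_SIZE = (640, 640)              # roughly 2x the final thumbnail
PDF_RENDER_TIMEOUT_SECS = 30              # passed through to pdftoppm
THUMBNAIL_SIZE = (320, 320)


def render_pdf_thumbnail(pdf_bytes: bytes):
    """Return a PIL image for page 1, or None if the PDF is skipped or fails."""
    if len(pdf_bytes) > PDF_MAX_SOURCE_BYTES:
        return None  # too large: skip before spawning pdftoppm

    # Imported lazily so the byte-cap check works without pdf2image installed.
    from pdf2image import convert_from_bytes
    from pdf2image.exceptions import (
        PDFInfoNotInstalledError,
        PDFPageCountError,
        PDFPopplerTimeoutError,
        PDFSyntaxError,
    )
    from PIL import Image

    try:
        pages = convert_from_bytes(
            pdf_bytes,
            first_page=1,
            last_page=1,
            size=PDF_RENDER_SIZE,             # caps the rasterized dimensions
            timeout=PDF_RENDER_TIMEOUT_SECS,  # kills a hung pdftoppm
        )
    except (PDFInfoNotInstalledError, PDFPageCountError,
            PDFPopplerTimeoutError, PDFSyntaxError):
        return None  # best effort: no thumbnail rather than an error

    if not pages:
        return None
    image = pages[0]
    image.thumbnail(THUMBNAIL_SIZE, Image.LANCZOS)  # downscale in place
    return image
```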

* Install poppler-utils in CI for pdf2image-dependent tests

pdf2image shells out to pdftoppm (from the poppler-utils package) to
rasterize PDFs. The Docker image already installs poppler-utils, but
the GitHub Actions workflow runner did not, so
test_generate_thumbnail_best_effort_pdf_success passed locally and in
Docker but failed in CI on the ``assertTrue(generated)`` check.

Add an apt-get install step for poppler-utils so CI exercises the
same PDF-rendering surface as production.
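
A quick way to confirm whether a given environment has the binary is to
look for it on ``PATH``. This is a sketch with a hypothetical helper name,
not something the project ships.

```python
import shutil


def poppler_available() -> bool:
    """True if the pdftoppm binary that pdf2image shells out to is on PATH."""
    return shutil.which("pdftoppm") is not None


print("pdftoppm found" if poppler_available() else "pdftoppm missing: install poppler-utils")
```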

* Add tests for AttributeModel.ensure_thumbnail() lazy-generation hook

ensure_thumbnail() wraps AttributeThumbnail.generate_thumbnail_best_effort()
with three branches that aren't transitively covered by the existing
generator tests or by the backfill command tests:

- supported file, thumbnail missing -> generates
- thumbnail already present -> short-circuits before instantiating the
  generator (verified by mocking AttributeThumbnail)
- unsupported mime type -> short-circuits at the supports check

Co-located with the existing generate_thumbnail_best_effort_* tests
so a future reader finds the lazy hook's coverage in the same place
as the generator's direct coverage.
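
The short-circuit checks can be sketched with ``unittest.mock``. Both
classes below are hypothetical stand-ins, and the order of the two early
returns is an assumption; the real tests run against the Django model.

```python
from unittest import mock


class AttributeThumbnail:
    """Hypothetical stand-in for the real generator class."""

    def __init__(self, attribute):
        self.attribute = attribute

    def generate_thumbnail_best_effort(self) -> bool:
        self.attribute.has_thumbnail = True
        return True


class Attribute:
    """Hypothetical stand-in for AttributeModel."""

    SUPPORTED_MIME_TYPES = {"image/png", "application/pdf"}

    def __init__(self, mime_type: str, has_thumbnail: bool = False):
        self.mime_type = mime_type
        self.has_thumbnail = has_thumbnail

    def ensure_thumbnail(self) -> bool:
        if self.has_thumbnail:
            return False  # already present
        if self.mime_type not in self.SUPPORTED_MIME_TYPES:
            return False  # unsupported type
        return AttributeThumbnail(self).generate_thumbnail_best_effort()


def run_with_mocked_generator(attribute: Attribute):
    """Call ensure_thumbnail() with the generator class swapped for a mock."""
    fake_generator_cls = mock.MagicMock()
    with mock.patch.dict(globals(), {"AttributeThumbnail": fake_generator_cls}):
        result = attribute.ensure_thumbnail()
    return result, fake_generator_cls


# Branch 1: supported file, thumbnail missing -> generates.
generated = Attribute("image/png").ensure_thumbnail()

# Branch 2: thumbnail present -> generator class is never instantiated.
present_result, present_mock = run_with_mocked_generator(
    Attribute("image/png", has_thumbnail=True))
present_mock.assert_not_called()

# Branch 3: unsupported mime type -> stops at the supports check.
unsupported_result, unsupported_mock = run_with_mocked_generator(
    Attribute("text/plain"))
unsupported_mock.assert_not_called()
```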

---------

Co-authored-by: Anthony Cassandra <github@cassandra.org>
2026-05-15 14:08:27 -05:00

# Pin specific Python version for consistency across platforms
FROM python:3.11.8-bookworm
# Install dependencies with curl for healthcheck
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        curl \
        supervisor \
        nginx \
        redis-server \
        redis-tools \
        poppler-utils \
    && mkdir -p /var/log/supervisor \
    && mkdir -p /etc/supervisor/conf.d \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --upgrade pip
WORKDIR /src
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PYTHONPATH=/src
EXPOSE 8000
VOLUME /data/database /data/media
RUN mkdir -p /data/database && mkdir -p /data/media
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
# Assumes base.txt is all that is needed (ignores dev-specific dependencies)
COPY src/hi/requirements/base.txt /src/requirements.txt
RUN pip install --no-cache-dir --root-user-action=ignore -r requirements.txt
COPY package/docker_supervisord.conf /etc/supervisor/conf.d/hi.conf
COPY package/docker_nginx.conf /etc/nginx/sites-available/default
# Clean up nginx default configurations and ensure proper symlinks
RUN rm -f /etc/nginx/conf.d/default.conf \
    && rm -f /etc/nginx/sites-enabled/default \
    && ln -s /etc/nginx/sites-available/default /etc/nginx/sites-enabled/default \
    && nginx -t
COPY package/docker_entrypoint.sh /src/entrypoint.sh
RUN chmod +x /src/entrypoint.sh
COPY HI_VERSION /HI_VERSION
COPY src /src
RUN chmod +x /src/bin/docker-start-gunicorn.sh
ENTRYPOINT ["/src/entrypoint.sh"]
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/supervisord.conf"]