home-information/docs/integrations
Tony C 40f932e088 Paperless-ngx integration (ATTRIBUTE_REFERENCE capability) (#375)
* ATTRIBUTE_REFERENCE capability + framework backend (#232 phase 1)

Adds the framework seam for the new ATTRIBUTE_REFERENCE
integration capability, parallel to the existing CONNECT and
IMPORT capabilities. Integrations advertising this capability
contribute a search-and-attach affordance to HI's attribute-edit
UI: the operator searches the upstream corpus, multi-selects
matching results, and HI creates one regular TEXT attribute per
selection on the host Entity or Location.

Pieces:

  - hi.integrations.enums.IntegrationCapability.ATTRIBUTE_REFERENCE
    joins CONNECT and IMPORT.
  - hi.integrations.referencer is the new package owning the
    per-capability abstract class (IntegrationAttributeReferencer),
    the picker-only result dataclass (AttributeReferenceResult),
    and the shared wire-field constants (WireField).
  - IntegrationGateway.get_attribute_referencer() returns None
    by default; integrations override to opt in.
  - Two POST endpoints under /api/integrations/referencer/:
    - search/<integration_id> dispatches a query to the
      integration's referencer and returns JSON results for the
      picker UI to render.
    - attach/<integration_id> takes a list of selections and an
      attribute-owning item (Entity or Location, keyed by
      ItemType) and creates one TEXT attribute per selection.
    Both views require edit-mode; the attach view rejects
    unsupported item types, malformed payloads, and stale
    references to integrations that no longer advertise the
    capability.

No UI yet — that's phase 2. With a stub referencer, the search
and attach endpoints are exercisable end-to-end and the test
suite covers both happy paths and error branches (empty query,
limit clamp/parse, upstream exception → 502, unknown integration
→ 404, unsupported item_type → 400, malformed selections → 400,
title truncation to model max_length, etc.).
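
As a rough illustration of the phase-1 seam, the following is a minimal sketch rather than the actual HI code. The class names come from the description above; the field names (``title``, ``source_url``, ``snippet``, ``thumbnail_url``), the ``search(query, limit)`` signature, the ``StubReferencer`` class, and the ``parse_limit`` helper with its default and maximum values are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AttributeReferenceResult:
    # Field names are assumptions for illustration only.
    title: str
    source_url: str
    snippet: Optional[str] = None
    thumbnail_url: Optional[str] = None


class IntegrationAttributeReferencer(ABC):
    @abstractmethod
    def search(self, query: str, limit: int) -> List[AttributeReferenceResult]:
        """Return up to `limit` upstream matches for `query`."""


class StubReferencer(IntegrationAttributeReferencer):
    """Hypothetical stub, enough to exercise search end-to-end."""

    def search(self, query: str, limit: int) -> List[AttributeReferenceResult]:
        return [
            AttributeReferenceResult(
                title=f'{query} result {i}',
                source_url=f'http://example.invalid/doc/{i}',
            )
            for i in range(1, limit + 1)
        ]


def parse_limit(raw, default: int = 20, maximum: int = 50) -> int:
    """Parse a client-supplied limit; unparseable values fall back to
    `default`, and parsed values are clamped to [1, maximum].
    The default and maximum here are assumed values."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(1, min(value, maximum))
```

With a stub like this, ``StubReferencer().search('roof', parse_limit('3'))`` yields three results whose titles start with the query.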

* Phase 2: ATTRIBUTE_REFERENCE picker (HiModal + antinode)

Adds the operator-facing picker that searches an enabled
ATTRIBUTE_REFERENCE integration and attaches selected results as TEXT
attributes on an Entity or Location. The picker is a single
HiModalView with server-driven multi-select state; the picker URL
takes no integration_id and the form discovers / selects integrations
internally so the action-bar surface stays a single "Link" button
regardless of how many referencers are configured.

- Picker view (search re-render via antinode partial swap; attach via
  refresh_response). Multi-select state carried in the form across
  re-searches (selections_json + visible_url + result_url + remove_url).
- Picker modal + body templates; integration selector shown only when
  more than one referencer is enabled.
- has_attribute_referencers template tag; "Link" button wired into the
  Entity and Location edit content body action bars.
- View tests cover GET, search re-render, attach, multi-select state
  transitions, and stale-integration rejection.

* Phase 3: paperless-ngx simulator with parametric search responses

Adds a paperless-ngx stub simulator that responds to any documents
search with a synthetic, parametrically-shaped result list. Unlike
the entity-shaped simulators, paperless has no SimEntity rows — it
contributes only via ATTRIBUTE_REFERENCE, so its state is a small
ephemeral settings dataclass on the singleton (same lifecycle as
HomeBox's _api_version): result count, mime mix, thumbnails on/off,
snippets on/off, and optional artificial latency. Operator tunes
each knob from an extras-pane form that submits on change.

The result generator seeds its RNG from the query string so repeated
searches return the same set, and every title carries the query as
a visible breadcrumb so the developer can verify the picker is
wiring the search input through. /api/documents/<id>/thumb/ returns
an inline SVG when thumbnails are on and 404s when off, exercising
the picker's fallback-icon path; /documents/<id>/details/ stands in
for the per-document view paperless source URLs link to.
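
The query-seeded generator can be sketched as follows. This is an illustration under assumptions, not the simulator's code: the ``generate_titles`` name, the ``kinds`` list, and the title format are hypothetical. The point it shows is that a ``random.Random`` instance seeded with the query string makes the output a pure function of the query.

```python
import random
from typing import List


def generate_titles(query: str, count: int) -> List[str]:
    # Seeding a private RNG from the query string means repeated searches
    # for the same query return the same synthetic set.
    rng = random.Random(query)
    kinds = ['Invoice', 'Manual', 'Receipt', 'Warranty']  # hypothetical mix
    return [
        # The query appears in every title as a visible breadcrumb.
        f'{rng.choice(kinds)} for "{query}" #{i + 1}'
        for i in range(count)
    ]
```

Using a dedicated ``random.Random`` instance rather than the module-level functions keeps the simulator from disturbing any other code that relies on the global RNG state.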

The shared services/pages/service.html now gates profile CRUD, the
ADD ENTITY menu, and the entity list on sim_entity_definition_list
being non-empty so ephemeral simulators don't render irrelevant
controls. Connect simulators with zero current entities are
unaffected — they still have definitions, so the empty-state alert
fires as before.

* Phase 4: paperless-ngx integration (ATTRIBUTE_REFERENCE)

First integration to declare only the ATTRIBUTE_REFERENCE capability.
Lets the operator search a paperless-ngx server from inside HI's
picker modal and attach matching documents as TEXT-attribute links
on existing Entity / Location records. No connector, no importer, no
monitors, no manager singleton — the gateway returns a referencer
and a single thumbnail proxy view bridges browser-embedded <img>
tags to the upstream API.

- PaperlessGateway, PaperlessAttributeReferencer, and a thin
  PaperlessClient over requests. Auth via Authorization: Token
  <value>. A trailing slash on the configured API URL is tolerated.
- ThumbnailProxyView fetches upstream with the configured token and
  streams bytes back so the picker's embedded thumbnails work
  without exposing the token to the browser. Document source URLs
  point directly at paperless's web UI — operators authenticate
  with paperless's own session when clicking the saved link.
- Snippet extraction is client-side (paperless returns full content
  per hit, no excerpt field): ~160-char window around the matched
  query with ellipses for clipped edges, leading-window fallback
  when the query does not match verbatim.
- Wire-format strings centralized in pl_models.PaperlessApi per
  project convention.
- Tests cover validation, client + factory (including disabled /
  missing-attribute paths), referencer translation + snippet
  extraction edge cases, gateway probe (200 / 401 / 5xx /
  connection error), and proxy view (unconfigured / upstream 404 /
  upstream auth failure / connection error / success).
- Per-integration docs added (user-facing + developer-facing) and
  linked from the integrations landing page; the lead-in copy is
  updated to acknowledge the new "attachable references" shape
  alongside the existing CONNECT / IMPORT integrations.
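
The snippet strategy from the list above (a window around the first match, ellipses on clipped edges, leading-window fallback) could look roughly like this. It is a sketch under assumptions: the ``extract_snippet`` name, the whitespace collapsing, the case-insensitive match, and the three-dot ASCII ellipsis are choices made for the example, not details confirmed by the commit.

```python
def extract_snippet(content: str, query: str, window: int = 160) -> str:
    """Return roughly `window` characters of `content` around `query`."""
    text = ' '.join(content.split())  # collapse runs of whitespace (assumption)
    if len(text) <= window:
        return text

    pos = text.lower().find(query.lower()) if query else -1
    if pos < 0:
        # Fallback: the query does not match verbatim, so show the leading window.
        return text[:window].rstrip() + '...'

    # Centre the window on the match, then shift it back inside the text.
    start = max(0, pos + len(query) // 2 - window // 2)
    end = min(len(text), start + window)
    start = max(0, end - window)

    prefix = '...' if start > 0 else ''
    suffix = '...' if end < len(text) else ''
    return prefix + text[start:end].strip() + suffix
```

A match in the middle of a long document gets ellipses on both sides, while a match near the start gets only the trailing one.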

* Phase 5: Content Sources tab + CapabilityGateway abstraction

Adds a Content Sources config tab for ATTRIBUTE_REFERENCE-capability
integrations (paperless-ngx). Sibling to the renamed Connectors tab.
Each tab now belongs to exactly one capability — the two surfaces are
semantically different (live mirror vs. on-demand reference) and
sharing a page was forcing capability awareness into framework chrome.

Page model: per-integration sidebar nav, attribute form with a
single ENABLE/UPDATE primary action that always runs schema +
upstream-access validation, and a DISABLE button (when enabled) that
rides the same form via an ``action=disable`` POST. ENABLE flips
is_enabled to True only after the upstream probe succeeds (atomic
"nothing changes on failure" semantics). DISABLE skips validation
and just flips is_enabled to False. The framework re-renders the
form area in place after each action, so the status badge and the
primary-button label update without a full page reload.
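
The enable / disable semantics can be summarized in a small sketch. Everything here is hypothetical scaffolding (``IntegrationState``, ``apply_action``, and the ``probe`` callable that returns an error message or ``None``); it only illustrates the ordering described above, where nothing is persisted unless the upstream probe succeeds and DISABLE never probes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IntegrationState:
    # Hypothetical stand-in for the persisted integration settings.
    is_enabled: bool = False
    api_url: str = ''


def apply_action(
    state: IntegrationState,
    action: str,
    new_api_url: str,
    probe: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Apply an ENABLE/UPDATE or DISABLE submit; return an error or None."""
    if action == 'disable':
        # DISABLE skips validation and only flips the flag.
        state.is_enabled = False
        return None

    # ENABLE / UPDATE: validate against the upstream before changing anything.
    error = probe(new_api_url)
    if error is not None:
        return error  # nothing changes on failure

    state.api_url = new_api_url
    state.is_enabled = True
    return None
```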

Architectural refactor: introduce ``CapabilityGateway`` as the
shared base of ``IntegrationConnector`` / ``IntegrationImporter`` /
``IntegrationAttributeReferencer``. The three peers had nothing in
common before, leaving cross-capability concerns (description text,
action-bar template fragment) with no natural home. The new base
carries ``capability`` (class attribute), ``get_description()``, and
``get_attribute_actions_template_name()`` — each capability owns
its own defaults. The connector's existing sync-flow
``get_description(is_initial_connect)`` is renamed to
``get_sync_description`` so it does not collide with the general
capability description.
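
A simplified sketch of the resulting hierarchy, assuming nothing beyond the names given above: the enum values, the default return values, the description strings, and the bare template name are illustrative guesses, and the real classes carry more than is shown here.

```python
from enum import Enum
from typing import Optional


class IntegrationCapability(Enum):
    # Member names come from the commit; the values are assumptions.
    CONNECT = 'connect'
    IMPORT = 'import'
    ATTRIBUTE_REFERENCE = 'attribute_reference'


class CapabilityGateway:
    """Shared base for the per-capability classes."""

    capability: IntegrationCapability  # set by each per-capability subclass

    def get_description(self) -> str:
        return ''

    def get_attribute_actions_template_name(self) -> Optional[str]:
        # None here means "no action-bar fragment" (an assumed convention).
        return None


class IntegrationConnector(CapabilityGateway):
    capability = IntegrationCapability.CONNECT

    def get_attribute_actions_template_name(self) -> Optional[str]:
        return 'connect_attribute_actions.html'

    def get_sync_description(self, is_initial_connect: bool) -> str:
        # Renamed from get_description(is_initial_connect) to avoid the clash.
        return 'Initial connect' if is_initial_connect else 'Re-sync'


class IntegrationAttributeReferencer(CapabilityGateway):
    capability = IntegrationCapability.ATTRIBUTE_REFERENCE

    def get_description(self) -> str:
        return 'Find related content and link to it.'
```

Framework code can then ask any gateway for its description or template fragment without first checking which capability it represents.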

``IntegrationAttributeItemEditContext`` now takes a
``CapabilityGateway`` instance instead of a capability enum and
exposes it in the template context, so
``integrations/panes/integration_edit_content_body.html`` is now
capability-agnostic: a single ``{% include %}`` of whatever
fragment the active capability returns. The previous inline
CONNECT-only health badge moves into
``connect_attribute_actions.html``.

attr.js threads the submit ``event.submitter`` into ``FormData``
so multi-submit-button forms (e.g., the new ENABLE / DISABLE form)
include the clicked button's ``name=value`` in the POST body. This
matches native browser behavior.

UI labels: the existing ``Integrations`` tab becomes ``Connectors``
(internal CONNECT routes / module names unchanged); the new tab is
``Content Sources`` (code naming uses ``reference`` to align with
the capability and the surrounding ``referencer/`` directory).

* Content Sources page: source description, label-stale fix,
history/restore view capability cleanup

- Per-source description below the integration title: "Find related
  content in <label> and link to it." Uses the integration label
  directly so future ATTRIBUTE_REFERENCE integrations get sensible
  copy automatically.
- Rebuild the attribute edit context after the first-time ENABLE
  flips is_enabled to True; otherwise the async response re-rendered
  with the stale "ENABLE" button label and disabled-state action bar
  even though the integration was now enabled.
- IntegrationAttributeHistoryInlineView / IntegrationAttributeRestoreInlineView
  pass capability_gateway=None instead of get_connector(). These ops
  target a specific attribute by id, so the capability filter inside
  the edit context is never consulted — the previous hardcoded
  CONNECT was a no-op that silently mismatched any non-CONNECT
  integration.

* Apply attribute action-bar button styling to anchor-styled buttons

The .attr-v2-action-buttons CSS targeted only <button> elements, so
the new LINK anchor on entity / location edit pages rendered smaller
and top-aligned instead of matching its <button> siblings (Add File,
Add Info). Extend the selectors to cover a.btn inside the action
bar and add explicit inline-flex centering so the icon and text
line up with the other buttons.

* Picker: JS-owned selection state, split search / attach endpoints

The previous picker maintained selection state on the server, which
meant ticking a result card did nothing visible until the next form
submit — operators saw "Add 0 References" stuck disabled after their
first check and assumed the picker was broken. Multi-selection now
behaves as one transaction: JS owns the in-memory selection set,
search calls are async within that state context, and the only POST
that touches the server is the final Add submit.

attr-picker.js (new, in js_hi_grid_header_content) owns:
  - The selection set (per-modal WeakMap keyed by the picker DOM).
  - Chip-row rendering (the one piece of HTML the JS generates).
  - The Add button's label and enabled state.
  - The async search request: POST to a new search endpoint that
    returns ONLY the result-cards partial; JS swaps it into the
    results container and re-applies checked state to any newly-
    visible result whose URL is already selected.
  - The Add submit: serialize the selection set into a hidden
    selections_json input, then route through antinode's public
    API (AN.hideModalIfNeeded + AN.post) so modal cleanup and the
    {refresh: true} response flow through the framework. Form
    deliberately omits data-async because antinode binds at <body>
    level and would otherwise race attr-picker's document-level
    submit handler.

Server endpoints split: picker GET stays, plus new search and
attach POST endpoints. Old multi-select bookkeeping
(_compute_selections, visible_url[], remove_url, action=attach
button) deleted along with the WireField class.

Client/server constant sharing now follows the project's
established DIVID pattern: ATTR_PICKER_* entries in
hi.constants.DIVID (server) and Hi.ATTR_PICKER_* in main.js
(client). Templates, view code, and JS all read from these — no
more magic strings to drift between sides.

Simulator fix: paperless's _generate_results used document_id =
index + 1, which meant different queries produced overlapping ids
(and thus URLs). The JS keys selections by source_url, so the first
result of query A and the first result of query B were treated as
the same document. Document id is now a hash of (query, index) so
different queries produce distinct ids while repeating the same
query stays stable — matching real paperless's "same document
re-found in multiple searches" behavior.
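
One way to derive such an id is sketched below. The ``document_id`` name, the choice of SHA-256, and the 32-bit truncation are assumptions; the commit only states that the id is a hash of (query, index). A cryptographic digest is used here because Python's built-in ``hash()`` is salted per process for strings, so it would not stay stable across simulator restarts.

```python
import hashlib


def document_id(query: str, index: int) -> int:
    """Derive a stable positive id from (query, index)."""
    # The NUL separator keeps ('ab', 1) and ('a', ...) style inputs from
    # producing the same byte string.
    payload = f'{query}\x00{index}'.encode('utf-8')
    digest = hashlib.sha256(payload).digest()
    # Keep 32 bits so the id stays URL-friendly; avoid 0 as an id.
    return int.from_bytes(digest[:4], 'big') or 1
```

Repeating a query reproduces the same ids, while the first result of query A and the first result of query B no longer share an id (barring a hash collision in the truncated value).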

CSS: a.btn-styled buttons inside .attr-v2-action-buttons inherit
the same min-height / padding / font-size / border-radius / focus /
hover / icon-spacing rules as their <button> siblings, so the
anchor-styled LINK button on Entity / Location edit pages lines up
with Add File and Add Info.

* Picker: footer-anchored chips + Add, sticky search, seeded initial query

Layout reshape using Bootstrap's built-in ``modal-dialog-scrollable``
opt-in (only the picker modal opts in; other modals are unaffected):

- Footer is now the commitment zone. It holds the selection chip
  row above a Cancel/Add row. The footer stays anchored regardless
  of how far the operator scrolls through results, so the
  accumulated selections and the Add button are always visible
  together.
- Body holds only the search form and the results container. The
  search form sticks to the top of the body's scroll region
  (position: sticky) so refining the query stays one click away
  no matter how far down in the result list the operator is.
- The picker-root class moved up to the modal-dialog so JS handlers
  in both body (result cards) and footer (chip-X buttons, attach
  form) can ``closest()`` to the same picker instance.

CSS additions, scoped to ``modal-dialog-scrollable`` so the
flex-shrink contract works correctly with this project's custom
``.modal-title`` (vs. Bootstrap's default ``.modal-header``):

- ``.modal-dialog-scrollable .modal-title``, ``.modal-footer`` get
  ``flex-shrink: 0`` so the title and footer keep their natural
  height when the body grows. Without this the title squashed to
  near-zero height on tall result lists.
- ``.hi-attr-picker .modal-body { padding-top: 0 }`` so the sticky
  search form sits flush against the title — otherwise the
  modal-body's default padding-top showed scrolling content
  through the gap above the sticky element.

Initial-render seeded with owner's name:

- The picker GET now runs the same ``_search_upstream`` the search
  endpoint uses, with ``owner.name`` as the query, and seeds the
  results container with the resulting ``picker_results.html``
  partial. Operator opens the picker and sees relevant matches
  immediately — no retype, no extra click. Same template + same
  context variables (``query``, ``results``) as the search
  endpoint so initial-render and re-search-render are structurally
  identical.

* Picker: source banner always rendered, wider modal, "Link Content" copy

The picker UI now has the same shape for one source or many — the
modal doesn't change structurally when a second ATTRIBUTE_REFERENCE
integration is configured.

- Source banner: replaces the previous conditional ``<select>`` in
  the multi-source case + hidden input in the single-source case.
  Always-rendered Bootstrap dropdown whose button face shows
  ``[logo] [name] [caret]``. With one source the caret is hidden
  and the button is disabled (the visual stays consistent without
  inviting a no-op click). On selection: attr-picker.js updates the
  banner face, sets the hidden ``integration_id`` input, and
  re-submits the search form — natural for "search the current
  query against a different source."

- Modal width bumped to ``hi-modal-dialog-700`` so the title bar
  fits "Link Content" and the result cards have room to breathe.

- Operator-facing copy aligned with the "Content Sources" tab:
  modal title is now constant "Link Content" (was conditional
  "Link References" / "Add from <name>"); attach button reads
  "Add N Link(s)"; the chip-row empty hint reads "No content
  selected yet."; the action-bar button on Entity / Location
  edit pages reads "Link Content".

- DIVID + main.js gain matching ``ATTR_PICKER_SOURCE_*`` entries
  for the banner classes, the dropdown-item class, and the three
  data attributes the dropdown items carry (source id, logo URL,
  label) so the JS handler can read them without re-hardcoding
  any strings.

- CSS keeps the banner on white so source logos meant for
  light backgrounds render correctly (the modal title bar uses a
  colored background and would clash). Single-source modifier
  class hides the caret and removes the disabled-button visual
  fade.

* Align operator-facing copy with Connectors / Content Sources terminology

When the config tab was renamed from "Integrations" to "Connectors"
(and the new "Content Sources" tab was added for ATTRIBUTE_REFERENCE
integrations), several operator-facing strings continued to use the
older "Integration(s)" wording. Updated to match the page labels
operators now see:

- Connectors page sidebar button: "INTEGRATIONS" → "CONNECTORS"
- Empty-state copy on the Connectors page: "no integrations
  currently configured" / "CONFIGURE INTEGRATIONS"
- The "All Integrations" picker modal title → "All Connectors"
- Disable-confirm modal: "Disable {label} Integration?" /
  "This integration has..." / "detached from this integration" /
  "If you re-configure this integration later..." → use "connector"
  consistently.
- Pre-sync confirm modal: same "This integration has..." /
  "detached from this integration" → "connector".
- Data Import card note: "Also available as an Integration." →
  "Also available as a Connector."
- Data Import empty state: "No integrations support Data Import."
  → "No data importers currently defined." (the prior wording was
  technically accurate but ambiguous given the new tab labels;
  the new wording matches what the operator is looking at).
- "Data Import vs. Integration" help modal → "Data Import vs.
  Connector"; body text refers to the contrast as Data Import vs.
  Connector and uses "external system" instead of "upstream source".

Internal code identifiers (``IntegrationGateway``,
``integration_id``, ``ConfigPageType.INTEGRATIONS_CONNECT``, etc.)
unchanged — they were never operator-visible.

* Add a new link icon and use it for the new attribute referencer integration.

* Lift get_metadata onto CapabilityGateway base

``IntegrationConnector``, ``IntegrationImporter``, and
``IntegrationAttributeReferencer`` all returned the integration's
``IntegrationMetaData`` constant but disagreed on the method name —
the connector used ``get_integration_metadata`` while the other two
used ``get_metadata``. Same fact, three implementations, two names.

Move the contract onto ``CapabilityGateway`` as a single abstract
``get_metadata()``, drop the duplicate declarations from the
importer / referencer bases, and rename the connector's variant.

- IntegrationConnector: ``get_integration_metadata`` →
  ``get_metadata`` (with the three internal call sites updated).
- IntegrationImporter / IntegrationAttributeReferencer: declaration
  removed; inherited from CapabilityGateway.
- HASS / ZoneMinder / Frigate / HomeBox connectors: method renamed
  to ``get_metadata``.
- test_integration_connector.py / test_sync_check.py: fixture
  stubs renamed to match.

The ``_get_integration_metadata`` private helper in
``integration_tags.py`` is unrelated (it takes an
``IntegrationDetailsModel``, not a gateway) and untouched.

* Some integration modal cleanup

* File name normalization for new integrations referencer.

* Refresh integration-guidelines.md for current capability model

The doc had drifted in several places:

- Said "two capabilities exist today." Three do — ATTRIBUTE_REFERENCE
  is the third (paperless-ngx). Added the entry plus a note that
  ATTRIBUTE_REFERENCE-only integrations land on the sibling "Content
  Sources" tab rather than "Connectors."
- Said the per-capability classes "don't share a base class —
  commonality is composed through shared helpers." They do now:
  ``CapabilityGateway`` (``hi/integrations/capability_gateway.py``)
  is the shared base, carrying ``capability``, ``get_metadata``,
  ``get_description``, and ``get_attribute_actions_template_name``.
- Setup steps 3-5 referenced an obsolete ``IntegrationType`` enum,
  an ``activate / deactivate / manage`` method surface, and an
  ``integration_factory.py`` that doesn't exist. Rewrote those steps
  to describe declaring ``IntegrationMetaData`` with the right
  capability set, implementing the actual gateway methods
  (``get_metadata``, ``get_<capability>``, ``validate_configuration``,
  ``validate_access``, ``notify_settings_changed``), and the
  auto-discovery flow that replaces factory registration.
- Gateway Implementation Patterns described the old
  activate/deactivate/manage contract and dict return shape — neither
  still exists. Replaced with a short summary of the current method
  contract.
- Error Handling listed exceptions that don't exist
  (``ConnectionError``, ``AuthenticationError``,
  ``DataValidationError``) and missed the ones that do
  (``IntegrationDisabledError``, ``IntegrationAttributeError``,
  ``IntegrationConnectionError``). Corrected.
- Key Base Classes & Modules had wrong dotted paths
  (``hi.integration.*`` → ``hi.integrations.*``,
  ``hi.utils.singleton`` → ``hi.apps.common.singleton``),
  referenced a non-existent ``IntegrationStatus`` enum and the
  obsolete factory module. Replaced with the current set, added
  ``CapabilityGateway`` and ``IntegrationManager``.
- Example Integrations listed three; added Frigate and paperless,
  and noted that paperless has a slimmer file layout because
  ATTRIBUTE_REFERENCE-only integrations have no monitors / sync /
  converter / manager.
- File layout section gained a short paragraph noting which
  role-files apply only to CONNECT/IMPORT-shaped integrations.

* Accept single-label intranet hostnames in attribute URL linkification

Django's URLValidator rejects hostnames without a TLD (e.g.
http://cassandra:4100/...), so URLs pointing at LAN hosts saved as
attribute values were not rendered as clickable links. Replace the
strict validator with a permissive urlparse-based check (http(s)
scheme + non-empty netloc) so intranet single-label hostnames are
linkified just like IPs and FQDNs.
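
A permissive check along these lines might look like the following sketch. The ``is_linkable_url`` name and the leading/trailing whitespace strip are assumptions; the rule itself (http or https scheme plus a non-empty netloc) is the one described above.

```python
from urllib.parse import urlparse


def is_linkable_url(value: str) -> bool:
    """Accept any http(s) URL with a host part, including single-label hosts."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # such as an unterminated IPv6 bracket.
        return False
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)
```

This accepts ``http://cassandra:4100/docs`` as well as IP-based and fully qualified URLs, and still rejects bare hostnames and non-HTTP schemes.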

* Refresh paperless-ngx docs for Content Sources tab and slim dev page

User-facing changes follow the post-#232 UI: paperless is configured
from the Content Sources tab (not Connectors), the action button on
Entity / Location edit pages reads 'Link Content', and the picker's
attach button reads 'Add N Links'. Integrations.md routes paperless
to its own walkthrough rather than folding a non-Connectors flow into
the shared steps.

Dev page refocused on design rationale (capability shape, thumbnail
proxy vs source-URL passthrough, no-manager, snippet strategy,
single-deployment assumption). Dropped module/test enumerations and
wire-format strings — those rot and a developer can read the
directory or pl_models.py.

* Connectors vocabulary sweep in capability-blocked modal and selector

The capability-block modal still emitted 'GO TO INTEGRATIONS' /
'disable the … Integration' / 'configured as Integration with',
contradicting the new three-tab operator vocabulary (Connectors /
Content Sources / Data Import). The empty state in the integrations
selector also still said 'No integrations found' under an 'All
Connectors' title. Aligned both to 'Connector(s)' and updated the
matching importer-view test assertion.
2026-05-27 12:22:47 -05:00