* fix(pii): post-merge review fixes + live NER e2e for the privacy-filter tier
Follow-up to the NER tier engine (#10360), already on master. This carries
only the incremental review fixes and tests that postdate that merge — the
feature itself is not re-introduced.
Review fixes:
- openai_completion.go: remove the dead `elem >= 0` conjunct in applyAnyText
(the `elem < 0` guard above already returns).
- application.go: collapse ResolvePIIPolicy's inline re-implementation of
PIIIsEnabled to a single cfg.PIIIsEnabled() call (sole source of the
"explicit pii.enabled wins, else cloud-proxy default" rule) and return true
past the !enabled guard where it is provable.
- pattern.go: hoist the triple `appConfig != nil && EnableTracing` check in
patternDetector.Detect into one local.
- grammar.go: MaxQuantifier was 4096, but Go's regexp/syntax rejects repeat
bounds above 1000 at Parse time, so walk()'s {n,m} guard could never fire —
dead code shadowed by the parser. Lower it to 512 so a bound in (512,1000]
is rejected here with an actionable error; >1000 still fails closed via
Parse. Specs pin the relationship so the guard can't silently revert.
- PatternListEditor.jsx: clamp a directly-typed negative min_len to >=0 and
force the DOM value back when clamping (min={0} only constrained the spinner,
so a negative reached saved config and silently disabled the length filter).
Tests:
- piipattern_test.go: MaxQuantifier guard specs (must stay live, not dead).
- model-config.spec.js: assert the min_len clamp, and that entity_actions
collapses a duplicate group to a single row (map semantics; regression guard
against emitting an array that drops a row on save).
- tests/e2e-backends: token_classify capability driving the TokenClassify gRPC
RPC against the backend image, asserting byte-correct, UTF-8 rune-aligned
spans (entity.Text == text[start:end]) at threshold 0. Verified on CPU via
`make test-extra-backend-privacy-filter` (3/3 specs).
- Makefile: test-extra-backend-privacy-filter wrapper.
- tests/e2e: e2e_pii_ner_test.go drives /api/pii/analyze + /api/pii/redact
(mask + block) through the full HTTP -> detector -> redactor path; gated on
PII_NER_MODEL_GGUF so the default suite is unaffected.
- .github/workflows/tests-pii-ner-e2e.yml: path-filtered / nightly CI job
running the container harness on CPU.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(gallery): add privacy-filter-nemotron (f16 + q8)
GGUF conversions of OpenMed/privacy-filter-nemotron — a fine-grained English
PII token-classifier (55 categories / 221 BIOES classes), fine-tuned from
openai/privacy-filter on NVIDIA's Nemotron-PII dataset. Sibling to the existing
privacy-filter-multilingual entry, trading language breadth for category depth.
- privacy-filter-nemotron: F16 reference artifact (~2.8 GB).
- privacy-filter-nemotron-q8: Q8_0 quant (~1.64 GB) for RAM-constrained / edge
use; description notes the size/speed tradeoff and to validate on your own
data (a single dropped span is a PII leak).
Both run on the privacy-filter backend with known_usecases [token_classify] and
a default mask policy (min_score 0.5); operators add per-category entity_actions
as needed. sha256s taken from the HF repo's LFS object ids.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see
backup/pii-ner-tier-engine-prerebase). Net change:
- privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter
PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan).
TokenClassify moves off the patched llama.cpp path onto this backend.
- PII filter reworked to be NER-centric (encoder/NER detection tier scanning
whole conversations as one document), with a recreated bounded restricted-
regex secret-matching pattern detector tier alongside it (per-model
pii_detection.builtins / .patterns + core/services/routing/piipattern).
- Detection labelled by source (ner vs pattern); backend trace / confidence /
debug observability; analyze/redact exposed as a synchronous API.
- Instance-wide default detector policy + per-usecase default-on; request
filtering extended to completions, embeddings, edits & Ollama.
- React UI: NER-centric PII editor, detector-models table, pattern/builtins
editor, middleware default-policy UI.
- Gallery: privacy-filter-multilingual token-classify model + NER install
filter; token_classify known_usecase; batch sized to context for NER models.
privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13
meta + image entries with a capabilities map) matching its CI matrix jobs,
and an /import-model auto-detect importer (PrivacyFilterImporter, narrow
privacy-filter GGUF detection) replacing the prior pref-only registration.
Reconciled against master's independent evolution:
- Dropped master's PIIPatternOverrides feature (global-pattern runtime
overrides + /api/pii/patterns API + runtime_settings.json persistence). The
per-model NER + pattern-detector design supersedes it; it was built on the
global redactor pattern set this branch replaced.
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score):
removed the patch and restored master's grpc-server.cpp Score RPC (direct
llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's
model_config validation forbidding score + chat/completion/embeddings on
llama-cpp. token_classify is unaffected (it runs on the privacy-filter
backend, not llama-cpp).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(router): score classifier production-readiness
Conversation trimming runs through the classifier model's chat template
and trims by exact token count, sized to the model's n_batch which is
now scaled to context so long probes can't crash the backend. Missing
chat_message templates are a hard error at router build time. Router-
facing factories (Embedder/Scorer/Reranker/TokenCounter) re-resolve
ModelConfig per call so a model installed post-startup doesn't bind a
stub Backend="" config and silently fall into the loader's auto-
iterate path.
New 'vector_store' backend trace recorded inside localVectorStore on
every Search/Insert — including the backend-load-failure path that
previously vanished into an xlog.Warn — with outcome tagging
(hit/miss/empty_store/backend_load_error/find_error/insert_error/ok).
Companion cleanup drops misleading similarity:0 and input_tokens_count:0
from non-hit and text-mode traces.
Gallery local-store-development aliases to 'local-store' so the master
image satisfies pkg/model.LocalStoreBackend lookups from the embedding
cache.
Misc: llama-cpp TokenizeString reads the correct 'prompt' JSON key
(the original bug); ModelTokenize nil-guard; non-fatal mitm proxy
startup; PII 'route_local' renamed to 'allow' with docs/UI in sync;
model-editor footer no longer eats the edit area on small screens;
several config-editor template/dropdown/section fixes.
Tests: e2e router specs (casual/code-hint + long-conversation trim),
vector_store trace specs, lazy-factory specs, gallery dev-alias
resolution, Playwright trace badge + scroll regression.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(backend): auto-size batch to context for embedding and rerank models
Embedding and rerank models pool over the whole input in a single physical batch (n_ubatch). With batch left at the 512 default, the backend rejects longer inputs with "input is too large to process", silently capping a large-context embedder (e.g. 8k/32k) at 512 tokens. Size n_batch to the context for these single-pass usecases, mirroring the existing FLAG_SCORE behaviour; an explicit batch: still wins.
Extracts EffectiveContextSize/EffectiveBatchSize from grpcModelOpts so the effective decode window has one home for other callers to reuse.
Adds an e2e-aio regression test that embeds a >512-token input. The AIO embedding model is switched to nomic-embed-text-v1.5 (2048 context) because the previous granite model was capped at 512 tokens and could not exercise the larger batch.
Assisted-by: claude-code:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(gallery): raise arch-router scoring output cap via parallel:64
Scoring decodes the whole prompt+candidate in a single llama_decode and
reads one logit row per candidate token. The vendored llama.cpp server
caps causal output rows at n_parallel, so the default of 1 aborts with
GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) on multi-token route
labels. Set options: [parallel:64] on both arch-router quant entries to
lift the cap; kv_unified (the grpc-server default) keeps the full context
per sequence, so this does not split the KV cache.
Assisted-by: claude-code:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
- Strict monotonic Go coverage gate (make test-coverage-check, 45% baseline)
run in CI; fixes ginkgo dropping all-but-one coverprofile across multiple
recursive roots, builds with -tags auth, and folds in the in-process
tests/e2e suite via --coverpkg.
- React UI e2e coverage (make test-ui-coverage: vite-plugin-istanbul + nyc,
nix-provided Chromium) plus e2e specs for 6 previously-untested pages, and a
UI coverage gate (make test-ui-coverage-check) with a small tolerance since
e2e line coverage jitters ~0.5pp run-to-run.
- pre-commit hook: lint + coverage on Go changes, Playwright e2e + UI coverage
gate on react-ui changes; install with make install-hooks.
- New Go handler tests (settings, branding), hermetic base64 download test.
- fix(ui): model editor reads vram_display (snake_case), so the VRAM estimate
renders again; covered by a regression test.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(ui): Add dynamic model editor with autocomplete
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(docs): Add link to longformat installation video
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>