feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see
backup/pii-ner-tier-engine-prerebase).

Net change:
- privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter
  PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan).
  TokenClassify moves off the patched llama.cpp path onto this backend.
- PII filter reworked to be NER-centric (encoder/NER detection tier scanning
  whole conversations as one document), with a recreated bounded
  restricted-regex secret-matching pattern detector tier alongside it
  (per-model pii_detection.builtins / .patterns +
  core/services/routing/piipattern).
- Detection labelled by source (ner vs pattern); backend trace / confidence /
  debug observability; analyze/redact exposed as a synchronous API.
- Instance-wide default detector policy + per-usecase default-on; request
  filtering extended to completions, embeddings, edits & Ollama.
- React UI: NER-centric PII editor, detector-models table, pattern/builtins
  editor, middleware default-policy UI.
- Gallery: privacy-filter-multilingual token-classify model + NER install
  filter; token_classify known_usecase; batch sized to context for NER models.

privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13
meta + image entries with a capabilities map) matching its CI matrix jobs, and
an /import-model auto-detect importer (PrivacyFilterImporter, narrow
privacy-filter GGUF detection) replacing the prior pref-only registration.

Reconciled against master's independent evolution:
- Dropped master's PIIPatternOverrides feature (global-pattern runtime
  overrides + /api/pii/patterns API + runtime_settings.json persistence). The
  per-model NER + pattern-detector design supersedes it; it was built on the
  global redactor pattern set this branch replaced.
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score):
  removed the patch and restored master's grpc-server.cpp Score RPC (direct
  llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's
  model_config validation forbidding score + chat/completion/embeddings on
  llama-cpp. token_classify is unaffected (it runs on the privacy-filter
  backend, not llama-cpp).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-18 21:58:58 -04:00 · 2026-06-18 11:45:22 +01:00
parent c133ca39dc
commit 3fa7b2955c
134 changed files with 6671 additions and 4223 deletions
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -796,6 +796,112 @@
    - filename: llama-cpp/mmproj/Step-3.7-Flash-GGUF/mmproj-F32.gguf
      sha256: 2fab13dcd32e4b3dc4410297df80f4d82627308e725dedac802940ceca7dff13
      uri: https://huggingface.co/unsloth/Step-3.7-Flash-GGUF/resolve/main/mmproj-F32.gguf
+- name: "privacy-filter-multilingual"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
+  urls:
+    - https://huggingface.co/OpenMed/privacy-filter-multilingual
+    - https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF
+  description: |
+    A multilingual PII token-classification model: a fine-tune of
+    openai/privacy-filter by OpenMed. It labels every token with a BIOES tag
+    over 54 PII categories (217 classes) across 16 languages (ar, bn, de, en,
+    es, fr, hi, it, ja, ko, nl, pt, te, tr, vi, zh), spanning identity, contact,
+    address, financial, vehicle, digital, and crypto entities.
+
+    In LocalAI this is a PII detector for the NER redactor tier: set
+    known_usecases to [token_classify] (as below), and any model opts into
+    redaction by listing this one under pii.detectors. The detection policy
+    (which categories to mask vs block, and the score threshold) lives on this
+    model's own pii_detection block - see the overrides below. It runs locally
+    with no Python, served by the standalone privacy-filter backend's
+    TokenClassify RPC (constrained BIOES Viterbi decode into UTF-8 byte-offset
+    entity spans).
+
+    Architecture: gpt-oss-style sparse MoE (8 layers, 128 experts top-4, ~50M
+    active per token), bidirectional banded attention, o200k tokenizer; served
+    via the openai-privacy-filter architecture. F16, ~2.7 GB.
+  license: apache-2.0
+  tags:
+    - token-classification
+    - ner
+    - pii
+    - privacy
+    - multilingual
+    - gguf
+  overrides:
+    backend: privacy-filter
+    embeddings: true
+    known_usecases:
+      - token_classify
+    parameters:
+      model: privacy-filter/models/privacy-filter-multilingual/privacy-filter-multilingual-f16.gguf
+    # Detection policy used when another model references this one via
+    # pii.detectors. Default-mask everything the model flags; block the
+    # credential/financial-secret/crypto categories. Keys are the model's
+    # own entity-group names (uppercase, no separators); anything not
+    # listed falls through to default_action: mask.
+    pii_detection:
+      min_score: 0.5
+      default_action: mask
+      entity_actions:
+        PASSWORD: block
+        PIN: block
+        CVV: block
+        CREDITCARD: block
+        IBAN: block
+        BIC: block
+        BANKACCOUNT: block
+        SSN: block
+        BITCOINADDRESS: block
+        ETHEREUMADDRESS: block
+        LITECOINADDRESS: block
+  files:
+    - filename: privacy-filter/models/privacy-filter-multilingual/privacy-filter-multilingual-f16.gguf
+      sha256: 01b76572f80b7d2ebee80a27cb9c3699c26b04cae1c402eee7664fc17a4b5ce6
+      uri: https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf
+- name: "secret-filter"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  description: |
+    A pattern-based PII detector for high-entropy, highly-regular secrets —
+    API keys, tokens, and private-key blocks — that the NER tier cannot catch
+    (it has no credential class, so it fragments a key and may leave the secret
+    part exposed). Detection is bounded restricted-regex compiled to RE2
+    (linear time, no backtracking); it runs entirely in-process with no model
+    download, no backend, and zero VRAM.
+
+    Install it, then reference it under another model's pii.detectors (or set it
+    as the instance-wide default detector on the Middleware page) to block leaks
+    of known credential formats out of the box. Add your own patterns under
+    pii_detection.patterns in a restricted regex subset (e.g. "tok-\\w{32,}");
+    each must carry a fixed literal anchor of at least 3 characters, so open-
+    ended shapes like email addresses are rejected and left to the NER tier.
+  license: apache-2.0
+  tags:
+    - pii
+    - privacy
+    - secrets
+    - pattern
+  overrides:
+    backend: pattern
+    known_usecases:
+      - token_classify
+    # Matched secrets are blocked by default (a leaked credential should not
+    # reach an upstream provider); downgrade individual groups to mask/allow
+    # via entity_actions if needed. Group names mirror the built-in catalogue.
+    pii_detection:
+      default_action: block
+      builtins:
+        - anthropic_api_key
+        - openai_api_key
+        - github_token
+        - github_pat
+        - aws_access_key
+        - google_api_key
+        - slack_token
+        - stripe_key
+        - jwt
+        - private_key_block
 - name: "lfm2.5-8b-a1b"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
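
Reading note on the pii_detection blocks added in the hunk above: the comments describe a three-field policy (min_score, default_action, entity_actions) in which unlisted entity groups fall through to default_action. The Python sketch below shows one way such a policy could be resolved. It is not taken from the LocalAI source. The names PiiPolicy, resolve_action and overall_action are hypothetical, as are the assumptions that detections scoring below min_score are ignored (treated as "allow"), that the threshold comparison is strict, and that the strictest action across a request wins (block over mask over allow). "EMAIL" is used as an illustrative unlisted group name only.

```python
from dataclasses import dataclass, field

# Strictness ranking used only by this sketch (assumption: block > mask > allow).
_RANK = {"allow": 0, "mask": 1, "block": 2}


@dataclass
class PiiPolicy:
    """Hypothetical mirror of a gallery entry's pii_detection block."""
    min_score: float = 0.0
    default_action: str = "mask"
    entity_actions: dict = field(default_factory=dict)


def resolve_action(policy, group, score):
    """Return the action for one detection of entity group `group`."""
    # Assumption: detections under the threshold are dropped entirely.
    if score < policy.min_score:
        return "allow"
    # Groups not listed in entity_actions fall through to default_action.
    return policy.entity_actions.get(group, policy.default_action)


def overall_action(policy, detections):
    """Return the strictest action over (group, score) pairs."""
    actions = [resolve_action(policy, g, s) for g, s in detections]
    # An empty detection list yields "allow".
    return max(actions, key=_RANK.__getitem__, default="allow")


# Values copied from the privacy-filter-multilingual entry (abbreviated).
policy = PiiPolicy(
    min_score=0.5,
    default_action="mask",
    entity_actions={"PASSWORD": "block", "SSN": "block"},
)
print(resolve_action(policy, "EMAIL", 0.9))     # unlisted group: default_action
print(resolve_action(policy, "PASSWORD", 0.9))  # listed group: block
print(resolve_action(policy, "PASSWORD", 0.3))  # below min_score: ignored
```

Under the same assumptions, the secret-filter entry would correspond to PiiPolicy(default_action="block") with no per-group overrides, so every matched builtin blocks unless entity_actions downgrades it.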