feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see
backup/pii-ner-tier-engine-prerebase). Net change:

- privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter
  PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan).
  TokenClassify moves off the patched llama.cpp path onto this backend.
- PII filter reworked to be NER-centric: an encoder/NER detection tier scans
  each whole conversation as one document. Alongside it, a recreated pattern
  detector tier matches secrets with bounded, restricted regexes (per-model
  pii_detection.builtins / .patterns + core/services/routing/piipattern).
- Detection labelled by source (ner vs pattern); backend trace / confidence /
  debug observability; analyze/redact exposed as a synchronous API.
- Instance-wide default detector policy + per-usecase default-on; request
  filtering extended to completions, embeddings, edits & Ollama.
- React UI: NER-centric PII editor, detector-models table, pattern/builtins
  editor, middleware default-policy UI.
- Gallery: privacy-filter-multilingual token-classify model + NER install
  filter; token_classify known_usecase; batch sized to context for NER models.
  privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13
  meta + image entries with a capabilities map) matching its CI matrix jobs,
  and an /import-model auto-detect importer (PrivacyFilterImporter, narrow
  privacy-filter GGUF detection) replacing the prior pref-only registration.

Reconciled against master's independent evolution:

- Dropped master's PIIPatternOverrides feature (global-pattern runtime
  overrides + /api/pii/patterns API + runtime_settings.json persistence). The
  per-model NER + pattern-detector design supersedes it; it was built on the
  global redactor pattern set this branch replaced.
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score):
  removed the patch and restored master's grpc-server.cpp Score RPC (direct
  llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's
  model_config validation forbidding score + chat/completion/embeddings on
  llama-cpp. token_classify is unaffected (it runs on the privacy-filter
  backend, not llama-cpp).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Richard Palethorpe
2026-06-18 11:45:22 +01:00
committed by GitHub
parent c133ca39dc
commit 3fa7b2955c
134 changed files with 6671 additions and 4223 deletions
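
A minimal sketch of how the per-model pii_detection policy in the gallery diff could be resolved into an action for a single detection. It is illustrative only and is not taken from the LocalAI sources: the class name, the method name, the EMAIL group, and the rule that a detection scoring below min_score is simply allowed are all assumptions. Only the keys (min_score, default_action, entity_actions) and the mask/block/allow actions come from the commit.

```python
from dataclasses import dataclass, field


@dataclass
class PiiDetectionPolicy:
    """Hypothetical mirror of a gallery entry's pii_detection block."""

    default_action: str = "mask"
    min_score: float = 0.0
    entity_actions: dict = field(default_factory=dict)

    def resolve(self, group: str, score: float) -> str:
        # Assumption: detections scoring below min_score are ignored.
        if score < self.min_score:
            return "allow"
        # Groups without an explicit entry fall through to default_action.
        return self.entity_actions.get(group, self.default_action)


# Values copied from the privacy-filter-multilingual entry (subset of groups).
ner_policy = PiiDetectionPolicy(
    default_action="mask",
    min_score=0.5,
    entity_actions={"PASSWORD": "block", "IBAN": "block"},
)

print(ner_policy.resolve("IBAN", 0.92))   # listed group      -> block
print(ner_policy.resolve("EMAIL", 0.80))  # unlisted group    -> mask
print(ner_policy.resolve("EMAIL", 0.30))  # below threshold   -> allow
```

Under the same assumptions, the secret-filter entry would correspond to PiiDetectionPolicy(default_action="block") with no entity_actions, so every matched built-in pattern resolves to block.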

@@ -796,6 +796,112 @@
    - filename: llama-cpp/mmproj/Step-3.7-Flash-GGUF/mmproj-F32.gguf
      sha256: 2fab13dcd32e4b3dc4410297df80f4d82627308e725dedac802940ceca7dff13
      uri: https://huggingface.co/unsloth/Step-3.7-Flash-GGUF/resolve/main/mmproj-F32.gguf
- name: "privacy-filter-multilingual"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
  urls:
    - https://huggingface.co/OpenMed/privacy-filter-multilingual
    - https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF
  description: |
    A multilingual PII token-classification model: a fine-tune of
    openai/privacy-filter by OpenMed. It labels every token with a BIOES tag
    over 54 PII categories (217 classes) across 16 languages (ar, bn, de, en,
    es, fr, hi, it, ja, ko, nl, pt, te, tr, vi, zh), spanning identity, contact,
    address, financial, vehicle, digital, and crypto entities.
    In LocalAI this is a PII detector for the NER redactor tier: set
    known_usecases to [token_classify] (as below), and any model opts into
    redaction by listing this one under pii.detectors. The detection policy
    (which categories to mask vs block, and the score threshold) lives on this
    model's own pii_detection block - see the overrides below. It runs locally
    with no Python, served by the standalone privacy-filter backend's
    TokenClassify RPC (constrained BIOES Viterbi decode into UTF-8 byte-offset
    entity spans).
    Architecture: gpt-oss-style sparse MoE (8 layers, 128 experts top-4, ~50M
    active per token), bidirectional banded attention, o200k tokenizer; served
    via the openai-privacy-filter architecture. F16, ~2.7 GB.
  license: apache-2.0
  tags:
    - token-classification
    - ner
    - pii
    - privacy
    - multilingual
    - gguf
  overrides:
    backend: privacy-filter
    embeddings: true
    known_usecases:
      - token_classify
    parameters:
      model: privacy-filter/models/privacy-filter-multilingual/privacy-filter-multilingual-f16.gguf
    # Detection policy used when another model references this one via
    # pii.detectors. Default-mask everything the model flags; block the
    # credential/financial-secret/crypto categories. Keys are the model's
    # own entity-group names (uppercase, no separators); anything not
    # listed falls through to default_action: mask.
    pii_detection:
      min_score: 0.5
      default_action: mask
      entity_actions:
        PASSWORD: block
        PIN: block
        CVV: block
        CREDITCARD: block
        IBAN: block
        BIC: block
        BANKACCOUNT: block
        SSN: block
        BITCOINADDRESS: block
        ETHEREUMADDRESS: block
        LITECOINADDRESS: block
  files:
    - filename: privacy-filter/models/privacy-filter-multilingual/privacy-filter-multilingual-f16.gguf
      sha256: 01b76572f80b7d2ebee80a27cb9c3699c26b04cae1c402eee7664fc17a4b5ce6
      uri: https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf
- name: "secret-filter"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  description: |
    A pattern-based PII detector for high-entropy, highly-regular secrets —
    API keys, tokens, and private-key blocks — that the NER tier cannot catch
    (it has no credential class, so it fragments a key and may leave the secret
    part exposed). Detection is bounded restricted-regex compiled to RE2
    (linear time, no backtracking); it runs entirely in-process with no model
    download, no backend, and zero VRAM.
    Install it, then reference it under another model's pii.detectors (or set it
    as the instance-wide default detector on the Middleware page) to block leaks
    of known credential formats out of the box. Add your own patterns under
    pii_detection.patterns in a restricted regex subset (e.g. "tok-\\w{32,}");
    each must carry a fixed literal anchor of at least 3 characters, so open-
    ended shapes like email addresses are rejected and left to the NER tier.
  license: apache-2.0
  tags:
    - pii
    - privacy
    - secrets
    - pattern
  overrides:
    backend: pattern
    known_usecases:
      - token_classify
    # Matched secrets are blocked by default (a leaked credential should not
    # reach an upstream provider); downgrade individual groups to mask/allow
    # via entity_actions if needed. Group names mirror the built-in catalogue.
    pii_detection:
      default_action: block
      builtins:
        - anthropic_api_key
        - openai_api_key
        - github_token
        - github_pat
        - aws_access_key
        - google_api_key
        - slack_token
        - stripe_key
        - jwt
        - private_key_block
- name: "lfm2.5-8b-a1b"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls: