mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-19 06:09:07 -04:00
feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)
Squashed feat/pii-ner-tier-engine rebased onto master (was 45 commits; see backup/pii-ner-tier-engine-prerebase).

Net change:
- privacy-filter.cpp: standalone GGML engine for the openai-privacy-filter PII/NER token classifier, wired as a LocalAI gRPC backend (CPU/CUDA/Vulkan). TokenClassify moves off the patched llama.cpp path onto this backend.
- PII filter reworked to be NER-centric (encoder/NER detection tier scanning whole conversations as one document), with a recreated bounded restricted-regex secret-matching pattern detector tier alongside it (per-model pii_detection.builtins / .patterns + core/services/routing/piipattern).
- Detection labelled by source (ner vs pattern); backend trace / confidence / debug observability; analyze/redact exposed as a synchronous API.
- Instance-wide default detector policy + per-usecase default-on; request filtering extended to completions, embeddings, edits & Ollama.
- React UI: NER-centric PII editor, detector-models table, pattern/builtins editor, middleware default-policy UI.
- Gallery: privacy-filter-multilingual token-classify model + NER install filter; token_classify known_usecase; batch sized to context for NER models.

privacy-filter backend registered in the backend gallery (cpu/vulkan/cuda-13 meta + image entries with a capabilities map) matching its CI matrix jobs, and an /import-model auto-detect importer (PrivacyFilterImporter, narrow privacy-filter GGUF detection) replacing the prior pref-only registration.

Reconciled against master's independent evolution:
- Dropped master's PIIPatternOverrides feature (global-pattern runtime overrides + /api/pii/patterns API + runtime_settings.json persistence). The per-model NER + pattern-detector design supersedes it; it was built on the global redactor pattern set this branch replaced.
- Reverted the llama.cpp Score carry-patch (0006-server-task-type-score): removed the patch and restored master's grpc-server.cpp Score RPC (direct llama_decode, slot-loop bypass) and LLAMA_VERSION pin, plus master's model_config validation forbidding score + chat/completion/embeddings on llama-cpp. token_classify is unaffected (it runs on the privacy-filter backend, not llama-cpp).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
committed by GitHub
parent c133ca39dc
commit 3fa7b2955c
109
backend/Dockerfile.privacy-filter
Normal file
@@ -0,0 +1,109 @@
ARG BASE_IMAGE=ubuntu:24.04
# BUILDER_BASE_IMAGE defaults to BASE_IMAGE so the Dockerfile parses when no
# prebuilt base is supplied; the builder-prebuilt stage is only entered when
# BUILDER_TARGET=builder-prebuilt, so the fallback content is harmless
# (BuildKit prunes the unreferenced builder).
ARG BUILDER_BASE_IMAGE=${BASE_IMAGE}
# BUILDER_TARGET selects which builder stage the scratch image copies from.
# Declared before any FROM so it is usable in `FROM ${BUILDER_TARGET}`. The
# backend_build workflow sets it to builder-prebuilt when the matrix entry
# provides builder-base-image, else builder-fromsource (the local default).
ARG BUILDER_TARGET=builder-fromsource
ARG APT_MIRROR=""
ARG APT_PORTS_MIRROR=""

# privacy-filter: standalone GGML engine for the openai-privacy-filter PII/NER
# token classifier, wrapped as a LocalAI gRPC backend.
#
# Mirrors backend/Dockerfile.llama-cpp: the build toolchain (gRPC + cmake +
# protoc + conditional CUDA/Vulkan) comes from the shared
# .docker/install-base-deps.sh (from-source path) or a prebuilt
# quay.io/go-skynet/ci-cache:base-grpc-* image (CI path) — nothing GPU-specific
# is hand-rolled here. BUILD_TYPE selects the engine backend in the Makefile:
# "" = cpu, "cublas" -> -DPF_CUDA=ON, "vulkan" -> -DPF_VULKAN=ON.

# ============================================================================
# Stage: builder-fromsource — self-contained build. Runs the same install
# script backend/Dockerfile.base-grpc-builder runs, so this path is
# bit-equivalent to the prebuilt base. Used when BUILDER_TARGET=builder-fromsource
# (the default; local `make backends/privacy-filter`).
# ============================================================================
FROM ${BASE_IMAGE} AS builder-fromsource
ARG BUILD_TYPE
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG CMAKE_FROM_SOURCE=false
# CUDA Toolkit 13.x needs CMake 3.31.9+ for correct toolchain/arch detection.
ARG CMAKE_VERSION=3.31.10
ARG GRPC_VERSION=v1.65.0
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
ARG SKIP_DRIVERS=false
ARG TARGETARCH
ARG UBUNTU_VERSION=2404
ARG APT_MIRROR
ARG APT_PORTS_MIRROR

ENV BUILD_TYPE=${BUILD_TYPE} \
    CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} \
    CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} \
    CMAKE_FROM_SOURCE=${CMAKE_FROM_SOURCE} \
    CMAKE_VERSION=${CMAKE_VERSION} \
    GRPC_VERSION=${GRPC_VERSION} \
    GRPC_MAKEFLAGS=${GRPC_MAKEFLAGS} \
    SKIP_DRIVERS=${SKIP_DRIVERS} \
    TARGETARCH=${TARGETARCH} \
    UBUNTU_VERSION=${UBUNTU_VERSION} \
    APT_MIRROR=${APT_MIRROR} \
    APT_PORTS_MIRROR=${APT_PORTS_MIRROR} \
    DEBIAN_FRONTEND=noninteractive
# CUDA on PATH (a no-op when CUDA is not installed, e.g. cpu/vulkan builds).
ENV PATH=/usr/local/cuda/bin:${PATH}

WORKDIR /build

# apt deps + cmake + protoc + gRPC + conditional CUDA/Vulkan, all from the
# shared script (the source of truth that base-grpc-builder also runs).
RUN --mount=type=bind,source=.docker/install-base-deps.sh,target=/usr/local/sbin/install-base-deps \
    --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
    bash /usr/local/sbin/install-base-deps

# install-base-deps installs gRPC under /opt/grpc; copy it to /usr/local so the
# backend's find_package(gRPC CONFIG) resolves it at the canonical prefix.
RUN cp -a /opt/grpc/. /usr/local/

COPY . /LocalAI

RUN --mount=type=cache,target=/root/.ccache,id=privacy-filter-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
    make -C /LocalAI/backend/cpp/privacy-filter BUILD_TYPE=${BUILD_TYPE} NATIVE=false grpc-server package

# ============================================================================
# Stage: builder-prebuilt — FROM a prebuilt
# quay.io/go-skynet/ci-cache:base-grpc-* image (gRPC at /opt/grpc + apt deps +
# CUDA/Vulkan already installed). Used in CI when the matrix entry sets
# builder-base-image.
# ============================================================================
FROM ${BUILDER_BASE_IMAGE} AS builder-prebuilt
ARG BUILD_TYPE
ARG TARGETARCH
ENV BUILD_TYPE=${BUILD_TYPE}
# CUDA on PATH (a no-op for the cpu/vulkan base images).
ENV PATH=/usr/local/cuda/bin:${PATH}

# Mirror builder-fromsource: the base-grpc image installs gRPC to /opt/grpc but
# does not copy it to /usr/local.
RUN cp -a /opt/grpc/. /usr/local/

COPY . /LocalAI

RUN --mount=type=cache,target=/root/.ccache,id=privacy-filter-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
    make -C /LocalAI/backend/cpp/privacy-filter BUILD_TYPE=${BUILD_TYPE} NATIVE=false grpc-server package

# ============================================================================
# Final stage — copy the package output from the selected builder. BuildKit
# does not expand variables in `COPY --from=`, so alias the chosen builder to a
# fixed stage name first.
# ============================================================================
FROM ${BUILDER_TARGET} AS builder

FROM scratch
COPY --from=builder /LocalAI/backend/cpp/privacy-filter/package/. ./
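The builder selection described in the Dockerfile comments can be sketched as the argument list a caller might pass to `docker build`. This is only an illustration under stated assumptions: the helper name `docker_build_argv` and the example image reference are hypothetical, and the real CI wiring lives in the backend_build workflow, which is not shown here. The build-arg names (`BUILD_TYPE`, `BUILDER_TARGET`, `BUILDER_BASE_IMAGE`) are taken from the Dockerfile above.

```python
from typing import List, Optional


def docker_build_argv(build_type: str = "",
                      builder_base_image: Optional[str] = None,
                      context: str = ".") -> List[str]:
    """Assemble a `docker build` command line for the privacy-filter image.

    Hypothetical helper: it only mirrors the rule stated in the Dockerfile
    comments (prebuilt builder when a base image is supplied, otherwise the
    from-source builder).
    """
    argv = [
        "docker", "build",
        "-f", "backend/Dockerfile.privacy-filter",
        "--build-arg", f"BUILD_TYPE={build_type}",
    ]
    if builder_base_image:
        # CI path: reuse a prebuilt base image that already contains gRPC.
        argv += [
            "--build-arg", "BUILDER_TARGET=builder-prebuilt",
            "--build-arg", f"BUILDER_BASE_IMAGE={builder_base_image}",
        ]
    else:
        # Local default: build the toolchain from source inside the image.
        argv += ["--build-arg", "BUILDER_TARGET=builder-fromsource"]
    argv.append(context)
    return argv


if __name__ == "__main__":
    print(" ".join(docker_build_argv("vulkan")))
    # The image reference below is a placeholder, not a real tag.
    print(" ".join(docker_build_argv("cublas", "registry.example/base-grpc:demo")))
```

Keeping both builder stages behind one `BUILDER_TARGET` argument means the final `FROM scratch` stage never needs to know which path produced the package directory.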
9
backend/cpp/privacy-filter/.gitignore
vendored
Normal file
@@ -0,0 +1,9 @@
/privacy-filter.cpp
build/
package/
grpc-server
*.o
backend.pb.cc
backend.pb.h
backend.grpc.pb.cc
backend.grpc.pb.h
69
backend/cpp/privacy-filter/CMakeLists.txt
Normal file
@@ -0,0 +1,69 @@
cmake_minimum_required(VERSION 3.21)
project(privacy-filter-grpc-server LANGUAGES CXX C)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(TARGET grpc-server)

# Path to the privacy-filter.cpp engine sources. The Makefile arranges for this
# to exist (clone of a pinned commit, or a symlink to PRIVACY_FILTER_SRC).
set(PRIVACY_FILTER_DIR "${CMAKE_CURRENT_SOURCE_DIR}/privacy-filter.cpp"
    CACHE PATH "Path to the privacy-filter.cpp engine source tree")

find_package(Threads REQUIRED)
find_package(Protobuf CONFIG QUIET)
if(NOT Protobuf_FOUND)
    find_package(Protobuf REQUIRED)
endif()
find_package(gRPC CONFIG QUIET)
if(NOT gRPC_FOUND)
    # Ubuntu's apt-installed grpc++ does not ship a CMake config - fall back.
    find_library(GRPCPP_LIB grpc++ REQUIRED)
    find_library(GRPCPP_REFLECTION_LIB grpc++_reflection REQUIRED)
    add_library(gRPC::grpc++ INTERFACE IMPORTED)
    set_target_properties(gRPC::grpc++ PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_LIB}")
    add_library(gRPC::grpc++_reflection INTERFACE IMPORTED)
    set_target_properties(gRPC::grpc++_reflection PROPERTIES INTERFACE_LINK_LIBRARIES "${GRPCPP_REFLECTION_LIB}")
endif()

find_program(_PROTOC NAMES protoc REQUIRED)
find_program(_GRPC_CPP_PLUGIN NAMES grpc_cpp_plugin REQUIRED)

get_filename_component(HW_PROTO "${CMAKE_CURRENT_SOURCE_DIR}/../../backend.proto" ABSOLUTE)
get_filename_component(HW_PROTO_PATH "${HW_PROTO}" PATH)

set(HW_PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.cc")
set(HW_PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.pb.h")
set(HW_GRPC_SRCS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.cc")
set(HW_GRPC_HDRS "${CMAKE_CURRENT_BINARY_DIR}/backend.grpc.pb.h")

add_custom_command(
    OUTPUT "${HW_PROTO_SRCS}" "${HW_PROTO_HDRS}" "${HW_GRPC_SRCS}" "${HW_GRPC_HDRS}"
    COMMAND ${_PROTOC}
    ARGS --grpc_out "${CMAKE_CURRENT_BINARY_DIR}"
         --cpp_out "${CMAKE_CURRENT_BINARY_DIR}"
         -I "${HW_PROTO_PATH}"
         --plugin=protoc-gen-grpc="${_GRPC_CPP_PLUGIN}"
         "${HW_PROTO}"
    DEPENDS "${HW_PROTO}")

add_library(hw_grpc_proto STATIC
    ${HW_GRPC_SRCS} ${HW_GRPC_HDRS}
    ${HW_PROTO_SRCS} ${HW_PROTO_HDRS})
target_include_directories(hw_grpc_proto PUBLIC ${CMAKE_CURRENT_BINARY_DIR})

# Build only the pf static lib (+ ggml) from the engine tree — no CLI/bench/tests.
# PF_VULKAN is honored when passed on the cmake command line (it lands in the
# shared cache the engine reads).
set(PF_BUILD_TOOLS OFF CACHE BOOL "" FORCE)
set(PF_BUILD_TESTS OFF CACHE BOOL "" FORCE)
add_subdirectory(${PRIVACY_FILTER_DIR} ${CMAKE_CURRENT_BINARY_DIR}/privacy-filter.cpp)

add_executable(${TARGET} grpc-server.cpp)
target_link_libraries(${TARGET} PRIVATE
    pf
    hw_grpc_proto
    gRPC::grpc++
    gRPC::grpc++_reflection
    protobuf::libprotobuf
    Threads::Threads)
77
backend/cpp/privacy-filter/Makefile
Normal file
@@ -0,0 +1,77 @@
# privacy-filter backend Makefile.
#
# Wraps the standalone privacy-filter.cpp GGML engine (the openai-privacy-filter
# PII/NER token classifier) as a LocalAI gRPC backend. The engine source is
# fetched at the pin below — .github/workflows/bump_deps.yaml finds and updates
# PRIVACY_FILTER_VERSION, matching the llama-cpp / ds4 convention.
#
# Local development: point at a working checkout instead of cloning, e.g.
# make PRIVACY_FILTER_SRC=$HOME/c/privacy-filter.cpp grpc-server

PRIVACY_FILTER_VERSION?=646342f7a59c6b7d195185eac60bad762e572f1d
PRIVACY_FILTER_REPO?=https://github.com/localai-org/privacy-filter.cpp
PRIVACY_FILTER_SRC?=

CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
BUILD_DIR := build

BUILD_TYPE ?=
NATIVE ?= false
JOBS ?= $(shell nproc 2>/dev/null || echo 4)

CMAKE_ARGS ?= -DCMAKE_BUILD_TYPE=Release

# GPU backends; the default (cpu) needs no extra flags. 'cublas' is LocalAI's
# name for the CUDA build (matches llama-cpp / ds4), mapping to the engine's
# GGML_CUDA path; 'vulkan' selects the ggml Vulkan backend.
ifeq ($(BUILD_TYPE),cublas)
    CMAKE_ARGS += -DPF_CUDA=ON
endif
ifeq ($(BUILD_TYPE),vulkan)
    CMAKE_ARGS += -DPF_VULKAN=ON
endif

# Portable binaries for distribution: disable -march=native unless asked.
ifneq ($(NATIVE),true)
    CMAKE_ARGS += -DGGML_NATIVE=OFF
endif

.PHONY: grpc-server package clean purge test all
all: grpc-server

# Provide the engine sources at ./privacy-filter.cpp. With PRIVACY_FILTER_SRC
# set we symlink a local checkout (instant, no network); otherwise we clone the
# pinned commit and its ggml submodule. The directory/symlink is the target, so
# make only does this once — run 'make purge && make' to refetch after a bump.
privacy-filter.cpp:
ifneq ($(PRIVACY_FILTER_SRC),)
	ln -sfn $(abspath $(PRIVACY_FILTER_SRC)) privacy-filter.cpp
else
	mkdir -p privacy-filter.cpp
	cd privacy-filter.cpp && \
		git init -q && \
		git remote add origin $(PRIVACY_FILTER_REPO) && \
		git fetch --depth 1 origin $(PRIVACY_FILTER_VERSION) && \
		git checkout FETCH_HEAD && \
		git submodule update --init --recursive --depth 1
endif

grpc-server: privacy-filter.cpp
	@echo "Building privacy-filter grpc-server ($(BUILD_TYPE)) with $(CMAKE_ARGS)"
	mkdir -p $(BUILD_DIR)
	cd $(BUILD_DIR) && cmake $(CMAKE_ARGS) $(CURRENT_MAKEFILE_DIR) && cmake --build . --config Release -j $(JOBS)
	cp $(BUILD_DIR)/grpc-server grpc-server

package: grpc-server
	bash package.sh

test:
	@echo "privacy-filter backend: parity/regression coverage lives in the engine repo"

clean:
	rm -rf $(BUILD_DIR) grpc-server package

# 'privacy-filter.cpp' may be a symlink (PRIVACY_FILTER_SRC) — rm without a
# trailing slash removes the link, never the linked-to checkout.
purge: clean
	rm -rf privacy-filter.cpp
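The flag selection in this Makefile is small enough to restate as a function. The sketch below is a Python rendering of the `ifeq`/`ifneq` blocks above, written only to make the mapping explicit; the function name `cmake_args` is hypothetical, and the flags themselves are copied from the Makefile.

```python
from typing import List


def cmake_args(build_type: str = "", native: bool = False) -> List[str]:
    """Return the CMake flags the Makefile would accumulate in CMAKE_ARGS."""
    args = ["-DCMAKE_BUILD_TYPE=Release"]
    if build_type == "cublas":
        # 'cublas' is LocalAI's name for the CUDA build.
        args.append("-DPF_CUDA=ON")
    if build_type == "vulkan":
        args.append("-DPF_VULKAN=ON")
    if not native:
        # Portable binaries: keep -march=native off unless explicitly requested.
        args.append("-DGGML_NATIVE=OFF")
    return args


if __name__ == "__main__":
    for bt in ("", "cublas", "vulkan"):
        print(repr(bt), cmake_args(bt))
```

An unrecognised `BUILD_TYPE` value falls through to the CPU flags in this sketch, which matches the Makefile: neither `ifeq` branch fires, so no GPU option is added.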
210
backend/cpp/privacy-filter/grpc-server.cpp
Normal file
@@ -0,0 +1,210 @@
// privacy-filter LocalAI gRPC backend.
//
// Thin shim over privacy-filter.cpp's flat C API (include/pf.h): a standalone
// GGML engine for the openai-privacy-filter token-classification model family
// (PII NER). It replaces the llama.cpp-patched TokenClassify path for this one
// model family — same GGUF files, no llama.cpp carry-patches.
//
// Only the RPCs the PII tier needs are implemented: LoadModel, TokenClassify,
// plus Health / Status / Free. Everything else inherits the generated base
// class default (UNIMPLEMENTED).

#include "backend.pb.h"
#include "backend.grpc.pb.h"

#include "pf.h"

#include <grpcpp/grpcpp.h>
#include <grpcpp/server.h>
#include <grpcpp/server_builder.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>

#include <atomic>
#include <chrono>
#include <csignal>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>

using grpc::Server;
using grpc::ServerBuilder;
using grpc::ServerContext;
// NOTE: do NOT alias grpc::Status as Status — the Status RPC method below would
// shadow the type and break the other method signatures. Use GStatus instead.
using GStatus = ::grpc::Status;
using grpc::StatusCode;

namespace {

// The engine is single-model-per-process: LocalAI spawns one backend process
// per loaded model. g_mu guards (re)load against in-flight classification.
std::mutex g_mu;
pf_ctx * g_ctx = nullptr;
std::atomic<Server *> g_server{nullptr};

// Resolve the device string the engine expects ("cpu" / "gpu" / "cuda" /
// "vulkan", optionally ":N"). Priority: an explicit "device:..." in
// ModelOptions.Options, then a non-zero NGPULayers as a coarse "use the GPU"
// signal, else CPU. "gpu" lets the engine pick whichever GPU backend this
// binary was compiled with (CUDA or Vulkan), so the same config works on
// either build; pin "device:cuda"/"device:vulkan" to be explicit.
std::string resolve_device(const backend::ModelOptions * opts) {
    for (const auto & o : opts->options()) {
        const std::string prefix = "device:";
        if (o.rfind(prefix, 0) == 0) {
            return o.substr(prefix.size());
        }
    }
    if (opts->ngpulayers() > 0) {
        return "gpu";
    }
    return "cpu";
}

class PrivacyFilterBackend final : public backend::Backend::Service {
public:
    GStatus Health(ServerContext *, const backend::HealthMessage *,
                   backend::Reply * reply) override {
        reply->set_message("OK");
        return GStatus::OK;
    }

    GStatus Status(ServerContext *, const backend::HealthMessage *,
                   backend::StatusResponse * response) override {
        std::lock_guard<std::mutex> lock(g_mu);
        response->set_state(g_ctx ? backend::StatusResponse::READY
                                  : backend::StatusResponse::UNINITIALIZED);
        return GStatus::OK;
    }

    GStatus LoadModel(ServerContext *, const backend::ModelOptions * request,
                      backend::Result * result) override {
        std::lock_guard<std::mutex> lock(g_mu);

        // ModelFile is the absolute path LocalAI resolves; Model is the bare
        // name. Prefer the former, fall back to the latter.
        const std::string path =
            !request->modelfile().empty() ? request->modelfile() : request->model();
        if (path.empty()) {
            result->set_success(false);
            result->set_message("no model path supplied");
            return GStatus::OK;
        }

        const std::string device = resolve_device(request);

        if (g_ctx) { pf_free(g_ctx); g_ctx = nullptr; }

        pf_ctx * ctx = pf_load(path.c_str(), device.c_str(), request->threads());
        const char * err = pf_last_error(ctx);
        if (err) {
            result->set_success(false);
            result->set_message(std::string("privacy-filter load failed: ") + err);
            pf_free(ctx);
            return GStatus::OK;
        }

        // ContextSize, when set, becomes the per-forward window. The engine
        // ignores values that are too small to window (<= 2*halo) and just
        // runs a single forward, so passing it through is always safe.
        if (request->contextsize() > 0) {
            pf_set_window(ctx, request->contextsize());
        }

        g_ctx = ctx;
        result->set_success(true);
        result->set_message("privacy-filter loaded (" + device + ")");
        return GStatus::OK;
    }

    GStatus TokenClassify(ServerContext *, const backend::TokenClassifyRequest * request,
                          backend::TokenClassifyResponse * response) override {
        std::lock_guard<std::mutex> lock(g_mu);
        if (!g_ctx) {
            return GStatus(StatusCode::FAILED_PRECONDITION, "Model not loaded");
        }

        const std::string & text = request->text();
        if (text.empty()) {
            return GStatus::OK; // no text -> no entities
        }

        pf_entity * ents = nullptr;
        size_t n = 0;
        if (pf_classify(g_ctx, text.data(), text.size(), request->threshold(), &ents, &n) != 0) {
            const char * err = pf_last_error(g_ctx);
            return GStatus(StatusCode::INTERNAL,
                           std::string("TokenClassify failed: ") + (err ? err : "unknown"));
        }

        // Byte offsets are into the original UTF-8 text; the engine already
        // applied the threshold and whitespace-trimmed span edges.
        for (size_t i = 0; i < n; i++) {
            backend::TokenClassifyEntity * ent = response->add_entities();
            ent->set_entity_group(ents[i].label ? ents[i].label : "");
            ent->set_start(ents[i].start);
            ent->set_end(ents[i].end);
            ent->set_score(ents[i].score);
            ent->set_text(text.substr((size_t) ents[i].start,
                                      (size_t) (ents[i].end - ents[i].start)));
        }
        pf_entities_free(ents, n);
        return GStatus::OK;
    }

    GStatus Free(ServerContext *, const backend::HealthMessage *,
                 backend::Result * result) override {
        std::lock_guard<std::mutex> lock(g_mu);
        if (g_ctx) { pf_free(g_ctx); g_ctx = nullptr; }
        result->set_success(true);
        return GStatus::OK;
    }
};

void RunServer(const std::string & addr) {
    PrivacyFilterBackend service;
    grpc::EnableDefaultHealthCheckService(true);
    grpc::reflection::InitProtoReflectionServerBuilderPlugin();

    ServerBuilder builder;
    builder.AddListeningPort(addr, grpc::InsecureServerCredentials());
    builder.RegisterService(&service);
    builder.SetMaxReceiveMessageSize(64 * 1024 * 1024);
    builder.SetMaxSendMessageSize(64 * 1024 * 1024);

    std::unique_ptr<Server> server(builder.BuildAndStart());
    if (!server) {
        std::cerr << "privacy-filter grpc-server: failed to bind " << addr << "\n";
        std::exit(1);
    }
    g_server = server.get();
    std::cerr << "privacy-filter grpc-server listening on " << addr << "\n";
    server->Wait();
}

void signal_handler(int) {
    if (auto * srv = g_server.load()) {
        srv->Shutdown(std::chrono::system_clock::now() + std::chrono::seconds(3));
    }
}

} // namespace

int main(int argc, char * argv[]) {
    std::string addr = "127.0.0.1:50051";
    for (int i = 1; i < argc; ++i) {
        std::string a = argv[i];
        const std::string addr_flag = "--addr=";
        if (a.rfind(addr_flag, 0) == 0) addr = a.substr(addr_flag.size());
        else if (a == "--addr" && i + 1 < argc) addr = argv[++i];
        else if (a == "--help" || a == "-h") {
            std::cout << "Usage: grpc-server --addr=HOST:PORT\n";
            return 0;
        }
    }
    std::signal(SIGINT, signal_handler);
    std::signal(SIGTERM, signal_handler);
    RunServer(addr);
    return 0;
}
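The device priority implemented by `resolve_device` (explicit `device:` option first, then a non-zero GPU layer count, else CPU) can be restated in a few lines. The following is a Python rendering for illustration only, not part of the backend; the parameter names `options` and `n_gpu_layers` are stand-ins for `ModelOptions.Options` and `NGPULayers`.

```python
from typing import Iterable


def resolve_device(options: Iterable[str], n_gpu_layers: int = 0) -> str:
    """Mirror the C++ helper: first "device:..." option wins, then GPU layers."""
    prefix = "device:"
    for opt in options:
        if opt.startswith(prefix):
            # Everything after the prefix is passed through untouched,
            # including an optional ":N" device index.
            return opt[len(prefix):]
    if n_gpu_layers > 0:
        # Coarse "use the GPU" signal; the engine picks CUDA or Vulkan.
        return "gpu"
    return "cpu"


if __name__ == "__main__":
    print(resolve_device([], 0))
    print(resolve_device([], 99))
    print(resolve_device(["threads:4", "device:vulkan:1"], 0))
```

Because an explicit option is checked before the layer count, a config can force CPU on a GPU build by setting `device:cpu` even when a GPU layer count is present.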
39
backend/cpp/privacy-filter/package.sh
Executable file
@@ -0,0 +1,39 @@
#!/bin/bash
# Assemble package/ for the from-scratch backend image: the grpc-server binary,
# run.sh, the dynamic loader, and every shared library the binary needs.
set -e
CURDIR=$(dirname "$(realpath "$0")")
REPO_ROOT="${CURDIR}/../../.."

mkdir -p "$CURDIR/package/lib"
cp -avf "$CURDIR/grpc-server" "$CURDIR/package/"
cp -rfv "$CURDIR/run.sh" "$CURDIR/package/"

# The dynamic loader, renamed to lib/ld.so so run.sh can invoke it explicitly
# (makes the image independent of the host's glibc layout).
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so"
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so"
else
    echo "package.sh: unknown architecture" >&2; exit 1
fi

# Bundle the binary's transitive shared deps (libstdc++, libgomp, and the apt
# grpc++/protobuf/absl stack) by walking ldd — robust to whichever of those are
# linked shared vs static. The loader line (no "=>") is skipped; ld.so above
# already covers it.
ldd "$CURDIR/grpc-server" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u | \
    while read -r so; do
        [ -f "$so" ] && cp -arfLv "$so" "$CURDIR/package/lib/"
    done

# Vulkan loader / GPU libs when building the GPU variant.
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
    source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
    package_gpu_libs
fi

echo "privacy-filter package contents:"
ls -lah "$CURDIR/package/" "$CURDIR/package/lib/"
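The `ldd | awk | sort -u` filter in package.sh keeps only lines of the form `name => /absolute/path`. A rough Python equivalent is sketched below to show which lines survive; the function name `shared_lib_paths` and the sample `ldd` output are made up for illustration, and Python's `sorted` may order paths differently from `sort -u` under some locales.

```python
from typing import List

SAMPLE_LDD = """\
\tlinux-vdso.so.1 (0x00007ffd2a5f2000)
\tlibstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1c3a000000)
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c39c00000)
\tlibmissing.so.1 => not found
\t/lib64/ld-linux-x86-64.so.2 (0x00007f1c3a4d1000)
"""


def shared_lib_paths(ldd_output: str) -> List[str]:
    """Return the unique absolute library paths that ldd resolved."""
    paths = set()
    for line in ldd_output.splitlines():
        fields = line.split()
        # Same test as the awk program: $2 == "=>" and $3 starts with "/".
        if len(fields) >= 3 and fields[1] == "=>" and fields[2].startswith("/"):
            paths.add(fields[2])
    return sorted(paths)


if __name__ == "__main__":
    for path in shared_lib_paths(SAMPLE_LDD):
        print(path)
```

The vDSO and loader lines have no `=>` in the second field, and an unresolved library has `not` in the third, so all three are dropped by the same two-part condition.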
9
backend/cpp/privacy-filter/run.sh
Executable file
@@ -0,0 +1,9 @@
#!/bin/bash
# Entry point for the privacy-filter backend image / BACKEND_BINARY mode.
set -e
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
if [ -f "$CURDIR/lib/ld.so" ]; then
    exec "$CURDIR/lib/ld.so" "$CURDIR/grpc-server" "$@"
fi
exec "$CURDIR/grpc-server" "$@"
@@ -999,6 +999,42 @@
    nvidia-l4t: "vulkan-localvqe"
    nvidia-l4t-cuda-12: "vulkan-localvqe"
    nvidia-l4t-cuda-13: "vulkan-localvqe"
- &privacyfilter
  name: "privacy-filter"
  alias: "privacy-filter"
  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/5fd5e18a90b6dc4633f6d292/QPiv8pt4JNxr0FdGnpFef.png
  description: |
    Standalone GGML engine (privacy-filter.cpp) for the OpenMed privacy-filter
    PII/NER token-classification model family. It runs the openai-privacy-filter
    architecture (a gpt-oss-style sparse-MoE bidirectional token classifier) on
    stock upstream GGML — no llama.cpp coupling and no Python — and serves the
    TokenClassify RPC (constrained BIOES Viterbi decode into UTF-8 byte-offset
    entity spans) used by LocalAI's NER PII redaction tier.
  urls:
    - https://github.com/localai-org/privacy-filter.cpp
  tags:
    - token-classification
    - ner
    - pii
    - privacy
    - CPU
    - GPU
  license: apache-2.0
  # Builds: CPU (amd64+arm64 manifest), Vulkan (amd64) and CUDA 13 (amd64).
  # Only a host that actually reports CUDA 13 gets the CUDA image (it bundles
  # the CUDA 13 runtime and needs a recent driver); every other GPU — including
  # NVIDIA without a CUDA-13 toolkit, AMD and Intel — routes to the Vulkan
  # image, which only needs a Vulkan ICD. Everything else (incl. arm64/Jetson,
  # where Vulkan/CUDA images are a future add) falls back to the CPU build,
  # already fast for this ~50M-active-param model.
  capabilities:
    default: "cpu-privacy-filter"
    nvidia: "vulkan-privacy-filter"
    nvidia-cuda-12: "vulkan-privacy-filter"
    nvidia-cuda-13: "cuda13-privacy-filter"
    amd: "vulkan-privacy-filter"
    intel: "vulkan-privacy-filter"
    vulkan: "vulkan-privacy-filter"
- &faster-whisper
  icon: https://avatars.githubusercontent.com/u/1520500?s=200&v=4
  description: |
@@ -2703,6 +2739,37 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-stablediffusion-ggml"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-stablediffusion-ggml
## privacy-filter
- !!merge <<: *privacyfilter
  name: "cpu-privacy-filter"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-privacy-filter"
  mirrors:
    - localai/localai-backends:latest-cpu-privacy-filter
- !!merge <<: *privacyfilter
  name: "cpu-privacy-filter-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-privacy-filter"
  mirrors:
    - localai/localai-backends:master-cpu-privacy-filter
- !!merge <<: *privacyfilter
  name: "vulkan-privacy-filter"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-privacy-filter"
  mirrors:
    - localai/localai-backends:latest-gpu-vulkan-privacy-filter
- !!merge <<: *privacyfilter
  name: "vulkan-privacy-filter-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-privacy-filter"
  mirrors:
    - localai/localai-backends:master-gpu-vulkan-privacy-filter
- !!merge <<: *privacyfilter
  name: "cuda13-privacy-filter"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-privacy-filter"
  mirrors:
    - localai/localai-backends:latest-gpu-nvidia-cuda-13-privacy-filter
- !!merge <<: *privacyfilter
  name: "cuda13-privacy-filter-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-privacy-filter"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-privacy-filter
# vllm
- !!merge <<: *vllm
  name: "vllm-development"
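The routing described in the capabilities comment amounts to a lookup with a CPU fallback. The sketch below only illustrates that reading of the map; how LocalAI actually detects the host capability string and resolves it is not shown in this diff, so the function `pick_backend` and the idea of a single capability key are assumptions.

```python
from typing import Dict

# Copied from the privacy-filter gallery entry above.
CAPABILITIES: Dict[str, str] = {
    "default": "cpu-privacy-filter",
    "nvidia": "vulkan-privacy-filter",
    "nvidia-cuda-12": "vulkan-privacy-filter",
    "nvidia-cuda-13": "cuda13-privacy-filter",
    "amd": "vulkan-privacy-filter",
    "intel": "vulkan-privacy-filter",
    "vulkan": "vulkan-privacy-filter",
}


def pick_backend(capability: str) -> str:
    """Hypothetical resolver: exact match, else the 'default' entry."""
    return CAPABILITIES.get(capability, CAPABILITIES["default"])


if __name__ == "__main__":
    for cap in ("nvidia-cuda-13", "nvidia-cuda-12", "amd", "nvidia-l4t"):
        print(f"{cap:>16} -> {pick_backend(cap)}")
```

Under this reading, a Jetson host reporting `nvidia-l4t` has no entry in the map and lands on the CPU image, which agrees with the comment's note that arm64/Jetson falls back to the CPU build.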
@@ -270,10 +270,17 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):

     def TokenClassify(self, request, context):
         # Runs HuggingFace's token-classification pipeline and returns
-        # the aggregated entity spans. The pipeline gives us byte
-        # offsets via aggregation_strategy="simple" (set at load
-        # time), so the caller can slice the original text without
-        # re-tokenising on the Go side.
+        # the aggregated entity spans.
+        #
+        # OFFSET UNITS: the proto contract (TokenClassifyEntity.start/end)
+        # is UTF-8 BYTE offsets into request.text. HuggingFace's pipeline,
+        # however, reports start/end as CODEPOINT offsets into the Python
+        # str (derived from the fast tokenizer's offset_mapping). Those
+        # coincide only for ASCII; for any multi-byte character they
+        # diverge — and this entry point exists to serve the explicitly
+        # multilingual privacy-filter model, so the conversion is
+        # mandatory, not a nicety. We build one prefix table mapping each
+        # codepoint index to its byte offset and translate every span.
         if not getattr(self, "TokenClassification", False):
             context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
             context.set_details("model was not loaded as Type=TokenClassification")
@@ -286,18 +293,50 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
             context.set_details(f"token-classification failed: {err}")
             return backend_pb2.TokenClassifyResponse()

+        text = request.text
+        # byte_at[i] = byte length of text[:i]; len == len(text)+1 so an
+        # exclusive end offset that points one past the last codepoint
+        # maps to len(text.encode("utf-8")). Built in a single O(n) pass.
+        byte_at = [0] * (len(text) + 1)
+        acc = 0
+        for i, ch in enumerate(text):
+            byte_at[i] = acc
+            acc += len(ch.encode("utf-8"))
+        byte_at[len(text)] = acc
+
+        def to_byte(cp_index, default):
+            # Clamp out-of-range codepoint indices into the table rather
+            # than throwing: a span we can't place is better dropped Go-side
+            # than crashing the RPC.
+            if cp_index is None:
+                cp_index = default
+            if cp_index < 0:
+                cp_index = 0
+            elif cp_index > len(text):
+                cp_index = len(text)
+            return byte_at[cp_index]
+
         threshold = request.threshold if request.threshold > 0 else 0.0
         entities = []
         for r in results:
             score = float(r.get("score", 0.0))
             if score < threshold:
                 continue
+            cp_start = r.get("start")
+            cp_end = r.get("end")
+            start = to_byte(cp_start, 0)
+            end = to_byte(cp_end, 0)
             entities.append(backend_pb2.TokenClassifyEntity(
                 entity_group=str(r.get("entity_group") or r.get("entity") or ""),
-                start=int(r.get("start", 0)),
-                end=int(r.get("end", 0)),
+                start=start,
+                end=end,
                 score=score,
-                text=str(r.get("word", "")),
+                # Slice the original text by the (codepoint) span so the
+                # echoed text matches start..end exactly, instead of the
+                # pipeline's reconstructed "word" which can carry wordpiece
+                # artifacts. Falls back to "word" when offsets are absent.
+                text=(text[cp_start:cp_end] if cp_start is not None and cp_end is not None
+                      else str(r.get("word", ""))),
             ))
         return backend_pb2.TokenClassifyResponse(entities=entities)

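The codepoint-to-byte conversion in this hunk can be exercised on its own. The standalone sketch below rebuilds the same prefix table outside the servicer and checks it on a non-ASCII string; the helper names `byte_offsets` and `span_to_bytes` and the sample sentence are illustrative and do not appear in the backend.

```python
from typing import List, Tuple


def byte_offsets(text: str) -> List[int]:
    """table[i] is the UTF-8 byte length of text[:i]; len(table) == len(text) + 1."""
    table = [0] * (len(text) + 1)
    acc = 0
    for i, ch in enumerate(text):
        table[i] = acc
        acc += len(ch.encode("utf-8"))
    table[len(text)] = acc
    return table


def span_to_bytes(text: str, cp_start: int, cp_end: int) -> Tuple[int, int]:
    """Translate a codepoint span into a byte span, clamping out-of-range indices."""
    table = byte_offsets(text)

    def clamp(i: int) -> int:
        return max(0, min(i, len(text)))

    return table[clamp(cp_start)], table[clamp(cp_end)]


if __name__ == "__main__":
    text = "José lives in Zürich"
    cp_start = text.index("Zürich")
    cp_end = cp_start + len("Zürich")
    b_start, b_end = span_to_bytes(text, cp_start, cp_end)
    # Slicing the encoded bytes with the byte span recovers the same substring.
    print((cp_start, cp_end), (b_start, b_end),
          text.encode("utf-8")[b_start:b_end].decode("utf-8"))
```

Here the codepoint span (14, 20) becomes the byte span (15, 22): the "é" before the entity shifts the start by one byte and the "ü" inside it widens the span by one more, which is exactly the drift the hunk's comment warns about for multilingual text.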