Compare commits

...

14 Commits

Author SHA1 Message Date
Ettore Di Giacinto
8423d8e7d6 fix(liquid-audio): guard uv-only --index-strategy for the pip/darwin path
Same fix as trl: the darwin/MPS build installs with pip (USE_PIP=true), which
rejects the uv-only --index-strategy flag and failed the darwin backend build.
Add it only on the uv path; Linux/CUDA resolution is unchanged.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
2026-06-24 20:05:46 +00:00
Ettore Di Giacinto
01ec925e9f ci(liquid-audio): trigger darwin build via requirements-mps note
The changed-backends path filter only builds a backend when a file under
its directory changes. The metal wiring lived in index.yaml + the matrix,
so the darwin job was skipped. Add a documenting comment to the MPS
requirements so CI actually exercises the darwin build.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
2026-06-24 19:55:37 +00:00
Ettore Di Giacinto
d5e638e1dd feat(backends): add darwin/metal build for liquid-audio
Wire the already-MPS-ready liquid-audio backend (it ships
requirements-mps.txt) into the darwin CI matrix and the gallery so
metal-darwin-arm64 images are built and selectable.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
2026-06-24 17:10:35 +00:00
Dedy F. Setyawan
e8ae88a2a0 i18n(id): update and complete Indonesian translations (#10480)
- translate remaining English strings in chat, common, home, and media locales.
- fix typo and improve wording consistency (e.g., klaster -> kluster, otomasi -> automasi).

Signed-off-by: Dedy F. Setyawan <dedyfajars@gmail.com>
2026-06-24 18:35:21 +02:00
Richard Palethorpe
e1994579f8 fix(pii): load default detectors at startup + add LOCALAI_PII_DEFAULT_DETECTORS (#10474)
pii_default_detectors was applied to the live config only by a live
POST /api/settings (ApplyRuntimeSettings) — neither the startup loader nor
the config file watcher read it back. So after a restart the persisted
default detectors were dropped, and the cloud-proxy MITM listener (which
resolves each intercept host's detectors once at start via ResolvePIIPolicy)
came up with an empty set and forwarded intercepted traffic unredacted, even
though the MITM model had pii.enabled:true and the defaults were on disk.
Request-side default redaction broke the same way.

- startup.go: loadRuntimeSettingsFromFile now applies pii_default_detectors,
  before startMITMIfConfigured, with env > file precedence.
- config_file_watcher.go: apply pii_default_detectors on live file edits,
  matching the existing env-guard pattern used for the other fields.
- settings endpoint: rebuild the MITM listener when pii_default_detectors
  changes (its per-host detector map is frozen at listener start), not only
  on a mitm_listen change — so toggling a default detector takes effect on
  cloud-proxy traffic immediately.
- new LOCALAI_PII_DEFAULT_DETECTORS env var / CLI flag (WithPIIDefaultDetectors)
  so the default detector set can be pinned at boot for immutable deployments.

Assisted-by: Claude:claude-opus-4-8 Claude-Code

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-06-24 11:08:57 +02:00
LocalAI [bot]
e5620989dd refactor(distributed): make in-flight tracking coverage a compile-time contract (#10476)
PR #10475 fixed SoundDetection in-flight tracking, but the underlying trap
remains: InFlightTrackingClient embedded the whole grpc.Backend interface
"for passthrough of untracked methods", so any newly added inference method
is silently satisfied by the embedded passthrough and never wrapped with
track(). That leaves onFirstComplete unfired and in-flight stuck at 1 - the
exact SoundDetection bug, waiting to recur for the next backend method.

Close the gap at the type level instead of relying on reviewers to remember:

- Split grpc.Backend into two composed sub-interfaces: InferenceBackend
  (methods that are one discrete inference call and must be tracked) and
  ControlBackend (control-plane calls plus the streaming constructors whose
  work spans the returned stream, safe to pass through). The classification
  now lives next to the interface it documents.
- InFlightTrackingClient embeds only grpc.ControlBackend and implements every
  InferenceBackend method explicitly, delegating to an inner InferenceBackend.
  A `var _ grpc.Backend = (*InFlightTrackingClient)(nil)` assertion makes the
  package fail to compile if any inference method is left unwrapped.

Now adding a method to InferenceBackend is a build error (at the assertion and
every call site: "does not implement grpc.Backend (missing method X)"), not a
silent runtime leak - and the obvious fix is to copy a neighbouring wrapper,
which calls track(). No runtime guard or reviewer vigilance required.

Pure refactor: the composed Backend interface is identical to the old flat
one, so all implementers and consumers are unaffected (verified with a full
`go build ./...`). Behaviour is unchanged; the existing nodes suite passes.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 11:08:29 +02:00
LocalAI [bot]
fc618dcee6 fix(distributed): track in-flight for SoundDetection requests (#10475)
The distributed router wraps backend clients in InFlightTrackingClient so
the eviction logic knows which replicas are actively serving. Every
inference method must be wrapped: track() increments in-flight on entry
and decrements (plus fires onFirstComplete, which releases the load-time
reservation) on return.

SoundDetection was added after the tracking client and never got a
wrapper, so its calls fell through to the embedded passthrough Backend.
The increment/decrement never ran and, critically, onFirstComplete never
fired, so the reservation set at model load was never released - leaving
in-flight stuck at 1 and the replica permanently ineligible for eviction.

Wrap SoundDetection like the other non-LLM methods and cover it in the
"non-LLM inference methods track in-flight" table test.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 10:13:37 +02:00
LocalAI [bot]
e6042080c0 fix(agents): URL-decode collection/agent name path params (#10443) (#10471)
fix(agents): URL-decode collection/agent name path params

Collection and agent names carry a "legacy-api-key:" prefix, so the ':'
arrives percent-encoded as %3A in the request path. Echo routes such
paths via URL.RawPath and stores the matched path-param value still
escaped, so c.Param("name") returned "legacy-api-key%3ALiteraryResearch"
and the store lookup 404'd ("collection not found").

This was second-order fallout of #10375/#10387: once colons became valid
in names, the URL-decode gap surfaced on every name-bearing endpoint.

Add a decodedParam helper that url.PathUnescape's the param (falling back
to the raw value on invalid encoding) and wire it into all collection
endpoints and the agent :name endpoints, which share the identical
prefix. The entry endpoints already unescaped c.Param("*"); this closes
the same gap for :name.

Fixes #10443


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 09:42:09 +02:00
LocalAI [bot]
0f3b24436d chore: ⬆️ Update mudler/parakeet.cpp to 89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a (#10468)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:43 +02:00
LocalAI [bot]
4b6f911835 chore: ⬆️ Update ggml-org/whisper.cpp to 43d78af5be58f41d6ffbc227d608f104577741ea (#10466)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:41:14 +02:00
LocalAI [bot]
a5e28942a6 chore: ⬆️ Update ggml-org/llama.cpp to be4a6a63eb2b848e19c277bdcf2bd399e8af76d9 (#10467)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:54 +02:00
LocalAI [bot]
dba9cd7ca4 chore: ⬆️ Update CrispStrobe/CrispASR to 96b2a6ee31d30389fed8a7ef1a54239b75231ddc (#10465)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:34 +02:00
LocalAI [bot]
c93190de50 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 7ccf1d209588962b96eacca325b37e9b3e8faf5e (#10456)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 09:40:13 +02:00
LocalAI [bot]
4dbf69f889 chore(model gallery): 🤖 add 1 new models via gallery agent (#10472)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-24 00:00:26 +02:00
30 changed files with 433 additions and 119 deletions

View File

@@ -4974,6 +4974,9 @@ includeDarwin:
- backend: "kitten-tts"
tag-suffix: "-metal-darwin-arm64-kitten-tts"
build-type: "mps"
- backend: "liquid-audio"
tag-suffix: "-metal-darwin-arm64-liquid-audio"
build-type: "mps"
- backend: "piper"
tag-suffix: "-metal-darwin-arm64-piper"
build-type: "metal"

View File

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=6c00e87ac84404af588ad2e65935bd6f079c696f
IK_LLAMA_VERSION?=7ccf1d209588962b96eacca325b37e9b3e8faf5e
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=73618f27a801c0b8614ceaf3547d3c2a99baae14
LLAMA_VERSION?=be4a6a63eb2b848e19c277bdcf2bd399e8af76d9
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# CrispASR version (release tag)
CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
CRISPASR_VERSION?=63b57289255267edf66e43e33bc3911e04a2e92d
CRISPASR_VERSION?=96b2a6ee31d30389fed8a7ef1a54239b75231ddc
SO_TARGET?=libgocrispasr.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
# (.github/bump_deps.sh) can find and update it - matches the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=bae6bc02b1940bbfb87b6a0299c565e563b916d1
WHISPER_CPP_VERSION?=43d78af5be58f41d6ffbc227d608f104577741ea
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -1284,6 +1284,7 @@
nvidia-cuda-13: "cuda13-liquid-audio"
nvidia-cuda-12: "cuda12-liquid-audio"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio"
metal: "metal-liquid-audio"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png
- &qwen-tts
urls:
@@ -4612,6 +4613,7 @@
nvidia-cuda-13: "cuda13-liquid-audio-development"
nvidia-cuda-12: "cuda12-liquid-audio-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio-development"
metal: "metal-liquid-audio-development"
- !!merge <<: *liquid-audio
name: "cpu-liquid-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-liquid-audio"
@@ -4622,6 +4624,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-liquid-audio"
mirrors:
- localai/localai-backends:master-cpu-liquid-audio
- !!merge <<: *liquid-audio
name: "metal-liquid-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-liquid-audio"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-liquid-audio
- !!merge <<: *liquid-audio
name: "metal-liquid-audio-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-liquid-audio"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-liquid-audio
- !!merge <<: *liquid-audio
name: "cuda12-liquid-audio"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-liquid-audio"

View File

@@ -14,5 +14,11 @@ else
fi
# liquid-audio's torch wheels are large; allow upgrades to satisfy transitive pins
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade"
# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip
# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add
# it on the uv path; Linux/CUDA resolution is unchanged.
if [ "x${USE_PIP:-}" != "xtrue" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match"
fi
installRequirements

View File

@@ -1,3 +1,4 @@
# MPS (Apple Silicon / Metal) build profile - installed by the darwin CI job.
torch>=2.8.0
torchaudio>=2.8.0
torchcodec>=0.9.1

View File

@@ -215,6 +215,7 @@ func readRuntimeSettingsJson(startupAppConfig config.ApplicationConfig) fileHand
envBackendGalleries := slices.Equal(appConfig.BackendGalleries, startupAppConfig.BackendGalleries)
envAutoloadGalleries := appConfig.AutoloadGalleries == startupAppConfig.AutoloadGalleries
envAutoloadBackendGalleries := appConfig.AutoloadBackendGalleries == startupAppConfig.AutoloadBackendGalleries
envPIIDefaultDetectors := slices.Equal(appConfig.PIIDefaultDetectors, startupAppConfig.PIIDefaultDetectors)
envAgentJobRetentionDays := appConfig.AgentJobRetentionDays == startupAppConfig.AgentJobRetentionDays
envForceEvictionWhenBusy := appConfig.ForceEvictionWhenBusy == startupAppConfig.ForceEvictionWhenBusy
envLRUEvictionMaxRetries := appConfig.LRUEvictionMaxRetries == startupAppConfig.LRUEvictionMaxRetries
@@ -335,6 +336,15 @@ func readRuntimeSettingsJson(startupAppConfig config.ApplicationConfig) fileHand
if settings.AutoloadBackendGalleries != nil && !envAutoloadBackendGalleries {
appConfig.AutoloadBackendGalleries = *settings.AutoloadBackendGalleries
}
if settings.PIIDefaultDetectors != nil && !envPIIDefaultDetectors {
// Request-side default redaction reads this live via
// ResolvePIIPolicy, so a file edit takes effect on the next chat
// request. The MITM listener resolves its per-host detector map
// once at start, so a raw file edit reaches cloud-proxy traffic
// only after a restart or a POST /api/settings (which rebuilds
// the listener) — the admin UI uses the latter.
appConfig.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...)
}
if settings.AutoUpgradeBackends != nil {
appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
}

View File

@@ -109,6 +109,52 @@ var _ = Describe("loadRuntimeSettingsFromFile", func() {
})
})
// Instance-wide default PII detectors. The file is the only source (no
// env var), and the loader runs immediately before startMITMIfConfigured,
// so a regression here means the cloud-proxy MITM listener resolves an
// empty detector set at boot and forwards intercepted traffic unredacted —
// even though pii_default_detectors is on disk and the MITM model has PII
// enabled. It also breaks request-side default redaction the same way.
Describe("PII default detectors", func() {
It("loads pii_default_detectors from the file", func() {
cfg := &config.ApplicationConfig{DynamicConfigsDir: seedSettings(`{"pii_default_detectors": ["privacy-filter-nemotron", "secret-filter"]}`)}
loadRuntimeSettingsFromFile(cfg)
Expect(cfg.PIIDefaultDetectors).To(Equal([]string{"privacy-filter-nemotron", "secret-filter"}))
})
It("does not override an env/CLI-set value (LOCALAI_PII_DEFAULT_DETECTORS)", func() {
cfg := &config.ApplicationConfig{
DynamicConfigsDir: seedSettings(`{"pii_default_detectors": ["from-file"]}`),
PIIDefaultDetectors: []string{"from-env"}, // simulate WithPIIDefaultDetectors(env)
}
loadRuntimeSettingsFromFile(cfg)
Expect(cfg.PIIDefaultDetectors).To(Equal([]string{"from-env"}), "env var must win over the persisted file value")
})
})
// The live file watcher applies pii_default_detectors on a runtime change
// the same way it handles galleries/threads/etc.: env-set values (current
// == startup snapshot) are left alone, otherwise the file value is applied
// to the live config so request-side default redaction picks it up without
// a restart.
Describe("file watcher: pii_default_detectors", func() {
It("applies a changed file value to the live config", func() {
startup := config.ApplicationConfig{} // no env baseline
live := &config.ApplicationConfig{PIIDefaultDetectors: []string{"old"}}
handler := readRuntimeSettingsJson(startup)
Expect(handler([]byte(`{"pii_default_detectors":["new-a","new-b"]}`), live)).To(Succeed())
Expect(live.PIIDefaultDetectors).To(Equal([]string{"new-a", "new-b"}))
})
It("leaves an env-controlled value untouched", func() {
startup := config.ApplicationConfig{PIIDefaultDetectors: []string{"from-env"}}
live := &config.ApplicationConfig{PIIDefaultDetectors: []string{"from-env"}}
handler := readRuntimeSettingsJson(startup)
Expect(handler([]byte(`{"pii_default_detectors":["from-file"]}`), live)).To(Succeed())
Expect(live.PIIDefaultDetectors).To(Equal([]string{"from-env"}), "env-controlled detectors must not be overwritten by the file")
})
})
// The Agent Pool block has a mix of zero and non-zero defaults
// (Enabled=true, EmbeddingModel="granite-...", MaxChunkingSize=400,
// VectorEngine="chromem", AgentHubURL="https://agenthub.localai.io").

View File

@@ -750,6 +750,20 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) {
options.MITMListen = *settings.MITMListen
}
// Instance-wide default PII detectors. LOCALAI_PII_DEFAULT_DETECTORS (via
// WithPIIDefaultDetectors) wins when set; otherwise the file is the source
// — apply it only when the env/CLI left the value empty, mirroring the
// "env > file" precedence used for the other fields. This must land before
// startMITMIfConfigured (called right after this loader): the cloud-proxy
// listener resolves each intercept host's detectors once at start via
// ResolvePIIPolicy, and a MITM model that names no detectors of its own
// falls back to these defaults. Without it the listener (and request-side
// default redaction) starts with an empty detector set and forwards
// traffic unredacted even though pii_default_detectors is on disk.
if settings.PIIDefaultDetectors != nil && len(options.PIIDefaultDetectors) == 0 {
options.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...)
}
// Backend upgrade flags
if settings.AutoUpgradeBackends != nil {
if !options.AutoUpgradeBackends {

View File

@@ -181,6 +181,8 @@ type RunCMD struct {
// Cloud-proxy MITM listener (off by default).
MITMListen string `env:"LOCALAI_MITM_LISTEN" help:"Address (host:port) for the cloudproxy MITM listener. Empty = disabled. Clients set HTTPS_PROXY=http://<this>:<port>. Intercept hosts are declared per-model via the model YAML mitm.hosts: block; create one from the Add Model UI." group:"middleware"`
MITMCADir string `env:"LOCALAI_MITM_CA_DIR" type:"path" help:"Directory holding the MITM proxy CA cert + key. Defaults to <data-path>/mitm-ca." group:"middleware"`
PIIDefaultDetectors []string `env:"LOCALAI_PII_DEFAULT_DETECTORS" help:"Instance-wide default PII/secret detector model names applied to any PII-enabled model (chiefly cloud-proxy / MITM models) that names no pii.detectors of its own. Comma-separated, e.g. privacy-filter-nemotron,secret-filter. Takes precedence over the value persisted via the Middleware UI." group:"middleware"`
}
func (r *RunCMD) Run(ctx *cliContext.Context) error {
@@ -243,6 +245,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
config.WithAPIAddress(r.Address),
config.WithMITMListen(r.MITMListen),
config.WithMITMCADir(r.MITMCADir),
config.WithPIIDefaultDetectors(r.PIIDefaultDetectors),
config.WithAgentJobRetentionDays(r.AgentJobRetentionDays),
config.WithLlamaCPPTunnelCallback(func(tunnels []string) {
tunnelEnvVar := strings.Join(tunnels, ",")

View File

@@ -712,6 +712,18 @@ func WithMITMCADir(dir string) AppOption {
}
}
// WithPIIDefaultDetectors sets the instance-wide default PII/secret detector
// model names applied to any PII-enabled model (chiefly cloud-proxy / MITM
// models) that names no pii.detectors of its own. CLI/env:
// LOCALAI_PII_DEFAULT_DETECTORS. Empty leaves the value to
// runtime_settings.json / the Middleware UI; a non-empty value takes
// precedence over the file (env > file).
func WithPIIDefaultDetectors(detectors []string) AppOption {
return func(o *ApplicationConfig) {
o.PIIDefaultDetectors = detectors
}
}
func WithDynamicConfigDir(dynamicConfigsDir string) AppOption {
return func(o *ApplicationConfig) {
o.DynamicConfigsDir = dynamicConfigsDir

View File

@@ -70,7 +70,7 @@ func UploadToCollectionEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
file, err := c.FormFile("file")
if err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "file required"})
@@ -116,7 +116,7 @@ func ListCollectionEntriesEndpoint(app *application.Application) echo.HandlerFun
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
entries, err := svc.ListCollectionEntriesForUser(userID, c.Param("name"))
entries, err := svc.ListCollectionEntriesForUser(userID, decodedParam(c, "name"))
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -139,7 +139,7 @@ func GetCollectionEntryContentEndpoint(app *application.Application) echo.Handle
if err != nil {
entry = entryParam
}
content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, c.Param("name"), entry)
content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, decodedParam(c, "name"), entry)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -164,7 +164,7 @@ func SearchCollectionEndpoint(app *application.Application) echo.HandlerFunc {
if err := c.Bind(&payload); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
}
results, err := svc.SearchCollectionForUser(userID, c.Param("name"), payload.Query, payload.MaxResults)
results, err := svc.SearchCollectionForUser(userID, decodedParam(c, "name"), payload.Query, payload.MaxResults)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -182,7 +182,7 @@ func ResetCollectionEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
if err := svc.ResetCollectionForUser(userID, c.Param("name")); err != nil {
if err := svc.ResetCollectionForUser(userID, decodedParam(c, "name")); err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
@@ -202,7 +202,7 @@ func DeleteCollectionEntryEndpoint(app *application.Application) echo.HandlerFun
if err := c.Bind(&payload); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
}
remaining, err := svc.DeleteCollectionEntryForUser(userID, c.Param("name"), payload.Entry)
remaining, err := svc.DeleteCollectionEntryForUser(userID, decodedParam(c, "name"), payload.Entry)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -230,7 +230,7 @@ func AddCollectionSourceEndpoint(app *application.Application) echo.HandlerFunc
if payload.UpdateInterval < 1 {
payload.UpdateInterval = 60
}
if err := svc.AddCollectionSourceForUser(userID, c.Param("name"), payload.URL, payload.UpdateInterval); err != nil {
if err := svc.AddCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL, payload.UpdateInterval); err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
@@ -250,7 +250,7 @@ func RemoveCollectionSourceEndpoint(app *application.Application) echo.HandlerFu
if err := c.Bind(&payload); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
}
if err := svc.RemoveCollectionSourceForUser(userID, c.Param("name"), payload.URL); err != nil {
if err := svc.RemoveCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL); err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
}
return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -267,7 +267,7 @@ func GetCollectionEntryRawFileEndpoint(app *application.Application) echo.Handle
if err != nil {
entry = entryParam
}
fpath, err := svc.GetCollectionEntryFilePathForUser(userID, c.Param("name"), entry)
fpath, err := svc.GetCollectionEntryFilePathForUser(userID, decodedParam(c, "name"), entry)
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -282,7 +282,7 @@ func ListCollectionSourcesEndpoint(app *application.Application) echo.HandlerFun
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
sources, err := svc.ListCollectionSourcesForUser(userID, c.Param("name"))
sources, err := svc.ListCollectionSourcesForUser(userID, decodedParam(c, "name"))
if err != nil {
if strings.Contains(err.Error(), "not found") {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})

View File

@@ -0,0 +1,49 @@
package localai
import (
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// Regression for #10443: agent/collection names carry a "legacy-api-key:"
// prefix, so the ':' is percent-encoded as %3A in the request path. Echo routes
// such paths via URL.RawPath and stores the path-param value still escaped, so
// handlers must URL-decode it before looking the collection up in the store -
// otherwise the lookup sees "legacy-api-key%3ALiteraryResearch" and 404s.
var _ = Describe("decodedParam", func() {
var e *echo.Echo
BeforeEach(func() {
e = echo.New()
})
// route runs a request through Echo's real router so the path param is
// populated exactly as it would be in production, then returns the decoded
// value the handler would observe.
route := func(rawPath string) string {
var got string
e.GET("/api/agents/collections/:name/upload", func(c echo.Context) error {
got = decodedParam(c, "name")
return c.NoContent(http.StatusOK)
})
req := httptest.NewRequest(http.MethodGet, rawPath, nil)
rec := httptest.NewRecorder()
e.ServeHTTP(rec, req)
Expect(rec.Code).To(Equal(http.StatusOK))
return got
}
It("decodes a percent-encoded colon in the collection name", func() {
got := route("/api/agents/collections/legacy-api-key%3ALiteraryResearch/upload")
Expect(got).To(Equal("legacy-api-key:LiteraryResearch"))
})
It("leaves an unencoded name untouched", func() {
got := route("/api/agents/collections/PlainCollection/upload")
Expect(got).To(Equal("PlainCollection"))
})
})

View File

@@ -6,6 +6,7 @@ import (
"io"
"maps"
"net/http"
"net/url"
"os"
"path/filepath"
"slices"
@@ -33,6 +34,22 @@ func getUserID(c echo.Context) string {
return user.ID
}
// decodedParam returns the named path parameter, URL-decoding it.
//
// Echo routes a request via URL.RawPath whenever the path contains
// percent-encoded characters (e.g. %3A for ':'), and in that case stores the
// matched path-param value raw/escaped. Agent and collection names carry a
// "legacy-api-key:" prefix, so the ':' arrives as %3A and the raw param no
// longer matches the stored name. Callers must unescape before lookups.
// Falls back to the raw value if it isn't valid percent-encoding.
func decodedParam(c echo.Context, name string) string {
raw := c.Param(name)
if decoded, err := url.PathUnescape(raw); err == nil {
return decoded
}
return raw
}
// isAdminUser returns true if the authenticated user has admin role.
func isAdminUser(c echo.Context) bool {
user := auth.GetUser(c)
@@ -127,7 +144,7 @@ func GetAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
statuses := svc.ListAgentsForUser(userID)
active, exists := statuses[name]
@@ -142,7 +159,7 @@ func UpdateAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
var cfg state.AgentConfig
if err := c.Bind(&cfg); err != nil {
return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
@@ -161,7 +178,7 @@ func DeleteAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
if err := svc.DeleteAgentForUser(userID, name); err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
}
@@ -173,7 +190,7 @@ func GetAgentConfigEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
cfg := svc.GetAgentConfigForUser(userID, name)
if cfg == nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": "Agent not found"})
@@ -186,7 +203,7 @@ func PauseAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
if err := svc.PauseAgentForUser(userID, c.Param("name")); err != nil {
if err := svc.PauseAgentForUser(userID, decodedParam(c, "name")); err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -197,7 +214,7 @@ func ResumeAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
if err := svc.ResumeAgentForUser(userID, c.Param("name")); err != nil {
if err := svc.ResumeAgentForUser(userID, decodedParam(c, "name")); err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -208,7 +225,7 @@ func GetAgentStatusEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
history := svc.GetAgentStatusForUser(userID, name)
if history == nil {
@@ -241,7 +258,7 @@ func GetAgentObservablesEndpoint(app *application.Application) echo.HandlerFunc
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
history, err := svc.GetAgentObservablesForUser(userID, name)
if err != nil {
@@ -261,7 +278,7 @@ func ClearAgentObservablesEndpoint(app *application.Application) echo.HandlerFun
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
if err := svc.ClearAgentObservablesForUser(userID, name); err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
}
@@ -273,7 +290,7 @@ func ChatWithAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
var payload struct {
Message string `json:"message"`
}
@@ -302,7 +319,7 @@ func AgentSSEEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
// Try local SSE manager first
manager := svc.GetSSEManagerForUser(userID, name)
@@ -334,7 +351,7 @@ func ExportAgentEndpoint(app *application.Application) echo.HandlerFunc {
return func(c echo.Context) error {
svc := app.AgentPoolService()
userID := effectiveUserID(c)
name := c.Param("name")
name := decodedParam(c, "name")
data, err := svc.ExportAgentForUser(userID, name)
if err != nil {
return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})

View File

@@ -271,7 +271,14 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
}
}
if settings.MITMListen != nil {
// Rebuild the MITM listener when its address OR the instance-wide
// default detectors change. The per-host detector map is resolved once
// at listener start (startMITMLocked → ResolvePIIPolicy), so a
// default-detector change is otherwise invisible to cloud-proxy traffic
// until the next restart — an admin toggling a default detector would
// see no redaction. RestartMITM is a no-op when the listener is
// disabled (empty address).
if settings.MITMListen != nil || settings.PIIDefaultDetectors != nil {
if err := app.RestartMITM(); err != nil {
xlog.Error("Failed to restart MITM proxy", "error", err)
return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{

View File

@@ -146,6 +146,24 @@ var _ = Describe("Settings endpoints", func() {
Expect(*ondisk.PIIDefaultDetectors).To(Equal([]string{"det-a"}))
})
// The MITM listener resolves its per-host PII detectors once at start
// (startMITMLocked → ResolvePIIPolicy), and the handler used to restart it
// only when mitm_listen changed. So an admin toggling a default detector
// (the Middleware detector table POSTs only pii_default_detectors) left
// cloud-proxy traffic unredacted until the next reboot. A
// pii_default_detectors change must now rebuild the listener.
It("rebuilds the MITM listener when only pii_default_detectors changes", func() {
rec := post(`{"mitm_listen":"127.0.0.1:0"}`)
Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
srv1 := app.MITMServer()
Expect(srv1).ToNot(BeNil(), "listener should be running after mitm_listen is set")
rec = post(`{"pii_default_detectors":["det-a"]}`)
Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
Expect(app.MITMServer()).ToNot(BeIdenticalTo(srv1),
"a default-detector change must restart the listener so it picks up the new detectors")
})
// Residual #9125: enabling the watchdog from a cold (off) state via the
// React master toggle must start the live watchdog immediately, without a
// restart. The toggle posts watchdog_idle_enabled/busy_enabled=true while

View File

@@ -45,7 +45,7 @@
},
"scheduling": {
"title": "Penjadwalan",
"subtitle": "Aturan penempatan model dan replika di seluruh klaster"
"subtitle": "Aturan penempatan model dan replika di seluruh kluster"
},
"p2p": {
"title": "Komputasi AI Terdistribusi",
@@ -86,4 +86,4 @@
"title": "Penjelajah",
"subtitle": "Jelajahi file dan konfigurasi"
}
}
}

View File

@@ -72,7 +72,7 @@
"actions": {
"copy": "Salin",
"regenerate": "Hasilkan ulang",
"jumpToLatest": "Jump to latest"
"jumpToLatest": "Lompat ke terbaru"
},
"streaming": {
"transferring": "Mentransfer model...",
@@ -115,4 +115,4 @@
"clearAll": "Hapus semua",
"deleteAllTitle": "Hapus semua percakapan"
}
}
}

View File

@@ -1,8 +1,8 @@
{
"unsaved": {
"title": "Discard unsaved changes?",
"message": "You have unsaved changes that will be lost if you leave this page.",
"leave": "Leave"
"title": "Buang perubahan yang belum disimpan?",
"message": "Anda memiliki perubahan yang belum disimpan. Perubahan tersebut akan hilang jika Anda meninggalkan halaman ini.",
"leave": "Tinggalkan Halaman"
},
"actions": {
"save": "Simpan",

View File

@@ -7,15 +7,15 @@
"resourceGpu": "GPU",
"resourceRam": "RAM",
"greeting": {
"morning": "Good morning",
"afternoon": "Good afternoon",
"evening": "Good evening",
"night": "Working late"
"morning": "Selamat pagi",
"afternoon": "Selamat siang",
"evening": "Selamat malam",
"night": "Selamat lembur"
},
"statusLine": {
"modelsLoaded_one": "{{count}} model loaded",
"modelsLoaded_other": "{{count}} models loaded",
"noModelsLoaded": "No models loaded",
"modelsLoaded_one": "{{count}} model dimuat",
"modelsLoaded_other": "{{count}} model dimuat",
"noModelsLoaded": "Tidak ada model yang dimuat",
"nodes_one": "{{count}} node",
"nodes_other": "{{count}} nodes"
},
@@ -79,14 +79,14 @@
},
"connect": {
"title": "Satu endpoint, semua API",
"subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. Di atas itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.",
"subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. Selain itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.",
"nativeTitle": "API native",
"compatTitle": "Kompatibilitas drop-in",
"apiReference": "Referensi API lengkap",
"copy": "Salin",
"copied": "Disalin",
"browse": "Browse the API",
"hide": "Hide endpoints",
"dismiss": "Dismiss"
"browse": "Jelajahi API",
"hide": "Sembunyikan endpoint",
"dismiss": "Abaikan"
}
}

View File

@@ -5,7 +5,7 @@
"video": "Video",
"tts": "TTS",
"sound": "Suara",
"transform": "Transform"
"transform": "Transformasi"
}
},
"image": {
@@ -30,7 +30,7 @@
"refImagesAdded_other": "{{count}} gambar ditambahkan"
},
"actions": {
"view": "View",
"view": "Lihat",
"generate": "Hasilkan",
"generating": "Menghasilkan..."
},
@@ -153,4 +153,4 @@
"clearConfirm": "Hapus",
"cleared": "Riwayat dihapus"
}
}
}

View File

@@ -19,11 +19,11 @@
"operate": "Operasikan"
},
"operate": {
"inference": "Inference",
"cluster": "Cluster",
"observability": "Observability",
"access": "Access",
"system": "System"
"inference": "Inferensi",
"cluster": "Kluster",
"observability": "Observabilitas",
"access": "Akses",
"system": "Sistem"
},
"items": {
"home": "Beranda",
@@ -64,7 +64,7 @@
"copyright": "© 2023-{{year}} {{author}}"
},
"console": {
"automation": "Otomasi",
"automation": "Automasi",
"training": "Pelatihan"
}
}

View File

@@ -19,25 +19,40 @@ import (
// Per-replica: a single tracker instance is bound to (nodeID, modelName, replicaIndex).
// The router constructs one tracker per Route() result, so each in-flight tick lands
// on the correct row even when multiple replicas of the same model live on the same node.
//
// Embedding only grpc.ControlBackend (not the whole grpc.Backend) is what makes
// the in-flight accounting safe by construction: the control-plane methods pass
// through untracked, while every grpc.InferenceBackend method must be declared
// explicitly below to satisfy grpc.Backend. Adding an inference method to the
// interface therefore breaks this file's build (see the var assertion below)
// until it is wrapped with track() - so a new inference path can't be added
// without an in-flight accounting decision.
type InFlightTrackingClient struct {
grpc.Backend // embed for passthrough of untracked methods
registry InFlightTracker
nodeID string
modelName string
replicaIndex int
grpc.ControlBackend // passthrough for control-plane / streaming-constructor methods
inner grpc.InferenceBackend // tracked inference methods delegate here
registry InFlightTracker
nodeID string
modelName string
replicaIndex int
firstOnce sync.Once // guards onFirstComplete
onFirstComplete func() // called once after the first tracked inference call completes
}
// Compile-time contract: *InFlightTrackingClient must implement the FULL backend
// surface. Because it embeds only ControlBackend, this fails to compile if any
// InferenceBackend method is left unwrapped.
var _ grpc.Backend = (*InFlightTrackingClient)(nil)
// NewInFlightTrackingClient wraps a gRPC backend client with in-flight tracking.
func NewInFlightTrackingClient(inner grpc.Backend, registry InFlightTracker, nodeID, modelName string, replicaIndex int) *InFlightTrackingClient {
return &InFlightTrackingClient{
Backend: inner,
registry: registry,
nodeID: nodeID,
modelName: modelName,
replicaIndex: replicaIndex,
ControlBackend: inner,
inner: inner,
registry: registry,
nodeID: nodeID,
modelName: modelName,
replicaIndex: replicaIndex,
}
}
@@ -91,154 +106,162 @@ func (c *InFlightTrackingClient) reconcile(err error) error {
func (c *InFlightTrackingClient) Predict(ctx context.Context, in *pb.PredictOptions, opts ...ggrpc.CallOption) (*pb.Reply, error) {
defer c.track(ctx)()
reply, err := c.Backend.Predict(ctx, in, opts...)
reply, err := c.inner.Predict(ctx, in, opts...)
return reply, c.reconcile(err)
}
func (c *InFlightTrackingClient) PredictStream(ctx context.Context, in *pb.PredictOptions, f func(reply *pb.Reply), opts ...ggrpc.CallOption) error {
defer c.track(ctx)()
return c.reconcile(c.Backend.PredictStream(ctx, in, f, opts...))
return c.reconcile(c.inner.PredictStream(ctx, in, f, opts...))
}
func (c *InFlightTrackingClient) Embeddings(ctx context.Context, in *pb.PredictOptions, opts ...ggrpc.CallOption) (*pb.EmbeddingResult, error) {
defer c.track(ctx)()
res, err := c.Backend.Embeddings(ctx, in, opts...)
res, err := c.inner.Embeddings(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) GenerateImage(ctx context.Context, in *pb.GenerateImageRequest, opts ...ggrpc.CallOption) (*pb.Result, error) {
defer c.track(ctx)()
res, err := c.Backend.GenerateImage(ctx, in, opts...)
res, err := c.inner.GenerateImage(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) GenerateVideo(ctx context.Context, in *pb.GenerateVideoRequest, opts ...ggrpc.CallOption) (*pb.Result, error) {
defer c.track(ctx)()
res, err := c.Backend.GenerateVideo(ctx, in, opts...)
res, err := c.inner.GenerateVideo(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) TTS(ctx context.Context, in *pb.TTSRequest, opts ...ggrpc.CallOption) (*pb.Result, error) {
defer c.track(ctx)()
res, err := c.Backend.TTS(ctx, in, opts...)
res, err := c.inner.TTS(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) TTSStream(ctx context.Context, in *pb.TTSRequest, f func(reply *pb.Reply), opts ...ggrpc.CallOption) error {
defer c.track(ctx)()
return c.reconcile(c.Backend.TTSStream(ctx, in, f, opts...))
return c.reconcile(c.inner.TTSStream(ctx, in, f, opts...))
}
func (c *InFlightTrackingClient) SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...ggrpc.CallOption) (*pb.Result, error) {
defer c.track(ctx)()
res, err := c.Backend.SoundGeneration(ctx, in, opts...)
res, err := c.inner.SoundGeneration(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...ggrpc.CallOption) (*pb.TranscriptResult, error) {
defer c.track(ctx)()
res, err := c.Backend.AudioTranscription(ctx, in, opts...)
res, err := c.inner.AudioTranscription(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...ggrpc.CallOption) error {
defer c.track(ctx)()
return c.reconcile(c.Backend.AudioTranscriptionStream(ctx, in, f, opts...))
return c.reconcile(c.inner.AudioTranscriptionStream(ctx, in, f, opts...))
}
func (c *InFlightTrackingClient) Detect(ctx context.Context, in *pb.DetectOptions, opts ...ggrpc.CallOption) (*pb.DetectResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.Detect(ctx, in, opts...)
res, err := c.inner.Detect(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) Depth(ctx context.Context, in *pb.DepthRequest, opts ...ggrpc.CallOption) (*pb.DepthResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.Depth(ctx, in, opts...)
res, err := c.inner.Depth(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) Rerank(ctx context.Context, in *pb.RerankRequest, opts ...ggrpc.CallOption) (*pb.RerankResult, error) {
defer c.track(ctx)()
res, err := c.Backend.Rerank(ctx, in, opts...)
res, err := c.inner.Rerank(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) VAD(ctx context.Context, in *pb.VADRequest, opts ...ggrpc.CallOption) (*pb.VADResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.VAD(ctx, in, opts...)
res, err := c.inner.VAD(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) Diarize(ctx context.Context, in *pb.DiarizeRequest, opts ...ggrpc.CallOption) (*pb.DiarizeResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.Diarize(ctx, in, opts...)
res, err := c.inner.Diarize(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...ggrpc.CallOption) (*pb.FaceVerifyResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.FaceVerify(ctx, in, opts...)
res, err := c.inner.FaceVerify(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) FaceAnalyze(ctx context.Context, in *pb.FaceAnalyzeRequest, opts ...ggrpc.CallOption) (*pb.FaceAnalyzeResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.FaceAnalyze(ctx, in, opts...)
res, err := c.inner.FaceAnalyze(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) VoiceVerify(ctx context.Context, in *pb.VoiceVerifyRequest, opts ...ggrpc.CallOption) (*pb.VoiceVerifyResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.VoiceVerify(ctx, in, opts...)
res, err := c.inner.VoiceVerify(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) VoiceAnalyze(ctx context.Context, in *pb.VoiceAnalyzeRequest, opts ...ggrpc.CallOption) (*pb.VoiceAnalyzeResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.VoiceAnalyze(ctx, in, opts...)
res, err := c.inner.VoiceAnalyze(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) VoiceEmbed(ctx context.Context, in *pb.VoiceEmbedRequest, opts ...ggrpc.CallOption) (*pb.VoiceEmbedResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.VoiceEmbed(ctx, in, opts...)
res, err := c.inner.VoiceEmbed(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) TokenClassify(ctx context.Context, in *pb.TokenClassifyRequest, opts ...ggrpc.CallOption) (*pb.TokenClassifyResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.TokenClassify(ctx, in, opts...)
res, err := c.inner.TokenClassify(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) Score(ctx context.Context, in *pb.ScoreRequest, opts ...ggrpc.CallOption) (*pb.ScoreResponse, error) {
defer c.track(ctx)()
res, err := c.Backend.Score(ctx, in, opts...)
res, err := c.inner.Score(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) SoundDetection(ctx context.Context, in *pb.SoundDetectionRequest, opts ...ggrpc.CallOption) (*pb.SoundDetectionResponse, error) {
defer c.track(ctx)()
res, err := c.inner.SoundDetection(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) AudioEncode(ctx context.Context, in *pb.AudioEncodeRequest, opts ...ggrpc.CallOption) (*pb.AudioEncodeResult, error) {
defer c.track(ctx)()
res, err := c.Backend.AudioEncode(ctx, in, opts...)
res, err := c.inner.AudioEncode(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) AudioDecode(ctx context.Context, in *pb.AudioDecodeRequest, opts ...ggrpc.CallOption) (*pb.AudioDecodeResult, error) {
defer c.track(ctx)()
res, err := c.Backend.AudioDecode(ctx, in, opts...)
res, err := c.inner.AudioDecode(ctx, in, opts...)
return res, c.reconcile(err)
}
func (c *InFlightTrackingClient) AudioTransform(ctx context.Context, in *pb.AudioTransformRequest, opts ...ggrpc.CallOption) (*pb.AudioTransformResult, error) {
defer c.track(ctx)()
res, err := c.Backend.AudioTransform(ctx, in, opts...)
res, err := c.inner.AudioTransform(ctx, in, opts...)
return res, c.reconcile(err)
}
// AudioTransformStream, AudioToAudioStream and Forward are deliberately left as
// embedded passthrough: they return a stream client and the inference spans the
// stream's lifetime, not the constructor call. Wrapping the constructor with
// track() would increment and immediately decrement (and fire onFirstComplete)
// before any audio flows. Tracking those correctly needs the done() func tied to
// stream close, which the current Backend interface doesn't surface here.
// AudioTransformStream, AudioToAudioStream and Forward live in grpc.ControlBackend
// and are passed through via the embedded field, NOT tracked: they return a stream
// client and the inference spans the stream's lifetime, not the constructor call.
// Wrapping the constructor with track() would increment and immediately decrement
// (and fire onFirstComplete) before any audio flows. Tracking those correctly needs
// the done() func tied to stream close, which the Backend interface doesn't surface
// here. If they ever need tracking, move them to grpc.InferenceBackend - the build
// will then force an explicit wrapper here.

View File

@@ -408,6 +408,13 @@ var _ = Describe("InFlightTrackingClient", func() {
return err
})
})
It("SoundDetection", func() {
assertTracked(func() error {
_, err := client.SoundDetection(context.Background(), &pb.SoundDetectionRequest{})
return err
})
})
})
Describe("stale model reload (self-heal)", func() {

View File

@@ -185,6 +185,13 @@ It is persisted through `POST /api/settings` and read live, so a change takes
effect on the next request without a restart. A default that names a model no
longer loaded still appears (marked *not loaded*) so it can be toggled off.
The default set can also be supplied out-of-band with the
`LOCALAI_PII_DEFAULT_DETECTORS` environment variable (comma-separated model
names, e.g. `privacy-filter-nemotron,secret-filter`). When set it takes
precedence over the value persisted via the UI (env > file), which is the
right behaviour for immutable container deployments that pin filtering policy
at boot rather than via the admin UI.
This is what makes `cloud-proxy` / MITM redaction work out of the box: those
backends default to PII-enabled but ship no detector list, so without a
default detector the filter runs with nothing to scan. Set one here and

View File

@@ -1,4 +1,54 @@
---
- name: "lfm2.5-1.2b-instruct"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
description: |
Try LFM • Docs • LEAP • Discord
# LFM2.5-1.2B-Instruct
LFM2.5 is a new family of hybrid models designed for **on-device deployment**. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.
- **Best-in-class performance**: A 1.2B model rivaling much larger models, bringing high-quality AI to your pocket.
- **Fast edge inference**: 239 tok/s decode on AMD CPU, 82 tok/s on mobile NPU. Runs under 1GB of memory with day-one support for llama.cpp, MLX, and vLLM.
- **Scaled training**: Extended pre-training from 10T to 28T tokens and large-scale multi-stage reinforcement learning.
Find more information about LFM2.5 in our blog post.
## 🗒️ Model Details
LFM2.5-1.2B-Instruct is a general-purpose text-only model with the following features:
...
license: "other"
tags:
- llm
- gguf
icon: https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/dxnYF2fuLpulismtFSGFi.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
min_p: 0.15
model: llama-cpp/models/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q4_K_M.gguf
repeat_penalty: 1.05
temperature: 0.1
top_k: 50
top_p: 0.1
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q4_K_M.gguf
sha256: b1b3de114215d9507409a662a501a631095a479a419584e8a2ded6304b19b4f5
uri: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF/resolve/main/LFM2.5-1.2B-Instruct-Q4_K_M.gguf
- name: "qwopus3.6-27b-coder-compat-mtp"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:

View File

@@ -41,11 +41,34 @@ func buildClient(address string, parallel bool, wd WatchDog, enableWatchDog bool
}
}
// Backend is the full client surface of a model backend. It is deliberately
// composed of two sub-interfaces so that wrappers can get a COMPILE-TIME
// guarantee about which methods they must account for:
//
// - InferenceBackend - methods that each perform one discrete inference call
// (the call begins on entry and ends on return). A wrapper that does
// per-call accounting - e.g. the distributed router's in-flight tracker,
// core/services/nodes.InFlightTrackingClient - embeds only ControlBackend
// and implements every InferenceBackend method explicitly. Adding a method
// to InferenceBackend therefore breaks that wrapper's build until it is
// implemented: inference can't be added without an accounting decision.
// - ControlBackend - everything that is NOT a discrete inference call:
// lifecycle/control-plane operations and the streaming constructors whose
// work spans the returned stream rather than the constructor call. These
// are safe to pass through untracked.
//
// Keep the two sets disjoint; every backend method belongs to exactly one.
type Backend interface {
IsBusy() bool
HealthCheck(ctx context.Context) (bool, error)
InferenceBackend
ControlBackend
}
// InferenceBackend is the subset of Backend whose methods each map to a single
// inference call. Wrappers that account for in-flight work must implement these
// explicitly (see Backend). Do NOT add methods that return a stream client or
// that are control-plane only - those belong in ControlBackend.
type InferenceBackend interface {
Embeddings(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.EmbeddingResult, error)
LoadModel(ctx context.Context, in *pb.ModelOptions, opts ...grpc.CallOption) (*pb.Result, error)
PredictStream(ctx context.Context, in *pb.PredictOptions, f func(reply *pb.Reply), opts ...grpc.CallOption) error
Predict(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.Reply, error)
GenerateImage(ctx context.Context, in *pb.GenerateImageRequest, opts ...grpc.CallOption) (*pb.Result, error)
@@ -53,6 +76,8 @@ type Backend interface {
TTS(ctx context.Context, in *pb.TTSRequest, opts ...grpc.CallOption) (*pb.Result, error)
TTSStream(ctx context.Context, in *pb.TTSRequest, f func(reply *pb.Reply), opts ...grpc.CallOption) error
SoundGeneration(ctx context.Context, in *pb.SoundGenerationRequest, opts ...grpc.CallOption) (*pb.Result, error)
AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...grpc.CallOption) error
Detect(ctx context.Context, in *pb.DetectOptions, opts ...grpc.CallOption) (*pb.DetectResponse, error)
Depth(ctx context.Context, in *pb.DepthRequest, opts ...grpc.CallOption) (*pb.DepthResponse, error)
FaceVerify(ctx context.Context, in *pb.FaceVerifyRequest, opts ...grpc.CallOption) (*pb.FaceVerifyResponse, error)
@@ -60,8 +85,25 @@ type Backend interface {
VoiceVerify(ctx context.Context, in *pb.VoiceVerifyRequest, opts ...grpc.CallOption) (*pb.VoiceVerifyResponse, error)
VoiceAnalyze(ctx context.Context, in *pb.VoiceAnalyzeRequest, opts ...grpc.CallOption) (*pb.VoiceAnalyzeResponse, error)
VoiceEmbed(ctx context.Context, in *pb.VoiceEmbedRequest, opts ...grpc.CallOption) (*pb.VoiceEmbedResponse, error)
AudioTranscription(ctx context.Context, in *pb.TranscriptRequest, opts ...grpc.CallOption) (*pb.TranscriptResult, error)
AudioTranscriptionStream(ctx context.Context, in *pb.TranscriptRequest, f func(chunk *pb.TranscriptStreamResponse), opts ...grpc.CallOption) error
Rerank(ctx context.Context, in *pb.RerankRequest, opts ...grpc.CallOption) (*pb.RerankResult, error)
TokenClassify(ctx context.Context, in *pb.TokenClassifyRequest, opts ...grpc.CallOption) (*pb.TokenClassifyResponse, error)
Score(ctx context.Context, in *pb.ScoreRequest, opts ...grpc.CallOption) (*pb.ScoreResponse, error)
VAD(ctx context.Context, in *pb.VADRequest, opts ...grpc.CallOption) (*pb.VADResponse, error)
Diarize(ctx context.Context, in *pb.DiarizeRequest, opts ...grpc.CallOption) (*pb.DiarizeResponse, error)
SoundDetection(ctx context.Context, in *pb.SoundDetectionRequest, opts ...grpc.CallOption) (*pb.SoundDetectionResponse, error)
AudioEncode(ctx context.Context, in *pb.AudioEncodeRequest, opts ...grpc.CallOption) (*pb.AudioEncodeResult, error)
AudioDecode(ctx context.Context, in *pb.AudioDecodeRequest, opts ...grpc.CallOption) (*pb.AudioDecodeResult, error)
AudioTransform(ctx context.Context, in *pb.AudioTransformRequest, opts ...grpc.CallOption) (*pb.AudioTransformResult, error)
}
// ControlBackend is the subset of Backend that is NOT per-call inference:
// lifecycle/control-plane operations and the streaming constructors whose work
// spans the returned stream rather than the constructor call. In-flight-tracking
// wrappers embed this directly and pass it through untracked (see Backend).
type ControlBackend interface {
IsBusy() bool
HealthCheck(ctx context.Context) (bool, error)
LoadModel(ctx context.Context, in *pb.ModelOptions, opts ...grpc.CallOption) (*pb.Result, error)
TokenizeString(ctx context.Context, in *pb.PredictOptions, opts ...grpc.CallOption) (*pb.TokenizationResponse, error)
Status(ctx context.Context) (*pb.StatusResponse, error)
@@ -70,24 +112,11 @@ type Backend interface {
StoresGet(ctx context.Context, in *pb.StoresGetOptions, opts ...grpc.CallOption) (*pb.StoresGetResult, error)
StoresFind(ctx context.Context, in *pb.StoresFindOptions, opts ...grpc.CallOption) (*pb.StoresFindResult, error)
Rerank(ctx context.Context, in *pb.RerankRequest, opts ...grpc.CallOption) (*pb.RerankResult, error)
TokenClassify(ctx context.Context, in *pb.TokenClassifyRequest, opts ...grpc.CallOption) (*pb.TokenClassifyResponse, error)
Score(ctx context.Context, in *pb.ScoreRequest, opts ...grpc.CallOption) (*pb.ScoreResponse, error)
GetTokenMetrics(ctx context.Context, in *pb.MetricsRequest, opts ...grpc.CallOption) (*pb.MetricsResponse, error)
VAD(ctx context.Context, in *pb.VADRequest, opts ...grpc.CallOption) (*pb.VADResponse, error)
Diarize(ctx context.Context, in *pb.DiarizeRequest, opts ...grpc.CallOption) (*pb.DiarizeResponse, error)
SoundDetection(ctx context.Context, in *pb.SoundDetectionRequest, opts ...grpc.CallOption) (*pb.SoundDetectionResponse, error)
AudioEncode(ctx context.Context, in *pb.AudioEncodeRequest, opts ...grpc.CallOption) (*pb.AudioEncodeResult, error)
AudioDecode(ctx context.Context, in *pb.AudioDecodeRequest, opts ...grpc.CallOption) (*pb.AudioDecodeResult, error)
AudioTransform(ctx context.Context, in *pb.AudioTransformRequest, opts ...grpc.CallOption) (*pb.AudioTransformResult, error)
// Streaming constructors: these return a stream client immediately; the
// actual inference spans the stream's lifetime, not this call, so they are
// NOT tracked as a single in-flight unit.
AudioTransformStream(ctx context.Context, opts ...grpc.CallOption) (AudioTransformStreamClient, error)
AudioToAudioStream(ctx context.Context, opts ...grpc.CallOption) (AudioToAudioStreamClient, error)