fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602) (#10607)

* fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602)

When the watchdog evicts a model, deleteProcess calls the backend's gRPC
Free() to release VRAM before stopping the process. Free is optional:
backends that don't override it -- the generated UnimplementedBackendServer
stub, many Python/external backends, or a federation proxy in distributed
mode -- return gRPC Unimplemented. That is expected, not a failure: VRAM is
reclaimed when the local process is stopped, or by the remote unloader for
remote backends. Logging it as "WARN Error freeing GPU resources" made a
benign, optional RPC look like a fault (the alarming line in #10602, seen
in distributed mode where the model is remote and Free hits a stub).

Treat gRPC Unimplemented from Free() as a no-op logged at Debug; genuine
failures still Warn. Free() is still attempted for every backend, so any
backend that does implement it is unaffected.

Add a reusable grpcerrors.IsUnimplemented helper following the package's
existing code-based detection idiom (prefer the typed status code, fall
back to the message across non-gRPC boundaries), with table tests.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>

* fix(watchdog): log a non-Unimplemented Free() failure at error level

Per review: now that the expected gRPC Unimplemented case is split out and
logged at Debug, any remaining Free() error is a genuine failure to release
VRAM, so surface it at error level instead of warn.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>

---------

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
Adira
2026-06-30 23:14:01 +03:00
committed by GitHub
parent 25ecb9f015
commit 4ec39bb776
3 changed files with 43 additions and 2 deletions


@@ -11,6 +11,7 @@ import (
 	"time"
 
 	"github.com/hpcloud/tail"
+	"github.com/mudler/LocalAI/pkg/grpc/grpcerrors"
 	"github.com/mudler/LocalAI/pkg/signals"
 	process "github.com/mudler/go-processmanager"
 	"github.com/mudler/xlog"
@@ -52,10 +53,21 @@ func (ml *ModelLoader) deleteProcess(s string) error {
 		hook(s)
 	}
 
-	// Free GPU resources before stopping the process to ensure VRAM is released
+	// Free GPU resources before stopping the process to ensure VRAM is released.
+	// Free is optional: backends that don't override it (the generated stub, many
+	// Python/external backends, or a federation proxy in distributed mode) return
+	// gRPC Unimplemented. That is expected, not a failure — VRAM is reclaimed when
+	// the process is stopped below, or by the remote unloader for remote backends —
+	// so don't surface it as an error.
 	xlog.Debug("Calling Free() to release GPU resources", "model", s)
 	if err := model.GRPC(false, ml.wd).Free(context.Background()); err != nil {
-		xlog.Warn("Error freeing GPU resources", "error", err, "model", s)
+		if grpcerrors.IsUnimplemented(err) {
+			xlog.Debug("Backend does not implement Free(); GPU release handled on process stop", "model", s)
+		} else {
+			// Now that the expected Unimplemented case is filtered out above, a
+			// remaining error is a genuine failure to release VRAM — surface it.
+			xlog.Error("Error freeing GPU resources", "error", err, "model", s)
+		}
 	}
 
 	process := model.Process()