Verified against the real rocm/dev-ubuntu-24.04:7.2.1 image with
hipblas-dev/hipblaslt-dev/rocblas-dev installed: /usr/share/libdrm/amdgpu.ids
is owned by libdrm-common, not libdrm-amdgpu1 as the comment said.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
ROCm's bundled libdrm_amdgpu looks up the GPU ASIC ID table at a
hardcoded fallback path, /opt/amdgpu/share/libdrm/amdgpu.ids, which is
only populated by AMD's full amdgpu-install (graphics/DKMS) stack. The
hipblas image is compute-only and doesn't have it, so every model load
logs "No such file or directory" and the GPU can't be identified.
Symlink it to the equivalent file already shipped by Ubuntu's
libdrm-amdgpu1 package.
Fixes #10624
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The launcher starts the server with run --models-path/--backends-path but
leaves --data-path and the dynamic config dir unset, so the server falls
back to its /data and /configuration defaults. The base for those
defaults is kong.ExpandPath("."), i.e. the launcher process CWD
(commonly the user's home root), producing ~/data and ~/configuration
outside ~/.localai and an agent-pool stateDir under ~/data.
Pass --data-path and --localai-config-dir explicitly, rooted at the
launcher's own data directory (GetDataPath() -> ~/.localai), so data and
config stay consistent with --models-path/--backends-path.
* fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602)
When the watchdog evicts a model, deleteProcess calls the backend's gRPC
Free() to release VRAM before stopping the process. Free is optional:
backends that don't override it -- the generated UnimplementedBackendServer
stub, many Python/external backends, or a federation proxy in distributed
mode -- return gRPC Unimplemented. That is expected, not a failure: VRAM is
reclaimed when the local process is stopped, or by the remote unloader for
remote backends. Logging it as "WARN Error freeing GPU resources" made a
benign, optional RPC look like a fault (the alarming line in #10602, seen
in distributed mode where the model is remote and Free hits a stub).
Treat gRPC Unimplemented from Free() as a no-op logged at Debug; genuine
failures still Warn. Free() is still attempted for every backend, so any
backend that does implement it is unaffected.
Add a reusable grpcerrors.IsUnimplemented helper following the package's
existing code-based detection idiom (prefer the typed status code, fall
back to the message across non-gRPC boundaries), with table tests.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
* fix(watchdog): log a non-Unimplemented Free() failure at error level
Per review: now that the expected gRPC Unimplemented case is split out and
logged at Debug, any remaining Free() error is a genuine failure to release
VRAM, so surface it at error level instead of warn.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
---------
Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
The Q4_K_M quant degraded tool-call reliability for LFM2.5-8B-A1B.
Switch the gallery entry to the Q8_0 GGUF (sha256 verified via HF
x-linked-etag) while keeping the native jinja tool-parsing config.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
The backend.proto AudioTranscriptionLive bidirectional streaming RPC added
new required trait items (AudioTranscriptionLiveStream + audio_transcription_live)
on the generated Backend trait. The kokoros (TTS) backend did not implement
them, breaking its release build with E0046 (missing trait items).
kokoros is text-to-speech and has no live-ASR support, so stub the method to
return UNIMPLEMENTED, mirroring the existing audio_transcription_stream stub.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-30 19:38:41 +02:00
16 changed files with 91 additions and 16 deletions