fix(hipblas): correct amdgpu.ids source package name in comment

Verified against the real rocm/dev-ubuntu-24.04:7.2.1 image with hipblas-dev/hipblaslt-dev/rocblas-dev installed: /usr/share/libdrm/amdgpu.ids is owned by libdrm-common, not libdrm-amdgpu1 as the comment said.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
fix(hipblas): symlink amdgpu.ids so ROCm backends find the ASIC ID table
16 changed files with 91 additions and 16 deletions
--- a/11
+++ b/11
@@ -171,6 +171,17 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
    ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
    ; fi

+# ROCm's bundled libdrm_amdgpu is built with a hardcoded fallback lookup path
+# for the ASIC ID table (/opt/amdgpu/share/libdrm/amdgpu.ids), which only exists
+# if AMD's full amdgpu graphics/DKMS stack is installed. This compute-only image
+# doesn't have it, so hipblas/rocBLAS log "No such file or directory" on every
+# model load and can fail to identify the GPU. Point it at the equivalent file
+# Ubuntu's libdrm-common package already ships.
+RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ -f /usr/share/libdrm/amdgpu.ids ] && [ ! -e /opt/amdgpu/share/libdrm/amdgpu.ids ]; then \
+    mkdir -p /opt/amdgpu/share/libdrm && \
+    ln -s /usr/share/libdrm/amdgpu.ids /opt/amdgpu/share/libdrm/amdgpu.ids \
+    ; fi
+
 RUN expr "${BUILD_TYPE}" = intel && echo "intel" > /run/localai/capability || echo "not intel"

 # Cuda
--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=f74a6fb87b315b2c3154166e075360e15021a61d
+IK_LLAMA_VERSION?=29431b31c89e79c10f8736e8f2742485ba1713d6
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=6f4f53f2b7da54fcdbbecaaa734337c337ad6176
+LLAMA_VERSION?=0eca4d490e591d4e93058d07540cf47278a72577
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/go/crispasr/Makefile
+++ b/backend/go/crispasr/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # CrispASR version (release tag)
 CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=3b93758f9725d400eca82976f895e4cec3f31260
+CRISPASR_VERSION?=8fd9db8fec8cb5e929d23d3267ed5817794feb1a
 SO_TARGET?=libgocrispasr.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=3b6c9ca97cfcda8e68e719e6670d06379fcbe943
+STABLEDIFFUSION_GGML_VERSION?=484baa41e5e006c52dcd4addc38c830b9489745f

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

--- a/backend/go/stablediffusion-ggml/cpp/gosd.cpp
+++ b/backend/go/stablediffusion-ggml/cpp/gosd.cpp
@@ -798,6 +798,7 @@ void sd_img_gen_params_set_seed(sd_img_gen_params_t *params, int64_t seed) {
 int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, char *src_image, float strength, char *mask_image, char* ref_images[], int ref_images_count) {

    sd_image_t* results;
+    int num_results_out = 0;

    std::vector<int> skip_layers = {7, 8, 9};

@@ -994,10 +995,14 @@ int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, cha
            sd_ctx_params_to_str(&ctx_params),
            sd_img_gen_params_to_str(p));

-    results = generate_image(sd_c, p);
+    bool gen_ok = generate_image(sd_c, p, &results, &num_results_out);

    std::free(p);

+    if (!gen_ok || num_results_out == 0) {
+        results = NULL;
+    }
+
    if (results == NULL) {
        fprintf (stderr, "NO results\n");
        if (input_image_buffer) free(input_image_buffer);
--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=0ae02cdb2c7317b50991367c165736ce42ed96ac
+WHISPER_CPP_VERSION?=0874de3e8e8e48361dba85c7fe6d176f008bf158
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/python/vllm/install.sh
+++ b/backend/python/vllm/install.sh
@@ -104,7 +104,7 @@ if [ "$(uname -s)" = "Darwin" ]; then
    # can rewrite it. Darwin therefore follows vllm-metal and can lag the Linux
    # vllm pin (requirements-cublas13-after.txt, bumped independently against
    # vllm/vllm) until vllm-metal supports a newer vLLM.
-    VLLM_METAL_VERSION="v0.3.0.dev20260628073537"
+    VLLM_METAL_VERSION="v0.3.0.dev20260630095652"

    # The coupled vLLM source version is whatever this vllm-metal release builds
    # against -- it declares it in its own installer as `vllm_v=`. Derive it from
--- a/backend/python/vllm/requirements-cublas13-after.txt
+++ b/backend/python/vllm/requirements-cublas13-after.txt
@@ -3,8 +3,8 @@
 # on a cu130 host. Pull the cu130-flavoured wheel from vLLM's per-tag index
 # instead — the cublas13 case in install.sh adds --index-strategy=unsafe-best-match
 # so uv consults this index alongside PyPI.
--extra-index-url https://wheels.vllm.ai/0.23.0/cu130
+--extra-index-url https://wheels.vllm.ai/0.24.0/cu130
 # VERSION COUPLING: darwin/Apple-Silicon builds use vllm-metal (see install.sh),
 # which pins this exact vLLM version. Bumping vllm here means coordinating with a
 # vllm-metal release that supports the new version, or macOS/Metal builds break.
-vllm==0.23.0
+vllm==0.24.0
--- a/backend/rust/kokoros/src/service.rs
+++ b/backend/rust/kokoros/src/service.rs
@@ -351,6 +351,16 @@ impl Backend for KokorosService {
        Err(Status::unimplemented("Not supported"))
    }

+    type AudioTranscriptionLiveStream =
+        ReceiverStream<Result<backend::TranscriptLiveResponse, Status>>;
+
+    async fn audio_transcription_live(
+        &self,
+        _: Request<tonic::Streaming<backend::TranscriptLiveRequest>>,
+    ) -> Result<Response<Self::AudioTranscriptionLiveStream>, Status> {
+        Err(Status::unimplemented("Not supported"))
+    }
+
    async fn diarize(
        &self,
        _: Request<backend::DiarizeRequest>,
--- a/cmd/launcher/internal/launcher.go
+++ b/cmd/launcher/internal/launcher.go
@@ -207,12 +207,20 @@ func (l *Launcher) StartLocalAI() error {
 	}

 	// Build command arguments
+	dataPath := l.GetDataPath()
 	args := []string{
 		"run",
 		"--models-path", l.config.ModelsPath,
 		"--backends-path", l.config.BackendsPath,
 		"--address", l.config.Address,
 		"--log-level", l.config.LogLevel,
+		// Keep persistent data and dynamic config under the launcher's data
+		// directory (~/.localai) rather than letting the server resolve them
+		// to ${basepath}/{data,configuration}. ${basepath} expands to the
+		// launcher process's CWD (often the user's home root), which puts
+		// ~/data and ~/configuration outside ~/.localai. See #10610.
+		"--data-path", filepath.Join(dataPath, "data"),
+		"--localai-config-dir", filepath.Join(dataPath, "configuration"),
 	}

 	l.localaiCmd = exec.CommandContext(l.ctx, binaryPath, args...)
--- a/docs/data/version.json
+++ b/docs/data/version.json
@@ -1,3 +1,3 @@
 {
-  "version": "v4.5.5"
+  "version": "v4.5.6"
 }
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1716,7 +1716,7 @@
      - use_jinja:true
    parameters:
      min_p: 0.15
-      model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
+      model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
      repeat_penalty: 1.05
      temperature: 0.1
      top_k: 50
@@ -1724,9 +1724,9 @@
    template:
      use_tokenizer_template: true
  files:
-    - filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
-      uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q4_K_M.gguf
-      sha256: 4923ec14f06b968b74d663e5949867d2d9c3bf13a20b8be1a9f9af39989b2bb0
+    - filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
+      uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q8_0.gguf
+      sha256: 33ab3b8ce6a964fb8ebac89360c9b3cf72c4fa418d5e4c0a94d46883124d5c02
 - name: "qwopus3.5-9b-coder-mtp"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
--- a/pkg/grpc/grpcerrors/errors.go
+++ b/pkg/grpc/grpcerrors/errors.go
@@ -58,6 +58,23 @@ func IsLiveTranscriptionUnsupported(err error) bool {
 	return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
 }

+// IsUnimplemented reports whether err is a gRPC Unimplemented status — the
+// signal a backend gives for an RPC it does not implement. The generated
+// UnimplementedBackendServer stub returns exactly this for any RPC a backend
+// (e.g. a Python or external backend) has not overridden, so callers can treat
+// an optional RPC as a no-op rather than a failure. Prefers the typed status
+// code and falls back to the message for paths that lose the status (e.g. errors
+// wrapped across non-gRPC boundaries).
+func IsUnimplemented(err error) bool {
+	if err == nil {
+		return false
+	}
+	if status.Code(err) == codes.Unimplemented {
+		return true
+	}
+	return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
+}
+
 // StreamTranscriptionUnsupported returns the canonical error a backend returns
 // when it (or the loaded model) cannot serve the server-streaming
 // AudioTranscriptionStream RPC. It carries codes.Unimplemented like the live
--- a/pkg/grpc/grpcerrors/errors_test.go
+++ b/pkg/grpc/grpcerrors/errors_test.go
@@ -55,6 +55,18 @@ var _ = Describe("grpcerrors", func() {
 		Expect(grpcerrors.IsModelNotLoaded(err)).To(BeFalse())
 	})

+	DescribeTable("IsUnimplemented",
+		func(err error, want bool) {
+			Expect(grpcerrors.IsUnimplemented(err)).To(Equal(want))
+		},
+		Entry("nil", nil, false),
+		Entry("typed code", status.Error(codes.Unimplemented, "method Free not implemented"), true),
+		Entry("stale stub message (Unknown code)", errors.New("rpc error: code = Unimplemented desc = "), true),
+		Entry("unrelated error", errors.New("context deadline exceeded"), false),
+		Entry("unrelated grpc code", status.Error(codes.Unavailable, "connection refused"), false),
+		Entry("model not loaded is NOT unimplemented", grpcerrors.ModelNotLoaded("parakeet-cpp"), false),
+	)
+
 	It("StreamTranscriptionUnsupported carries Unimplemented and is not ModelNotLoaded", func() {
 		err := grpcerrors.StreamTranscriptionUnsupported("parakeet-cpp", "not a streaming model")
 		Expect(status.Code(err)).To(Equal(codes.Unimplemented))
--- a/pkg/model/process.go
+++ b/pkg/model/process.go
@@ -11,6 +11,7 @@ import (
 	"time"

 	"github.com/hpcloud/tail"
+	"github.com/mudler/LocalAI/pkg/grpc/grpcerrors"
 	"github.com/mudler/LocalAI/pkg/signals"
 	process "github.com/mudler/go-processmanager"
 	"github.com/mudler/xlog"
@@ -52,10 +53,21 @@ func (ml *ModelLoader) deleteProcess(s string) error {
 		hook(s)
 	}

-	// Free GPU resources before stopping the process to ensure VRAM is released
+	// Free GPU resources before stopping the process to ensure VRAM is released.
+	// Free is optional: backends that don't override it (the generated stub, many
+	// Python/external backends, or a federation proxy in distributed mode) return
+	// gRPC Unimplemented. That is expected, not a failure — VRAM is reclaimed when
+	// the process is stopped below, or by the remote unloader for remote backends —
+	// so don't surface it as an error.
 	xlog.Debug("Calling Free() to release GPU resources", "model", s)
 	if err := model.GRPC(false, ml.wd).Free(context.Background()); err != nil {
-		xlog.Warn("Error freeing GPU resources", "error", err, "model", s)
+		if grpcerrors.IsUnimplemented(err) {
+			xlog.Debug("Backend does not implement Free(); GPU release handled on process stop", "model", s)
+		} else {
+			// Now that the expected Unimplemented case is filtered out above, a
+			// remaining error is a genuine failure to release VRAM — surface it.
+			xlog.Error("Error freeing GPU resources", "error", err, "model", s)
+		}
 	}

 	process := model.Process()
Author	SHA1	Message	Date
Ettore Di Giacinto	5ad4d86ec4	fix(hipblas): correct amdgpu.ids source package name in comment Verified against the real rocm/dev-ubuntu-24.04:7.2.1 image with hipblas-dev/hipblaslt-dev/rocblas-dev installed: /usr/share/libdrm/amdgpu.ids is owned by libdrm-common, not libdrm-amdgpu1 as the comment said. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-01 20:16:57 +00:00
Ettore Di Giacinto	637e382f04	fix(hipblas): symlink amdgpu.ids so ROCm backends find the ASIC ID table ROCm's bundled libdrm_amdgpu looks up the GPU ASIC ID table at a hardcoded fallback path, /opt/amdgpu/share/libdrm/amdgpu.ids, which is only populated by AMD's full amdgpu-install (graphics/DKMS) stack. The hipblas image is compute-only and doesn't have it, so every model load logs "No such file or directory" and the GPU can't be identified. Symlink it to the equivalent file already shipped by Ubuntu's libdrm-amdgpu1 package. Fixes #10624 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-01 20:06:41 +00:00
LocalAI [bot]	703ea32de6	chore: ⬆️ Update vllm-metal (darwin) to `v0.3.0.dev20260630095652` (#10616 ) ⬆️ Update vllm-project/vllm-metal (darwin) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 21:56:59 +02:00
LocalAI [bot]	751db06e35	chore: ⬆️ Update CrispStrobe/CrispASR to `8fd9db8fec8cb5e929d23d3267ed5817794feb1a` (#10615 ) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 21:56:41 +02:00
LocalAI [bot]	f46c0e9c83	docs: ⬆️ update docs version mudler/LocalAI (#10614 ) ⬆️ Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 21:56:21 +02:00
LocalAI [bot]	0d8adfc59a	chore: ⬆️ Update ggml-org/llama.cpp to `0eca4d490e591d4e93058d07540cf47278a72577` (#10617 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 09:31:50 +02:00
LocalAI [bot]	43f2615e19	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.24.0` (#10618 ) ⬆️ Update vllm-project/vllm cu130 wheel Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:53:03 +02:00
LocalAI [bot]	875c539ad5	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `29431b31c89e79c10f8736e8f2742485ba1713d6` (#10620 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:52:36 +02:00
LocalAI [bot]	d641ded194	chore: ⬆️ Update ggml-org/whisper.cpp to `0874de3e8e8e48361dba85c7fe6d176f008bf158` (#10621 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:43:40 +02:00
LocalAI [bot]	40445fff05	chore: ⬆️ Update leejet/stable-diffusion.cpp to `484baa41e5e006c52dcd4addc38c830b9489745f` (#10619 ) * ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(stablediffusion-ggml): adapt to new generate_image() out-param signature leejet/stable-diffusion.cpp@484baa4 changed generate_image() from returning sd_image_t* to returning bool with images_out/num_images_out out-parameters (same pattern already used by generate_video()). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-01 08:32:57 +02:00
Tai An	057dee956a	fix(launcher): keep data/config under ~/.localai (#10610 ) (#10613 ) The launcher starts the server with run --models-path/--backends-path but leaves --data-path and the dynamic config dir unset, so the server falls back to its ${basepath}/data and ${basepath}/configuration defaults. ${basepath} is kong.ExpandPath("."), i.e. the launcher process CWD (commonly the user's home root), producing ~/data and ~/configuration outside ~/.localai and an agent-pool stateDir under ~/data. Pass --data-path and --localai-config-dir explicitly, rooted at the launcher's own data directory (GetDataPath() -> ~/.localai), so data and config stay consistent with --models-path/--backends-path.	2026-06-30 22:14:59 +02:00
Adira	4ec39bb776	fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602 ) (#10607 ) * fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602) When the watchdog evicts a model, deleteProcess calls the backend's gRPC Free() to release VRAM before stopping the process. Free is optional: backends that don't override it -- the generated UnimplementedBackendServer stub, many Python/external backends, or a federation proxy in distributed mode -- return gRPC Unimplemented. That is expected, not a failure: VRAM is reclaimed when the local process is stopped, or by the remote unloader for remote backends. Logging it as "WARN Error freeing GPU resources" made a benign, optional RPC look like a fault (the alarming line in #10602, seen in distributed mode where the model is remote and Free hits a stub). Treat gRPC Unimplemented from Free() as a no-op logged at Debug; genuine failures still Warn. Free() is still attempted for every backend, so any backend that does implement it is unaffected. Add a reusable grpcerrors.IsUnimplemented helper following the package's existing code-based detection idiom (prefer the typed status code, fall back to the message across non-gRPC boundaries), with table tests. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com> * fix(watchdog): log a non-Unimplemented Free() failure at error level Per review: now that the expected gRPC Unimplemented case is split out and logged at Debug, any remaining Free() error is a genuine failure to release VRAM, so surface it at error level instead of warn. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com> --------- Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-30 22:14:01 +02:00
Ettore Di Giacinto	25ecb9f015	fix(gallery): use Q8_0 for lfm2.5-8b-a1b to fix poor tool-call quality The Q4_K_M quant degraded tool-call reliability for LFM2.5-8B-A1B. Switch the gallery entry to the Q8_0 GGUF (sha256 verified via HF x-linked-etag) while keeping the native jinja tool-parsing config. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-30 17:46:20 +00:00
LocalAI [bot]	2be495f9c0	fix(kokoros): implement AudioTranscriptionLive trait stub (#10612 ) The backend.proto AudioTranscriptionLive bidirectional streaming RPC added new required trait items (AudioTranscriptionLiveStream + audio_transcription_live) on the generated Backend trait. The kokoros (TTS) backend did not implement them, breaking its release build with E0046 (missing trait items). kokoros is text-to-speech and has no live-ASR support, so stub the method to return UNIMPLEMENTED, mirroring the existing audio_transcription_stream stub. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 19:38:41 +02:00