chore(deps): bump the pip group across 5 directories with 1 update

Bumps the pip group with 1 update in the /backend/python/ace-step directory: torch.
Bumps the pip group with 1 update in the /backend/python/rfdetr directory: torch.
Bumps the pip group with 1 update in the /backend/python/sglang directory: torch.
Bumps the pip group with 1 update in the /backend/python/trl directory: torch.
Bumps the pip group with 1 update in the /backend/python/vllm-omni directory: torch.

Updates `torch` from 2.10.0+rocm7.0 to 2.12.0+cpu
Updates `torch` from 2.7.1 to 2.12.0+cu130
Updates `torch` from 2.9.0 to 2.12.0+cpu
Updates `torch` from 2.10.0 to 2.12.0+cpu
Updates `torch` from 2.7.0 to 2.12.0+cu130

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.12.0+cpu
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cu130
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cpu
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cpu
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cu130
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
chore: ⬆️ Update ggml-org/llama.cpp to 0eca4d490e591d4e93058d07540cf47278a72577 (#10617)
2026-07-01 20:07:18 -04:00 · 2026-07-01 18:55:56 +00:00 · 2026-07-01 09:31:50 +02:00 · 2026-07-01 08:53:03 +02:00 · 2026-07-01 08:52:36 +02:00 · 2026-07-01 08:43:40 +02:00
31 changed files with 101 additions and 37 deletions
--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=f74a6fb87b315b2c3154166e075360e15021a61d
+IK_LLAMA_VERSION?=29431b31c89e79c10f8736e8f2742485ba1713d6
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=6f4f53f2b7da54fcdbbecaaa734337c337ad6176
+LLAMA_VERSION?=0eca4d490e591d4e93058d07540cf47278a72577
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=3b6c9ca97cfcda8e68e719e6670d06379fcbe943
+STABLEDIFFUSION_GGML_VERSION?=484baa41e5e006c52dcd4addc38c830b9489745f

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

--- a/backend/go/stablediffusion-ggml/cpp/gosd.cpp
+++ b/backend/go/stablediffusion-ggml/cpp/gosd.cpp
@@ -798,6 +798,7 @@ void sd_img_gen_params_set_seed(sd_img_gen_params_t *params, int64_t seed) {
 int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, char *src_image, float strength, char *mask_image, char* ref_images[], int ref_images_count) {

    sd_image_t* results;
+    int num_results_out = 0;

    std::vector<int> skip_layers = {7, 8, 9};

@@ -994,10 +995,14 @@ int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, cha
            sd_ctx_params_to_str(&ctx_params),
            sd_img_gen_params_to_str(p));

-    results = generate_image(sd_c, p);
+    bool gen_ok = generate_image(sd_c, p, &results, &num_results_out);

    std::free(p);

+    if (!gen_ok || num_results_out == 0) {
+        results = NULL;
+    }
+
    if (results == NULL) {
        fprintf (stderr, "NO results\n");
        if (input_image_buffer) free(input_image_buffer);
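The adapted gen_image() treats two upstream outcomes the same way: generate_image() returning false, and a successful call that reports zero images through num_results_out. Both collapse into the existing results == NULL error path. A minimal Go sketch of that normalization, assuming a hypothetical Image type standing in for sd_image_t and a hypothetical normalizeResults helper (neither exists in LocalAI or stable-diffusion.cpp):

```go
package main

import "fmt"

// Image is a hypothetical stand-in for stable-diffusion.cpp's sd_image_t.
type Image struct {
	Width, Height int
}

// normalizeResults mirrors the check added to gen_image(): a false return
// value or an empty result count both mean "no results" (nil).
// Hypothetical helper for illustration only.
func normalizeResults(ok bool, numResults int, results []Image) []Image {
	if !ok || numResults == 0 {
		return nil
	}
	return results
}

func main() {
	imgs := normalizeResults(true, 1, []Image{{Width: 512, Height: 512}})
	fmt.Println(len(imgs))                               // 1
	fmt.Println(normalizeResults(false, 1, imgs) == nil) // true
}
```

Checking the count as well as the boolean guards against an upstream success path that yields no images, which would otherwise reach the PNG-writing code with an empty array.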
--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=0ae02cdb2c7317b50991367c165736ce42ed96ac
+WHISPER_CPP_VERSION?=0874de3e8e8e48361dba85c7fe6d176f008bf158
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/python/ace-step/requirements-cpu.txt
+++ b/backend/python/ace-step/requirements-cpu.txt
@@ -4,7 +4,7 @@ torchaudio
 torchvision

 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
 diffusers
 gradio
 matplotlib>=3.7.5
--- a/backend/python/ace-step/requirements-cublas12.txt
+++ b/backend/python/ace-step/requirements-cublas12.txt
@@ -4,7 +4,7 @@ torchaudio
 torchvision

 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
 diffusers
 gradio>=6.5.1
 matplotlib>=3.7.5
--- a/backend/python/ace-step/requirements-cublas13.txt
+++ b/backend/python/ace-step/requirements-cublas13.txt
@@ -4,7 +4,7 @@ torchaudio
 torchvision

 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
 diffusers
 gradio>=6.5.1
 matplotlib>=3.7.5
--- a/backend/python/ace-step/requirements-hipblas.txt
+++ b/backend/python/ace-step/requirements-hipblas.txt
@@ -1,10 +1,10 @@
 --extra-index-url https://download.pytorch.org/whl/rocm7.0
-torch==2.10.0+rocm7.0
+torch==2.12.0+cpu
 torchaudio
 torchvision

 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
 diffusers
 gradio>=6.5.1
 matplotlib>=3.7.5
--- a/backend/python/ace-step/requirements-intel.txt
+++ b/backend/python/ace-step/requirements-intel.txt
@@ -4,7 +4,7 @@ torchaudio
 torchvision

 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
 diffusers
 gradio
 matplotlib>=3.7.5
--- a/backend/python/ace-step/requirements-l4t13.txt
+++ b/backend/python/ace-step/requirements-l4t13.txt
@@ -3,7 +3,7 @@ torch
 torchaudio
 torchvision
 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
 diffusers
 gradio>=6.5.1
 matplotlib>=3.7.5
--- a/backend/python/ace-step/requirements-mps.txt
+++ b/backend/python/ace-step/requirements-mps.txt
@@ -3,7 +3,7 @@ torchaudio
 torchvision

 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
 diffusers
 gradio
 matplotlib>=3.7.5
--- a/backend/python/rfdetr/requirements-cpu.txt
+++ b/backend/python/rfdetr/requirements-cpu.txt
@@ -3,5 +3,5 @@ opencv-python
 accelerate
 peft
 inference
-torch==2.7.1
+torch==2.12.0+cu130
 optimum-quanto
--- a/backend/python/rfdetr/requirements-cublas12.txt
+++ b/backend/python/rfdetr/requirements-cublas12.txt
@@ -1,4 +1,4 @@
-torch==2.7.1
+torch==2.12.0+cu130
 rfdetr
 opencv-python
 accelerate
--- a/backend/python/rfdetr/requirements-cublas13.txt
+++ b/backend/python/rfdetr/requirements-cublas13.txt
@@ -1,5 +1,5 @@
 --extra-index-url https://download.pytorch.org/whl/cu130
-torch==2.9.1
+torch==2.12.0+cu130
 rfdetr
 opencv-python
 accelerate
--- a/backend/python/rfdetr/requirements-hipblas.txt
+++ b/backend/python/rfdetr/requirements-hipblas.txt
@@ -1,5 +1,5 @@
 --extra-index-url https://download.pytorch.org/whl/rocm7.0
-torch==2.10.0+rocm7.0
+torch==2.12.0+cu130
 torchvision==0.25.0+rocm7.0
 rfdetr
 opencv-python
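In this file the extra index serves the rocm7.0 flavor while torch is now pinned to the +cu130 local version (and torchvision stays on +rocm7.0); ace-step's hipblas file (rocm7.0 index, torch +cpu) and sglang's cublas12 file (cu128 index, torch +cpu) show the same kind of disagreement. A minimal Go sketch of a consistency check, assuming the convention that https://download.pytorch.org/whl/<flavor> serves wheels whose local version label is +<flavor>; torchFlavorMismatch is a hypothetical helper, not part of LocalAI's tooling:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// torchFlavorMismatch scans requirements text and reports whether the torch
// pin's local version label (e.g. "+cu130") disagrees with the flavor of a
// download.pytorch.org/whl/<flavor> extra index. Hypothetical helper.
func torchFlavorMismatch(requirements string) (indexFlavor, pinFlavor string, mismatch bool) {
	const indexPrefix = "--extra-index-url https://download.pytorch.org/whl/"
	sc := bufio.NewScanner(strings.NewReader(requirements))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, indexPrefix):
			indexFlavor = strings.TrimSuffix(strings.TrimPrefix(line, indexPrefix), "/")
		case strings.HasPrefix(line, "torch=="): // does not match torchvision/torchaudio
			if _, label, ok := strings.Cut(line, "+"); ok {
				pinFlavor = label
			}
		}
	}
	// Only flag a mismatch when both sides are present and disagree.
	mismatch = indexFlavor != "" && pinFlavor != "" && indexFlavor != pinFlavor
	return indexFlavor, pinFlavor, mismatch
}

func main() {
	req := "--extra-index-url https://download.pytorch.org/whl/rocm7.0\n" +
		"torch==2.12.0+cu130\n" +
		"torchvision==0.25.0+rocm7.0\n" +
		"rfdetr\n"
	idx, pin, bad := torchFlavorMismatch(req)
	fmt.Println(idx, pin, bad) // rocm7.0 cu130 true
}
```

A pin without a local label (for example torch==2.7.1) is not flagged, since nothing ties it to a specific index flavor.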
--- a/backend/python/rfdetr/requirements-mps.txt
+++ b/backend/python/rfdetr/requirements-mps.txt
@@ -1,4 +1,4 @@
-torch==2.7.1
+torch==2.12.0+cu130
 rfdetr
 opencv-python
 accelerate
--- a/backend/python/sglang/requirements-cpu.txt
+++ b/backend/python/sglang/requirements-cpu.txt
@@ -1,6 +1,6 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
 accelerate
-torch==2.9.0
+torch==2.12.0+cpu
 torchvision
 torchaudio
 transformers
--- a/backend/python/sglang/requirements-cublas12.txt
+++ b/backend/python/sglang/requirements-cublas12.txt
@@ -6,7 +6,7 @@
 # for cublas12 so uv consults this index alongside PyPI.
 --extra-index-url https://download.pytorch.org/whl/cu128
 accelerate
-torch==2.9.1
+torch==2.12.0+cpu
 torchvision
 torchaudio
 transformers
--- a/backend/python/trl/requirements-cpu.txt
+++ b/backend/python/trl/requirements-cpu.txt
@@ -1,9 +1,9 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
 accelerate>=1.4.0
 huggingface-hub>=1.3.0
 sentencepiece
--- a/backend/python/trl/requirements-cublas12.txt
+++ b/backend/python/trl/requirements-cublas12.txt
@@ -1,8 +1,8 @@
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
 accelerate>=1.4.0
 huggingface-hub>=1.3.0
 sentencepiece
--- a/backend/python/trl/requirements-cublas13.txt
+++ b/backend/python/trl/requirements-cublas13.txt
@@ -1,8 +1,8 @@
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
 accelerate>=1.4.0
 huggingface-hub>=1.3.0
 sentencepiece
--- a/backend/python/trl/requirements-mps.txt
+++ b/backend/python/trl/requirements-mps.txt
@@ -1,8 +1,8 @@
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
 accelerate>=1.4.0
 huggingface-hub>=1.3.0
 sentencepiece
--- a/backend/python/vllm-omni/requirements-cublas12.txt
+++ b/backend/python/vllm-omni/requirements-cublas12.txt
@@ -1,4 +1,4 @@
 accelerate
-torch==2.7.0
+torch==2.12.0+cu130
 transformers
 bitsandbytes
--- a/backend/python/vllm/requirements-cublas13-after.txt
+++ b/backend/python/vllm/requirements-cublas13-after.txt
@@ -3,8 +3,8 @@
 # on a cu130 host. Pull the cu130-flavoured wheel from vLLM's per-tag index
 # instead — the cublas13 case in install.sh adds --index-strategy=unsafe-best-match
 # so uv consults this index alongside PyPI.
---extra-index-url https://wheels.vllm.ai/0.23.0/cu130
+--extra-index-url https://wheels.vllm.ai/0.24.0/cu130
 # VERSION COUPLING: darwin/Apple-Silicon builds use vllm-metal (see install.sh),
 # which pins this exact vLLM version. Bumping vllm here means coordinating with a
 # vllm-metal release that supports the new version, or macOS/Metal builds break.
-vllm==0.23.0
+vllm==0.24.0
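This bump has to move the per-tag wheel index and the vllm== pin together, and the comment adds a second coupling to vllm-metal that a file-local check cannot see. A minimal Go sketch of the file-local half, assuming the index keeps the https://wheels.vllm.ai/<version>/<flavor> shape shown above; vllmPinsAgree is a hypothetical helper:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// vllmPinsAgree reports whether the version segment of a
// https://wheels.vllm.ai/<version>/<flavor> extra index matches the
// vllm==<version> pin in the same requirements text. Hypothetical helper;
// it cannot verify the vllm-metal coupling described in the comment.
func vllmPinsAgree(requirements string) bool {
	const indexPrefix = "--extra-index-url https://wheels.vllm.ai/"
	var indexVersion, pinVersion string
	sc := bufio.NewScanner(strings.NewReader(requirements))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, indexPrefix):
			// "0.24.0/cu130" -> "0.24.0"
			indexVersion, _, _ = strings.Cut(strings.TrimPrefix(line, indexPrefix), "/")
		case strings.HasPrefix(line, "vllm=="):
			pinVersion = strings.TrimPrefix(line, "vllm==")
		}
	}
	// A file without a per-tag index (or without a pin) cannot be checked.
	return indexVersion != "" && indexVersion == pinVersion
}

func main() {
	bumped := "--extra-index-url https://wheels.vllm.ai/0.24.0/cu130\nvllm==0.24.0\n"
	halfBumped := "--extra-index-url https://wheels.vllm.ai/0.23.0/cu130\nvllm==0.24.0\n"
	fmt.Println(vllmPinsAgree(bumped), vllmPinsAgree(halfBumped)) // true false
}
```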
--- a/backend/rust/kokoros/src/service.rs
+++ b/backend/rust/kokoros/src/service.rs
@@ -351,6 +351,16 @@ impl Backend for KokorosService {
        Err(Status::unimplemented("Not supported"))
    }

+    type AudioTranscriptionLiveStream =
+        ReceiverStream<Result<backend::TranscriptLiveResponse, Status>>;
+
+    async fn audio_transcription_live(
+        &self,
+        _: Request<tonic::Streaming<backend::TranscriptLiveRequest>>,
+    ) -> Result<Response<Self::AudioTranscriptionLiveStream>, Status> {
+        Err(Status::unimplemented("Not supported"))
+    }
+
    async fn diarize(
        &self,
        _: Request<backend::DiarizeRequest>,
--- a/cmd/launcher/internal/launcher.go
+++ b/cmd/launcher/internal/launcher.go
@@ -207,12 +207,20 @@ func (l *Launcher) StartLocalAI() error {
 	}

 	// Build command arguments
+	dataPath := l.GetDataPath()
 	args := []string{
 		"run",
 		"--models-path", l.config.ModelsPath,
 		"--backends-path", l.config.BackendsPath,
 		"--address", l.config.Address,
 		"--log-level", l.config.LogLevel,
+		// Keep persistent data and dynamic config under the launcher's data
+		// directory (~/.localai) rather than letting the server resolve them
+		// to ${basepath}/{data,configuration}. ${basepath} expands to the
+		// launcher process's CWD (often the user's home root), which puts
+		// ~/data and ~/configuration outside ~/.localai. See #10610.
+		"--data-path", filepath.Join(dataPath, "data"),
+		"--localai-config-dir", filepath.Join(dataPath, "configuration"),
 	}

 	l.localaiCmd = exec.CommandContext(l.ctx, binaryPath, args...)
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1716,7 +1716,7 @@
      - use_jinja:true
    parameters:
      min_p: 0.15
-      model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
+      model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
      repeat_penalty: 1.05
      temperature: 0.1
      top_k: 50
@@ -1724,9 +1724,9 @@
    template:
      use_tokenizer_template: true
  files:
-    - filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
-      uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q4_K_M.gguf
-      sha256: 4923ec14f06b968b74d663e5949867d2d9c3bf13a20b8be1a9f9af39989b2bb0
+    - filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
+      uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q8_0.gguf
+      sha256: 33ab3b8ce6a964fb8ebac89360c9b3cf72c4fa418d5e4c0a94d46883124d5c02
 - name: "qwopus3.5-9b-coder-mtp"
  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
  urls:
--- a/pkg/grpc/grpcerrors/errors.go
+++ b/pkg/grpc/grpcerrors/errors.go
@@ -58,6 +58,23 @@ func IsLiveTranscriptionUnsupported(err error) bool {
 	return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
 }

+// IsUnimplemented reports whether err is a gRPC Unimplemented status — the
+// signal a backend gives for an RPC it does not implement. The generated
+// UnimplementedBackendServer stub returns exactly this for any RPC a backend
+// (e.g. a Python or external backend) has not overridden, so callers can treat
+// an optional RPC as a no-op rather than a failure. Prefers the typed status
+// code and falls back to the message for paths that lose the status (e.g. errors
+// wrapped across non-gRPC boundaries).
+func IsUnimplemented(err error) bool {
+	if err == nil {
+		return false
+	}
+	if status.Code(err) == codes.Unimplemented {
+		return true
+	}
+	return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
+}
+
 // StreamTranscriptionUnsupported returns the canonical error a backend returns
 // when it (or the loaded model) cannot serve the server-streaming
 // AudioTranscriptionStream RPC. It carries codes.Unimplemented like the live
--- a/pkg/grpc/grpcerrors/errors_test.go
+++ b/pkg/grpc/grpcerrors/errors_test.go
@@ -55,6 +55,18 @@ var _ = Describe("grpcerrors", func() {
 		Expect(grpcerrors.IsModelNotLoaded(err)).To(BeFalse())
 	})

+	DescribeTable("IsUnimplemented",
+		func(err error, want bool) {
+			Expect(grpcerrors.IsUnimplemented(err)).To(Equal(want))
+		},
+		Entry("nil", nil, false),
+		Entry("typed code", status.Error(codes.Unimplemented, "method Free not implemented"), true),
+		Entry("stale stub message (Unknown code)", errors.New("rpc error: code = Unimplemented desc = "), true),
+		Entry("unrelated error", errors.New("context deadline exceeded"), false),
+		Entry("unrelated grpc code", status.Error(codes.Unavailable, "connection refused"), false),
+		Entry("model not loaded is NOT unimplemented", grpcerrors.ModelNotLoaded("parakeet-cpp"), false),
+	)
+
 	It("StreamTranscriptionUnsupported carries Unimplemented and is not ModelNotLoaded", func() {
 		err := grpcerrors.StreamTranscriptionUnsupported("parakeet-cpp", "not a streaming model")
 		Expect(status.Code(err)).To(Equal(codes.Unimplemented))
--- a/pkg/model/process.go
+++ b/pkg/model/process.go
@@ -11,6 +11,7 @@ import (
 	"time"

 	"github.com/hpcloud/tail"
+	"github.com/mudler/LocalAI/pkg/grpc/grpcerrors"
 	"github.com/mudler/LocalAI/pkg/signals"
 	process "github.com/mudler/go-processmanager"
 	"github.com/mudler/xlog"
@@ -52,10 +53,21 @@ func (ml *ModelLoader) deleteProcess(s string) error {
 		hook(s)
 	}

-	// Free GPU resources before stopping the process to ensure VRAM is released
+	// Free GPU resources before stopping the process to ensure VRAM is released.
+	// Free is optional: backends that don't override it (the generated stub, many
+	// Python/external backends, or a federation proxy in distributed mode) return
+	// gRPC Unimplemented. That is expected, not a failure — VRAM is reclaimed when
+	// the process is stopped below, or by the remote unloader for remote backends —
+	// so don't surface it as an error.
 	xlog.Debug("Calling Free() to release GPU resources", "model", s)
 	if err := model.GRPC(false, ml.wd).Free(context.Background()); err != nil {
-		xlog.Warn("Error freeing GPU resources", "error", err, "model", s)
+		if grpcerrors.IsUnimplemented(err) {
+			xlog.Debug("Backend does not implement Free(); GPU release handled on process stop", "model", s)
+		} else {
+			// Now that the expected Unimplemented case is filtered out above, a
+			// remaining error is a genuine failure to release VRAM — surface it.
+			xlog.Error("Error freeing GPU resources", "error", err, "model", s)
+		}
 	}

 	process := model.Process()
Author	SHA1	Message	Date
dependabot[bot]	e9154d4a3a	chore(deps): bump the pip group across 5 directories with 1 update Bumps the pip group with 1 update in the /backend/python/ace-step directory: torch. Bumps the pip group with 1 update in the /backend/python/rfdetr directory: torch. Bumps the pip group with 1 update in the /backend/python/sglang directory: torch. Bumps the pip group with 1 update in the /backend/python/trl directory: torch. Bumps the pip group with 1 update in the /backend/python/vllm-omni directory: torch. Updates `torch` from 2.10.0+rocm7.0 to 2.12.0+cpu Updates `torch` from 2.7.1 to 2.12.0+cu130 Updates `torch` from 2.9.0 to 2.12.0+cpu Updates `torch` from 2.10.0 to 2.12.0+cpu Updates `torch` from 2.7.0 to 2.12.0+cu130 --- updated-dependencies: - dependency-name: torch dependency-version: 2.12.0+cpu dependency-type: direct:production dependency-group: pip - dependency-name: torch dependency-version: 2.12.0+cu130 dependency-type: direct:production dependency-group: pip - dependency-name: torch dependency-version: 2.12.0+cpu dependency-type: direct:production dependency-group: pip - dependency-name: torch dependency-version: 2.12.0+cpu dependency-type: direct:production dependency-group: pip - dependency-name: torch dependency-version: 2.12.0+cu130 dependency-type: direct:production dependency-group: pip ... Signed-off-by: dependabot[bot] <support@github.com>	2026-07-01 18:55:56 +00:00
LocalAI [bot]	0d8adfc59a	chore: ⬆️ Update ggml-org/llama.cpp to `0eca4d490e591d4e93058d07540cf47278a72577` (#10617 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 09:31:50 +02:00
LocalAI [bot]	43f2615e19	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.24.0` (#10618 ) ⬆️ Update vllm-project/vllm cu130 wheel Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:53:03 +02:00
LocalAI [bot]	875c539ad5	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `29431b31c89e79c10f8736e8f2742485ba1713d6` (#10620 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:52:36 +02:00
LocalAI [bot]	d641ded194	chore: ⬆️ Update ggml-org/whisper.cpp to `0874de3e8e8e48361dba85c7fe6d176f008bf158` (#10621 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-01 08:43:40 +02:00
LocalAI [bot]	40445fff05	chore: ⬆️ Update leejet/stable-diffusion.cpp to `484baa41e5e006c52dcd4addc38c830b9489745f` (#10619 ) * ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * fix(stablediffusion-ggml): adapt to new generate_image() out-param signature leejet/stable-diffusion.cpp@484baa4 changed generate_image() from returning sd_image_t* to returning bool with images_out/num_images_out out-parameters (same pattern already used by generate_video()). Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-01 08:32:57 +02:00
Tai An	057dee956a	fix(launcher): keep data/config under ~/.localai (#10610) (#10613) The launcher starts the server with run --models-path/--backends-path but leaves --data-path and the dynamic config dir unset, so the server falls back to its ${basepath}/data and ${basepath}/configuration defaults. ${basepath} is kong.ExpandPath("."), i.e. the launcher process CWD (commonly the user's home root), producing ~/data and ~/configuration outside ~/.localai and an agent-pool stateDir under ~/data. Pass --data-path and --localai-config-dir explicitly, rooted at the launcher's own data directory (GetDataPath() -> ~/.localai), so data and config stay consistent with --models-path/--backends-path.	2026-06-30 22:14:59 +02:00
Adira	4ec39bb776	fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602 ) (#10607 ) * fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602) When the watchdog evicts a model, deleteProcess calls the backend's gRPC Free() to release VRAM before stopping the process. Free is optional: backends that don't override it -- the generated UnimplementedBackendServer stub, many Python/external backends, or a federation proxy in distributed mode -- return gRPC Unimplemented. That is expected, not a failure: VRAM is reclaimed when the local process is stopped, or by the remote unloader for remote backends. Logging it as "WARN Error freeing GPU resources" made a benign, optional RPC look like a fault (the alarming line in #10602, seen in distributed mode where the model is remote and Free hits a stub). Treat gRPC Unimplemented from Free() as a no-op logged at Debug; genuine failures still Warn. Free() is still attempted for every backend, so any backend that does implement it is unaffected. Add a reusable grpcerrors.IsUnimplemented helper following the package's existing code-based detection idiom (prefer the typed status code, fall back to the message across non-gRPC boundaries), with table tests. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com> * fix(watchdog): log a non-Unimplemented Free() failure at error level Per review: now that the expected gRPC Unimplemented case is split out and logged at Debug, any remaining Free() error is a genuine failure to release VRAM, so surface it at error level instead of warn. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com> --------- Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>	2026-06-30 22:14:01 +02:00
Ettore Di Giacinto	25ecb9f015	fix(gallery): use Q8_0 for lfm2.5-8b-a1b to fix poor tool-call quality The Q4_K_M quant degraded tool-call reliability for LFM2.5-8B-A1B. Switch the gallery entry to the Q8_0 GGUF (sha256 verified via HF x-linked-etag) while keeping the native jinja tool-parsing config. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code]	2026-06-30 17:46:20 +00:00
LocalAI [bot]	2be495f9c0	fix(kokoros): implement AudioTranscriptionLive trait stub (#10612 ) The backend.proto AudioTranscriptionLive bidirectional streaming RPC added new required trait items (AudioTranscriptionLiveStream + audio_transcription_live) on the generated Backend trait. The kokoros (TTS) backend did not implement them, breaking its release build with E0046 (missing trait items). kokoros is text-to-speech and has no live-ASR support, so stub the method to return UNIMPLEMENTED, mirroring the existing audio_transcription_stream stub. Assisted-by: Claude:claude-opus-4-8 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-06-30 19:38:41 +02:00