fix(backends): don't force-reinstall LOCALAI_EXTERNAL_BACKENDS on boot

The startup loop for LOCALAI_EXTERNAL_BACKENDS runs InstallExternalBackend for each listed backend on every boot, and its gallery-name path hardcoded force=true, so every start re-downloaded and re-extracted each listed backend's OCI image even when it was installed and runnable. Supervising apps that list several backends paid several full OCI pulls per launch.

Give InstallExternalBackend an explicit force parameter (it only affects the gallery-name fallback; URI installs always write) and pass:

- false from the boot loop and `local-ai backends install` (idempotent ensure; `backends upgrade` is the refresh path),
- op.Force from the local manager's external-URI op,
- the request's force on the worker install path and true on its upgrade path (behavior unchanged).

Assisted-by: Claude:claude-fable-5 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
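
For illustration only, here is a minimal Go sketch of the cost difference described above. The `installer` type and the `ensure` and `bootDownloads` helpers are hypothetical stand-ins, not LocalAI's API; a counter takes the place of the real OCI pull and extraction.

```go
package main

import "fmt"

// installer is a hypothetical stand-in for the gallery install path.
// It counts downloads so the effect of the force flag is visible.
type installer struct {
	installed map[string]bool
	downloads int
}

// ensure installs name only when it is missing, unless force is set.
func (i *installer) ensure(name string, force bool) {
	if i.installed[name] && !force {
		return // already installed: idempotent no-op
	}
	i.downloads++ // stands in for the OCI pull and extraction
	i.installed[name] = true
}

// bootDownloads simulates a number of startups that each ensure every
// listed backend, and returns the total number of downloads performed.
func bootDownloads(backends []string, boots int, force bool) int {
	inst := &installer{installed: map[string]bool{}}
	for b := 0; b < boots; b++ {
		for _, name := range backends {
			inst.ensure(name, force)
		}
	}
	return inst.downloads
}

func main() {
	backends := []string{"llama-cpp", "whisper"}
	fmt.Println("force=true: ", bootDownloads(backends, 3, true))
	fmt.Println("force=false:", bootDownloads(backends, 3, false))
}
```

In this toy model, three boots with two listed backends cost six downloads when the loop forces and two when it does not, which is the per-launch saving the change is after.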
fix(backends): make backend install ops idempotent unless forced
2026-07-02 12:26:49 -04:00 · 2026-07-02 11:11:40 +00:00 · 2026-07-02 11:00:30 +00:00 · 2026-07-02 09:52:51 +02:00 · 2026-07-02 09:48:20 +02:00 · 2026-07-02 09:48:08 +02:00
29 changed files with 508 additions and 29 deletions
--- a/11
+++ b/11
@@ -171,6 +171,17 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
    ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
    ; fi

+# ROCm's bundled libdrm_amdgpu is built with a hardcoded fallback lookup path
+# for the ASIC ID table (/opt/amdgpu/share/libdrm/amdgpu.ids), which only exists
+# if AMD's full amdgpu graphics/DKMS stack is installed. This compute-only image
+# doesn't have it, so hipblas/rocBLAS log "No such file or directory" on every
+# model load and can fail to identify the GPU. Point it at the equivalent file
+# Ubuntu's libdrm-common package already ships.
+RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ -f /usr/share/libdrm/amdgpu.ids ] && [ ! -e /opt/amdgpu/share/libdrm/amdgpu.ids ]; then \
+    mkdir -p /opt/amdgpu/share/libdrm && \
+    ln -s /usr/share/libdrm/amdgpu.ids /opt/amdgpu/share/libdrm/amdgpu.ids \
+    ; fi
+
 RUN expr "${BUILD_TYPE}" = intel && echo "intel" > /run/localai/capability || echo "not intel"

 # Cuda
--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=29431b31c89e79c10f8736e8f2742485ba1713d6
+IK_LLAMA_VERSION?=068b173649f2fd8dc96b35ada5a0b76d8985105d
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=0eca4d490e591d4e93058d07540cf47278a72577
+LLAMA_VERSION?=4fc4ec5541b243957ae5099edb67372f8f3b550e
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/go/crispasr/Makefile
+++ b/backend/go/crispasr/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # CrispASR version (release tag)
 CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=8fd9db8fec8cb5e929d23d3267ed5817794feb1a
+CRISPASR_VERSION?=fcbc8718e654995e3bd2d0c98bcb8e55e297d23c
 SO_TARGET?=libgocrispasr.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/go/parakeet-cpp/Makefile
+++ b/backend/go/parakeet-cpp/Makefile
@@ -1,6 +1,6 @@
 # parakeet-cpp backend Makefile.
 #
-# Upstream pin lives below as PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
+# Upstream pin lives below as PARAKEET_VERSION?=e8acc6172a94e20a952cf1843decace5d771a94b
 # (.github/bump_deps.sh) can find and update it - matches the
 # whisper.cpp / ds4 / vibevoice-cpp convention.
 #
@@ -15,7 +15,7 @@
 # That's what the L0 smoke test uses. The default target below does the
 # proper clone-at-pin + cmake build so CI doesn't need a side-checkout.

-PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
+PARAKEET_VERSION?=e8acc6172a94e20a952cf1843decace5d771a94b
 PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp

 GOCMD?=go
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=484baa41e5e006c52dcd4addc38c830b9489745f
+STABLEDIFFUSION_GGML_VERSION?=3590aa8d626e671a1b1dc84506ea2932a243a480

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=0874de3e8e8e48361dba85c7fe6d176f008bf158
+WHISPER_CPP_VERSION?=6fc7c33b4c3a2cec83e4b65abd5e96a890480375
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
--- a/backend/python/vllm/backend.py
+++ b/backend/python/vllm/backend.py
@@ -748,7 +748,12 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
        # When (A) native streaming ran cleanly, per-delta yields above already
        # delivered everything — do NOT extract again on the full text or we'd
        # duplicate content/tool_calls into the final chunk.
-        if has_tool_parser and not (native_streaming and not native_streaming_error):
+        # NOTE: `native_streaming` is a capability flag ("streaming parser is
+        # available"), not a state flag ("streaming actually ran"). For
+        # non-streaming requests it is still True but the per-delta loop was
+        # never entered, so we MUST still run extract_tool_calls here. Hence
+        # the explicit `streaming and …` guard on both branches.
+        if has_tool_parser and not (streaming and native_streaming and not native_streaming_error):
            try:
                tp = tp_instance
                if tp is None:
@@ -770,7 +775,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
                        ))
            except Exception as e:
                print(f"Tool parser error: {e}", file=sys.stderr)
-        elif native_streaming and not native_streaming_error:
+        elif streaming and native_streaming and not native_streaming_error:
            # Per-delta path already emitted content + tool_calls; the final
            # chat_delta should carry only metadata (token counts, logprobs).
            content = ""
--- a/backend/python/vllm/install.sh
+++ b/backend/python/vllm/install.sh
@@ -104,7 +104,7 @@ if [ "$(uname -s)" = "Darwin" ]; then
    # can rewrite it. Darwin therefore follows vllm-metal and can lag the Linux
    # vllm pin (requirements-cublas13-after.txt, bumped independently against
    # vllm/vllm) until vllm-metal supports a newer vLLM.
-    VLLM_METAL_VERSION="v0.3.0.dev20260630095652"
+    VLLM_METAL_VERSION="v0.3.0.dev20260701132215"

    # The coupled vLLM source version is whatever this vllm-metal release builds
    # against -- it declares it in its own installer as `vllm_v=`. Derive it from
--- a/core/application/distributed.go
+++ b/core/application/distributed.go
@@ -356,6 +356,12 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB, configLoade
 		PrefixConfig:     prefixCfg,
 		Pressure:         pressure,
 		SharedModels:     cfg.Distributed.SharedModels,
+		// Cap how long a cold load may hold the per-model advisory lock: the
+		// configured backend.install deadline plus a margin for file staging and
+		// the remote LoadModel. Derived from the install timeout so raising it
+		// (for slow links pulling multi-GB images) widens the ceiling too,
+		// instead of letting the static default cut a legitimately slow load.
+		ModelLoadCeiling: cfg.Distributed.BackendInstallTimeoutOrDefault() + 10*time.Minute,
 	})

 	// Wire staging-progress broadcasting so file-staging shows up on every
--- a/core/application/startup.go
+++ b/core/application/startup.go
@@ -369,7 +369,7 @@ func New(opts ...config.AppOption) (*Application, error) {
 	}

 	for _, backend := range options.ExternalBackends {
-		if err := galleryop.InstallExternalBackend(options.Context, options.BackendGalleries, options.SystemState, application.ModelLoader(), nil, backend, "", "", options.RequireBackendIntegrity); err != nil {
+		if err := galleryop.InstallExternalBackend(options.Context, options.BackendGalleries, options.SystemState, application.ModelLoader(), nil, backend, "", "", false, options.RequireBackendIntegrity); err != nil {
 			xlog.Error("error installing external backend", "error", err)
 		}
 	}
--- a/core/cli/backends.go
+++ b/core/cli/backends.go
@@ -127,7 +127,7 @@ func (bi *BackendsInstall) Run(ctx *cliContext.Context) error {
 	}

 	modelLoader := model.NewModelLoader(systemState)
-	err = galleryop.InstallExternalBackend(context.Background(), galleries, systemState, modelLoader, progressCallback, bi.BackendArgs, bi.Name, bi.Alias, bi.RequireBackendIntegrity)
+	err = galleryop.InstallExternalBackend(context.Background(), galleries, systemState, modelLoader, progressCallback, bi.BackendArgs, bi.Name, bi.Alias, false, bi.RequireBackendIntegrity)
 	if err != nil {
 		return err
 	}
--- a/core/http/endpoints/localai/backend.go
+++ b/core/http/endpoints/localai/backend.go
@@ -65,6 +65,10 @@ type BackendEndpointService struct {

 type GalleryBackend struct {
 	ID string `json:"id"`
+	// Force reinstalls the backend even when it is already installed and
+	// runnable. Off by default so apply stays idempotent for supervising
+	// apps that ensure their backend on every boot.
+	Force bool `json:"force"`
 }

 func CreateBackendEndpointService(galleries []config.Gallery, systemState *system.SystemState, backendApplier *galleryop.GalleryService, upgradeChecker UpgradeInfoProvider) BackendEndpointService {
@@ -103,7 +107,9 @@ func (mgs *BackendEndpointService) GetAllStatusEndpoint() echo.HandlerFunc {
 	}
 }

-// ApplyBackendEndpoint installs a new backend to a LocalAI instance
+// ApplyBackendEndpoint installs a new backend to a LocalAI instance. The op is
+// idempotent: an already-installed, runnable backend is left alone unless the
+// request sets "force": true (explicit reinstall).
 // @Summary Install backends to LocalAI.
 // @Tags backends
 // @Param request body GalleryBackend true "query params"
@@ -137,6 +143,7 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint(systemState *system.Syst
 			ID:                 uuid.String(),
 			GalleryElementName: input.ID,
 			Galleries:          mgs.galleries,
+			Force:              input.Force,
 		}

 		return c.JSON(200, schema.BackendResponse{ID: uuid.String(), StatusURL: fmt.Sprintf("%sbackends/jobs/%s", middleware.BaseURL(c), uuid.String())})
--- a/core/http/endpoints/localai/backend_apply_test.go
+++ b/core/http/endpoints/localai/backend_apply_test.go
@@ -0,0 +1,87 @@
+package localai_test
+
+import (
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"strings"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/gallery"
+	. "github.com/mudler/LocalAI/core/http/endpoints/localai"
+	"github.com/mudler/LocalAI/core/services/galleryop"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/LocalAI/pkg/system"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+// POST /backends/apply must be idempotent by default: supervising apps call it
+// on every boot to ensure a backend exists, and forcing a reinstall there
+// re-downloads the whole artifact each time. Reinstall stays available behind
+// the explicit force flag.
+var _ = Describe("POST /backends/apply force plumbing", func() {
+	var (
+		app      *echo.Echo
+		gs       *galleryop.GalleryService
+		tmpDir   string
+		received chan galleryop.ManagementOp[gallery.GalleryBackend, any]
+	)
+
+	BeforeEach(func() {
+		app = echo.New()
+
+		var err error
+		tmpDir, err = os.MkdirTemp("", "backends-apply-test-*")
+		Expect(err).NotTo(HaveOccurred())
+
+		systemState, err := system.GetSystemState(system.WithBackendPath(tmpDir))
+		Expect(err).NotTo(HaveOccurred())
+		appConfig := &config.ApplicationConfig{SystemState: systemState}
+
+		// The service is deliberately not started: the test reads the op off
+		// the (unbuffered) channel itself.
+		gs = galleryop.NewGalleryService(appConfig, model.NewModelLoader(systemState))
+		svc := CreateBackendEndpointService(nil, systemState, gs, nil)
+		app.POST("/backends/apply", svc.ApplyBackendEndpoint(systemState))
+
+		received = make(chan galleryop.ManagementOp[gallery.GalleryBackend, any], 1)
+		go func() {
+			op := <-gs.BackendGalleryChannel
+			received <- op
+		}()
+	})
+
+	AfterEach(func() {
+		Expect(os.RemoveAll(tmpDir)).To(Succeed())
+	})
+
+	apply := func(body string) *httptest.ResponseRecorder {
+		req := httptest.NewRequest(http.MethodPost, "/backends/apply", strings.NewReader(body))
+		req.Header.Set(echo.HeaderContentType, echo.MIMEApplicationJSON)
+		rec := httptest.NewRecorder()
+		app.ServeHTTP(rec, req)
+		return rec
+	}
+
+	It("enqueues a non-forced op by default", func() {
+		rec := apply(`{"id":"llama-cpp"}`)
+		Expect(rec.Code).To(Equal(http.StatusOK))
+
+		var op galleryop.ManagementOp[gallery.GalleryBackend, any]
+		Eventually(received).Should(Receive(&op))
+		Expect(op.GalleryElementName).To(Equal("llama-cpp"))
+		Expect(op.Force).To(BeFalse())
+	})
+
+	It("enqueues a forced op when the request sets force", func() {
+		rec := apply(`{"id":"llama-cpp","force":true}`)
+		Expect(rec.Code).To(Equal(http.StatusOK))
+
+		var op galleryop.ManagementOp[gallery.GalleryBackend, any]
+		Eventually(received).Should(Receive(&op))
+		Expect(op.GalleryElementName).To(Equal("llama-cpp"))
+		Expect(op.Force).To(BeTrue())
+	})
+})
--- a/core/http/routes/ui_api.go
+++ b/core/http/routes/ui_api.go
@@ -1243,6 +1243,9 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
 			Galleries:          appConfig.BackendGalleries,
 			Context:            ctx,
 			CancelFunc:         cancelFunc,
+			// The React UI's "Reinstall backend" action reuses this route, so
+			// the op must force even when the backend is already installed.
+			Force: true,
 		}
 		// Store cancellation function immediately so queued operations can be cancelled
 		galleryService.StoreCancellation(uid, cancelFunc)
--- a/core/services/advisorylock/advisorylock.go
+++ b/core/services/advisorylock/advisorylock.go
@@ -6,10 +6,39 @@ import (
 	"hash/fnv"
 	"strings"
 	"sync"
+	"time"

 	"gorm.io/gorm"
 )

+// advisoryLockWaitBackstop bounds, server-side, how long we will wait to
+// acquire a blocking advisory lock when the caller's context carries no
+// deadline (e.g. a startup schema migration using context.Background()). It
+// only exists so such a caller cannot hang forever behind a holder whose
+// session never releases the lock; it is far longer than any legitimate
+// guarded section. A var (not const) so tests can shrink it.
+var advisoryLockWaitBackstop = 30 * time.Minute
+
+// advisoryLockTimeoutMargin is added to a context's remaining budget when
+// deriving the server-side lock_timeout, so the Go context's own (cleaner)
+// cancellation fires first and the server bound is only ever a backstop.
+const advisoryLockTimeoutMargin = 30 * time.Second
+
+// advisoryLockWaitBudget returns the server-side lock_timeout to use for a
+// blocking acquire: the caller context's remaining time plus a margin (so the
+// Go context still governs), or the backstop when the context has no deadline.
+// Never returns zero - "wait forever" must not be possible.
+func advisoryLockWaitBudget(ctx context.Context) time.Duration {
+	if dl, ok := ctx.Deadline(); ok {
+		budget := time.Until(dl) + advisoryLockTimeoutMargin
+		if budget < time.Second {
+			budget = time.Second
+		}
+		return budget
+	}
+	return advisoryLockWaitBackstop
+}
+
 // localLocks holds one buffered channel (capacity 1) per lock key, used as an
 // in-process mutex for non-PostgreSQL dialects (SQLite). A SQLite auth DB is
 // effectively single-process, so serializing guarded sections within this
@@ -130,6 +159,27 @@ func WithLockCtx(ctx context.Context, db *gorm.DB, key int64, fn func() error) e
 	}
 	defer conn.Close()

+	// Override any deployment-wide lock_timeout on this dedicated connection.
+	// Operators commonly set a short global lock_timeout (on the role or
+	// database) to bound ordinary row-lock waits. Applied to the blocking
+	// pg_advisory_lock below, it aborts the wait with SQLSTATE 55P03 and turns
+	// LocalAI's intentional cross-replica "wait your turn, then re-check"
+	// coordination into a hard error for the caller (e.g. a chat request that
+	// just wanted to reuse a model another replica is loading).
+	//
+	// We do NOT disable it outright (lock_timeout = 0 would wait forever, which
+	// is unsafe for the schema-migration callers that pass context.Background()).
+	// Instead we set a bound derived from the caller's context: its remaining
+	// budget plus a margin so the Go context's cancellation wins with a clean
+	// error, or a finite backstop when the context has no deadline.
+	waitBudget := advisoryLockWaitBudget(ctx)
+	if _, err := conn.ExecContext(ctx,
+		fmt.Sprintf("SET lock_timeout = %d", waitBudget.Milliseconds())); err != nil {
+		return fmt.Errorf("advisorylock: setting lock_timeout: %w", err)
+	}
+	// Restore the session default before this pooled connection is reused.
+	defer func() { _, _ = conn.ExecContext(context.Background(), "RESET lock_timeout") }()
+
 	if _, err := conn.ExecContext(ctx, "SELECT pg_advisory_lock($1)", key); err != nil {
 		return fmt.Errorf("advisorylock: acquiring lock %d: %w", key, err)
 	}
--- a/core/services/advisorylock/advisorylock_test.go
+++ b/core/services/advisorylock/advisorylock_test.go
@@ -158,6 +158,87 @@ var _ = Describe("AdvisoryLock", func() {
 			Expect(err).To(HaveOccurred())
 		})

+		It("waits out a short server-side lock_timeout instead of failing with 55P03", func() {
+			const lockKey int64 = 703
+
+			// Reproduce the production deployment that triggered this: a short
+			// global lock_timeout set on the database. Without the fix, a waiter
+			// blocked on pg_advisory_lock() is aborted by the server after this
+			// window and surfaces SQLSTATE 55P03 ("canceling statement due to
+			// lock timeout") to the caller instead of waiting for its turn.
+			Expect(db.Exec("ALTER DATABASE testdb SET lock_timeout = '300ms'").Error).ToNot(HaveOccurred())
+			sqlDB, err := db.DB()
+			Expect(err).ToNot(HaveOccurred())
+			// Drop pooled connections so subsequent ones reconnect and inherit
+			// the new database-level lock_timeout default.
+			sqlDB.SetMaxIdleConns(0)
+
+			holding := make(chan struct{})
+			released := make(chan struct{})
+			go func() {
+				defer GinkgoRecover()
+				herr := WithLockCtx(context.Background(), db, lockKey, func() error {
+					close(holding)
+					// Hold well past the 300ms server lock_timeout.
+					time.Sleep(1 * time.Second)
+					return nil
+				})
+				Expect(herr).ToNot(HaveOccurred())
+				close(released)
+			}()
+
+			<-holding // ensure the holder owns the lock before we contend
+
+			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+			defer cancel()
+			executed := false
+			start := time.Now()
+			werr := WithLockCtx(ctx, db, lockKey, func() error {
+				executed = true
+				return nil
+			})
+			Expect(werr).ToNot(HaveOccurred(),
+				"waiter should wait out the in-progress hold, not fail with lock_timeout (55P03)")
+			Expect(executed).To(BeTrue())
+			Expect(time.Since(start)).To(BeNumerically(">=", 400*time.Millisecond),
+				"waiter should have actually waited for the holder to release")
+			<-released
+		})
+
+		It("bounds a deadline-less waiter with the backstop instead of waiting forever", func() {
+			const lockKey int64 = 704
+
+			// A caller with no context deadline (e.g. startup schema migration
+			// passing context.Background()) must not hang forever if the holder
+			// never releases. Shrink the backstop so the test is fast.
+			origBackstop := advisoryLockWaitBackstop
+			advisoryLockWaitBackstop = 500 * time.Millisecond
+			DeferCleanup(func() { advisoryLockWaitBackstop = origBackstop })
+
+			holding := make(chan struct{})
+			release := make(chan struct{})
+			go func() {
+				defer GinkgoRecover()
+				_ = WithLockCtx(context.Background(), db, lockKey, func() error {
+					close(holding)
+					<-release // hold until the test releases us
+					return nil
+				})
+			}()
+			defer close(release)
+
+			<-holding
+
+			start := time.Now()
+			err := WithLockCtx(context.Background(), db, lockKey, func() error {
+				Fail("waiter should not have acquired the still-held lock")
+				return nil
+			})
+			Expect(err).To(HaveOccurred(), "deadline-less waiter should give up at the backstop, not hang")
+			Expect(time.Since(start)).To(BeNumerically("<", 5*time.Second),
+				"backstop must cap the wait well under the test timeout")
+		})
+
 		It("serializes concurrent WithLockCtx on same key", func() {
 			const lockKey int64 = 702

--- a/core/services/galleryop/backend_force_test.go
+++ b/core/services/galleryop/backend_force_test.go
@@ -0,0 +1,114 @@
+package galleryop_test
+
+import (
+	"context"
+	"os"
+	"path/filepath"
+
+	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/gallery"
+	"github.com/mudler/LocalAI/core/services/galleryop"
+	"github.com/mudler/LocalAI/pkg/model"
+	"github.com/mudler/LocalAI/pkg/system"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+	"gopkg.in/yaml.v3"
+)
+
+// The install op must be idempotent unless Force is set: API clients call
+// POST /backends/apply on every boot to make sure the backend exists, and an
+// unconditional force here re-downloads the whole backend artifact each time.
+// Reinstall is an explicit, opted-in action.
+var _ = Describe("LocalBackendManager force semantics", func() {
+	var (
+		backendsDir string
+		srcDir      string
+		mgr         *galleryop.LocalBackendManager
+		systemState *system.SystemState
+		ml          *model.ModelLoader
+	)
+
+	const installedRunSh = "#!/bin/sh\necho installed\n"
+	const galleryRunSh = "#!/bin/sh\necho from-gallery\n"
+
+	installedRunShPath := func() string {
+		return filepath.Join(backendsDir, "test-backend", "run.sh")
+	}
+
+	BeforeEach(func() {
+		var err error
+		backendsDir, err = os.MkdirTemp("", "force-backends-*")
+		Expect(err).NotTo(HaveOccurred())
+		srcDir, err = os.MkdirTemp("", "force-src-*")
+		Expect(err).NotTo(HaveOccurred())
+
+		// The gallery serves test-backend from a plain directory (offline).
+		// The gallery yaml itself must live under the backends path: file://
+		// galleries outside the trusted root are rejected by the downloader.
+		Expect(os.WriteFile(filepath.Join(srcDir, "run.sh"), []byte(galleryRunSh), 0o755)).To(Succeed())
+		entries := []map[string]any{{"name": "test-backend", "uri": srcDir}}
+		data, err := yaml.Marshal(entries)
+		Expect(err).NotTo(HaveOccurred())
+		galleryYAML := filepath.Join(backendsDir, "gallery.yaml")
+		Expect(os.WriteFile(galleryYAML, data, 0o644)).To(Succeed())
+
+		// test-backend is already installed, with content that differs from
+		// the gallery's so a reinstall is observable.
+		Expect(os.MkdirAll(filepath.Join(backendsDir, "test-backend"), 0o755)).To(Succeed())
+		Expect(os.WriteFile(installedRunShPath(), []byte(installedRunSh), 0o755)).To(Succeed())
+
+		systemState, err = system.GetSystemState(system.WithBackendPath(backendsDir))
+		Expect(err).NotTo(HaveOccurred())
+		appConfig := &config.ApplicationConfig{
+			SystemState:      systemState,
+			BackendGalleries: []config.Gallery{{Name: "test", URL: "file://" + galleryYAML}},
+		}
+		ml = model.NewModelLoader(systemState)
+		mgr = galleryop.NewLocalBackendManager(appConfig, ml)
+	})
+
+	AfterEach(func() {
+		Expect(os.RemoveAll(backendsDir)).To(Succeed())
+		Expect(os.RemoveAll(srcDir)).To(Succeed())
+	})
+
+	It("skips an already-installed backend when Force is not set", func() {
+		op := &galleryop.ManagementOp[gallery.GalleryBackend, any]{
+			ID:                 "op-1",
+			GalleryElementName: "test-backend",
+		}
+		Expect(mgr.InstallBackend(context.Background(), op, nil)).To(Succeed())
+
+		content, err := os.ReadFile(installedRunShPath())
+		Expect(err).NotTo(HaveOccurred())
+		Expect(string(content)).To(Equal(installedRunSh), "install without Force must not overwrite an installed backend")
+	})
+
+	It("reinstalls an already-installed backend when Force is set", func() {
+		op := &galleryop.ManagementOp[gallery.GalleryBackend, any]{
+			ID:                 "op-2",
+			GalleryElementName: "test-backend",
+			Force:              true,
+		}
+		Expect(mgr.InstallBackend(context.Background(), op, nil)).To(Succeed())
+
+		content, err := os.ReadFile(installedRunShPath())
+		Expect(err).NotTo(HaveOccurred())
+		Expect(string(content)).To(Equal(galleryRunSh), "install with Force must overwrite the installed backend")
+	})
+
+	// The LOCALAI_EXTERNAL_BACKENDS boot loop goes through
+	// InstallExternalBackend's gallery-name path on EVERY startup; it must not
+	// force, or each boot re-downloads every listed backend.
+	It("skips an already-installed backend on the non-forced external gallery-name path", func() {
+		err := galleryop.InstallExternalBackend(context.Background(),
+			[]config.Gallery{{Name: "test", URL: "file://" + filepath.Join(backendsDir, "gallery.yaml")}},
+			systemState, ml, nil, "test-backend", "", "", false, false)
+		Expect(err).NotTo(HaveOccurred())
+
+		content, err := os.ReadFile(installedRunShPath())
+		Expect(err).NotTo(HaveOccurred())
+		Expect(string(content)).To(Equal(installedRunSh), "non-forced external install must not overwrite an installed backend")
+	})
+})
--- a/core/services/galleryop/backends.go
+++ b/core/services/galleryop/backends.go
@@ -144,7 +144,12 @@ func (g *GalleryService) backendHandler(op *ManagementOp[gallery.GalleryBackend,
 // InstallExternalBackend installs a backend from an external source (OCI image, URL, or path).
 // This method contains the logic to detect the input type and call the appropriate installation function.
 // It can be used by both CLI and Web UI for installing backends from external sources.
-func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, downloadStatus func(string, string, string, float64), backend, name, alias string, requireIntegrity bool) error {
+//
+// force applies only to the gallery-name fallback: a URI install (dir/OCI/file)
+// always writes, but a bare gallery name is an "ensure installed" — the
+// LOCALAI_EXTERNAL_BACKENDS boot loop runs it on every start and must not
+// re-download an installed, runnable backend.
+func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, downloadStatus func(string, string, string, float64), backend, name, alias string, force, requireIntegrity bool) error {
 	uri := downloader.URI(backend)
 	switch {
 	case uri.LooksLikeDir():
@@ -202,7 +207,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
 		if name != "" || alias != "" {
 			return fmt.Errorf("specifying a name or alias is not supported for gallery backends")
 		}
-		err := gallery.InstallBackendFromGallery(ctx, galleries, systemState, modelLoader, backend, downloadStatus, true, requireIntegrity)
+		err := gallery.InstallBackendFromGallery(ctx, galleries, systemState, modelLoader, backend, downloadStatus, force, requireIntegrity)
 		if err != nil {
 			return fmt.Errorf("error installing backend %s: %w", backend, err)
 		}
--- a/core/services/galleryop/backends_test.go
+++ b/core/services/galleryop/backends_test.go
@@ -70,6 +70,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				"test-backend", // gallery name
 				"custom-name",  // name should not be allowed
 				"",
+				false, // force
 				false,
 			)
 			Expect(err).To(HaveOccurred())
@@ -86,6 +87,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				"non-existent-backend",
 				"",
 				"",
+				false, // force
 				false,
 			)
 			Expect(err).To(HaveOccurred())
@@ -103,6 +105,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				"oci://quay.io/mudler/tests:localai-backend-test",
 				"", // name is required for OCI images
 				"",
+				false, // force
 				false,
 			)
 			Expect(err).To(HaveOccurred())
@@ -136,6 +139,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				testBackendPath,
 				"", // name should be inferred as "source-backend"
 				"",
+				false, // force
 				false,
 			)
 			// The function should at least attempt to install with the inferred name
@@ -155,6 +159,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				testBackendPath,
 				"custom-backend-name",
 				"",
+				false, // force
 				false,
 			)
 			// The function should use the provided name
@@ -173,6 +178,7 @@ var _ = Describe("InstallExternalBackend", func() {
 				testBackendPath,
 				"custom-backend-name",
 				"custom-alias",
+				false, // force
 				false,
 			)
 			// The function should accept alias for directory paths
--- a/core/services/galleryop/managers_local.go
+++ b/core/services/galleryop/managers_local.go
@@ -110,10 +110,13 @@ func (b *LocalBackendManager) CheckUpgrades(ctx context.Context) (map[string]gal
 func (b *LocalBackendManager) InstallBackend(ctx context.Context, op *ManagementOp[gallery.GalleryBackend, any], progressCb ProgressCallback) error {
 	if op.ExternalURI != "" {
 		return InstallExternalBackend(ctx, b.backendGalleries, b.systemState, b.modelLoader,
-			progressCb, op.ExternalURI, op.ExternalName, op.ExternalAlias, b.requireBackendIntegrity)
+			progressCb, op.ExternalURI, op.ExternalName, op.ExternalAlias, op.Force, b.requireBackendIntegrity)
 	}
+	// op.Force distinguishes an explicit reinstall from an idempotent
+	// "make sure it's installed" op; the latter must not re-download an
+	// already-runnable backend (supervisors apply on every boot).
 	return gallery.InstallBackendFromGallery(ctx, b.backendGalleries, b.systemState,
-		b.modelLoader, op.GalleryElementName, progressCb, true, b.requireBackendIntegrity)
+		b.modelLoader, op.GalleryElementName, progressCb, op.Force, b.requireBackendIntegrity)
 }

 func (b *LocalBackendManager) IsDistributed() bool { return false }
--- a/core/services/galleryop/operation.go
+++ b/core/services/galleryop/operation.go
@@ -45,6 +45,13 @@ type ManagementOp[T any, E any] struct {

 	// Upgrade is true if this is an upgrade operation (not a fresh install)
 	Upgrade bool
+
+	// Force reinstalls a backend even when it is already installed and
+	// runnable. Without it a backend install op is idempotent — API clients
+	// that ensure a backend exists on every boot must not trigger a full
+	// artifact re-download each time. The UI's explicit "Reinstall backend"
+	// action sets it.
+	Force bool
 }

 type OpStatus struct {
--- a/core/services/nodes/router.go
+++ b/core/services/nodes/router.go
@@ -68,6 +68,13 @@ type SmartRouterOptions struct {
 	// the absolute model paths untouched so the worker loads them directly from
 	// the shared volume (#10556). See config.DistributedConfig.SharedModels.
 	SharedModels bool
+	// ModelLoadCeiling is the hard upper bound on how long a single cold-load
+	// attempt (node selection -> backend install -> file staging -> LoadModel)
+	// may run while holding the per-model advisory lock. It backstops every
+	// sub-step's own timeout so a wedged worker can never pin the lock - and
+	// every other replica's request for that model - indefinitely. Zero selects
+	// defaultModelLoadCeiling.
+	ModelLoadCeiling time.Duration
 }

 // SmartRouter routes inference requests to the best available backend node.
@@ -101,8 +108,18 @@ type SmartRouter struct {
 	// sharedModels skips file staging when all nodes mount the same models
 	// directory at the same path (see SmartRouterOptions.SharedModels).
 	sharedModels bool
+	// modelLoadCeiling bounds how long a cold load may hold the per-model
+	// advisory lock (see SmartRouterOptions.ModelLoadCeiling).
+	modelLoadCeiling time.Duration
 }

+// defaultModelLoadCeiling is the fallback hold ceiling for a cold model load.
+// It must comfortably exceed the slowest legitimate load - a multi-GB backend
+// install (DefaultBackendInstallTimeout, 15m) plus staging and the remote
+// LoadModel (5m) - so it never cuts a real load short; it only ever fires when
+// a step is genuinely wedged (e.g. a worker that died mid-install).
+const defaultModelLoadCeiling = 25 * time.Minute
+
 // probeCacheTTL is how long a successful gRPC HealthCheck on a backend is
 // trusted before the next request re-probes. Matches healthCheckTTL in
 // pkg/model/model.go so the single-process and distributed paths share a
@@ -117,6 +134,10 @@ func NewSmartRouter(registry ModelRouter, opts SmartRouterOptions) *SmartRouter
 	if factory == nil {
 		factory = &tokenClientFactory{token: opts.AuthToken}
 	}
+	ceiling := opts.ModelLoadCeiling
+	if ceiling <= 0 {
+		ceiling = defaultModelLoadCeiling
+	}
 	return &SmartRouter{
 		registry:         registry,
 		unloader:         opts.Unloader,
@@ -131,6 +152,7 @@ func NewSmartRouter(registry ModelRouter, opts SmartRouterOptions) *SmartRouter
 		prefixConfig:     opts.PrefixConfig,
 		pressure:         opts.Pressure,
 		sharedModels:     opts.SharedModels,
+		modelLoadCeiling: ceiling,
 	}
 }

@@ -383,11 +405,19 @@ func (r *SmartRouter) Route(ctx context.Context, modelID, modelName, backendType
 	// the request context. If staging were bound to it, the multi-GB upload
 	// aborts with "context canceled" mid-transfer and large models can never
 	// finish staging (the model-load outage). WithoutCancel keeps the request's
-	// values (prefix chain, etc.) but drops its cancellation/deadline. Each
-	// long step still has its own bound (the file stager's resume budget,
-	// LoadModel's 5m timeout), and the per-model advisory lock below de-dupes
-	// concurrent loaders across replicas.
-	loadCtx := context.WithoutCancel(ctx)
+	// values (prefix chain, etc.) but drops its cancellation/deadline.
+	//
+	// Detaching from the caller is necessary, but it must not be unbounded: the
+	// load runs while holding the per-model advisory lock, and a worker that
+	// dies mid-install (its backend.install never replies) would otherwise pin
+	// that lock (and every other replica's request for the same model) until
+	// the NATS install deadline alone expires. Re-impose a single hard ceiling
+	// over the whole sequence so the lock is always released in bounded time,
+	// even if a sub-step wedges. Each long step still has its own (tighter)
+	// bound; this only backstops them. The per-model advisory lock below
+	// de-dupes concurrent loaders across replicas.
+	loadCtx, cancelLoad := context.WithTimeout(context.WithoutCancel(ctx), r.modelLoadCeiling)
+	defer cancelLoad()
 	loadModel := func(ctx context.Context) (*RouteResult, error) {
 		// Re-check after acquiring lock — another request may have loaded it
 		node, nm, err := r.registry.FindAndLockNodeWithModel(ctx, trackingKey, candidateNodeIDs, pref)
@@ -916,7 +946,14 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
 	}

 	key := fmt.Sprintf("%s|%s|%s|%d", node.ID, backendType, modelID, replicaIndex)
-	v, err, _ := r.installFlight.Do(key, func() (any, error) {
+	// DoChan rather than Do so this wait honors ctx cancellation. InstallBackend
+	// blocks for its full NATS deadline (15m by default) when a worker accepts
+	// the request but never replies (e.g. it died mid-install). Without ctx
+	// awareness the caller (holding the per-model advisory lock) would sit there
+	// the whole time; here a cancelled ctx (typically the model-load ceiling)
+	// frees the caller promptly. The shared install keeps running in the
+	// background and still coalesces other callers via singleflight.
+	resCh := r.installFlight.DoChan(key, func() (any, error) {
 		reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON, "", "", "", replicaIndex, "", nil)
 		if err != nil {
 			return "", err
@@ -931,10 +968,15 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
 		}
 		return addr, nil
 	})
-	if err != nil {
-		return "", err
+	select {
+	case <-ctx.Done():
+		return "", ctx.Err()
+	case res := <-resCh:
+		if res.Err != nil {
+			return "", res.Err
+		}
+		return res.Val.(string), nil
 	}
-	return v.(string), nil
 }

 func (r *SmartRouter) buildClientForAddr(node *BackendNode, addr string, parallel bool) grpc.Backend {
--- a/core/services/nodes/router_test.go
+++ b/core/services/nodes/router_test.go
@@ -493,6 +493,44 @@ var _ = Describe("SmartRouter", func() {
 				Expect(result.Node.ID).To(Equal("n3"))
 			})
 		})
+
+		Context("worker wedges mid-install (dead node holding the lock)", func() {
+			It("aborts the load at the ModelLoadCeiling instead of blocking forever", func() {
+				// Simulate the production incident: the chosen worker accepts the
+				// backend.install but never replies (it died), so InstallBackend
+				// would otherwise block for its full NATS deadline (15m by
+				// default) while pinning the per-model advisory lock. Route must
+				// give up at the ceiling so the lock is released promptly.
+				reg.findAndLockErr = errors.New("not found")
+				reg.findIdleNode = &BackendNode{ID: "n4", Name: "dead-node", Address: "10.0.0.4:50051"}
+
+				block := make(chan struct{})
+				defer close(block) // let the background install goroutine drain at test end
+				unloader.installHook = func() { <-block }
+
+				router := NewSmartRouter(reg, SmartRouterOptions{
+					Unloader:         unloader,
+					ClientFactory:    factory,
+					ModelLoadCeiling: 200 * time.Millisecond,
+				})
+
+				done := make(chan error, 1)
+				start := time.Now()
+				go func() {
+					defer GinkgoRecover()
+					_, err := router.Route(context.Background(), "wedged-model",
+						"models/wedged.gguf", "llama-cpp",
+						&pb.ModelOptions{Model: "models/wedged.gguf"}, false)
+					done <- err
+				}()
+
+				var routeErr error
+				Eventually(done, 5*time.Second).Should(Receive(&routeErr),
+					"Route must not block on a wedged install past the ceiling")
+				Expect(routeErr).To(HaveOccurred())
+				Expect(time.Since(start)).To(BeNumerically("<", 5*time.Second))
+			})
+		})
 	})

 	Describe("scheduleNewModel (mock-based, via Route)", func() {
--- a/core/services/worker/install.go
+++ b/core/services/worker/install.go
@@ -134,7 +134,7 @@ func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest,
 		if req.URI != "" {
 			xlog.Info("Installing backend from external URI", "backend", req.Backend, "uri", req.URI, "force", force)
 			if err := galleryop.InstallExternalBackend(
-				context.Background(), galleries, s.systemState, s.ml, downloadCb, req.URI, req.Name, req.Alias, s.cfg.RequireBackendIntegrity,
+				context.Background(), galleries, s.systemState, s.ml, downloadCb, req.URI, req.Name, req.Alias, force, s.cfg.RequireBackendIntegrity,
 			); err != nil {
 				return "", fmt.Errorf("installing backend from gallery: %w", err)
 			}
@@ -201,7 +201,7 @@ func (s *backendSupervisor) upgradeBackend(req messaging.BackendUpgradeRequest)
 	if req.URI != "" {
 		xlog.Info("Upgrading backend from external URI", "backend", req.Backend, "uri", req.URI)
 		if err := galleryop.InstallExternalBackend(
-			context.Background(), galleries, s.systemState, s.ml, downloadCb, req.URI, req.Name, req.Alias, s.cfg.RequireBackendIntegrity,
+			context.Background(), galleries, s.systemState, s.ml, downloadCb, req.URI, req.Name, req.Alias, true, s.cfg.RequireBackendIntegrity,
 		); err != nil {
 			return fmt.Errorf("upgrading backend from external URI: %w", err)
 		}
--- a/gallery/index.yaml
+++ b/gallery/index.yaml
@@ -1758,8 +1758,8 @@
      use_tokenizer_template: true
  files:
    - filename: llama-cpp/models/Qwopus3.5-9B-Coder-MTP-GGUF/Qwopus3.5-9B-Coder-MTP-Q4_K_M.gguf
-      sha256: f6fc5d193045796d9e1870cbc40f827fe55f53f70593c3f5c1968b82b9331991
      uri: https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF/resolve/main/Qwopus3.5-9B-Coder-MTP-Q4_K_M.gguf
+      sha256: 9ea3ecd122a5165b8b81655f29eaf09d71daf841503e4c4212bdfadb36ab3712
    - filename: llama-cpp/mmproj/Qwopus3.5-9B-Coder-MTP-GGUF/Qwopus3.5-9B-Coder-MTP-mmproj.gguf
      sha256: f48daca405a1c768a9514e392c3955dcc4a9d66a5cf64cf45e064092b5f20ee4
      uri: https://huggingface.co/Jackrong/Qwopus3.5-9B-Coder-MTP-GGUF/resolve/main/Qwopus3.5-9B-Coder-MTP-mmproj.gguf
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -3605,6 +3605,10 @@ const docTemplate = `{
        "localai.GalleryBackend": {
            "type": "object",
            "properties": {
+                "force": {
+                    "description": "Force reinstalls the backend even when it is already installed and\nrunnable. Off by default so apply stays idempotent for supervising\napps that ensure their backend on every boot.",
+                    "type": "boolean"
+                },
                "id": {
                    "type": "string"
                }
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -3602,6 +3602,10 @@
        "localai.GalleryBackend": {
            "type": "object",
            "properties": {
+                "force": {
+                    "description": "Force reinstalls the backend even when it is already installed and\nrunnable. Off by default so apply stays idempotent for supervising\napps that ensure their backend on every boot.",
+                    "type": "boolean"
+                },
                "id": {
                    "type": "string"
                }
--- a/swagger/swagger.yaml
+++ b/swagger/swagger.yaml
@@ -303,6 +303,12 @@ definitions:
    type: object
  localai.GalleryBackend:
    properties:
+      force:
+        description: |-
+          Force reinstalls the backend even when it is already installed and
+          runnable. Off by default so apply stays idempotent for supervising
+          apps that ensure their backend on every boot.
+        type: boolean
      id:
        type: string
    type: object
Author	SHA1	Message	Date
Ettore Di Giacinto	3a1507463f	fix(backends): don't force-reinstall LOCALAI_EXTERNAL_BACKENDS on boot The startup loop for LOCALAI_EXTERNAL_BACKENDS runs InstallExternalBackend for each listed backend on every boot, and its gallery-name path hardcoded force=true — so every start re-downloaded and re-extracted each listed backend's OCI image even when it was installed and runnable. Supervising apps that list several backends paid several full OCI pulls per launch. Give InstallExternalBackend an explicit force parameter (it only affects the gallery-name fallback; URI installs always write) and pass: - false from the boot loop and `local-ai backends install` (idempotent ensure — `backends upgrade` is the refresh path), - op.Force from the local manager's external-URI op, - the request's force on the worker install path and true on its upgrade path (behavior unchanged). Assisted-by: Claude:claude-fable-5 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-02 11:11:40 +00:00
Ettore Di Giacinto	d64743e565	fix(backends): make backend install ops idempotent unless forced POST /backends/apply hardcoded force=true through LocalBackendManager.InstallBackend, so applying an already-installed backend re-downloaded and re-extracted the whole artifact every time. API clients that ensure a backend exists at startup paid a full OCI image pull on every boot. Backend install ops now default to non-forced — an installed, runnable backend short-circuits (the orphaned-meta reinstall path in InstallBackendFromGallery is preserved) — and reinstall stays available: - ManagementOp gains a Force field; the local manager passes it through instead of hardcoding true. - /backends/apply accepts an optional "force" boolean in the body. - The React UI install route keeps forcing, since its button doubles as the explicit "Reinstall backend" action. Distributed installs already behaved this way (workers skip when the binary exists unless force is set); this aligns single-node behavior. Assisted-by: Claude:claude-fable-5 [Claude Code] Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-02 11:00:30 +00:00
LocalAI [bot]	29001a88c1	fix(distributed): don't let a dead worker pin the model-load advisory lock (#10600) * fix(distributed): don't let a dead worker pin the model-load advisory lock In distributed mode a chat request could fail with: failed to route model with internal loader: routing model ...: loading model ...: advisorylock: acquiring lock <id>: ERROR: canceling statement due to lock timeout (SQLSTATE 55P03) Root cause is two independent defects in the cross-replica model-load path: 1. SmartRouter.Route holds a per-model PostgreSQL advisory lock for the whole cold-load sequence, which includes installBackendOnNode -> InstallBackend, a NATS request-reply with a 15m deadline (DefaultBackendInstallTimeout) that ignored ctx. When the chosen worker died mid-install, the holder sat on the lock for up to 15m. The detached loadCtx (WithoutCancel) had no deadline, so nothing capped the hold. 2. The acquiring statement, pg_advisory_lock(), is subject to any deployment global lock_timeout. A common operator setting (e.g. 10s) aborts the wait with SQLSTATE 55P03, so every other replica's request for that model hard-errored instead of waiting for the in-progress load and reusing it. For the ~15m window the model was effectively unroutable. Fixes: - advisorylock.WithLockCtx (postgres): SET lock_timeout = 0 on its dedicated connection (RESET before it returns to the pool) so the Go context, not a deployment-wide GUC, governs how long we wait. Waiters now block and then re-check, reusing the model another replica just loaded. - SmartRouter: bound the detached loadCtx with a single ModelLoadCeiling so the lock is always released in bounded time even if a sub-step wedges. Default is the configured backend.install deadline + 10m (staging + LoadModel margin), so a legitimately slow load is never cut. 
- installBackendOnNode: use singleflight.DoChan + select on ctx.Done() so the install wait honors cancellation; the ceiling can then actually free a caller pinned behind a dead worker. The shared install still coalesces via singleflight. Reproduced both defects as failing tests first (a real 55P03 against a testcontainer with a short lock_timeout; a wedged install that blocks Route) and confirmed green. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] * fix(distributed): bound advisory-lock wait instead of disabling lock_timeout Setting lock_timeout = 0 to override a deployment's short global lock_timeout meant "wait forever" server-side. Safe for SmartRouter.Route (its loadCtx now carries the model-load ceiling) but unsafe for the schema-migration callers that pass context.Background(): a holder whose session never releases would hang them indefinitely. Derive the server-side lock_timeout from the caller's context instead: its remaining budget plus a margin (so the Go context's cancellation still wins with a clean error and the server bound is only a backstop), or a finite 30m backstop when the context has no deadline. Never zero - "wait forever" is no longer possible, while a deployment's hostile short lock_timeout is still overridden so legitimate cross-replica waits don't fail with 55P03. Added a spec proving a deadline-less waiter gives up at the (shrunk) backstop rather than hanging. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-8 [Claude Code] --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-07-02 09:52:51 +02:00
LocalAI [bot]	b0bfa0852e	chore: ⬆️ Update CrispStrobe/CrispASR to `fcbc8718e654995e3bd2d0c98bcb8e55e297d23c` (#10634) ⬆️ Update CrispStrobe/CrispASR Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:48:20 +02:00
LocalAI [bot]	39a93e91cf	chore: ⬆️ Update vllm-metal (darwin) to `v0.3.0.dev20260701132215` (#10633) ⬆️ Update vllm-project/vllm-metal (darwin) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:48:08 +02:00
LocalAI [bot]	26e0c98967	chore: ⬆️ Update leejet/stable-diffusion.cpp to `3590aa8d626e671a1b1dc84506ea2932a243a480` (#10631) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:54 +02:00
LocalAI [bot]	9acca54b25	chore: ⬆️ Update mudler/parakeet.cpp to `e8acc6172a94e20a952cf1843decace5d771a94b` (#10629) ⬆️ Update mudler/parakeet.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:41 +02:00
LocalAI [bot]	2728e6000e	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `068b173649f2fd8dc96b35ada5a0b76d8985105d` (#10632) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:28 +02:00
LocalAI [bot]	006310d746	chore: ⬆️ Update ggml-org/llama.cpp to `4fc4ec5541b243957ae5099edb67372f8f3b550e` (#10630) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:15 +02:00
LocalAI [bot]	05acdb1778	chore: ⬆️ Update ggml-org/whisper.cpp to `6fc7c33b4c3a2cec83e4b65abd5e96a890480375` (#10635) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:47:01 +02:00
LocalAI [bot]	5e68b5700c	chore(model-gallery): ⬆️ update checksum (#10637) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-07-02 09:26:32 +02:00
pos-ei-don	7910018249	fix(vllm): non-streaming tool-call regression after #10351 (#10638) fix(vllm): non-streaming tool-call regression after #10351 (native_streaming is a capability flag, not a state flag) #10351 introduced native streaming via `parser.extract_tool_calls_streaming` and gated the post-loop `extract_tool_calls` block on `native_streaming and not native_streaming_error`. That works for streaming requests, but for non-streaming requests the same flag is still True (it only means "the parser can stream", not "we actually streamed"), so the block was skipped and the `elif` cleared `content = ""`, so the tool call was silently lost. Symptom: non-streaming chat.completions with `tools=[...]` returns `finish_reason: "stop"` with `content: ""` and no `tool_calls`. Streaming requests are unaffected. Fix: gate both branches on `streaming` too, so the extract_tool_calls block runs for non-streaming requests (and for streaming requests that fell back to the buffered path). Reproduction (vLLM 0.24, Qwen3-Coder-Next-NVFP4, qwen3_coder parser): curl -s -X POST http://localhost:8080/v1/chat/completions \ -H 'Content-Type: application/json' \ -d '{"model":"coder","stream":false, "messages":[{"role":"user","content":"7*8 via calc"}], "tools":[{"type":"function","function":{"name":"calc", "parameters":{"type":"object", "properties":{"expression":{"type":"string"}}}}}]}' Before: finish_reason: "stop", content: "", tool_calls: [] After: finish_reason: "tool_calls", tool_calls[0].function.name: "calc" Streaming path re-verified in the same setup: delta.tool_calls arrives token-by-token, finish_reason: "tool_calls", no raw XML in content. Signed-off-by: pos-ei-don <1822533+pos-ei-don@users.noreply.github.com>	2026-07-02 09:26:14 +02:00
LocalAI [bot]	1a03712a6f	fix(hipblas): symlink amdgpu.ids so ROCm backends find the ASIC ID table (#10627) * fix(hipblas): symlink amdgpu.ids so ROCm backends find the ASIC ID table ROCm's bundled libdrm_amdgpu looks up the GPU ASIC ID table at a hardcoded fallback path, /opt/amdgpu/share/libdrm/amdgpu.ids, which is only populated by AMD's full amdgpu-install (graphics/DKMS) stack. The hipblas image is compute-only and doesn't have it, so every model load logs "No such file or directory" and the GPU can't be identified. Symlink it to the equivalent file already shipped by Ubuntu's libdrm-amdgpu1 package. Fixes #10624 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(hipblas): correct amdgpu.ids source package name in comment Verified against the real rocm/dev-ubuntu-24.04:7.2.1 image with hipblas-dev/hipblaslt-dev/rocblas-dev installed: /usr/share/libdrm/amdgpu.ids is owned by libdrm-common, not libdrm-amdgpu1 as the comment said. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-07-02 09:25:14 +02:00