Compare commits

..

1 Commits

Author SHA1 Message Date
Ettore Di Giacinto
d2dbb81af4 feat(distributed): add LOCALAI_DISTRIBUTED_SHARED_MODELS to skip staging on shared volumes (#10556)
In distributed mode, even when the frontend and workers share the same
models directory via a shared volume mount, starting a model on a worker
re-staged (re-downloaded) it: stageModelFiles always uploads model files
into a tracking-key-namespaced subdir on the worker, and the staging probe
only checks that staged location, so a file already present on the shared
volume at the canonical path was never reused.

Add a config switch LOCALAI_DISTRIBUTED_SHARED_MODELS (default false). When
enabled, the operator asserts that all nodes mount the SAME models directory
at the SAME path, so staging is unnecessary: the frontend's absolute model
paths are already valid on the worker. In that mode stageModelFiles returns
the cloned opts unchanged without uploading, leaving the path fields pointing
at their canonical absolute paths so the worker loads them directly from the
shared volume.

The value is plumbed from DistributedConfig through SmartRouterOptions into
the SmartRouter. Docs and docker-compose.distributed.yaml updated.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-27 22:02:04 +00:00
10 changed files with 149 additions and 120 deletions

View File

@@ -355,6 +355,7 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB, configLoade
PrefixProvider: prefixProvider,
PrefixConfig: prefixCfg,
Pressure: pressure,
SharedModels: cfg.Distributed.SharedModels,
})
// Wire staging-progress broadcasting so file-staging shows up on every

View File

@@ -160,6 +160,7 @@ type RunCMD struct {
RegistrationRequireAuth bool `env:"LOCALAI_REGISTRATION_REQUIRE_AUTH" default:"false" help:"Fail startup when distributed mode is enabled but LOCALAI_REGISTRATION_TOKEN is empty (node endpoints and worker file-transfer server would otherwise be unauthenticated)" group:"distributed"`
DistributedRequireAuth bool `env:"LOCALAI_DISTRIBUTED_REQUIRE_AUTH" default:"false" help:"Umbrella switch: require BOTH NATS JWT credentials and a registration token when distributed mode is enabled (implies --nats-require-auth and --registration-require-auth)" group:"distributed"`
AutoApproveNodes bool `env:"LOCALAI_AUTO_APPROVE_NODES" default:"false" help:"Auto-approve new worker nodes (skip admin approval)" group:"distributed"`
DistributedSharedModels bool `env:"LOCALAI_DISTRIBUTED_SHARED_MODELS" default:"false" help:"Assert that every node mounts the SAME models directory at the SAME path (shared volume). When true, the router skips staging model files to workers and loads them directly from the shared path, avoiding re-downloads." group:"distributed"`
DistributedPrefixCache bool `env:"LOCALAI_DISTRIBUTED_PREFIX_CACHE" default:"true" help:"Enable prefix-cache-aware routing in distributed mode (default true). When false, routing falls back to round-robin." group:"distributed"`
DistributedPrefixCacheTTL string `env:"LOCALAI_DISTRIBUTED_PREFIX_CACHE_TTL" help:"Idle-timeout for prefix-cache index entries; also drives the background eviction cadence (every TTL/2). Default 5m." group:"distributed"`
BackendInstallTimeout string `env:"LOCALAI_NATS_BACKEND_INSTALL_TIMEOUT" help:"NATS round-trip timeout for backend.install requests sent to worker nodes (default 15m). Increase for slow links pulling multi-GB images." group:"distributed"`
@@ -310,6 +311,9 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
if r.DistributedRequireAuth {
opts = append(opts, config.EnableDistributedRequireAuth)
}
if r.DistributedSharedModels {
opts = append(opts, config.EnableDistributedSharedModels)
}
if r.NatsAccountSeed != "" {
opts = append(opts, config.WithNatsAccountSeed(r.NatsAccountSeed))
}

View File

@@ -31,6 +31,14 @@ type DistributedConfig struct {
// available to enforce just one layer.
RequireAuth bool // LOCALAI_DISTRIBUTED_REQUIRE_AUTH
AutoApproveNodes bool // --auto-approve-nodes / LOCALAI_AUTO_APPROVE_NODES (skip admin approval for new workers)
// SharedModels asserts that every node (frontend and workers) mounts the
// SAME models directory at the SAME path (e.g. a shared volume, as in
// docker-compose.distributed.yaml). When true, the router skips staging
// model files to workers entirely: the frontend's absolute model paths are
// already valid on the worker, so re-uploading them into a per-model
// subdirectory only re-downloads what is already present (#10556). Default
// false preserves the historical per-node staging behavior.
SharedModels bool // --distributed-shared-models / LOCALAI_DISTRIBUTED_SHARED_MODELS
// NATS JWT auth (optional; see pkg/natsauth and docs/features/distributed-mode.md)
NatsAccountSeed string // LOCALAI_NATS_ACCOUNT_SEED — account signing seed to mint per-node worker JWTs
@@ -282,6 +290,13 @@ var EnableAutoApproveNodes = func(o *ApplicationConfig) {
o.Distributed.AutoApproveNodes = true
}
// EnableDistributedSharedModels marks the cluster as sharing one models
// directory across all nodes, so the router skips staging model files to
// workers (see DistributedConfig.SharedModels).
var EnableDistributedSharedModels = func(o *ApplicationConfig) {
o.Distributed.SharedModels = true
}
// DisablePrefixCache turns off prefix-cache-aware routing (falls back to
// round-robin). Prefix-cache routing is enabled by default in distributed mode.
var DisablePrefixCache = func(o *ApplicationConfig) {

View File

@@ -25,7 +25,6 @@ import (
"github.com/mudler/LocalAI/core/http/auth"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/nodes/prefixcache"
"github.com/mudler/LocalAI/pkg/httpclient"
@@ -551,23 +550,12 @@ func DeleteBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerF
}
// ListBackendsOnNodeEndpoint lists installed backends on a worker node via NATS.
func ListBackendsOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes.NodeRegistry) echo.HandlerFunc {
func ListBackendsOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
return func(c echo.Context) error {
nodeID := c.Param("id")
// Agent-type workers don't run backends and never subscribe to the
// nodes.<id>.backend.list NATS subject, so the request would hang
// until timeout with "no responders". Their backend list is simply
// empty. Mirror the aggregate-list guard in managers_distributed.go
// (skip nodes whose NodeType is set and not "backend") so the
// single-node and cluster-wide views stay consistent.
if node, err := registry.Get(c.Request().Context(), nodeID); err == nil {
if node.NodeType != "" && node.NodeType != nodes.NodeTypeBackend {
return c.JSON(http.StatusOK, []messaging.NodeBackendInfo{})
}
}
if unloader == nil {
return c.JSON(http.StatusServiceUnavailable, nodeError(http.StatusServiceUnavailable, "NATS not configured"))
}
nodeID := c.Param("id")
reply, err := unloader.ListBackends(nodeID)
if err != nil {
xlog.Error("Failed to list backends on node", "node", nodeID, "error", err)

View File

@@ -1,103 +0,0 @@
package localai
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/testutil"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// stubNodeCommandSender records whether ListBackends was invoked so the test can
// assert the endpoint short-circuits (no NATS request) for agent-type nodes.
type stubNodeCommandSender struct {
listBackendsCalled bool
}
func (s *stubNodeCommandSender) InstallBackend(_, _, _, _, _, _, _ string, _ int, _ string, _ func(messaging.BackendInstallProgressEvent)) (*messaging.BackendInstallReply, error) {
return &messaging.BackendInstallReply{}, nil
}
func (s *stubNodeCommandSender) UpgradeBackend(_, _, _, _, _, _ string, _ int, _ string, _ func(messaging.BackendInstallProgressEvent)) (*messaging.BackendUpgradeReply, error) {
return &messaging.BackendUpgradeReply{}, nil
}
func (s *stubNodeCommandSender) DeleteBackend(_, _ string) (*messaging.BackendDeleteReply, error) {
return &messaging.BackendDeleteReply{Success: true}, nil
}
func (s *stubNodeCommandSender) ListBackends(_ string) (*messaging.BackendListReply, error) {
s.listBackendsCalled = true
return &messaging.BackendListReply{Backends: []messaging.NodeBackendInfo{{Name: "llama-cpp"}}}, nil
}
func (s *stubNodeCommandSender) StopBackend(_, _ string) error { return nil }
func (s *stubNodeCommandSender) UnloadModelOnNode(_, _ string) error { return nil }
var _ = Describe("ListBackendsOnNodeEndpoint", func() {
var registry *nodes.NodeRegistry
BeforeEach(func() {
db := testutil.SetupTestDB()
var err error
registry, err = nodes.NewNodeRegistry(db)
Expect(err).ToNot(HaveOccurred())
})
callEndpoint := func(unloader nodes.NodeCommandSender, nodeID string) *httptest.ResponseRecorder {
e := echo.New()
req := httptest.NewRequest(http.MethodGet, "/", nil)
rec := httptest.NewRecorder()
c := e.NewContext(req, rec)
c.SetParamNames("id")
c.SetParamValues(nodeID)
handler := ListBackendsOnNodeEndpoint(unloader, registry)
Expect(handler(c)).To(Succeed())
return rec
}
It("returns an empty list for an agent node without issuing a NATS request", func() {
ctx := context.Background()
node := &nodes.BackendNode{Name: "agent-1", NodeType: nodes.NodeTypeAgent}
Expect(registry.Register(ctx, node, true)).To(Succeed())
stub := &stubNodeCommandSender{}
rec := callEndpoint(stub, node.ID)
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(stub.listBackendsCalled).To(BeFalse(),
"agent workers don't subscribe to backend.list; the endpoint must not issue the doomed NATS request")
var list []messaging.NodeBackendInfo
Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
Expect(list).To(BeEmpty())
// Must be `[]`, not `null`, so the UI can render it.
Expect(rec.Body.String()).To(ContainSubstring("[]"))
})
It("consults the unloader (NATS) for a backend node", func() {
ctx := context.Background()
node := &nodes.BackendNode{Name: "backend-1", NodeType: nodes.NodeTypeBackend, Address: "10.0.0.1:50051"}
Expect(registry.Register(ctx, node, true)).To(Succeed())
stub := &stubNodeCommandSender{}
rec := callEndpoint(stub, node.ID)
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(stub.listBackendsCalled).To(BeTrue(),
"backend nodes must still be queried over NATS")
var list []messaging.NodeBackendInfo
Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
Expect(list).To(HaveLen(1))
Expect(list[0].Name).To(Equal("llama-cpp"))
})
})

View File

@@ -88,7 +88,7 @@ func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloade
admin.POST("/:id/approve", localai.ApproveNodeEndpoint(registry, authDB, hmacSecret, natsCfg))
// Backend management on workers
admin.GET("/:id/backends", localai.ListBackendsOnNodeEndpoint(unloader, registry))
admin.GET("/:id/backends", localai.ListBackendsOnNodeEndpoint(unloader))
admin.POST("/:id/backends/install", localai.InstallBackendOnNodeEndpoint(unloader, galleryService, opcache, appConfig))
admin.POST("/:id/backends/delete", localai.DeleteBackendOnNodeEndpoint(unloader))

View File

@@ -63,6 +63,11 @@ type SmartRouterOptions struct {
// The reconciler reads the same instance to autoscale a saturated cache-warm
// replica. nil disables recording (the disabled path stays a no-op).
Pressure *prefixcache.Pressure
// SharedModels asserts that every node mounts the same models directory at
// the same path. When true, stageModelFiles skips all uploading and leaves
// the absolute model paths untouched so the worker loads them directly from
// the shared volume (#10556). See config.DistributedConfig.SharedModels.
SharedModels bool
}
// SmartRouter routes inference requests to the best available backend node.
@@ -93,6 +98,9 @@ type SmartRouter struct {
// per-request routing doesn't stall behind a busy backend's serialized
// HealthCheck/Predict. See probe_cache.go for the rationale.
probeCache *probeCache
// sharedModels skips file staging when all nodes mount the same models
// directory at the same path (see SmartRouterOptions.SharedModels).
sharedModels bool
}
// probeCacheTTL is how long a successful gRPC HealthCheck on a backend is
@@ -122,6 +130,7 @@ func NewSmartRouter(registry ModelRouter, opts SmartRouterOptions) *SmartRouter
prefixProvider: opts.PrefixProvider,
prefixConfig: opts.PrefixConfig,
pressure: opts.Pressure,
sharedModels: opts.SharedModels,
}
}
@@ -947,6 +956,19 @@ func (r *SmartRouter) buildClientForAddr(node *BackendNode, addr string, paralle
// simply remove the {ModelsPath}/{trackingKey}/ directory.
func (r *SmartRouter) stageModelFiles(ctx context.Context, node *BackendNode, opts *pb.ModelOptions, trackingKey string) (*pb.ModelOptions, error) {
opts = proto.Clone(opts).(*pb.ModelOptions)
// Shared-models mode: every node mounts the same models directory at the
// same path, so the frontend's absolute model paths are already valid on the
// worker. Staging would only re-upload files that already exist on the shared
// volume (under a tracking-key subdir the probe never reuses), re-downloading
// the model on every load (#10556). Return the clone untouched: no upload, no
// path rewrite, no staging tracker.
if r.sharedModels {
xlog.Info("Skipping model file staging: shared-models mode is on (LOCALAI_DISTRIBUTED_SHARED_MODELS); worker loads directly from the shared volume",
"node", node.Name, "modelFile", opts.ModelFile, "trackingKey", trackingKey)
return opts, nil
}
xlog.Info("Staging model files for remote node", "node", node.Name, "modelFile", opts.ModelFile, "trackingKey", trackingKey)
// Derive the frontend models directory from ModelFile and Model.

View File

@@ -0,0 +1,85 @@
package nodes
import (
"context"
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// These tests cover shared-models mode (LOCALAI_DISTRIBUTED_SHARED_MODELS): when
// every node mounts the same models directory at the same path, the router must
// NOT stage model files to workers. The canonical absolute path is already valid
// on the worker, so staging would only re-download what is already present
// (#10556).
var _ = Describe("stageModelFiles shared-models mode", func() {
var (
stager *fakeFileStager
node *BackendNode
tmp string
gguf string
modelID = "ornith-1.0-35b"
)
BeforeEach(func() {
stager = &fakeFileStager{}
node = &BackendNode{ID: "node-1", Name: "node-1", Address: "10.0.0.1:50051"}
tmp = GinkgoT().TempDir()
modelDir := filepath.Join(tmp, "models", "llama-cpp", "models")
Expect(os.MkdirAll(modelDir, 0o755)).To(Succeed())
gguf = filepath.Join(modelDir, "ornith.gguf")
Expect(os.WriteFile(gguf, []byte("weights"), 0o644)).To(Succeed())
})
It("does not stage and keeps the canonical absolute ModelFile when shared-models is enabled", func() {
router := &SmartRouter{
fileStager: stager,
stagingTracker: NewStagingTracker(),
sharedModels: true,
}
opts := &pb.ModelOptions{
Model: "llama-cpp/models/ornith.gguf",
ModelFile: gguf,
}
staged, err := router.stageModelFiles(context.Background(), node, opts, modelID)
Expect(err).ToNot(HaveOccurred())
// The file stager must never be touched: no upload, no re-download.
Expect(stager.ensureCalls).To(BeEmpty())
// The worker loads directly from the shared volume, so the path is unchanged.
Expect(staged.ModelFile).To(Equal(gguf))
})
It("stages files (existing behavior) when shared-models is disabled", func() {
router := &SmartRouter{
fileStager: stager,
stagingTracker: NewStagingTracker(),
sharedModels: false,
}
opts := &pb.ModelOptions{
Model: "llama-cpp/models/ornith.gguf",
ModelFile: gguf,
}
staged, err := router.stageModelFiles(context.Background(), node, opts, modelID)
Expect(err).ToNot(HaveOccurred())
// Default mode uploads the model file to the worker.
Expect(stager.ensureCalls).ToNot(BeEmpty())
stagedLocals := make([]string, 0, len(stager.ensureCalls))
for _, c := range stager.ensureCalls {
stagedLocals = append(stagedLocals, c.localPath)
}
Expect(stagedLocals).To(ContainElement(gguf))
// ModelFile is rewritten to the remote (tracking-key namespaced) path.
Expect(staged.ModelFile).ToNot(Equal(gguf))
})
})

View File

@@ -57,6 +57,11 @@ services:
LOCALAI_AGENT_POOL_VECTOR_ENGINE: "postgres"
LOCALAI_AGENT_POOL_DATABASE_URL: "postgresql://localai:localai@postgres:5432/localai?sslmode=disable"
LOCALAI_REGISTRATION_TOKEN: "changeme" # Change this in production!
# Shared-models mode (optional): set when every node mounts the SAME
# models directory at the SAME path (see "Shared Volume Mode" below).
# The router then skips gRPC file staging and workers load models
# directly from the shared volume instead of re-downloading them.
# LOCALAI_DISTRIBUTED_SHARED_MODELS: "true"
# Auth (required for distributed mode — must use PostgreSQL)
LOCALAI_AUTH: "true"
LOCALAI_AUTH_DATABASE_URL: "postgresql://localai:localai@postgres:5432/localai?sslmode=disable"
@@ -157,8 +162,11 @@ services:
# Then add to the volumes section:
# shared_models:
#
# With shared volumes, model files are already available on the backend —
# gRPC file staging becomes a no-op (paths match).
# With shared volumes the model files are already present on every worker at
# the same path. Set LOCALAI_DISTRIBUTED_SHARED_MODELS=true on the frontend
# (see its environment above) so the router skips gRPC file staging and the
# worker loads the model directly from the shared path instead of
# re-downloading it into a per-model subdirectory.
# --- Adding More Workers ---
# Copy the worker-1 service above and change:

View File

@@ -67,6 +67,7 @@ The frontend is a standard LocalAI instance with distributed mode enabled. These
| `--registration-require-auth` | `LOCALAI_REGISTRATION_REQUIRE_AUTH` | `false` | Fail startup when distributed mode is enabled but the registration token is empty (node endpoints and worker file-transfer would otherwise be unauthenticated) |
| `--distributed-require-auth` | `LOCALAI_DISTRIBUTED_REQUIRE_AUTH` | `false` | **Umbrella switch.** Implies both `--nats-require-auth` and `--registration-require-auth` — one knob to lock down the NATS bus *and* the registration/file-transfer layer. Set this in production instead of the two granular flags. |
| `--auto-approve-nodes` | `LOCALAI_AUTO_APPROVE_NODES` | `false` | Auto-approve new worker nodes (skip admin approval) |
| `--distributed-shared-models` | `LOCALAI_DISTRIBUTED_SHARED_MODELS` | `false` | Assert that every node mounts the **same** models directory at the **same** path (a shared volume). When `true`, the router skips file staging entirely and workers load models directly from the shared path instead of re-downloading them. See [Shared models directory](#shared-models-directory). |
| `--auth` | `LOCALAI_AUTH` | `false` | **Must be `true`** for distributed mode |
| `--auth-database-url` | `LOCALAI_AUTH_DATABASE_URL` | *(required)* | PostgreSQL connection URL |
| `--backend-install-timeout` | `LOCALAI_NATS_BACKEND_INSTALL_TIMEOUT` | `15m` | How long the frontend waits for a worker to acknowledge a backend install before considering the request stalled. Raise it when workers pull large backend images over slow links. If a worker takes longer than this, the operation shows as "still installing in background" in the admin UI and clears once the worker finishes. |
@@ -133,6 +134,14 @@ When S3 is not configured, model files are transferred directly from the fronten
For high-throughput or very large model files, S3 can be more efficient since it avoids streaming through the frontend.
### Shared models directory
If every node (frontend and workers) mounts the **same** models directory at the **same** path - for example a shared volume or network filesystem, as shown in the "Shared Volume Mode" section of `docker-compose.distributed.yaml` - the model files are already present on each worker at their canonical path. In that case staging is wasted work: it copies files that already exist into a per-model subdirectory the worker then loads from, which shows up as a re-download of a model you already have.
Set `LOCALAI_DISTRIBUTED_SHARED_MODELS=true` (or `--distributed-shared-models`) on the frontend to skip staging entirely. The router then leaves the model's absolute paths untouched and the worker loads them directly from the shared volume.
This flag is a contract you assert: all nodes must mount identical paths. Leave it off (the default) when workers have independent models directories - the frontend stages files to them over HTTP (or S3) as described above.
{{% notice warning %}}
The worker HTTP file transfer server is authenticated by `LOCALAI_REGISTRATION_TOKEN`. If the token is **empty**, the server **fails open** — anyone who can reach the port gets read/write access to the worker's models/staging/data directories (a remote model-poisoning / exfiltration vector). The worker logs a loud warning at startup in this case. Always set `LOCALAI_REGISTRATION_TOKEN` in distributed mode, and set `LOCALAI_DISTRIBUTED_REQUIRE_AUTH=true` (frontend **and** workers) to make a missing token *or* missing NATS credentials a hard startup error rather than a silent fail-open. Firewall the file-transfer port (gRPC base 1) so only the frontend can reach it.
{{% /notice %}}