* feat(messaging): add backend.upgrade NATS subject + payload types
Splits the slow force-reinstall path off backend.install so it can run on
its own subscription goroutine, eliminating head-of-line blocking between
routine model loads and full gallery upgrades.
Wire-level Force flag on BackendInstallRequest is kept for one release as
the rolling-update fallback target; doc note marks it deprecated.
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
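For orientation, a sketch of the payload split on the wire. Only BackendInstallRequest's Force flag and the BackendUpgradeReply name are confirmed by these commits; the other fields below are assumptions:

    // Sketch only: field names beyond Force are assumptions, not the
    // real wire types.
    type BackendInstallRequest struct {
        Backend string `json:"backend"`
        Force   bool   `json:"force"` // deprecated: kept one release for rolling updates
    }

    type BackendUpgradeRequest struct {
        Backend string `json:"backend"`
    }

    type BackendUpgradeReply struct {
        Success bool   `json:"success"`
        Error   string `json:"error,omitempty"`
    }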
* feat(distributed/worker): add per-backend mutex helper to backendSupervisor
Different backend names lock independently; same backend serializes. This
is the synchronization primitive used by the upcoming concurrent install
handler — without it, wrapping the NATS callback in a goroutine would
race the gallery directory when two requests target the same backend.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
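A minimal sketch of such a helper, assuming field names that may not match the real backendSupervisor: one lazily created mutex per backend name, with an outer mutex guarding the map itself.

    type backendSupervisor struct {
        mu           sync.Mutex             // guards backendLocks itself
        backendLocks map[string]*sync.Mutex // one lock per backend name
    }

    // lockBackend returns the mutex for a backend name, creating it on
    // first use. Callers hold it around gallery-directory operations, so
    // same-name requests serialize and different names run in parallel.
    func (s *backendSupervisor) lockBackend(name string) *sync.Mutex {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.backendLocks == nil {
            s.backendLocks = make(map[string]*sync.Mutex)
        }
        if _, ok := s.backendLocks[name]; !ok {
            s.backendLocks[name] = &sync.Mutex{}
        }
        return s.backendLocks[name]
    }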
* fix(distributed/worker): run backend.install handler in a goroutine
NATS subscriptions deliver messages serially on a single per-subscription
goroutine. With a synchronous install handler, a multi-minute gallery
download would head-of-line-block every other install request to the
same worker — manifesting upstream as a 5-minute "nats: timeout" on
unrelated routine model loads.
The body now runs in its own goroutine, with a per-backend mutex
(lockBackend) protecting the gallery directory from concurrent operations
on the same backend. Different backend names install in parallel.
Backward-compat: req.Force=true is still honored here, so an older master
that hasn't been updated to send on backend.upgrade keeps working.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
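The shape of that change, sketched with an assumed subject string and reply payload (the real subject constant and reply helper differ), reusing lockBackend from the sketch above:

    func (s *backendSupervisor) subscribeInstall(nc *nats.Conn) error {
        _, err := nc.Subscribe("backend.install", func(msg *nats.Msg) {
            // Return from the NATS callback immediately; the slow body
            // runs on its own goroutine so it cannot head-of-line-block
            // the subscription's delivery goroutine.
            go func() {
                var req BackendInstallRequest
                if err := json.Unmarshal(msg.Data, &req); err != nil {
                    return
                }
                lock := s.lockBackend(req.Backend)
                lock.Lock()
                defer lock.Unlock()
                // ... install from the gallery (honoring the deprecated
                // req.Force for old masters), then reply ...
                _ = msg.Respond([]byte(`{"success":true}`))
            }()
        })
        return err
    }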
* feat(distributed/worker): subscribe to backend.upgrade as a separate path
Slow force-reinstall now lives on its own NATS subscription, so a
multi-minute gallery pull cannot head-of-line-block the routine
backend.install handler on the same worker. Same per-backend mutex
guards both — concurrent install + upgrade for the same backend
serialize at the gallery directory; different backends are independent.
upgradeBackend stops every live process for the backend, force-installs
from gallery, and re-registers. It does not start a new process — the
next backend.install will spawn one with the freshly-pulled binary.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
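Sketched with hypothetical helper names for the process-stop, gallery, and re-registration steps; the load-bearing detail is that upgrade takes the same lockBackend mutex the install handler takes:

    func (s *backendSupervisor) upgradeBackend(name string) error {
        lock := s.lockBackend(name) // same mutex as the install path
        lock.Lock()
        defer lock.Unlock()
        s.stopAllProcesses(name) // hypothetical: stop every live process
        if err := s.forceInstallFromGallery(name); err != nil { // hypothetical
            return err
        }
        // Deliberately no process start here: the next backend.install
        // spawns one with the freshly pulled binary.
        return s.reregister(name) // hypothetical
    }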
* feat(distributed): add UpgradeBackend on NodeCommandSender; drop Force from InstallBackend
Master now sends to backend.upgrade for force-reinstall, with a
nats.ErrNoResponders fallback to the legacy backend.install Force=true
path so a rolling update with a new master + an old worker still
converges. The Force parameter leaves the public Go API surface
entirely — only the internal fallback sets it on the wire.
InstallBackend timeout drops 5min -> 3min (most replies are sub-second
since the worker short-circuits on already-running or already-installed).
UpgradeBackend timeout is 15min, sized for real-world Jetson-on-WiFi
gallery pulls.
Updates the admin install HTTP endpoint
(core/http/endpoints/localai/nodes.go) to the new signature too.
router_test.go's fakeUnloader does not yet implement the new interface
shape; Task 3.2 will catch it up before the next package-level test run.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
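A sketch of the fallback logic under assumed subject strings and payload shapes; nats.ErrNoResponders is the real sentinel the nats.go client returns when nothing subscribes to a request subject:

    func upgradeBackend(nc *nats.Conn, nodeID, backend string) error {
        payload, _ := json.Marshal(map[string]any{"backend": backend})
        _, err := nc.Request("backend.upgrade."+nodeID, payload, 15*time.Minute)
        if err == nil {
            return nil
        }
        if !errors.Is(err, nats.ErrNoResponders) {
            return err // non-NoResponders errors propagate unchanged
        }
        // Rolling-update fallback: an old worker only subscribes to the
        // legacy install subject, so retry there with Force=true on the wire.
        legacy, _ := json.Marshal(map[string]any{"backend": backend, "force": true})
        _, err = nc.Request("backend.install."+nodeID, legacy, 15*time.Minute)
        return err
    }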
* test(distributed): update fakeUnloader for new NodeCommandSender shape
InstallBackend lost its force bool param (Force is not part of the public
Go API anymore — only the internal upgrade-fallback path sets it on the
wire). UpgradeBackend gained a method. Fake records both call slices and
provides an installHook concurrency seam for upcoming singleflight tests.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(distributed): cover UpgradeBackend's new subject + rolling-update fallback
Task 3.1 changed the master to publish UpgradeBackend on the new
backend.upgrade subject; the existing UpgradeBackend tests scripted the
old install subject, so all three began failing as expected. This updates
them to script SubjectNodeBackendUpgrade with BackendUpgradeReply.
Adds two new specs for the rolling-update fallback:
- ErrNoResponders on backend.upgrade triggers a backend.install
Force=true retry on the same node.
- Non-NoResponders errors propagate to the caller unchanged.
scriptedMessagingClient gains scriptNoResponders (real nats sentinel) and
scriptReplyMatching (predicate-matched canned reply, used to assert that
the fallback path actually sets Force=true on the install retry).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(distributed): coalesce concurrent identical backend.install via singleflight
Six simultaneous chat completions for the same not-yet-loaded model were
observed firing six independent NATS install requests, each serializing
through the worker's per-subscription goroutine and amplifying queue
depth. SmartRouter now wraps the NATS round-trip in a singleflight.Group
keyed by (nodeID, backend, modelID, replica): N concurrent identical
loads share one round-trip and one reply.
Distinct (modelID, replica) keys still fire independent calls, so
multi-replica scaling and multi-model fan-out are unaffected.
fakeUnloader gains a sync.Mutex around its recording slices to keep
concurrent test goroutines race-clean.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
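The coalescing pattern, sketched with a hypothetical sendInstall round-trip helper:

    var installGroup singleflight.Group // golang.org/x/sync/singleflight

    func coalescedInstall(nodeID, backend, modelID string, replica int) error {
        // Identical keys share one in-flight call; distinct (modelID,
        // replica) keys still fire independently.
        key := fmt.Sprintf("%s|%s|%s|%d", nodeID, backend, modelID, replica)
        _, err, _ := installGroup.Do(key, func() (any, error) {
            // Only the first caller for this key runs the NATS round-trip;
            // concurrent duplicates block here and share its result.
            return nil, sendInstall(nodeID, backend, modelID, replica) // hypothetical
        })
        return err
    }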
* test(e2e/distributed): drop force arg from InstallBackend test calls
Two e2e test call sites still passed the trailing force bool that was
removed from RemoteUnloaderAdapter.InstallBackend in 9bde76d7. Caught
by golangci-lint typecheck on the upgrade-split branch (master CI was
already green because these tests don't run in the standard test path).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(distributed): extract worker business logic to core/services/worker
core/cli/worker.go grew to 1212 lines after the backend.upgrade split.
The CLI package was carrying backendSupervisor, NATS lifecycle handlers,
gallery install/upgrade orchestration, S3 file staging, and registration
helpers — all distributed-worker business logic that doesn't belong in
the cobra surface.
Move it to a new core/services/worker package, mirroring the existing
core/services/{nodes,messaging,galleryop} pattern. core/cli/worker.go
shrinks to ~19 lines: a kong-tagged shim that embeds worker.Config and
delegates Run.
No behavior change. All symbols stay unexported except Config and Run.
The three worker-specific tests (addr/replica/concurrency) move with
the code via git mv so history follows them.
Files split as:
worker.go - Run entry point
config.go - Config struct (kong tags retained, kong not imported)
supervisor.go - backendProcess, backendSupervisor, process lifecycle
install.go - installBackend, upgradeBackend, findBackend, lockBackend
lifecycle.go - subscribeLifecycleEvents (verbatim, decomposition is
a follow-up commit)
file_staging.go - subscribeFileStaging, isPathAllowed
registration.go - advertiseAddr, registrationBody, heartbeatBody, etc.
reply.go - replyJSON
process_helpers.go - readLastLinesFromFile
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
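Roughly what the shim looks like after the move; beyond worker.Config and Run, which the commit names, the identifiers here are assumptions:

    package cli

    import "github.com/mudler/LocalAI/core/services/worker"

    // All flags live as kong tags on the embedded worker.Config; all
    // behavior lives in the worker package.
    type WorkerCMD struct {
        worker.Config `embed:""`
    }

    func (w *WorkerCMD) Run() error {
        return worker.Run(&w.Config)
    }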
* refactor(distributed/worker): decompose subscribeLifecycleEvents into per-event handlers
The 226-line subscribeLifecycleEvents method packed eight NATS subscriptions
inline. Each grew context-shaped doc comments mixed with subscription
plumbing, making it hard to read any one handler without scrolling past the
others. Extract each handler into its own method on *backendSupervisor; the
subscriber becomes a thin 8-line dispatcher.
No behavior change: each method body is byte-equivalent to its corresponding
inline goroutine + handler. Doc comments that were attached to the inline
SubscribeReply calls migrate to the new method godocs.
Adding the next NATS subject is now a 2-line patch to the dispatcher plus
one new method, instead of grafting onto a monolith.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
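The dispatcher shape, with illustrative method names standing in for the eight real handlers:

    func (s *backendSupervisor) subscribeLifecycleEvents(nc *nats.Conn) error {
        for _, subscribe := range []func(*nats.Conn) error{
            s.subscribeInstall,
            s.subscribeUpgrade,
            s.subscribeStop,
            // ... one entry per NATS subject, eight in total; adding a
            // subject is one new line here plus one new method.
        } {
            if err := subscribe(nc); err != nil {
                return err
            }
        }
        return nil
    }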
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
package worker

import (
    "cmp"
    "fmt"
    "os"
    "strconv"
    "strings"

    "github.com/mudler/LocalAI/pkg/xsysinfo"
)

// effectiveBasePort returns the port used as base for gRPC backend processes.
// Priority: Addr port → ServeAddr port → 50051
func (cfg *Config) effectiveBasePort() int {
    for _, addr := range []string{cfg.Addr, cfg.ServeAddr} {
        if addr != "" {
            if _, portStr, ok := strings.Cut(addr, ":"); ok {
                if p, _ := strconv.Atoi(portStr); p > 0 {
                    return p
                }
            }
        }
    }
    return 50051
}

// advertiseAddr returns the address the frontend should use to reach this node.
func (cfg *Config) advertiseAddr() string {
    if cfg.AdvertiseAddr != "" {
        return cfg.AdvertiseAddr
    }
    if cfg.Addr != "" {
        return cfg.Addr
    }
    hostname, _ := os.Hostname()
    return fmt.Sprintf("%s:%d", cmp.Or(hostname, "localhost"), cfg.effectiveBasePort())
}

// resolveHTTPAddr returns the address to bind the HTTP file transfer server to.
// Uses basePort-1 so it doesn't conflict with dynamically allocated gRPC ports
// which grow upward from basePort.
func (cfg *Config) resolveHTTPAddr() string {
    if cfg.HTTPAddr != "" {
        return cfg.HTTPAddr
    }
    return fmt.Sprintf("0.0.0.0:%d", cfg.effectiveBasePort()-1)
}

// advertiseHTTPAddr returns the HTTP address the frontend should use to reach
// this node for file transfer.
func (cfg *Config) advertiseHTTPAddr() string {
    if cfg.AdvertiseHTTPAddr != "" {
        return cfg.AdvertiseHTTPAddr
    }
    advHost, _, _ := strings.Cut(cfg.advertiseAddr(), ":")
    httpPort := cfg.effectiveBasePort() - 1
    return fmt.Sprintf("%s:%d", advHost, httpPort)
}

// registrationBody builds the JSON body for node registration.
func (cfg *Config) registrationBody() map[string]any {
    nodeName := cfg.NodeName
    if nodeName == "" {
        hostname, err := os.Hostname()
        if err != nil {
            nodeName = fmt.Sprintf("node-%d", os.Getpid())
        } else {
            nodeName = hostname
        }
    }

    // Detect GPU info for VRAM-aware scheduling
    totalVRAM, _ := xsysinfo.TotalAvailableVRAM()
    gpuVendor, _ := xsysinfo.DetectGPUVendor()

    maxReplicas := cfg.MaxReplicasPerModel
    if maxReplicas < 1 {
        maxReplicas = 1
    }
    body := map[string]any{
        "name":                   nodeName,
        "address":                cfg.advertiseAddr(),
        "http_address":           cfg.advertiseHTTPAddr(),
        "total_vram":             totalVRAM,
        "available_vram":         totalVRAM, // initially all VRAM is available
        "gpu_vendor":             gpuVendor,
        "max_replicas_per_model": maxReplicas,
    }

    // If no GPU detected, report system RAM so the scheduler/UI has capacity info
    if totalVRAM == 0 {
        if ramInfo, err := xsysinfo.GetSystemRAMInfo(); err == nil {
            body["total_ram"] = ramInfo.Total
            body["available_ram"] = ramInfo.Available
        }
    }
    if cfg.RegistrationToken != "" {
        body["token"] = cfg.RegistrationToken
    }

    // Parse and add static node labels. Always include the auto-label
    // `node.replica-slots=N` so AND-selectors in ModelSchedulingConfig can
    // target high-capacity nodes (e.g. {"node.replica-slots":"4"}).
    labels := make(map[string]string)
    if cfg.NodeLabels != "" {
        for _, pair := range strings.Split(cfg.NodeLabels, ",") {
            pair = strings.TrimSpace(pair)
            if k, v, ok := strings.Cut(pair, "="); ok {
                labels[strings.TrimSpace(k)] = strings.TrimSpace(v)
            }
        }
    }
    labels["node.replica-slots"] = strconv.Itoa(maxReplicas)
    body["labels"] = labels

    return body
}

// heartbeatBody returns the current VRAM/RAM stats for heartbeat payloads.
//
// When aggregate VRAM usage is unknown (no GPU, or temporary detection
// failure), we deliberately OMIT available_vram so the frontend keeps its
// last good value — overwriting with 0 makes the UI show the node as "fully
// used", while reporting total-as-available lies to the scheduler about
// free capacity.
func (cfg *Config) heartbeatBody() map[string]any {
    body := map[string]any{}
    aggregate := xsysinfo.GetGPUAggregateInfo()
    if aggregate.TotalVRAM > 0 {
        body["available_vram"] = aggregate.FreeVRAM
    }

    // CPU-only workers (or workers that lost GPU visibility momentarily):
    // report system RAM so the scheduler still has capacity info.
    if aggregate.TotalVRAM == 0 {
        if ramInfo, err := xsysinfo.GetSystemRAMInfo(); err == nil {
            body["available_ram"] = ramInfo.Available
        }
    }
    return body
}