LocalAI/core/services/nodes/interfaces.go
LocalAI [bot] a891eedd08 fix(distributed): persist per-model load info so reconciler survives frontend restart (#9981)
* feat(distributed): add per-model ModelLoadInfo persistence

Adds a dedicated ModelLoadInfo table keyed by model name, decoupled from
the per-replica NodeModel rows. The reconciler can now recover model load
metadata after every NodeModel row has been removed (worker death,
eviction, MarkOffline reaping, frontend restart with stale heartbeats),
which is the read side of Bug-1 from the distributed mode bug hunt.

Registry exposes:
  - UpsertModelLoadInfo: ON CONFLICT (model_name) update; last-write-wins,
    matching the existing per-replica blob semantics under concurrent
    multi-frontend dispatch.
  - GetModelLoadInfo: read from the new table first; fall back to the
    legacy NodeModel-blob scan for rows written before any frontend in
    the cluster ran an UpsertModelLoadInfo (rolling-upgrade transition).

SetNodeModelLoadInfo (per-replica blob) is preserved for backward
compatibility and per-replica diagnostics; the dispatch-path hook in the
next commit calls both.

The new table joins the existing nodes AutoMigrate set under the same
schema-migration advisory lock.

Refs: Bug-1, docs/superpowers/specs/2026-05-24-distributed-mode-bug-hunt-findings.md

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]

* fix(distributed): persist per-model load info on dispatch

scheduleAndLoad now writes the (backendType, ModelOptions blob) pair to
the new ModelLoadInfo table in addition to the existing per-replica
NodeModel.model_opts_blob field. The per-replica blob still serves the
hot path; the per-model row survives the removal of every NodeModel row,
which is what unblocks the reconciler on the read side.

Both writes are best-effort with warn-level logging on failure: a failed
write only means the reconciler may need a fresh inference request to
repopulate the row, which matches the pre-fix behavior.

Concurrency: two frontends loading the same model at the same time both
fire UpsertModelLoadInfo; ON CONFLICT (model_name) makes the row
converge to whichever commits last. Matches the existing per-replica
blob semantics.
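
The convergence property can be illustrated with a small in-memory sketch. Everything here is hypothetical (`table`, `row`, `raceUpserts`); the mutex merely plays the role the database plays when it serializes conflicting upserts on the same key.

```go
package main

import (
	"fmt"
	"sync"
)

// row is a hypothetical stand-in for one ModelLoadInfo record.
type row struct {
	backendType string
	optsBlob    string
}

// table is a hypothetical in-memory stand-in for the per-model table.
// The mutex models the database serializing writes to the same key.
type table struct {
	mu   sync.Mutex
	rows map[string]row
}

// upsert replaces the row for modelName, whatever was there before.
func (t *table) upsert(modelName string, r row) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.rows[modelName] = r
}

// get returns the row for modelName and whether it exists.
func (t *table) get(modelName string) (row, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	r, ok := t.rows[modelName]
	return r, ok
}

// raceUpserts fires all writes concurrently and returns the surviving row.
func raceUpserts(modelName string, writes []row) row {
	t := &table{rows: map[string]row{}}
	var wg sync.WaitGroup
	for _, w := range writes {
		wg.Add(1)
		go func(w row) {
			defer wg.Done()
			t.upsert(modelName, w)
		}(w)
	}
	wg.Wait()
	got, _ := t.get(modelName)
	return got
}

func main() {
	writes := []row{
		{backendType: "llama-cpp", optsBlob: "opts-from-frontend-a"},
		{backendType: "llama-cpp", optsBlob: "opts-from-frontend-b"},
	}
	// Which write survives is not deterministic, but it is always exactly one of them.
	fmt.Printf("%+v\n", raceUpserts("my-model", writes))
}
```

The sketch never produces a mix of the two writes: the stored row is always one writer's complete pair, which is the guarantee the reconciler relies on.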

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]

* test(distributed): cover load info persistence and Bug-1 recovery

Adds Ginkgo specs that prove the persistence layer behaves correctly and
that the reconciler actually recovers from the frontend-restart scenario
that was failing in production:

registry_test.go:
  - per-model row survives RemoveAllNodeModelReplicas (the bug repro)
  - ON CONFLICT (model_name) updates backend type + blob, last-write-wins
  - legacy NodeModel-blob fallback still works (rolling-upgrade transition)
  - GetModelLoadInfo returns ErrRecordNotFound when both sources are empty
  - UpsertModelLoadInfo rejects empty model names

reconciler_test.go:
  - Bug-1 end-to-end: with min_replicas=2, no NodeModel rows, but a
    ModelLoadInfo row present, one reconcile tick fires two scheduler
    calls. Pre-fix this returned "no load info" and the scheduler never
    got called until a fresh inference request arrived.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]

* docs(distributed): note restart-safe reconciler behavior

Adds a bullet to the Replica Reconciler section explaining that per-model
load metadata is persisted across frontend restarts via the new
model_load_infos PostgreSQL table, so a rolling upgrade no longer needs a
fresh inference request per model before the reconciler can replace dead
replicas.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7[1m]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-25 13:00:06 +02:00


package nodes

import (
	"context"
	"time"

	grpc "github.com/mudler/LocalAI/pkg/grpc"
)

// ModelRouter is used by SmartRouter for routing decisions and model lifecycle.
type ModelRouter interface {
	FindAndLockNodeWithModel(ctx context.Context, modelName string, candidateNodeIDs []string) (*BackendNode, *NodeModel, error)
	DecrementInFlight(ctx context.Context, nodeID, modelName string, replicaIndex int) error
	IncrementInFlight(ctx context.Context, nodeID, modelName string, replicaIndex int) error
	RemoveNodeModel(ctx context.Context, nodeID, modelName string, replicaIndex int) error
	RemoveAllNodeModelReplicas(ctx context.Context, nodeID, modelName string) error
	TouchNodeModel(ctx context.Context, nodeID, modelName string, replicaIndex int)
	SetNodeModel(ctx context.Context, nodeID, modelName string, replicaIndex int, state, address string, initialInFlight int) error
	// SetNodeModelLoadInfo stores load info on a single replica row. It is kept
	// for backward compatibility and per-replica diagnostics.
	SetNodeModelLoadInfo(ctx context.Context, nodeID, modelName string, replicaIndex int, backendType string, optsBlob []byte) error
	// UpsertModelLoadInfo writes the per-model load info row, which outlives
	// the per-replica NodeModel rows. Concurrent writes are last-write-wins.
	UpsertModelLoadInfo(ctx context.Context, modelName, backendType string, optsBlob []byte) error
	// GetModelLoadInfo returns the stored load info for a model. The Registry
	// reads the per-model row first and falls back to the legacy per-replica blob.
	GetModelLoadInfo(ctx context.Context, modelName string) (backendType string, optsBlob []byte, err error)
	NextFreeReplicaIndex(ctx context.Context, nodeID, modelName string, maxSlots int) (int, error)
	CountReplicasOnNode(ctx context.Context, nodeID, modelName string) (int, error)
	FindNodeWithVRAM(ctx context.Context, minBytes uint64) (*BackendNode, error)
	FindIdleNode(ctx context.Context) (*BackendNode, error)
	FindLeastLoadedNode(ctx context.Context) (*BackendNode, error)
	FindGlobalLRUModelWithZeroInFlight(ctx context.Context) (*NodeModel, error)
	FindLRUModel(ctx context.Context, nodeID string) (*NodeModel, error)
	Get(ctx context.Context, nodeID string) (*BackendNode, error)
	GetModelScheduling(ctx context.Context, modelName string) (*ModelSchedulingConfig, error)
	FindNodesBySelector(ctx context.Context, selector map[string]string) ([]BackendNode, error)
	FindNodesWithFreeSlot(ctx context.Context, modelName string, candidateNodeIDs []string) ([]BackendNode, error)
	ReserveVRAM(ctx context.Context, nodeID string, bytes uint64) error
	ReleaseVRAM(ctx context.Context, nodeID string, bytes uint64) error
	FindNodeWithVRAMFromSet(ctx context.Context, minBytes uint64, nodeIDs []string) (*BackendNode, error)
	FindIdleNodeFromSet(ctx context.Context, nodeIDs []string) (*BackendNode, error)
	FindLeastLoadedNodeFromSet(ctx context.Context, nodeIDs []string) (*BackendNode, error)
	GetNodeLabels(ctx context.Context, nodeID string) ([]NodeLabel, error)
	FindNodesWithModel(ctx context.Context, modelName string) ([]BackendNode, error)
}

// ConcurrencyConflictResolver returns the names of configured models that
// share at least one concurrency group with the given model. It is satisfied
// by *config.ModelConfigLoader and lets the SmartRouter make group-aware
// placement decisions without importing the config package's full surface.
type ConcurrencyConflictResolver interface {
	GetModelsConflictingWith(modelName string) []string
}

// NodeHealthStore is used by HealthMonitor for node status management.
type NodeHealthStore interface {
	List(ctx context.Context) ([]BackendNode, error)
	GetNodeModels(ctx context.Context, nodeID string) ([]NodeModel, error)
	MarkOffline(ctx context.Context, nodeID string) error
	MarkUnhealthy(ctx context.Context, nodeID string) error
	MarkHealthy(ctx context.Context, nodeID string) error
	Heartbeat(ctx context.Context, nodeID string, update *HeartbeatUpdate) error
	FindStaleNodes(ctx context.Context, threshold time.Duration) ([]BackendNode, error)
	RemoveNodeModel(ctx context.Context, nodeID, modelName string, replicaIndex int) error
}

// ModelLocator is used by RemoteUnloaderAdapter for model discovery.
type ModelLocator interface {
	FindNodesWithModel(ctx context.Context, modelName string) ([]BackendNode, error)
	RemoveNodeModel(ctx context.Context, nodeID, modelName string, replicaIndex int) error
	RemoveAllNodeModelReplicas(ctx context.Context, nodeID, modelName string) error
}

// ModelLookup is used by DistributedModelStore for model existence queries.
type ModelLookup interface {
	FindNodeForModel(ctx context.Context, modelName string) (*BackendNode, bool)
	ListAllLoadedModels(ctx context.Context) ([]NodeModel, error)
	Get(ctx context.Context, nodeID string) (*BackendNode, error)
}

// InFlightTracker is used by InFlightTrackingClient for request counting.
type InFlightTracker interface {
	IncrementInFlight(ctx context.Context, nodeID, modelName string, replicaIndex int) error
	DecrementInFlight(ctx context.Context, nodeID, modelName string, replicaIndex int) error
}

// NodeManager is used by HTTP endpoints for node registration and lifecycle.
type NodeManager interface {
	Register(ctx context.Context, node *BackendNode, autoApprove bool) error
	Get(ctx context.Context, nodeID string) (*BackendNode, error)
	GetByName(ctx context.Context, name string) (*BackendNode, error)
	List(ctx context.Context) ([]BackendNode, error)
	Deregister(ctx context.Context, nodeID string) error
	ApproveNode(ctx context.Context, nodeID string) error
	MarkOffline(ctx context.Context, nodeID string) error
	MarkDraining(ctx context.Context, nodeID string) error
	MarkHealthy(ctx context.Context, nodeID string) error
	Heartbeat(ctx context.Context, nodeID string, update *HeartbeatUpdate) error
	GetNodeModels(ctx context.Context, nodeID string) ([]NodeModel, error)
	UpdateAuthRefs(ctx context.Context, nodeID, authUserID, apiKeyID string) error
	RemoveNodeModel(ctx context.Context, nodeID, modelName string, replicaIndex int) error
	RemoveAllNodeModelReplicas(ctx context.Context, nodeID, modelName string) error
}

// BackendClientFactory creates gRPC backend clients.
type BackendClientFactory interface {
	NewClient(address string, parallel bool) grpc.Backend
}

// tokenClientFactory is the default BackendClientFactory that creates gRPC
// clients with an optional bearer token for distributed auth.
type tokenClientFactory struct {
	token string
}

// NewClient returns a token-authenticated client when a token is configured,
// and a plain client otherwise.
func (f *tokenClientFactory) NewClient(address string, parallel bool) grpc.Backend {
	if f.token != "" {
		return grpc.NewClientWithToken(address, parallel, nil, false, f.token)
	}
	return grpc.NewClient(address, parallel, nil, false)
}