mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-17 04:56:52 -04:00
* feat(concurrency-groups): per-model exclusive groups for backend loading

  Adds `concurrency_groups: [...]` to model YAML configs. Two models that share a group cannot be loaded concurrently on the same node: loading one evicts the others, reusing the existing pinned/busy/retry policy from LRU eviction.

  Layered design:

  - Watchdog (pkg/model): per-node correctness floor. On every Load(), evict any loaded model that shares a group with the requested one. Pinned skips surface NeedMore so the loader retries (and ultimately logs a clear warning) instead of silently allowing the rule to be violated.
  - Distributed scheduler (core/services/nodes): soft anti-affinity hint. scheduleNewModel prefers nodes that don't already host a same-group model, falling back to eviction only if every candidate has a conflict. Composes with NodeSelector at the same point in the candidate pipeline.

  Per-node, not cluster-wide: VRAM is a node-local resource, and two heavy models running on different nodes is fine.

  The ConfigLoader is wired into SmartRouter via a small ConcurrencyConflictResolver interface, so the nodes package keeps a narrow surface on core/config.

  Refactors the inner LRU eviction body into a shared collectEvictionsLocked helper, and the loader retry loop into retryEnforce(fn, maxRetries, interval), so both LRU and group enforcement share the same busy/pinned/retry semantics.

  Closes #9659.

  Assisted-by: Claude:claude-opus-4-7 [Claude Code]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(watchdog): sync pinned + concurrency_groups at startup

  The startup-time watchdog setup lives in initializeWatchdog (startup.go), not in startWatchdog (watchdog.go); the latter is only invoked from the runtime-settings RestartWatchdog path. As a result, neither SyncPinnedModelsToWatchdog nor SyncModelGroupsToWatchdog ran at boot, so `pinned: true` and `concurrency_groups: [...]` only became effective after a settings-driven watchdog restart.

  Fix by adding both sync calls to initializeWatchdog. Confirmed end-to-end: loading model A in group "heavy", then C with no group (coexists), then B in group "heavy" now correctly evicts A and leaves [B, C].

  Assisted-by: Claude:claude-opus-4-7 [Claude Code]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(test): satisfy errcheck on new os.Remove in concurrency_groups spec

  CI lint runs new-from-merge-base, so the pre-existing `defer os.Remove(tmp.Name())` lines are grandfathered into the baseline, but the one introduced by the concurrency_groups YAML round-trip test is held to errcheck. Wrap the remove in a closure that discards the error.

  Assisted-by: Claude:claude-opus-4-7 [Claude Code]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
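The commit message does not show a config excerpt. A minimal model YAML using the new field might look like the sketch below; only `concurrency_groups` comes from this change, while the `name` and `backend` values are illustrative placeholders:

```yaml
# Hypothetical model config: "heavy" is an arbitrary group label.
# Any two models sharing it cannot be loaded on the same node at once.
name: llama-70b
backend: llama-cpp
concurrency_groups:
  - heavy
```

A model with no `concurrency_groups` entry (like model C in the end-to-end check above) coexists freely with grouped models.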
51 lines
1.3 KiB
Go
package model

import (
	"sync/atomic"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("retryEnforce", func() {
	It("returns immediately when the first attempt is satisfied", func() {
		var calls atomic.Int32
		retryEnforce(func() EnforceLRULimitResult {
			calls.Add(1)
			return EnforceLRULimitResult{}
		}, 5, 1*time.Millisecond, "test")
		Expect(calls.Load()).To(Equal(int32(1)))
	})

	It("retries until NeedMore clears", func() {
		var calls atomic.Int32
		retryEnforce(func() EnforceLRULimitResult {
			n := calls.Add(1)
			if n < 3 {
				return EnforceLRULimitResult{NeedMore: true}
			}
			return EnforceLRULimitResult{EvictedCount: 1}
		}, 5, 1*time.Millisecond, "test")
		Expect(calls.Load()).To(Equal(int32(3)))
	})

	It("stops after maxRetries when NeedMore never clears", func() {
		var calls atomic.Int32
		retryEnforce(func() EnforceLRULimitResult {
			calls.Add(1)
			return EnforceLRULimitResult{NeedMore: true}
		}, 4, 1*time.Millisecond, "test")
		Expect(calls.Load()).To(Equal(int32(4)))
	})

	It("treats maxRetries <= 0 as a no-op (no calls)", func() {
		var calls atomic.Int32
		retryEnforce(func() EnforceLRULimitResult {
			calls.Add(1)
			return EnforceLRULimitResult{}
		}, 0, 1*time.Millisecond, "test")
		Expect(calls.Load()).To(Equal(int32(0)))
	})
})