Files
LocalAI/pkg/model/watchdog_options.go
Leoy b50b1fe418 feat(watchdog): add size-aware LRU eviction mode (#9527)
* feat(watchdog): add size-aware LRU eviction mode

When the model count hits the LRU limit or the memory reclaimer fires,
evict the largest model by on-disk file size first rather than the
least-recently-used one.  For GGUF models the file size is a reliable
proxy for GPU/RAM footprint, so evicting the largest candidate maximises
freed memory per eviction round while keeping small utility models
(embeddings, classifiers, rerankers) resident.

Changes:
- `pkg/model/watchdog.go`: add `sizeAwareEviction` flag and
  `modelSizes map[string]int64` to `WatchDog`; sort candidates by
  `sizeBytes` desc (LRU time as tiebreaker) when the flag is set;
  add `RegisterModelSize`, `SetSizeAwareEviction`, `GetSizeAwareEviction`
- `pkg/model/watchdog_options.go`: add `WithSizeAwareEviction` option
- `pkg/model/initializers.go`: stat model file after load and call
  `RegisterModelSize` so size data is available before the first eviction
- `core/config/application_config.go`, `runtime_settings.go`: add
  `SizeAwareEviction` field and `WithSizeAwareEviction` app option;
  expose via `ToRuntimeSettings` / `ApplyRuntimeSettings` for the
  `POST /api/settings` live-reload path
- `core/cli/run.go`: add `--size-aware-eviction` flag /
  `LOCALAI_SIZE_AWARE_EVICTION` env var
- `core/application/startup.go`, `watchdog.go`: wire the new option
  through to `NewWatchDog`
- `pkg/model/watchdog_test.go`: 5 new specs — option enable, dynamic
  toggle, largest-first ordering, equal-size LRU tiebreaker, no-size
  fallback to LRU, and size-map cleanup on eviction

Closes #9375

Signed-off-by: supermario_leo <leo.stack@outlook.com>

* refactor(watchdog): use vram estimation scaffolding for model size

Replace the brittle os.Stat(modelFile) approach with a proper call to
pkg/vram, which handles multi-file models (DownloadFiles, MMProj) and
all weight file types, not just single GGUF files.

- Add estimateModelSizeBytes() in core/backend/options.go that collects
  all weight file URIs from the model config, resolves them to file://
  URIs, and calls vram.Estimate() with the shared DefaultCachedSizeResolver
  (15-min TTL cache avoids redundant stat calls on repeated loads)
- Thread the result through via a new WithModelSizeBytes() loader option
- In initializers.go, consume the pre-computed size instead of calling
  os.Stat; if no size was supplied (e.g. for external/router-dispatched
  models) the registration is simply skipped

Signed-off-by: supermario_leo <leo.stack@outlook.com>

* refactor(watchdog): use EstimateModel with HF fallback for size estimation

Switch estimateModelSizeBytes from calling vram.Estimate directly to the
unified vram.EstimateModel entry point, which adds automatic fallbacks:
file-based GGUF metadata → HF API → size string.

Also extract the HuggingFace repo ID from model URIs (huggingface://,
hf://, https://huggingface.co/ and org/model short-form) and pass it
as ModelEstimateInput.HFRepo, so models not yet downloaded locally can
still get a size estimate via the HF API.

Addresses @mudler's review feedback: "better to rely on EstimateModel
and pass by the HF URL of the model extracted from the URI".

Signed-off-by: supermario_leo <leo.stack@outlook.com>

* feat(webui): add Size-Aware Eviction toggle to settings page

The size-aware eviction setting was wired through the CLI flag and the
RuntimeSettings live-reload path (POST /api/settings) but had no handle
on the React settings page, so it could not be toggled from the UI.

Add a Size-Aware Eviction toggle to the Watchdog section, next to the
existing Force Eviction When Busy / LRU eviction handles. The settings
page loads and saves the whole RuntimeSettings object, so the new
size_aware_eviction key is picked up with no extra plumbing.

Addresses @mudler's review feedback: the application config setting
should land on the same UI settings page as the other handles.

Signed-off-by: supermario_leo <leo.stack@outlook.com>

---------

Signed-off-by: supermario_leo <leo.stack@outlook.com>
2026-06-21 17:17:04 +02:00

156 lines
4.6 KiB
Go

package model
import (
"time"
)
const (
DefaultWatchdogInterval = 500 * time.Millisecond
DefaultMemoryReclaimerThreshold = 0.80
)
// WatchDogOptions contains all configuration for the WatchDog
type WatchDogOptions struct {
processManager ProcessManager
// Timeout settings
busyTimeout time.Duration
idleTimeout time.Duration
watchdogInterval time.Duration
// Check toggles
busyCheck bool
idleCheck bool
// LRU settings
lruLimit int // Maximum number of active backends (0 = unlimited)
// Memory reclaimer settings (works with GPU if available, otherwise RAM)
memoryReclaimerEnabled bool // Enable memory threshold monitoring
memoryReclaimerThreshold float64 // Threshold 0.0-1.0 (e.g., 0.95 = 95%)
// Eviction settings
forceEvictionWhenBusy bool // Force eviction even when models have active API calls (default: false for safety)
// Size-aware eviction: sort candidates by model file size (largest first)
sizeAwareEviction bool
}
// WatchDogOption is a function that configures WatchDogOptions
type WatchDogOption func(*WatchDogOptions)
// WithProcessManager sets the process manager for the watchdog
func WithProcessManager(pm ProcessManager) WatchDogOption {
return func(o *WatchDogOptions) {
o.processManager = pm
}
}
// WithBusyTimeout sets the busy timeout duration
func WithBusyTimeout(timeout time.Duration) WatchDogOption {
return func(o *WatchDogOptions) {
o.busyTimeout = timeout
}
}
// WithIdleTimeout sets the idle timeout duration
func WithIdleTimeout(timeout time.Duration) WatchDogOption {
return func(o *WatchDogOptions) {
o.idleTimeout = timeout
}
}
// WithWatchdogCheck sets the watchdog check duration
func WithWatchdogInterval(interval time.Duration) WatchDogOption {
return func(o *WatchDogOptions) {
o.watchdogInterval = interval
}
}
// WithBusyCheck enables or disables busy checking
func WithBusyCheck(enabled bool) WatchDogOption {
return func(o *WatchDogOptions) {
o.busyCheck = enabled
}
}
// WithIdleCheck enables or disables idle checking
func WithIdleCheck(enabled bool) WatchDogOption {
return func(o *WatchDogOptions) {
o.idleCheck = enabled
}
}
// WithLRULimit sets the maximum number of active backends (0 = unlimited)
func WithLRULimit(limit int) WatchDogOption {
return func(o *WatchDogOptions) {
o.lruLimit = limit
}
}
// WithMemoryReclaimer enables memory threshold monitoring with the specified threshold
// Works with GPU VRAM if available, otherwise uses system RAM
func WithMemoryReclaimer(enabled bool, threshold float64) WatchDogOption {
return func(o *WatchDogOptions) {
o.memoryReclaimerEnabled = enabled
o.memoryReclaimerThreshold = threshold
}
}
// WithMemoryReclaimerEnabled enables or disables memory threshold monitoring
func WithMemoryReclaimerEnabled(enabled bool) WatchDogOption {
return func(o *WatchDogOptions) {
o.memoryReclaimerEnabled = enabled
}
}
// WithMemoryReclaimerThreshold sets the memory threshold (0.0-1.0)
func WithMemoryReclaimerThreshold(threshold float64) WatchDogOption {
return func(o *WatchDogOptions) {
o.memoryReclaimerThreshold = threshold
}
}
// WithForceEvictionWhenBusy sets whether to force eviction even when models have active API calls
// Default: false (skip eviction when busy for safety)
func WithForceEvictionWhenBusy(force bool) WatchDogOption {
return func(o *WatchDogOptions) {
o.forceEvictionWhenBusy = force
}
}
// WithSizeAwareEviction enables size-aware eviction ordering.
// When true, eviction candidates are sorted by on-disk file size (largest first)
// so that bigger models are freed before smaller ones, keeping small utility models
// resident and maximizing the memory freed per eviction round.
// Default: false (LRU time ordering).
func WithSizeAwareEviction(enabled bool) WatchDogOption {
return func(o *WatchDogOptions) {
o.sizeAwareEviction = enabled
}
}
// DefaultWatchDogOptions returns default options for the watchdog
func DefaultWatchDogOptions() *WatchDogOptions {
return &WatchDogOptions{
busyTimeout: 5 * time.Minute,
idleTimeout: 15 * time.Minute,
watchdogInterval: DefaultWatchdogInterval,
busyCheck: false,
idleCheck: false,
lruLimit: 0,
memoryReclaimerEnabled: false,
memoryReclaimerThreshold: DefaultMemoryReclaimerThreshold,
forceEvictionWhenBusy: false, // Default: skip eviction when busy for safety
}
}
// NewWatchDogOptions creates WatchDogOptions with the provided options applied
func NewWatchDogOptions(opts ...WatchDogOption) *WatchDogOptions {
o := DefaultWatchDogOptions()
for _, opt := range opts {
opt(o)
}
return o
}