mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-22 15:49:12 -04:00
refactor(config): single source of truth for default values across config + backend

Defaults were decided in two areas with duplicated/drifted literals: the config SetDefaults tiers vs core/backend/options.go's grpcModelOpts (which translates a ModelConfig to the backend wire format and supplied its own fallbacks). They had drifted: n_gpu_layers 9999999 (options.go) vs 99999999 (gguf.go), two 512 batch constants, context 1024 (gguf) vs 4096 (backend) scattered as bare literals.

Introduce core/config/defaults.go as the canonical home (DefaultContextSize=4096, GGUFFallbackContextSize=1024, DefaultNGPULayers=99999999, DefaultFlashAttention="auto"). gguf.go / hooks_llamacpp.go use them directly; core/backend references them (backend imports config, never the reverse) so DefaultContextSize/DefaultBatchSize and the flash-attn / n_gpu_layers fallbacks resolve to one place.

The two context values (1024 GGUF-no-estimate vs 4096 general) are kept distinct but now named + documented, not blind literals. Behavior-preserving; config + backend suites green.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
31 lines
1.3 KiB
Go
package config

// Canonical default values.
//
// These are owned here so the two layers that need them share a single source
// of truth: the config tiers (ApplyInference/Hardware/Serving/Generic, which
// *decide* defaults) and core/backend/options.go (which *translates* a
// ModelConfig to the backend wire format and supplies the same fallbacks
// defensively). Previously these were duplicated as literals across both
// packages and had drifted (e.g. n_gpu_layers 9999999 vs 99999999, two batch
// constants of 512). core/backend imports core/config, so backend references
// these; config never imports backend.
const (
	// DefaultContextSize is the fallback context window when none is configured
	// or estimable from the model.
	DefaultContextSize = 4096

	// GGUFFallbackContextSize is the context window for a GGUF model whose
	// metadata yields no usable estimate (see guessGGUFFromFile). Deliberately
	// smaller than DefaultContextSize to stay conservative on memory there.
	GGUFFallbackContextSize = 1024

	// DefaultNGPULayers means "offload all layers"; the backend (fit_params)
	// clamps to what actually fits in device memory.
	DefaultNGPULayers = 99999999

	// DefaultFlashAttention is the flash-attention mode default; "auto" lets the
	// backend enable it when the model + backend support it.
	DefaultFlashAttention = "auto"
)
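
The header comment describes a translation layer that falls back to these shared constants when a field is unset. A minimal sketch of that pattern follows. It is an illustration under stated assumptions, not the code in core/backend/options.go: `modelConfig`, `wireOptions`, `orDefault`, and `toWireOptions` are hypothetical names, the use of pointer fields to mark "unset" is an assumption, and the constants are copied locally only so the snippet compiles on its own.

```go
package main

import "fmt"

// Local copies of the values in core/config/defaults.go so this sketch is
// self-contained; real code would reference the config package instead.
const (
	DefaultContextSize    = 4096
	DefaultNGPULayers     = 99999999
	DefaultFlashAttention = "auto"
)

// modelConfig is a hypothetical, trimmed-down stand-in for ModelConfig.
// Pointer fields distinguish "unset" (nil) from an explicit value.
type modelConfig struct {
	ContextSize    *int
	NGPULayers     *int
	FlashAttention *string
}

// wireOptions is a hypothetical stand-in for the backend wire format.
type wireOptions struct {
	ContextSize    int
	NGPULayers     int
	FlashAttention string
}

// orDefault returns *v when v is set and def otherwise.
func orDefault[T any](v *T, def T) T {
	if v == nil {
		return def
	}
	return *v
}

// toWireOptions fills every unset field from the shared constants, so the
// fallback values live in exactly one place.
func toWireOptions(c modelConfig) wireOptions {
	return wireOptions{
		ContextSize:    orDefault(c.ContextSize, DefaultContextSize),
		NGPULayers:     orDefault(c.NGPULayers, DefaultNGPULayers),
		FlashAttention: orDefault(c.FlashAttention, DefaultFlashAttention),
	}
}

func main() {
	ctx := 8192
	fmt.Printf("%+v\n", toWireOptions(modelConfig{}))                  // all fields fall back
	fmt.Printf("%+v\n", toWireOptions(modelConfig{ContextSize: &ctx})) // explicit value wins
}
```

Because both the deciding layer and the translating layer would read the same named constants, a change to a default cannot drift between them the way the bare literals did.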