Files
LocalAI/core
Ettore Di Giacinto 0c9666c295 fix(config): gate top_k=40 default on backend family (#6632)
SetDefaults injected top_k=40 (llama.cpp's sampling default) for every
model config regardless of backend. That value is wrong for backends
whose native default differs: mlx_lm's intended default is top_k=0
(disabled) and mlx does not remap 0->40, so a client that omits top_k
silently got 40 shipped to mlx, changing sampling. The mlx backend's own
getattr(request,'TopK',0) fallback is dead because proto3 int32 is always
present.

Gate the injection on backend family via UsesLlamaSamplerDefaults: keep
top_k=40 for the llama.cpp family and for the empty/auto backend (the GGUF
auto-detect path resolves to llama.cpp, so existing behavior is preserved),
but leave TopK nil for the known non-llama backends (mlx, mlx-vlm,
mlx-distributed). gRPCPredictOpts now sends 0 when TopK is nil, which is
the value mlx actually wants.

Only TopK is gated - the confirmed bug. The sibling sampler defaults
(top_p, temperature, min_p) are left global to avoid widening scope and
introducing nil-deref risk; revisit per-backend if needed.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-12 21:45:22 +00:00
..