LocalAI/pkg/xsysinfo/computecap_internal_test.go
Ettore Di Giacinto bca250e2bd feat(config): node-aware hardware defaults — larger physical batch on Blackwell
A larger physical batch (n_batch/n_ubatch) materially lifts MoE prefill on
NVIDIA Blackwell consumer GPUs (sm_120/121, incl. GB10 / DGX Spark). Measured
on a GB10 with Qwen3-Coder-30B-A3B, prefill throughput rises from ~2994 t/s
(ub512) to ~3316 t/s (ub2048) and saturates around 2048.

The heuristic lives in core/config alongside the other config overriders
(ApplyInferenceDefaults, guessDefaultsFromFile/NGPULayers) — they all fill the
ModelConfig from heuristics, so hardware tuning is the same domain and stays in
one place. It is parameterized on a GPU descriptor (not direct detection) so it
works in both deployment shapes:

- Single host: SetDefaults applies it with the LocalGPU.
- Distributed: only the worker sees the GPU, so the worker reports its compute
  capability on registration (gpu_compute_capability -> BackendNode), and the
  router re-applies the SAME core/config heuristic for the SELECTED node before
  loading — fixing the case where the frontend has no GPU at all.
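
A minimal sketch of what "parameterized on a GPU descriptor" could look like.
The names GPUDescriptor, ModelConfig, applyHardwareDefaults and the "Batch == 0
means unset" convention are assumptions made for illustration, not the actual
core/config types or API; only the sm_120/121 match and the 2048 value come
from this commit message.

```go
package main

import "fmt"

// GPUDescriptor is a hypothetical stand-in for the descriptor passed to the
// heuristic. It can be filled from local detection or from a worker's report.
type GPUDescriptor struct {
	Vendor       string
	ComputeMajor int
	ComputeMinor int
}

// ModelConfig is a simplified stand-in. In this sketch Batch == 0 means the
// user did not set `batch:` explicitly (an assumption).
type ModelConfig struct {
	Batch int
}

// blackwellBatch is the physical batch at which prefill was reported to saturate.
const blackwellBatch = 2048

// isConsumerBlackwell reports whether the descriptor is an NVIDIA sm_120/sm_121 GPU.
func isConsumerBlackwell(g GPUDescriptor) bool {
	return g.Vendor == "nvidia" && g.ComputeMajor == 12 &&
		(g.ComputeMinor == 0 || g.ComputeMinor == 1)
}

// applyHardwareDefaults raises the batch only when the user left it unset.
func applyHardwareDefaults(cfg *ModelConfig, g GPUDescriptor) {
	if cfg.Batch != 0 {
		return // an explicit value always wins
	}
	if isConsumerBlackwell(g) {
		cfg.Batch = blackwellBatch
	}
}

func main() {
	// Single host: the descriptor comes from local detection.
	local := ModelConfig{}
	applyHardwareDefaults(&local, GPUDescriptor{Vendor: "nvidia", ComputeMajor: 12, ComputeMinor: 1})

	// Distributed: the router builds the descriptor from what the selected
	// worker reported, then calls the same function.
	remote := ModelConfig{Batch: 512}
	applyHardwareDefaults(&remote, GPUDescriptor{Vendor: "nvidia", ComputeMajor: 12, ComputeMinor: 0})

	fmt.Println(local.Batch, remote.Batch)
}
```

Because the function takes a descriptor rather than probing the hardware, the
same code path serves a frontend that has no GPU of its own.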

Explicit `batch:` always wins (only managed default values are touched).
xsysinfo gains NVIDIAComputeCapability() (detection only); all interpretation
lives in core/config. Tests: core/config, pkg/xsysinfo, core/services/nodes.
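
For the detection side, the parsing step can be sketched as follows. This is a
hypothetical reimplementation of parseComputeCap written only to match the
cases in the test table in this file (major.minor, major only, surrounding
whitespace, -1/-1 on invalid input); the real function in pkg/xsysinfo may be
written differently, and the handling of inputs outside that table (such as
"12.x") is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseComputeCap splits a compute capability string such as "12.1" into its
// major and minor parts. It returns (-1, -1) when the input cannot be parsed.
func parseComputeCap(s string) (int, int) {
	s = strings.TrimSpace(s)

	// strings.Cut returns the whole string as majorStr when there is no dot.
	majorStr, minorStr, hasMinor := strings.Cut(s, ".")

	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return -1, -1 // covers both empty and non-numeric input
	}
	if !hasMinor {
		return major, 0 // "12" is treated as 12.0
	}

	minor, err := strconv.Atoi(minorStr)
	if err != nil {
		return -1, -1 // assumption: a bad minor invalidates the whole value
	}
	return major, minor
}

func main() {
	fmt.Println(parseComputeCap(" 12.1 ")) // prints "12 1"
}
```

Keeping this function free of any policy (no batch sizes, no architecture
names) matches the split described above: xsysinfo only detects, and
core/config interprets.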

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-19 22:02:14 +00:00

package xsysinfo

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("parseComputeCap", func() {
	DescribeTable("splits major.minor",
		func(in string, maj, min int) {
			m, n := parseComputeCap(in)
			Expect(m).To(Equal(maj))
			Expect(n).To(Equal(min))
		},
		Entry("GB10 / DGX Spark", "12.1", 12, 1),
		Entry("RTX 50-series", "12.0", 12, 0),
		Entry("Hopper", "9.0", 9, 0),
		Entry("major only", "12", 12, 0),
		Entry("whitespace", " 12.1 ", 12, 1),
		Entry("empty", "", -1, -1),
		Entry("garbage", "abc", -1, -1),
	)
})