Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor,
Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend,
so the Nodes UI showed the node as fully used even when most of the unified
memory was actually free.
Three causes addressed:
* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark
(SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA
manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to
`isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:*]` via
`/sys/devices/soc0/soc_id`.
* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when
`/proc/device-tree/model` was missing. That's what happens for Thor inside a
docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName`
resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup
(`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box
correctly.
* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM
usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed
with `waitid: no child processes` under containers without `--init`. Each
such heartbeat overwrote the DB and made the UI flip to "fully used".
`heartbeatBody` now omits `available_vram` in that case so the DB keeps its
last good value.
Also updates the commented GPU blocks in both compose files with
`NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`,
and `init: true`, and documents the requirement in the distributed-mode and
nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the
container, which is what put the DGX Spark worker into the buggy fallback in
the first place.
Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor)
by running a cross-compiled probe of the new helpers on both host and inside
the worker container.
Assisted-by: Claude:opus-4.7 [Claude Code]
* feat(ui): add users and authentication support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: allow the admin user to impersonificate users
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: ui improvements, disable 'Users' button in navbar when no auth is configured
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add OIDC support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: gate models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: cache requests to optimize speed
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* small UI enhancements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ui): style improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: cover other paths by auth
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: separate local auth, refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* security hardening, approval mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: fix tests and expectations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: update localagi/localrecall
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add standalone and agentic functionalities
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose agents via responses api
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
This addresses issue #8108 where the legacy nvidia driver configuration
causes container startup failures with newer NVIDIA Container Toolkit versions.
Changes:
- Update docker-compose example to show both CDI (recommended) and legacy
nvidia driver options
- Add troubleshooting section for 'Auto-detected mode as legacy' error
- Document the fix for nvidia-container-cli 'invalid expression' errors
The root cause is a Docker/NVIDIA Container Toolkit configuration issue,
not a LocalAI code bug. The error occurs during the container runtime's
prestart hook before LocalAI starts.
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
- Added named volumes (models, images) to docker-compose.yaml
- Added named volumes (models, backends) to .devcontainer/docker-compose-devcontainer.yml
- Changed bind mounts to named volumes for Windows compatibility
Fixes#8455
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* feat: split remaining backends and drop embedded backends
- Drop silero-vad, huggingface, and stores backend from embedded
binaries
- Refactor Makefile and Dockerfile to avoid building grpc backends
- Drop golang code that was used to embed backends
- Simplify building by using goreleaser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): be specific with llama-cpp backend templates
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(docs): update
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): minor fixes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: drop all ffmpeg references
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: run protogen-go
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Always enable p2p mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update gorelease file
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(stores): do not always load
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix linting issues
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Mac OS fixup
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>