LocalAI/docker-compose.yaml
Ettore Di Giacinto 551ebdb57a fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)
Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor,
Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend,
so the Nodes UI showed the node as fully used even when most of the unified
memory was actually free.

Three causes addressed:

* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark
  (SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA
  manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to
  `isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:*]` via
  `/sys/devices/soc0/soc_id`.

* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when
  `/proc/device-tree/model` was missing, which is the case for Thor inside a
  Docker container and always on DGX Spark. The new `nvidiaIntegratedGPUName`
  resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup
  (`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box
  correctly.

* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM
  usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed
  with `waitid: no child processes` under containers without `--init`. Each
  such heartbeat overwrote the DB and made the UI flip to "fully used".
  `heartbeatBody` now omits `available_vram` in that case so the DB keeps its
  last good value.
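
The three fixes above can be sketched roughly as follows. This is an
illustrative sketch, not the code from this commit: the function names, the
`total_vram` field, and the final name fallback are assumptions, and the
helpers take already-read file contents as strings instead of touching
`/sys` or `/proc`. Only the `jep106:0426` prefix, the `jep106:0426:8901` →
`"NVIDIA GB10"` mapping, and the `available_vram` field come from the text
above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nvidiaJEP106 is the soc_id prefix the commit message gives for NVIDIA.
const nvidiaJEP106 = "jep106:0426"

// socNames maps soc_id values to display names. Only the GB10 entry is
// taken from the commit message.
var socNames = map[string]string{"jep106:0426:8901": "NVIDIA GB10"}

// clean strips the trailing NUL and whitespace found in sysfs/device-tree reads.
func clean(s string) bool2str { return bool2str(strings.Trim(string(s), "\x00 \t\r\n")) }

// bool2str is just a named string type used to keep cleaned values distinct.
type bool2str string

// looksLikeNVIDIAIntegrated is a hypothetical stand-in for
// isNVIDIAIntegratedGPU: it accepts the old Tegra family match and also a
// soc_id equal to, or prefixed by, the NVIDIA JEP106 code.
func looksLikeNVIDIAIntegrated(family, socID string) bool {
	if clean(family) == "Tegra" {
		return true
	}
	id := string(clean(socID))
	return id == nvidiaJEP106 || strings.HasPrefix(id, nvidiaJEP106+":")
}

// resolveName walks the chain dt-model -> machine -> soc_id lookup.
// An empty argument stands for a file that is missing.
func resolveName(dtModel, machine, socID string) string {
	for _, s := range []string{dtModel, machine} {
		if n := clean(s); n != "" {
			return string(n)
		}
	}
	id := string(clean(socID))
	if n, ok := socNames[id]; ok {
		return n
	}
	// Assumed last resort; the real helper may behave differently here.
	return fmt.Sprintf("NVIDIA integrated GPU (%s)", id)
}

// heartbeat uses a pointer so that "unknown" (nil, omitted from the JSON)
// stays distinct from a genuine reading of 0 bytes available.
type heartbeat struct {
	TotalVRAM     uint64  `json:"total_vram"` // field name is an assumption
	AvailableVRAM *uint64 `json:"available_vram,omitempty"`
}

// heartbeatJSON omits available_vram when usage is not currently known.
func heartbeatJSON(total, available uint64, known bool) (string, error) {
	hb := heartbeat{TotalVRAM: total}
	if known {
		hb.AvailableVRAM = &available
	}
	b, err := json.Marshal(hb)
	return string(b), err
}

func main() {
	fmt.Println(looksLikeNVIDIAIntegrated("jep106:0426", "jep106:0426:8901"))
	fmt.Println(resolveName("", "", "jep106:0426:8901"))
	body, err := heartbeatJSON(128<<30, 0, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

With `omitempty` on a pointer field, only a nil pointer is dropped, so a
receiver that ignores absent keys keeps its stored value while a real zero
reading still goes through.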

Also updates the commented GPU blocks in both compose files with
`NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`,
and `init: true`, and documents the requirement in the distributed-mode and
nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the
container, which is what put the DGX Spark worker into the buggy fallback in
the first place.
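
As a sketch of how a worker could surface this misconfiguration at startup,
the snippet below checks whether `NVIDIA_DRIVER_CAPABILITIES` lists
`utility`. It is not part of this commit: the function names and the warning
text are hypothetical, and when the variable is unset the sketch stays silent
because the container runtime's default then applies.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasDriverCapability reports whether a comma-separated
// NVIDIA_DRIVER_CAPABILITIES value includes want. The special value "all"
// enables every capability.
func hasDriverCapability(value, want string) bool {
	for _, c := range strings.Split(value, ",") {
		c = strings.TrimSpace(c)
		if c == "all" || c == want {
			return true
		}
	}
	return false
}

// capabilityWarning returns a hypothetical warning message, or "" when the
// value already includes utility.
func capabilityWarning(value string) string {
	if hasDriverCapability(value, "utility") {
		return ""
	}
	return fmt.Sprintf("NVIDIA_DRIVER_CAPABILITIES=%q lacks utility; nvidia-smi/NVML may be missing", value)
}

// warningFromEnv checks the process environment. An unset variable yields no
// warning because this sketch does not guess the runtime's default.
func warningFromEnv() string {
	value, ok := os.LookupEnv("NVIDIA_DRIVER_CAPABILITIES")
	if !ok {
		return ""
	}
	return capabilityWarning(value)
}

func main() {
	if msg := warningFromEnv(); msg != "" {
		fmt.Fprintln(os.Stderr, msg)
	}
}
```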

Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor)
by running a cross-compiled probe of the new helpers on both host and inside
the worker container.

Assisted-by: Claude:opus-4.7 [Claude Code]
2026-04-24 22:02:23 +02:00


services:
  api:
    # See https://localai.io/basics/getting_started/#container-images for
    # a list of available container images (or build your own with the provided Dockerfile)
    # Available images with CUDA, ROCm, SYCL
    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
    image: quay.io/go-skynet/local-ai:master
    build:
      context: .
      dockerfile: Dockerfile
      args:
        - IMAGE_TYPE=core
        - BASE_IMAGE=ubuntu:24.04
    ports:
      - 8080:8080
    env_file:
      - .env
    environment:
      - MODELS_PATH=/models
      # - DEBUG=true
      ## Agents (LocalAGI) - https://localai.io/features/agents/
      # - LOCALAI_DISABLE_AGENTS=false
      # - LOCALAI_AGENT_POOL_DEFAULT_MODEL=hermes-3-llama3.1-8b
      # - LOCALAI_AGENT_POOL_ENABLE_SKILLS=true
      # - LOCALAI_AGENT_POOL_ENABLE_LOGS=true
      # - LOCALAI_AGENT_HUB_URL=https://agenthub.localai.io
      ## Uncomment to use PostgreSQL for the knowledge base (requires the postgres service below)
      # - LOCALAI_AGENT_POOL_VECTOR_ENGINE=postgres
      # - LOCALAI_AGENT_POOL_DATABASE_URL=postgresql://localrecall:localrecall@postgres:5432/localrecall?sslmode=disable
    volumes:
      - models:/models
      - images:/tmp/generated/images/
      - data:/data
      - backends:/backends
      - configuration:/configuration
    command:
      # Here we can specify a list of models to run (see quickstart https://localai.io/basics/getting_started/#running-models )
      # or an URL pointing to a YAML configuration file, for example:
      # - https://gist.githubusercontent.com/mudler/ad601a0488b497b69ec549150d9edd18/raw/a8a8869ef1bb7e3830bf5c0bae29a0cce991ff8d/phi-2.yaml
      - phi-2
    # For NVIDIA GPU support with CDI (recommended for NVIDIA Container Toolkit 1.14+):
    # Uncomment the following deploy section and use driver: nvidia.com/gpu.
    # Include `utility` in capabilities so nvidia-smi / NVML are available;
    # without it, free-VRAM reporting on discrete GPUs is unavailable and the
    # Nodes UI will misreport memory usage.
    # environment:
    #   NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
    # init: true  # avoids zombie-reap races that can make nvidia-smi flaky
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia.com/gpu
    #           count: all
    #           capabilities: [gpu, utility]
    #
    # For legacy NVIDIA driver (for older NVIDIA Container Toolkit):
    # environment:
    #   NVIDIA_DRIVER_CAPABILITIES: "compute,utility"
    # init: true
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu, utility]
  ## Uncomment for PostgreSQL-backed knowledge base (see Agents docs)
  # postgres:
  #   image: quay.io/mudler/localrecall:v0.5.2-postgresql
  #   environment:
  #     - POSTGRES_DB=localrecall
  #     - POSTGRES_USER=localrecall
  #     - POSTGRES_PASSWORD=localrecall
  #   volumes:
  #     - postgres_data:/var/lib/postgresql
  #   healthcheck:
  #     test: ["CMD-SHELL", "pg_isready -U localrecall"]
  #     interval: 10s
  #     timeout: 5s
  #     retries: 5
volumes:
  models:
  images:
  data:
  configuration:
  backends:
  # postgres_data: