LocalAI/docs/content/reference/nvidia-l4t.md
Ettore Di Giacinto 551ebdb57a fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)
Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor,
Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend,
so the Nodes UI showed the node as fully used even when most of the unified
memory was actually free.

Three causes addressed:

* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark
  (SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA
  manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to
  `isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:*]` via
  `/sys/devices/soc0/soc_id`.
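
  A quick way to check which path a given host would take (sysfs paths and
  example values as described above; output shown is illustrative):

  ```bash
  cat /sys/devices/soc0/family   # "Tegra" on Jetson-class devices
  cat /sys/devices/soc0/soc_id   # e.g. jep106:0426:8901 on DGX Spark (GB10)
  ```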

* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when
  `/proc/device-tree/model` was missing. That's what happens for Thor inside a
  docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName`
  resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup
  (`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box
  correctly.
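
  For reference, the three sources in the order the new helper consults them
  (illustrative; values as quoted above):

  ```bash
  cat /proc/device-tree/model     # preferred; missing on DGX Spark and for Thor in docker
  cat /sys/devices/soc0/machine   # second fallback
  cat /sys/devices/soc0/soc_id    # last resort, mapped e.g. jep106:0426:8901 -> "NVIDIA GB10"
  ```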

* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM
  usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed
  with `waitid: no child processes` under containers without `--init`. Each
  such heartbeat overwrote the DB and made the UI flip to "fully used".
  `heartbeatBody` now omits `available_vram` in that case so the DB keeps its
  last good value.
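
  A related sanity check on an affected worker, assuming the container is named
  `local-ai` as in the run commands further down (the reaping failure affects the
  worker's own child processes, so a clean result here does not rule it out):

  ```bash
  docker exec local-ai nvidia-smi --query-gpu=memory.total,memory.used --format=csv
  ```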

Also updates the commented GPU blocks in both compose files with
`NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`,
and `init: true`, and documents the requirement in the distributed-mode and
nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the
container, which is what put the DGX Spark worker into the buggy fallback in
the first place.

Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor)
by running a cross-compiled probe of the new helpers on both host and inside
the worker container.

Assisted-by: Claude:opus-4.7 [Claude Code]
2026-04-24 22:02:23 +02:00


+++
disableToc = false
title = "Running on Nvidia ARM64"
weight = 27
+++

LocalAI can be run on Nvidia ARM64 devices, such as the Jetson Nano, Jetson Xavier NX, Jetson AGX Orin, and Nvidia DGX Spark. The following instructions will guide you through building and using the LocalAI container for Nvidia ARM64 devices.

## Platform Compatibility

- CUDA 12 L4T images: Compatible with Nvidia AGX Orin and similar platforms (Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- CUDA 13 L4T images: Compatible with Nvidia DGX Spark

## Prerequisites

## Pre-built Images

Pre-built images are available on quay.io and Docker Hub:

### CUDA 12 (for AGX Orin and similar platforms)

```bash
docker pull quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64
# or
docker pull localai/localai:latest-nvidia-l4t-arm64
```

### CUDA 13 (for DGX Spark)

```bash
docker pull quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13
# or
docker pull localai/localai:latest-nvidia-l4t-arm64-cuda-13
```

## Build the container

If you need to build the container yourself, use the following commands:

### CUDA 12 (for AGX Orin and similar platforms)

```bash
git clone https://github.com/mudler/LocalAI
cd LocalAI

docker build --build-arg SKIP_DRIVERS=true --build-arg BUILD_TYPE=cublas --build-arg BASE_IMAGE=nvcr.io/nvidia/l4t-jetpack:r36.4.0 --build-arg IMAGE_TYPE=core -t quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core .
```

### CUDA 13 (for DGX Spark)

```bash
git clone https://github.com/mudler/LocalAI
cd LocalAI

docker build --build-arg SKIP_DRIVERS=false --build-arg BUILD_TYPE=cublas --build-arg CUDA_MAJOR_VERSION=13 --build-arg CUDA_MINOR_VERSION=0 --build-arg BASE_IMAGE=ubuntu:24.04 --build-arg IMAGE_TYPE=core -t quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-cuda-13-core .
```

## Usage

Run the LocalAI container on Nvidia ARM64 devices using the following commands, where `/data/models` is the directory containing the models:

### CUDA 12 (for AGX Orin and similar platforms)

```bash
docker run -e DEBUG=true -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64
```

### CUDA 13 (for DGX Spark)

```bash
docker run -e DEBUG=true -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai --runtime nvidia --gpus all quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13
```

Note: replace `/data/models` with the directory containing your models.

## GPU reporting in distributed mode

If you run a worker on a Jetson, DGX Spark (GB10), or Thor and the Nodes page in the frontend shows the node as fully used, check two things:

1. `NVIDIA_DRIVER_CAPABILITIES` must include `utility` so `nvidia-smi` / NVML work inside the container. With `--gpus all` alone (or `--runtime nvidia` without extra flags) only `compute` is wired in on some driver versions. Add `-e NVIDIA_DRIVER_CAPABILITIES=compute,utility` to your `docker run`, or `capabilities: [gpu, utility]` in compose / Kubernetes device reservations.
2. Pass `--init` to `docker run` (or `init: true` in compose) so the container has a proper PID 1 reaper — otherwise short-lived child processes like `nvidia-smi` can intermittently fail with `waitid: no child processes`. Both settings are combined in the example below.
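
For example, adapting the DGX Spark command from the Usage section above:

```bash
docker run -e DEBUG=true -e NVIDIA_DRIVER_CAPABILITIES=compute,utility --init \
  -p 8080:8080 -v /data/models:/models -ti --restart=always --name local-ai \
  --runtime nvidia --gpus all \
  quay.io/go-skynet/local-ai:latest-nvidia-l4t-arm64-cuda-13
```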

On unified-memory devices LocalAI auto-detects the SoC via `/sys/devices/soc0/{family,soc_id}` and reports system RAM as VRAM, so `nvidia-smi` is not strictly required for VRAM metrics. See [Distributed Mode → NVIDIA GPU support]({{% relref "/features/distributed-mode#nvidia-gpu-support" %}}) for full context.