Files
LocalAI/docs/content/installation/containers.md
LocalAI [bot] 5e13193d84 docs: add CDI driver config for NVIDIA GPU in containers (fix #8108) (#8677)
This addresses issue #8108 where the legacy nvidia driver configuration
causes container startup failures with newer NVIDIA Container Toolkit versions.

Changes:
- Update docker-compose example to show both CDI (recommended) and legacy
  nvidia driver options
- Add troubleshooting section for 'Auto-detected mode as legacy' error
- Document the fix for nvidia-container-cli 'invalid expression' errors

The root cause is a Docker/NVIDIA Container Toolkit configuration issue,
not a LocalAI code bug. The error occurs during the container runtime's
prestart hook before LocalAI starts.

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-28 08:42:53 +01:00

10 KiB

title, description, weight, url
title description weight url
Containers Install and use LocalAI with container engines (Docker, Podman) 1 /installation/containers/

LocalAI supports Docker, Podman, and other OCI-compatible container engines. This guide covers the common aspects of running LocalAI in containers.

Prerequisites

Before you begin, ensure you have a container engine installed:

Quick Start

The fastest way to get started is with the CPU image:

docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
# Or with Podman:
podman run -p 8080:8080 --name local-ai -ti localai/localai:latest

This will:

  • Start LocalAI (you'll need to install models separately)
  • Make the API available at http://localhost:8080

Image Types

LocalAI provides several image types to suit different needs. These images work with both Docker and Podman.

Standard Images

Standard images don't include pre-configured models. Use these if you want to configure models manually.

CPU Image

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest

GPU Images

NVIDIA CUDA 13:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-13

NVIDIA CUDA 12:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-12

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-gpu-hipblas

Intel GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-gpu-intel

Vulkan:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

NVIDIA Jetson (L4T ARM64):

CUDA 12 (for Nvidia AGX Orin and similar platforms):

docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64

CUDA 13 (for Nvidia DGX Spark):

docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

All-in-One (AIO) Images

Recommended for beginners - These images come pre-configured with models and backends, ready to use immediately.

CPU Image

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu

GPU Images

NVIDIA CUDA 13:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-13

NVIDIA CUDA 12:

docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-12

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-aio-gpu-hipblas

Intel GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-aio-gpu-intel

Using Compose

For a more manageable setup, especially with persistent volumes, use Docker Compose or Podman Compose:

The CDI approach is recommended for newer versions of the NVIDIA Container Toolkit (1.14 and later). It provides better compatibility and is the future-proof method:

version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    # For CUDA 13, use: localai/localai:latest-aio-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models:cached
    # CDI driver configuration (recommended for NVIDIA Container Toolkit 1.14+)
    # This uses the nvidia.com/gpu resource API
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia.com/gpu
              count: all
              capabilities: [gpu]

Save this as compose.yaml and run:

docker compose up -d
# Or with Podman:
podman-compose up -d

Using Legacy NVIDIA Driver - For Older NVIDIA Container Toolkit

If you are using an older version of the NVIDIA Container Toolkit (before 1.14), or need backward compatibility, use the legacy approach:

version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    # For CUDA 13, use: localai/localai:latest-aio-gpu-nvidia-cuda-13
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=false
    volumes:
      - ./models:/models:cached
    # Legacy NVIDIA driver configuration (for older NVIDIA Container Toolkit)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Persistent Storage

To persist models and configurations, mount a volume:

docker run -ti --name local-ai -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:latest-aio-cpu
# Or with Podman:
podman run -ti --name local-ai -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:latest-aio-cpu

Or use a named volume:

docker volume create localai-models
docker run -ti --name local-ai -p 8080:8080 \
  -v localai-models:/models \
  localai/localai:latest-aio-cpu
# Or with Podman:
podman volume create localai-models
podman run -ti --name local-ai -p 8080:8080 \
  -v localai-models:/models \
  localai/localai:latest-aio-cpu

What's Included in AIO Images

All-in-One images come pre-configured with:

  • Text Generation: LLM models for chat and completion
  • Image Generation: Stable Diffusion models
  • Text to Speech: TTS models
  • Speech to Text: Whisper models
  • Embeddings: Vector embedding models
  • Function Calling: Support for OpenAI-compatible function calling

The AIO images use OpenAI-compatible model names (like gpt-4, gpt-4-vision-preview) but are backed by open-source models. See the container images documentation for the complete mapping.

Next Steps

After installation:

  1. Access the WebUI at http://localhost:8080
  2. Check available models: curl http://localhost:8080/v1/models
  3. Install additional models
  4. Try out examples

Troubleshooting

Container won't start

  • Check container engine is running: docker ps or podman ps
  • Check port 8080 is available: netstat -an | grep 8080 (Linux/Mac)
  • View logs: docker logs local-ai or podman logs local-ai

GPU not detected

  • Ensure Docker has GPU access: docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
  • For Podman, see the Podman installation guide
  • For NVIDIA: Install NVIDIA Container Toolkit
  • For AMD: Ensure devices are accessible: ls -la /dev/kfd /dev/dri

NVIDIA Container fails to start with "Auto-detected mode as 'legacy'" error

If you encounter this error:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: invalid expression

This indicates a Docker/NVIDIA Container Toolkit configuration issue. The container runtime's prestart hook fails before LocalAI starts. This is not a LocalAI code bug.

Solutions:

  1. Use CDI mode (recommended): Update your docker-compose.yaml to use the CDI driver configuration:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia.com/gpu
              count: all
              capabilities: [gpu]
    
  2. Upgrade NVIDIA Container Toolkit: Ensure you have version 1.14 or later, which has better CDI support.

  3. Check NVIDIA Container Toolkit configuration: Run nvidia-container-cli --query-gpu to verify your installation is working correctly outside of containers.

  4. Verify Docker GPU access: Test with docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

Models not downloading

  • Check internet connection
  • Verify disk space: df -h
  • Check container logs for errors: docker logs local-ai or podman logs local-ai

See Also