+++
disableToc = false
title = "Troubleshooting"
weight = 9
url = '/basics/troubleshooting/'
icon = "build"
+++

This guide covers common issues you may encounter when using LocalAI, organized by category. For each issue, diagnostic steps and solutions are provided.

## Quick Diagnostics

Before diving into specific issues, run these commands to gather diagnostic information:

```bash
# Check LocalAI is running and responsive
curl http://localhost:8080/readyz

# List loaded models
curl http://localhost:8080/v1/models

# Check LocalAI version
local-ai --version

# Enable debug logging for detailed output
DEBUG=true local-ai run
# or
local-ai run --log-level=debug
```

For Docker deployments:

```bash
# View container logs
docker logs local-ai

# Check container status
docker ps -a | grep local-ai

# Test GPU access (NVIDIA)
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
```

## Installation Issues

### Binary Won't Execute on Linux

**Symptoms:** Permission denied or "cannot execute binary file" errors.

**Solution:**

```bash
chmod +x local-ai-*
./local-ai-Linux-x86_64 run
```

If you see "cannot execute binary file: Exec format error", you downloaded the wrong architecture. Verify with:

```bash
uname -m
# x86_64 → download the x86_64 binary
# aarch64 → download the arm64 binary
```

### macOS: Application Is Quarantined

**Symptoms:** macOS blocks LocalAI from running because the DMG is not signed by Apple.

**Solution:** See [GitHub issue #6268](https://github.com/mudler/LocalAI/issues/6268) for quarantine bypass instructions. This is tracked for resolution in [issue #6244](https://github.com/mudler/LocalAI/issues/6244).


## Model Loading Problems

### Model Not Found

**Symptoms:** API returns `404` or `"model not found"` error.

**Diagnostic steps:**

1. Check the model exists in your models directory:
   ```bash
   ls -la /path/to/models/
   ```

2. Verify your models path is correct:
   ```bash
   # Check what path LocalAI is using
   local-ai run --models-path /path/to/models --log-level=debug
   ```

3. Confirm the model name matches your request:
   ```bash
   # List available models
   curl http://localhost:8080/v1/models | jq '.data[].id'
   ```

### Model Fails to Load (Backend Error)

**Symptoms:** Model is found but fails to load, with backend errors in the logs.

**Common causes and fixes:**

- **Wrong backend:** Ensure the backend in your model YAML matches the model format. GGUF models use `llama-cpp`, diffusion models use `diffusers`, etc. See the [compatibility table](/docs/reference/compatibility-table/) for details.
- **Backend not installed:** Check installed backends:
  ```bash
  local-ai backends list
  # Install a missing backend:
  local-ai backends install llama-cpp
  ```
- **Corrupt model file:** Re-download the model. Partial downloads or disk errors can corrupt files.
- **Wrong model format:** LocalAI uses GGUF format for llama.cpp models. Older GGML format is deprecated.

### Model Configuration Issues

**Symptoms:** Model loads but produces unexpected results or errors during inference.

Check your model YAML configuration:

```yaml
# Example model config
name: my-model
backend: llama-cpp
parameters:
  model: my-model.gguf  # Relative to models directory
context_size: 2048
threads: 4  # Should match physical CPU cores
```

Common mistakes:
- `model` path must be relative to the models directory, not an absolute path
- `threads` set higher than physical CPU cores causes contention
- `context_size` too large for available RAM causes OOM errors

## GPU and Memory Issues

### GPU Not Detected

**NVIDIA (CUDA):**

```bash
# Verify CUDA is available
nvidia-smi

# For Docker, verify GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
```

When working correctly, LocalAI logs should show: `ggml_init_cublas: found X CUDA devices`.

Ensure you are using a CUDA-enabled container image (tags containing `cuda11`, `cuda12`, or `cuda13`). CPU-only images cannot use NVIDIA GPUs.

**AMD (ROCm):**

```bash
# Verify ROCm installation
rocminfo

# Docker requires device passthrough
docker run --device=/dev/kfd --device=/dev/dri --group-add=video ...
```

If your GPU is not in the default target list, open up an Issue. Supported targets include: gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942, gfx1030, gfx1031, gfx1100, gfx1101.

**Intel (SYCL):**

```bash
# Docker requires device passthrough
docker run --device /dev/dri ...
```

Use container images with `gpu-intel` in the tag. **Known issue:** SYCL hangs when `mmap: true` is set — disable it in your model config:

```yaml
mmap: false
```

**Overriding backend auto-detection:**

If LocalAI picks the wrong GPU backend, override it:

```bash
LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia local-ai run
# Options: default, nvidia, amd, intel
```

### Out of Memory (OOM)

**Symptoms:** Model loading fails or the process is killed by the OS.

**Solutions:**

1. **Use smaller quantizations:** Q4_K_S or Q2_K use significantly less memory than Q8_0 or Q6_K
2. **Reduce context size:** Lower `context_size` in your model YAML
3. **Enable low VRAM mode:** Add `low_vram: true` to your model config
4. **Limit active models:** Only keep one model loaded at a time:
   ```bash
   local-ai run --max-active-backends=1
   ```
5. **Enable idle watchdog:** Automatically unload unused models:
   ```bash
   local-ai run --enable-watchdog-idle --watchdog-idle-timeout=10m
   ```
6. **Manually unload a model:**
   ```bash
   curl -X POST http://localhost:8080/backend/shutdown \
     -H "Content-Type: application/json" \
     -d '{"model": "model-name"}'
   ```

### Models Stay Loaded and Consume Memory

By default, models remain loaded in memory after first use. This can exhaust VRAM when switching between models.

**Configure LRU eviction:**

```bash
# Keep at most 2 models loaded; evict least recently used
local-ai run --max-active-backends=2
```

**Configure watchdog auto-unload:**

```bash
local-ai run \
  --enable-watchdog-idle --watchdog-idle-timeout=15m \
  --enable-watchdog-busy --watchdog-busy-timeout=5m
```

These can also be set via environment variables (`LOCALAI_WATCHDOG_IDLE=true`, `LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m`) or in the Web UI under Settings → Watchdog Settings.

See the [VRAM Management guide](/advanced/vram-management/) for more details.

## API Connection Problems

### Connection Refused

**Symptoms:** `curl: (7) Failed to connect to localhost port 8080: Connection refused`

**Diagnostic steps:**

1. Verify LocalAI is running:
   ```bash
   # Direct install
   ps aux | grep local-ai

   # Docker
   docker ps | grep local-ai
   ```

2. Check the bind address and port:
   ```bash
   # Default is :8080. Override with:
   local-ai run --address=0.0.0.0:8080
   # or
   LOCALAI_ADDRESS=":8080" local-ai run
   ```

3. Check for port conflicts:
   ```bash
   ss -tlnp | grep 8080
   ```

### Authentication Errors (401)

**Symptoms:** `401 Unauthorized` response.

If API key authentication is enabled (`LOCALAI_API_KEY` or `--api-keys`), include the key in your requests:

```bash
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Keys can also be passed via `x-api-key` or `xi-api-key` headers.

### Request Errors (400/422)

**Symptoms:** `400 Bad Request` or `422 Unprocessable Entity`.

Common causes:
- Malformed JSON in request body
- Missing required fields (e.g., `model` or `messages`)
- Invalid parameter values (e.g., negative `top_n` for reranking)

Enable debug logging to see the full request/response:

```bash
DEBUG=true local-ai run
```

See the [API Errors reference](/reference/api-errors/) for a complete list of error codes and their meanings.

## Performance Issues

### Slow Inference

**Diagnostic steps:**

1. Enable debug mode to see inference timing:
   ```bash
   DEBUG=true local-ai run
   ```

2. Use streaming to measure time-to-first-token:
   ```bash
   curl http://localhost:8080/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
   ```

**Common causes and fixes:**

- **Model on HDD:** Move models to an SSD. If stuck with HDD, disable memory mapping (`mmap: false`) to load the model entirely into RAM.
- **Thread overbooking:** Set `--threads` to match your physical CPU core count (not logical/hyperthreaded count).
- **Default sampling:** LocalAI uses mirostat sampling by default, which produces better quality output but is slower. Disable it for benchmarking:
  ```yaml
  # In model config
  mirostat: 0
  ```
- **No GPU offloading:** Ensure `gpu_layers` is set in your model config to offload layers to GPU:
  ```yaml
  gpu_layers: 99  # Offload all layers
  ```
- **Context size too large:** Larger context sizes require more memory and slow down inference. Use the smallest context size that meets your needs.

### High Memory Usage

- Use quantized models (Q4_K_M is a good balance of quality and size)
- Reduce `context_size`
- Enable `low_vram: true` in model config
- Disable `mmlock` (memory locking) if it's enabled
- Set `--max-active-backends=1` to keep only one model in memory

## Docker-Specific Problems

### Container Fails to Start

**Diagnostic steps:**

```bash
# Check container logs
docker logs local-ai

# Check if port is already in use
ss -tlnp | grep 8080

# Verify the image exists
docker images | grep localai
```

### GPU Not Available Inside Container

**NVIDIA:**

```bash
# Ensure nvidia-container-toolkit is installed, then:
docker run --gpus all ...
```

**AMD:**

```bash
docker run --device=/dev/kfd --device=/dev/dri --group-add=video ...
```

**Intel:**

```bash
docker run --device /dev/dri ...
```

### Health Checks Failing

Add a health check to your Docker Compose configuration:

```yaml
services:
  local-ai:
    image: localai/localai:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 30s
      timeout: 10s
      retries: 3
```

### Models Not Persisted Between Restarts

Mount a volume for your models directory:

```yaml
services:
  local-ai:
    volumes:
      - ./models:/build/models:cached
```

## Network and P2P Issues

### P2P Workers Not Discovered

**Symptoms:** Distributed inference setup but workers are not found.

**Key requirements:**

- Use `--net host` or `network_mode: host` in Docker
- Share the same P2P token across all nodes

**Debug P2P connectivity:**

```bash
LOCALAI_P2P_LOGLEVEL=debug \
LOCALAI_P2P_LIB_LOGLEVEL=debug \
LOCALAI_P2P_ENABLE_LIMITS=true \
LOCALAI_P2P_TOKEN="<TOKEN>" \
local-ai run
```

**If DHT is causing issues**, try disabling it to use local mDNS discovery instead:

```bash
LOCALAI_P2P_DISABLE_DHT=true local-ai run
```

### P2P Limitations

- Only a single model is currently supported for distributed inference
- Workers must be detected before inference starts — you cannot add workers mid-inference
- Workers mode supports llama-cpp compatible models only

See the [Distributed Inferencing guide](/features/distributed-inferencing/) for full setup instructions.

## Still Having Issues?

If your issue isn't covered here:

1. **Search existing issues:** Check the [GitHub Issues](https://github.com/mudler/LocalAI/issues) for similar problems
2. **Enable debug logging:** Run with `DEBUG=true` or `--log-level=debug` and include the logs when reporting
3. **Open a new issue:** Include your OS, hardware (CPU/GPU), LocalAI version, model being used, full error logs, and steps to reproduce
4. **Community help:** Join the [LocalAI Discord](https://discord.gg/uJAeKSAGDy) for community support