diff --git a/docs/content/getting-started/troubleshooting.md b/docs/content/getting-started/troubleshooting.md new file mode 100644 index 000000000..dc6ae6668 --- /dev/null +++ b/docs/content/getting-started/troubleshooting.md @@ -0,0 +1,444 @@ ++++ +disableToc = false +title = "Troubleshooting" +weight = 9 +url = '/basics/troubleshooting/' +icon = "build" ++++ + +This guide covers common issues you may encounter when using LocalAI, organized by category. For each issue, diagnostic steps and solutions are provided. + +## Quick Diagnostics + +Before diving into specific issues, run these commands to gather diagnostic information: + +```bash +# Check LocalAI is running and responsive +curl http://localhost:8080/readyz + +# List loaded models +curl http://localhost:8080/v1/models + +# Check LocalAI version +local-ai --version + +# Enable debug logging for detailed output +DEBUG=true local-ai run +# or +local-ai run --log-level=debug +``` + +For Docker deployments: + +```bash +# View container logs +docker logs local-ai + +# Check container status +docker ps -a | grep local-ai + +# Test GPU access (NVIDIA) +docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi +``` + +## Installation Issues + +### Binary Won't Execute on Linux + +**Symptoms:** Permission denied or "cannot execute binary file" errors. + +**Solution:** + +```bash +chmod +x local-ai-* +./local-ai-Linux-x86_64 run +``` + +If you see "cannot execute binary file: Exec format error", you downloaded the wrong architecture. Verify with: + +```bash +uname -m +# x86_64 → download the x86_64 binary +# aarch64 → download the arm64 binary +``` + +### macOS: Application Is Quarantined + +**Symptoms:** macOS blocks LocalAI from running because the DMG is not signed by Apple. + +**Solution:** See [GitHub issue #6268](https://github.com/mudler/LocalAI/issues/6268) for quarantine bypass instructions. This is tracked for resolution in [issue #6244](https://github.com/mudler/LocalAI/issues/6244). 
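If you prefer a command-line workaround, clearing the quarantine attribute is the usual approach (the application path below is an example; point it at wherever you installed LocalAI):

```bash
# Clear the quarantine attribute macOS sets on downloaded files.
# Example path — adjust to your LocalAI app or DMG location.
xattr -dr com.apple.quarantine /Applications/LocalAI.app
```

After clearing the attribute, launch the application again; Gatekeeper should no longer block it.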
+ + + + + + +## Model Loading Problems + +### Model Not Found + +**Symptoms:** API returns `404` or `"model not found"` error. + +**Diagnostic steps:** + +1. Check the model exists in your models directory: + ```bash + ls -la /path/to/models/ + ``` + +2. Verify your models path is correct: + ```bash + # Check what path LocalAI is using + local-ai run --models-path /path/to/models --log-level=debug + ``` + +3. Confirm the model name matches your request: + ```bash + # List available models + curl http://localhost:8080/v1/models | jq '.data[].id' + ``` + +### Model Fails to Load (Backend Error) + +**Symptoms:** Model is found but fails to load, with backend errors in the logs. + +**Common causes and fixes:** + +- **Wrong backend:** Ensure the backend in your model YAML matches the model format. GGUF models use `llama-cpp`, diffusion models use `diffusers`, etc. See the [compatibility table](/docs/reference/compatibility-table/) for details. +- **Backend not installed:** Check installed backends: + ```bash + local-ai backends list + # Install a missing backend: + local-ai backends install llama-cpp + ``` +- **Corrupt model file:** Re-download the model. Partial downloads or disk errors can corrupt files. +- **Wrong model format:** LocalAI uses GGUF format for llama.cpp models. Older GGML format is deprecated. + +### Model Configuration Issues + +**Symptoms:** Model loads but produces unexpected results or errors during inference. 

Check your model YAML configuration:

```yaml
# Example model config
name: my-model
backend: llama-cpp
parameters:
  model: my-model.gguf # Relative to models directory
context_size: 2048
threads: 4 # Should match physical CPU cores
```

Common mistakes:
- `model` path must be relative to the models directory, not an absolute path
- `threads` set higher than physical CPU cores causes contention
- `context_size` too large for available RAM causes OOM errors

## GPU and Memory Issues

### GPU Not Detected

**NVIDIA (CUDA):**

```bash
# Verify CUDA is available
nvidia-smi

# For Docker, verify GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
```

When working correctly, LocalAI logs should show: `ggml_init_cublas: found X CUDA devices`.

Ensure you are using a CUDA-enabled container image (tags containing `cuda11`, `cuda12`, or `cuda13`). CPU-only images cannot use NVIDIA GPUs.

**AMD (ROCm):**

```bash
# Verify ROCm installation
rocminfo

# Docker requires device passthrough
docker run --device=/dev/kfd --device=/dev/dri --group-add=video ...
```

If your GPU is not in the default target list, open an issue on GitHub. Supported targets include: gfx900, gfx906, gfx908, gfx90a, gfx940, gfx941, gfx942, gfx1030, gfx1031, gfx1100, gfx1101.

**Intel (SYCL):**

```bash
# Docker requires device passthrough
docker run --device /dev/dri ...
```

Use container images with `gpu-intel` in the tag. **Known issue:** SYCL hangs when `mmap: true` is set — disable it in your model config:

```yaml
mmap: false
```

**Overriding backend auto-detection:**

If LocalAI picks the wrong GPU backend, override it:

```bash
LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia local-ai run
# Options: default, nvidia, amd, intel
```

### Out of Memory (OOM)

**Symptoms:** Model loading fails or the process is killed by the OS.

**Solutions:**

1. 
**Use smaller quantizations:** Q4_K_S or Q2_K use significantly less memory than Q8_0 or Q6_K +2. **Reduce context size:** Lower `context_size` in your model YAML +3. **Enable low VRAM mode:** Add `low_vram: true` to your model config +4. **Limit active models:** Only keep one model loaded at a time: + ```bash + local-ai run --max-active-backends=1 + ``` +5. **Enable idle watchdog:** Automatically unload unused models: + ```bash + local-ai run --enable-watchdog-idle --watchdog-idle-timeout=10m + ``` +6. **Manually unload a model:** + ```bash + curl -X POST http://localhost:8080/backend/shutdown \ + -H "Content-Type: application/json" \ + -d '{"model": "model-name"}' + ``` + +### Models Stay Loaded and Consume Memory + +By default, models remain loaded in memory after first use. This can exhaust VRAM when switching between models. + +**Configure LRU eviction:** + +```bash +# Keep at most 2 models loaded; evict least recently used +local-ai run --max-active-backends=2 +``` + +**Configure watchdog auto-unload:** + +```bash +local-ai run \ + --enable-watchdog-idle --watchdog-idle-timeout=15m \ + --enable-watchdog-busy --watchdog-busy-timeout=5m +``` + +These can also be set via environment variables (`LOCALAI_WATCHDOG_IDLE=true`, `LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m`) or in the Web UI under Settings → Watchdog Settings. + +See the [VRAM Management guide](/advanced/vram-management/) for more details. + +## API Connection Problems + +### Connection Refused + +**Symptoms:** `curl: (7) Failed to connect to localhost port 8080: Connection refused` + +**Diagnostic steps:** + +1. Verify LocalAI is running: + ```bash + # Direct install + ps aux | grep local-ai + + # Docker + docker ps | grep local-ai + ``` + +2. Check the bind address and port: + ```bash + # Default is :8080. Override with: + local-ai run --address=0.0.0.0:8080 + # or + LOCALAI_ADDRESS=":8080" local-ai run + ``` + +3. 
Check for port conflicts: + ```bash + ss -tlnp | grep 8080 + ``` + +### Authentication Errors (401) + +**Symptoms:** `401 Unauthorized` response. + +If API key authentication is enabled (`LOCALAI_API_KEY` or `--api-keys`), include the key in your requests: + +```bash +curl http://localhost:8080/v1/models \ + -H "Authorization: Bearer YOUR_API_KEY" +``` + +Keys can also be passed via `x-api-key` or `xi-api-key` headers. + +### Request Errors (400/422) + +**Symptoms:** `400 Bad Request` or `422 Unprocessable Entity`. + +Common causes: +- Malformed JSON in request body +- Missing required fields (e.g., `model` or `messages`) +- Invalid parameter values (e.g., negative `top_n` for reranking) + +Enable debug logging to see the full request/response: + +```bash +DEBUG=true local-ai run +``` + +See the [API Errors reference](/reference/api-errors/) for a complete list of error codes and their meanings. + +## Performance Issues + +### Slow Inference + +**Diagnostic steps:** + +1. Enable debug mode to see inference timing: + ```bash + DEBUG=true local-ai run + ``` + +2. Use streaming to measure time-to-first-token: + ```bash + curl http://localhost:8080/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}], "stream": true}' + ``` + +**Common causes and fixes:** + +- **Model on HDD:** Move models to an SSD. If stuck with HDD, disable memory mapping (`mmap: false`) to load the model entirely into RAM. +- **Thread overbooking:** Set `--threads` to match your physical CPU core count (not logical/hyperthreaded count). +- **Default sampling:** LocalAI uses mirostat sampling by default, which produces better quality output but is slower. 
Disable it for benchmarking: + ```yaml + # In model config + mirostat: 0 + ``` +- **No GPU offloading:** Ensure `gpu_layers` is set in your model config to offload layers to GPU: + ```yaml + gpu_layers: 99 # Offload all layers + ``` +- **Context size too large:** Larger context sizes require more memory and slow down inference. Use the smallest context size that meets your needs. + +### High Memory Usage + +- Use quantized models (Q4_K_M is a good balance of quality and size) +- Reduce `context_size` +- Enable `low_vram: true` in model config +- Disable `mmlock` (memory locking) if it's enabled +- Set `--max-active-backends=1` to keep only one model in memory + +## Docker-Specific Problems + +### Container Fails to Start + +**Diagnostic steps:** + +```bash +# Check container logs +docker logs local-ai + +# Check if port is already in use +ss -tlnp | grep 8080 + +# Verify the image exists +docker images | grep localai +``` + +### GPU Not Available Inside Container + +**NVIDIA:** + +```bash +# Ensure nvidia-container-toolkit is installed, then: +docker run --gpus all ... +``` + +**AMD:** + +```bash +docker run --device=/dev/kfd --device=/dev/dri --group-add=video ... +``` + +**Intel:** + +```bash +docker run --device /dev/dri ... +``` + +### Health Checks Failing + +Add a health check to your Docker Compose configuration: + +```yaml +services: + local-ai: + image: localai/localai:latest + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"] + interval: 30s + timeout: 10s + retries: 3 +``` + +### Models Not Persisted Between Restarts + +Mount a volume for your models directory: + +```yaml +services: + local-ai: + volumes: + - ./models:/build/models:cached +``` + +## Network and P2P Issues + +### P2P Workers Not Discovered + +**Symptoms:** Distributed inference setup but workers are not found. 
+ +**Key requirements:** + +- Use `--net host` or `network_mode: host` in Docker +- Share the same P2P token across all nodes + +**Debug P2P connectivity:** + +```bash +LOCALAI_P2P_LOGLEVEL=debug \ +LOCALAI_P2P_LIB_LOGLEVEL=debug \ +LOCALAI_P2P_ENABLE_LIMITS=true \ +LOCALAI_P2P_TOKEN="" \ +local-ai run +``` + +**If DHT is causing issues**, try disabling it to use local mDNS discovery instead: + +```bash +LOCALAI_P2P_DISABLE_DHT=true local-ai run +``` + +### P2P Limitations + +- Only a single model is currently supported for distributed inference +- Workers must be detected before inference starts — you cannot add workers mid-inference +- Workers mode supports llama-cpp compatible models only + +See the [Distributed Inferencing guide](/features/distributed-inferencing/) for full setup instructions. + +## Still Having Issues? + +If your issue isn't covered here: + +1. **Search existing issues:** Check the [GitHub Issues](https://github.com/mudler/LocalAI/issues) for similar problems +2. **Enable debug logging:** Run with `DEBUG=true` or `--log-level=debug` and include the logs when reporting +3. **Open a new issue:** Include your OS, hardware (CPU/GPU), LocalAI version, model being used, full error logs, and steps to reproduce +4. **Community help:** Join the [LocalAI Discord](https://discord.gg/uJAeKSAGDy) for community support
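
Much of the information requested in step 3 can be gathered in one pass; a minimal sketch, assuming a Linux or macOS shell (it degrades gracefully if `local-ai` or `nvidia-smi` is not on your PATH):

```bash
# Collect basic environment details for a bug report into one file.
{
  echo "## System"
  uname -a
  echo "## LocalAI version"
  if command -v local-ai >/dev/null 2>&1; then
    local-ai --version
  else
    echo "local-ai not found on PATH"
  fi
  echo "## GPU"
  nvidia-smi -L 2>/dev/null || echo "no NVIDIA GPU detected"
} > localai-report.txt
cat localai-report.txt
```

Attach `localai-report.txt` together with your debug logs when opening an issue.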