feat: Add --data-path CLI flag for persistent data separation (#8888)

- Add LOCALAI_DATA_PATH environment variable and --data-path CLI flag
- Default data path: BASEPATH/data (separate from the configuration directory); the docker-compose example sets it to /data
- Automatic migration on startup: moves agent_tasks.json, agent_jobs.json, collections/, and assets/ from old config dir to new data path
- Backward compatible: preserves old behavior if LOCALAI_DATA_PATH is not set
- Agent state and job directories now use DataPath with proper fallback chain
- Update documentation with new flag and docker-compose example

This separates mutable persistent data (collectiondb, agents, assets, skills) from configuration files, enabling better volume mounting and data persistence in containerized deployments.
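
The fallback chain for the agent state directory (an explicit state directory first, then the data path, then the configuration directory) might look like the sketch below. `resolveStateDir` is a hypothetical helper written for illustration; the actual function names and signatures in this commit are not shown here.

```go
package main

import "fmt"

// resolveStateDir is a hypothetical helper, not LocalAI's actual code.
// It returns the first non-empty value among the explicit state directory,
// the data path, and the configuration directory, in that order.
func resolveStateDir(stateDir, dataPath, configDir string) string {
	for _, candidate := range []string{stateDir, dataPath, configDir} {
		if candidate != "" {
			return candidate
		}
	}
	return ""
}

func main() {
	// Data path set, no explicit state dir: the data path is used.
	fmt.Println(resolveStateDir("", "/data", "/etc/localai")) // prints "/data"
	// Neither state dir nor data path set: fall back to the config dir.
	fmt.Println(resolveStateDir("", "", "/etc/localai")) // prints "/etc/localai"
}
```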

Signed-off-by: localai-bot <localai-bot@noreply.github.com>
Co-authored-by: localai-bot <localai-bot@noreply.github.com>
LocalAI [bot]
2026-03-09 14:11:15 +01:00
committed by GitHub
parent 95aef32492
commit d200401e86
7 changed files with 77 additions and 5 deletions

@@ -60,7 +60,7 @@ All agent-related settings can be configured via environment variables:
| `LOCALAI_AGENT_POOL_TRANSCRIPTION_MODEL` | _(empty)_ | Default transcription (speech-to-text) model for agents |
| `LOCALAI_AGENT_POOL_TRANSCRIPTION_LANGUAGE` | _(empty)_ | Default transcription language for agents |
| `LOCALAI_AGENT_POOL_TTS_MODEL` | _(empty)_ | Default TTS (text-to-speech) model for agents |
-| `LOCALAI_AGENT_POOL_STATE_DIR` | _(config dir)_ | Directory for persisting agent state |
+| `LOCALAI_AGENT_POOL_STATE_DIR` | _(data path)_ | Directory for persisting agent state. Defaults to `LOCALAI_DATA_PATH` if set, otherwise falls back to `LOCALAI_CONFIG_DIR` |
| `LOCALAI_AGENT_POOL_TIMEOUT` | `5m` | Default timeout for agent operations |
| `LOCALAI_AGENT_POOL_ENABLE_SKILLS` | `false` | Enable the skills service |
| `LOCALAI_AGENT_POOL_VECTOR_ENGINE` | `chromem` | Vector engine for knowledge base (`chromem` or `postgres`) |
@@ -96,15 +96,18 @@ services:
       - 8080:8080
     environment:
       - MODELS_PATH=/models
+      - LOCALAI_DATA_PATH=/data
       - LOCALAI_AGENT_POOL_DEFAULT_MODEL=hermes-3-llama3.1-8b
       - LOCALAI_AGENT_POOL_EMBEDDING_MODEL=granite-embedding-107m-multilingual
       - LOCALAI_AGENT_POOL_ENABLE_SKILLS=true
       - LOCALAI_AGENT_POOL_ENABLE_LOGS=true
     volumes:
       - models:/models
+      - localai_data:/data
       - localai_config:/etc/localai
 volumes:
   models:
+  localai_data:
   localai_config:
```

@@ -22,6 +22,7 @@ Complete reference for all LocalAI command-line interface (CLI) parameters and e
| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--models-path` | `BASEPATH/models` | Path containing models used for inferencing | `$LOCALAI_MODELS_PATH`, `$MODELS_PATH` |
+| `--data-path` | `BASEPATH/data` | Path for persistent data (collectiondb, agent state, tasks, jobs). Separates mutable data from configuration | `$LOCALAI_DATA_PATH` |
| `--generated-content-path` | `/tmp/generated/content` | Location for assets generated by backends (e.g. stablediffusion, images, audio, videos) | `$LOCALAI_GENERATED_CONTENT_PATH`, `$GENERATED_CONTENT_PATH` |
| `--upload-path` | `/tmp/localai/upload` | Path to store uploads from files API | `$LOCALAI_UPLOAD_PATH`, `$UPLOAD_PATH` |
| `--localai-config-dir` | `BASEPATH/configuration` | Directory for dynamic loading of certain configuration files (currently runtime_settings.json, api_keys.json, and external_backends.json). See [Runtime Settings]({{%relref "features/runtime-settings" %}}) for web-based configuration. | `$LOCALAI_CONFIG_DIR` |