+++
disableToc = false
title = "CLI Reference"
weight = 25
url = '/reference/cli-reference'
+++
Complete reference for all LocalAI command-line interface (CLI) parameters and environment variables.
**Note:** All CLI flags can also be set via environment variables. Environment variables take precedence over CLI flags. See [.env files]({{%relref "advanced/advanced-usage#env-files" %}}) for configuration file support.
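For example, a minimal `.env` file might look like this (the values are purely illustrative; each flag in the tables below maps to the environment variable listed next to it):

```bash
# .env — illustrative values only
LOCALAI_LOG_LEVEL=debug
LOCALAI_MODELS_PATH=/path/to/models
LOCALAI_ADDRESS=:9090
```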
## Global Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `-h, --help` | | Show context-sensitive help | |
| `--log-level` | `info` | Set the level of logs to output `[error,warn,info,debug,trace]` | `$LOCALAI_LOG_LEVEL` |
| `--debug` | `false` | DEPRECATED - Use `--log-level=debug` instead. Enable debug logging | `$LOCALAI_DEBUG`, `$DEBUG` |
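Since `--debug` is deprecated, the equivalent setting today would be, for instance:

```bash
./local-ai run --log-level=debug
# or, via the environment:
LOCALAI_LOG_LEVEL=debug ./local-ai run
```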
## Storage Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--models-path` | `BASEPATH/models` | Path containing models used for inferencing | `$LOCALAI_MODELS_PATH`, `$MODELS_PATH` |
| `--generated-content-path` | `/tmp/generated/content` | Location for assets generated by backends (e.g. stablediffusion, images, audio, videos) | `$LOCALAI_GENERATED_CONTENT_PATH`, `$GENERATED_CONTENT_PATH` |
| `--upload-path` | `/tmp/localai/upload` | Path to store uploads from the files API | `$LOCALAI_UPLOAD_PATH`, `$UPLOAD_PATH` |
| `--localai-config-dir` | `BASEPATH/configuration` | Directory for dynamic loading of certain configuration files (currently `runtime_settings.json`, `api_keys.json`, and `external_backends.json`). See [Runtime Settings]({{%relref "features/runtime-settings" %}}) for web-based configuration. | `$LOCALAI_CONFIG_DIR` |
| `--localai-config-dir-poll-interval` | | Time duration to poll the LocalAI config dir if your system has broken fsnotify events (example: `1m`) | `$LOCALAI_CONFIG_DIR_POLL_INTERVAL` |
| `--models-config-file` | | YAML file containing a list of model backend configs (alias: `--config-file`) | `$LOCALAI_MODELS_CONFIG_FILE`, `$CONFIG_FILE` |
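As a sketch, the storage paths can all be pointed at a single data volume (the paths here are illustrative, not defaults):

```bash
./local-ai run \
  --models-path /data/models \
  --upload-path /data/uploads \
  --localai-config-dir /data/configuration
```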
## Backend Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--backends-path` | `BASEPATH/backends` | Path containing backends used for inferencing | `$LOCALAI_BACKENDS_PATH`, `$BACKENDS_PATH` |
| `--backends-system-path` | `/var/lib/local-ai/backends` | Path containing system backends used for inferencing | `$LOCALAI_BACKENDS_SYSTEM_PATH`, `$BACKEND_SYSTEM_PATH` |
| `--external-backends` | | A list of external backends to load from the gallery on boot | `$LOCALAI_EXTERNAL_BACKENDS`, `$EXTERNAL_BACKENDS` |
| `--external-grpc-backends` | | A list of external gRPC backends (format: `BACKEND_NAME:URI`) | `$LOCALAI_EXTERNAL_GRPC_BACKENDS`, `$EXTERNAL_GRPC_BACKENDS` |
| `--backend-galleries` | | JSON list of backend galleries | `$LOCALAI_BACKEND_GALLERIES`, `$BACKEND_GALLERIES` |
| `--autoload-backend-galleries` | `true` | Automatically load backend galleries on startup | `$LOCALAI_AUTOLOAD_BACKEND_GALLERIES`, `$AUTOLOAD_BACKEND_GALLERIES` |
| `--parallel-requests` | `false` | Enable backends to handle multiple requests in parallel if they support it (e.g. llama.cpp or vllm) | `$LOCALAI_PARALLEL_REQUESTS`, `$PARALLEL_REQUESTS` |
| `--max-active-backends` | `0` | Maximum number of active backends (loaded models). When exceeded, the least recently used model is evicted. Set to `0` for unlimited, `1` for single-backend mode | `$LOCALAI_MAX_ACTIVE_BACKENDS`, `$MAX_ACTIVE_BACKENDS` |
| `--single-active-backend` | `false` | DEPRECATED - Use `--max-active-backends=1` instead. Allow only one backend to run at a time | `$LOCALAI_SINGLE_ACTIVE_BACKEND`, `$SINGLE_ACTIVE_BACKEND` |
| `--preload-backend-only` | `false` | Do not launch the API services; only the preloaded models/backends are started (useful for multi-node setups) | `$LOCALAI_PRELOAD_BACKEND_ONLY`, `$PRELOAD_BACKEND_ONLY` |
| `--enable-watchdog-idle` | `false` | Enable the watchdog that stops backends that have been idle longer than `--watchdog-idle-timeout` | `$LOCALAI_WATCHDOG_IDLE`, `$WATCHDOG_IDLE` |
| `--watchdog-idle-timeout` | `15m` | Threshold beyond which an idle backend should be stopped | `$LOCALAI_WATCHDOG_IDLE_TIMEOUT`, `$WATCHDOG_IDLE_TIMEOUT` |
| `--enable-watchdog-busy` | `false` | Enable the watchdog that stops backends that have been busy longer than `--watchdog-busy-timeout` | `$LOCALAI_WATCHDOG_BUSY`, `$WATCHDOG_BUSY` |
| `--watchdog-busy-timeout` | `5m` | Threshold beyond which a busy backend should be stopped | `$LOCALAI_WATCHDOG_BUSY_TIMEOUT`, `$WATCHDOG_BUSY_TIMEOUT` |
| `--force-eviction-when-busy` | `false` | Force eviction even when models have active API calls (default `false` for safety). Warning: enabling this can interrupt active requests | `$LOCALAI_FORCE_EVICTION_WHEN_BUSY`, `$FORCE_EVICTION_WHEN_BUSY` |
| `--lru-eviction-max-retries` | `30` | Maximum number of retries when waiting for busy models to become idle before eviction | `$LOCALAI_LRU_EVICTION_MAX_RETRIES`, `$LRU_EVICTION_MAX_RETRIES` |
| `--lru-eviction-retry-interval` | `1s` | Interval between retries when waiting for busy models to become idle (e.g. `1s`, `2s`) | `$LOCALAI_LRU_EVICTION_RETRY_INTERVAL`, `$LRU_EVICTION_RETRY_INTERVAL` |
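As a sketch of how the eviction flags combine (the values are illustrative, not recommendations): keep at most two models loaded, and when a third is requested, retry eviction for up to two minutes while waiting for the LRU model to become idle, rather than interrupting in-flight requests:

```bash
./local-ai run \
  --max-active-backends=2 \
  --lru-eviction-max-retries=60 \
  --lru-eviction-retry-interval=2s
# add --force-eviction-when-busy only if interrupting active requests is acceptable
```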
For more information on VRAM management, see [VRAM and Memory Management]({{%relref "advanced/vram-management" %}}).
## Models Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--galleries` | | JSON list of galleries | `$LOCALAI_GALLERIES`, `$GALLERIES` |
| `--autoload-galleries` | `true` | Automatically load galleries on startup | `$LOCALAI_AUTOLOAD_GALLERIES`, `$AUTOLOAD_GALLERIES` |
| `--preload-models` | | A list of models to apply at startup, in JSON | `$LOCALAI_PRELOAD_MODELS`, `$PRELOAD_MODELS` |
| `--models` | | A list of model configuration URLs to load | `$LOCALAI_MODELS`, `$MODELS` |
| `--preload-models-config` | | A list of models to apply at startup. Path to a YAML config file | `$LOCALAI_PRELOAD_MODELS_CONFIG`, `$PRELOAD_MODELS_CONFIG` |
| `--load-to-memory` | | A list of models to load into memory at startup | `$LOCALAI_LOAD_TO_MEMORY`, `$LOAD_TO_MEMORY` |
**Note:** You can also pass model configuration URLs as positional arguments:

```bash
local-ai run MODEL_URL1 MODEL_URL2 ...
```
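As an illustration of the `--galleries` JSON shape, an entry with `name` and `url` fields pointing at the main LocalAI gallery index is shown below; treat it as an assumption rather than a guaranteed default:

```bash
./local-ai run \
  --galleries '[{"name":"localai","url":"github:mudler/LocalAI/gallery/index.yaml@master"}]'
```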
## Performance Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--f16` | `false` | Enable GPU acceleration | `$LOCALAI_F16`, `$F16` |
| `-t, --threads` | | Number of threads used for parallel computation. Using the number of physical cores in the system is suggested | `$LOCALAI_THREADS`, `$THREADS` |
| `--context-size` | | Default context size for models | `$LOCALAI_CONTEXT_SIZE`, `$CONTEXT_SIZE` |
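For example (the core count and context size below are assumptions about the host, not defaults):

```bash
# 8 physical cores assumed; adjust to your machine
./local-ai run --threads 8 --context-size 4096
```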
## API Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--address` | `:8080` | Bind address for the API server | `$LOCALAI_ADDRESS`, `$ADDRESS` |
| `--cors` | `false` | Enable CORS (Cross-Origin Resource Sharing) | `$LOCALAI_CORS`, `$CORS` |
| `--cors-allow-origins` | | Comma-separated list of allowed CORS origins | `$LOCALAI_CORS_ALLOW_ORIGINS`, `$CORS_ALLOW_ORIGINS` |
| `--csrf` | `false` | Enable Fiber CSRF middleware | `$LOCALAI_CSRF` |
| `--upload-limit` | `15` | Default upload limit in MB | `$LOCALAI_UPLOAD_LIMIT`, `$UPLOAD_LIMIT` |
| `--api-keys` | | List of API keys to enable API authentication. When this is set, all requests must be authenticated with one of these API keys | `$LOCALAI_API_KEY`, `$API_KEY` |
| `--disable-webui` | `false` | Disable the web user interface. When set to `true`, the server only exposes API endpoints, without serving the web interface | `$LOCALAI_DISABLE_WEBUI`, `$DISABLE_WEBUI` |
| `--disable-runtime-settings` | `false` | Disable the runtime settings feature. When set to `true`, the server does not load runtime settings from the `runtime_settings.json` file and the settings web interface is disabled | `$LOCALAI_DISABLE_RUNTIME_SETTINGS`, `$DISABLE_RUNTIME_SETTINGS` |
| `--disable-gallery-endpoint` | `false` | Disable the gallery endpoints | `$LOCALAI_DISABLE_GALLERY_ENDPOINT`, `$DISABLE_GALLERY_ENDPOINT` |
| `--disable-metrics-endpoint` | `false` | Disable the `/metrics` endpoint | `$LOCALAI_DISABLE_METRICS_ENDPOINT`, `$DISABLE_METRICS_ENDPOINT` |
| `--machine-tag` | | If not empty, add this string as a `Machine-Tag` header to each response. Useful for tracking responses from different machines when using multiple P2P federated nodes | `$LOCALAI_MACHINE_TAG`, `$MACHINE_TAG` |
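A sketch combining the API flags above (the key and origin values are placeholders):

```bash
./local-ai run \
  --address :9090 \
  --cors \
  --cors-allow-origins "https://example.com" \
  --api-keys "my-secret-key"
```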
## Hardening Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--disable-predownload-scan` | `false` | If true, disables the best-effort security scanner before downloading any files | `$LOCALAI_DISABLE_PREDOWNLOAD_SCAN` |
| `--opaque-errors` | `false` | If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended | `$LOCALAI_OPAQUE_ERRORS` |
| `--use-subtle-key-comparison` | `false` | If true, API key validation comparisons are performed using constant-time comparisons rather than simple equality. This trades off performance on each request for resilience against timing attacks | `$LOCALAI_SUBTLE_KEY_COMPARISON` |
| `--disable-api-key-requirement-for-http-get` | `false` | If true, a valid API key is not required to issue GET requests to portions of the web UI. This should only be enabled in secure testing environments | `$LOCALAI_DISABLE_API_KEY_REQUIREMENT_FOR_HTTP_GET` |
| `--http-get-exempted-endpoints` | `^/$,^/browse/?$,^/talk/?$,^/p2p/?$,^/chat/?$,^/text2image/?$,^/tts/?$,^/static/.*$,^/swagger.*$` | If `--disable-api-key-requirement-for-http-get` is overridden to true, this is the list of endpoints to exempt. Only adjust this in case of a security incident or as a result of a personal security posture review | `$LOCALAI_HTTP_GET_EXEMPTED_ENDPOINTS` |
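As a sketch, a more locked-down profile built from the flags above might look like this (the key is a placeholder, and whether each trade-off is appropriate depends on your deployment):

```bash
./local-ai run \
  --opaque-errors \
  --use-subtle-key-comparison \
  --api-keys "my-secret-key"
```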
## P2P Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--p2p` | `false` | Enable P2P mode | `$LOCALAI_P2P`, `$P2P` |
| `--p2p-dht-interval` | `360` | Interval for DHT refresh (used during token generation) | `$LOCALAI_P2P_DHT_INTERVAL`, `$P2P_DHT_INTERVAL` |
| `--p2p-otp-interval` | `9000` | Interval for OTP refresh (used during token generation) | `$LOCALAI_P2P_OTP_INTERVAL`, `$P2P_OTP_INTERVAL` |
| `--p2ptoken` | | Token for P2P mode (optional) | `$LOCALAI_P2P_TOKEN`, `$P2P_TOKEN`, `$TOKEN` |
| `--p2p-network-id` | | Network ID for P2P mode; can be set arbitrarily by the user to group a set of instances | `$LOCALAI_P2P_NETWORK_ID`, `$P2P_NETWORK_ID` |
| `--federated` | `false` | Enable federated instance | `$LOCALAI_FEDERATED`, `$FEDERATED` |
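For instance, instances sharing the same token and network ID group together (the token value below is a placeholder):

```bash
./local-ai run --p2p --p2p-network-id my-cluster --p2ptoken "<token>"
```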
## Other Commands

LocalAI supports several subcommands beyond `run`:

- `local-ai models` - Manage LocalAI models and definitions
- `local-ai backends` - Manage LocalAI backends and definitions
- `local-ai tts` - Convert text to speech
- `local-ai sound-generation` - Generate audio files from text or audio
- `local-ai transcript` - Convert audio to text
- `local-ai worker` - Run workers to distribute workload (llama.cpp-only)
- `local-ai util` - Utility commands
- `local-ai explorer` - Run P2P explorer
- `local-ai federated` - Run LocalAI in federated mode

Use `local-ai <command> --help` for more information on each command.
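Each subcommand prints its own usage; for example:

```bash
local-ai models --help
local-ai worker --help
```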
## Examples

### Basic Usage

```bash
./local-ai run
./local-ai run --models-path /path/to/models --address :9090
./local-ai run --f16
```
### Environment Variables

```bash
export LOCALAI_MODELS_PATH=/path/to/models
export LOCALAI_ADDRESS=:9090
export LOCALAI_F16=true
./local-ai run
```
### Advanced Configuration

```bash
./local-ai run \
  --models model1.yaml model2.yaml \
  --enable-watchdog-idle \
  --watchdog-idle-timeout=10m \
  --p2p \
  --federated
```
## Related Documentation
- See [Advanced Usage]({{%relref "advanced/advanced-usage" %}}) for configuration examples
- See [VRAM and Memory Management]({{%relref "advanced/vram-management" %}}) for memory management options