+++
disableToc = false
title = "CLI Reference"
weight = 25
url = '/reference/cli-reference'
+++

Complete reference for all LocalAI command-line interface (CLI) parameters and environment variables.

Note: All CLI flags can also be set via environment variables. Environment variables take precedence over CLI flags. See [.env files]({{%relref "advanced/advanced-usage#env-files" %}}) for configuration file support.
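
For example, following the precedence rule above, an exported variable wins over the matching flag (an illustrative invocation; both settings are documented in the tables below):

```bash
# Per the precedence rule documented above, the exported variable
# overrides the flag, so the server binds to :9090
LOCALAI_ADDRESS=:9090 ./local-ai run --address :8080
```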

## Global Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `-h`, `--help` | | Show context-sensitive help | |
| `--log-level` | `info` | Set the level of logs to output [error, warn, info, debug, trace] | `$LOCALAI_LOG_LEVEL` |
| `--debug` | `false` | DEPRECATED - Use `--log-level=debug` instead. Enable debug logging | `$LOCALAI_DEBUG`, `$DEBUG` |
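
For example, to raise verbosity while troubleshooting (trace is the most verbose level listed above):

```bash
# Enable the most detailed logging output
./local-ai run --log-level=trace
```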

## Storage Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--models-path` | `BASEPATH/models` | Path containing models used for inferencing | `$LOCALAI_MODELS_PATH`, `$MODELS_PATH` |
| `--generated-content-path` | `/tmp/generated/content` | Location for assets generated by backends (e.g. stablediffusion, images, audio, videos) | `$LOCALAI_GENERATED_CONTENT_PATH`, `$GENERATED_CONTENT_PATH` |
| `--upload-path` | `/tmp/localai/upload` | Path to store uploads from the files API | `$LOCALAI_UPLOAD_PATH`, `$UPLOAD_PATH` |
| `--localai-config-dir` | `BASEPATH/configuration` | Directory for dynamic loading of certain configuration files (currently `runtime_settings.json`, `api_keys.json`, and `external_backends.json`). See [Runtime Settings]({{%relref "features/runtime-settings" %}}) for web-based configuration | `$LOCALAI_CONFIG_DIR` |
| `--localai-config-dir-poll-interval` | | Time duration to poll the LocalAI config dir if your system has broken fsnotify events (example: `1m`) | `$LOCALAI_CONFIG_DIR_POLL_INTERVAL` |
| `--models-config-file` | | YAML file containing a list of model backend configs (alias: `--config-file`) | `$LOCALAI_MODELS_CONFIG_FILE`, `$CONFIG_FILE` |
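
A sketch of a custom storage layout using the flags above; the `/data` paths are placeholders, not defaults:

```bash
# Keep models, generated assets, uploads, and dynamic config on one volume
./local-ai run \
  --models-path /data/models \
  --generated-content-path /data/generated \
  --upload-path /data/uploads \
  --localai-config-dir /data/configuration
```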

## Backend Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--backends-path` | `BASEPATH/backends` | Path containing backends used for inferencing | `$LOCALAI_BACKENDS_PATH`, `$BACKENDS_PATH` |
| `--backends-system-path` | `/var/lib/local-ai/backends` | Path containing system backends used for inferencing | `$LOCALAI_BACKENDS_SYSTEM_PATH`, `$BACKEND_SYSTEM_PATH` |
| `--external-backends` | | A list of external backends to load from the gallery on boot | `$LOCALAI_EXTERNAL_BACKENDS`, `$EXTERNAL_BACKENDS` |
| `--external-grpc-backends` | | A list of external gRPC backends (format: `BACKEND_NAME:URI`) | `$LOCALAI_EXTERNAL_GRPC_BACKENDS`, `$EXTERNAL_GRPC_BACKENDS` |
| `--backend-galleries` | | JSON list of backend galleries | `$LOCALAI_BACKEND_GALLERIES`, `$BACKEND_GALLERIES` |
| `--autoload-backend-galleries` | `true` | Automatically load backend galleries on startup | `$LOCALAI_AUTOLOAD_BACKEND_GALLERIES`, `$AUTOLOAD_BACKEND_GALLERIES` |
| `--parallel-requests` | `false` | Enable backends to handle multiple requests in parallel if they support it (e.g. llama.cpp or vllm) | `$LOCALAI_PARALLEL_REQUESTS`, `$PARALLEL_REQUESTS` |
| `--max-active-backends` | `0` | Maximum number of active backends (loaded models). When exceeded, the least recently used model is evicted. Set to 0 for unlimited, 1 for single-backend mode | `$LOCALAI_MAX_ACTIVE_BACKENDS`, `$MAX_ACTIVE_BACKENDS` |
| `--single-active-backend` | `false` | DEPRECATED - Use `--max-active-backends=1` instead. Allow only one backend to run at a time | `$LOCALAI_SINGLE_ACTIVE_BACKEND`, `$SINGLE_ACTIVE_BACKEND` |
| `--preload-backend-only` | `false` | Do not launch the API services; only the preloaded models/backends are started (useful for multi-node setups) | `$LOCALAI_PRELOAD_BACKEND_ONLY`, `$PRELOAD_BACKEND_ONLY` |
| `--enable-watchdog-idle` | `false` | Enable the watchdog that stops backends idle longer than `--watchdog-idle-timeout` | `$LOCALAI_WATCHDOG_IDLE`, `$WATCHDOG_IDLE` |
| `--watchdog-idle-timeout` | `15m` | Threshold beyond which an idle backend should be stopped | `$LOCALAI_WATCHDOG_IDLE_TIMEOUT`, `$WATCHDOG_IDLE_TIMEOUT` |
| `--enable-watchdog-busy` | `false` | Enable the watchdog that stops backends busy longer than `--watchdog-busy-timeout` | `$LOCALAI_WATCHDOG_BUSY`, `$WATCHDOG_BUSY` |
| `--watchdog-busy-timeout` | `5m` | Threshold beyond which a busy backend should be stopped | `$LOCALAI_WATCHDOG_BUSY_TIMEOUT`, `$WATCHDOG_BUSY_TIMEOUT` |
| `--force-eviction-when-busy` | `false` | Force eviction even when models have active API calls. Warning: enabling this can interrupt active requests | `$LOCALAI_FORCE_EVICTION_WHEN_BUSY`, `$FORCE_EVICTION_WHEN_BUSY` |
| `--lru-eviction-max-retries` | `30` | Maximum number of retries when waiting for busy models to become idle before eviction | `$LOCALAI_LRU_EVICTION_MAX_RETRIES`, `$LRU_EVICTION_MAX_RETRIES` |
| `--lru-eviction-retry-interval` | `1s` | Interval between retries when waiting for busy models to become idle (e.g. `1s`, `2s`) | `$LOCALAI_LRU_EVICTION_RETRY_INTERVAL`, `$LRU_EVICTION_RETRY_INTERVAL` |
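
For example, a single-backend setup that lets in-flight requests drain before eviction instead of interrupting them. The values are illustrative: 60 retries at a 2s interval give each eviction up to two minutes to succeed.

```bash
./local-ai run \
  --max-active-backends=1 \
  --lru-eviction-max-retries=60 \
  --lru-eviction-retry-interval=2s
```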

For more information on VRAM management, see [VRAM and Memory Management]({{%relref "advanced/vram-management" %}}).

## Models Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--galleries` | | JSON list of galleries | `$LOCALAI_GALLERIES`, `$GALLERIES` |
| `--autoload-galleries` | `true` | Automatically load galleries on startup | `$LOCALAI_AUTOLOAD_GALLERIES`, `$AUTOLOAD_GALLERIES` |
| `--preload-models` | | A list of models to apply in JSON at start | `$LOCALAI_PRELOAD_MODELS`, `$PRELOAD_MODELS` |
| `--models` | | A list of model configuration URLs to load | `$LOCALAI_MODELS`, `$MODELS` |
| `--preload-models-config` | | A list of models to apply at startup. Path to a YAML config file | `$LOCALAI_PRELOAD_MODELS_CONFIG`, `$PRELOAD_MODELS_CONFIG` |
| `--load-to-memory` | | A list of models to load into memory at startup | `$LOCALAI_LOAD_TO_MEMORY`, `$LOAD_TO_MEMORY` |

Note: You can also pass model configuration URLs as positional arguments: `local-ai run MODEL_URL1 MODEL_URL2 ...`
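
For example, the following two invocations load the same model configuration (the URL is a placeholder):

```bash
# As a flag
./local-ai run --models https://example.com/my-model.yaml

# As a positional argument
./local-ai run https://example.com/my-model.yaml
```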

## Performance Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--f16` | `false` | Enable GPU acceleration | `$LOCALAI_F16`, `$F16` |
| `-t`, `--threads` | | Number of threads used for parallel computation. Using the number of physical cores in the system is suggested | `$LOCALAI_THREADS`, `$THREADS` |
| `--context-size` | | Default context size for models | `$LOCALAI_CONTEXT_SIZE`, `$CONTEXT_SIZE` |
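
For example, on a machine with 8 physical cores (illustrative values):

```bash
# Match threads to physical cores and set a default context window
./local-ai run --threads 8 --context-size 4096 --f16
```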

## API Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--address` | `:8080` | Bind address for the API server | `$LOCALAI_ADDRESS`, `$ADDRESS` |
| `--cors` | `false` | Enable CORS (Cross-Origin Resource Sharing) | `$LOCALAI_CORS`, `$CORS` |
| `--cors-allow-origins` | | Comma-separated list of allowed CORS origins | `$LOCALAI_CORS_ALLOW_ORIGINS`, `$CORS_ALLOW_ORIGINS` |
| `--csrf` | `false` | Enable Fiber CSRF middleware | `$LOCALAI_CSRF` |
| `--upload-limit` | `15` | Default upload limit in MB | `$LOCALAI_UPLOAD_LIMIT`, `$UPLOAD_LIMIT` |
| `--api-keys` | | List of API keys to enable API authentication. When this is set, all requests must be authenticated with one of these keys | `$LOCALAI_API_KEY`, `$API_KEY` |
| `--disable-webui` | `false` | Disable the web user interface. When set to true, the server only exposes API endpoints without serving the web interface | `$LOCALAI_DISABLE_WEBUI`, `$DISABLE_WEBUI` |
| `--disable-runtime-settings` | `false` | Disable the runtime settings feature. When set to true, the server does not load runtime settings from `runtime_settings.json` and the settings web interface is disabled | `$LOCALAI_DISABLE_RUNTIME_SETTINGS`, `$DISABLE_RUNTIME_SETTINGS` |
| `--disable-gallery-endpoint` | `false` | Disable the gallery endpoints | `$LOCALAI_DISABLE_GALLERY_ENDPOINT`, `$DISABLE_GALLERY_ENDPOINT` |
| `--disable-metrics-endpoint` | `false` | Disable the `/metrics` endpoint | `$LOCALAI_DISABLE_METRICS_ENDPOINT`, `$DISABLE_METRICS_ENDPOINT` |
| `--machine-tag` | | If not empty, add this string to the `Machine-Tag` header in each response. Useful to track responses from different machines when using multiple P2P federated nodes | `$LOCALAI_MACHINE_TAG`, `$MACHINE_TAG` |
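
A sketch of an API-only deployment behind a key, with CORS restricted to a single origin (the key and origin are placeholders):

```bash
./local-ai run \
  --address :9090 \
  --api-keys my-secret-key \
  --cors \
  --cors-allow-origins "https://app.example.com" \
  --disable-webui
```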

## Hardening Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--disable-predownload-scan` | `false` | If true, disables the best-effort security scanner that runs before downloading any files | `$LOCALAI_DISABLE_PREDOWNLOAD_SCAN` |
| `--opaque-errors` | `false` | If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended | `$LOCALAI_OPAQUE_ERRORS` |
| `--use-subtle-key-comparison` | `false` | If true, API key validation uses constant-time comparisons rather than simple equality. This trades per-request performance for resilience against timing attacks | `$LOCALAI_SUBTLE_KEY_COMPARISON` |
| `--disable-api-key-requirement-for-http-get` | `false` | If true, a valid API key is not required to issue GET requests to portions of the web UI. Enable this only in secure testing environments | `$LOCALAI_DISABLE_API_KEY_REQUIREMENT_FOR_HTTP_GET` |
| `--http-get-exempted-endpoints` | `^/$,^/browse/?$,^/talk/?$,^/p2p/?$,^/chat/?$,^/text2image/?$,^/tts/?$,^/static/.*$,^/swagger.*$` | If `--disable-api-key-requirement-for-http-get` is set to true, this is the list of endpoints to exempt. Only adjust this in case of a security incident or after a personal security posture review | `$LOCALAI_HTTP_GET_EXEMPTED_ENDPOINTS` |
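
For example, a hardened, internet-facing configuration combining the flags above with API-key authentication (the key is a placeholder):

```bash
./local-ai run \
  --api-keys my-secret-key \
  --opaque-errors \
  --use-subtle-key-comparison
```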

## P2P Flags

| Parameter | Default | Description | Environment Variable |
|-----------|---------|-------------|----------------------|
| `--p2p` | `false` | Enable P2P mode | `$LOCALAI_P2P`, `$P2P` |
| `--p2p-dht-interval` | `360` | Interval for DHT refresh (used during token generation) | `$LOCALAI_P2P_DHT_INTERVAL`, `$P2P_DHT_INTERVAL` |
| `--p2p-otp-interval` | `9000` | Interval for OTP refresh (used during token generation) | `$LOCALAI_P2P_OTP_INTERVAL`, `$P2P_OTP_INTERVAL` |
| `--p2ptoken` | | Token for P2P mode (optional) | `$LOCALAI_P2P_TOKEN`, `$P2P_TOKEN`, `$TOKEN` |
| `--p2p-network-id` | | Network ID for P2P mode; can be set arbitrarily by the user to group a set of instances | `$LOCALAI_P2P_NETWORK_ID`, `$P2P_NETWORK_ID` |
| `--federated` | `false` | Enable federated instance | `$LOCALAI_FEDERATED`, `$FEDERATED` |
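
For example, to join instances into the same named P2P group (the network ID and token are placeholders):

```bash
./local-ai run \
  --p2p \
  --p2p-network-id my-cluster \
  --p2ptoken "<shared-token>"
```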

## Other Commands

LocalAI supports several subcommands beyond `run`:

- `local-ai models` - Manage LocalAI models and definitions
- `local-ai backends` - Manage LocalAI backends and definitions
- `local-ai tts` - Convert text to speech
- `local-ai sound-generation` - Generate audio files from text or audio
- `local-ai transcript` - Convert audio to text
- `local-ai worker` - Run workers to distribute workload (llama.cpp-only)
- `local-ai util` - Utility commands
- `local-ai explorer` - Run P2P explorer
- `local-ai federated` - Run LocalAI in federated mode

Use `local-ai <command> --help` for more information on each command.
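
For example, to see the flags accepted by a specific subcommand:

```bash
./local-ai backends --help
./local-ai tts --help
```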

## Examples

### Basic Usage

```bash
# Start with default settings
./local-ai run

# Use a custom models path and bind address
./local-ai run --models-path /path/to/models --address :9090

# Enable GPU acceleration
./local-ai run --f16
```

### Environment Variables

```bash
export LOCALAI_MODELS_PATH=/path/to/models
export LOCALAI_ADDRESS=:9090
export LOCALAI_F16=true
./local-ai run
```

### Advanced Configuration

```bash
./local-ai run \
  --models model1.yaml model2.yaml \
  --enable-watchdog-idle \
  --watchdog-idle-timeout=10m \
  --p2p \
  --federated
```

- See [Advanced Usage]({{%relref "advanced/advanced-usage" %}}) for configuration examples
- See [VRAM and Memory Management]({{%relref "advanced/vram-management" %}}) for memory management options