+++
disableToc = false
title = "CLI Reference"
weight = 25
url = '/reference/cli-reference'
+++
Complete reference for all LocalAI command-line interface (CLI) parameters and environment variables.
**Note:** All CLI flags can also be set via environment variables. Environment variables take precedence over CLI flags. See [.env files]({{%relref "advanced/advanced-usage#env-files" %}}) for configuration file support.
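For example, a minimal `.env` file might look like this (the values are purely illustrative; each flag in the tables below maps to the environment variable listed next to it):

```bash
# .env — illustrative values only
LOCALAI_LOG_LEVEL=debug
LOCALAI_MODELS_PATH=/path/to/models
LOCALAI_ADDRESS=:9090
```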
## Global Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `-h, --help` | | Show context-sensitive help | |
| `--log-level` | `info` | Set the level of logs to output `[error,warn,info,debug,trace]` | `$LOCALAI_LOG_LEVEL` |
| `--debug` | `false` | DEPRECATED - Use `--log-level=debug` instead. Enable debug logging | `$LOCALAI_DEBUG`, `$DEBUG` |
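Since `--debug` is deprecated, the equivalent setting today would be, for instance:

```bash
./local-ai run --log-level=debug
# or, via the environment:
LOCALAI_LOG_LEVEL=debug ./local-ai run
```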
## Storage Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--models-path` | `BASEPATH/models` | Path containing models used for inferencing | `$LOCALAI_MODELS_PATH`, `$MODELS_PATH` |
| `--generated-content-path` | `/tmp/generated/content` | Location for assets generated by backends (e.g. stablediffusion, images, audio, videos) | `$LOCALAI_GENERATED_CONTENT_PATH`, `$GENERATED_CONTENT_PATH` |
| `--upload-path` | `/tmp/localai/upload` | Path to store uploads from the files API | `$LOCALAI_UPLOAD_PATH`, `$UPLOAD_PATH` |
| `--localai-config-dir` | `BASEPATH/configuration` | Directory for dynamic loading of certain configuration files (currently `runtime_settings.json`, `api_keys.json`, and `external_backends.json`). See [Runtime Settings]({{%relref "features/runtime-settings" %}}) for web-based configuration. | `$LOCALAI_CONFIG_DIR` |
| `--localai-config-dir-poll-interval` | | Time duration to poll the LocalAI config dir if your system has broken fsnotify events (example: `1m`) | `$LOCALAI_CONFIG_DIR_POLL_INTERVAL` |
| `--models-config-file` | | YAML file containing a list of model backend configs (alias: `--config-file`) | `$LOCALAI_MODELS_CONFIG_FILE`, `$CONFIG_FILE` |
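As a sketch, the storage paths can all be pointed at a single data volume (the paths here are illustrative, not defaults):

```bash
./local-ai run \
  --models-path /data/models \
  --upload-path /data/uploads \
  --localai-config-dir /data/configuration
```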
## Backend Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--backends-path` | `BASEPATH/backends` | Path containing backends used for inferencing | `$LOCALAI_BACKENDS_PATH`, `$BACKENDS_PATH` |
| `--backends-system-path` | `/var/lib/local-ai/backends` | Path containing system backends used for inferencing | `$LOCALAI_BACKENDS_SYSTEM_PATH`, `$BACKEND_SYSTEM_PATH` |
| `--external-backends` | | A list of external backends to load from the gallery on boot | `$LOCALAI_EXTERNAL_BACKENDS`, `$EXTERNAL_BACKENDS` |
| `--external-grpc-backends` | | A list of external gRPC backends (format: `BACKEND_NAME:URI`) | `$LOCALAI_EXTERNAL_GRPC_BACKENDS`, `$EXTERNAL_GRPC_BACKENDS` |
| `--backend-galleries` | | JSON list of backend galleries | `$LOCALAI_BACKEND_GALLERIES`, `$BACKEND_GALLERIES` |
| `--autoload-backend-galleries` | `true` | Automatically load backend galleries on startup | `$LOCALAI_AUTOLOAD_BACKEND_GALLERIES`, `$AUTOLOAD_BACKEND_GALLERIES` |
| `--parallel-requests` | `false` | Enable backends to handle multiple requests in parallel if they support it (e.g. llama.cpp or vllm) | `$LOCALAI_PARALLEL_REQUESTS`, `$PARALLEL_REQUESTS` |
| `--max-active-backends` | `0` | Maximum number of active backends (loaded models). When exceeded, the least recently used model is evicted. Set to `0` for unlimited, `1` for single-backend mode | `$LOCALAI_MAX_ACTIVE_BACKENDS`, `$MAX_ACTIVE_BACKENDS` |
| `--single-active-backend` | `false` | DEPRECATED - Use `--max-active-backends=1` instead. Allow only one backend to run at a time | `$LOCALAI_SINGLE_ACTIVE_BACKEND`, `$SINGLE_ACTIVE_BACKEND` |
| `--preload-backend-only` | `false` | Do not launch the API services; only the preloaded models/backends are started (useful for multi-node setups) | `$LOCALAI_PRELOAD_BACKEND_ONLY`, `$PRELOAD_BACKEND_ONLY` |
| `--enable-watchdog-idle` | `false` | Enable the watchdog that stops backends that have been idle longer than `--watchdog-idle-timeout` | `$LOCALAI_WATCHDOG_IDLE`, `$WATCHDOG_IDLE` |
| `--watchdog-idle-timeout` | `15m` | Threshold beyond which an idle backend should be stopped | `$LOCALAI_WATCHDOG_IDLE_TIMEOUT`, `$WATCHDOG_IDLE_TIMEOUT` |
| `--enable-watchdog-busy` | `false` | Enable the watchdog that stops backends that have been busy longer than `--watchdog-busy-timeout` | `$LOCALAI_WATCHDOG_BUSY`, `$WATCHDOG_BUSY` |
| `--watchdog-busy-timeout` | `5m` | Threshold beyond which a busy backend should be stopped | `$LOCALAI_WATCHDOG_BUSY_TIMEOUT`, `$WATCHDOG_BUSY_TIMEOUT` |
| `--force-eviction-when-busy` | `false` | Force eviction even when models have active API calls (default `false` for safety). Warning: enabling this can interrupt active requests | `$LOCALAI_FORCE_EVICTION_WHEN_BUSY`, `$FORCE_EVICTION_WHEN_BUSY` |
| `--lru-eviction-max-retries` | `30` | Maximum number of retries when waiting for busy models to become idle before eviction | `$LOCALAI_LRU_EVICTION_MAX_RETRIES`, `$LRU_EVICTION_MAX_RETRIES` |
| `--lru-eviction-retry-interval` | `1s` | Interval between retries when waiting for busy models to become idle (e.g. `1s`, `2s`) | `$LOCALAI_LRU_EVICTION_RETRY_INTERVAL`, `$LRU_EVICTION_RETRY_INTERVAL` |
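As a sketch of how the eviction flags combine (the values are illustrative, not recommendations): keep at most two models loaded, and when a third is requested, retry eviction for up to two minutes while waiting for the LRU model to become idle, rather than interrupting in-flight requests:

```bash
./local-ai run \
  --max-active-backends=2 \
  --lru-eviction-max-retries=60 \
  --lru-eviction-retry-interval=2s
# add --force-eviction-when-busy only if interrupting active requests is acceptable
```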
For more information on VRAM management, see [VRAM and Memory Management]({{%relref "advanced/vram-management" %}}).
## Models Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--galleries` | | JSON list of galleries | `$LOCALAI_GALLERIES`, `$GALLERIES` |
| `--autoload-galleries` | `true` | Automatically load galleries on startup | `$LOCALAI_AUTOLOAD_GALLERIES`, `$AUTOLOAD_GALLERIES` |
| `--preload-models` | | A list of models to apply at startup, in JSON | `$LOCALAI_PRELOAD_MODELS`, `$PRELOAD_MODELS` |
| `--models` | | A list of model configuration URLs to load | `$LOCALAI_MODELS`, `$MODELS` |
| `--preload-models-config` | | A list of models to apply at startup. Path to a YAML config file | `$LOCALAI_PRELOAD_MODELS_CONFIG`, `$PRELOAD_MODELS_CONFIG` |
| `--load-to-memory` | | A list of models to load into memory at startup | `$LOCALAI_LOAD_TO_MEMORY`, `$LOAD_TO_MEMORY` |
**Note:** You can also pass model configuration URLs as positional arguments:

```bash
local-ai run MODEL_URL1 MODEL_URL2 ...
```
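As an illustration of the `--galleries` JSON shape, an entry with `name` and `url` fields pointing at the main LocalAI gallery index is shown below; treat it as an assumption rather than a guaranteed default:

```bash
./local-ai run \
  --galleries '[{"name":"localai","url":"github:mudler/LocalAI/gallery/index.yaml@master"}]'
```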
## Performance Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--f16` | `false` | Enable GPU acceleration | `$LOCALAI_F16`, `$F16` |
| `-t, --threads` | | Number of threads used for parallel computation. Using the number of physical cores in the system is suggested | `$LOCALAI_THREADS`, `$THREADS` |
| `--context-size` | | Default context size for models | `$LOCALAI_CONTEXT_SIZE`, `$CONTEXT_SIZE` |
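For example (the core count and context size below are assumptions about the host, not defaults):

```bash
# 8 physical cores assumed; adjust to your machine
./local-ai run --threads 8 --context-size 4096
```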
## API Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--address` | `:8080` | Bind address for the API server | `$LOCALAI_ADDRESS`, `$ADDRESS` |
| `--cors` | `false` | Enable CORS (Cross-Origin Resource Sharing) | `$LOCALAI_CORS`, `$CORS` |
| `--cors-allow-origins` | | Comma-separated list of allowed CORS origins | `$LOCALAI_CORS_ALLOW_ORIGINS`, `$CORS_ALLOW_ORIGINS` |
| `--csrf` | `false` | Enable Fiber CSRF middleware | `$LOCALAI_CSRF` |
| `--upload-limit` | `15` | Default upload limit in MB | `$LOCALAI_UPLOAD_LIMIT`, `$UPLOAD_LIMIT` |
| `--api-keys` | | List of API keys to enable API authentication. When this is set, all requests must be authenticated with one of these API keys | `$LOCALAI_API_KEY`, `$API_KEY` |
| `--disable-webui` | `false` | Disable the web user interface. When set to `true`, the server only exposes API endpoints, without serving the web interface | `$LOCALAI_DISABLE_WEBUI`, `$DISABLE_WEBUI` |
| `--disable-runtime-settings` | `false` | Disable the runtime settings feature. When set to `true`, the server does not load runtime settings from the `runtime_settings.json` file and the settings web interface is disabled | `$LOCALAI_DISABLE_RUNTIME_SETTINGS`, `$DISABLE_RUNTIME_SETTINGS` |
| `--disable-gallery-endpoint` | `false` | Disable the gallery endpoints | `$LOCALAI_DISABLE_GALLERY_ENDPOINT`, `$DISABLE_GALLERY_ENDPOINT` |
| `--disable-metrics-endpoint` | `false` | Disable the `/metrics` endpoint | `$LOCALAI_DISABLE_METRICS_ENDPOINT`, `$DISABLE_METRICS_ENDPOINT` |
| `--machine-tag` | | If not empty, add this string as a `Machine-Tag` header to each response. Useful for tracking responses from different machines when using multiple P2P federated nodes | `$LOCALAI_MACHINE_TAG`, `$MACHINE_TAG` |
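A sketch combining the API flags above (the key and origin values are placeholders):

```bash
./local-ai run \
  --address :9090 \
  --cors \
  --cors-allow-origins "https://example.com" \
  --api-keys "my-secret-key"
```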
## Hardening Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--disable-predownload-scan` | `false` | If true, disables the best-effort security scanner before downloading any files | `$LOCALAI_DISABLE_PREDOWNLOAD_SCAN` |
| `--opaque-errors` | `false` | If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended | `$LOCALAI_OPAQUE_ERRORS` |
| `--use-subtle-key-comparison` | `false` | If true, API key validation comparisons are performed using constant-time comparisons rather than simple equality. This trades off performance on each request for resilience against timing attacks | `$LOCALAI_SUBTLE_KEY_COMPARISON` |
| `--disable-api-key-requirement-for-http-get` | `false` | If true, a valid API key is not required to issue GET requests to portions of the web UI. This should only be enabled in secure testing environments | `$LOCALAI_DISABLE_API_KEY_REQUIREMENT_FOR_HTTP_GET` |
| `--http-get-exempted-endpoints` | `^/$,^/browse/?$,^/talk/?$,^/p2p/?$,^/chat/?$,^/text2image/?$,^/tts/?$,^/static/.*$,^/swagger.*$` | If `--disable-api-key-requirement-for-http-get` is overridden to true, this is the list of endpoints to exempt. Only adjust this in case of a security incident or as a result of a personal security posture review | `$LOCALAI_HTTP_GET_EXEMPTED_ENDPOINTS` |
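As a sketch, a more locked-down profile built from the flags above might look like this (the key is a placeholder, and whether each trade-off is appropriate depends on your deployment):

```bash
./local-ai run \
  --opaque-errors \
  --use-subtle-key-comparison \
  --api-keys "my-secret-key"
```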
## P2P Flags

| Parameter | Default | Description | Environment Variable |
|---|---|---|---|
| `--p2p` | `false` | Enable P2P mode | `$LOCALAI_P2P`, `$P2P` |
| `--p2p-dht-interval` | `360` | Interval for DHT refresh (used during token generation) | `$LOCALAI_P2P_DHT_INTERVAL`, `$P2P_DHT_INTERVAL` |
| `--p2p-otp-interval` | `9000` | Interval for OTP refresh (used during token generation) | `$LOCALAI_P2P_OTP_INTERVAL`, `$P2P_OTP_INTERVAL` |
| `--p2ptoken` | | Token for P2P mode (optional) | `$LOCALAI_P2P_TOKEN`, `$P2P_TOKEN`, `$TOKEN` |
| `--p2p-network-id` | | Network ID for P2P mode; can be set arbitrarily by the user to group a set of instances | `$LOCALAI_P2P_NETWORK_ID`, `$P2P_NETWORK_ID` |
| `--federated` | `false` | Enable federated instance | `$LOCALAI_FEDERATED`, `$FEDERATED` |
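For instance, instances sharing the same token and network ID group together (the token value below is a placeholder):

```bash
./local-ai run --p2p --p2p-network-id my-cluster --p2ptoken "<token>"
```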
## Other Commands

LocalAI supports several subcommands beyond `run`:

- `local-ai models` - Manage LocalAI models and definitions
- `local-ai backends` - Manage LocalAI backends and definitions
- `local-ai tts` - Convert text to speech
- `local-ai sound-generation` - Generate audio files from text or audio
- `local-ai transcript` - Convert audio to text
- `local-ai worker` - Run workers to distribute workload (llama.cpp-only)
- `local-ai util` - Utility commands
- `local-ai explorer` - Run P2P explorer
- `local-ai federated` - Run LocalAI in federated mode

Use `local-ai <command> --help` for more information on each command.
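Each subcommand prints its own usage; for example:

```bash
local-ai models --help
local-ai worker --help
```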
## Examples

### Basic Usage

```bash
./local-ai run
./local-ai run --models-path /path/to/models --address :9090
./local-ai run --f16
```
### Environment Variables

```bash
export LOCALAI_MODELS_PATH=/path/to/models
export LOCALAI_ADDRESS=:9090
export LOCALAI_F16=true
./local-ai run
```
### Advanced Configuration

```bash
./local-ai run \
  --models model1.yaml model2.yaml \
  --enable-watchdog-idle \
  --watchdog-idle-timeout=10m \
  --p2p \
  --federated
```
## Related Documentation
- See [Advanced Usage]({{%relref "advanced/advanced-usage" %}}) for configuration examples
- See [VRAM and Memory Management]({{%relref "advanced/vram-management" %}}) for memory management options