Configure watchdog and backend request settings
Configure automatic monitoring and management of backend processes
Enable automatic monitoring of backend processes
Automatically stop backends that are idle for too long
Time before an idle backend is stopped (e.g., 15m, 1h)
Automatically stop backends that are busy for too long (stuck processes)
Time before a busy backend is stopped (e.g., 5m, 30m)
How often the watchdog checks backends and memory usage (e.g., 2s, 30s)
Allow evicting models even when they have active API calls (default: disabled for safety)
Maximum number of retries when waiting for busy models to become idle (default: 30)
Interval between retries when waiting for busy models (e.g., 1s, 2s) (default: 1s)
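The timeout and retry settings above can be sketched as follows. This is a minimal illustration, not the actual watchdog implementation: the helper names (`parse_duration`, `should_stop`, `wait_until_idle`) are hypothetical, and only the `s`, `m`, and `h` suffixes shown in the examples are assumed to be supported.

```python
import re
import time

# Assumed unit suffixes, taken from the examples above (2s, 15m, 1h).
_UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_duration(text):
    """Convert a string such as '15m' or '1h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]

def should_stop(state, seconds_in_state, idle_timeout, busy_timeout):
    """Return True when a backend has been idle or busy longer than its timeout."""
    if state == "idle":
        return seconds_in_state > parse_duration(idle_timeout)
    if state == "busy":
        return seconds_in_state > parse_duration(busy_timeout)
    return False

def wait_until_idle(is_busy, max_retries=30, interval_seconds=1.0, sleep=time.sleep):
    """Poll is_busy() up to max_retries times, sleeping between attempts.

    Returns True if the model became idle, False if it stayed busy.
    """
    for _ in range(max_retries):
        if not is_busy():
            return True
        sleep(interval_seconds)
    # One last check after the final sleep.
    return not is_busy()
```

With a 15m idle timeout, `should_stop("idle", 1000, "15m", "5m")` is true because 1000 seconds exceeds 900. The `sleep` parameter is injectable only so the retry loop can be exercised without real delays.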
Automatically evict backends when memory usage exceeds a threshold. Uses GPU VRAM if available, otherwise system RAM. Backends are evicted in least-recently-used (LRU) order.
Memory monitoring unavailable
Evict backends when memory usage exceeds threshold
Memory usage threshold (50-100%). Backends are evicted when usage exceeds this value.
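Threshold-based eviction can be sketched as below. This is an illustration under stated assumptions rather than the real reclaimer: the function name `evict_until_below` is hypothetical, and each backend's memory share is assumed to be known as a percentage, which a real system would have to measure.

```python
from collections import OrderedDict
from typing import List

def evict_until_below(backends, used_percent, threshold_percent):
    # type: (OrderedDict, float, float) -> List[str]
    """Evict least recently used backends until usage is at or below the threshold.

    `backends` maps backend name -> percent of memory it holds (assumed known),
    ordered from least recently used to most recently used.
    """
    evicted = []
    while used_percent > threshold_percent and backends:
        name, share = backends.popitem(last=False)  # oldest entry first
        used_percent -= share
        evicted.append(name)
    return evicted
```

For example, with backends `a` (30%), `b` (40%), and `c` (25%) in LRU order, usage at 95%, and a threshold of 80%, only `a` is evicted, since removing it brings usage down to 65%.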
Configure how backends handle multiple requests
Maximum number of models to keep loaded at once (0 = unlimited, 1 = single backend mode). Least recently used models are evicted when limit is reached.
Enable backends to handle multiple requests in parallel (if supported)
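The loaded-model limit described above behaves like a small LRU cache. The sketch below assumes that semantics; the `LoadedModels` class is hypothetical and tracks only model names, not real backend processes.

```python
from collections import OrderedDict

class LoadedModels:
    """Tracks loaded models and evicts the least recently used at the limit."""

    def __init__(self, max_models=0):
        self.max_models = max_models  # 0 = unlimited, 1 = single backend mode
        self._models = OrderedDict()

    def use(self, name):
        """Mark a model as used, loading it if needed; return evicted names."""
        evicted = []
        if name in self._models:
            self._models.move_to_end(name)  # refresh recency
            return evicted
        if self.max_models > 0:
            while len(self._models) >= self.max_models:
                oldest, _ = self._models.popitem(last=False)
                evicted.append(oldest)
        self._models[name] = True
        return evicted

    def loaded(self):
        """Return loaded model names, least recently used first."""
        return list(self._models)
```

With a limit of 2, using `a`, `b`, `a`, and then `c` evicts `b`, because `a` was touched more recently. With a limit of 1, every new model replaces the previous one.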
Configure default performance parameters for models
Number of threads to use for model inference (0 = auto)
Default context window size for models
Use 16-bit floating point precision
Enable debug logging
Enable tracing of requests and responses
Maximum number of tracing items to keep
Configure CORS and CSRF protection
Enable Cross-Origin Resource Sharing
Comma-separated list of allowed origins
Enable Cross-Site Request Forgery protection
Configure peer-to-peer networking
Authentication token for P2P network (set to 0 to generate a new token)
Network identifier for P2P connections
Enable federated instance mode
Configure agent job retention and cleanup
Number of days to keep job history (default: 30)
Configure Open Responses API response storage
Time-to-live for stored responses (e.g., 1h, 30m, 0 = no expiration)
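The time-to-live rule can be sketched as a single check. The function `is_expired` is hypothetical, the TTL is assumed to be already converted to seconds, and treating a response as expired exactly at the TTL boundary is an assumption.

```python
def is_expired(stored_at, now, ttl_seconds):
    """Return True when a stored response has outlived its TTL.

    A TTL of 0 means the response never expires.
    """
    if ttl_seconds == 0:
        return False
    return now - stored_at >= ttl_seconds
```

With a 1h TTL (3600 seconds), a response stored at time 0 is still available at 3599 seconds and expired at 3600.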
Manage API keys for authentication. Keys from environment variables are always included.
List of API keys (one per line or comma-separated)
Note: API keys are sensitive. Handle with care.
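Parsing the key list might look like the sketch below. The function `parse_api_keys` is hypothetical. It assumes that keys may be separated by commas, newlines, or both, that blank entries are ignored, and that environment keys come first. Dropping duplicates is also an assumption.

```python
import re

def parse_api_keys(raw, env_keys=()):
    """Merge keys from the settings field with keys from the environment.

    Environment keys are always included and listed first; duplicates are dropped.
    """
    keys = list(env_keys)
    for part in re.split(r"[,\n]", raw):
        key = part.strip()  # also removes a trailing '\r' from CRLF input
        if key and key not in keys:
            keys.append(key)
    return keys
```

For example, the input `"a, b\nc"` combined with one environment key `env` yields `["env", "a", "b", "c"]`.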
Configure model and backend galleries
Automatically load model galleries on startup
Automatically load backend galleries on startup
Array of gallery objects with 'url' and 'name' fields
Array of backend gallery objects with 'url' and 'name' fields
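A gallery list with the 'url' and 'name' fields could be validated as sketched below. This assumes the array is supplied as JSON; the function `parse_galleries` is hypothetical and the URL in the usage note is a placeholder.

```python
import json

def parse_galleries(raw):
    """Parse a JSON array of gallery objects, requiring 'url' and 'name' on each."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected an array of gallery objects")
    for entry in data:
        if not isinstance(entry, dict):
            raise ValueError("each gallery must be an object")
        for field in ("url", "name"):
            if not entry.get(field):
                raise ValueError(f"gallery is missing the {field!r} field")
    return data
```

A value such as `[{"url": "https://example.com/index.yaml", "name": "example"}]` passes, while an entry without a `name` raises `ValueError`.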