LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-20 14:49:09 -04:00


Ettore Di Giacinto bca250e2bd feat(config): node-aware hardware defaults — larger physical batch on Blackwell

A larger physical batch (n_batch/n_ubatch) materially lifts MoE prefill
throughput on NVIDIA Blackwell consumer GPUs (sm_120/121, incl. GB10 / DGX
Spark). Measured on a GB10 with Qwen3-Coder-30B-A3B, the prefill ceiling rises
from ~2994 t/s at ub512 to ~3316 t/s at ub2048 and saturates around 2048.

The heuristic lives in core/config alongside the other config overriders
(ApplyInferenceDefaults, guessDefaultsFromFile/NGPULayers) — they all fill the
ModelConfig from heuristics, so hardware tuning is the same domain and stays in
one place. It is parameterized on a GPU descriptor (not direct detection) so it
works in both deployment shapes:

- Single host: SetDefaults applies it with the LocalGPU.
- Distributed: only the worker sees the GPU, so the worker reports its compute
  capability on registration (gpu_compute_capability -> BackendNode), and the
  router re-applies the SAME core/config heuristic for the SELECTED node before
  loading — fixing the case where the frontend has no GPU at all.
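
As a rough illustration of the shape described above, the sketch below applies one heuristic to a GPU descriptor that is passed in rather than detected. All names here (`GPUDescriptor`, `ModelConfig`, `applyHardwareDefaults`) are hypothetical stand-ins, not the identifiers used in core/config, and treating `Batch == 0` as "not set by the user" is an assumption made for the example.

```go
package main

import "fmt"

// GPUDescriptor is a hypothetical, detection-free description of a GPU.
// The caller fills it from the local GPU or from a node's registration data.
type GPUDescriptor struct {
	ComputeMajor int
	ComputeMinor int
}

// ModelConfig is a stand-in for the real model config.
// Assumption for this sketch: Batch == 0 means the user did not set it.
type ModelConfig struct {
	Batch int
}

// blackwellBatch is the value at which the commit message reports saturation.
const blackwellBatch = 2048

// isBlackwellConsumer reports whether the descriptor is sm_120 or sm_121.
func isBlackwellConsumer(g GPUDescriptor) bool {
	return g.ComputeMajor == 12 && (g.ComputeMinor == 0 || g.ComputeMinor == 1)
}

// applyHardwareDefaults raises the batch size on Blackwell consumer GPUs.
// It never overrides an explicit value and does nothing without a GPU.
func applyHardwareDefaults(cfg *ModelConfig, gpu *GPUDescriptor) {
	if cfg.Batch != 0 {
		return // explicit value wins
	}
	if gpu != nil && isBlackwellConsumer(*gpu) {
		cfg.Batch = blackwellBatch
	}
}

func main() {
	gb10 := &GPUDescriptor{ComputeMajor: 12, ComputeMinor: 1}

	// Single host: the local GPU descriptor is applied directly.
	local := ModelConfig{}
	applyHardwareDefaults(&local, gb10)
	fmt.Println("single host:", local.Batch) // single host: 2048

	// Distributed: the frontend has no GPU, so its pass changes nothing.
	routed := ModelConfig{}
	applyHardwareDefaults(&routed, nil)
	fmt.Println("frontend without GPU:", routed.Batch) // frontend without GPU: 0

	// The router then re-applies the same function for the selected node.
	applyHardwareDefaults(&routed, gb10)
	fmt.Println("selected node:", routed.Batch) // selected node: 2048

	// An explicit batch is left untouched.
	explicit := ModelConfig{Batch: 512}
	applyHardwareDefaults(&explicit, gb10)
	fmt.Println("explicit:", explicit.Batch) // explicit: 512
}
```

Because the function takes the descriptor as an argument, the single-host path and the router path can share it without either one needing direct access to the GPU.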

Explicit `batch:` always wins (only managed default values are touched).
xsysinfo gains NVIDIAComputeCapability() (detection only); all interpretation
lives in core/config. Tests: core/config, pkg/xsysinfo, core/services/nodes.
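
The split between detection and interpretation could look roughly like the sketch below: the detection side only hands over a raw value, and a separate step turns it into something the heuristic can reason about. The "major.minor" string format and the helper names `parseComputeCapability` and `smName` are assumptions for illustration; the commit message does not say how NVIDIAComputeCapability() or gpu_compute_capability encode the value.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseComputeCapability splits a "major.minor" string such as "12.1".
// The string format is an assumption made for this sketch.
func parseComputeCapability(s string) (major, minor int, err error) {
	majStr, minStr, ok := strings.Cut(strings.TrimSpace(s), ".")
	if !ok {
		return 0, 0, fmt.Errorf("compute capability %q: want major.minor", s)
	}
	if major, err = strconv.Atoi(majStr); err != nil {
		return 0, 0, fmt.Errorf("compute capability %q: bad major: %w", s, err)
	}
	if minor, err = strconv.Atoi(minStr); err != nil {
		return 0, 0, fmt.Errorf("compute capability %q: bad minor: %w", s, err)
	}
	return major, minor, nil
}

// smName renders the pair in the sm_XY form used in the commit message.
func smName(major, minor int) string {
	return fmt.Sprintf("sm_%d%d", major, minor)
}

func main() {
	// The raw value as a worker might report it on registration.
	major, minor, err := parseComputeCapability("12.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(smName(major, minor)) // sm_121
}
```

Keeping the parser free of any tuning decision means a malformed or missing value simply yields an error, and the caller can fall back to the unmodified defaults.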

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-19 22:02:14 +00:00

agent_collections.go

fix(agents): handle embedding model dim changes on collection upload (#9365)

2026-04-15 20:05:28 +02:00

agent_jobs.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

agent_responses.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

agent_skills.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

agents_isolation_test.go

chore: Security hardening (#9719)

2026-05-08 16:25:45 +02:00

agents.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

api_instructions_test.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

api_instructions.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

audio_transform.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

audio.go

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

backend_logs_test.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

backend_logs.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

backend_monitor.go

fix(backend-monitor): accept model as a query parameter (#9411)

2026-04-21 22:06:35 +02:00

backend_test.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

backend.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

branding_endpoints_test.go

test: add Go + React UI coverage gates and fill test gaps (#9989)

2026-05-26 22:06:10 +02:00

branding_test.go

chore: Security hardening (#9719)

2026-05-08 16:25:45 +02:00

branding.go

chore: Security hardening (#9719)

2026-05-08 16:25:45 +02:00

config_meta_test.go

feat(ui): Interactive model config editor with autocomplete (#9149)

2026-04-07 14:42:23 +02:00

config_meta.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

cors_proxy_test.go

chore: Security hardening (#9719)

2026-05-08 16:25:45 +02:00

cors_proxy.go

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

depth.go

feat(backend): add depth-anything (Depth Anything 3) C++/ggml backend + gallery (#10352)

2026-06-16 16:28:28 +02:00

detection.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

edit_model_test.go

fix(ui): rename model config files on save to prevent duplicates (#9388)

2026-04-17 08:12:48 +02:00

edit_model.go

feat: localai assistant chat modality (#9602)

2026-04-28 19:29:27 +02:00

face_analyze.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

face_embed.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

face_forget.go

feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)

2026-04-22 21:55:41 +02:00

face_identify.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

face_register.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

face_verify.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

finetune.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

gallery.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

get_token_metrics.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

images.go

feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)

2026-04-22 21:55:41 +02:00

import_model_test.go

feat(importer): expand importer flow to almost all backends (#9466)

2026-04-22 22:42:37 +02:00

import_model.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

localai_suite_test.go

feat(webui): add import/edit model page (#6050)

2025-08-14 23:48:09 +02:00

mcp_prompts.go

feat(ui): MCP Apps, mcp streaming and client-side support (#8947)

2026-03-11 07:30:49 +01:00

mcp_resources.go

feat(ui): MCP Apps, mcp streaming and client-side support (#8947)

2026-03-11 07:30:49 +01:00

mcp_tools.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

mcp.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

metrics.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

nodes_install_async_test.go

fix(nodes): make per-node backend install async via gallery job queue (#9928)

2026-05-21 22:25:53 +02:00

nodes_scheduling_test.go

feat(distributed): declarative per-model scheduling via env/args (#10308)

2026-06-13 18:31:06 +02:00

nodes_scheduling_validation_test.go

feat: prefix-cache-aware routing for distributed mode (#10071)

2026-05-30 23:24:22 +02:00

nodes_test.go

feat(distributed): Add NATS JWT authentication and TLS/mTLS options (#10159)

2026-06-03 19:43:56 +02:00

nodes.go

feat(config): node-aware hardware defaults — larger physical batch on Blackwell

2026-06-19 22:02:14 +00:00

p2p.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

pii_test.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

pii.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

pin_model.go

feat: localai assistant chat modality (#9602)

2026-04-28 19:29:27 +02:00

quantization.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

router_decide_test.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

router_decide.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

score.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

settings_test.go

fix(settings): start watchdog on cold-enable from the React UI (#9125) (#10287)

2026-06-14 16:46:14 +02:00

settings.go

fix(settings): start watchdog on cold-enable from the React UI (#9125) (#10287)

2026-06-14 16:46:14 +02:00

stores.go

feat(loader): enhance single active backend to support LRU eviction (#7535)

2025-12-12 12:28:38 +01:00

system.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

toggle_model.go

feat: localai assistant chat modality (#9602)

2026-04-28 19:29:27 +02:00

tokenize.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

traces_test.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

traces.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

tts.go

feat(tts): support per-request instructions and params (#10172)

2026-06-04 11:45:02 +02:00

types.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

vad.go

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

video.go

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

voice_analyze.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

voice_embed.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

voice_forget.go

feat: voice recognition (#9500)

2026-04-23 12:07:14 +02:00

voice_identify.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

voice_register.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

voice_verify.go

feat(whisper): honor client cancellation via ggml abort_callback (#9710)

2026-05-08 01:44:47 +02:00

vram_test.go

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

vram.go

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

welcome.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00