LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-30 11:36:31 -04:00

Author	SHA1	Message	Date
Richard Palethorpe	90ea327178	fix(intel): VRAM detection (#9944) * fix(gpu-detect): clinfo --json fallback for Intel discrete VRAM ghw returns 0 VRAM for any i915-driven Intel GPU because the kernel driver doesn't expose VRAM through the sysfs paths ghw checks (no mem_info_vram_total — that's an amdgpu interface). xpu-smi, the canonical Intel tool, isn't in the oneAPI base image (it lives in a separate xpumanager package). The capability gate added in `19c92c70` ("default to CPU if there is less than 4GB of GPU available") then demotes the host to CPU even on a 16 GB Arc A770. clinfo ships with the OpenCL ICD loader and is present in the oneAPI base image, so plug it in as the last-resort Intel VRAM source: `xpu-smi -> intel_gpu_top -> clinfo --json`. The parser drops UMA devices via HOST_UNIFIED_MEMORY=true so an iGPU sibling can't double-count system RAM, and dedups by PCI BDF when multiple ICDs enumerate the same physical device (POCL caps reported GLOBAL_MEM_SIZE at 4 GiB; the largest non-capped value wins). Subprocess is wrapped in a 2s timeout and memoised with sync.OnceValue — GPU hardware is static for the process lifetime. The Intel branch also short-circuits when ghw saw no Intel vendor, so NVIDIA-only hosts don't pay the spawn cost. Verified end-to-end on Intel Arc A770: ghw -> 0, clinfo path reports 16,225,243,136 bytes (15.11 GiB), capability gate now passes naturally without LOCALAI_FORCE_META_BACKEND_CAPABILITY=intel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(gpu-detect): live VRAM usage from DRM fdinfo The clinfo fallback reports total VRAM correctly but leaves UsedVRAM at 0 because OpenCL has no portable live-memory property — the UI ends up showing 0% utilisation even when llama-cpp is actually holding gigabytes in device memory. Fill that gap with the standardised Linux DRM fdinfo interface (Documentation/gpu/drm-usage-stats.rst, kernel ≥5.19). 
Walking /proc/<pid>/fdinfo for any fd that points at /dev/dri/render* yields drm-total-<region> / drm-resident-<region> keys; aggregate per render-node, resolve the render node to a PCI BDF via /sys/class/drm/<name>/device, and merge the result into the matching GPUMemoryInfo by BDF. Region naming is driver-defined — i915 uses "local0" for device-local VRAM, amdgpu and xe use "vram0" — so a prefix-match on local/vram covers all three DRM drivers that LocalAI cares about. system/gtt/stolen regions are deliberately excluded since they're host RAM mirrors and would double-count against system RAM. GPUMemoryInfo gains an optional BDF field (`bdf,omitempty` in JSON) so future vendor-specific detectors can plug into the same matcher. Empty BDF skips the merge — non-PCI devices and detection paths that don't surface PCI location keep their existing behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-25 09:29:00 +02:00
Andreas Egli	a2940e5d47	feat: also parse VRAM budget/usage from vulkaninfo (#9800) Signed-off-by: Andreas Egli <github@kharan.ch>	2026-05-13 21:43:12 +02:00
Andreas Egli	03815e3b59	fix: parse vulkan VRAM from text (#9669) * fix: parse vulkan VRAM from text Assisted-by: opencode:gpt-5.5 Signed-off-by: Andreas Egli <github@kharan.ch> * fix: replace string.split with streaming iteration Assisted-by: Opencode:Gemma4 Signed-off-by: Andreas Egli <github@kharan.ch> --------- Signed-off-by: Andreas Egli <github@kharan.ch>	2026-05-12 09:53:48 +02:00
Ettore Di Giacinto	551ebdb57a	fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545) Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor, Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend, so the Nodes UI showed the node as fully used even when most of the unified memory was actually free. Three causes addressed: * `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark (SBSA) reports JEDEC codes there instead — `jep106:0426` for the NVIDIA manufacturer — so the Tegra/unified-memory fallback never ran. Renamed to `isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:]` via `/sys/devices/soc0/soc_id`. * The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when `/proc/device-tree/model` was missing. That's what happens for Thor inside a docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName` resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup (`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box correctly. * Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM usage was momentarily unknown — e.g. when `nvidia-smi` intermittently failed with `waitid: no child processes` under containers without `--init`. Each such heartbeat overwrote the DB and made the UI flip to "fully used". `heartbeatBody` now omits `available_vram` in that case so the DB keeps its last good value. Also updates the commented GPU blocks in both compose files with `NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`, and `init: true`, and documents the requirement in the distributed-mode and nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the container, which is what put the DGX Spark worker into the buggy fallback in the first place. 
Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor) by running a cross-compiled probe of the new helpers on both host and inside the worker container. Assisted-by: Claude:opus-4.7 [Claude Code]	2026-04-24 22:02:23 +02:00
Ettore Di Giacinto	505c417fa7	fix(gpu): better detection for MacOS and Thor (#9263) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-07 00:39:07 +02:00
Ettore Di Giacinto	f259036a27	feat(gpu): add jetson/tegra detection Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 15:45:07 +00:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124 ) * feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di 
Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00
Ettore Di Giacinto	800f749c7b	fix: drop gguf VRAM estimation (now redundant) (#8325) fix: drop gguf VRAM estimation Cleanup. This is now handled directly in llama.cpp, so there is no need to estimate from Go. VRAM estimation in general is tricky, but llama.cpp (`41ea26144e/src/llama.cpp (L168)`) has recently added automatic "fitting" of models to VRAM, so we can drop backend-specific GGUF VRAM estimation from our code instead of trying to guess, as we already enable it (`397f7f0862/backend/cpp/llama-cpp/grpc-server.cpp (L393)`). Fixes: https://github.com/mudler/LocalAI/issues/8302 See: https://github.com/mudler/LocalAI/issues/8302#issuecomment-3830773472	2026-02-01 17:33:28 +01:00
Ettore Di Giacinto	f5fade97e6	chore: drop noisy logs (#8142) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-21 09:52:20 +01:00
Ettore Di Giacinto	34e054f607	fix(reasoning): support models with reasoning without starting thinking tag (#8132) * chore: extract reasoning to its own package Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make sure we detect thinking tokens from template Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Allow to override via config, add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-20 21:07:59 +01:00
Ettore Di Giacinto	ffb2dc4666	chore(detection): detect GPU vendor from files present in the system (#7908) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-07 16:18:27 +01:00
Ettore Di Giacinto	185a685211	fix(amd-gpu): correctly show total and used vram (#7761) An example output of `rocm-smi --showproductname --showmeminfo vram --showuniqueid --csv`:
```
device,Unique ID,VRAM Total Memory (B),VRAM Total Used Memory (B),Card Series,Card Model,Card Vendor,Card SKU,Subsystem ID,Device Rev,Node ID,GUID,GFX Version
card0,0x9246____________,17163091968,692142080,Navi 21 [Radeon RX 6800/6800 XT / 6900 XT],0x73bf,Advanced Micro Devices Inc. [AMD/ATI],001,0x2406,0xc1,1,45534,gfx1030
card1,N/A,67108864,26079232,Raphael,0x164e,Advanced Micro Devices Inc. [AMD/ATI],RAPHAEL,0x364e,0xc6,2,52156,gfx1036
```
Total memory is actually shown before total used memory, as can be seen in https://github.com/LostRuins/koboldcpp/issues/1104#issuecomment-2321143507. This PR fixes https://github.com/mudler/LocalAI/issues/7724 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-29 07:57:07 +01:00
Ettore Di Giacinto	c37785b78c	chore(refactor): move logging to common package based on slog (#7668) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-21 19:33:13 +01:00
Ettore Di Giacinto	3ca90876f1	chore(memory detection): do not use go-sigar as requires CGO on darwin (#7618) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 23:10:42 +01:00
Ettore Di Giacinto	e3e5f59965	fix(ram): do not read from cgroup (#7606) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 13:28:11 +01:00
Ettore Di Giacinto	878c9d46d5	fix: improve ram estimation (#7603) * fix: default to 10 seconds of watchdog if runtime setting is malformed Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: use gosigar for RAM estimation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 10:18:36 +01:00
Ettore Di Giacinto	50f9c9a058	feat(watchdog): add Memory resource reclaimer (#7583) * feat(watchdog): add GPU reclaimer Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Handle vram calculation for unified memory devices Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Support RAM eviction, set watchdog interval from runtime settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 09:15:18 +01:00
Ettore Di Giacinto	b034cff149	feat: improve RAM estimation by using values from summary (#5525) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-06-05 19:16:26 +02:00
Ettore Di Giacinto	159388cce8	chore: memoize detected GPUs (#5385) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-18 08:55:44 +02:00
Ettore Di Giacinto	72111c597d	fix(gpu): do not assume gpu being returned has node and mem (#5310) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-03 19:00:24 +02:00
Ettore Di Giacinto	5c6cd50ed6	feat(llama.cpp): estimate vram usage (#5299) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-05-02 17:40:26 +02:00
Ettore Di Giacinto	9628860c0e	feat(llama.cpp/clip): inject gpu options if we detect GPUs (#5243) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-26 00:04:47 +02:00
Ettore Di Giacinto	bdd6769b2d	feat(default): use number of physical cores as default (#2483) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2024-06-04 15:23:29 +02:00
Ettore Di Giacinto	b69ff46c7e	feat(startup): show CPU/GPU information with --debug (#2241) Signed-off-by: mudler <mudler@localai.io>	2024-05-05 09:10:23 +02:00

24 Commits
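A minimal Go sketch of the clinfo filtering rule described in 90ea327178: drop devices that report host-unified memory, then keep one entry per PCI BDF, taking the largest reported size so that a capped ICD (the message cites POCL at 4 GiB) cannot shrink a discrete card. The `clDevice` type and its fields are hypothetical stand-ins for an already-decoded `clinfo --json` entry; clinfo's real JSON layout is not reproduced here, and skipping devices that have no BDF is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// clDevice is a hypothetical, already-decoded OpenCL device entry.
// Field names are illustrative assumptions, not clinfo's JSON keys.
type clDevice struct {
	Name              string
	BDF               string // PCI domain:bus:device.function, e.g. "0000:03:00.0"
	GlobalMemSize     uint64 // bytes reported by this ICD
	HostUnifiedMemory bool   // true for UMA devices such as iGPUs
}

// discreteVRAM returns one VRAM total per physical device, keyed by BDF.
// UMA devices are dropped so they cannot double-count system RAM, and
// duplicate entries from several ICDs collapse to the largest reported size.
func discreteVRAM(devs []clDevice) map[string]uint64 {
	best := make(map[string]uint64)
	for _, d := range devs {
		if d.HostUnifiedMemory || d.BDF == "" {
			continue // assumption: entries without a BDF cannot be deduplicated, so skip them
		}
		if d.GlobalMemSize > best[d.BDF] {
			best[d.BDF] = d.GlobalMemSize
		}
	}
	return best
}

func main() {
	devs := []clDevice{
		{Name: "Arc A770 via vendor ICD", BDF: "0000:03:00.0", GlobalMemSize: 16225243136},
		{Name: "Arc A770 via capped ICD", BDF: "0000:03:00.0", GlobalMemSize: 4 << 30},
		{Name: "integrated GPU", BDF: "0000:00:02.0", GlobalMemSize: 8 << 30, HostUnifiedMemory: true},
	}
	vram := discreteVRAM(devs)
	bdfs := make([]string, 0, len(vram))
	for bdf := range vram {
		bdfs = append(bdfs, bdf)
	}
	sort.Strings(bdfs) // map iteration order is random; sort for stable output
	for _, bdf := range bdfs {
		fmt.Printf("%s: %d bytes\n", bdf, vram[bdf])
	}
}
```

Keying on the BDF rather than the device name is what lets two ICDs that describe the same card under different names collapse into one entry.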
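Commit 90ea327178 also wraps the clinfo subprocess in a 2s timeout and memoises it with `sync.OnceValue` (available since Go 1.21). The sketch below shows that pattern only; `probeResult`, `clinfoJSON`, and the `probeCalls` counter are hypothetical names, and the JSON parsing step is omitted. On a host without clinfo the first call returns an error, and that error is cached rather than retried.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"sync"
	"time"
)

// probeResult bundles the raw output of one probe run with its error.
type probeResult struct {
	Output []byte
	Err    error
}

// probeCalls counts how many times the probe body actually executed.
var probeCalls int

// clinfoJSON runs `clinfo --json` at most once per process and caches the
// result, on the premise that GPU hardware does not change while running.
// The 2s timeout stops a hung OpenCL ICD from blocking detection.
var clinfoJSON = sync.OnceValue(func() probeResult {
	probeCalls++
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	out, err := exec.CommandContext(ctx, "clinfo", "--json").Output()
	return probeResult{Output: out, Err: err}
})

func main() {
	first := clinfoJSON()
	second := clinfoJSON() // served from the cache, no second subprocess
	fmt.Printf("probe ran %d time(s); %d bytes of output; err=%v\n",
		probeCalls, len(first.Output), first.Err)
	fmt.Println("same cached result:", first.Err == second.Err && len(first.Output) == len(second.Output))
}
```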
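A sketch of the fdinfo side of the second sub-commit in 90ea327178: sum the device-local `drm-resident-<region>` values from one fdinfo file, keeping only regions whose name starts with `local` or `vram`, so that system, GTT, and stolen regions are ignored. Assumptions: the caller has already found an fd pointing at `/dev/dri/render*`; the sketch reads `drm-resident-*` rather than `drm-total-*`; it accepts the `KiB`/`MiB` unit suffixes (plus `GiB` defensively); and it does not deduplicate a client that appears on several fds. The function name and the sample values are hypothetical, not captured from real hardware.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// residentDeviceLocal sums drm-resident-<region> values for device-local
// regions ("local*" on i915, "vram*" on amdgpu and xe) in one fdinfo text.
// Regions such as system0, gtt, or stolen-system0 are skipped because they
// are backed by host RAM.
func residentDeviceLocal(fdinfo string) (uint64, error) {
	var total uint64
	sc := bufio.NewScanner(strings.NewReader(fdinfo))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		region, ok := strings.CutPrefix(strings.TrimSpace(key), "drm-resident-")
		if !ok || !(strings.HasPrefix(region, "local") || strings.HasPrefix(region, "vram")) {
			continue
		}
		fields := strings.Fields(val)
		if len(fields) == 0 {
			continue
		}
		n, err := strconv.ParseUint(fields[0], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("%s: %w", key, err)
		}
		if len(fields) > 1 {
			switch fields[1] {
			case "KiB":
				n <<= 10
			case "MiB":
				n <<= 20
			case "GiB":
				n <<= 30
			default:
				return 0, fmt.Errorf("%s: unknown unit %q", key, fields[1])
			}
		}
		total += n
	}
	return total, sc.Err()
}

func main() {
	// Illustrative fdinfo contents; values are made up.
	sample := "pos:\t0\n" +
		"drm-driver:\ti915\n" +
		"drm-total-system0:\t12 MiB\n" +
		"drm-resident-system0:\t12 MiB\n" +
		"drm-resident-stolen-system0:\t0\n" +
		"drm-total-local0:\t3072 MiB\n" +
		"drm-resident-local0:\t3000 MiB\n"
	n, err := residentDeviceLocal(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("device-local resident: %d bytes\n", n)
}
```

A prefix match keeps the filter driver-agnostic: `local0`, `local1`, `vram0`, and so on all count, while anything else is treated as host memory.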
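Commit 551ebdb57a describes two lookups: treating a host as an NVIDIA integrated GPU when `/sys/devices/soc0/family` is `Tegra` or `soc_id` is `jep106:0426` (optionally followed by `:`), and resolving a display name through the device-tree model, then `soc0/machine`, then a `soc_id` table. The sketch below follows that order with pure functions so it can run anywhere. `looksLikeNVIDIAIntegrated`, `integratedGPUName`, `readTrimmed`, and the generic final fallback label are hypothetical; they are not LocalAI's `isNVIDIAIntegratedGPU` or `nvidiaIntegratedGPUName`. The only table entry is the GB10 mapping quoted in the commit.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// looksLikeNVIDIAIntegrated reports whether soc0 data identifies an NVIDIA
// integrated GPU: the Tegra family string, or NVIDIA's JEP106 code
// (jep106:0426) as the whole soc_id or followed by ':'.
func looksLikeNVIDIAIntegrated(family, socID string) bool {
	if strings.TrimSpace(family) == "Tegra" {
		return true
	}
	id := strings.TrimSpace(socID)
	return id == "jep106:0426" || strings.HasPrefix(id, "jep106:0426:")
}

// knownSoCNames maps soc_id to a display name; only the GB10 entry is
// taken from the commit message.
var knownSoCNames = map[string]string{"jep106:0426:8901": "NVIDIA GB10"}

// integratedGPUName tries the device-tree model, then soc0/machine, then
// the soc_id table. The generic final fallback is an assumption.
func integratedGPUName(dtModel, machine, socID string) string {
	// Device-tree strings are usually NUL-terminated.
	if m := strings.TrimSpace(strings.TrimRight(dtModel, "\x00")); m != "" {
		return m
	}
	if m := strings.TrimSpace(machine); m != "" {
		return m
	}
	if name, ok := knownSoCNames[strings.TrimSpace(socID)]; ok {
		return name
	}
	return "NVIDIA integrated GPU"
}

// readTrimmed returns a file's contents, or "" if it cannot be read.
func readTrimmed(path string) string {
	b, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(b))
}

func main() {
	family := readTrimmed("/sys/devices/soc0/family")
	socID := readTrimmed("/sys/devices/soc0/soc_id")
	if !looksLikeNVIDIAIntegrated(family, socID) {
		fmt.Println("no NVIDIA integrated GPU detected")
		return
	}
	fmt.Println(integratedGPUName(
		readTrimmed("/proc/device-tree/model"),
		readTrimmed("/sys/devices/soc0/machine"),
		socID,
	))
}
```

Matching `jep106:0426` exactly or with a trailing `:` avoids false positives on longer codes that merely start with the same digits.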