Files
Daniel Hiltgen 630882621b llama-server followups (#16353)
* llama-server followups

Misc fixes for #16031
- Add back dropped ROCm build flag for multi-GPU support on windows
- Fix amdhip64_*.dll version detection for "latest" selection
- Fix embeddings API for consistent normalize behavior with prior versions

* ci: set up for automated llama.cpp update testing

* reduce batch for fa-disabled, and constrained vram

* mlx: fix v3 load bug on m5

Imagegen was incorrectly loading v3 first.  This DRYs out the loading code so imagegen gets the same new v4/v3 selection logic.

* fix reload bug on embedding models

* bump version

* steer user how to enable iGPU when disabled
2026-06-01 10:44:21 -07:00
..