LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-30 09:57:57 -04:00

Files

Ettore Di Giacinto b6fed26271 chore(turboquant): retreat pin to 4c1c3ac0 to skip fork GPU regression

CI on the prior 2cbfdc62 pin confirmed our grpc-server.cpp/patch fix
works (tests-turboquant-grpc + all multiarch turboquant builds passed),
but every GPU singlearch turboquant build now hits a static-assertion
error in the fork's own ggml/src/ggml-cuda/fattn-mma-f16.cuh — a
regression introduced by the May 14 #22880 `HIP: RDNA3 mma FA` refactor
(file went from 1855 to 2049 lines).

4c1c3ac0 (2026-05-13 22:12 UTC) is the last commit before that refactor
and still has every API piece grpc-server.cpp depends on (DRAFT_SIMPLE
enum, nested common_params_speculative, model_tgt, get_media_marker(),
common_speculative_types_from_names). MTP support landed later (May 16)
and is not exercised by grpc-server.cpp.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-05-21 15:54:38 +00:00

ds4

chore: ⬆️ Update antirez/ds4 to 2606543be7a8c125a32cee37f5d1d85dc78f2fcf (#9909)

2026-05-20 21:22:26 +00:00

grpc

fix: speedup git submodule update with --single-branch (#2847)

2024-07-13 22:32:25 +02:00

ik-llama-cpp

chore: ⬆️ Update ikawrakow/ik_llama.cpp to 11a1fea9e291f12ce2c803a9d7812c30ca806bcf (#9914)

2026-05-20 22:04:06 +00:00

llama-cpp

fix(llama-cpp): terminate tensor_buft_overrides with sentinel (#9919)

2026-05-21 12:55:06 +02:00

turboquant

chore(turboquant): retreat pin to 4c1c3ac0 to skip fork GPU regression

2026-05-21 15:54:38 +00:00