LocalAI/backend/cpp/turboquant
Ettore Di Giacinto 66af748332 chore(turboquant): bump to 2cbfdc62 and retire obsolete grpc-server patches
The turboquant fork rebased past ggml-org/llama.cpp#21962, #22397 and
#22838, so common_params_speculative now uses the nested draft/
ngram_simple/ngram_mod layout, server_context_impl exposes model_tgt
(not model), and get_media_marker() is provided. The compatibility
shims in patch-grpc-server.sh were rewriting the shared grpc-server.cpp
to the pre-refactor flat layout, which no longer matches the fork and
broke the build (see the CI failure on PR #9912).

Keep only the fork-specific kv_cache_types[] insertion for the
TURBO2_0 / TURBO3_0 / TURBO4_0 enum entries. The dormant
LOCALAI_LEGACY_LLAMA_CPP_SPEC #ifdef blocks in
backend/cpp/llama-cpp/grpc-server.cpp stay as an escape hatch if a
future fork bump regresses.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-21 11:03:46 +00:00