The turboquant fork rebased past ggml-org/llama.cpp#21962, #22397 and #22838, so common_params_speculative now uses the nested draft/ngram_simple/ngram_mod layout, server_context_impl exposes model_tgt (not model), and get_media_marker() is provided.

The compatibility shims in patch-grpc-server.sh were rewriting the shared grpc-server.cpp to the pre-refactor flat layout, which no longer matches the fork and broke the build (see PR #9912 CI failure).

Keep only the fork-specific kv_cache_types[] insertion for the TURBO2_0 / TURBO3_0 / TURBO4_0 enum entries. The dormant LOCALAI_LEGACY_LLAMA_CPP_SPEC #ifdef blocks in backend/cpp/llama-cpp/grpc-server.cpp stay as an escape hatch if a future fork bump regresses.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>