LocalAI

Mirror of https://github.com/mudler/LocalAI.git, last synced 2026-08-01 02:49:51 -04:00

Files

Ettore Di Giacinto 66af748332 chore(turboquant): bump to 2cbfdc62 and retire obsolete grpc-server patches

The turboquant fork rebased past ggml-org/llama.cpp#21962, #22397 and
#22838, so common_params_speculative now uses the nested draft/
ngram_simple/ngram_mod layout, server_context_impl exposes model_tgt
(not model), and get_media_marker() is provided. The compatibility
shims in patch-grpc-server.sh were rewriting the shared grpc-server.cpp
to the pre-refactor flat layout, which no longer matches the fork and
broke the build (see PR #9912 CI failure).

Keep only the fork-specific kv_cache_types[] insertion for the
TURBO2_0 / TURBO3_0 / TURBO4_0 enum entries. The dormant
LOCALAI_LEGACY_LLAMA_CPP_SPEC #ifdef blocks in
backend/cpp/llama-cpp/grpc-server.cpp stay as an escape hatch if a
future fork bump regresses.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-05-21 11:03:46 +00:00

apply-patches.sh

feat(backend): add turboquant llama.cpp-fork backend (#9355)

2026-04-15 01:25:04 +02:00

Makefile

chore(turboquant): bump to 2cbfdc62 and retire obsolete grpc-server patches

2026-05-21 11:03:46 +00:00

package.sh

feat(backend): add turboquant llama.cpp-fork backend (#9355)

2026-04-15 01:25:04 +02:00

patch-grpc-server.sh

chore(turboquant): bump to 2cbfdc62 and retire obsolete grpc-server patches

2026-05-21 11:03:46 +00:00

run.sh

feat(backend): add turboquant llama.cpp-fork backend (#9355)

2026-04-15 01:25:04 +02:00