chore: ⬆️ Update ggml-org/llama.cpp to 35c9b1f39ebe5a7bb83986d64415a079218be78d (#9998)

* ⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): track upstream rename checkpoint_every_nt -> checkpoint_min_step

Upstream llama.cpp renamed common_params::checkpoint_every_nt to
checkpoint_min_step and changed its default from 8192 to 256. The
semantics also shifted: it used to enforce a fixed checkpoint cadence
during prefill, now it sets a minimum spacing between context
checkpoints.

Track the new field name in grpc-server.cpp and accept the old option
names as backward-compatible aliases for users with existing configs.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-15 10:43:53 -04:00 · 2026-05-26 08:34:41 +02:00
parent e4c70fca7a
commit 4aad97971c
3 changed files with 16 additions and 10 deletions
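
The commit message contrasts a fixed checkpoint cadence with a minimum spacing between checkpoints. The sketch below is a toy model of that difference and is not upstream llama.cpp logic. The function names `fixed_cadence` and `min_spacing`, the idea of a list of candidate positions, and the rule that the first candidate is always kept are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy model only: this does NOT reproduce upstream llama.cpp behaviour.

// Old-style reading: place a checkpoint at every multiple of `every_nt`
// tokens during prefill. Non-positive values (e.g. -1) disable checkpoints.
std::vector<int> fixed_cadence(int n_tokens, int every_nt) {
    std::vector<int> positions;
    if (every_nt <= 0) {
        return positions;
    }
    for (int pos = every_nt; pos <= n_tokens; pos += every_nt) {
        positions.push_back(pos);
    }
    return positions;
}

// New-style reading: something else proposes candidate positions (ascending),
// and a candidate is kept only if it lies at least `min_step` tokens past the
// last kept checkpoint. A `min_step` of 0 disables the gate entirely.
std::vector<int> min_spacing(const std::vector<int>& candidates, int min_step) {
    std::vector<int> kept;
    for (int pos : candidates) {
        if (kept.empty() || min_step <= 0 || pos - kept.back() >= min_step) {
            kept.push_back(pos);
        }
    }
    return kept;
}
```

Under this toy model, `fixed_cadence(20000, 8192)` yields checkpoints at 8192 and 16384, while `min_spacing({100, 200, 400, 700, 900}, 256)` keeps 100, 400 and 700. That suggests why an old config value tuned as a cadence may behave differently once it is interpreted as a spacing gate.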
--- a/backend/cpp/llama-cpp/grpc-server.cpp
+++ b/backend/cpp/llama-cpp/grpc-server.cpp
@@ -570,9 +570,11 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
    // kv_unified=false or cache_ram_mib=0, so flipping kv_unified above is
    // what actually unlocks it.
    params.cache_idle_slots = true;
-    // checkpoint_every_nt: create a context checkpoint every N tokens during
-    // prefill (-1 disables). Match upstream's default (8192).
-    params.checkpoint_every_nt = 8192;
+    // checkpoint_min_step: minimum spacing between context checkpoints in
+    // tokens (0 disables the minimum). Match upstream's default (256). This
+    // field was renamed from `checkpoint_every_nt` in llama.cpp; the semantics
+    // also shifted from a fixed cadence to a minimum spacing.
+    params.checkpoint_min_step = 256;

     // decode options. Options are in form optname:optvale, or if booleans only optname.
    for (int i = 0; i < request->options_size(); i++) {
@@ -746,14 +748,18 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
                params.cache_idle_slots = false;
            }

-        // --- prefill checkpoint cadence (upstream -cpent / --checkpoint-every-n-tokens) ---
-        // -1 disables checkpointing during prefill.
-        } else if (!strcmp(optname, "checkpoint_every_nt") || !strcmp(optname, "checkpoint_every_n_tokens")) {
+        // --- minimum context-checkpoint spacing (upstream -cms / --checkpoint-min-step) ---
+        // 0 disables the minimum-spacing gate. Old option names (`checkpoint_every_nt`,
+        // `checkpoint_every_n_tokens`) are kept as aliases for backward compatibility
+        // with existing user configs: upstream renamed the field and shifted its
+        // semantics from a fixed cadence to a minimum spacing.
+        } else if (!strcmp(optname, "checkpoint_min_step") || !strcmp(optname, "checkpoint_min_spacing") ||
+                   !strcmp(optname, "checkpoint_every_nt") || !strcmp(optname, "checkpoint_every_n_tokens")) {
            if (optval != NULL) {
                try {
-                    params.checkpoint_every_nt = std::stoi(optval_str);
+                    params.checkpoint_min_step = std::stoi(optval_str);
                } catch (const std::exception& e) {
-                    // If conversion fails, keep default value (8192)
+                    // If conversion fails, keep default value (256)
                }
            }
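
The hunk above matches four option names and falls back to the default when `std::stoi` throws. A minimal standalone sketch of that pattern follows. The helper names `is_checkpoint_min_step_option` and `parse_or_default` are hypothetical and do not appear in grpc-server.cpp; the four option strings are the ones from the diff.

```cpp
#include <cassert>
#include <cstring>
#include <exception>
#include <string>

// Hypothetical helper: true if `optname` is the current option name or one
// of the aliases accepted by the diff above.
bool is_checkpoint_min_step_option(const char* optname) {
    static const char* const names[] = {
        "checkpoint_min_step",
        "checkpoint_min_spacing",
        "checkpoint_every_nt",        // legacy alias
        "checkpoint_every_n_tokens",  // legacy alias
    };
    for (const char* name : names) {
        if (std::strcmp(optname, name) == 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: parse an integer option value, returning `fallback`
// when the value is missing, non-numeric, or out of range for int.
int parse_or_default(const char* optval, int fallback) {
    if (optval == nullptr) {
        return fallback;
    }
    try {
        return std::stoi(std::string(optval));
    } catch (const std::exception&) {
        // std::stoi throws std::invalid_argument or std::out_of_range.
        return fallback;
    }
}
```

For example, `parse_or_default("512", 256)` returns 512 and `parse_or_default("abc", 256)` returns 256. Note that `std::stoi` accepts a numeric prefix, so a value such as `"12abc"` parses as 12 rather than falling back. Also, because the legacy aliases map straight onto the new field, a legacy value of `-1` (which used to disable checkpointing) is stored unchanged, and how upstream treats a negative minimum step is not stated in this commit.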