chore: ⬆️ Update ggml-org/llama.cpp to d6588daa800058dfa54f1d7ea695b1a810c8ae18 (#10093)

* ⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(llama-cpp): skip begin-of-stream null partial in PredictStream

Upstream llama.cpp (ggml-org/llama.cpp#23884), pulled in by this bump,
now emits an initial "begin" partial whose to_json() returns null. It
exists only to signal the HTTP layer to flush 200 status headers before
any token is produced.

gRPC has no such concept, and PredictStream had no guard: the null result
was fed straight into build_reply_from_json, which threw an uncaught
exception. That surfaced as a generic "Unexpected error in RPC handling"
and the task was cancelled the instant it launched, breaking the
PredictStream e2e spec.

Skip null results in both the first-result handling and the streaming
loop, mirroring upstream's own `if (first_result_json == nullptr)` guard.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
This commit is contained in:
LocalAI [bot]
2026-05-31 12:26:03 +02:00
committed by GitHub
parent 0d57957ebb
commit aa80d4681b
2 changed files with 14 additions and 3 deletions

View File

@@ -2204,7 +2204,15 @@ public:
// content element — attaching to both would duplicate the first
// token since oaicompat_msg_diffs is the same for both.
json first_res_json = first_result->to_json();
if (first_res_json.is_array()) {
// Upstream llama.cpp (ggml-org/llama.cpp#23884) now emits an initial
// "begin" partial whose to_json() returns null, used only to signal the
// HTTP layer to flush 200 status headers before any token. gRPC has no
// such concept, so there is nothing to emit — the real tokens arrive in
// the loop below. Feeding this null into build_reply_from_json would
// throw (uncaught) and surface as a generic RPC error.
if (first_res_json.is_null()) {
// skip the begin-of-stream marker
} else if (first_res_json.is_array()) {
for (const auto & res : first_res_json) {
auto reply = build_reply_from_json(res, first_result.get());
// Skip chat deltas for role-init elements (have "role" in
@@ -2234,7 +2242,10 @@ public:
}
json res_json = result->to_json();
if (res_json.is_array()) {
if (res_json.is_null()) {
// begin-of-stream marker (see note above) — nothing to emit
continue;
} else if (res_json.is_array()) {
for (const auto & res : res_json) {
auto reply = build_reply_from_json(res, result.get());
bool is_role_init = res.contains("choices") && !res["choices"].empty() &&