Compare commits

...

18 Commits

Author SHA1 Message Date
dependabot[bot]
fdc9f7bf35 chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.64.0 to 0.65.0 (#9254)
chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus

Bumps [go.opentelemetry.io/otel/exporters/prometheus](https://github.com/open-telemetry/opentelemetry-go) from 0.64.0 to 0.65.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/exporters/prometheus/v0.64.0...exporters/prometheus/v0.65.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/prometheus
  dependency-version: 0.65.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-07 00:39:52 +02:00
LocalAI [bot]
8e59346091 chore: ⬆️ Update leejet/stable-diffusion.cpp to 8afbeb6ba9702c15d41a38296f2ab1fe5c829fa0 (#9262)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:39:38 +02:00
LocalAI [bot]
e6e4e19633 chore: ⬆️ Update ace-step/acestep.cpp to e0c8d75a672fca5684c88c68dbf6d12f58754258 (#9261)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:39:24 +02:00
Ettore Di Giacinto
505c417fa7 fix(gpu): better detection for MacOS and Thor (#9263)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-07 00:39:07 +02:00
LocalAI [bot]
17215f6fbc docs: ⬆️ update docs version mudler/LocalAI (#9260)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:38:50 +02:00
LocalAI [bot]
bccaba1f66 chore: ⬆️ Update ggml-org/llama.cpp to d0a6dfeb28a09831d904fc4d910ddb740da82834 (#9259)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-07 00:38:36 +02:00
Ettore Di Giacinto
0f9d516a6c fix(anthropic): do not emit empty tokens and fix SSE tool calls (#9258)
This fixes Claude Code compatibility

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-07 00:38:21 +02:00
dependabot[bot]
33b124c6f1 chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.12 to 1.32.14 (#9256)
chore(deps): bump github.com/aws/aws-sdk-go-v2/config

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.32.12 to 1.32.14.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/config/v1.32.12...config/v1.32.14)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-version: 1.32.14
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-06 21:46:52 +02:00
dependabot[bot]
6b8007e88e chore(deps): bump github.com/jaypipes/ghw from 0.23.0 to 0.24.0 (#9250)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.23.0 to 0.24.0.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.23.0...v0.24.0)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.24.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-06 21:46:18 +02:00
dependabot[bot]
b3837c2078 chore(deps): bump google.golang.org/grpc from 1.79.3 to 1.80.0 (#9253)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.79.3 to 1.80.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.79.3...v1.80.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.80.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-06 21:45:50 +02:00
Ettore Di Giacinto
92f99b1ec3 fix(token): login via legacy api keys (#9249)
We were not checking against the api keys when db == nil.

This commit also cleanups now unused middleware

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-06 21:45:09 +02:00
LocalAI [bot]
ad232fdb1a docs: ⬆️ update docs version mudler/LocalAI (#9241)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-06 10:53:07 +02:00
LocalAI [bot]
11637b5a1b chore: ⬆️ Update leejet/stable-diffusion.cpp to 7397ddaa86f4e8837d5261724678cde0f36d4d89 (#9242)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-06 10:52:51 +02:00
LocalAI [bot]
0dda4fe6f0 chore: ⬆️ Update ggml-org/llama.cpp to 761797ffdf2ce3f118e82c663b1ad7d935fbd656 (#9243)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-04-06 10:52:38 +02:00
Ettore Di Giacinto
773489eeb1 fix(chat): do not retry if we had chatdeltas or tooldeltas from backend (#9244)
* fix(chat): do not retry if we had chatdeltas or tooldeltas from backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: use oai compat for llama.cpp

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: apply to non-streaming path too

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* map also other fields

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-06 10:52:23 +02:00
Ettore Di Giacinto
06fbe48b3f feat(llama.cpp): wire speculative decoding settings (#9238)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-05 14:56:30 +02:00
Ettore Di Giacinto
232e324a68 fix(autoparser): correctly pass by logprobs (#9239)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-05 09:39:22 +02:00
ER-EPR
39c954764c Update index.yaml and add Qwen3.5 model files (#9237)
* Update index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Add mmproj files for Qwen3.5 models

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update file paths for Qwen models in index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Refactor Qwen3-Reranker-0.6B entry in index.yaml

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

* Update qwen3.yaml configuration parameters

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>

---------

Signed-off-by: ER-EPR <38782737+ER-EPR@users.noreply.github.com>
2026-04-05 09:21:21 +02:00
23 changed files with 1279 additions and 615 deletions

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=b8635075ffe27b135c49afb9a8b5c434bd42c502
LLAMA_VERSION?=d0a6dfeb28a09831d904fc4d910ddb740da82834
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -284,6 +284,12 @@ json parse_options(bool streaming, const backend::PredictOptions* predict, const
data["ignore_eos"] = predict->ignoreeos();
data["embeddings"] = predict->embeddings();
// Speculative decoding per-request overrides
// NDraft maps to speculative.n_max (maximum draft tokens per speculation step)
if (predict->ndraft() > 0) {
data["speculative.n_max"] = predict->ndraft();
}
// Add the correlationid to json data
data["correlation_id"] = predict->correlationid();
@@ -402,6 +408,16 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
if (!request->mmproj().empty()) {
params.mmproj.path = request->mmproj();
}
// Draft model for speculative decoding
if (!request->draftmodel().empty()) {
params.speculative.mparams_dft.path = request->draftmodel();
// Default to draft type if a draft model is set but no explicit type
if (params.speculative.type == COMMON_SPECULATIVE_TYPE_NONE) {
params.speculative.type = COMMON_SPECULATIVE_TYPE_DRAFT;
}
}
// params.model_alias ??
params.model_alias.insert(request->modelfile());
if (!request->cachetypekey().empty()) {
@@ -609,6 +625,48 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
// If conversion fails, keep default value (8)
}
}
// Speculative decoding options
} else if (!strcmp(optname, "spec_type") || !strcmp(optname, "speculative_type")) {
auto type = common_speculative_type_from_name(optval_str);
if (type != COMMON_SPECULATIVE_TYPE_COUNT) {
params.speculative.type = type;
}
} else if (!strcmp(optname, "spec_n_max") || !strcmp(optname, "draft_max")) {
if (optval != NULL) {
try { params.speculative.n_max = std::stoi(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "spec_n_min") || !strcmp(optname, "draft_min")) {
if (optval != NULL) {
try { params.speculative.n_min = std::stoi(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "spec_p_min") || !strcmp(optname, "draft_p_min")) {
if (optval != NULL) {
try { params.speculative.p_min = std::stof(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "spec_p_split")) {
if (optval != NULL) {
try { params.speculative.p_split = std::stof(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "spec_ngram_size_n") || !strcmp(optname, "ngram_size_n")) {
if (optval != NULL) {
try { params.speculative.ngram_size_n = (uint16_t)std::stoi(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "spec_ngram_size_m") || !strcmp(optname, "ngram_size_m")) {
if (optval != NULL) {
try { params.speculative.ngram_size_m = (uint16_t)std::stoi(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "spec_ngram_min_hits") || !strcmp(optname, "ngram_min_hits")) {
if (optval != NULL) {
try { params.speculative.ngram_min_hits = (uint16_t)std::stoi(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "draft_gpu_layers")) {
if (optval != NULL) {
try { params.speculative.n_gpu_layers = std::stoi(optval_str); } catch (...) {}
}
} else if (!strcmp(optname, "draft_ctx_size")) {
if (optval != NULL) {
try { params.speculative.n_ctx = std::stoi(optval_str); } catch (...) {}
}
}
}
@@ -1251,6 +1309,7 @@ public:
body_json["messages"] = messages_json;
body_json["stream"] = true; // PredictStream is always streaming
body_json["stream_options"] = {{"include_usage", true}}; // Ensure token counts in final chunk
// Check if grammar is provided from Go layer (NoGrammar=false)
// If grammar is provided, we must use it and NOT let template generate grammar from tools
@@ -1558,8 +1617,11 @@ public:
data);
task.id_slot = json_value(data, "id_slot", -1);
// OAI-compat
task.params.res_type = TASK_RESPONSE_TYPE_NONE;
// OAI-compat: enable autoparser (PEG-based chat parsing) so that
// reasoning, tool calls, and content are classified into ChatDeltas.
// Without this, the PEG parser never produces diffs and the Go side
// cannot detect tool calls or separate reasoning from content.
task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT;
task.params.oaicompat_cmpl_id = completion_id;
// oaicompat_model is already populated by params_from_json_cmpl
@@ -1584,19 +1646,47 @@ public:
return grpc::Status(grpc::StatusCode::INTERNAL, error_json.value("message", "Error occurred"));
}
// Lambda to build a Reply from JSON + attach chat deltas from a result
// Lambda to build a Reply from JSON + attach chat deltas from a result.
// Handles both native format ({"content": "..."}) and OAI chat format
// ({"choices": [{"delta": {"content": "...", "reasoning": "..."}}]}).
auto build_reply_from_json = [](const json & res_json, server_task_result * raw_result) -> backend::Reply {
backend::Reply reply;
std::string completion_text = res_json.value("content", "");
reply.set_message(completion_text);
reply.set_tokens(res_json.value("tokens_predicted", 0));
reply.set_prompt_tokens(res_json.value("tokens_evaluated", 0));
std::string completion_text;
if (res_json.contains("choices")) {
// OAI chat format — extract content from choices[0].delta
const auto & choices = res_json.at("choices");
if (!choices.empty()) {
const auto & delta = choices[0].value("delta", json::object());
if (delta.contains("content") && !delta.at("content").is_null()) {
completion_text = delta.at("content").get<std::string>();
}
}
} else {
// Native llama.cpp format
completion_text = res_json.value("content", "");
}
reply.set_message(completion_text);
// Token counts: native format has top-level fields,
// OAI format has them in "usage" (final chunk only)
if (res_json.contains("usage")) {
const auto & usage = res_json.at("usage");
reply.set_tokens(usage.value("completion_tokens", 0));
reply.set_prompt_tokens(usage.value("prompt_tokens", 0));
} else {
reply.set_tokens(res_json.value("tokens_predicted", 0));
reply.set_prompt_tokens(res_json.value("tokens_evaluated", 0));
}
// Timings: present as top-level "timings" in both formats
if (res_json.contains("timings")) {
reply.set_timing_prompt_processing(res_json.at("timings").value("prompt_ms", 0.0));
reply.set_timing_token_generation(res_json.at("timings").value("predicted_ms", 0.0));
}
// Logprobs: extract_logprobs_from_json handles both formats
json logprobs_json = extract_logprobs_from_json(res_json);
if (!logprobs_json.empty() && !logprobs_json.is_null()) {
reply.set_logprobs(logprobs_json.dump());
@@ -1605,21 +1695,17 @@ public:
return reply;
};
// Attach chat deltas from the autoparser to a Reply.
// When diffs are available, populate ChatDeltas on the reply.
// The raw message is always preserved so the Go side can use it
// for reasoning extraction and tool call parsing as a fallback
// (important in distributed mode where ChatDeltas may not be
// the primary parsing path).
auto attach_chat_deltas = [](backend::Reply & reply, server_task_result * raw_result) {
// Try streaming partial result first
auto* partial = dynamic_cast<server_task_result_cmpl_partial*>(raw_result);
if (partial) {
if (!partial->oaicompat_msg_diffs.empty()) {
populate_chat_deltas_from_diffs(reply, partial->oaicompat_msg_diffs);
} else if (partial->is_updated) {
// Autoparser is active but hasn't classified this chunk yet
// (PEG parser warming up). Clear the raw message so the Go
// side doesn't try to parse partial tag tokens (e.g. "<|channel>"
// before the full "<|channel>thought\n" is received).
// This matches llama.cpp server behavior which only emits SSE
// chunks when the parser produces diffs.
reply.set_message("");
}
if (partial && !partial->oaicompat_msg_diffs.empty()) {
populate_chat_deltas_from_diffs(reply, partial->oaicompat_msg_diffs);
return;
}
// Try final result
@@ -2299,8 +2385,9 @@ public:
data);
task.id_slot = json_value(data, "id_slot", -1);
// OAI-compat
task.params.res_type = TASK_RESPONSE_TYPE_NONE;
// OAI-compat: enable autoparser (PEG-based chat parsing) so that
// reasoning, tool calls, and content are classified into ChatDeltas.
task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT;
task.params.oaicompat_cmpl_id = completion_id;
// oaicompat_model is already populated by params_from_json_cmpl
@@ -2331,25 +2418,48 @@ public:
auto* final_res = dynamic_cast<server_task_result_cmpl_final*>(all_results.results[0].get());
GGML_ASSERT(final_res != nullptr);
json result_json = all_results.results[0]->to_json();
reply->set_message(result_json.value("content", ""));
int32_t tokens_predicted = result_json.value("tokens_predicted", 0);
// Handle both native format ({"content": "...", "tokens_predicted": N})
// and OAI chat format ({"choices": [{"message": {"content": "..."}}],
// "usage": {"completion_tokens": N, "prompt_tokens": N}}).
std::string completion_text;
int32_t tokens_predicted = 0;
int32_t tokens_evaluated = 0;
if (result_json.contains("choices")) {
// OAI chat format
const auto & choices = result_json.at("choices");
if (!choices.empty()) {
const auto & msg = choices[0].value("message", json::object());
if (msg.contains("content") && !msg.at("content").is_null()) {
completion_text = msg.at("content").get<std::string>();
}
}
if (result_json.contains("usage")) {
const auto & usage = result_json.at("usage");
tokens_predicted = usage.value("completion_tokens", 0);
tokens_evaluated = usage.value("prompt_tokens", 0);
}
} else {
// Native llama.cpp format
completion_text = result_json.value("content", "");
tokens_predicted = result_json.value("tokens_predicted", 0);
tokens_evaluated = result_json.value("tokens_evaluated", 0);
}
reply->set_message(completion_text);
reply->set_tokens(tokens_predicted);
int32_t tokens_evaluated = result_json.value("tokens_evaluated", 0);
reply->set_prompt_tokens(tokens_evaluated);
// Timings: present in both formats as a top-level "timings" object
if (result_json.contains("timings")) {
double timing_prompt_processing = result_json.at("timings").value("prompt_ms", 0.0);
reply->set_timing_prompt_processing(timing_prompt_processing);
double timing_token_generation = result_json.at("timings").value("predicted_ms", 0.0);
reply->set_timing_token_generation(timing_token_generation);
reply->set_timing_prompt_processing(result_json.at("timings").value("prompt_ms", 0.0));
reply->set_timing_token_generation(result_json.at("timings").value("predicted_ms", 0.0));
}
// Extract and set logprobs if present
// Logprobs: extract_logprobs_from_json handles both formats
json logprobs_json = extract_logprobs_from_json(result_json);
if (!logprobs_json.empty() && !logprobs_json.is_null()) {
std::string logprobs_str = logprobs_json.dump();
reply->set_logprobs(logprobs_str);
reply->set_logprobs(logprobs_json.dump());
}
// Populate chat deltas from the autoparser's final parsed message
@@ -2365,7 +2475,20 @@ public:
for (auto & res : all_results.results) {
GGML_ASSERT(dynamic_cast<server_task_result_cmpl_final*>(res.get()) != nullptr);
json res_json = res->to_json();
arr.push_back(res_json.value("content", ""));
// Handle both native and OAI chat formats
std::string result_content;
if (res_json.contains("choices")) {
const auto & choices = res_json.at("choices");
if (!choices.empty()) {
const auto & msg = choices[0].value("message", json::object());
if (msg.contains("content") && !msg.at("content").is_null()) {
result_content = msg.at("content").get<std::string>();
}
}
} else {
result_content = res_json.value("content", "");
}
arr.push_back(result_content);
// Extract logprobs for each result
json logprobs_json = extract_logprobs_from_json(res_json);

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# acestep.cpp version
ACESTEP_REPO?=https://github.com/ace-step/acestep.cpp
ACESTEP_CPP_VERSION?=6f35c874ee11e86d511b860019b84976f5b52d3a
ACESTEP_CPP_VERSION?=e0c8d75a672fca5684c88c68dbf6d12f58754258
SO_TARGET?=libgoacestepcpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=87ecb95cbc65dc8e58e3d88f4f4a59a0939796f5
STABLEDIFFUSION_GGML_VERSION?=8afbeb6ba9702c15d41a38296f2ab1fe5c829fa0
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -3,6 +3,8 @@ package anthropic
import (
"encoding/json"
"fmt"
"sync"
"time"
"github.com/google/uuid"
"github.com/labstack/echo/v4"
@@ -366,7 +368,33 @@ func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicReq
// Collect tool calls for MCP execution
var collectedToolCalls []functions.FuncCallResults
// SSE keepalive: send comment pings every 3s until the first token arrives.
// This prevents clients (e.g. Claude Code) from timing out while the model loads or processes the prompt.
firstTokenReceived := make(chan struct{})
keepaliveDone := make(chan struct{})
go func() {
defer close(keepaliveDone)
ticker := time.NewTicker(3 * time.Second)
defer ticker.Stop()
for {
select {
case <-firstTokenReceived:
return
case <-c.Request().Context().Done():
return
case <-ticker.C:
fmt.Fprintf(c.Response().Writer, "event: ping\ndata: {\"type\": \"ping\"}\n\n")
c.Response().Flush()
}
}
}()
firstTokenOnce := sync.Once{}
tokenCallback := func(token string, usage backend.TokenUsage) bool {
firstTokenOnce.Do(func() {
close(firstTokenReceived)
<-keepaliveDone // wait for keepalive goroutine to exit before writing
})
accumulatedContent += token
if shouldUseFn {
@@ -414,7 +442,7 @@ func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicReq
}
}
if !inToolCall {
if !inToolCall && token != "" {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: intPtr(0),
@@ -433,6 +461,11 @@ func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicReq
openAIReq.Metadata = input.Metadata
_, tokenUsage, chatDeltas, err := openaiEndpoint.ComputeChoices(openAIReq, predInput, cfg, cl, appConfig, ml, func(s string, c *[]schema.Choice) {}, tokenCallback)
// Stop the keepalive goroutine now that inference is done
firstTokenOnce.Do(func() { close(firstTokenReceived) })
<-keepaliveDone
if err != nil {
xlog.Error("Anthropic stream model inference failed", "error", err)
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
@@ -445,9 +478,68 @@ func handleAnthropicStream(c echo.Context, id string, input *schema.AnthropicReq
return nil
}
// Also check chat deltas for tool calls
if deltaToolCalls := functions.ToolCallsFromChatDeltas(chatDeltas); len(deltaToolCalls) > 0 && len(collectedToolCalls) == 0 {
collectedToolCalls = deltaToolCalls
// Check chat deltas from C++ autoparser — when active, the raw
// message is cleared and content/tool calls arrive via ChatDeltas.
if len(chatDeltas) > 0 {
deltaContent := functions.ContentFromChatDeltas(chatDeltas)
deltaToolCalls := functions.ToolCallsFromChatDeltas(chatDeltas)
// Emit text content from ChatDeltas only when the tokenCallback
// didn't already stream it (autoparser clears raw text, so
// accumulatedContent will be empty in that case).
if deltaContent != "" && !inToolCall && accumulatedContent == "" {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: intPtr(0),
Delta: &schema.AnthropicStreamDelta{
Type: "text_delta",
Text: deltaContent,
},
})
}
// Emit tool_use blocks from ChatDeltas
if len(deltaToolCalls) > 0 && len(collectedToolCalls) == 0 {
collectedToolCalls = deltaToolCalls
if !inToolCall && currentBlockIndex == 0 {
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: intPtr(currentBlockIndex),
})
currentBlockIndex++
inToolCall = true
}
for i, tc := range deltaToolCalls {
toolCallID := tc.ID
if toolCallID == "" {
toolCallID = fmt.Sprintf("toolu_%s_%d", id, i)
}
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_start",
Index: intPtr(currentBlockIndex),
ContentBlock: &schema.AnthropicContentBlock{
Type: "tool_use",
ID: toolCallID,
Name: tc.Name,
},
})
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_delta",
Index: intPtr(currentBlockIndex),
Delta: &schema.AnthropicStreamDelta{
Type: "input_json_delta",
PartialJSON: tc.Arguments,
},
})
sendAnthropicSSE(c, schema.AnthropicStreamEvent{
Type: "content_block_stop",
Index: intPtr(currentBlockIndex),
})
currentBlockIndex++
toolCallsEmitted++
}
}
}
// MCP streaming tool execution: if we collected MCP tool calls, execute and loop

View File

@@ -147,10 +147,23 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
result := ""
lastEmittedCount := 0
sentInitialRole := false
hasChatDeltaToolCalls := false
hasChatDeltaContent := false
_, tokenUsage, chatDeltas, err := ComputeChoices(req, prompt, config, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
result += s
// Track whether ChatDeltas from the C++ autoparser contain
// tool calls or content, so the retry decision can account for them.
for _, d := range usage.ChatDeltas {
if len(d.ToolCalls) > 0 {
hasChatDeltaToolCalls = true
}
if d.Content != "" {
hasChatDeltaContent = true
}
}
var reasoningDelta, contentDelta string
goReasoning, goContent := extractor.ProcessToken(s)
@@ -309,15 +322,22 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
// After streaming completes: check if we got actionable content
cleaned := extractor.CleanedContent()
// Check for tool calls from chat deltas (will be re-checked after ComputeChoices,
// but we need to know here whether to retry)
hasToolCalls := lastEmittedCount > 0
if cleaned == "" && !hasToolCalls {
// but we need to know here whether to retry).
// Also check ChatDelta flags — when the C++ autoparser is active,
// tool calls and content are delivered via ChatDeltas while the
// raw message is cleared. Without this check, we'd retry
// unnecessarily, losing valid results and concatenating output.
hasToolCalls := lastEmittedCount > 0 || hasChatDeltaToolCalls
hasContent := cleaned != "" || hasChatDeltaContent
if !hasContent && !hasToolCalls {
xlog.Warn("Streaming: backend produced only reasoning, retrying",
"reasoning_len", len(extractor.Reasoning()), "attempt", attempt+1)
extractor.ResetAndSuppressReasoning()
result = ""
lastEmittedCount = 0
sentInitialRole = false
hasChatDeltaToolCalls = false
hasChatDeltaContent = false
return true
}
return false
@@ -1006,7 +1026,12 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
if deltaReasoning != "" {
message.Reasoning = &deltaReasoning
}
result = []schema.Choice{{FinishReason: &stopReason, Index: 0, Message: message}}
newChoice := schema.Choice{FinishReason: &stopReason, Index: 0, Message: message}
// Preserve logprobs from the original result
if len(result) > 0 && result[0].Logprobs != nil {
newChoice.Logprobs = result[0].Logprobs
}
result = []schema.Choice{newChoice}
}
}

View File

@@ -113,11 +113,23 @@ func ComputeChoices(
}
prediction = p
// Built-in: retry on truly empty response (no tokens at all)
// Built-in: retry on truly empty response (no tokens at all).
// However, when the C++ autoparser is active, it clears the raw
// message and delivers content via ChatDeltas instead. Do NOT
// retry if ChatDeltas contain tool calls or content.
if strings.TrimSpace(prediction.Response) == "" && attempt < maxRetries {
xlog.Warn("Backend returned empty response, retrying",
"attempt", attempt+1, "maxRetries", maxRetries)
continue
hasChatDeltaData := false
for _, d := range prediction.ChatDeltas {
if d.Content != "" || len(d.ToolCalls) > 0 {
hasChatDeltaData = true
break
}
}
if !hasChatDeltaData {
xlog.Warn("Backend returned empty response, retrying",
"attempt", attempt+1, "maxRetries", maxRetries)
continue
}
}
tokenUsage.Prompt = prediction.Usage.Prompt
@@ -130,8 +142,21 @@ func ComputeChoices(
finetunedResponse := backend.Finetune(*config, predInput, prediction.Response)
cb(finetunedResponse, &result)
// Caller-driven retry (tool parsing, reasoning-only, etc.)
if shouldRetryFn != nil && shouldRetryFn(attempt) && attempt < maxRetries {
// Caller-driven retry (tool parsing, reasoning-only, etc.).
// When the C++ autoparser is active, it clears the raw response
// and delivers data via ChatDeltas. If the response is empty but
// ChatDeltas contain actionable data, skip the caller retry —
// the autoparser already parsed the response successfully.
skipCallerRetry := false
if strings.TrimSpace(prediction.Response) == "" && len(prediction.ChatDeltas) > 0 {
for _, d := range prediction.ChatDeltas {
if d.Content != "" || len(d.ToolCalls) > 0 {
skipCallerRetry = true
break
}
}
}
if shouldRetryFn != nil && !skipCallerRetry && shouldRetryFn(attempt) && attempt < maxRetries {
// Caller has already reset its state inside shouldRetry
result = result[:0]
allChatDeltas = nil

View File

@@ -1,179 +0,0 @@
package middleware
import (
"crypto/subtle"
"errors"
"net/http"
"strings"
"github.com/labstack/echo/v4"
"github.com/labstack/echo/v4/middleware"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
)
var ErrMissingOrMalformedAPIKey = errors.New("missing or malformed API Key")
// GetKeyAuthConfig returns Echo's KeyAuth middleware configuration
func GetKeyAuthConfig(applicationConfig *config.ApplicationConfig) (echo.MiddlewareFunc, error) {
// Create validator function
validator := getApiKeyValidationFunction(applicationConfig)
// Create error handler
errorHandler := getApiKeyErrorHandler(applicationConfig)
// Create Next function (skip middleware for certain requests)
skipper := getApiKeyRequiredFilterFunction(applicationConfig)
// Wrap it with our custom key lookup that checks multiple sources
return func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
if len(applicationConfig.ApiKeys) == 0 {
return next(c)
}
// Skip if skipper says so
if skipper != nil && skipper(c) {
return next(c)
}
// Try to extract key from multiple sources
key, err := extractKeyFromMultipleSources(c)
if err != nil {
return errorHandler(err, c)
}
// Validate the key
valid, err := validator(key, c)
if err != nil || !valid {
return errorHandler(ErrMissingOrMalformedAPIKey, c)
}
// Store key in context for later use
c.Set("api_key", key)
return next(c)
}
}, nil
}
// extractKeyFromMultipleSources checks multiple sources for the API key
// in order: Authorization header, x-api-key header, xi-api-key header, token cookie
func extractKeyFromMultipleSources(c echo.Context) (string, error) {
// Check Authorization header first
auth := c.Request().Header.Get("Authorization")
if auth != "" {
// Check for Bearer scheme
if strings.HasPrefix(auth, "Bearer ") {
return strings.TrimPrefix(auth, "Bearer "), nil
}
// If no Bearer prefix, return as-is (for backward compatibility)
return auth, nil
}
// Check x-api-key header
if key := c.Request().Header.Get("x-api-key"); key != "" {
return key, nil
}
// Check xi-api-key header
if key := c.Request().Header.Get("xi-api-key"); key != "" {
return key, nil
}
// Check token cookie
cookie, err := c.Cookie("token")
if err == nil && cookie != nil && cookie.Value != "" {
return cookie.Value, nil
}
return "", ErrMissingOrMalformedAPIKey
}
func getApiKeyErrorHandler(applicationConfig *config.ApplicationConfig) func(error, echo.Context) error {
return func(err error, c echo.Context) error {
if errors.Is(err, ErrMissingOrMalformedAPIKey) {
if len(applicationConfig.ApiKeys) == 0 {
return nil // if no keys are set up, any error we get here is not an error.
}
c.Response().Header().Set("WWW-Authenticate", "Bearer")
if applicationConfig.OpaqueErrors {
return c.NoContent(http.StatusUnauthorized)
}
// Check if the request content type is JSON
contentType := c.Request().Header.Get("Content-Type")
if strings.Contains(contentType, "application/json") {
return c.JSON(http.StatusUnauthorized, schema.ErrorResponse{
Error: &schema.APIError{
Message: "An authentication key is required",
Code: 401,
Type: "invalid_request_error",
},
})
}
return c.Render(http.StatusUnauthorized, "views/login", map[string]any{
"BaseURL": BaseURL(c),
})
}
if applicationConfig.OpaqueErrors {
return c.NoContent(http.StatusInternalServerError)
}
return err
}
}
func getApiKeyValidationFunction(applicationConfig *config.ApplicationConfig) func(string, echo.Context) (bool, error) {
if applicationConfig.UseSubtleKeyComparison {
return func(key string, c echo.Context) (bool, error) {
if len(applicationConfig.ApiKeys) == 0 {
return true, nil // If no keys are setup, accept everything
}
for _, validKey := range applicationConfig.ApiKeys {
if subtle.ConstantTimeCompare([]byte(key), []byte(validKey)) == 1 {
return true, nil
}
}
return false, ErrMissingOrMalformedAPIKey
}
}
return func(key string, c echo.Context) (bool, error) {
if len(applicationConfig.ApiKeys) == 0 {
return true, nil // If no keys are setup, accept everything
}
for _, validKey := range applicationConfig.ApiKeys {
if key == validKey {
return true, nil
}
}
return false, ErrMissingOrMalformedAPIKey
}
}
func getApiKeyRequiredFilterFunction(applicationConfig *config.ApplicationConfig) middleware.Skipper {
return func(c echo.Context) bool {
path := c.Request().URL.Path
for _, p := range applicationConfig.PathWithoutAuth {
if strings.HasPrefix(path, p) {
return true
}
}
// Handle GET request exemptions if enabled
if applicationConfig.DisableApiKeyRequirementForHttpGet {
if c.Request().Method != http.MethodGet {
return false
}
for _, rx := range applicationConfig.HttpGetExemptedEndpoints {
if rx.MatchString(c.Path()) {
return true
}
}
}
return false
}
}

View File

@@ -1,228 +0,0 @@
package middleware_test
import (
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
. "github.com/mudler/LocalAI/core/http/middleware"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// ok is a simple handler that returns 200 OK.
func ok(c echo.Context) error {
return c.String(http.StatusOK, "ok")
}
// newAuthApp creates a minimal Echo app with auth middleware applied.
// Requests that fail auth with Content-Type: application/json get a JSON 401
// (no template renderer needed).
func newAuthApp(appConfig *config.ApplicationConfig) *echo.Echo {
e := echo.New()
mw, err := GetKeyAuthConfig(appConfig)
Expect(err).ToNot(HaveOccurred())
e.Use(mw)
// Sensitive API routes
e.GET("/v1/models", ok)
e.POST("/v1/chat/completions", ok)
// UI routes
e.GET("/app", ok)
e.GET("/app/*", ok)
e.GET("/browse", ok)
e.GET("/browse/*", ok)
e.GET("/login", ok)
e.GET("/explorer", ok)
e.GET("/assets/*", ok)
e.POST("/app", ok)
return e
}
// doRequest performs an HTTP request against the given Echo app and returns the recorder.
func doRequest(e *echo.Echo, method, path string, opts ...func(*http.Request)) *httptest.ResponseRecorder {
req := httptest.NewRequest(method, path, nil)
req.Header.Set("Content-Type", "application/json")
for _, opt := range opts {
opt(req)
}
rec := httptest.NewRecorder()
e.ServeHTTP(rec, req)
return rec
}
func withBearerToken(token string) func(*http.Request) {
return func(req *http.Request) {
req.Header.Set("Authorization", "Bearer "+token)
}
}
func withXApiKey(key string) func(*http.Request) {
return func(req *http.Request) {
req.Header.Set("x-api-key", key)
}
}
func withXiApiKey(key string) func(*http.Request) {
return func(req *http.Request) {
req.Header.Set("xi-api-key", key)
}
}
func withTokenCookie(token string) func(*http.Request) {
return func(req *http.Request) {
req.AddCookie(&http.Cookie{Name: "token", Value: token})
}
}
var _ = Describe("Auth Middleware", func() {
Context("when API keys are configured", func() {
var app *echo.Echo
const validKey = "sk-test-key-123"
BeforeEach(func() {
appConfig := config.NewApplicationConfig()
appConfig.ApiKeys = []string{validKey}
app = newAuthApp(appConfig)
})
It("returns 401 for GET request without a key", func() {
rec := doRequest(app, http.MethodGet, "/v1/models")
Expect(rec.Code).To(Equal(http.StatusUnauthorized))
})
It("returns 401 for POST request without a key", func() {
rec := doRequest(app, http.MethodPost, "/v1/chat/completions")
Expect(rec.Code).To(Equal(http.StatusUnauthorized))
})
It("returns 401 for request with an invalid key", func() {
rec := doRequest(app, http.MethodGet, "/v1/models", withBearerToken("wrong-key"))
Expect(rec.Code).To(Equal(http.StatusUnauthorized))
})
It("passes through with valid Bearer token in Authorization header", func() {
rec := doRequest(app, http.MethodGet, "/v1/models", withBearerToken(validKey))
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("passes through with valid x-api-key header", func() {
rec := doRequest(app, http.MethodGet, "/v1/models", withXApiKey(validKey))
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("passes through with valid xi-api-key header", func() {
rec := doRequest(app, http.MethodGet, "/v1/models", withXiApiKey(validKey))
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("passes through with valid token cookie", func() {
rec := doRequest(app, http.MethodGet, "/v1/models", withTokenCookie(validKey))
Expect(rec.Code).To(Equal(http.StatusOK))
})
})
Context("when no API keys are configured", func() {
var app *echo.Echo
BeforeEach(func() {
appConfig := config.NewApplicationConfig()
app = newAuthApp(appConfig)
})
It("passes through without any key", func() {
rec := doRequest(app, http.MethodGet, "/v1/models")
Expect(rec.Code).To(Equal(http.StatusOK))
})
})
Context("GET exempted endpoints (feature enabled)", func() {
var app *echo.Echo
const validKey = "sk-test-key-456"
BeforeEach(func() {
appConfig := config.NewApplicationConfig(
config.WithApiKeys([]string{validKey}),
config.WithDisableApiKeyRequirementForHttpGet(true),
config.WithHttpGetExemptedEndpoints([]string{
"^/$",
"^/app(/.*)?$",
"^/browse(/.*)?$",
"^/login/?$",
"^/explorer/?$",
"^/assets/.*$",
"^/static/.*$",
"^/swagger.*$",
}),
)
app = newAuthApp(appConfig)
})
It("allows GET to /app without a key", func() {
rec := doRequest(app, http.MethodGet, "/app")
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("allows GET to /app/chat/model sub-route without a key", func() {
rec := doRequest(app, http.MethodGet, "/app/chat/llama3")
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("allows GET to /browse/models without a key", func() {
rec := doRequest(app, http.MethodGet, "/browse/models")
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("allows GET to /login without a key", func() {
rec := doRequest(app, http.MethodGet, "/login")
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("allows GET to /explorer without a key", func() {
rec := doRequest(app, http.MethodGet, "/explorer")
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("allows GET to /assets/main.js without a key", func() {
rec := doRequest(app, http.MethodGet, "/assets/main.js")
Expect(rec.Code).To(Equal(http.StatusOK))
})
It("rejects POST to /app without a key", func() {
rec := doRequest(app, http.MethodPost, "/app")
Expect(rec.Code).To(Equal(http.StatusUnauthorized))
})
It("rejects GET to /v1/models without a key", func() {
rec := doRequest(app, http.MethodGet, "/v1/models")
Expect(rec.Code).To(Equal(http.StatusUnauthorized))
})
})
Context("GET exempted endpoints (feature disabled)", func() {
var app *echo.Echo
const validKey = "sk-test-key-789"
BeforeEach(func() {
appConfig := config.NewApplicationConfig(
config.WithApiKeys([]string{validKey}),
// DisableApiKeyRequirementForHttpGet defaults to false
config.WithHttpGetExemptedEndpoints([]string{
"^/$",
"^/app(/.*)?$",
}),
)
app = newAuthApp(appConfig)
})
It("requires auth for GET to /app even though it matches exempted pattern", func() {
rec := doRequest(app, http.MethodGet, "/app")
Expect(rec.Code).To(Equal(http.StatusUnauthorized))
})
})
})

View File

@@ -42,5 +42,6 @@ export function vendorColor(vendor) {
if (v.includes('nvidia')) return '#76b900'
if (v.includes('amd')) return '#ed1c24'
if (v.includes('intel')) return '#0071c5'
if (v.includes('apple')) return '#a2aaad'
return 'var(--color-accent)'
}

View File

@@ -157,11 +157,11 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
}
resp := map[string]any{
"authEnabled": authEnabled,
"staticApiKeyRequired": !authEnabled && len(appConfig.ApiKeys) > 0,
"providers": providers,
"hasUsers": hasUsers,
"registrationMode": registrationMode,
"authEnabled": authEnabled,
"staticApiKeyRequired": !authEnabled && len(appConfig.ApiKeys) > 0,
"providers": providers,
"hasUsers": hasUsers,
"registrationMode": registrationMode,
}
// Include current user if authenticated
@@ -186,7 +186,73 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
return c.JSON(http.StatusOK, resp)
})
// OAuth routes - only registered when auth is enabled
// Rate limiter for auth endpoints: 5 attempts per minute per IP
authRL := newRateLimiter(1*time.Minute, 5)
authRateLimitMw := rateLimitMiddleware(authRL)
// Start background goroutine to periodically prune stale IP entries
go func() {
ticker := time.NewTicker(10 * time.Minute)
defer ticker.Stop()
for {
select {
case <-appConfig.Context.Done():
return
case <-ticker.C:
authRL.cleanup()
}
}
}()
// POST /api/auth/token-login - authenticate with API key/token.
// Registered when auth DB or legacy API keys are configured.
if db != nil || len(appConfig.ApiKeys) > 0 {
e.POST("/api/auth/token-login", func(c echo.Context) error {
var body struct {
Token string `json:"token"`
}
if err := c.Bind(&body); err != nil || strings.TrimSpace(body.Token) == "" {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "token is required"})
}
token := strings.TrimSpace(body.Token)
// Try as user API key (only when auth DB is available)
if db != nil {
if apiKey, err := auth.ValidateAPIKey(db, token, appConfig.Auth.APIKeyHMACSecret); err == nil {
sessionID, err := auth.CreateSession(db, apiKey.User.ID, appConfig.Auth.APIKeyHMACSecret)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to create session"})
}
auth.SetSessionCookie(c, sessionID)
return c.JSON(http.StatusOK, map[string]any{
"user": map[string]any{
"id": apiKey.User.ID,
"email": apiKey.User.Email,
"name": apiKey.User.Name,
"role": apiKey.User.Role,
},
})
}
}
// Try as legacy API key
if len(appConfig.ApiKeys) > 0 && isValidLegacyKey(token, appConfig) {
auth.SetTokenCookie(c, token)
return c.JSON(http.StatusOK, map[string]any{
"user": map[string]any{
"id": "legacy-api-key",
"name": "API Key User",
"role": auth.RoleAdmin,
},
})
}
return c.JSON(http.StatusUnauthorized, map[string]string{"error": "invalid token"})
}, authRateLimitMw)
}
// Remaining routes require auth DB
if db == nil {
return
}
@@ -219,24 +285,6 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
}
}
// Rate limiter for auth endpoints: 5 attempts per minute per IP
authRL := newRateLimiter(1*time.Minute, 5)
authRateLimitMw := rateLimitMiddleware(authRL)
// Start background goroutine to periodically prune stale IP entries (#12)
go func() {
ticker := time.NewTicker(10 * time.Minute)
defer ticker.Stop()
for {
select {
case <-appConfig.Context.Done():
return
case <-ticker.C:
authRL.cleanup()
}
}
}()
// POST /api/auth/register - public, email/password registration
e.POST("/api/auth/register", func(c echo.Context) error {
if appConfig.Auth.DisableLocalAuth {
@@ -427,53 +475,6 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
})
}, authRateLimitMw)
// POST /api/auth/token-login - public, authenticate with API key/token (#3)
e.POST("/api/auth/token-login", func(c echo.Context) error {
var body struct {
Token string `json:"token"`
}
if err := c.Bind(&body); err != nil || strings.TrimSpace(body.Token) == "" {
return c.JSON(http.StatusBadRequest, map[string]string{"error": "token is required"})
}
token := strings.TrimSpace(body.Token)
hmacSecret := appConfig.Auth.APIKeyHMACSecret
// Try as user API key
if apiKey, err := auth.ValidateAPIKey(db, token, hmacSecret); err == nil {
sessionID, err := auth.CreateSession(db, apiKey.User.ID, appConfig.Auth.APIKeyHMACSecret)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to create session"})
}
auth.SetSessionCookie(c, sessionID)
return c.JSON(http.StatusOK, map[string]any{
"user": map[string]any{
"id": apiKey.User.ID,
"email": apiKey.User.Email,
"name": apiKey.User.Name,
"role": apiKey.User.Role,
},
})
}
// Try as legacy API key
if len(appConfig.ApiKeys) > 0 && isValidLegacyKey(token, appConfig) {
// Create a synthetic session cookie with the token for legacy mode
auth.SetTokenCookie(c, token)
return c.JSON(http.StatusOK, map[string]any{
"user": map[string]any{
"id": "legacy-api-key",
"name": "API Key User",
"role": auth.RoleAdmin,
},
})
}
return c.JSON(http.StatusUnauthorized, map[string]string{"error": "invalid token"})
}, authRateLimitMw)
// POST /api/auth/logout - requires auth
e.POST("/api/auth/logout", func(c echo.Context) error {
user := auth.GetUser(c)

View File

@@ -189,8 +189,8 @@ These settings apply to most LLM backends (llama.cpp, vLLM, etc.):
| Field | Type | Description |
|-------|------|-------------|
| `no_mulmatq` | bool | Disable matrix multiplication queuing |
| `draft_model` | string | Draft model for speculative decoding |
| `n_draft` | int32 | Number of draft tokens |
| `draft_model` | string | Draft model GGUF file for speculative decoding (see [Speculative Decoding](#speculative-decoding)) |
| `n_draft` | int32 | Maximum number of draft tokens per speculative step (default: 16) |
| `quantization` | string | Quantization format |
| `load_format` | string | Model load format |
| `numa` | bool | Enable NUMA (Non-Uniform Memory Access) |
@@ -211,6 +211,76 @@ YARN (Yet Another RoPE extensioN) settings for context extension:
| `yarn_beta_fast` | float32 | YARN beta fast parameter |
| `yarn_beta_slow` | float32 | YARN beta slow parameter |
### Speculative Decoding
Speculative decoding speeds up text generation by predicting multiple tokens ahead and verifying them in a single forward pass. The output is identical to normal decoding — only faster. This feature is only available with the `llama-cpp` backend.
There are two approaches:
#### Draft Model Speculative Decoding
Uses a smaller, faster model from the same model family to draft candidate tokens, which the main model then verifies. Requires a separate GGUF file for the draft model.
```yaml
name: my-model
backend: llama-cpp
parameters:
model: large-model.gguf
draft_model: small-draft-model.gguf
n_draft: 8
options:
- spec_p_min:0.8
- draft_gpu_layers:99
```
#### N-gram Self-Speculative Decoding
Uses patterns from the token history to predict future tokens — no extra model required. Works well for repetitive or structured output (code, JSON, lists).
```yaml
name: my-model
backend: llama-cpp
parameters:
model: my-model.gguf
options:
- spec_type:ngram_simple
- spec_n_max:16
```
#### Speculative Decoding Options
These are set via the `options:` array in the model configuration (format: `key:value`):
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `spec_type` | string | `none` | Speculative decoding type (see table below) |
| `spec_n_max` / `draft_max` | int | 16 | Maximum number of tokens to draft per step |
| `spec_n_min` / `draft_min` | int | 0 | Minimum draft tokens required to use speculation |
| `spec_p_min` / `draft_p_min` | float | 0.75 | Minimum probability threshold for greedy acceptance |
| `spec_p_split` | float | 0.1 | Split probability for tree-based branching |
| `spec_ngram_size_n` / `ngram_size_n` | int | 12 | N-gram lookup size |
| `spec_ngram_size_m` / `ngram_size_m` | int | 48 | M-gram proposal size |
| `spec_ngram_min_hits` / `ngram_min_hits` | int | 1 | Minimum hits for accepting n-gram proposals |
| `draft_gpu_layers` | int | -1 | GPU layers for the draft model (-1 = use default) |
| `draft_ctx_size` | int | 0 | Context size for the draft model (0 = auto) |
#### Speculative Type Values
| Type | Description |
|------|-------------|
| `none` | No speculative decoding (default) |
| `draft` | Draft model-based speculation (auto-set when `draft_model` is configured) |
| `eagle3` | EAGLE3 draft model architecture |
| `ngram_simple` | Simple self-speculative using token history |
| `ngram_map_k` | N-gram with key-only map |
| `ngram_map_k4v` | N-gram with keys and 4 m-gram values |
| `ngram_mod` | Modified n-gram speculation |
| `ngram_cache` | 3-level n-gram cache |
{{% notice note %}}
Speculative decoding is automatically disabled when multimodal models (with `mmproj`) are active. The `n_draft` parameter can also be overridden per-request.
{{% /notice %}}
### Prompt Caching
| Field | Type | Description |

View File

@@ -166,6 +166,67 @@ This section provides step-by-step instructions for configuring specific softwar
After saving the configuration file, restart OpenCode for the changes to take effect.
### Claude Code
[Claude Code](https://docs.anthropic.com/en/docs/claude-code) is Anthropic's official CLI tool for coding with Claude. LocalAI implements the Anthropic Messages API (`/v1/messages`), so Claude Code can be pointed directly at a LocalAI instance.
#### Prerequisites
- LocalAI must be running and accessible (either locally or on a network)
- You need to know your LocalAI server's IP address/hostname and port (default is `8080`)
- An API key configured in your LocalAI instance
#### Running Claude Code with LocalAI
Set the `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY` environment variables to point Claude Code at your LocalAI server:
```bash
ANTHROPIC_BASE_URL=http://127.0.0.1:8080 \
ANTHROPIC_API_KEY=your-localai-api-key \
claude --model your-model-name
```
For example, if you have a Gemma model loaded:
```bash
ANTHROPIC_BASE_URL=http://127.0.0.1:8080 \
ANTHROPIC_API_KEY=your-localai-api-key \
claude --model gemma-4-12B-it-GGUF
```
You can also run a single prompt non-interactively:
```bash
ANTHROPIC_BASE_URL=http://127.0.0.1:8080 \
ANTHROPIC_API_KEY=your-localai-api-key \
claude -p "list the files in /tmp" --model your-model-name
```
#### Configuration
To avoid setting environment variables every time, you can add them to your shell profile (e.g., `~/.bashrc` or `~/.zshrc`):
```bash
export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
export ANTHROPIC_API_KEY=your-localai-api-key
```
#### Verify available models
Check which models are available in your LocalAI instance:
```bash
curl http://127.0.0.1:8080/v1/models
```
Use one of the listed model IDs as the `--model` argument.
#### Notes
- Models with tool calling support (e.g., Gemma 4, Qwen 3) work best, as Claude Code relies heavily on tool use for file operations and code editing.
- Larger models generally produce better results for complex coding tasks.
- The Anthropic Messages API endpoint supports both streaming and non-streaming modes.
### Charm Crush
You can ask [Charm Crush](https://charm.land/crush) to generate your config by giving it this documentation's URL and your LocalAI instance URL. The configuration will look something like the following and goes in `~/.config/crush/crush.json`:

View File

@@ -1,3 +1,3 @@
{
"version": "v4.1.0"
"version": "v4.1.2"
}

View File

@@ -1288,6 +1288,59 @@
- filename: llama-cpp/mmproj/Qwen3-VL-Reranker-8B.mmproj-f16.gguf
sha256: 15cd9bd4882dae771344f0ac204fce07de91b47c1438ada3861dfc817403c31e
uri: https://huggingface.co/mradermacher/Qwen3-VL-Reranker-8B-GGUF/resolve/main/Qwen3-VL-Reranker-8B.mmproj-f16.gguf
- name: "qwen3-vl-reranker-2b-i1"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF
description: |
**Model Name:** Qwen3-VL-Reranker-2B-i1
**Base Model:** Qwen/Qwen3-VL-Reranker-2B
**Description:**
A high-performance multimodal reranking model for state-of-the-art cross-modal search. It supports 30+ languages and handles text, images, screenshots, videos, and mixed modalities. With 8B parameters and a 32K context length, it refines retrieval results by combining embedding vectors with precise relevance scores. Optimized for efficiency, it supports quantized versions (e.g., Q8_0, Q4_K_M) and is ideal for applications requiring accurate multimodal content matching.
**Key Features:**
- **Multimodal**: Text, images, videos, and mixed content.
- **Language Support**: 30+ languages.
- **Quantization**: Available in Q8_0 (best quality), Q4_K_M (fast, recommended), and lower-precision options.
- **Performance**: Outperforms base models in retrieval tasks (e.g., JinaVDR, ViDoRe v3).
- **Use Case**: Enhances search pipelines by refining embeddings with precise relevance scores.
**Downloads:**
- [GGUF Files](https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF) (e.g., `Qwen3-VL-Reranker-2B.i1-Q4_K_M.gguf`).
**Usage:**
- Requires `transformers`, `qwen-vl-utils`, and `torch`.
- Example: `from scripts.qwen3_vl_reranker import Qwen3VLReranker; model = Qwen3VLReranker(...)`
**Citation:**
@article{qwen3vlembedding, ...}
This description emphasizes its capabilities, efficiency, and versatility for multimodal search tasks.
overrides:
reranking: true
parameters:
model: llama-cpp/models/Qwen3-VL-Reranker-2B.i1-Q4_K_M.gguf
name: Qwen3-VL-Reranker-2B-i1-GGUF
backend: llama-cpp
template:
use_tokenizer_template: true
known_usecases:
- chat
function:
grammar:
disable: true
mmproj: llama-cpp/mmproj/Qwen3-VL-Reranker-2B.mmproj-f16.gguf
description: Imported from https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-GGUF/
options:
- use_jinja:true
files:
- filename: llama-cpp/models/Qwen3-VL-Reranker-2B.i1-Q4_K_M.gguf
sha256: f19dfbceeef9f6ee1f7d0ff536d66e9b1b90424a4b8aa1d1777db43d20afdbc5
uri: https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-i1-GGUF/resolve/main/Qwen3-VL-Reranker-2B.i1-Q4_K_M.gguf
- filename: llama-cpp/mmproj/Qwen3-VL-Reranker-8B.mmproj-f16.gguf
sha256: d38b7ae347fc3e51726bfb9cba1b04885f1f005a4087d8070933e46509db5a6e
uri: https://huggingface.co/mradermacher/Qwen3-VL-Reranker-2B-GGUF/resolve/main/Qwen3-VL-Reranker-2B.mmproj-f16.gguf
- name: "liquidai.lfm2-2.6b-transcript"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
@@ -3095,6 +3148,35 @@
- filename: Qwen_Qwen3-30B-A3B-Q4_K_M.gguf
sha256: a015794bfb1d69cb03dbb86b185fb2b9b339f757df5f8f9dd9ebdab8f6ed5d32
uri: huggingface://bartowski/Qwen_Qwen3-30B-A3B-GGUF/Qwen_Qwen3-30B-A3B-Q4_K_M.gguf
- !!merge <<: *qwen3
name: "qwen3-reranker-0.6b"
tags:
- qwen3
- reranker
- gguf
- gpu
- cpu
urls:
- https://huggingface.co/Qwen/Qwen3-Reranker-0.6B
description: |
The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.
**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B size embedding model ranks No.1 in the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58), while the reranking model excels in various text retrieval scenarios.
**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.
**Multilingual Capability**: The Qwen3 Embedding series offer support for over 100 languages, thanks to the multilingual capabilites of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.
**Qwen3-Reranker-0.6B** has the following features:
- Model Type: Text Reranking
- Supported Languages: 100+ Languages
- Number of Paramaters: 0.6B
- Context Length: 32k
- Quantization: q4_K_M, q5_0, q5_K_M, q6_K, q8_0, f16
overrides:
reranking: true
parameters:
model: Qwen3-Reranker-0.6B.Q8_0.gguf
files:
- filename: Qwen3-Reranker-0.6B.Q8_0.gguf
uri: huggingface://mradermacher/Qwen3-Reranker-0.6B-GGUF/Qwen3-Reranker-0.6B.Q8_0.gguf
sha256: c525a7449243f690a7062e6377d6cf5adbb289354bd4316312367cd20e187ab7
- !!merge <<: *qwen3
name: "qwen3-235b-a22b-instruct-2507"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png

View File

@@ -2,7 +2,10 @@
name: "qwen3"
config_file: |
mmap: true
parameters:
context_size: 8192
f16: true
mmap: true
backend: "llama-cpp"
template:
chat_message: |
@@ -36,8 +39,6 @@ config_file: |
<|im_start|>assistant
completion: |
{{.Input}}
context_size: 8192
f16: true
stopwords:
- '<|im_end|>'
- '<dummy32000>'

46
go.mod
View File

@@ -8,9 +8,9 @@ require (
github.com/Masterminds/sprig/v3 v3.3.0
github.com/alecthomas/kong v1.14.0
github.com/anthropics/anthropic-sdk-go v1.27.0
github.com/aws/aws-sdk-go-v2 v1.41.4
github.com/aws/aws-sdk-go-v2/config v1.32.12
github.com/aws/aws-sdk-go-v2/credentials v1.19.12
github.com/aws/aws-sdk-go-v2 v1.41.5
github.com/aws/aws-sdk-go-v2/config v1.32.14
github.com/aws/aws-sdk-go-v2/credentials v1.19.14
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1
github.com/charmbracelet/glamour v0.10.0
github.com/containerd/containerd v1.7.30
@@ -27,7 +27,7 @@ require (
github.com/gpustack/gguf-parser-go v0.24.0
github.com/hpcloud/tail v1.0.0
github.com/ipfs/go-log v1.0.5
github.com/jaypipes/ghw v0.23.0
github.com/jaypipes/ghw v0.24.0
github.com/joho/godotenv v1.5.1
github.com/klauspost/cpuid/v2 v2.3.0
github.com/labstack/echo/v4 v4.15.1
@@ -39,7 +39,7 @@ require (
github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b
github.com/mudler/edgevpn v0.31.1
github.com/mudler/go-processmanager v0.1.0
github.com/mudler/memory v0.0.0-20251216220809-d1256471a6c2
github.com/mudler/memory v0.0.0-20260406210934-424c1ecf2cf8
github.com/mudler/xlog v0.0.6
github.com/nats-io/nats.go v1.50.0
github.com/onsi/ginkgo/v2 v2.28.1
@@ -60,11 +60,11 @@ require (
github.com/testcontainers/testcontainers-go v0.41.0
github.com/testcontainers/testcontainers-go/modules/nats v0.41.0
github.com/testcontainers/testcontainers-go/modules/postgres v0.41.0
go.opentelemetry.io/otel v1.42.0
go.opentelemetry.io/otel/exporters/prometheus v0.64.0
go.opentelemetry.io/otel/metric v1.42.0
go.opentelemetry.io/otel/sdk/metric v1.42.0
google.golang.org/grpc v1.79.3
go.opentelemetry.io/otel v1.43.0
go.opentelemetry.io/otel/exporters/prometheus v0.65.0
go.opentelemetry.io/otel/metric v1.43.0
go.opentelemetry.io/otel/sdk/metric v1.43.0
google.golang.org/grpc v1.80.0
google.golang.org/protobuf v1.36.11
gopkg.in/yaml.v3 v3.0.1
gorm.io/driver/postgres v1.6.0
@@ -75,19 +75,19 @@ require (
require (
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.20 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.20 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.20 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.20 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 // indirect
github.com/aws/aws-sdk-go-v2/service/signin v1.0.8 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.13 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.17 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.41.9 // indirect
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
github.com/aws/smithy-go v1.24.2 // indirect
github.com/go-jose/go-jose/v4 v4.1.3 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.3 // indirect
@@ -413,7 +413,7 @@ require (
github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
github.com/prometheus/client_model v0.6.2 // indirect
github.com/prometheus/common v0.67.5 // indirect
github.com/prometheus/procfs v0.19.2 // indirect
github.com/prometheus/procfs v0.20.1 // indirect
github.com/quic-go/qpack v0.5.1 // indirect
github.com/quic-go/quic-go v0.54.1 // indirect
github.com/quic-go/webtransport-go v0.9.0 // indirect
@@ -438,8 +438,8 @@ require (
github.com/yuin/goldmark-emoji v1.0.5 // indirect
github.com/yusufpapurcu/wmi v1.2.4 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/otel/sdk v1.42.0 // indirect
go.opentelemetry.io/otel/trace v1.42.0 // indirect
go.opentelemetry.io/otel/sdk v1.43.0 // indirect
go.opentelemetry.io/otel/trace v1.43.0 // indirect
go.uber.org/dig v1.19.0 // indirect
go.uber.org/fx v1.24.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
@@ -455,8 +455,8 @@ require (
golang.zx2c4.com/wintun v0.0.0-20230126152724-0fa3db229ce2 // indirect
golang.zx2c4.com/wireguard v0.0.0-20250521234502-f333402bd9cb // indirect
golang.zx2c4.com/wireguard/windows v0.5.3 // indirect
gonum.org/v1/gonum v0.16.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20251202230838-ff82c1b0f217 // indirect
gonum.org/v1/gonum v0.17.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20260120221211-b8f7ae30c516 // indirect
gopkg.in/fsnotify.v1 v1.4.7 // indirect
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 // indirect
howett.net/plist v1.0.2-0.20250314012144-ee69052608d9 // indirect

94
go.sum
View File

@@ -70,20 +70,20 @@ github.com/anthropics/anthropic-sdk-go v1.27.0 h1:0CWbmBq5ofGAjF2H6lefCNRbnaUMGi
github.com/anthropics/anthropic-sdk-go v1.27.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
github.com/aws/aws-sdk-go-v2 v1.41.4 h1:10f50G7WyU02T56ox1wWXq+zTX9I1zxG46HYuG1hH/k=
github.com/aws/aws-sdk-go-v2 v1.41.4/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
github.com/aws/aws-sdk-go-v2 v1.41.5 h1:dj5kopbwUsVUVFgO4Fi5BIT3t4WyqIDjGKCangnV/yY=
github.com/aws/aws-sdk-go-v2 v1.41.5/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 h1:3kGOqnh1pPeddVa/E37XNTaWJ8W6vrbYV9lJEkCnhuY=
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7/go.mod h1:lyw7GFp3qENLh7kwzf7iMzAxDn+NzjXEAGjKS2UOKqI=
github.com/aws/aws-sdk-go-v2/config v1.32.12 h1:O3csC7HUGn2895eNrLytOJQdoL2xyJy0iYXhoZ1OmP0=
github.com/aws/aws-sdk-go-v2/config v1.32.12/go.mod h1:96zTvoOFR4FURjI+/5wY1vc1ABceROO4lWgWJuxgy0g=
github.com/aws/aws-sdk-go-v2/credentials v1.19.12 h1:oqtA6v+y5fZg//tcTWahyN9PEn5eDU/Wpvc2+kJ4aY8=
github.com/aws/aws-sdk-go-v2/credentials v1.19.12/go.mod h1:U3R1RtSHx6NB0DvEQFGyf/0sbrpJrluENHdPy1j/3TE=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.20 h1:zOgq3uezl5nznfoK3ODuqbhVg1JzAGDUhXOsU0IDCAo=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.20/go.mod h1:z/MVwUARehy6GAg/yQ1GO2IMl0k++cu1ohP9zo887wE=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.20 h1:CNXO7mvgThFGqOFgbNAP2nol2qAWBOGfqR/7tQlvLmc=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.20/go.mod h1:oydPDJKcfMhgfcgBUZaG+toBbwy8yPWubJXBVERtI4o=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.20 h1:tN6W/hg+pkM+tf9XDkWUbDEjGLb+raoBMFsTodcoYKw=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.20/go.mod h1:YJ898MhD067hSHA6xYCx5ts/jEd8BSOLtQDL3iZsvbc=
github.com/aws/aws-sdk-go-v2/config v1.32.14 h1:opVIRo/ZbbI8OIqSOKmpFaY7IwfFUOCCXBsUpJOwDdI=
github.com/aws/aws-sdk-go-v2/config v1.32.14/go.mod h1:U4/V0uKxh0Tl5sxmCBZ3AecYny4UNlVmObYjKuuaiOo=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14 h1:n+UcGWAIZHkXzYt87uMFBv/l8THYELoX6gVcUvgl6fI=
github.com/aws/aws-sdk-go-v2/credentials v1.19.14/go.mod h1:cJKuyWB59Mqi0jM3nFYQRmnHVQIcgoxjEMAbLkpr62w=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 h1:NUS3K4BTDArQqNu2ih7yeDLaS3bmHD0YndtA6UP884g=
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21/go.mod h1:YWNWJQNjKigKY1RHVJCuupeWDrrHjRqHm0N9rdrWzYI=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 h1:Rgg6wvjjtX8bNHcvi9OnXWwcE0a2vGpbwmtICOsvcf4=
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21/go.mod h1:A/kJFst/nm//cyqonihbdpQZwiUhhzpqTsdbhDdRF9c=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 h1:PEgGVtPoB6NTpPrBgqSE5hE/o47Ij9qk/SEZFbUOe9A=
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21/go.mod h1:p+hz+PRAYlY3zcpJhPwXlLC4C+kqn70WIHwnzAfs6ps=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 h1:qYQ4pzQ2Oz6WpQ8T3HvGHnZydA72MnLuFK9tJwmrbHw=
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6/go.mod h1:O3h0IK87yXci+kg6flUKzJnWeziQUKciKrLjcatSNcY=
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 h1:SwGMTMLIlvDNyhMteQ6r8IJSBPlRdXX5d4idhIGbkXA=
@@ -92,20 +92,20 @@ github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 h1:5EniKhL
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7/go.mod h1:x0nZssQ3qZSnIcePWLvcoFisRXJzcTVvYpAAdYX8+GI=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 h1:qtJZ70afD3ISKWnoX3xB0J2otEqu3LqicRcDBqsj0hQ=
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12/go.mod h1:v2pNpJbRNl4vEUWEh5ytQok0zACAKfdmKS51Hotc3pQ=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.20 h1:2HvVAIq+YqgGotK6EkMf+KIEqTISmTYh5zLpYyeTo1Y=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.20/go.mod h1:V4X406Y666khGa8ghKmphma/7C0DAtEQYhkq9z4vpbk=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 h1:c31//R3xgIJMSC8S6hEVq+38DcvUlgFY0FM6mSI5oto=
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21/go.mod h1:r6+pf23ouCB718FUxaqzZdbpYFyDtehyZcmP5KL9FkA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 h1:siU1A6xjUZ2N8zjTHSXFhB9L/2OY8Dqs0xXiLjF30jA=
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20/go.mod h1:4TLZCmVJDM3FOu5P5TJP0zOlu9zWgDWU7aUxWbr+rcw=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1 h1:csi9NLpFZXb9fxY7rS1xVzgPRGMt7MSNWeQ6eo247kE=
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1/go.mod h1:qXVal5H0ChqXP63t6jze5LmFalc7+ZE7wOdLtZ0LCP0=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.8 h1:0GFOLzEbOyZABS3PhYfBIx2rNBACYcKty+XGkTgw1ow=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.8/go.mod h1:LXypKvk85AROkKhOG6/YEcHFPoX+prKTowKnVdcaIxE=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.13 h1:kiIDLZ005EcKomYYITtfsjn7dtOwHDOFy7IbPXKek2o=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.13/go.mod h1:2h/xGEowcW/g38g06g3KpRWDlT+OTfxxI0o1KqayAB8=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.17 h1:jzKAXIlhZhJbnYwHbvUQZEB8KfgAEuG0dc08Bkda7NU=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.17/go.mod h1:Al9fFsXjv4KfbzQHGe6V4NZSZQXecFcvaIF4e70FoRA=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.9 h1:Cng+OOwCHmFljXIxpEVXAGMnBia8MSU6Ch5i9PgBkcU=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.9/go.mod h1:LrlIndBDdjA/EeXeyNBle+gyCwTlizzW5ycgWnvIxkk=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 h1:QKZH0S178gCmFEgst8hN0mCX1KxLgHBKKY/CLqwP8lg=
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9/go.mod h1:7yuQJoT+OoH8aqIxw9vwF+8KpvLZ8AWmvmUWHsGQZvI=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 h1:lFd1+ZSEYJZYvv9d6kXzhkZu07si3f+GQ1AaYwa2LUM=
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15/go.mod h1:WSvS1NLr7JaPunCXqpJnWk1Bjo7IxzZXrZi1QQCkuqM=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 h1:dzztQ1YmfPrxdrOiuZRMF6fuOwWlWpD2StNLTceKpys=
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19/go.mod h1:YO8TrYtFdl5w/4vmjL8zaBSsiNp3w0L1FfKVKenZT7w=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 h1:p8ogvvLugcR/zLBXTXrTkj0RYBUdErbMnAFFp12Lm/U=
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10/go.mod h1:60dv0eZJfeVXfbT1tFJinbHrDfSJ2GZl4Q//OSSNAVw=
github.com/aws/smithy-go v1.24.2 h1:FzA3bu/nt/vDvmnkg+R8Xl46gmzEDam6mZ1hzmwXFng=
github.com/aws/smithy-go v1.24.2/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
@@ -516,8 +516,8 @@ github.com/jackc/puddle/v2 v2.2.2 h1:PR8nw+E/1w0GLuRFSmiioY6UooMp6KJv0/61nB7icHo
github.com/jackc/puddle/v2 v2.2.2/go.mod h1:vriiEXHvEE654aYKXXjOvZM39qJ0q+azkZFrfEOc3H4=
github.com/jackpal/go-nat-pmp v1.0.2 h1:KzKSgb7qkJvOUTqYl9/Hg/me3pWgBmERKrTGD7BdWus=
github.com/jackpal/go-nat-pmp v1.0.2/go.mod h1:QPH045xvCAeXUZOxsnwmrtiCoxIr9eob+4orBN1SBKc=
github.com/jaypipes/ghw v0.23.0 h1:WOL4hpLcIu1kIm+z5Oz19Tk1HNw/Sncrx/6GS8O0Kl0=
github.com/jaypipes/ghw v0.23.0/go.mod h1:fUNUjMZ0cjahKo+/u+32m9FutIx53Nkbi0Ti0m7j5HY=
github.com/jaypipes/ghw v0.24.0 h1:6RBrJzvHvZ0t+hSvqPmOd5b21C4fMsyiyFzWljEj8Wg=
github.com/jaypipes/ghw v0.24.0/go.mod h1:Qk3UjdH8Xu/OiVyb/eDJqnDsUc+awHU75y23ErZU33s=
github.com/jaypipes/pcidb v1.1.1 h1:QmPhpsbmmnCwZmHeYAATxEaoRuiMAJusKYkUncMC0ro=
github.com/jaypipes/pcidb v1.1.1/go.mod h1:x27LT2krrUgjf875KxQXKB0Ha/YXLdZRVmw6hH0G7g8=
github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99 h1:BQSFePA1RWJOlocH6Fxy8MmwDt+yVQYULKfN0RoTN8A=
@@ -723,6 +723,8 @@ github.com/mudler/localrecall v0.5.9-0.20260321005011-810084e9369b h1:XeAnOEOOSK
github.com/mudler/localrecall v0.5.9-0.20260321005011-810084e9369b/go.mod h1:xuPtgL9zUyiQLmspYzO3kaboYrGbWmwi8BQPt1aCAcs=
github.com/mudler/memory v0.0.0-20251216220809-d1256471a6c2 h1:+WHsL/j6EWOMUiMVIOJNKOwSKiQt/qDPc9fePCf87fA=
github.com/mudler/memory v0.0.0-20251216220809-d1256471a6c2/go.mod h1:EA8Ashhd56o32qN7ouPKFSRUs/Z+LrRCF4v6R2Oarm8=
github.com/mudler/memory v0.0.0-20260406210934-424c1ecf2cf8 h1:Ry8RiWy8fZ6Ff4E7dPmjRsBrnHOnPeOOj2LhCgyjQu0=
github.com/mudler/memory v0.0.0-20260406210934-424c1ecf2cf8/go.mod h1:EA8Ashhd56o32qN7ouPKFSRUs/Z+LrRCF4v6R2Oarm8=
github.com/mudler/skillserver v0.0.6 h1:ixz6wUekLdTmbnpAavCkTydDF6UdXAG3ncYufSPK9G0=
github.com/mudler/skillserver v0.0.6/go.mod h1:z3yFhcL9bSykmmh6xgGu0hyoItd4CnxgtWMEWw8uFJU=
github.com/mudler/water v0.0.0-20250808092830-dd90dcf09025 h1:WFLP5FHInarYGXi6B/Ze204x7Xy6q/I4nCZnWEyPHK0=
@@ -882,8 +884,8 @@ github.com/prometheus/common v0.67.5/go.mod h1:SjE/0MzDEEAyrdr5Gqc6G+sXI67maCxza
github.com/prometheus/otlptranslator v1.0.0 h1:s0LJW/iN9dkIH+EnhiD3BlkkP5QVIUVEoIwkU+A6qos=
github.com/prometheus/otlptranslator v1.0.0/go.mod h1:vRYWnXvI6aWGpsdY/mOT/cbeVRBlPWtBNDb7kGR3uKM=
github.com/prometheus/procfs v0.0.0-20180725123919-05ee40e3a273/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk=
github.com/prometheus/procfs v0.19.2 h1:zUMhqEW66Ex7OXIiDkll3tl9a1ZdilUOd/F6ZXw4Vws=
github.com/prometheus/procfs v0.19.2/go.mod h1:M0aotyiemPhBCM0z5w87kL22CxfcH05ZpYlu+b4J7mw=
github.com/prometheus/procfs v0.20.1 h1:XwbrGOIplXW/AU3YhIhLODXMJYyC1isLFfYCsTEycfc=
github.com/prometheus/procfs v0.20.1/go.mod h1:o9EMBZGRyvDrSPH1RqdxhojkuXstoe4UlK79eF5TGGo=
github.com/quic-go/qpack v0.5.1 h1:giqksBPnT/HDtZ6VhtFKgoLOWmlyo9Ei6u9PqzIMbhI=
github.com/quic-go/qpack v0.5.1/go.mod h1:+PC4XFrEskIVkcLzpEkbLqq1uCoxPhQuvK5rH1ZgaEg=
github.com/quic-go/quic-go v0.54.1 h1:4ZAWm0AhCb6+hE+l5Q1NAL0iRn/ZrMwqHRGQiFwj2eg=
@@ -1097,22 +1099,22 @@ go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ
go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0 h1:F7Jx+6hwnZ41NSFTO5q4LYDtJRXBf2PD0rNBkeB/lus=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.61.0/go.mod h1:UHB22Z8QsdRDrnAtX4PntOl36ajSxcdUMt1sF7Y6E7Q=
go.opentelemetry.io/otel v1.42.0 h1:lSQGzTgVR3+sgJDAU/7/ZMjN9Z+vUip7leaqBKy4sho=
go.opentelemetry.io/otel v1.42.0/go.mod h1:lJNsdRMxCUIWuMlVJWzecSMuNjE7dOYyWlqOXWkdqCc=
go.opentelemetry.io/otel v1.43.0 h1:mYIM03dnh5zfN7HautFE4ieIig9amkNANT+xcVxAj9I=
go.opentelemetry.io/otel v1.43.0/go.mod h1:JuG+u74mvjvcm8vj8pI5XiHy1zDeoCS2LB1spIq7Ay0=
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.38.0 h1:GqRJVj7UmLjCVyVJ3ZFLdPRmhDUp2zFmQe3RHIOsw24=
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.38.0/go.mod h1:ri3aaHSmCTVYu2AWv44YMauwAQc0aqI9gHKIcSbI1pU=
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.41.0 h1:inYW9ZhgqiDqh6BioM7DVHHzEGVq76Db5897WLGZ5Go=
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.41.0/go.mod h1:Izur+Wt8gClgMJqO/cZ8wdeeMryJ/xxiOVgFSSfpDTY=
go.opentelemetry.io/otel/exporters/prometheus v0.64.0 h1:g0LRDXMX/G1SEZtK8zl8Chm4K6GBwRkjPKE36LxiTYs=
go.opentelemetry.io/otel/exporters/prometheus v0.64.0/go.mod h1:UrgcjnarfdlBDP3GjDIJWe6HTprwSazNjwsI+Ru6hro=
go.opentelemetry.io/otel/metric v1.42.0 h1:2jXG+3oZLNXEPfNmnpxKDeZsFI5o4J+nz6xUlaFdF/4=
go.opentelemetry.io/otel/metric v1.42.0/go.mod h1:RlUN/7vTU7Ao/diDkEpQpnz3/92J9ko05BIwxYa2SSI=
go.opentelemetry.io/otel/sdk v1.42.0 h1:LyC8+jqk6UJwdrI/8VydAq/hvkFKNHZVIWuslJXYsDo=
go.opentelemetry.io/otel/sdk v1.42.0/go.mod h1:rGHCAxd9DAph0joO4W6OPwxjNTYWghRWmkHuGbayMts=
go.opentelemetry.io/otel/sdk/metric v1.42.0 h1:D/1QR46Clz6ajyZ3G8SgNlTJKBdGp84q9RKCAZ3YGuA=
go.opentelemetry.io/otel/sdk/metric v1.42.0/go.mod h1:Ua6AAlDKdZ7tdvaQKfSmnFTdHx37+J4ba8MwVCYM5hc=
go.opentelemetry.io/otel/trace v1.42.0 h1:OUCgIPt+mzOnaUTpOQcBiM/PLQ/Op7oq6g4LenLmOYY=
go.opentelemetry.io/otel/trace v1.42.0/go.mod h1:f3K9S+IFqnumBkKhRJMeaZeNk9epyhnCmQh/EysQCdc=
go.opentelemetry.io/otel/exporters/prometheus v0.65.0 h1:jOveH/b4lU9HT7y+Gfamf18BqlOuz2PWEvs8yM7Q6XE=
go.opentelemetry.io/otel/exporters/prometheus v0.65.0/go.mod h1:i1P8pcumauPtUI4YNopea1dhzEMuEqWP1xoUZDylLHo=
go.opentelemetry.io/otel/metric v1.43.0 h1:d7638QeInOnuwOONPp4JAOGfbCEpYb+K6DVWvdxGzgM=
go.opentelemetry.io/otel/metric v1.43.0/go.mod h1:RDnPtIxvqlgO8GRW18W6Z/4P462ldprJtfxHxyKd2PY=
go.opentelemetry.io/otel/sdk v1.43.0 h1:pi5mE86i5rTeLXqoF/hhiBtUNcrAGHLKQdhg4h4V9Dg=
go.opentelemetry.io/otel/sdk v1.43.0/go.mod h1:P+IkVU3iWukmiit/Yf9AWvpyRDlUeBaRg6Y+C58QHzg=
go.opentelemetry.io/otel/sdk/metric v1.43.0 h1:S88dyqXjJkuBNLeMcVPRFXpRw2fuwdvfCGLEo89fDkw=
go.opentelemetry.io/otel/sdk/metric v1.43.0/go.mod h1:C/RJtwSEJ5hzTiUz5pXF1kILHStzb9zFlIEe85bhj6A=
go.opentelemetry.io/otel/trace v1.43.0 h1:BkNrHpup+4k4w+ZZ86CZoHHEkohws8AY+WTX09nk+3A=
go.opentelemetry.io/otel/trace v1.43.0/go.mod h1:/QJhyVBUUswCphDVxq+8mld+AvhXZLhe+8WVFxiFff0=
go.opentelemetry.io/proto/otlp v1.8.0 h1:fRAZQDcAFHySxpJ1TwlA1cJ4tvcrw7nXl9xWWC8N5CE=
go.opentelemetry.io/proto/otlp v1.8.0/go.mod h1:tIeYOeNBU4cvmPqpaji1P+KbB4Oloai8wN4rWzRrFF0=
go.starlark.net v0.0.0-20250417143717-f57e51f710eb h1:zOg9DxxrorEmgGUr5UPdCEwKqiqG0MlZciuCuA3XiDE=
@@ -1342,8 +1344,8 @@ golang.zx2c4.com/wireguard v0.0.0-20250521234502-f333402bd9cb h1:whnFRlWMcXI9d+Z
golang.zx2c4.com/wireguard v0.0.0-20250521234502-f333402bd9cb/go.mod h1:rpwXGsirqLqN2L0JDJQlwOboGHmptD5ZD6T2VmcqhTw=
golang.zx2c4.com/wireguard/windows v0.5.3 h1:On6j2Rpn3OEMXqBq00QEDC7bWSZrPIHKIus8eIuExIE=
golang.zx2c4.com/wireguard/windows v0.5.3/go.mod h1:9TEe8TJmtwyQebdFwAkEWOPr3prrtqm+REGFifP60hI=
gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk=
gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E=
gonum.org/v1/gonum v0.17.0 h1:VbpOemQlsSMrYmn7T2OUvQ4dqxQXU+ouZFQsZOx50z4=
gonum.org/v1/gonum v0.17.0/go.mod h1:El3tOrEuMpv2UdMrbNlKEh9vd86bmQ6vqIcDwxEOc1E=
google.golang.org/api v0.0.0-20180910000450-7ca32eb868bf/go.mod h1:4mhQ8q/RsB7i+udVvVy5NUi08OU8ZlA0gRVgrF7VFY0=
google.golang.org/api v0.0.0-20181030000543-1d582fd0359e/go.mod h1:4mhQ8q/RsB7i+udVvVy5NUi08OU8ZlA0gRVgrF7VFY0=
google.golang.org/api v0.1.0/go.mod h1:UGEZY7KEX120AnNLIHFMKIo4obdJhkp2tPbaPlQx13Y=
@@ -1361,10 +1363,10 @@ google.golang.org/genproto v0.0.0-20190306203927-b5d61aea6440/go.mod h1:VzzqZJRn
google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98Agz4BDEuKkezgsaosCRResVns1a3J2ZsMNc=
google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013/go.mod h1:NbSheEEYHJ7i3ixzK3sjbqSGDJWnxyFXZblF3eUsNvo=
google.golang.org/genproto v0.0.0-20241118233622-e639e219e697 h1:ToEetK57OidYuqD4Q5w+vfEnPvPpuTwedCNVohYJfNk=
google.golang.org/genproto/googleapis/api v0.0.0-20251202230838-ff82c1b0f217 h1:fCvbg86sFXwdrl5LgVcTEvNC+2txB5mgROGmRL5mrls=
google.golang.org/genproto/googleapis/api v0.0.0-20251202230838-ff82c1b0f217/go.mod h1:+rXWjjaukWZun3mLfjmVnQi18E1AsFbDN9QdJ5YXLto=
google.golang.org/genproto/googleapis/rpc v0.0.0-20251202230838-ff82c1b0f217 h1:gRkg/vSppuSQoDjxyiGfN4Upv/h/DQmIR10ZU8dh4Ww=
google.golang.org/genproto/googleapis/rpc v0.0.0-20251202230838-ff82c1b0f217/go.mod h1:7i2o+ce6H/6BluujYR+kqX3GKH+dChPTQU19wjRPiGk=
google.golang.org/genproto/googleapis/api v0.0.0-20260120221211-b8f7ae30c516 h1:vmC/ws+pLzWjj/gzApyoZuSVrDtF1aod4u/+bbj8hgM=
google.golang.org/genproto/googleapis/api v0.0.0-20260120221211-b8f7ae30c516/go.mod h1:p3MLuOwURrGBRoEyFHBT3GjUwaCQVKeNqqWxlcISGdw=
google.golang.org/genproto/googleapis/rpc v0.0.0-20260120221211-b8f7ae30c516 h1:sNrWoksmOyF5bvJUcnmbeAmQi8baNhqg5IWaI3llQqU=
google.golang.org/genproto/googleapis/rpc v0.0.0-20260120221211-b8f7ae30c516/go.mod h1:j9x/tPzZkyxcgEFkiKEEGxfvyumM01BEtsW8xzOahRQ=
google.golang.org/grpc v1.14.0/go.mod h1:yo6s7OP7yaDglbqo1J04qKzAhqBH6lvTonzMVmEdcZw=
google.golang.org/grpc v1.16.0/go.mod h1:0JHn/cJsOMiMfNA9+DeHDlAU7KAAB5GDlYFpa9MZMio=
google.golang.org/grpc v1.17.0/go.mod h1:6QZJwpn2B+Zp71q/5VxRsJ6NXXVCE5NRUHRo+f3cWCs=
@@ -1373,8 +1375,8 @@ google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyac
google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY=
google.golang.org/grpc v1.27.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk=
google.golang.org/grpc v1.33.2/go.mod h1:JMHMWHQWaTccqQQlmk3MJZS+GWXOdAesneDmEnv2fbc=
google.golang.org/grpc v1.79.3 h1:sybAEdRIEtvcD68Gx7dmnwjZKlyfuc61Dyo9pGXXkKE=
google.golang.org/grpc v1.79.3/go.mod h1:KmT0Kjez+0dde/v2j9vzwoAScgEPx/Bw1CYChhHLrHQ=
google.golang.org/grpc v1.80.0 h1:Xr6m2WmWZLETvUNvIUmeD5OAagMw3FiKmMlTdViWsHM=
google.golang.org/grpc v1.80.0/go.mod h1:ho/dLnxwi3EDJA4Zghp7k2Ec1+c2jqup0bFkw07bwF4=
google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd/go.mod h1:DFci5gLYBciE7Vtevhsrf46CRTquxDuWsQurQQe4oz8=
google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64/go.mod h1:kwYJMbMJ01Woi6D6+Kah6886xMZcty6N08ah7+eCXa0=
google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60/go.mod h1:cfTl7dwQJ+fmap5saPgwCLgHXTUD7jkjRqWcaiX5VyM=

View File

@@ -19,6 +19,7 @@ const (
VendorNVIDIA = "nvidia"
VendorAMD = "amd"
VendorIntel = "intel"
VendorApple = "apple"
VendorVulkan = "vulkan"
VendorUnknown = "unknown"
)
@@ -29,7 +30,8 @@ const (
var UnifiedMemoryDevices = []string{
"NVIDIA GB10",
"GB10",
// Add more unified memory devices here as needed
"NVIDIA Thor",
"Thor",
}
// GPUMemoryInfo contains real-time GPU memory usage information
@@ -196,6 +198,12 @@ func DetectGPUVendor() (string, error) {
return VendorVulkan, nil
}
// Check for Apple Silicon (macOS)
if appleGPUs := getAppleGPUMemory(); len(appleGPUs) > 0 {
xlog.Debug("GPU vendor detected via system_profiler", "vendor", VendorApple)
return VendorApple, nil
}
// No vendor detected
return "", nil
}
@@ -258,6 +266,12 @@ func GetGPUMemoryUsage() []GPUMemoryInfo {
gpus = append(gpus, vulkanGPUs...)
}
// Try Apple Silicon (macOS only)
if len(gpus) == 0 {
appleGPUs := getAppleGPUMemory()
gpus = append(gpus, appleGPUs...)
}
return gpus
}
@@ -351,18 +365,44 @@ func getNVIDIAGPUMemory() []GPUMemoryInfo {
usagePercent = float64(usedBytes) / float64(totalBytes) * 100
}
} else if isNA {
// Unknown device with N/A values - skip memory info
xlog.Debug("nvidia-smi returned N/A for unknown device", "device", name)
gpus = append(gpus, GPUMemoryInfo{
Index: idx,
Name: name,
Vendor: VendorNVIDIA,
TotalVRAM: 0,
UsedVRAM: 0,
FreeVRAM: 0,
UsagePercent: 0,
})
continue
// Check if this is a Tegra/Jetson device — if so, it uses unified memory
if isTegraDevice() {
xlog.Debug("nvidia-smi returned N/A on Tegra device, using system RAM", "device", name)
sysInfo, err := GetSystemRAMInfo()
if err != nil {
xlog.Debug("failed to get system RAM for Tegra device", "error", err, "device", name)
gpus = append(gpus, GPUMemoryInfo{
Index: idx,
Name: name,
Vendor: VendorNVIDIA,
TotalVRAM: 0,
UsedVRAM: 0,
FreeVRAM: 0,
UsagePercent: 0,
})
continue
}
totalBytes = sysInfo.Total
usedBytes = sysInfo.Used
freeBytes = sysInfo.Free
if totalBytes > 0 {
usagePercent = float64(usedBytes) / float64(totalBytes) * 100
}
} else {
// Truly unknown device with N/A values - skip memory info
xlog.Debug("nvidia-smi returned N/A for unknown device", "device", name)
gpus = append(gpus, GPUMemoryInfo{
Index: idx,
Name: name,
Vendor: VendorNVIDIA,
TotalVRAM: 0,
UsedVRAM: 0,
FreeVRAM: 0,
UsagePercent: 0,
})
continue
}
} else {
// Normal GPU with dedicated VRAM
totalMB, _ := strconv.ParseFloat(totalStr, 64)
@@ -790,3 +830,84 @@ func getVulkanGPUMemory() []GPUMemoryInfo {
return gpus
}
// getAppleGPUMemory detects Apple Silicon GPUs using system_profiler (macOS only).
// Apple Silicon uses unified memory, so GPU memory is reported as system RAM.
func getAppleGPUMemory() []GPUMemoryInfo {
if _, err := exec.LookPath("system_profiler"); err != nil {
return nil
}
cmd := exec.Command("system_profiler", "SPDisplaysDataType", "-json")
var stdout, stderr bytes.Buffer
cmd.Stdout = &stdout
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
xlog.Debug("system_profiler failed", "error", err, "stderr", stderr.String())
return nil
}
var result struct {
SPDisplaysDataType []struct {
Name string `json:"_name"`
Model string `json:"sppci_model"`
Cores string `json:"sppci_cores"`
DeviceType string `json:"sppci_device_type"`
Vendor string `json:"spdisplays_vendor"`
} `json:"SPDisplaysDataType"`
}
if err := json.Unmarshal(stdout.Bytes(), &result); err != nil {
xlog.Debug("failed to parse system_profiler output", "error", err)
return nil
}
var gpus []GPUMemoryInfo
for i, display := range result.SPDisplaysDataType {
if display.DeviceType != "spdisplays_gpu" {
continue
}
if !strings.Contains(strings.ToLower(display.Vendor), "apple") {
continue
}
name := display.Model
if name == "" {
name = display.Name
}
if name == "" {
name = "Apple GPU"
}
// Apple Silicon uses unified memory — report system RAM
ramInfo, err := GetSystemRAMInfo()
if err != nil {
xlog.Debug("Apple GPU detected but failed to get system RAM", "error", err)
gpus = append(gpus, GPUMemoryInfo{
Index: i,
Name: name,
Vendor: VendorApple,
})
continue
}
usagePercent := 0.0
if ramInfo.Total > 0 {
usagePercent = float64(ramInfo.Used) / float64(ramInfo.Total) * 100
}
xlog.Debug("Apple Silicon GPU detected (unified memory)", "device", name, "total_ram", ramInfo.Total)
gpus = append(gpus, GPUMemoryInfo{
Index: i,
Name: name,
Vendor: VendorApple,
TotalVRAM: ramInfo.Total,
UsedVRAM: ramInfo.Used,
FreeVRAM: ramInfo.Free,
UsagePercent: usagePercent,
})
}
return gpus
}

View File

@@ -383,5 +383,144 @@ var _ = Describe("Anthropic API E2E test", func() {
Expect(string(message.StopReason)).To(Equal("tool_use"))
})
})
Context("ChatDeltas (C++ autoparser)", func() {
It("streams tool calls via ChatDeltas", func() {
stream := client.Messages.NewStreaming(context.TODO(), anthropic.MessageNewParams{
Model: "mock-model-autoparser",
MaxTokens: 1024,
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("AUTOPARSER_TOOL_CALL What's the weather like in San Francisco?")),
},
Tools: []anthropic.ToolUnionParam{
anthropic.ToolUnionParam{
OfTool: &anthropic.ToolParam{
Name: "get_weather",
Description: anthropic.Opt("Get the current weather in a given location"),
InputSchema: anthropic.ToolInputSchemaParam{
Type: constant.ValueOf[constant.Object](),
Properties: map[string]any{
"location": map[string]any{
"type": "string",
"description": "The city and state",
},
},
Required: []string{"location"},
},
},
},
},
})
message := anthropic.Message{}
hasToolUseStart := false
for stream.Next() {
event := stream.Current()
err := message.Accumulate(event)
Expect(err).ToNot(HaveOccurred())
if e, ok := event.AsAny().(anthropic.ContentBlockStartEvent); ok {
if e.ContentBlock.Type == "tool_use" {
hasToolUseStart = true
}
}
}
Expect(stream.Err()).ToNot(HaveOccurred())
Expect(hasToolUseStart).To(BeTrue(), "Should have tool_use content_block_start event from ChatDeltas")
Expect(string(message.StopReason)).To(Equal("tool_use"))
// Verify tool call is present in accumulated message
foundToolUse := false
for _, block := range message.Content {
if block.Type == "tool_use" {
foundToolUse = true
Expect(block.ID).ToNot(BeEmpty())
}
}
Expect(foundToolUse).To(BeTrue(), "Accumulated message should contain tool_use block from ChatDeltas")
})
It("streams content via ChatDeltas without duplication", func() {
stream := client.Messages.NewStreaming(context.TODO(), anthropic.MessageNewParams{
Model: "mock-model-autoparser",
MaxTokens: 1024,
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("AUTOPARSER_CONTENT Tell me about LocalAI")),
},
})
message := anthropic.Message{}
var textDeltas []string
for stream.Next() {
event := stream.Current()
err := message.Accumulate(event)
Expect(err).ToNot(HaveOccurred())
if e, ok := event.AsAny().(anthropic.ContentBlockDeltaEvent); ok {
if e.Delta.Type == "text_delta" && e.Delta.Text != "" {
textDeltas = append(textDeltas, e.Delta.Text)
}
}
}
Expect(stream.Err()).ToNot(HaveOccurred())
Expect(message.Content).ToNot(BeEmpty())
Expect(string(message.StopReason)).To(Equal("end_turn"))
// Content should appear exactly once (no duplication)
fullText := ""
for _, block := range message.Content {
if block.Type == "text" {
fullText += block.Text
}
}
Expect(fullText).To(ContainSubstring("LocalAI"))
// Check that the content is not duplicated by counting occurrences
Expect(len(fullText)).To(BeNumerically("<", 200), "Content should not be duplicated")
})
It("handles tool calls via ChatDeltas in non-streaming mode", func() {
message, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: "mock-model-autoparser",
MaxTokens: 1024,
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("AUTOPARSER_TOOL_CALL What's the weather like in San Francisco?")),
},
Tools: []anthropic.ToolUnionParam{
anthropic.ToolUnionParam{
OfTool: &anthropic.ToolParam{
Name: "get_weather",
Description: anthropic.Opt("Get the current weather"),
InputSchema: anthropic.ToolInputSchemaParam{
Type: constant.ValueOf[constant.Object](),
Properties: map[string]any{
"location": map[string]any{
"type": "string",
},
},
Required: []string{"location"},
},
},
},
},
})
Expect(err).ToNot(HaveOccurred())
Expect(message.Content).ToNot(BeEmpty())
Expect(string(message.StopReason)).To(Equal("tool_use"))
foundToolUse := false
for _, block := range message.Content {
if block.Type == "tool_use" {
foundToolUse = true
Expect(block.ID).ToNot(BeEmpty())
}
}
Expect(foundToolUse).To(BeTrue(), "Should have tool_use block from ChatDeltas")
})
})
})
})

View File

@@ -101,6 +101,25 @@ var _ = BeforeSuite(func() {
Expect(err).ToNot(HaveOccurred())
Expect(os.WriteFile(configPath, configYAML, 0644)).To(Succeed())
// Create model config for autoparser tests (NoGrammar so tool calls
// are driven entirely by the backend's ChatDeltas, not grammar enforcement)
autoparserConfig := map[string]any{
"name": "mock-model-autoparser",
"backend": "mock-backend",
"parameters": map[string]any{
"model": "mock-model.bin",
},
"function": map[string]any{
"grammar": map[string]any{
"disable": true,
},
},
}
autoparserPath := filepath.Join(modelsPath, "mock-model-autoparser.yaml")
autoparserYAML, err := yaml.Marshal(autoparserConfig)
Expect(err).ToNot(HaveOccurred())
Expect(os.WriteFile(autoparserPath, autoparserYAML, 0644)).To(Succeed())
// Start mock MCP server and create MCP-enabled model config
mcpServerURL, mcpServerShutdown = startMockMCPServer()
mcpConfig := mcpModelConfig(mcpServerURL)

View File

@@ -55,6 +55,46 @@ func (m *MockBackend) Predict(ctx context.Context, in *pb.PredictOptions) (*pb.R
if strings.Contains(in.Prompt, "MOCK_ERROR") {
return nil, fmt.Errorf("mock backend predict error: simulated failure")
}
// Simulate C++ autoparser: tool call via ChatDeltas, empty message
if strings.Contains(in.Prompt, "AUTOPARSER_TOOL_CALL") {
toolName := mockToolNameFromRequest(in)
if toolName == "" {
toolName = "search_collections"
}
return &pb.Reply{
Message: []byte{},
Tokens: 10,
PromptTokens: 5,
ChatDeltas: []*pb.ChatDelta{
{ReasoningContent: "I need to search for information."},
{
ToolCalls: []*pb.ToolCallDelta{
{
Index: 0,
Id: "call_mock_123",
Name: toolName,
Arguments: `{"query":"localai"}`,
},
},
},
},
}, nil
}
// Simulate C++ autoparser: content via ChatDeltas, empty message
if strings.Contains(in.Prompt, "AUTOPARSER_CONTENT") {
return &pb.Reply{
Message: []byte{},
Tokens: 10,
PromptTokens: 5,
ChatDeltas: []*pb.ChatDelta{
{ReasoningContent: "Let me compose a response."},
{Content: "LocalAI is an open-source AI platform."},
},
}, nil
}
var response string
toolName := mockToolNameFromRequest(in)
if toolName != "" && !promptHasToolResults(in.Prompt) {
@@ -88,6 +128,77 @@ func (m *MockBackend) PredictStream(in *pb.PredictOptions, stream pb.Backend_Pre
}
return fmt.Errorf("mock backend stream error: simulated mid-stream failure")
}
// Simulate C++ autoparser behavior: tool calls delivered via ChatDeltas
// with empty message (autoparser clears raw message during parsing).
if strings.Contains(in.Prompt, "AUTOPARSER_TOOL_CALL") {
toolName := mockToolNameFromRequest(in)
if toolName == "" {
toolName = "search_collections"
}
// Phase 1: Stream reasoning tokens with empty message (autoparser active)
reasoning := "I need to search for information."
for _, r := range reasoning {
if err := stream.Send(&pb.Reply{
Message: []byte{}, // autoparser clears raw message
ChatDeltas: []*pb.ChatDelta{
{ReasoningContent: string(r)},
},
}); err != nil {
return err
}
}
// Phase 2: Emit tool call via ChatDeltas (no raw message)
if err := stream.Send(&pb.Reply{
Message: []byte{}, // autoparser clears raw message
ChatDeltas: []*pb.ChatDelta{
{
ToolCalls: []*pb.ToolCallDelta{
{
Index: 0,
Id: "call_mock_123",
Name: toolName,
Arguments: `{"query":"localai"}`,
},
},
},
},
}); err != nil {
return err
}
return nil
}
// Simulate C++ autoparser behavior: content delivered via ChatDeltas
// with empty message (autoparser clears raw message during parsing).
if strings.Contains(in.Prompt, "AUTOPARSER_CONTENT") {
// Phase 1: Stream reasoning via ChatDeltas
reasoning := "Let me compose a response."
for _, r := range reasoning {
if err := stream.Send(&pb.Reply{
Message: []byte{},
ChatDeltas: []*pb.ChatDelta{
{ReasoningContent: string(r)},
},
}); err != nil {
return err
}
}
// Phase 2: Stream content via ChatDeltas (no raw message)
content := "LocalAI is an open-source AI platform."
for _, r := range content {
if err := stream.Send(&pb.Reply{
Message: []byte{},
ChatDeltas: []*pb.ChatDelta{
{Content: string(r)},
},
}); err != nil {
return err
}
}
return nil
}
var toStream string
toolName := mockToolNameFromRequest(in)
if toolName != "" && !promptHasToolResults(in.Prompt) {

View File

@@ -2,6 +2,7 @@ package e2e_test
import (
"context"
"encoding/json"
"io"
"net/http"
"strings"
@@ -265,4 +266,201 @@ var _ = Describe("Mock Backend E2E Tests", Label("MockBackend"), func() {
}
})
})
Describe("Autoparser ChatDelta Streaming", Label("Autoparser"), func() {
// These tests verify that when the C++ autoparser handles tool calls
// and content via ChatDeltas (with empty raw message), the streaming
// endpoint does NOT unnecessarily retry. This is a regression test for
// the bug where the retry logic only checked Go-side parsing, ignoring
// ChatDelta results, causing up to 6 retries and concatenated output.
Context("Streaming with tools and ChatDelta tool calls", func() {
It("should return tool calls without unnecessary retries", func() {
body := `{
"model": "mock-model-autoparser",
"messages": [{"role": "user", "content": "AUTOPARSER_TOOL_CALL"}],
"tools": [{"type": "function", "function": {"name": "search_collections", "description": "Search documents", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}}],
"stream": true
}`
req, err := http.NewRequest("POST", apiURL+"/chat/completions", strings.NewReader(body))
Expect(err).ToNot(HaveOccurred())
req.Header.Set("Content-Type", "application/json")
httpClient := &http.Client{Timeout: 60 * time.Second}
resp, err := httpClient.Do(req)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
bodyStr := string(data)
// Parse all SSE events
lines := strings.Split(bodyStr, "\n")
var toolCallChunks int
var reasoningChunks int
hasFinishReason := false
for _, line := range lines {
line = strings.TrimSpace(line)
if !strings.HasPrefix(line, "data: ") || line == "data: [DONE]" {
continue
}
jsonData := strings.TrimPrefix(line, "data: ")
var chunk map[string]any
if err := json.Unmarshal([]byte(jsonData), &chunk); err != nil {
continue
}
choices, ok := chunk["choices"].([]any)
if !ok || len(choices) == 0 {
continue
}
choice := choices[0].(map[string]any)
delta, _ := choice["delta"].(map[string]any)
if delta == nil {
continue
}
if _, ok := delta["tool_calls"]; ok {
toolCallChunks++
}
if _, ok := delta["reasoning"]; ok {
reasoningChunks++
}
if fr, ok := choice["finish_reason"].(string); ok && fr != "" {
hasFinishReason = true
}
}
// The key assertion: tool calls from ChatDeltas should be present
Expect(toolCallChunks).To(BeNumerically(">", 0),
"Expected tool_calls in streaming response from ChatDeltas, but got none. "+
"This likely means the retry logic discarded ChatDelta tool calls.")
// Should have a finish reason
Expect(hasFinishReason).To(BeTrue(), "Expected a finish_reason in the streaming response")
// Reasoning should be present (from ChatDelta reasoning)
Expect(reasoningChunks).To(BeNumerically(">", 0),
"Expected reasoning deltas from ChatDeltas")
})
})
Context("Streaming with tools and ChatDelta content (no tool calls)", func() {
It("should return content without retrying and without concatenation", func() {
body := `{
"model": "mock-model-autoparser",
"messages": [{"role": "user", "content": "AUTOPARSER_CONTENT"}],
"tools": [{"type": "function", "function": {"name": "search_collections", "description": "Search documents", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}}],
"stream": true
}`
req, err := http.NewRequest("POST", apiURL+"/chat/completions", strings.NewReader(body))
Expect(err).ToNot(HaveOccurred())
req.Header.Set("Content-Type", "application/json")
httpClient := &http.Client{Timeout: 60 * time.Second}
resp, err := httpClient.Do(req)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
bodyStr := string(data)
// Parse all SSE events and collect content
lines := strings.Split(bodyStr, "\n")
var contentParts []string
var reasoningParts []string
for _, line := range lines {
line = strings.TrimSpace(line)
if !strings.HasPrefix(line, "data: ") || line == "data: [DONE]" {
continue
}
jsonData := strings.TrimPrefix(line, "data: ")
var chunk map[string]any
if err := json.Unmarshal([]byte(jsonData), &chunk); err != nil {
continue
}
choices, ok := chunk["choices"].([]any)
if !ok || len(choices) == 0 {
continue
}
choice := choices[0].(map[string]any)
delta, _ := choice["delta"].(map[string]any)
if delta == nil {
continue
}
if content, ok := delta["content"].(string); ok && content != "" {
contentParts = append(contentParts, content)
}
if reasoning, ok := delta["reasoning"].(string); ok && reasoning != "" {
reasoningParts = append(reasoningParts, reasoning)
}
}
fullContent := strings.Join(contentParts, "")
fullReasoning := strings.Join(reasoningParts, "")
// Content should be present and match the expected answer
Expect(fullContent).To(ContainSubstring("LocalAI"),
"Expected content from ChatDeltas to contain 'LocalAI'. "+
"The retry logic may have discarded ChatDelta content.")
// Content should NOT be duplicated (no retry concatenation)
occurrences := strings.Count(fullContent, "LocalAI is an open-source AI platform.")
Expect(occurrences).To(Equal(1),
"Expected content to appear exactly once, but found %d occurrences. "+
"This indicates unnecessary retries are concatenating output.", occurrences)
// Reasoning should be present
Expect(fullReasoning).To(ContainSubstring("compose"),
"Expected reasoning content from ChatDeltas")
})
})
Context("Non-streaming with tools and ChatDelta tool calls", func() {
It("should return tool calls from ChatDeltas", func() {
body := `{
"model": "mock-model-autoparser",
"messages": [{"role": "user", "content": "AUTOPARSER_TOOL_CALL"}],
"tools": [{"type": "function", "function": {"name": "search_collections", "description": "Search documents", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}}}]
}`
req, err := http.NewRequest("POST", apiURL+"/chat/completions", strings.NewReader(body))
Expect(err).ToNot(HaveOccurred())
req.Header.Set("Content-Type", "application/json")
httpClient := &http.Client{Timeout: 60 * time.Second}
resp, err := httpClient.Do(req)
Expect(err).ToNot(HaveOccurred())
defer resp.Body.Close()
Expect(resp.StatusCode).To(Equal(200))
data, err := io.ReadAll(resp.Body)
Expect(err).ToNot(HaveOccurred())
var result map[string]any
Expect(json.Unmarshal(data, &result)).To(Succeed())
choices, ok := result["choices"].([]any)
Expect(ok).To(BeTrue())
Expect(choices).To(HaveLen(1))
choice := choices[0].(map[string]any)
msg, _ := choice["message"].(map[string]any)
Expect(msg).ToNot(BeNil())
toolCalls, ok := msg["tool_calls"].([]any)
Expect(ok).To(BeTrue(),
"Expected tool_calls in non-streaming response from ChatDeltas, "+
"but got: %s", string(data))
Expect(toolCalls).To(HaveLen(1))
tc := toolCalls[0].(map[string]any)
fn, _ := tc["function"].(map[string]any)
Expect(fn["name"]).To(Equal("search_collections"))
})
})
})
})