From 3bc5ae8da694bcfb3b374d8abb87915b8b8905de Mon Sep 17 00:00:00 2001
From: Ettore Di Giacinto
Date: Thu, 7 May 2026 22:31:08 +0000
Subject: [PATCH] fix(tests/e2e-backends): bump ctx_size for llama-cpp
 transcription
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Qwen3-ASR-0.6B encodes the jfk.wav fixture into 777 audio tokens via its
mmproj, but the test harness defaulted BACKEND_TEST_CTX_SIZE to 512, so
llama.cpp server rejected every transcription request with "request (777
tokens) exceeds the available context size (512 tokens)".

Set BACKEND_TEST_CTX_SIZE=2048 on the llama-cpp transcription target only —
sherpa-onnx and vibevoice transcription targets don't go through llama.cpp's
slot/n_ctx and weren't failing.

Signed-off-by: Ettore Di Giacinto
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index 73d660266..8d97d675a 100644
--- a/Makefile
+++ b/Makefile
@@ -594,6 +594,7 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
 	BACKEND_TEST_MMPROJ_URL=https://huggingface.co/ggml-org/Qwen3-ASR-0.6B-GGUF/resolve/main/mmproj-Qwen3-ASR-0.6B-Q8_0.gguf \
 	BACKEND_TEST_AUDIO_URL=https://github.com/ggml-org/whisper.cpp/raw/master/samples/jfk.wav \
 	BACKEND_TEST_CAPS=health,load,transcription \
+	BACKEND_TEST_CTX_SIZE=2048 \
 	$(MAKE) test-extra-backend

 ## vllm is resolved from a HuggingFace model id (no file download) and