fix(tests/e2e-backends): bump ctx_size for llama-cpp transcription
Qwen3-ASR-0.6B encodes the jfk.wav fixture into 777 audio tokens via its mmproj, but the test harness defaulted BACKEND_TEST_CTX_SIZE to 512, so the llama.cpp server rejected every transcription request with "request (777 tokens) exceeds the available context size (512 tokens)".

Set BACKEND_TEST_CTX_SIZE=2048 on the llama-cpp transcription target only; the sherpa-onnx and vibevoice transcription targets don't go through llama.cpp's slot/n_ctx and weren't failing.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
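For context, a minimal sketch of reproducing the rejection against a standalone llama.cpp server (the -m, --mmproj, and -c flags are upstream llama-server's; the model filename and this exact invocation are illustrative assumptions, not the harness's actual command line):

    llama-server \
      -m Qwen3-ASR-0.6B-Q8_0.gguf \
      --mmproj mmproj-Qwen3-ASR-0.6B-Q8_0.gguf \
      -c 512    # jfk.wav encodes to 777 audio tokens, so any request exceeds n_ctx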
Makefile | 1 +
1 file changed, 1 insertion(+)
@@ -594,6 +594,7 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
 	BACKEND_TEST_MMPROJ_URL=https://huggingface.co/ggml-org/Qwen3-ASR-0.6B-GGUF/resolve/main/mmproj-Qwen3-ASR-0.6B-Q8_0.gguf \
 	BACKEND_TEST_AUDIO_URL=https://github.com/ggml-org/whisper.cpp/raw/master/samples/jfk.wav \
 	BACKEND_TEST_CAPS=health,load,transcription \
+	BACKEND_TEST_CTX_SIZE=2048 \
 	$(MAKE) test-extra-backend
 
 ## vllm is resolved from a HuggingFace model id (no file download) and
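For readability, a sketch of how the recipe reads after this change (assignments above the hunk's context are unchanged and omitted here; each env assignment is a shell-level prefix feeding the $(MAKE) sub-invocation, which is why the override lives in the recipe rather than on make's command line):

    test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp
    	# (earlier BACKEND_TEST_* assignments from the unchanged part of the recipe omitted)
    	BACKEND_TEST_MMPROJ_URL=https://huggingface.co/ggml-org/Qwen3-ASR-0.6B-GGUF/resolve/main/mmproj-Qwen3-ASR-0.6B-Q8_0.gguf \
    	BACKEND_TEST_AUDIO_URL=https://github.com/ggml-org/whisper.cpp/raw/master/samples/jfk.wav \
    	BACKEND_TEST_CAPS=health,load,transcription \
    	BACKEND_TEST_CTX_SIZE=2048 \
    	$(MAKE) test-extra-backend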