Compare commits

..

2 Commits

Author SHA1 Message Date
LocalAI [bot]
193d0e6aef fix(backends): darwin/metal support for supertonic (#10488)
The supertonic Go TTS backend dlopens ONNX Runtime, but its runtime and
packaging scripts were Linux-only: run.sh exported LD_LIBRARY_PATH, pointed
ONNXRUNTIME_LIB_PATH at libonnxruntime.so, and always tried the ld.so exec
path, while package.sh hard-failed on any non-Linux host. On macOS dyld has
no ld.so loader, uses DYLD_LIBRARY_PATH, and ONNX Runtime ships as a .dylib.

This applies the same purego .dylib/DYLD_LIBRARY_PATH fix that PR #10481
landed for 15 other ONNX/purego backends (sherpa-onnx, silero-vad, etc.) but
which omitted supertonic:

- run.sh: on darwin export DYLD_LIBRARY_PATH and point ONNXRUNTIME_LIB_PATH
  at libonnxruntime.dylib; guard the ld.so exec path to Linux only.
- package.sh: recognize Darwin instead of erroring out; the bundled .dylib is
  resolved via DYLD_LIBRARY_PATH, no glibc/ld.so to bundle.
- helper.go: platform-native default library extension (dylib on darwin) for
  the last-resort dlopen fallback.

It also wires the darwin CI build and gallery entries, resolving the
inconsistency where backend/index.yaml advertised metal for supertonic but no
includeDarwin matrix entry built the image:

- .github/backend-matrix.yml: add the -metal-darwin-arm64-supertonic Go entry.
- backend/index.yaml: declare metal capabilities and add the concrete
  metal-supertonic / metal-supertonic-development child entries.

The Makefile already detects Darwin/osx/arm64 and stages the per-OS ONNX
Runtime tarball, mirroring sherpa-onnx, so no Makefile change is required.


Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 22:19:03 +02:00
LocalAI [bot]
482314c623 fix(realtime): resolve model aliases for pipeline sub-models (#10484)
Realtime pipeline sub-models (llm/transcription/tts/vad/sound-detection)
were loaded via cl.LoadModelConfigFileByName without alias resolution,
unlike top-level API requests which resolve aliases in
core/http/middleware/request.go. So a pipeline that references an alias
(e.g. `pipeline.llm: default`, where `default` is an alias for a real
LLM) reached model loading as the alias stub with an empty Backend.

This was silently broken on a single host (it failed downstream) and a
hard error in distributed/p2p mode:

    routing model : loading model default: ... installing backend on
    node X: backend name is empty

Fix by routing every pipeline sub-model load through a small helper that
follows a single alias hop (mirroring the top-level resolution), so
non-alias sub-models behave identically and aliased ones get the
target's full config (Backend, Model, ...).

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 21:50:44 +02:00
11 changed files with 119 additions and 226 deletions

View File

@@ -4974,13 +4974,6 @@ includeDarwin:
- backend: "kitten-tts"
tag-suffix: "-metal-darwin-arm64-kitten-tts"
build-type: "mps"
# vLLM on Apple Silicon via vllm-metal (MLX). The install is custom
# (backend/python/vllm/install.sh has a darwin branch); lang stays python so
# backend_build_darwin.yml drives it through build-darwin-python-backend ->
# scripts/build/python-darwin.sh, which runs the backend's install.sh.
- backend: "vllm"
tag-suffix: "-metal-darwin-arm64-vllm"
build-type: "mps"
- backend: "piper"
tag-suffix: "-metal-darwin-arm64-piper"
build-type: "metal"
@@ -4997,6 +4990,10 @@ includeDarwin:
tag-suffix: "-metal-darwin-arm64-sherpa-onnx"
build-type: "metal"
lang: "go"
- backend: "supertonic"
tag-suffix: "-metal-darwin-arm64-supertonic"
build-type: "metal"
lang: "go"
- backend: "local-store"
tag-suffix: "-metal-darwin-arm64-local-store"
build-type: "metal"

View File

@@ -1,55 +0,0 @@
#!/bin/bash
# Bump the single vllm-metal pin (VLLM_METAL_VERSION) in the vLLM backend's
# darwin (Apple Silicon) install path. The macOS/Metal build
# (backend/python/vllm/install.sh, Darwin branch) installs vllm-metal, which is
# version-locked to a specific vLLM source release. install.sh derives that vLLM
# version at build time from vllm-metal's own installer (`vllm_v=`) at the pinned
# tag, so there is only ONE value to bump here -- mirroring bump_vllm_wheel.sh,
# which bumps the Linux cu130 wheel pin.
#
# This deliberately tracks vllm-project/vllm-metal, NOT vllm-project/vllm: the
# darwin build can only use the exact vLLM version vllm-metal supports, so it may
# lag the Linux pin (requirements-cublas13-after.txt) until vllm-metal catches up.
set -xe
REPO=$1 # vllm-project/vllm-metal
FILE=$2 # backend/python/vllm/install.sh
VAR=$3 # VLLM_METAL_VERSION (used for the workflow's output file names)
if [ -z "$FILE" ] || [ -z "$REPO" ] || [ -z "$VAR" ]; then
echo "usage: $0 <repo> <install-file> <var-name>" >&2
exit 1
fi
# vllm-metal ships frequent dev releases, all flagged as non-prerelease, so
# /releases/latest returns the newest one (with its cp312 wheel asset).
LATEST_TAG=$(curl -sS -H "Accept: application/vnd.github+json" \
"https://api.github.com/repos/$REPO/releases/latest" \
| python3 -c "import json,sys; print(json.load(sys.stdin)['tag_name'])")
# The coupled vLLM source version lives in vllm-metal's installer at that tag.
NEW_VLLM_VERSION=$(curl -fsSL \
"https://raw.githubusercontent.com/$REPO/$LATEST_TAG/install.sh" \
| grep -oE 'vllm_v="[0-9]+\.[0-9]+\.[0-9]+"' | head -1 | cut -d'"' -f2)
if [ -z "$LATEST_TAG" ] || [ -z "$NEW_VLLM_VERSION" ]; then
echo "Could not resolve vllm-metal tag ($LATEST_TAG) or its vllm_v ($NEW_VLLM_VERSION)." >&2
exit 1
fi
set +e
CURRENT_TAG=$(grep -oE 'VLLM_METAL_VERSION="[^"]*"' "$FILE" | head -1 | cut -d'"' -f2)
set -e
# Rewrite the single pin. install.sh derives VLLM_VERSION from this tag at build
# time, so there is nothing else to touch. peter-evans/create-pull-request opens
# no PR on a clean tree, so a no-op rewrite (already current) is safe.
sed -i "$FILE" \
-e "s|VLLM_METAL_VERSION=\"[^\"]*\"|VLLM_METAL_VERSION=\"$LATEST_TAG\"|"
if [ -z "$CURRENT_TAG" ]; then
echo "Could not find VLLM_METAL_VERSION=\"...\" in $FILE." >&2
exit 0
fi
echo "vllm-metal ${CURRENT_TAG} -> ${LATEST_TAG} (builds vLLM ${NEW_VLLM_VERSION}): https://github.com/$REPO/releases/tag/${LATEST_TAG}" >> "${VAR}_message.txt"
echo "${LATEST_TAG}" >> "${VAR}_commit.txt"

View File

@@ -154,39 +154,3 @@ jobs:
branch: "update/VLLM_VERSION"
body: ${{ steps.bump.outputs.message }}
signoff: true
bump-vllm-metal:
# The darwin (Apple Silicon) vLLM build installs vllm-metal, which is locked
# to a specific vLLM source release. install.sh pins both VLLM_METAL_VERSION
# (the wheel release) and VLLM_VERSION (the vLLM it builds against); this job
# tracks vllm-project/vllm-metal and rewrites both atomically. Separate from
# bump-vllm-wheel because darwin follows vllm-metal, not vllm/vllm latest.
if: github.repository == 'mudler/LocalAI'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v7
- name: Bump vllm-metal pin 🔧
id: bump
run: |
bash .github/bump_vllm_metal.sh vllm-project/vllm-metal backend/python/vllm/install.sh VLLM_METAL_VERSION
{
echo 'message<<EOF'
cat "VLLM_METAL_VERSION_message.txt"
echo EOF
} >> "$GITHUB_OUTPUT"
{
echo 'commit<<EOF'
cat "VLLM_METAL_VERSION_commit.txt"
echo EOF
} >> "$GITHUB_OUTPUT"
rm -rfv VLLM_METAL_VERSION_message.txt VLLM_METAL_VERSION_commit.txt
- name: Create Pull Request
uses: peter-evans/create-pull-request@v8
with:
token: ${{ secrets.UPDATE_BOT_TOKEN }}
push-to-fork: ci-forks/LocalAI
commit-message: ':arrow_up: Update vllm-project/vllm-metal (darwin)'
title: 'chore: :arrow_up: Update vllm-metal (darwin) to `${{ steps.bump.outputs.commit }}`'
branch: "update/VLLM_METAL_VERSION"
body: ${{ steps.bump.outputs.message }}
signoff: true

View File

@@ -16,6 +16,7 @@ import (
"os"
"path/filepath"
"regexp"
"runtime"
"strings"
"time"
"unicode"
@@ -943,7 +944,13 @@ func InitializeONNXRuntime() error {
}
}
if libPath == "" {
libPath = "/usr/local/lib/libonnxruntime.so"
// LocalAI: default to the platform-native shared library
// extension when nothing else is found (dyld vs ld.so).
if runtime.GOOS == "darwin" {
libPath = "/usr/local/lib/libonnxruntime.dylib"
} else {
libPath = "/usr/local/lib/libonnxruntime.so"
}
}
}
ort.SetSharedLibraryPath(libPath)

View File

@@ -32,6 +32,10 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ $(uname -s) = "Darwin" ]; then
# macOS: dyld resolves the bundled .dylib via DYLD_LIBRARY_PATH (set in
# run.sh); there is no ld.so loader nor glibc to bundle.
echo "Detected Darwin"
else
echo "Error: Could not detect architecture"
exit 1

View File

@@ -3,12 +3,19 @@ set -ex
CURDIR=$(dirname "$(realpath $0)")
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
if [ "$(uname)" = "Darwin" ]; then
# macOS uses dyld: there is no ld.so loader, and the search path env
# var is DYLD_LIBRARY_PATH. ONNX Runtime ships as a .dylib here.
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.dylib
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
fi
fi
exec $CURDIR/supertonic "$@"

View File

@@ -645,7 +645,6 @@
nvidia-cuda-13: "cuda13-vllm"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm"
cpu: "cpu-vllm"
metal: "metal-vllm"
- &sglang
name: "sglang"
license: apache-2.0
@@ -1570,6 +1569,7 @@
- TTS
capabilities:
default: "cpu-supertonic"
metal: "metal-supertonic"
- !!merge <<: *neutts
name: "neutts-development"
capabilities:
@@ -2928,17 +2928,6 @@
nvidia-cuda-13: "cuda13-vllm-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-development"
cpu: "cpu-vllm-development"
metal: "metal-vllm-development"
- !!merge <<: *vllm
name: "metal-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-vllm"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-vllm
- !!merge <<: *vllm
name: "metal-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-vllm"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-vllm
- !!merge <<: *vllm
name: "cuda12-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
@@ -5496,6 +5485,7 @@
name: "supertonic-development"
capabilities:
default: "cpu-supertonic-development"
metal: "metal-supertonic-development"
- !!merge <<: *supertonic
name: "cpu-supertonic"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-supertonic"
@@ -5506,3 +5496,13 @@
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-supertonic"
mirrors:
- localai/localai-backends:master-cpu-supertonic
- !!merge <<: *supertonic
name: "metal-supertonic"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-supertonic"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-supertonic
- !!merge <<: *supertonic
name: "metal-supertonic-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-supertonic"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-supertonic

View File

@@ -43,24 +43,6 @@ if [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# Apple Silicon (Metal/MLX) via vllm-metal.
# vllm-metal (github.com/vllm-project/vllm-metal) brings vLLM to macOS on Apple
# Silicon: it registers through vLLM's platform-plugin entry point
# (metal -> vllm_metal:register), MetalPlatform activates, and the vLLM v1
# AsyncLLM engine runs on the GPU through MLX. LocalAI's backend.py is UNCHANGED
# on darwin — AsyncEngineArgs(...) -> AsyncLLMEngine.from_engine_args transparently
# resolves to the MLX engine (proven on a real M4 / macOS 26.5 against Qwen3-0.6B).
#
# vllm-metal REQUIRES Python 3.12, so force the portable CPython before the venv
# is created (ensureVenv reads PYTHON_VERSION/PYTHON_PATCH/PY_STANDALONE_TAG).
# The patch + standalone tag mirror the l4t13 cp312 pin — a known-good
# python-build-standalone release that also ships an aarch64-apple-darwin asset.
if [ "$(uname -s)" = "Darwin" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
fi
# JetPack 7 / L4T arm64 vllm + torch wheels come straight from PyPI now
# (torch 2.11+ ships aarch64 + cu130 manylinux wheels and vllm 0.20+ ships
# an aarch64 wheel pinned to that torch). They're cp312-only, so bump the
@@ -75,92 +57,11 @@ if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PY_STANDALONE_TAG="20251120"
fi
# ===================== Apple Silicon (Metal/MLX) =====================
# Reproduce vllm-metal's upstream installer
# (curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh)
# but INTO LocalAI's managed venv (ensureVenv) instead of a throwaway
# ~/.venv-vllm-metal, so the backend integrates with LocalAI's venv lifecycle
# (portable CPython, _makeVenvPortable relocation, runtime activation). The
# normal CUDA/CPU installRequirements is skipped on darwin — there is no
# macOS/arm64 vLLM wheel on PyPI; vLLM is built from source and the MLX engine
# is layered on by the vllm-metal wheel.
if [ "$(uname -s)" = "Darwin" ]; then
# Create/activate the portable 3.12 venv. On darwin USE_PIP=true and
# PORTABLE_PYTHON=true (set by scripts/build/python-darwin.sh), so this is a
# `python -m venv` based, relocatable venv.
ensureVenv
# vllm-metal's installer drives everything through `uv`: building vLLM from
# the CPU requirements needs `--index-strategy unsafe-best-match` (mixes the
# pytorch CPU channel with PyPI), a flag plain pip does not have. The darwin
# venv is pip-based, so bootstrap uv into it. uv honours $VIRTUAL_ENV (set by
# libbackend's _activateVenv) and installs into THIS venv — same pattern the
# intel branch below relies on.
pip install uv
# The ONLY darwin version pin -- AUTO-BUMPED by .github/bump_vllm_metal.sh,
# which tracks vllm-project/vllm-metal releases (NOT vllm/vllm latest). Keep
# it as a plain double-quoted assignment on its own line so the bumper's sed
# can rewrite it. Darwin therefore follows vllm-metal and can lag the Linux
# vllm pin (requirements-cublas13-after.txt, bumped independently against
# vllm/vllm) until vllm-metal supports a newer vLLM.
VLLM_METAL_VERSION="v0.3.0.dev20260622062346"
# The coupled vLLM source version is whatever this vllm-metal release builds
# against -- it declares it in its own installer as `vllm_v=`. Derive it from
# the PINNED tag rather than hardcoding a second value that could drift. The
# tag is immutable, so this stays reproducible across rebuilds.
VLLM_VERSION=$(curl -fsSL "https://raw.githubusercontent.com/vllm-project/vllm-metal/${VLLM_METAL_VERSION}/install.sh" \
| grep -oE 'vllm_v="[0-9]+\.[0-9]+\.[0-9]+"' | head -n1 | cut -d'"' -f2)
if [ -z "${VLLM_VERSION}" ]; then
echo "ERROR: could not derive the vLLM version from vllm-metal ${VLLM_METAL_VERSION}" >&2
exit 1
fi
echo "vllm-metal ${VLLM_METAL_VERSION} builds against vLLM ${VLLM_VERSION}"
_vllm_src=$(mktemp -d)
trap 'rm -rf "${_vllm_src}"' EXIT
pushd "${_vllm_src}"
# 1) Build vLLM ${VLLM_VERSION} from the release source tarball against
# the CPU requirements. vllm-metal layers its MLX platform plugin on
# top of this exact build.
curl -fsSL -o "vllm-${VLLM_VERSION}.tar.gz" \
"https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}.tar.gz"
tar -xzf "vllm-${VLLM_VERSION}.tar.gz"
pushd "vllm-${VLLM_VERSION}"
uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match
# -Wno-parentheses: clang on macOS treats one of vLLM's C++ warnings
# as an error without it (matches the upstream installer's CXXFLAGS).
CXXFLAGS="-Wno-parentheses" uv pip install .
popd
popd
# 2) Install the prebuilt vllm-metal wheel from the PINNED release
# (${VLLM_METAL_VERSION}). It pulls mlx / mlx-metal as deps and registers
# the `metal` platform plugin that backend.py resolves to at engine-init
# time. Pinning the tag (vs releases/latest) keeps the wheel and the vLLM
# source build above reproducible and coupled; .github/bump_vllm_metal.sh
# advances both together.
_metal_wheel_url=$(curl -fsSL "https://api.github.com/repos/vllm-project/vllm-metal/releases/tags/${VLLM_METAL_VERSION}" \
| grep -oE '"browser_download_url"[[:space:]]*:[[:space:]]*"[^"]+\.whl"' \
| head -n1 | sed -E 's/.*"(https[^"]+)".*/\1/')
if [ -z "${_metal_wheel_url}" ]; then
echo "ERROR: could not resolve a vllm-metal wheel URL for release ${VLLM_METAL_VERSION}" >&2
exit 1
fi
echo "Installing vllm-metal wheel: ${_metal_wheel_url}"
uv pip install "${_metal_wheel_url}"
# Generate the gRPC stubs (backend_pb2*). installRequirements normally does
# this via runProtogen at the end; we skipped installRequirements on darwin,
# so call it explicitly here.
runProtogen
# Intel XPU has no upstream-published vllm wheels, so we always build vllm
# from source against torch-xpu and replace the default triton with
# triton-xpu (matching torch 2.11). Mirrors the upstream procedure:
# https://github.com/vllm-project/vllm/blob/main/docs/getting_started/installation/gpu.xpu.inc.md
elif [ "x${BUILD_TYPE}" == "xintel" ]; then
if [ "x${BUILD_TYPE}" == "xintel" ]; then
# Hide requirements-intel-after.txt so installRequirements doesn't
# try `pip install vllm` (would either fail or grab a non-XPU wheel).
_intel_after="${backend_dir}/requirements-intel-after.txt"

View File

@@ -4,7 +4,4 @@
# instead — the cublas13 case in install.sh adds --index-strategy=unsafe-best-match
# so uv consults this index alongside PyPI.
--extra-index-url https://wheels.vllm.ai/0.23.0/cu130
# VERSION COUPLING: darwin/Apple-Silicon builds use vllm-metal (see install.sh),
# which pins this exact vLLM version. Bumping vllm here means coordinating with a
# vllm-metal release that supports the new version, or macOS/Metal builds break.
vllm==0.23.0

View File

@@ -432,7 +432,7 @@ func loadSoundDetectionConfig(pipeline *config.Pipeline, cl *config.ModelConfigL
if pipeline.SoundDetection == "" {
return nil, nil
}
cfg, err := cl.LoadModelConfigFileByName(pipeline.SoundDetection, ml.ModelPath)
cfg, err := loadPipelineSubModel(cl, pipeline.SoundDetection, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load sound detection config: %w", err)
}
@@ -443,7 +443,7 @@ func loadSoundDetectionConfig(pipeline *config.Pipeline, cl *config.ModelConfigL
}
func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) (Model, *config.ModelConfig, error) {
cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath)
cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath)
if err != nil {
return nil, nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -453,7 +453,7 @@ func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfig
return nil, nil, fmt.Errorf("failed to validate config: %w", err)
}
cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath)
cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath)
if err != nil {
return nil, nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -542,11 +542,30 @@ func buildRealtimeRoutingContext(a *application.Application, sessionID string) *
}
}
// loadPipelineSubModel loads a pipeline sub-model config by name and follows a
// single alias hop, so a pipeline that references an alias (e.g. `llm: default`)
// gets the alias target's full config (Backend, Model, ...) rather than the
// alias stub with an empty Backend. Without this the alias survives unresolved
// into model loading and fails downstream — notably in distributed mode with
// "backend name is empty". Mirrors the top-level alias resolution in
// core/http/middleware/request.go.
func loadPipelineSubModel(cl *config.ModelConfigLoader, name, modelPath string) (*config.ModelConfig, error) {
cfg, err := cl.LoadModelConfigFileByName(name, modelPath)
if err != nil {
return nil, err
}
resolved, _, err := cl.ResolveAlias(cfg)
if err != nil {
return nil, err
}
return resolved, nil
}
// returns and loads either a wrapped model or a model that support audio-to-audio
func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, evaluator *templates.Evaluator, routing *RealtimeRoutingContext) (Model, error) {
xlog.Debug("Creating new model pipeline model", "pipeline", pipeline)
cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath)
cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -557,7 +576,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
}
// TODO: Do we always need a transcription model? It can be disabled. Note that any-to-any instruction following models don't transcribe as such, so if transcription is required it is a separate process
cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath)
cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -589,7 +608,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
xlog.Debug("Loading a wrapped model")
// Otherwise we want to return a wrapped model, which is a "virtual" model that re-uses other models to perform operations
cfgLLM, err := cl.LoadModelConfigFileByName(pipeline.LLM, ml.ModelPath)
cfgLLM, err := loadPipelineSubModel(cl, pipeline.LLM, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -604,7 +623,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
applyPipelineReasoning(cfgLLM, *pipeline)
applyPipelineThinking(cfgLLM, *pipeline)
cfgTTS, err := cl.LoadModelConfigFileByName(pipeline.TTS, ml.ModelPath)
cfgTTS, err := loadPipelineSubModel(cl, pipeline.TTS, ml.ModelPath)
if err != nil {
return nil, fmt.Errorf("failed to load backend config: %w", err)

View File

@@ -0,0 +1,52 @@
package openai
import (
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
)
// loadPipelineSubModel must resolve a pipeline sub-model that references an
// alias (e.g. `llm: default`) one hop to the alias target's full config — so
// the effective backend is the target's backend, not the empty backend of the
// alias stub. This mirrors the top-level alias resolution done in
// core/http/middleware/request.go, which the realtime pipeline previously
// skipped (failing in distributed mode with "backend name is empty").
var _ = Describe("loadPipelineSubModel", func() {
It("resolves a sub-model alias one hop to the target's config", func() {
tmpDir := GinkgoT().TempDir()
// A real model config with a concrete backend.
realLLM := `name: real-llm
backend: llama-cpp
parameters:
model: real-llm.gguf
`
Expect(os.WriteFile(filepath.Join(tmpDir, "real-llm.yaml"), []byte(realLLM), 0644)).To(Succeed())
// An alias pointing at the real model.
aliasCfg := `name: default
alias: real-llm
`
Expect(os.WriteFile(filepath.Join(tmpDir, "default.yaml"), []byte(aliasCfg), 0644)).To(Succeed())
cl := config.NewModelConfigLoader(tmpDir)
Expect(cl.LoadModelConfigsFromPath(tmpDir)).To(Succeed())
// Resolving the alias must follow the hop to the target's full config.
resolved, err := loadPipelineSubModel(cl, "default", tmpDir)
Expect(err).NotTo(HaveOccurred())
Expect(resolved.IsAlias()).To(BeFalse())
Expect(resolved.Backend).To(Equal("llama-cpp"))
// A non-alias name must load unchanged.
direct, err := loadPipelineSubModel(cl, "real-llm", tmpDir)
Expect(err).NotTo(HaveOccurred())
Expect(direct.Backend).To(Equal("llama-cpp"))
Expect(direct.Name).To(Equal("real-llm"))
})
})