Compare commits


14 Commits

Author SHA1 Message Date
Ettore Di Giacinto
5ad4d86ec4 fix(hipblas): correct amdgpu.ids source package name in comment
Verified against the real rocm/dev-ubuntu-24.04:7.2.1 image with
hipblas-dev/hipblaslt-dev/rocblas-dev installed: /usr/share/libdrm/amdgpu.ids
is owned by libdrm-common, not libdrm-amdgpu1 as the comment said.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-01 20:16:57 +00:00
Ettore Di Giacinto
637e382f04 fix(hipblas): symlink amdgpu.ids so ROCm backends find the ASIC ID table
ROCm's bundled libdrm_amdgpu looks up the GPU ASIC ID table at a
hardcoded fallback path, /opt/amdgpu/share/libdrm/amdgpu.ids, which is
only populated by AMD's full amdgpu-install (graphics/DKMS) stack. The
hipblas image is compute-only and doesn't have it, so every model load
logs "No such file or directory" and the GPU can't be identified.
Symlink it to the equivalent file already shipped by Ubuntu's
libdrm-amdgpu1 package.

Fixes #10624

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-01 20:06:41 +00:00
LocalAI [bot]
703ea32de6 chore: ⬆️ Update vllm-metal (darwin) to v0.3.0.dev20260630095652 (#10616)
⬆️ Update vllm-project/vllm-metal (darwin)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 21:56:59 +02:00
LocalAI [bot]
751db06e35 chore: ⬆️ Update CrispStrobe/CrispASR to 8fd9db8fec8cb5e929d23d3267ed5817794feb1a (#10615)
⬆️ Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 21:56:41 +02:00
LocalAI [bot]
f46c0e9c83 docs: ⬆️ update docs version mudler/LocalAI (#10614)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 21:56:21 +02:00
LocalAI [bot]
0d8adfc59a chore: ⬆️ Update ggml-org/llama.cpp to 0eca4d490e591d4e93058d07540cf47278a72577 (#10617)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 09:31:50 +02:00
LocalAI [bot]
43f2615e19 chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.24.0 (#10618)
⬆️ Update vllm-project/vllm cu130 wheel

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 08:53:03 +02:00
LocalAI [bot]
875c539ad5 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 29431b31c89e79c10f8736e8f2742485ba1713d6 (#10620)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 08:52:36 +02:00
LocalAI [bot]
d641ded194 chore: ⬆️ Update ggml-org/whisper.cpp to 0874de3e8e8e48361dba85c7fe6d176f008bf158 (#10621)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 08:43:40 +02:00
LocalAI [bot]
40445fff05 chore: ⬆️ Update leejet/stable-diffusion.cpp to 484baa41e5e006c52dcd4addc38c830b9489745f (#10619)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(stablediffusion-ggml): adapt to new generate_image() out-param signature

leejet/stable-diffusion.cpp@484baa4 changed generate_image() from
returning sd_image_t* to returning bool with images_out/num_images_out
out-parameters (same pattern already used by generate_video()).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-01 08:32:57 +02:00
Tai An
057dee956a fix(launcher): keep data/config under ~/.localai (#10610) (#10613)
The launcher starts the server with run --models-path/--backends-path but
leaves --data-path and the dynamic config dir unset, so the server falls
back to its default data and configuration directories under the server's
base path. That base path is kong.ExpandPath("."), i.e. the launcher process CWD
(commonly the user's home root), producing ~/data and ~/configuration
outside ~/.localai and an agent-pool stateDir under ~/data.

Pass --data-path and --localai-config-dir explicitly, rooted at the
launcher's own data directory (GetDataPath() -> ~/.localai), so data and
config stay consistent with --models-path/--backends-path.
2026-06-30 22:14:59 +02:00
Adira
4ec39bb776 fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602) (#10607)
* fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602)

When the watchdog evicts a model, deleteProcess calls the backend's gRPC
Free() to release VRAM before stopping the process. Free is optional:
backends that don't override it -- the generated UnimplementedBackendServer
stub, many Python/external backends, or a federation proxy in distributed
mode -- return gRPC Unimplemented. That is expected, not a failure: VRAM is
reclaimed when the local process is stopped, or by the remote unloader for
remote backends. Logging it as "WARN Error freeing GPU resources" made a
benign, optional RPC look like a fault (the alarming line in #10602, seen
in distributed mode where the model is remote and Free hits a stub).

Treat gRPC Unimplemented from Free() as a no-op logged at Debug; genuine
failures still Warn. Free() is still attempted for every backend, so any
backend that does implement it is unaffected.

Add a reusable grpcerrors.IsUnimplemented helper following the package's
existing code-based detection idiom (prefer the typed status code, fall
back to the message across non-gRPC boundaries), with table tests.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>

* fix(watchdog): log a non-Unimplemented Free() failure at error level

Per review: now that the expected gRPC Unimplemented case is split out and
logged at Debug, any remaining Free() error is a genuine failure to release
VRAM, so surface it at error level instead of warn.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>

---------

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-06-30 22:14:01 +02:00
Ettore Di Giacinto
25ecb9f015 fix(gallery): use Q8_0 for lfm2.5-8b-a1b to fix poor tool-call quality
The Q4_K_M quant degraded tool-call reliability for LFM2.5-8B-A1B.
Switch the gallery entry to the Q8_0 GGUF (sha256 verified via HF
x-linked-etag) while keeping the native jinja tool-parsing config.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-30 17:46:20 +00:00
LocalAI [bot]
2be495f9c0 fix(kokoros): implement AudioTranscriptionLive trait stub (#10612)
The backend.proto AudioTranscriptionLive bidirectional streaming RPC added
new required trait items (AudioTranscriptionLiveStream + audio_transcription_live)
on the generated Backend trait. The kokoros (TTS) backend did not implement
them, breaking its release build with E0046 (missing trait items).

kokoros is text-to-speech and has no live-ASR support, so stub the method to
return UNIMPLEMENTED, mirroring the existing audio_transcription_stream stub.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-30 19:38:41 +02:00
16 changed files with 91 additions and 16 deletions


@@ -171,6 +171,17 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
; fi
+# ROCm's bundled libdrm_amdgpu is built with a hardcoded fallback lookup path
+# for the ASIC ID table (/opt/amdgpu/share/libdrm/amdgpu.ids), which only exists
+# if AMD's full amdgpu graphics/DKMS stack is installed. This compute-only image
+# doesn't have it, so hipblas/rocBLAS log "No such file or directory" on every
+# model load and can fail to identify the GPU. Point it at the equivalent file
+# Ubuntu's libdrm-common package already ships.
+RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ -f /usr/share/libdrm/amdgpu.ids ] && [ ! -e /opt/amdgpu/share/libdrm/amdgpu.ids ]; then \
+mkdir -p /opt/amdgpu/share/libdrm && \
+ln -s /usr/share/libdrm/amdgpu.ids /opt/amdgpu/share/libdrm/amdgpu.ids \
+; fi
RUN expr "${BUILD_TYPE}" = intel && echo "intel" > /run/localai/capability || echo "not intel"
# Cuda


@@ -1,5 +1,5 @@
-IK_LLAMA_VERSION?=f74a6fb87b315b2c3154166e075360e15021a61d
+IK_LLAMA_VERSION?=29431b31c89e79c10f8736e8f2742485ba1713d6
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=


@@ -1,5 +1,5 @@
-LLAMA_VERSION?=6f4f53f2b7da54fcdbbecaaa734337c337ad6176
+LLAMA_VERSION?=0eca4d490e591d4e93058d07540cf47278a72577
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=


@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# CrispASR version (release tag)
CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=3b93758f9725d400eca82976f895e4cec3f31260
+CRISPASR_VERSION?=8fd9db8fec8cb5e929d23d3267ed5817794feb1a
SO_TARGET?=libgocrispasr.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF


@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=3b6c9ca97cfcda8e68e719e6670d06379fcbe943
+STABLEDIFFUSION_GGML_VERSION?=484baa41e5e006c52dcd4addc38c830b9489745f
CMAKE_ARGS+=-DGGML_MAX_NAME=128


@@ -798,6 +798,7 @@ void sd_img_gen_params_set_seed(sd_img_gen_params_t *params, int64_t seed) {
int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, char *src_image, float strength, char *mask_image, char* ref_images[], int ref_images_count) {
sd_image_t* results;
+int num_results_out = 0;
std::vector<int> skip_layers = {7, 8, 9};
@@ -994,10 +995,14 @@ int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, cha
sd_ctx_params_to_str(&ctx_params),
sd_img_gen_params_to_str(p));
-results = generate_image(sd_c, p);
+bool gen_ok = generate_image(sd_c, p, &results, &num_results_out);
std::free(p);
+if (!gen_ok || num_results_out == 0) {
+results = NULL;
+}
if (results == NULL) {
fprintf (stderr, "NO results\n");
if (input_image_buffer) free(input_image_buffer);
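
The adaptation keeps the old "NULL means no results" contract. It folds both failure signals, a false return or zero images, into a null result. Below is a Go sketch of that normalization. It uses hypothetical `Image` and `generator` types in place of sd_image_t and generate_image(). Idiomatic Go would return `([]Image, error)`, but the out-parameters are kept here to mirror the C signature the commit describes.

```go
package main

import "fmt"

// Image is a hypothetical stand-in for sd_image_t.
type Image struct{ Width, Height int }

// generator models the new generate_image() contract: a success flag,
// with the images and their count written through out-parameters.
type generator func(imagesOut *[]Image, numImagesOut *int) bool

// collectResults mirrors the adapted gen_image(): a false return or a zero
// count is normalized to "no results" (nil), so the existing NULL check
// downstream keeps working unchanged.
func collectResults(gen generator) []Image {
	var results []Image
	numResults := 0
	if ok := gen(&results, &numResults); !ok || numResults == 0 {
		return nil
	}
	return results
}

func main() {
	ok := collectResults(func(out *[]Image, n *int) bool {
		*out = []Image{{Width: 512, Height: 512}}
		*n = 1
		return true
	})
	failed := collectResults(func(out *[]Image, n *int) bool { return false })
	fmt.Println(len(ok), failed == nil)
}
```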


@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=0ae02cdb2c7317b50991367c165736ce42ed96ac
+WHISPER_CPP_VERSION?=0874de3e8e8e48361dba85c7fe6d176f008bf158
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF


@@ -104,7 +104,7 @@ if [ "$(uname -s)" = "Darwin" ]; then
# can rewrite it. Darwin therefore follows vllm-metal and can lag the Linux
# vllm pin (requirements-cublas13-after.txt, bumped independently against
# vllm/vllm) until vllm-metal supports a newer vLLM.
-VLLM_METAL_VERSION="v0.3.0.dev20260628073537"
+VLLM_METAL_VERSION="v0.3.0.dev20260630095652"
# The coupled vLLM source version is whatever this vllm-metal release builds
# against -- it declares it in its own installer as `vllm_v=`. Derive it from
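
The comment says the vllm-metal installer declares the coupled vLLM version as `vllm_v=`. Assume that declaration is a simple shell assignment at the start of a line, with optional quotes. Under that assumption, the value could be extracted as sketched below. The actual derivation in install.sh is cut off in this hunk and is not reproduced.

```go
package main

import (
	"fmt"
	"regexp"
)

// vllmVersionRe assumes a shell assignment such as vllm_v="0.24.0" (quotes
// optional) at the start of a line; the real installer format may differ.
var vllmVersionRe = regexp.MustCompile(`(?m)^[ \t]*vllm_v=["']?([^"'\s]+)`)

// coupledVLLMVersion returns the vLLM version an installer script declares.
func coupledVLLMVersion(installer string) (string, bool) {
	m := vllmVersionRe.FindStringSubmatch(installer)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	script := "#!/usr/bin/env bash\nset -e\nvllm_v=\"0.24.0\"\n"
	v, ok := coupledVLLMVersion(script)
	fmt.Println(v, ok)
}
```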


@@ -3,8 +3,8 @@
# on a cu130 host. Pull the cu130-flavoured wheel from vLLM's per-tag index
# instead — the cublas13 case in install.sh adds --index-strategy=unsafe-best-match
# so uv consults this index alongside PyPI.
---extra-index-url https://wheels.vllm.ai/0.23.0/cu130
+--extra-index-url https://wheels.vllm.ai/0.24.0/cu130
# VERSION COUPLING: darwin/Apple-Silicon builds use vllm-metal (see install.sh),
# which pins this exact vLLM version. Bumping vllm here means coordinating with a
# vllm-metal release that supports the new version, or macOS/Metal builds break.
-vllm==0.23.0
+vllm==0.24.0
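
The per-tag index URL and the `vllm==` pin have to move together, so a half-applied bump would leave them disagreeing. A hypothetical Go check for that is sketched below; nothing in this change says LocalAI runs one. It assumes the index URL keeps the `https://wheels.vllm.ai/<version>/<flavour>` shape seen above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const wheelIndexPrefix = "--extra-index-url https://wheels.vllm.ai/"

// vllmPins extracts the version tag from the per-tag wheel index URL and the
// vllm== requirement pin, and reports whether both are present and equal.
func vllmPins(requirements string) (indexVer, pinVer string, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(requirements))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, wheelIndexPrefix):
			// e.g. ".../0.24.0/cu130" -> "0.24.0"
			indexVer, _, _ = strings.Cut(strings.TrimPrefix(line, wheelIndexPrefix), "/")
		case strings.HasPrefix(line, "vllm=="):
			pinVer = strings.TrimPrefix(line, "vllm==")
		}
	}
	return indexVer, pinVer, indexVer != "" && indexVer == pinVer
}

func main() {
	reqs := `# comments are ignored
--extra-index-url https://wheels.vllm.ai/0.24.0/cu130
vllm==0.24.0
`
	idx, pin, ok := vllmPins(reqs)
	fmt.Println(idx, pin, ok)
}
```

Such a check only covers the Linux side. The darwin coupling to a vllm-metal release still needs the manual coordination the comment describes.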


@@ -351,6 +351,16 @@ impl Backend for KokorosService {
Err(Status::unimplemented("Not supported"))
}
+type AudioTranscriptionLiveStream =
+ReceiverStream<Result<backend::TranscriptLiveResponse, Status>>;
+async fn audio_transcription_live(
+&self,
+_: Request<tonic::Streaming<backend::TranscriptLiveRequest>>,
+) -> Result<Response<Self::AudioTranscriptionLiveStream>, Status> {
+Err(Status::unimplemented("Not supported"))
+}
async fn diarize(
&self,
_: Request<backend::DiarizeRequest>,


@@ -207,12 +207,20 @@ func (l *Launcher) StartLocalAI() error {
}
// Build command arguments
+dataPath := l.GetDataPath()
args := []string{
"run",
"--models-path", l.config.ModelsPath,
"--backends-path", l.config.BackendsPath,
"--address", l.config.Address,
"--log-level", l.config.LogLevel,
+// Keep persistent data and dynamic config under the launcher's data
+// directory (~/.localai) rather than letting the server resolve them
+// to ${basepath}/{data,configuration}. ${basepath} expands to the
+// launcher process's CWD (often the user's home root), which puts
+// ~/data and ~/configuration outside ~/.localai. See #10610.
+"--data-path", filepath.Join(dataPath, "data"),
+"--localai-config-dir", filepath.Join(dataPath, "configuration"),
}
l.localaiCmd = exec.CommandContext(l.ctx, binaryPath, args...)
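
The old and new behaviour differ only in which root the two directories are joined onto. The sketch below uses hypothetical helpers. It assumes a Unix-style home of /home/alice that is also the launcher's working directory.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// serverDefault models the old behaviour: ${basepath}/<name>, where
// ${basepath} is kong.ExpandPath("."), i.e. the process's working directory.
func serverDefault(cwd, name string) string {
	return filepath.Join(cwd, name)
}

// launcherPaths models the fix: both directories hang off the launcher's own
// data directory (GetDataPath(), i.e. ~/.localai), whatever the CWD is.
func launcherPaths(dataPath string) (data, config string) {
	return filepath.Join(dataPath, "data"), filepath.Join(dataPath, "configuration")
}

func main() {
	home := "/home/alice" // hypothetical home directory, also the launcher's CWD
	fmt.Println("before:", serverDefault(home, "data"), serverDefault(home, "configuration"))
	data, config := launcherPaths(filepath.Join(home, ".localai"))
	fmt.Println("after: ", data, config)
}
```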


@@ -1,3 +1,3 @@
{
-"version": "v4.5.5"
+"version": "v4.5.6"
}


@@ -1716,7 +1716,7 @@
- use_jinja:true
parameters:
min_p: 0.15
-model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
+model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
repeat_penalty: 1.05
temperature: 0.1
top_k: 50
@@ -1724,9 +1724,9 @@
template:
use_tokenizer_template: true
files:
-- filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
-uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q4_K_M.gguf
-sha256: 4923ec14f06b968b74d663e5949867d2d9c3bf13a20b8be1a9f9af39989b2bb0
+- filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
+uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q8_0.gguf
+sha256: 33ab3b8ce6a964fb8ebac89360c9b3cf72c4fa418d5e4c0a94d46883124d5c02
- name: "qwopus3.5-9b-coder-mtp"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:


@@ -58,6 +58,23 @@ func IsLiveTranscriptionUnsupported(err error) bool {
return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
}
+// IsUnimplemented reports whether err is a gRPC Unimplemented status — the
+// signal a backend gives for an RPC it does not implement. The generated
+// UnimplementedBackendServer stub returns exactly this for any RPC a backend
+// (e.g. a Python or external backend) has not overridden, so callers can treat
+// an optional RPC as a no-op rather than a failure. Prefers the typed status
+// code and falls back to the message for paths that lose the status (e.g. errors
+// wrapped across non-gRPC boundaries).
+func IsUnimplemented(err error) bool {
+if err == nil {
+return false
+}
+if status.Code(err) == codes.Unimplemented {
+return true
+}
+return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
+}
// StreamTranscriptionUnsupported returns the canonical error a backend returns
// when it (or the loaded model) cannot serve the server-streaming
// AudioTranscriptionStream RPC. It carries codes.Unimplemented like the live


@@ -55,6 +55,18 @@ var _ = Describe("grpcerrors", func() {
Expect(grpcerrors.IsModelNotLoaded(err)).To(BeFalse())
})
+DescribeTable("IsUnimplemented",
+func(err error, want bool) {
+Expect(grpcerrors.IsUnimplemented(err)).To(Equal(want))
+},
+Entry("nil", nil, false),
+Entry("typed code", status.Error(codes.Unimplemented, "method Free not implemented"), true),
+Entry("stale stub message (Unknown code)", errors.New("rpc error: code = Unimplemented desc = "), true),
+Entry("unrelated error", errors.New("context deadline exceeded"), false),
+Entry("unrelated grpc code", status.Error(codes.Unavailable, "connection refused"), false),
+Entry("model not loaded is NOT unimplemented", grpcerrors.ModelNotLoaded("parakeet-cpp"), false),
+)
It("StreamTranscriptionUnsupported carries Unimplemented and is not ModelNotLoaded", func() {
err := grpcerrors.StreamTranscriptionUnsupported("parakeet-cpp", "not a streaming model")
Expect(status.Code(err)).To(Equal(codes.Unimplemented))


@@ -11,6 +11,7 @@ import (
"time"
"github.com/hpcloud/tail"
+"github.com/mudler/LocalAI/pkg/grpc/grpcerrors"
"github.com/mudler/LocalAI/pkg/signals"
process "github.com/mudler/go-processmanager"
"github.com/mudler/xlog"
@@ -52,10 +53,21 @@ func (ml *ModelLoader) deleteProcess(s string) error {
hook(s)
}
-// Free GPU resources before stopping the process to ensure VRAM is released
+// Free GPU resources before stopping the process to ensure VRAM is released.
+// Free is optional: backends that don't override it (the generated stub, many
+// Python/external backends, or a federation proxy in distributed mode) return
+// gRPC Unimplemented. That is expected, not a failure — VRAM is reclaimed when
+// the process is stopped below, or by the remote unloader for remote backends —
+// so don't surface it as an error.
xlog.Debug("Calling Free() to release GPU resources", "model", s)
if err := model.GRPC(false, ml.wd).Free(context.Background()); err != nil {
-xlog.Warn("Error freeing GPU resources", "error", err, "model", s)
+if grpcerrors.IsUnimplemented(err) {
+xlog.Debug("Backend does not implement Free(); GPU release handled on process stop", "model", s)
+} else {
+// Now that the expected Unimplemented case is filtered out above, a
+// remaining error is a genuine failure to release VRAM — surface it.
+xlog.Error("Error freeing GPU resources", "error", err, "model", s)
+}
}
process := model.Process()