Compare commits

10 Commits

Author SHA1 Message Date
dependabot[bot]
e9154d4a3a chore(deps): bump the pip group across 5 directories with 1 update
Bumps the pip group with 1 update in the /backend/python/ace-step directory: torch.
Bumps the pip group with 1 update in the /backend/python/rfdetr directory: torch.
Bumps the pip group with 1 update in the /backend/python/sglang directory: torch.
Bumps the pip group with 1 update in the /backend/python/trl directory: torch.
Bumps the pip group with 1 update in the /backend/python/vllm-omni directory: torch.


Updates `torch` from 2.10.0+rocm7.0 to 2.12.0+cpu

Updates `torch` from 2.7.1 to 2.12.0+cu130

Updates `torch` from 2.9.0 to 2.12.0+cpu

Updates `torch` from 2.10.0 to 2.12.0+cpu

Updates `torch` from 2.7.0 to 2.12.0+cu130

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.12.0+cpu
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cu130
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cpu
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cpu
  dependency-type: direct:production
  dependency-group: pip
- dependency-name: torch
  dependency-version: 2.12.0+cu130
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-07-01 18:55:56 +00:00
LocalAI [bot]
0d8adfc59a chore: ⬆️ Update ggml-org/llama.cpp to 0eca4d490e591d4e93058d07540cf47278a72577 (#10617)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 09:31:50 +02:00
LocalAI [bot]
43f2615e19 chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.24.0 (#10618)
⬆️ Update vllm-project/vllm cu130 wheel

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 08:53:03 +02:00
LocalAI [bot]
875c539ad5 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 29431b31c89e79c10f8736e8f2742485ba1713d6 (#10620)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 08:52:36 +02:00
LocalAI [bot]
d641ded194 chore: ⬆️ Update ggml-org/whisper.cpp to 0874de3e8e8e48361dba85c7fe6d176f008bf158 (#10621)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-07-01 08:43:40 +02:00
LocalAI [bot]
40445fff05 chore: ⬆️ Update leejet/stable-diffusion.cpp to 484baa41e5e006c52dcd4addc38c830b9489745f (#10619)
* ⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(stablediffusion-ggml): adapt to new generate_image() out-param signature

leejet/stable-diffusion.cpp@484baa4 changed generate_image() from
returning sd_image_t* to returning bool with images_out/num_images_out
out-parameters (same pattern already used by generate_video()).

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-01 08:32:57 +02:00
Tai An
057dee956a fix(launcher): keep data/config under ~/.localai (#10610) (#10613)
The launcher starts the server with run --models-path/--backends-path but
leaves --data-path and the dynamic config dir unset, so the server falls
back to its ${basepath}/data and ${basepath}/configuration defaults.
${basepath} is kong.ExpandPath("."), i.e. the launcher process CWD
(commonly the user's home root), producing ~/data and ~/configuration
outside ~/.localai and an agent-pool stateDir under ~/data.

Pass --data-path and --localai-config-dir explicitly, rooted at the
launcher's own data directory (GetDataPath() -> ~/.localai), so data and
config stay consistent with --models-path/--backends-path.
2026-06-30 22:14:59 +02:00
Adira
4ec39bb776 fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602) (#10607)
* fix(watchdog): don't log optional Free() as an error when backend returns Unimplemented (#10602)

When the watchdog evicts a model, deleteProcess calls the backend's gRPC
Free() to release VRAM before stopping the process. Free is optional:
backends that don't override it -- the generated UnimplementedBackendServer
stub, many Python/external backends, or a federation proxy in distributed
mode -- return gRPC Unimplemented. That is expected, not a failure: VRAM is
reclaimed when the local process is stopped, or by the remote unloader for
remote backends. Logging it as "WARN Error freeing GPU resources" made a
benign, optional RPC look like a fault (the alarming line in #10602, seen
in distributed mode where the model is remote and Free hits a stub).

Treat gRPC Unimplemented from Free() as a no-op logged at Debug; genuine
failures still Warn. Free() is still attempted for every backend, so any
backend that does implement it is unaffected.

Add a reusable grpcerrors.IsUnimplemented helper following the package's
existing code-based detection idiom (prefer the typed status code, fall
back to the message across non-gRPC boundaries), with table tests.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>

* fix(watchdog): log a non-Unimplemented Free() failure at error level

Per review: now that the expected gRPC Unimplemented case is split out and
logged at Debug, any remaining Free() error is a genuine failure to release
VRAM, so surface it at error level instead of warn.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>

---------

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>
2026-06-30 22:14:01 +02:00
Ettore Di Giacinto
25ecb9f015 fix(gallery): use Q8_0 for lfm2.5-8b-a1b to fix poor tool-call quality
The Q4_K_M quant degraded tool-call reliability for LFM2.5-8B-A1B.
Switch the gallery entry to the Q8_0 GGUF (sha256 verified via HF
x-linked-etag) while keeping the native jinja tool-parsing config.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-30 17:46:20 +00:00
LocalAI [bot]
2be495f9c0 fix(kokoros): implement AudioTranscriptionLive trait stub (#10612)
The backend.proto AudioTranscriptionLive bidirectional streaming RPC added
new required trait items (AudioTranscriptionLiveStream + audio_transcription_live)
on the generated Backend trait. The kokoros (TTS) backend did not implement
them, breaking its release build with E0046 (missing trait items).

kokoros is text-to-speech and has no live-ASR support, so stub the method to
return UNIMPLEMENTED, mirroring the existing audio_transcription_stream stub.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-30 19:38:41 +02:00
31 changed files with 101 additions and 37 deletions

View File

@@ -1,5 +1,5 @@
-IK_LLAMA_VERSION?=f74a6fb87b315b2c3154166e075360e15021a61d
+IK_LLAMA_VERSION?=29431b31c89e79c10f8736e8f2742485ba1713d6
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

View File

@@ -1,5 +1,5 @@
-LLAMA_VERSION?=6f4f53f2b7da54fcdbbecaaa734337c337ad6176
+LLAMA_VERSION?=0eca4d490e591d4e93058d07540cf47278a72577
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=3b6c9ca97cfcda8e68e719e6670d06379fcbe943
+STABLEDIFFUSION_GGML_VERSION?=484baa41e5e006c52dcd4addc38c830b9489745f
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -798,6 +798,7 @@ void sd_img_gen_params_set_seed(sd_img_gen_params_t *params, int64_t seed) {
int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, char *src_image, float strength, char *mask_image, char* ref_images[], int ref_images_count) {
sd_image_t* results;
+int num_results_out = 0;
std::vector<int> skip_layers = {7, 8, 9};
@@ -994,10 +995,14 @@ int gen_image(sd_img_gen_params_t *p, int steps, char *dst, float cfg_scale, cha
sd_ctx_params_to_str(&ctx_params),
sd_img_gen_params_to_str(p));
-results = generate_image(sd_c, p);
+bool gen_ok = generate_image(sd_c, p, &results, &num_results_out);
 std::free(p);
+if (!gen_ok || num_results_out == 0) {
+results = NULL;
+}
if (results == NULL) {
fprintf (stderr, "NO results\n");
if (input_image_buffer) free(input_image_buffer);

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=0ae02cdb2c7317b50991367c165736ce42ed96ac
+WHISPER_CPP_VERSION?=0874de3e8e8e48361dba85c7fe6d176f008bf158
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -4,7 +4,7 @@ torchaudio
torchvision
# Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
diffusers
gradio
matplotlib>=3.7.5

View File

@@ -4,7 +4,7 @@ torchaudio
torchvision
# Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5

View File

@@ -4,7 +4,7 @@ torchaudio
torchvision
# Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5

View File

@@ -1,10 +1,10 @@
 --extra-index-url https://download.pytorch.org/whl/rocm7.0
-torch==2.10.0+rocm7.0
+torch==2.12.0+cpu
 torchaudio
 torchvision
 # Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5

View File

@@ -4,7 +4,7 @@ torchaudio
torchvision
# Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
diffusers
gradio
matplotlib>=3.7.5

View File

@@ -3,7 +3,7 @@ torch
torchaudio
torchvision
# Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
diffusers
gradio>=6.5.1
matplotlib>=3.7.5

View File

@@ -3,7 +3,7 @@ torchaudio
torchvision
# Core dependencies
-transformers>=4.51.0,<4.58.0
+transformers>=5.12.1,<5.13.0
diffusers
gradio
matplotlib>=3.7.5

View File

@@ -3,5 +3,5 @@ opencv-python
accelerate
peft
inference
-torch==2.7.1
+torch==2.12.0+cu130
optimum-quanto

View File

@@ -1,4 +1,4 @@
-torch==2.7.1
+torch==2.12.0+cu130
rfdetr
opencv-python
accelerate

View File

@@ -1,5 +1,5 @@
 --extra-index-url https://download.pytorch.org/whl/cu130
-torch==2.9.1
+torch==2.12.0+cu130
rfdetr
opencv-python
accelerate

View File

@@ -1,5 +1,5 @@
 --extra-index-url https://download.pytorch.org/whl/rocm7.0
-torch==2.10.0+rocm7.0
+torch==2.12.0+cu130
torchvision==0.25.0+rocm7.0
rfdetr
opencv-python

View File

@@ -1,4 +1,4 @@
-torch==2.7.1
+torch==2.12.0+cu130
rfdetr
opencv-python
accelerate

View File

@@ -1,6 +1,6 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
 accelerate
-torch==2.9.0
+torch==2.12.0+cpu
torchvision
torchaudio
transformers

View File

@@ -6,7 +6,7 @@
# for cublas12 so uv consults this index alongside PyPI.
 --extra-index-url https://download.pytorch.org/whl/cu128
 accelerate
-torch==2.9.1
+torch==2.12.0+cpu
torchvision
torchaudio
transformers

View File

@@ -1,9 +1,9 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
accelerate>=1.4.0
huggingface-hub>=1.3.0
sentencepiece

View File

@@ -1,8 +1,8 @@
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
accelerate>=1.4.0
huggingface-hub>=1.3.0
sentencepiece

View File

@@ -1,8 +1,8 @@
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
accelerate>=1.4.0
huggingface-hub>=1.3.0
sentencepiece

View File

@@ -1,8 +1,8 @@
-torch==2.10.0
+torch==2.12.0+cpu
 trl
 peft
 datasets>=3.0.0
-transformers>=4.56.2
+transformers>=5.12.1
accelerate>=1.4.0
huggingface-hub>=1.3.0
sentencepiece

View File

@@ -1,4 +1,4 @@
accelerate
-torch==2.7.0
+torch==2.12.0+cu130
transformers
bitsandbytes

View File

@@ -3,8 +3,8 @@
# on a cu130 host. Pull the cu130-flavoured wheel from vLLM's per-tag index
# instead — the cublas13 case in install.sh adds --index-strategy=unsafe-best-match
# so uv consults this index alongside PyPI.
---extra-index-url https://wheels.vllm.ai/0.23.0/cu130
+--extra-index-url https://wheels.vllm.ai/0.24.0/cu130
 # VERSION COUPLING: darwin/Apple-Silicon builds use vllm-metal (see install.sh),
 # which pins this exact vLLM version. Bumping vllm here means coordinating with a
 # vllm-metal release that supports the new version, or macOS/Metal builds break.
-vllm==0.23.0
+vllm==0.24.0

View File

@@ -351,6 +351,16 @@ impl Backend for KokorosService {
Err(Status::unimplemented("Not supported"))
}
+type AudioTranscriptionLiveStream =
+ReceiverStream<Result<backend::TranscriptLiveResponse, Status>>;
+async fn audio_transcription_live(
+&self,
+_: Request<tonic::Streaming<backend::TranscriptLiveRequest>>,
+) -> Result<Response<Self::AudioTranscriptionLiveStream>, Status> {
+Err(Status::unimplemented("Not supported"))
+}
async fn diarize(
&self,
_: Request<backend::DiarizeRequest>,

View File

@@ -207,12 +207,20 @@ func (l *Launcher) StartLocalAI() error {
}
// Build command arguments
+dataPath := l.GetDataPath()
 args := []string{
 "run",
 "--models-path", l.config.ModelsPath,
 "--backends-path", l.config.BackendsPath,
 "--address", l.config.Address,
 "--log-level", l.config.LogLevel,
+// Keep persistent data and dynamic config under the launcher's data
+// directory (~/.localai) rather than letting the server resolve them
+// to ${basepath}/{data,configuration}. ${basepath} expands to the
+// launcher process's CWD (often the user's home root), which puts
+// ~/data and ~/configuration outside ~/.localai. See #10610.
+"--data-path", filepath.Join(dataPath, "data"),
+"--localai-config-dir", filepath.Join(dataPath, "configuration"),
}
l.localaiCmd = exec.CommandContext(l.ctx, binaryPath, args...)

View File

@@ -1716,7 +1716,7 @@
 - use_jinja:true
 parameters:
 min_p: 0.15
-model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
+model: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
repeat_penalty: 1.05
temperature: 0.1
top_k: 50
@@ -1724,9 +1724,9 @@
template:
use_tokenizer_template: true
files:
-- filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q4_K_M.gguf
-uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q4_K_M.gguf
-sha256: 4923ec14f06b968b74d663e5949867d2d9c3bf13a20b8be1a9f9af39989b2bb0
+- filename: llama-cpp/models/LFM2.5-8B-A1B-GGUF/LFM2.5-8B-A1B-Q8_0.gguf
+uri: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF/resolve/main/LFM2.5-8B-A1B-Q8_0.gguf
+sha256: 33ab3b8ce6a964fb8ebac89360c9b3cf72c4fa418d5e4c0a94d46883124d5c02
 - name: "qwopus3.5-9b-coder-mtp"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:

View File

@@ -58,6 +58,23 @@ func IsLiveTranscriptionUnsupported(err error) bool {
return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
}
+// IsUnimplemented reports whether err is a gRPC Unimplemented status — the
+// signal a backend gives for an RPC it does not implement. The generated
+// UnimplementedBackendServer stub returns exactly this for any RPC a backend
+// (e.g. a Python or external backend) has not overridden, so callers can treat
+// an optional RPC as a no-op rather than a failure. Prefers the typed status
+// code and falls back to the message for paths that lose the status (e.g. errors
+// wrapped across non-gRPC boundaries).
+func IsUnimplemented(err error) bool {
+if err == nil {
+return false
+}
+if status.Code(err) == codes.Unimplemented {
+return true
+}
+return strings.Contains(strings.ToLower(err.Error()), "unimplemented")
+}
// StreamTranscriptionUnsupported returns the canonical error a backend returns
// when it (or the loaded model) cannot serve the server-streaming
// AudioTranscriptionStream RPC. It carries codes.Unimplemented like the live

View File

@@ -55,6 +55,18 @@ var _ = Describe("grpcerrors", func() {
Expect(grpcerrors.IsModelNotLoaded(err)).To(BeFalse())
})
+DescribeTable("IsUnimplemented",
+func(err error, want bool) {
+Expect(grpcerrors.IsUnimplemented(err)).To(Equal(want))
+},
+Entry("nil", nil, false),
+Entry("typed code", status.Error(codes.Unimplemented, "method Free not implemented"), true),
+Entry("stale stub message (Unknown code)", errors.New("rpc error: code = Unimplemented desc = "), true),
+Entry("unrelated error", errors.New("context deadline exceeded"), false),
+Entry("unrelated grpc code", status.Error(codes.Unavailable, "connection refused"), false),
+Entry("model not loaded is NOT unimplemented", grpcerrors.ModelNotLoaded("parakeet-cpp"), false),
+)
It("StreamTranscriptionUnsupported carries Unimplemented and is not ModelNotLoaded", func() {
err := grpcerrors.StreamTranscriptionUnsupported("parakeet-cpp", "not a streaming model")
Expect(status.Code(err)).To(Equal(codes.Unimplemented))

View File

@@ -11,6 +11,7 @@ import (
 "time"
 "github.com/hpcloud/tail"
+"github.com/mudler/LocalAI/pkg/grpc/grpcerrors"
"github.com/mudler/LocalAI/pkg/signals"
process "github.com/mudler/go-processmanager"
"github.com/mudler/xlog"
@@ -52,10 +53,21 @@ func (ml *ModelLoader) deleteProcess(s string) error {
hook(s)
}
-// Free GPU resources before stopping the process to ensure VRAM is released
+// Free GPU resources before stopping the process to ensure VRAM is released.
+// Free is optional: backends that don't override it (the generated stub, many
+// Python/external backends, or a federation proxy in distributed mode) return
+// gRPC Unimplemented. That is expected, not a failure — VRAM is reclaimed when
+// the process is stopped below, or by the remote unloader for remote backends —
+// so don't surface it as an error.
 xlog.Debug("Calling Free() to release GPU resources", "model", s)
 if err := model.GRPC(false, ml.wd).Free(context.Background()); err != nil {
-xlog.Warn("Error freeing GPU resources", "error", err, "model", s)
+if grpcerrors.IsUnimplemented(err) {
+xlog.Debug("Backend does not implement Free(); GPU release handled on process stop", "model", s)
+} else {
+// Now that the expected Unimplemented case is filtered out above, a
+// remaining error is a genuine failure to release VRAM — surface it.
+xlog.Error("Error freeing GPU resources", "error", err, "model", s)
+}
}
process := model.Process()