LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-05-02 13:07:41 -04:00

Author	SHA1	Message	Date
Ettore Di Giacinto	091eda8d70	feat: react chat redesign (#9616 ) * feat(react-ui): redesign chat — popover history, focus on send, density pass Replace the persistent 260px conversation sidebar with a Cmd/Ctrl+K popover (ChatsMenu) so the conversation owns the page. Once a chat has at least one message we auto-collapse the global app rail and fade non-essential header chrome; Esc gives the user back the full chrome for the rest of the session. Move Canvas mode and the MCP dropdown into the input wrapper as mode chips — they describe what's armed for the next message and now live where the user composes. The chat header drops to Chats · title · ModelSelector · overflow · settings, and an overflow menu carries admin-only Manage mode along with Info / Edit / Export / Clear. Density pass: tighter header (40px), smaller avatars with the assistant left-border accent doing the work, 88% bubble width, modern field-sizing on the textarea, 32px send/stop buttons. Empty state now surfaces a Recent strip (top 4 non-empty chats) and a Cmd+K hint, replacing the discoverability the persistent sidebar used to provide. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * feat(react-ui): chat input chips, slimmer menu, focus mode polish Move Canvas mode and the MCP dropdown into the input wrapper as compact mode chips — they describe what's armed for the next message and now sit where the user composes. The MCP popover flips upward when anchored to the input row so it stays on-screen. Eliminate the chat header overflow ("…") menu entirely; relocate each item to its semantic home so users don't have to remember a miscellany drawer: - Manage mode toggle → top of the Settings drawer, alongside the other sticky chat knobs. The shield next to the title still signals state at a glance. - Model info / Edit config → small admin-only "ⓘ" button next to the ModelSelector; the existing model-info panel now hosts the Edit config link. - Export as Markdown → per-row hover action in ChatsMenu, so it works for any chat (not just the active one). - Clear chat history → destructive button at the bottom of the Settings drawer. Make the Sidebar listen to its own `sidebar-collapse` event so the chat's focus mode actually shrinks the rail (it previously only flipped the layout class, leaving the sidebar element at full width and overlapping the chat). Drop the focus-mode toast — the visual shift is enough; the toast was noise. Define `--color-text-tertiary` in both themes; without it metadata text (recent strip timestamps and a few other sites) was inheriting the platform default, which read as black on the dark surface. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * fix(model/log-store): close merged channel exactly once; clean up Remove Two latent races in BackendLogStore.Subscribe could panic under load (distributed e2e test triggered "send on closed channel" at backend_log_store.go:288): 1. The aggregated path closed the merged channel `ch` from two places — the fan-in waiter goroutine (after all source channels drained) and unsubscribe(). When unsubscribe ran while a fan-in goroutine was mid-flight on `ch <- line`, the close beat the send and the runtime panicked. Now `ch` is closed by exactly one goroutine: the waiter that observes all fan-in goroutines finish. unsubscribe() only closes the per-buffer source channels — the for-range in each fan-in goroutine then exits naturally and the waiter takes care of the merged close. 2. Remove() closed every subscriber channel but didn't delete the entries from the subscribers map, so a concurrent unsubscribe() would call close() again on the already-closed channel ("close of closed channel"). Clear the map entry while closing. Add a regression test that hammers AppendLine concurrently with Subscribe + unsubscribe + Remove; the race detector catches both classes of regression. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 * test(model/log-store): port backend log store tests to ginkgo Bring backend_log_store_test.go in line with the rest of pkg/model (loader_test, watchdog_test, store_test): same external test package (`model_test`), same ginkgo + gomega imports, same Describe/It nesting around the public API. Behaviour is unchanged — the four existing scenarios plus the unsubscribe race regression all run as specs under the existing `TestModel` suite. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-29 22:33:26 +02:00
Ettore Di Giacinto	3280b9a287	fix(distributed): per-replica backend logs (store aggregation + UI) The multi-replica refactor (PR #9583) changed the worker's process key from `modelID` to `modelID#replicaIndex`, but the BackendLogStore kept the bare-modelID lookup. Result: every distributed deployment lost backend logs in the Nodes UI — single-replica too, since even the default capacity of 1 produces a `#0` suffix. Two changes wired together: * pkg/model: BackendLogStore.GetLines/Subscribe now treat a modelID without `#` as a model prefix and merge across all `modelID#N` replica buffers (timestamp-sorted for GetLines; fan-in for Subscribe). Calls with a full `modelID#N` key resolve exactly. ListModels strips replica suffixes and deduplicates so the listing surfaces one entry per loaded model. * react-ui: per-replica log streams as the default. Loaded Models table disambiguates each row with a `rep N` pill (only when the node hosts >1 replica of a model). Each row's "View logs" link routes to the per-replica process key so operators see only that replica's output. The logs page renders the replica context as a chip in the title and surfaces a segmented control — `Replica 0 / 1 / … / All merged` — when the model has multiple replicas; the merged segment uses the bare-modelID URL (delegating to the store's prefix aggregation) for the side-by-side comparison case. Single-replica deployments see no extra UI. Tests added first (TDD): the regression set in backend_log_store_test.go reproduces the bug at the exact failure point — GetLines/ListModels/Subscribe assertions all fail against the broken code, all pass against the fix. TestSubscribe_PerReplicaFilter pins the exact-key path so a future change can't silently break it. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: claude-code:opus-4-7 [Edit] [Skill:critique] [Skill:audit] [Skill:polish] [Skill:distill]	2026-04-27 20:55:24 +00:00
Richard Palethorpe	deca6dbdad	feat: Log backend exit code (#9581 ) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-04-27 14:19:18 +02:00
Ettore Di Giacinto	f384c64a91	fix(model-loader): also skip .ckpt, .zip, and .tag files when scanning models The local model directory scan treats every non-skipped file as a model config candidate. Sidecar artifacts that ship alongside checkpoints (checkpoint blobs, downloaded archives, ggml-style tag files) were slipping through and showing up as bogus models in the listing. Add their extensions to the suffix-skip list. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Assisted-by: Claude:claude-opus-4-7 [Claude Code]	2026-04-26 19:37:53 +00:00
Ettore Di Giacinto	87e6de1989	feat: wire transcription for llama.cpp, add streaming support (#9353 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-14 16:13:40 +02:00
Ettore Di Giacinto	9ca03cf9cc	feat(backends): add ik-llama-cpp (#9326 ) * feat(backends): add ik-llama-cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add grpc e2e suite, hook to CI, update README Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-12 13:51:28 +02:00
Ettore Di Giacinto	5c35e85fe2	feat: allow to pin models and skip from reaping (#9309 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-11 08:38:17 +02:00
Ettore Di Giacinto	510d6759fe	fix(nodes): better detection if nodes goes down or model is not available (#9274 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-08 12:11:02 +02:00
Ettore Di Giacinto	6f304d1201	chore(refactor): use interface (#9226 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 17:29:37 +02:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124 ) * feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00
Richard Palethorpe	35d509d8e7	feat(ui): Per model backend logs and various fixes (#9028 ) * feat(gallery): Switch to expandable box instead of pop-over and display model files Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(ui, backends): Add individual backend logging Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(ui): Set the context settings from the model config Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-03-18 08:31:26 +01:00
LocalAI [bot]	c0351b8e6a	Remove HuggingFace backend support (#8971 ) * Remove HuggingFace backend support, restore other backends - Removed backend/go/huggingface directory and all related files - Removed pkg/langchain/huggingface.go - Removed LCHuggingFaceBackend from pkg/model/initializers.go - Removed huggingface backend entries from backend/index.yaml - Updated backend/README.md to remove HuggingFace backend reference - Restored kitten-tts, local-store, silero-vad, piper backends that were incorrectly removed This change removes only HuggingFace backend support from LocalAI as per the P0 priority request in issue #8963, while preserving other backends (kitten-tts, local-store, silero-vad, piper). Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev> * Remove huggingface backend from test.yml build command The tests-linux CI job was failing because it was trying to build the huggingface backend which no longer exists after the backend removal. This removes huggingface from the build command in test.yml. * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: team-coding-agent-1 <team-coding-agent-1@localai.dev> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: team-coding-agent-1 <team-coding-agent-1@localai.dev> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-03-13 01:09:30 +01:00
Ettore Di Giacinto	8818452d85	feat(ui): MCP Apps, mcp streaming and client-side support (#8947 ) * Revert "fix: Add timeout-based wait for model deletion completion (#8756)" This reverts commit `9e1b0d0c82`. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: add mcp prompts and resources Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): add client-side MCP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): allow to authenticate MCP servers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): add MCP Apps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: update AGENTS Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: allow to collapse navbar, save state in storage Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(ui): add MCP button also to home page Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(chat): populate string content Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-11 07:30:49 +01:00
LocalAI [bot]	6e5a58ca70	feat: Add Free RPC to backend.proto for VRAM cleanup (#8751 ) * fix: Add VRAM cleanup when stopping models - Add Free() method to AIModel interface for proper GPU resource cleanup - Implement Free() in llama backend to release llama.cpp model resources - Add Free() stub implementations in base and SingleThread backends - Modify deleteProcess() to call Free() before stopping the process to ensure VRAM is properly released when models are unloaded Fixes issue where VRAM was not freed when stopping models, which could lead to memory exhaustion when running multiple models sequentially. * feat: Add Free RPC to backend.proto for VRAM cleanup\n\n- Add rpc Free(HealthMessage) returns (Result) {} to backend.proto\n- This RPC is required to properly expose the Free() method\n through the gRPC interface for VRAM resource cleanup\n\nRefs: PR #8739 * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: localai-bot <localai-bot@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-03-03 12:39:06 +01:00
Ettore Di Giacinto	76fba02e56	fix: do not keep track model if not existing (#8603 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-02-19 17:18:38 +01:00
Ettore Di Giacinto	ecba23d44e	fix: improve watchdown logics (#8591 ) * fix: ensure proper watchdog shutdown and state passing between restarts Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: add missing watchdog settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: untrack model if we shut it down successfully Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-02-17 18:49:22 +01:00
Yaroslav98214	6dbcdb0b9e	fix: filter GGUF and GGML files from model list (#8397 ) Filter GGUF and GGML files from model list Skip .gguf/.ggml loose files when listing models and add a test for .gguf exclusion. Closes #1077 Signed-off-by: Yaroslav98214 <diakovichyaroslav30@gmail.com>	2026-02-05 10:17:46 +01:00
Andres	b6459ddd57	feat(api): Add transcribe response format request parameter & adjust STT backends (#8318 ) * WIP response format implementation for audio transcriptions (cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f) Signed-off-by: Andres Smith <andressmithdev@pm.me> * Rework transcript response_format and add more formats (cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8) Signed-off-by: Andres Smith <andressmithdev@pm.me> * Add test and replace go-openai package with official openai go client (cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893) Signed-off-by: Andres Smith <andressmithdev@pm.me> * Fix faster-whisper backend and refactor transcription formatting to also work on CLI Signed-off-by: Andres Smith <andressmithdev@pm.me> (cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168) Signed-off-by: Andres Smith <andressmithdev@pm.me> --------- Signed-off-by: Andres Smith <andressmithdev@pm.me> Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-02-01 17:33:17 +01:00
Ettore Di Giacinto	34e054f607	fix(reasoning): support models with reasoning without starting thinking tag (#8132 ) * chore: extract reasoning to its own package Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * make sure we detect thinking tokens from template Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Allow to override via config, add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-01-20 21:07:59 +01:00
Jon Roeber	2554e9fabe	fix(model): do not assume success when deleting a model process (#7963 ) * fix(model): do not assume success when deleting a model process Signed-off-by: Jon Roeber <jon@roeber.dev> * Update pkg/model/process.go Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Signed-off-by: Jon Roeber <65431671+jroeber@users.noreply.github.com> --------- Signed-off-by: Jon Roeber <jon@roeber.dev> Signed-off-by: Jon Roeber <65431671+jroeber@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-01-10 23:33:44 +01:00
Ettore Di Giacinto	c844b7ac58	feat: disable force eviction (#7725 ) * feat: allow to set forcing backends eviction while requests are in flight Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: try to make the request sit and retry if eviction couldn't be done Otherwise calls that in order to pass would need to shutdown other backends would just fail. In this way instead we make the request sit and retry eviction until it succeeds. The thresholds can be configured by the user. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * expose settings to CLI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-25 14:26:18 +01:00
Ettore Di Giacinto	c37785b78c	chore(refactor): move logging to common package based on slog (#7668 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-21 19:33:13 +01:00
Ettore Di Giacinto	f251bdee64	chore: fixup tests with defaults from constants Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 21:26:55 +00:00
Ettore Di Giacinto	424c95edba	fix: correctly propagate error during model load (#7610 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 18:26:54 +01:00
Ettore Di Giacinto	b348a99b03	chore: move defaults to constants Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 17:40:51 +01:00
Ettore Di Giacinto	f3c70a96ba	chore(memory-reclaimer): use saner defaults Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 16:25:09 +01:00
Ettore Di Giacinto	50f9c9a058	feat(watchdog): add Memory resource reclaimer (#7583 ) * feat(watchdog): add GPU reclaimer Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Handle vram calculation for unified memory devices Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Support RAM eviction, set watchdog interval from runtime settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-16 09:15:18 +01:00
Ettore Di Giacinto	fc5b9ebfcc	feat(loader): enhance single active backend to support LRU eviction (#7535 ) * feat(loader): refactor single active backend support to LRU This changeset introduces LRU management of loaded backends. Users can set now a maximum number of models to be loaded concurrently, and, when setting LocalAI in single active backend mode we set LRU to 1 for backward compatibility. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-12 12:28:38 +01:00
Ettore Di Giacinto	5dde7e9ac6	fix: make sure to close on errors (#7521 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-12-11 14:03:20 +01:00
Ettore Di Giacinto	2dd42292dc	feat(ui): runtime settings (#7320 ) * feat(ui): add watchdog settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Do not re-read env Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Some refactor, move other settings to runtime (p2p) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add API Keys handling Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Allow to disable runtime settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Documentation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * show MCP toggle in index Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop context default Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-11-20 22:37:20 +01:00
Ettore Di Giacinto	10a66938f9	fix: guard from potential deadlock with requests in flight (#6484 ) * fix(watchdog): guard from potential deadlock with requests in flight Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Improve locking when loading models Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-10-16 21:28:19 +02:00
Ettore Di Giacinto	27c4161401	chore: update cogito and simplify MCP logics (#6413 ) * chore: update cogito and simplify MCP logics Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Refine signal handling Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-10-09 12:36:45 +02:00
Ettore Di Giacinto	585da99c52	chore(models): add whisper-turbo via whisper.cpp (#6340 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-09-25 09:15:06 +02:00
Ettore Di Giacinto	79a41a5e07	fix: register backends to model-loader during installation (#6159 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-28 19:11:02 +02:00
Ettore Di Giacinto	bef4c10629	feat(ui): General improvements (#6072 ) * wip * Simplify stop Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Improve UI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Show installed backends at the index Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Imporve UI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-16 07:44:50 +02:00
Ettore Di Giacinto	089efe05fd	feat(backends): add system backend, refactor (#6059 ) - Add a system backend path - Refactor and consolidate system information in system state - Use system state in all the components to figure out the system paths to used whenever needed - Refactor BackendConfig -> ModelConfig. This was otherway misleading as now we do have a backend configuration which is not the model config. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-14 19:38:26 +02:00
Ettore Di Giacinto	10a3f0bd92	fix: chmod grpc processes only if needed (#6051 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-08-13 12:08:53 +02:00
Ettore Di Giacinto	98e5291afc	feat: refactor build process, drop embedded backends (#5875 ) * feat: split remaining backends and drop embedded backends - Drop silero-vad, huggingface, and stores backend from embedded binaries - Refactor Makefile and Dockerfile to avoid building grpc backends - Drop golang code that was used to embed backends - Simplify building by using goreleaser Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(gallery): be specific with llama-cpp backend templates Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(docs): update Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(ci): minor fixes Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: drop all ffmpeg references Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: run protogen-go Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Always enable p2p mode Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update gorelease file Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(stores): do not always load Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix linting issues Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Simplify Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Mac OS fixup Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-07-22 16:31:04 +02:00
Ettore Di Giacinto	b29544d747	feat: split piper from main binary (#5858 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-07-19 08:31:33 +02:00
Ettore Di Giacinto	294f7022f3	feat: do not bundle llama-cpp anymore (#5790 ) * Build llama.cpp separately Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Start to try to attach some tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add git and small fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: correctly autoload external backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run AIO tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Slightly update the Makefile helps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt auto-bumper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run linux test Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add llama-cpp into build pipelines Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add default capability (for cpu) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop llama-cpp specific logic from the backend loader Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * drop grpc install in ci for tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Pass by backends path for tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Build protogen at start Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(tests): set backends path consistently Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Correctly configure the backends path Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to build for darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * WIP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Compile for metal on arm64/darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to run build off from cross-arch Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add to the backend index nvidia-l4t and cpu's llama-cpp backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Build also darwin-x86 for llama-cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Disable arm64 builds temporary Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Test backend build on PR Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixup build backend reusable workflow Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * pass by skip drivers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use crane Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Skip drivers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * x86 darwin Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add packaging step for llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix leftover from bark-cpp extraction Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Try to fix hipblas build Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-07-18 13:24:12 +02:00
Ettore Di Giacinto	2d64269763	feat: Add backend gallery (#5607 ) * feat: Add backend gallery This PR add support to manage backends as similar to models. There is now available a backend gallery which can be used to install and remove extra backends. The backend gallery can be configured similarly as a model gallery, and API calls allows to install and remove new backends in runtime, and as well during the startup phase of LocalAI. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add backends docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * wip: Backend Dockerfile for python backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: drop extras images, build python backends separately Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixup on all backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * test CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Tweaks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop old backends leftovers Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixup CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Move dockerfile upper Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fix proto Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Feature dropped for consistency - we prefer model galleries Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add missing packages in the build image Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * exllama is ponly available on cublas Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * pin torch on chatterbox Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Fixups to index Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Debug CI * Install accellerators deps Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add target arch * Add cuda minor version Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use self-hosted runners Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * ci: use quay for test images Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups for vllm and chatterbox Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small fixups on CI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chatterbox is only available for nvidia Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Simplify CI builds Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Adapt test, use qwen3 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(model gallery): add jina-reranker-v1-tiny-en-gguf Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Use reranker from llama.cpp in AIO images Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Limit concurrent jobs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2025-06-15 14:56:52 +02:00
kilavvy	2e1dc8deef	Fix Typos in Comments and Error Messages (#5637 ) * Update initializers.go Signed-off-by: kilavvy <140459108+kilavvy@users.noreply.github.com> * Update base.go Signed-off-by: kilavvy <140459108+kilavvy@users.noreply.github.com> --------- Signed-off-by: kilavvy <140459108+kilavvy@users.noreply.github.com>	2025-06-12 18:34:32 +02:00
omahs	0f365ac204	fix: typos (#5376 ) Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>	2025-05-16 12:45:48 +02:00
Ettore Di Giacinto	9628860c0e	feat(llama.cpp/clip): inject gpu options if we detect GPUs (#5243 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-26 00:04:47 +02:00
Ettore Di Giacinto	2c425e9c69	feat(loader): enhance single active backend by treating as singleton (#5107 ) feat(loader): enhance single active backend by treating at singleton Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-01 20:58:11 +02:00
Ettore Di Giacinto	05f7004487	fix: race during stop of active backends (#5106 ) * chore: drop double call to stop all backends, refactors Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: do lock when cycling to models to delete Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-04-01 00:01:10 +02:00
Bas Hulsken	bbf30d416d	fix: change initialization order of llama-cpp-avx512 to go before avx2 variant (#4837 ) changed to initialization order of the avx512 version of llama.cpp, now tries before avx2 Signed-off-by: Bas Hulsken <bhulsken@hotmail.com>	2025-02-17 09:32:21 +01:00
Dave	3cddf24747	feat: Centralized Request Processing middleware (#3847 ) * squash past, centralize request middleware PR Signed-off-by: Dave Lee <dave@gray101.com> * migrate bruno request files to examples repo Signed-off-by: Dave Lee <dave@gray101.com> * fix Signed-off-by: Dave Lee <dave@gray101.com> * Update tests/e2e-aio/e2e_test.go Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Dave Lee <dave@gray101.com> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2025-02-10 12:06:16 +01:00
Ettore Di Giacinto	7f90ff7aec	chore(llama-ggml): drop deprecated backend (#4775 ) The GGML format is now dead, since in the next version of LocalAI we already bring many breaking compatibility changes, taking the occasion also to drop ggml support (pre-gguf). Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-02-06 18:36:23 +01:00
Ettore Di Giacinto	5177837ab0	chore: detect and enable avx512 builds (#4675 ) chore(avx512): add support Fixes https://github.com/mudler/LocalAI/issues/4662 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2025-01-24 08:26:44 +01:00

1 2 3 4

186 Commits