The OpenAI Node.js SDK v4+ sends encoding_format=base64 by default.
LocalAI previously ignored this parameter and always returned a float
JSON array, causing a silent data corruption bug in any Node.js client
(AnythingLLM Desktop, LangChain.js, LlamaIndex.TS, …):
// What the client does when it expects base64 but receives a float array:
Buffer.from(floatArray, 'base64')
Node.js treats a non-string first argument as a byte array — each
float32 value is truncated to a single byte — and Float32Array then
reads those bytes as floats, yielding dims/4 values. Vector databases
(Qdrant, pgvector, …) then create collections with the wrong dimension,
causing all similarity searches to fail silently.
e.g. granite-embedding-107m (384 dims) → 96 stored in Qdrant
jina-embeddings-v3 (1024 dims) → 256 stored in Qdrant
Changes:
- core/schema/prediction.go: add EncodingFormat string field to
PredictionOptions so the request parameter is parsed and available
throughout the request pipeline
- core/schema/openai.go: add EmbeddingBase64 string field to Item;
add MarshalJSON so the "embedding" JSON key emits either []float32
or a base64 string depending on which field is populated — all other
Item consumers (image, video endpoints) are unaffected
- core/http/endpoints/openai/embeddings.go: add floatsToBase64()
which packs a float32 slice as little-endian bytes and base64-encodes
it; add embeddingItem() helper; both InputToken and InputStrings loops
now honour encoding_format=base64
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: wire min_p
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: inferencing defaults
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(refactor): re-use iterative parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: generate automatically inference defaults from unsloth
Instead of trying to re-invent the wheel and maintain here the inference
defaults, prefer to consume unsloth ones, and contribute there as
necessary.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: apply defaults also to models installed via gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: be consistent and apply fallback to all endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add fine-tuning endpoint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(experimental): add fine-tuning endpoint and TRL support
This changeset defines new GRPC signatues for Fine tuning backends, and
add TRL backend as initial fine-tuning engine. This implementation also
supports exporting to GGUF and automatically importing it to LocalAI
after fine-tuning.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* commit TRL backend, stop by killing process
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* move fine-tune to generic features
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add evals, reorder menu
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
First when sending errors over SSE we now clearly identify them as such
instead of just sending the error string as a chat completion message.
We use this in the UI to identify errors and link to them to the traces.
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(ui): add users and authentication support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: allow the admin user to impersonificate users
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: ui improvements, disable 'Users' button in navbar when no auth is configured
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add OIDC support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: gate models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: cache requests to optimize speed
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* small UI enhancements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ui): style improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: cover other paths by auth
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: separate local auth, refactor
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* security hardening, approval mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: fix tests and expectations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: update localagi/localrecall
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(openresponses): do not omit required fields summary and id
* fix(openresponses): ensure ORItemParam.Summary is never null
Normalize Summary to an empty slice at serialization chokepoints
(sendSSEEvent, bufferEvent, buildORResponse) so it always serializes
as [] instead of null.
Closes#9047
* feat(gallery): Switch to expandable box instead of pop-over and display model files
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(ui, backends): Add individual backend logging
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(ui): Set the context settings from the model config
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): WebRTC support
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(tracing): Show full LLM opts and deltas
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix: add missing bufio.Flush in processImageFile
The processImageFile function writes decoded image data (from base64
or URL download) through a bufio.NewWriter but never calls Flush()
before closing the underlying file. Since bufio's default buffer is
4096 bytes, small images produce 0-byte files and large images are
truncated — causing PIL to fail with "cannot identify image file".
This breaks all image input paths: file, files, and ref_images
parameters in /v1/images/generations, making img2img, inpainting,
and reference image features non-functional.
Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
* fix: merge options into kwargs in diffusers GenerateImage
The GenerateImage method builds a local `options` dict containing the
source image (PIL), negative_prompt, and num_inference_steps, but
never merges it into `kwargs` before calling self.pipe(**kwargs).
This causes img2img to fail with "Input is in incorrect format"
because the pipeline never receives the image parameter.
Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
* test: add unit test for processImageFile base64 decoding
Verifies that a base64-encoded PNG survives the write path
(encode → decode → bufio.Write → Flush → file on disk) with
byte-for-byte fidelity. The test image is small enough to fit
entirely in bufio's 4096-byte buffer, which is the exact scenario
where the missing Flush() produced a 0-byte file.
Also tests that invalid base64 input is handled gracefully.
Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
* test: verify GenerateImage merges options into pipeline kwargs
Mocks the diffusers pipeline and calls GenerateImage with a source
image and negative prompt. Asserts that the pipeline receives the
image, negative_prompt, and num_inference_steps via kwargs — the
exact parameters that were silently dropped before the fix.
Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
* fix: move kwargs.update(options) earlier in GenerateImage
Move the options merge right after self.options merge (L742) so that
image, negative_prompt, and num_inference_steps are available to all
downstream code paths including img2vid and txt2vid.
Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
* test: convert processImageFile tests to ginkgo
Replace standard testing with ginkgo/gomega to be consistent with
the rest of the test suites in the project.
Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
---------
Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(mlx-distributed): add new MLX-distributed backend
Add new MLX distributed backend with support for both TCP and RDMA for
model sharding.
This implementation ties in the discovery implementation already in
place, and re-uses the same P2P mechanism for the TCP MLX-distributed
inferencing.
The Auto-parallel implementation is inspired by Exo's
ones (who have been added to acknowledgement for the great work!)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose a CLI to facilitate backend starting
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: make manual rank0 configurable via model configs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing features from mlx backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(functions): add peg-based parsing
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: support returning toolcalls directly from backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: do run PEG only if backend didn't send deltas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add standalone and agentic functionalities
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose agents via responses api
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* retry instead of re-computing a response
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
fix: reload model configuration after editing (issue #8647)
- Add *model.ModelLoader parameter to EditModelEndpoint
- Call ml.ShutdownModel() after saving config to unload the running model
- Model will be reloaded on next inference request with new settings (e.g., context_size)
- Update route registration to pass ml to EditModelEndpoint
Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
* fix(realtime): Wrap functions in OpenAI chat completions format
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): Set max tokens from session object
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Find thinking start tag for thinking extraction
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Don't send buffer cleared message when we automatically drop it
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
feat(realtime): Allow sending text and image conversation items
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* fix(realtime): Use locked websocket for concurrent access
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Use sample rate set in session
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(config): Allow pipelines to have no model parameters
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Use the voice provided by the user or none at all
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(ui,config): Allow pipeline models to have no backend and use same validation in frontend
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
User-supplied URLs passed to GetContentURIAsBase64() and downloadFile()
were fetched without validation, allowing SSRF attacks against internal
services. Added URL validation that blocks private IPs, loopback,
link-local, and cloud metadata endpoints before fetching.
Co-authored-by: kolega.dev <faizan@kolega.ai>
* feat(musicgen): add ace-step and UI interface
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Correctly handle model dir
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop auto-download
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to models, fixup UIs icons
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* l4t13 is incompatbile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* avoid pinning version for cuda12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop l4t12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The realtime endpoint was not passing the noAction "answer" function to the
model in the prompt template, causing the model to always call user-provided
tools even when a direct response was appropriate.
Root cause:
- User tools were added to the funcs list
- TemplateMessages() was called to generate the prompt
- noAction function was only added AFTER templating
- This meant the prompt didn't include the "answer" function, even though
the grammar did
Fix:
- Move noAction function creation before TemplateMessages() call so it's
included in both the prompt and grammar
- Add proper tool_choice parameter handling to support "auto", "required",
"none", and specific function selection
- Match behavior of the standard chat endpoint
💘 Generated with Crush
Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* WIP response format implementation for audio transcriptions
(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Rework transcript response_format and add more formats
(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Add test and replace go-openai package with official openai go client
(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Fix faster-whisper backend and refactor transcription formatting to also work on CLI
Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
---------
Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(tts): add support for streaming mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Send first audio, make sure it's 16
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(realtime): Add audio conversations
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore(realtime): Vendor the updated API and modify for server side
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(realtime): Update to the GA realtime API
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* chore: Document realtime API and add docs to AGENTS.md
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat: Filter reasoning from spoken output
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Send delta and done events for tool calls and audio transcripts
Ensure that content is sent in both deltas and done events for function call arguments and audio transcripts. This fixes compatibility with clients that rely on delta events for parsing.
💘 Generated with Crush
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(realtime): Improve tool call handling and error reporting
- Refactor Model interface to accept []types.ToolUnion and *types.ToolChoiceUnion
instead of JSON strings, eliminating unnecessary marshal/unmarshal cycles
- Fix Parameters field handling: support both map[string]any and JSON string formats
- Add PredictConfig() method to Model interface for accessing model configuration
- Add comprehensive debug logging for tool call parsing and function config
- Add missing return statement after prediction error (critical bug fix)
- Add warning logs for NoAction function argument parsing failures
- Improve error visibility throughout generateResponse function
💘 Generated with Crush
Assisted-by: Claude Sonnet 4.5 via Crush <crush@charm.land>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* Debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop openai video endpoint (is not complete)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add download button
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>