* WIP response format implementation for audio transcriptions
(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Rework transcript response_format and add more formats
(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Add test and replace go-openai package with official openai go client
(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
* Fix faster-whisper backend and refactor transcription formatting to also work on CLI
Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>
---------
Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* chore: drop mode from image generation(unused)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): improve image generation front-end
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(UI): use only ref images; files is to be deprecated
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not override default steps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: Add usage fields to image generation response for OpenAI API compatibility
Fixes #7354
Added input_tokens, output_tokens, and input_tokens_details fields to the
image generation API response to comply with OpenAI's image generation API
specification. This resolves validation errors in LiteLLM and the OpenAI SDK.
Changes:
- Added InputTokensDetails struct with text_tokens and image_tokens fields
- Extended OpenAIUsage struct with input_tokens, output_tokens, and input_tokens_details
- Updated ImageEndpoint to populate usage object with required fields
- Updated InpaintingEndpoint to populate usage object with required fields
- All fields initialized to 0 as per current behavior
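A minimal sketch of the resulting Go types (the JSON tags come from the list above; the Go identifiers and exact layout are assumptions, shown with the plain int types the follow-up fix below settles on):
```go
// Sketch only: JSON tags from the commit description above,
// Go field names assumed.
type InputTokensDetails struct {
	TextTokens  int `json:"text_tokens"`
	ImageTokens int `json:"image_tokens"`
}

type OpenAIUsage struct {
	InputTokens        int                `json:"input_tokens"`
	OutputTokens       int                `json:"output_tokens"`
	InputTokensDetails InputTokensDetails `json:"input_tokens_details"`
}
```
Go zero values mean every field serializes as 0 unless set, matching the "initialized to 0" behavior above.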
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
* fix: Correct usage field types for image generation API compatibility
Changed InputTokens and OutputTokens from pointer types (*int) to
regular int types to match OpenAI API specification. This fixes
validation errors with LiteLLM and OpenAI SDK when parsing image
generation responses.
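The practical difference: a nil *int marshals to JSON null, while a plain int marshals to 0, which is what strict clients accept here. A small self-contained illustration:
```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	type withPointer struct {
		InputTokens *int `json:"input_tokens"`
	}
	type withValue struct {
		InputTokens int `json:"input_tokens"`
	}
	p, _ := json.Marshal(withPointer{})
	v, _ := json.Marshal(withValue{})
	fmt.Println(string(p)) // {"input_tokens":null}
	fmt.Println(string(v)) // {"input_tokens":0}
}
```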
Fixes #7354
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add support for logprobs in results
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: add support for logitbias
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Initial plan
* Fix SSE streaming format to comply with specification
- Replace json.Encoder with json.Marshal for explicit formatting
- Use explicit \n\n for all SSE messages (instead of relying on implicit newlines)
- Change %v to %s format specifier for proper string formatting
- Fix error message streaming to include proper SSE format
- Ensure consistency between chat.go and completion.go endpoints
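A sketch of the write pattern described above (function and variable names are illustrative, not the actual ones in chat.go/completion.go):
```go
import (
	"encoding/json"
	"fmt"
	"io"
)

// writeSSE frames one event explicitly as "data: <json>\n\n",
// marshalling up front instead of going through json.Encoder.
func writeSSE(w io.Writer, resp any) error {
	payload, err := json.Marshal(resp)
	if err != nil {
		return err
	}
	_, err = fmt.Fprintf(w, "data: %s\n\n", payload) // %s, not %v
	return err
}
```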
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Add proper error handling for JSON marshal failures in streaming
- Handle json.Marshal errors explicitly in error response paths
- Add fallback simple error message if marshal fails
- Prevents sending 'data: <nil>' on marshal failures
- Addresses code review feedback
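Continuing the sketch above for the error path (the fallback payload is an assumption):
```go
// writeSSEError marshals the error response; if even that fails, it
// emits a fixed, well-formed message instead of "data: <nil>".
func writeSSEError(w io.Writer, errResp any) {
	payload, err := json.Marshal(errResp)
	if err != nil {
		fmt.Fprint(w, "data: {\"error\":\"failed to encode error response\"}\n\n")
		return
	}
	fmt.Fprintf(w, "data: %s\n\n", payload)
}
```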
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Fix SSE streaming format to comply with specification
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Fix finish_reason field to use pointer for proper null handling
- Change FinishReason from string to *string in Choice schema
- Streaming chunks now omit finish_reason (null) instead of empty string
- Final chunks properly set finish_reason to "stop", "tool_calls", etc.
- Remove empty content from initial streaming chunks (only send role)
- Final streaming chunk sends empty delta with finish_reason
- Addresses OpenAI API compliance issues causing client failures
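A sketch of the changed field (only the field and its JSON tag follow the commit; the rest of the struct is elided, and it is shown in the final shape the follow-up commit below settles on, without omitempty):
```go
type Choice struct {
	// ... other fields elided ...

	// Pointer so an unset value serializes as null in streaming chunks,
	// while final chunks set it to "stop", "tool_calls", etc.
	FinishReason *string `json:"finish_reason"`
}
```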
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Improve code consistency for string pointer creation
- Use consistent pattern: declare variable then take address
- Remove inline anonymous function for better readability
- Addresses code review feedback
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Move common finish reasons to constants
- Create constants.go with FinishReasonStop, FinishReasonToolCalls, FinishReasonFunctionCall
- Replace all string literals with constants in chat.go, completion.go, realtime.go
- Improves code maintainability and prevents typos
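A sketch of such a constants file, with values following the finish_reason strings named above, plus the declare-then-take-address pattern from the previous commit:
```go
// constants.go (sketch)
const (
	FinishReasonStop         = "stop"
	FinishReasonToolCalls    = "tool_calls"
	FinishReasonFunctionCall = "function_call"
)

// Usage, declaring a variable and then taking its address:
//
//	finishReason := FinishReasonStop
//	choice.FinishReason = &finishReason
```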
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
* Make it build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix finish_reason to always be present with null or string value
- Remove omitempty from FinishReason field in Choice struct
- Explicitly set FinishReason to nil for all streaming chunks
- Ensures finish_reason appears as null in JSON for streaming chunks
- Final chunks still properly set finish_reason to "stop", "tool_calls", etc.
- Complies with OpenAI API specification example
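For illustration, the resulting wire format (payload values are examples): streaming chunks carry finish_reason as null, and the final chunk sets it:
```
data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
```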
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): expose env vars as options for consistency
This allows configuring everything in the model's YAML file rather
than relying on global configurations
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama.cpp): respect usetokenizertemplate and use llama.cpp templating system to process messages
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect whether a template exists when use tokenizer template is enabled
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Better recognition of chat
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixes to support tool calls while using templates from tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop template guessing, fix passing tools to tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extract grammar and other options from chat template, add schema struct
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Automatically set use_jinja
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Cleanups; identify gguf models as chat models by default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(stablediffusion-ggml): add support for ref images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add it to the model gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(stablediffusion-ncn): drop in favor of ggml implementation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): drop stablediffusion build
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): add
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): try to fixup current tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to fix tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tests improvements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): use quality to specify step
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(tests): switch to sd-1.5
also increase prep time for downloading models
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add machine tag option and extraUsage option; extraUsage data through grpc-server -> proto -> endpoint is broken for now
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
* remove redundant timing fields, fix broken timings output
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
* use middleware for Machine-Tag only if tag is specified
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
---------
Signed-off-by: mintyleaf <mintyleafdev@gmail.com>
This prepares the API to also receive videos, for video understanding.
It works similarly to images, where the request should be of the form:
{
  "type": "video_url",
  "video_url": { "url": "url or base64 data" }
}
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(openai): add json_schema and strict mode
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* handle err vs _
security scanners prefer it when we put these branches in, and I tend to agree.
Signed-off-by: Dave <dave@gray101.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* feat(functions): enhance parsing of broken JSON when parsing the raw results
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* breaking: make function name by default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(grammar): dynamically generate grammars with mutating keys
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor: simplify condition
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(swagger): cover more localai/openai endpoints
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix swagger descriptions for backend_monitor.go
Signed-off-by: Dave <dave@gray101.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
When enabling grammar with functions, it might be useful to
allow more flexibility to support models that are fine-tuned against returning
function calls of the form { "name": "function_name", "arguments": {...} }
rather than { "function": "function_name", "arguments": {...} }.
This might call for a more generic approach later on, but for the time being we can easily support both,
as we just have to specify different types.
If needed we can expand on this later on.
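A sketch of the two decode targets implied here (Go identifiers are assumptions; json.RawMessage is from encoding/json):
```go
// One type per supported function-call shape.
type funcCallName struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
}

type funcCallFunction struct {
	Function  string          `json:"function"`
	Arguments json.RawMessage `json:"arguments"`
}
```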
Signed-off-by: mudler <mudler@localai.io>
* core 1
* api/openai/files fix
* core 2 - core/config
* move over core api.go and tests to the start of core/http
* move over localai specific endpoints to core/http, begin the service/endpoint split there
* refactor big chunk on the plane
* refactor chunk 2 on plane, next step: port and modify changes to request.go
* easy fixes for request.go, major changes not done yet
* lintfix
* json tag lintfix?
* gitignore and .keep files
* strange fix attempt: rename the config dir?
This PR specifically introduces a `core` folder and moves the following packages over, without any other changes:
- `api/backend`
- `api/config`
- `api/options`
- `api/schema`
Once this is merged and we confirm there are no regressions, I can migrate the remaining changes over piece by piece to split up application startup, backend services, http, and mqtt, as was the goal of the earlier PRs!