Compare commits

...

11 Commits

Author SHA1 Message Date
Ettore Di Giacinto
5f403b1631 chore: drop neutts for l4t (#8101)
Builds currently exhaust CI, and there are better backends at this
point in time. We will probably deprecate it in the future.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-18 21:55:56 +01:00
rampa3
897ad1729e chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request (#8082)
* chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

* added missing model config import URL

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

---------

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-18 09:23:07 +01:00
LocalAI [bot]
16a18a2e55 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9565c7f6bd5fcff124c589147b2621244f2c4aa1 (#8086)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 22:12:21 +01:00
Ettore Di Giacinto
3387bfaee0 feat(api): add support for open responses specification (#8063)
* feat: openresponses

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add ttl settings, fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: register cors middleware by default

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* satisfy schema

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Logitbias and logprobs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add grammar

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* SSE compliance

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* tool JSON conversion

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* support background mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* swagger

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop code. This is handled in the handler

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* background mode for MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 22:11:47 +01:00
LocalAI [bot]
1cd33047b4 chore: ⬆️ Update ggml-org/llama.cpp to 2fbde785bc106ae1c4102b0e82b9b41d9c466579 (#8087)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 21:10:18 +00:00
Ettore Di Giacinto
1de045311a chore(ui): add video generation link (#8079)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 09:49:47 +01:00
LocalAI [bot]
5fe9bf9f84 chore: ⬆️ Update ggml-org/whisper.cpp to f53dc74843e97f19f94a79241357f74ad5b691a6 (#8074)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 08:32:53 +01:00
LocalAI [bot]
d4fd0c0609 chore: ⬆️ Update ggml-org/llama.cpp to 388ce822415f24c60fcf164a321455f1e008cafb (#8073)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-16 21:22:33 +00:00
Ettore Di Giacinto
d16722ee13 Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory" (#8072)
Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/pyt…"

This reverts commit 1f10ab39a9.
2026-01-16 20:50:33 +01:00
dependabot[bot]
1f10ab39a9 chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory (#8066)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: [torch](https://github.com/pytorch/pytorch).


Updates `torch` from 2.3.1+cxx11.abi to 2.8.0
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.8.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.8.0
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-16 19:38:12 +00:00
Ettore Di Giacinto
4d36e393d1 fix(ci): use more beefy runner for expensive jobs (#8065)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-16 19:26:40 +01:00
22 changed files with 6048 additions and 37 deletions

View File

@@ -137,7 +137,7 @@ jobs:
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-llama-cpp'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "llama-cpp"
@@ -699,7 +699,7 @@ jobs:
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-faster-whisper'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
skip-drivers: 'false'
backend: "faster-whisper"
@@ -712,7 +712,7 @@ jobs:
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-coqui'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
skip-drivers: 'false'
backend: "coqui"
@@ -963,7 +963,7 @@ jobs:
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-llama-cpp'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "llama-cpp"
@@ -989,7 +989,7 @@ jobs:
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-llama-cpp'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "llama-cpp"
@@ -1330,19 +1330,6 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'true'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-neutts'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "neutts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""

View File

@@ -1,5 +1,5 @@
-LLAMA_VERSION?=785a71008573e2d84728fb0ba9e851d72d3f8fab
+LLAMA_VERSION?=2fbde785bc106ae1c4102b0e82b9b41d9c466579
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=7010bb4dff7bd55b03d35ef9772142c21699eba9
+STABLEDIFFUSION_GGML_VERSION?=9565c7f6bd5fcff124c589147b2621244f2c4aa1
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=2eeeba56e9edd762b4b38467bab96c2517163158
+WHISPER_CPP_VERSION?=f53dc74843e97f19f94a79241357f74ad5b691a6
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -537,18 +537,14 @@
default: "cpu-neutts"
nvidia: "cuda12-neutts"
amd: "rocm-neutts"
nvidia-l4t: "nvidia-l4t-neutts"
nvidia-cuda-12: "cuda12-neutts"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-neutts"
- !!merge <<: *neutts
name: "neutts-development"
capabilities:
default: "cpu-neutts-development"
nvidia: "cuda12-neutts-development"
amd: "rocm-neutts-development"
nvidia-l4t: "nvidia-l4t-neutts-development"
nvidia-cuda-12: "cuda12-neutts-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-neutts-development"
- !!merge <<: *llamacpp
name: "llama-cpp-development"
capabilities:
@@ -578,11 +574,6 @@
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-neutts"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-neutts
- !!merge <<: *neutts
name: "nvidia-l4t-arm64-neutts"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-neutts"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-arm64-neutts
- !!merge <<: *neutts
name: "cpu-neutts-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-neutts"
@@ -598,11 +589,6 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-neutts"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-neutts
- !!merge <<: *neutts
name: "nvidia-l4t-arm64-neutts-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-neutts"
mirrors:
- localai/localai-backends:master-nvidia-l4t-arm64-neutts
- !!merge <<: *mlx
name: "mlx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx"

View File

@@ -83,6 +83,7 @@ type RunCMD struct {
EnableTracing bool `env:"LOCALAI_ENABLE_TRACING,ENABLE_TRACING" help:"Enable API tracing" group:"api"`
TracingMaxItems int `env:"LOCALAI_TRACING_MAX_ITEMS" default:"1024" help:"Maximum number of traces to keep" group:"api"`
AgentJobRetentionDays int `env:"LOCALAI_AGENT_JOB_RETENTION_DAYS,AGENT_JOB_RETENTION_DAYS" default:"30" help:"Number of days to keep agent job history (default: 30)" group:"api"`
OpenResponsesStoreTTL string `env:"LOCALAI_OPEN_RESPONSES_STORE_TTL,OPEN_RESPONSES_STORE_TTL" default:"0" help:"TTL for Open Responses store (e.g., 1h, 30m, 0 = no expiration)" group:"api"`
Version bool
}
@@ -249,6 +250,15 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
opts = append(opts, config.WithLRUEvictionRetryInterval(dur))
}
// Handle Open Responses store TTL
if r.OpenResponsesStoreTTL != "" && r.OpenResponsesStoreTTL != "0" {
dur, err := time.ParseDuration(r.OpenResponsesStoreTTL)
if err != nil {
return fmt.Errorf("invalid Open Responses store TTL: %w", err)
}
opts = append(opts, config.WithOpenResponsesStoreTTL(dur))
}
// split ":" to get backend name and the uri
for _, v := range r.ExternalGRPCBackends {
backend := v[:strings.IndexByte(v, ':')]
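
The TTL flag above is parsed with Go's standard time.ParseDuration. As a standalone sketch (not part of the diff) of the duration formats the flag accepts:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Duration strings accepted for LOCALAI_OPEN_RESPONSES_STORE_TTL;
	// "0" is special-cased by the flag handling above to mean no expiration.
	for _, s := range []string{"1h", "30m", "1h30m", "90s"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %v\n", s, d)
	}
}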

View File

@@ -86,6 +86,8 @@ type ApplicationConfig struct {
AgentJobRetentionDays int // Default: 30 days
OpenResponsesStoreTTL time.Duration // TTL for Open Responses store (0 = no expiration)
PathWithoutAuth []string
}
@@ -467,6 +469,12 @@ func WithAgentJobRetentionDays(days int) AppOption {
}
}
func WithOpenResponsesStoreTTL(ttl time.Duration) AppOption {
return func(o *ApplicationConfig) {
o.OpenResponsesStoreTTL = ttl
}
}
func WithEnforcedPredownloadScans(enforced bool) AppOption {
return func(o *ApplicationConfig) {
o.EnforcePredownloadScans = enforced
@@ -594,6 +602,12 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
} else {
lruEvictionRetryInterval = "1s" // default
}
var openResponsesStoreTTL string
if o.OpenResponsesStoreTTL > 0 {
openResponsesStoreTTL = o.OpenResponsesStoreTTL.String()
} else {
openResponsesStoreTTL = "0" // default: no expiration
}
return RuntimeSettings{
WatchdogEnabled: &watchdogEnabled,
@@ -628,6 +642,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
AutoloadBackendGalleries: &autoloadBackendGalleries,
ApiKeys: &apiKeys,
AgentJobRetentionDays: &agentJobRetentionDays,
OpenResponsesStoreTTL: &openResponsesStoreTTL,
}
}
@@ -769,6 +784,14 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req
if settings.AgentJobRetentionDays != nil {
o.AgentJobRetentionDays = *settings.AgentJobRetentionDays
}
if settings.OpenResponsesStoreTTL != nil {
if *settings.OpenResponsesStoreTTL == "0" || *settings.OpenResponsesStoreTTL == "" {
o.OpenResponsesStoreTTL = 0 // No expiration
} else if dur, err := time.ParseDuration(*settings.OpenResponsesStoreTTL); err == nil {
o.OpenResponsesStoreTTL = dur
}
// This setting doesn't require restart, can be updated dynamically
}
// Note: ApiKeys requires special handling (merging with startup keys) - handled in caller
return requireRestart

View File

@@ -60,4 +60,7 @@ type RuntimeSettings struct {
// Agent settings
AgentJobRetentionDays *int `json:"agent_job_retention_days,omitempty"`
// Open Responses settings
OpenResponsesStoreTTL *string `json:"open_responses_store_ttl,omitempty"` // TTL for stored responses (e.g., "1h", "30m", "0" = no expiration)
}

View File

@@ -193,6 +193,8 @@ func API(application *application.Application) (*echo.Echo, error) {
corsConfig.AllowOrigins = strings.Split(application.ApplicationConfig().CORSAllowOrigins, ",")
}
e.Use(middleware.CORSWithConfig(corsConfig))
} else {
e.Use(middleware.CORS())
}
// CSRF middleware
@@ -214,6 +216,7 @@ func API(application *application.Application) (*echo.Echo, error) {
routes.RegisterLocalAIRoutes(e, requestExtractor, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), application.GalleryService(), opcache, application.TemplatesEvaluator(), application)
routes.RegisterOpenAIRoutes(e, requestExtractor, application)
routes.RegisterAnthropicRoutes(e, requestExtractor, application)
routes.RegisterOpenResponsesRoutes(e, requestExtractor, application)
if !application.ApplicationConfig().DisableWebUI {
routes.RegisterUIAPIRoutes(e, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), application.GalleryService(), opcache, application)
routes.RegisterUIRoutes(e, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), application.GalleryService())
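
With the new else branch, CORS middleware is now always registered. A hedged sketch of what the fallback amounts to (echo's default CORS config allows all origins):

package main

import (
	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	// Mirrors the fallback branch above: with no explicit CORSAllowOrigins
	// configured, the default CORS middleware (allow-all) is registered.
	e := echo.New()
	e.Use(middleware.CORS())
	_ = e // a real server would continue with e.Start(...) (assumption)
}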

View File

@@ -11,6 +11,7 @@ import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/endpoints/openresponses"
"github.com/mudler/LocalAI/core/p2p"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/xlog"
@@ -84,6 +85,16 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
})
}
}
if settings.OpenResponsesStoreTTL != nil {
if *settings.OpenResponsesStoreTTL != "0" && *settings.OpenResponsesStoreTTL != "" {
if _, err := time.ParseDuration(*settings.OpenResponsesStoreTTL); err != nil {
return c.JSON(http.StatusBadRequest, schema.SettingsResponse{
Success: false,
Error: "Invalid open_responses_store_ttl format: " + err.Error(),
})
}
}
}
// Save to file
if appConfig.DynamicConfigsDir == "" {
@@ -144,6 +155,22 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
xlog.Info("Updated LRU eviction retry settings", "maxRetries", maxRetries, "retryInterval", retryInterval)
}
// Update Open Responses store TTL dynamically
if settings.OpenResponsesStoreTTL != nil {
ttl := time.Duration(0)
if *settings.OpenResponsesStoreTTL != "0" && *settings.OpenResponsesStoreTTL != "" {
if dur, err := time.ParseDuration(*settings.OpenResponsesStoreTTL); err == nil {
ttl = dur
} else {
xlog.Warn("Invalid Open Responses store TTL format", "ttl", *settings.OpenResponsesStoreTTL, "error", err)
}
}
// Apply the new TTL to the global response store
store := openresponses.GetGlobalStore()
store.SetTTL(ttl)
xlog.Info("Updated Open Responses store TTL", "ttl", ttl)
}
// Check if agent job retention changed
agentJobChanged := settings.AgentJobRetentionDays != nil
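
Since the TTL is applied without a restart, a hedged sketch of driving this from a client (the /api/settings path is taken from the dashboard code further below; host and port are assumptions):

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Update the Open Responses store TTL at runtime to 45 minutes.
	body := bytes.NewBufferString(`{"open_responses_store_ttl": "45m"}`)
	resp, err := http.Post("http://localhost:8080/api/settings", "application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}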

View File

File diff suppressed because it is too large.

View File

@@ -0,0 +1,453 @@
package openresponses
import (
"context"
"encoding/json"
"fmt"
"sync"
"time"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/xlog"
)
// ResponseStore provides thread-safe storage for Open Responses API responses
type ResponseStore struct {
mu sync.RWMutex
responses map[string]*StoredResponse
ttl time.Duration // Time-to-live for stored responses (0 = no expiration)
cleanupCtx context.Context
cleanupCancel context.CancelFunc
}
// StreamedEvent represents a buffered SSE event for streaming resume
type StreamedEvent struct {
SequenceNumber int `json:"sequence_number"`
EventType string `json:"event_type"`
Data []byte `json:"data"` // JSON-serialized event
}
// StoredResponse contains a complete response with its input request and output items
type StoredResponse struct {
Request *schema.OpenResponsesRequest
Response *schema.ORResponseResource
Items map[string]*schema.ORItemField // item_id -> item mapping for quick lookup
StoredAt time.Time
ExpiresAt *time.Time // nil if no expiration
// Background execution support
CancelFunc context.CancelFunc // For cancellation of background tasks
StreamEvents []StreamedEvent // Buffered events for streaming resume
StreamEnabled bool // Was created with stream=true
IsBackground bool // Was created with background=true
EventsChan chan struct{} // Signals new events for live subscribers
mu sync.RWMutex // Protect concurrent access to this response
}
var (
globalStore *ResponseStore
storeOnce sync.Once
)
// GetGlobalStore returns the singleton response store instance
func GetGlobalStore() *ResponseStore {
storeOnce.Do(func() {
globalStore = NewResponseStore(0) // Default: no TTL, will be updated from appConfig
})
return globalStore
}
// SetTTL updates the TTL for the store
// This will affect all new responses stored after this call
func (s *ResponseStore) SetTTL(ttl time.Duration) {
s.mu.Lock()
defer s.mu.Unlock()
// Stop existing cleanup loop if running
if s.cleanupCancel != nil {
s.cleanupCancel()
s.cleanupCancel = nil
s.cleanupCtx = nil
}
s.ttl = ttl
// If TTL > 0, start cleanup loop
if ttl > 0 {
s.cleanupCtx, s.cleanupCancel = context.WithCancel(context.Background())
go s.cleanupLoop(s.cleanupCtx)
}
xlog.Debug("Updated Open Responses store TTL", "ttl", ttl, "cleanup_running", ttl > 0)
}
// NewResponseStore creates a new response store with optional TTL
// If ttl is 0, responses are stored indefinitely
func NewResponseStore(ttl time.Duration) *ResponseStore {
store := &ResponseStore{
responses: make(map[string]*StoredResponse),
ttl: ttl,
}
// Start cleanup goroutine if TTL is set
if ttl > 0 {
store.cleanupCtx, store.cleanupCancel = context.WithCancel(context.Background())
go store.cleanupLoop(store.cleanupCtx)
}
return store
}
// Store stores a response with its request and items
func (s *ResponseStore) Store(responseID string, request *schema.OpenResponsesRequest, response *schema.ORResponseResource) {
s.mu.Lock()
defer s.mu.Unlock()
// Build item index for quick lookup
items := make(map[string]*schema.ORItemField)
for i := range response.Output {
item := &response.Output[i]
if item.ID != "" {
items[item.ID] = item
}
}
stored := &StoredResponse{
Request: request,
Response: response,
Items: items,
StoredAt: time.Now(),
ExpiresAt: nil,
}
// Set expiration if TTL is configured
if s.ttl > 0 {
expiresAt := time.Now().Add(s.ttl)
stored.ExpiresAt = &expiresAt
}
s.responses[responseID] = stored
xlog.Debug("Stored Open Responses response", "response_id", responseID, "items_count", len(items))
}
// Get retrieves a stored response by ID
func (s *ResponseStore) Get(responseID string) (*StoredResponse, error) {
s.mu.RLock()
defer s.mu.RUnlock()
stored, exists := s.responses[responseID]
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
// Check expiration
if stored.ExpiresAt != nil && time.Now().After(*stored.ExpiresAt) {
// Expired: report it as expired; the cleanup loop will remove it
return nil, fmt.Errorf("response expired: %s", responseID)
}
return stored, nil
}
// GetItem retrieves a specific item from a stored response
func (s *ResponseStore) GetItem(responseID, itemID string) (*schema.ORItemField, error) {
stored, err := s.Get(responseID)
if err != nil {
return nil, err
}
item, exists := stored.Items[itemID]
if !exists {
return nil, fmt.Errorf("item not found: %s in response %s", itemID, responseID)
}
return item, nil
}
// FindItem searches for an item across all stored responses
// Returns the item and the response ID it was found in
func (s *ResponseStore) FindItem(itemID string) (*schema.ORItemField, string, error) {
s.mu.RLock()
defer s.mu.RUnlock()
now := time.Now()
for responseID, stored := range s.responses {
// Skip expired responses
if stored.ExpiresAt != nil && now.After(*stored.ExpiresAt) {
continue
}
if item, exists := stored.Items[itemID]; exists {
return item, responseID, nil
}
}
return nil, "", fmt.Errorf("item not found in any stored response: %s", itemID)
}
// Delete removes a response from storage
func (s *ResponseStore) Delete(responseID string) {
s.mu.Lock()
defer s.mu.Unlock()
delete(s.responses, responseID)
xlog.Debug("Deleted Open Responses response", "response_id", responseID)
}
// Cleanup removes expired responses
func (s *ResponseStore) Cleanup() int {
if s.ttl == 0 {
return 0
}
s.mu.Lock()
defer s.mu.Unlock()
now := time.Now()
count := 0
for id, stored := range s.responses {
if stored.ExpiresAt != nil && now.After(*stored.ExpiresAt) {
delete(s.responses, id)
count++
}
}
if count > 0 {
xlog.Debug("Cleaned up expired Open Responses", "count", count)
}
return count
}
// cleanupLoop runs periodic cleanup of expired responses
func (s *ResponseStore) cleanupLoop(ctx context.Context) {
if s.ttl == 0 {
return
}
ticker := time.NewTicker(s.ttl / 2) // Cleanup at half TTL interval
defer ticker.Stop()
for {
select {
case <-ctx.Done():
xlog.Debug("Stopped Open Responses store cleanup loop")
return
case <-ticker.C:
s.Cleanup()
}
}
}
// Count returns the number of stored responses
func (s *ResponseStore) Count() int {
s.mu.RLock()
defer s.mu.RUnlock()
return len(s.responses)
}
// StoreBackground stores a background response with cancel function and optional streaming support
func (s *ResponseStore) StoreBackground(responseID string, request *schema.OpenResponsesRequest, response *schema.ORResponseResource, cancelFunc context.CancelFunc, streamEnabled bool) {
s.mu.Lock()
defer s.mu.Unlock()
// Build item index for quick lookup
items := make(map[string]*schema.ORItemField)
for i := range response.Output {
item := &response.Output[i]
if item.ID != "" {
items[item.ID] = item
}
}
stored := &StoredResponse{
Request: request,
Response: response,
Items: items,
StoredAt: time.Now(),
ExpiresAt: nil,
CancelFunc: cancelFunc,
StreamEvents: []StreamedEvent{},
StreamEnabled: streamEnabled,
IsBackground: true,
EventsChan: make(chan struct{}, 100), // Buffered channel for event notifications
}
// Set expiration if TTL is configured
if s.ttl > 0 {
expiresAt := time.Now().Add(s.ttl)
stored.ExpiresAt = &expiresAt
}
s.responses[responseID] = stored
xlog.Debug("Stored background Open Responses response", "response_id", responseID, "stream_enabled", streamEnabled)
}
// UpdateStatus updates the status of a stored response
func (s *ResponseStore) UpdateStatus(responseID string, status string, completedAt *int64) error {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return fmt.Errorf("response not found: %s", responseID)
}
stored.mu.Lock()
defer stored.mu.Unlock()
stored.Response.Status = status
stored.Response.CompletedAt = completedAt
xlog.Debug("Updated response status", "response_id", responseID, "status", status)
return nil
}
// UpdateResponse updates the entire response object for a stored response
func (s *ResponseStore) UpdateResponse(responseID string, response *schema.ORResponseResource) error {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return fmt.Errorf("response not found: %s", responseID)
}
stored.mu.Lock()
defer stored.mu.Unlock()
// Rebuild item index
items := make(map[string]*schema.ORItemField)
for i := range response.Output {
item := &response.Output[i]
if item.ID != "" {
items[item.ID] = item
}
}
stored.Response = response
stored.Items = items
xlog.Debug("Updated response", "response_id", responseID, "status", response.Status, "items_count", len(items))
return nil
}
// AppendEvent appends a streaming event to the buffer for resume support
func (s *ResponseStore) AppendEvent(responseID string, event *schema.ORStreamEvent) error {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return fmt.Errorf("response not found: %s", responseID)
}
// Serialize the event
data, err := json.Marshal(event)
if err != nil {
return fmt.Errorf("failed to marshal event: %w", err)
}
stored.mu.Lock()
stored.StreamEvents = append(stored.StreamEvents, StreamedEvent{
SequenceNumber: event.SequenceNumber,
EventType: event.Type,
Data: data,
})
stored.mu.Unlock()
// Notify any subscribers of new event
select {
case stored.EventsChan <- struct{}{}:
default:
// Channel full, subscribers will catch up
}
return nil
}
// GetEventsAfter returns all events with sequence number greater than startingAfter
func (s *ResponseStore) GetEventsAfter(responseID string, startingAfter int) ([]StreamedEvent, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
stored.mu.RLock()
defer stored.mu.RUnlock()
var result []StreamedEvent
for _, event := range stored.StreamEvents {
if event.SequenceNumber > startingAfter {
result = append(result, event)
}
}
return result, nil
}
// Cancel cancels a background response if it's still in progress
func (s *ResponseStore) Cancel(responseID string) (*schema.ORResponseResource, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
stored.mu.Lock()
defer stored.mu.Unlock()
// If already in a terminal state, just return the response (idempotent)
status := stored.Response.Status
if status == schema.ORStatusCompleted || status == schema.ORStatusFailed ||
status == schema.ORStatusIncomplete || status == schema.ORStatusCancelled {
xlog.Debug("Response already in terminal state", "response_id", responseID, "status", status)
return stored.Response, nil
}
// Cancel the context if available
if stored.CancelFunc != nil {
stored.CancelFunc()
xlog.Debug("Cancelled background response", "response_id", responseID)
}
// Update status to cancelled
now := time.Now().Unix()
stored.Response.Status = schema.ORStatusCancelled
stored.Response.CompletedAt = &now
return stored.Response, nil
}
// GetEventsChan returns the events notification channel for a response
func (s *ResponseStore) GetEventsChan(responseID string) (chan struct{}, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
return stored.EventsChan, nil
}
// IsStreamEnabled checks if a response was created with streaming enabled
func (s *ResponseStore) IsStreamEnabled(responseID string) (bool, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return false, fmt.Errorf("response not found: %s", responseID)
}
stored.mu.RLock()
defer stored.mu.RUnlock()
return stored.StreamEnabled, nil
}
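
A minimal usage sketch of the store API defined above (illustrative IDs and field values; the import paths follow the ones used elsewhere in this diff):

package main

import (
	"fmt"
	"time"

	"github.com/mudler/LocalAI/core/http/endpoints/openresponses"
	"github.com/mudler/LocalAI/core/schema"
)

func main() {
	// Create a store where responses expire after 30 minutes.
	store := openresponses.NewResponseStore(30 * time.Minute)
	resp := &schema.ORResponseResource{
		ID:     "resp_demo",
		Object: "response",
		Status: schema.ORStatusCompleted,
		Output: []schema.ORItemField{
			{Type: "message", ID: "msg_demo", Role: "assistant"},
		},
	}
	store.Store("resp_demo", &schema.OpenResponsesRequest{Model: "demo-model"}, resp)

	// Look up the response and one of its items.
	if stored, err := store.Get("resp_demo"); err == nil {
		fmt.Println("status:", stored.Response.Status)
	}
	if item, err := store.GetItem("resp_demo", "msg_demo"); err == nil {
		fmt.Println("item type:", item.Type)
	}

	// Switch to no expiration; this also stops the cleanup loop.
	store.SetTTL(0)
}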

View File

@@ -0,0 +1,13 @@
package openresponses
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestStore(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "ResponseStore Suite")
}

View File

@@ -0,0 +1,626 @@
package openresponses
import (
"context"
"fmt"
"time"
"github.com/mudler/LocalAI/core/schema"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("ResponseStore", func() {
var store *ResponseStore
BeforeEach(func() {
store = NewResponseStore(0) // No TTL for most tests
})
AfterEach(func() {
// Clean up
})
Describe("Store and Get", func() {
It("should store and retrieve a response", func() {
responseID := "resp_test123"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Hello",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
CreatedAt: time.Now().Unix(),
Status: "completed",
Model: "test-model",
Output: []schema.ORItemField{
{
Type: "message",
ID: "msg_123",
Status: "completed",
Role: "assistant",
Content: []schema.ORContentPart{{
Type: "output_text",
Text: "Hello, world!",
Annotations: []schema.ORAnnotation{},
Logprobs: []schema.ORLogProb{},
}},
},
},
}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
Expect(stored.Response.ID).To(Equal(responseID))
Expect(stored.Request.Model).To(Equal("test-model"))
Expect(len(stored.Items)).To(Equal(1))
Expect(stored.Items["msg_123"]).ToNot(BeNil())
Expect(stored.Items["msg_123"].ID).To(Equal("msg_123"))
})
It("should return error for non-existent response", func() {
_, err := store.Get("nonexistent")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("not found"))
})
It("should index all items by ID", func() {
responseID := "resp_test456"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: "msg_1",
Status: "completed",
Role: "assistant",
},
{
Type: "function_call",
ID: "fc_1",
Status: "completed",
CallID: "fc_1",
Name: "test_function",
Arguments: `{"arg": "value"}`,
},
{
Type: "message",
ID: "msg_2",
Status: "completed",
Role: "assistant",
},
},
}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(len(stored.Items)).To(Equal(3))
Expect(stored.Items["msg_1"]).ToNot(BeNil())
Expect(stored.Items["fc_1"]).ToNot(BeNil())
Expect(stored.Items["msg_2"]).ToNot(BeNil())
})
It("should handle items without IDs", func() {
responseID := "resp_test789"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: "", // No ID
Status: "completed",
Role: "assistant",
},
{
Type: "message",
ID: "msg_with_id",
Status: "completed",
Role: "assistant",
},
},
}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
// Only items with IDs are indexed
Expect(len(stored.Items)).To(Equal(1))
Expect(stored.Items["msg_with_id"]).ToNot(BeNil())
})
})
Describe("GetItem", func() {
It("should retrieve a specific item by ID", func() {
responseID := "resp_item_test"
itemID := "msg_specific"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: itemID,
Status: "completed",
Role: "assistant",
Content: []schema.ORContentPart{{
Type: "output_text",
Text: "Specific message",
Annotations: []schema.ORAnnotation{},
Logprobs: []schema.ORLogProb{},
}},
},
},
}
store.Store(responseID, request, response)
item, err := store.GetItem(responseID, itemID)
Expect(err).ToNot(HaveOccurred())
Expect(item).ToNot(BeNil())
Expect(item.ID).To(Equal(itemID))
Expect(item.Type).To(Equal("message"))
})
It("should return error for non-existent item", func() {
responseID := "resp_item_test2"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: "msg_existing",
Status: "completed",
},
},
}
store.Store(responseID, request, response)
_, err := store.GetItem(responseID, "nonexistent_item")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("item not found"))
})
It("should return error for non-existent response when getting item", func() {
_, err := store.GetItem("nonexistent_response", "any_item")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("response not found"))
})
})
Describe("FindItem", func() {
It("should find an item across all stored responses", func() {
// Store first response
responseID1 := "resp_find_1"
itemID1 := "msg_find_1"
store.Store(responseID1, &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{
ID: responseID1,
Object: "response",
Output: []schema.ORItemField{
{Type: "message", ID: itemID1, Status: "completed"},
},
})
// Store second response
responseID2 := "resp_find_2"
itemID2 := "msg_find_2"
store.Store(responseID2, &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{
ID: responseID2,
Object: "response",
Output: []schema.ORItemField{
{Type: "message", ID: itemID2, Status: "completed"},
},
})
// Find item from first response
item, foundResponseID, err := store.FindItem(itemID1)
Expect(err).ToNot(HaveOccurred())
Expect(item).ToNot(BeNil())
Expect(item.ID).To(Equal(itemID1))
Expect(foundResponseID).To(Equal(responseID1))
// Find item from second response
item, foundResponseID, err = store.FindItem(itemID2)
Expect(err).ToNot(HaveOccurred())
Expect(item).ToNot(BeNil())
Expect(item.ID).To(Equal(itemID2))
Expect(foundResponseID).To(Equal(responseID2))
})
It("should return error when item not found in any response", func() {
_, _, err := store.FindItem("nonexistent_item")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("item not found in any stored response"))
})
})
Describe("Delete", func() {
It("should delete a stored response", func() {
responseID := "resp_delete_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
}
store.Store(responseID, request, response)
Expect(store.Count()).To(Equal(1))
store.Delete(responseID)
Expect(store.Count()).To(Equal(0))
_, err := store.Get(responseID)
Expect(err).To(HaveOccurred())
})
It("should handle deleting non-existent response gracefully", func() {
// Should not panic
store.Delete("nonexistent")
Expect(store.Count()).To(Equal(0))
})
})
Describe("Count", func() {
It("should return correct count of stored responses", func() {
Expect(store.Count()).To(Equal(0))
store.Store("resp_1", &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{ID: "resp_1", Object: "response"})
Expect(store.Count()).To(Equal(1))
store.Store("resp_2", &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{ID: "resp_2", Object: "response"})
Expect(store.Count()).To(Equal(2))
store.Delete("resp_1")
Expect(store.Count()).To(Equal(1))
})
})
Describe("TTL and Expiration", func() {
It("should set expiration when TTL is configured", func() {
ttlStore := NewResponseStore(100 * time.Millisecond)
responseID := "resp_ttl_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
ttlStore.Store(responseID, request, response)
stored, err := ttlStore.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored.ExpiresAt).ToNot(BeNil())
Expect(stored.ExpiresAt.After(time.Now())).To(BeTrue())
})
It("should not set expiration when TTL is 0", func() {
responseID := "resp_no_ttl"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored.ExpiresAt).To(BeNil())
})
It("should clean up expired responses", func() {
ttlStore := NewResponseStore(50 * time.Millisecond)
responseID := "resp_expire_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
ttlStore.Store(responseID, request, response)
Expect(ttlStore.Count()).To(Equal(1))
// Wait for expiration (longer than TTL and cleanup interval)
time.Sleep(150 * time.Millisecond)
// Cleanup should remove expired response (may have already been cleaned by goroutine)
count := ttlStore.Cleanup()
// Count might be 0 if cleanup goroutine already ran, or 1 if we're first
Expect(count).To(BeNumerically(">=", 0))
Expect(ttlStore.Count()).To(Equal(0))
_, err := ttlStore.Get(responseID)
Expect(err).To(HaveOccurred())
})
It("should return error for expired response", func() {
ttlStore := NewResponseStore(50 * time.Millisecond)
responseID := "resp_expire_error"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
ttlStore.Store(responseID, request, response)
// Wait for expiration (but not long enough for cleanup goroutine to remove it)
time.Sleep(75 * time.Millisecond)
// Try to get before cleanup goroutine removes it
_, err := ttlStore.Get(responseID)
// Error could be "expired" or "not found" (if cleanup already ran)
Expect(err).To(HaveOccurred())
// Either error message is acceptable
errMsg := err.Error()
Expect(errMsg).To(Or(ContainSubstring("expired"), ContainSubstring("not found")))
})
})
Describe("Thread Safety", func() {
It("should handle concurrent stores and gets", func() {
// This is a basic concurrency test
done := make(chan bool, 10)
for i := 0; i < 10; i++ {
go func(id int) {
responseID := fmt.Sprintf("resp_concurrent_%d", id)
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{Type: "message", ID: fmt.Sprintf("msg_%d", id), Status: "completed"},
},
}
store.Store(responseID, request, response)
// Retrieve immediately
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
done <- true
}(i)
}
// Wait for all goroutines
for i := 0; i < 10; i++ {
<-done
}
Expect(store.Count()).To(Equal(10))
})
})
Describe("GetGlobalStore", func() {
It("should return singleton instance", func() {
store1 := GetGlobalStore()
store2 := GetGlobalStore()
Expect(store1).To(Equal(store2))
})
It("should persist data across GetGlobalStore calls", func() {
globalStore := GetGlobalStore()
responseID := "resp_global_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
globalStore.Store(responseID, request, response)
// Get store again
globalStore2 := GetGlobalStore()
stored, err := globalStore2.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
})
})
Describe("Background Mode Support", func() {
It("should store background response with cancel function", func() {
responseID := "resp_bg_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusQueued,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
Expect(stored.IsBackground).To(BeTrue())
Expect(stored.StreamEnabled).To(BeTrue())
Expect(stored.CancelFunc).ToNot(BeNil())
})
It("should update status of stored response", func() {
responseID := "resp_status_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusQueued,
}
store.Store(responseID, request, response)
err := store.UpdateStatus(responseID, schema.ORStatusInProgress, nil)
Expect(err).ToNot(HaveOccurred())
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored.Response.Status).To(Equal(schema.ORStatusInProgress))
})
It("should append and retrieve streaming events", func() {
responseID := "resp_events_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusInProgress,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
// Append events
event1 := &schema.ORStreamEvent{
Type: "response.created",
SequenceNumber: 0,
}
event2 := &schema.ORStreamEvent{
Type: "response.in_progress",
SequenceNumber: 1,
}
event3 := &schema.ORStreamEvent{
Type: "response.output_text.delta",
SequenceNumber: 2,
}
err := store.AppendEvent(responseID, event1)
Expect(err).ToNot(HaveOccurred())
err = store.AppendEvent(responseID, event2)
Expect(err).ToNot(HaveOccurred())
err = store.AppendEvent(responseID, event3)
Expect(err).ToNot(HaveOccurred())
// Get all events after -1 (all events)
events, err := store.GetEventsAfter(responseID, -1)
Expect(err).ToNot(HaveOccurred())
Expect(events).To(HaveLen(3))
// Get events after sequence 1
events, err = store.GetEventsAfter(responseID, 1)
Expect(err).ToNot(HaveOccurred())
Expect(events).To(HaveLen(1))
Expect(events[0].SequenceNumber).To(Equal(2))
})
It("should cancel an in-progress response", func() {
responseID := "resp_cancel_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusInProgress,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, false)
// Cancel the response
cancelledResponse, err := store.Cancel(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(cancelledResponse.Status).To(Equal(schema.ORStatusCancelled))
Expect(cancelledResponse.CompletedAt).ToNot(BeNil())
})
It("should be idempotent when cancelling already completed response", func() {
responseID := "resp_idempotent_cancel"
request := &schema.OpenResponsesRequest{Model: "test"}
completedAt := time.Now().Unix()
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusCompleted,
CompletedAt: &completedAt,
}
store.Store(responseID, request, response)
// Try to cancel a completed response
cancelledResponse, err := store.Cancel(responseID)
Expect(err).ToNot(HaveOccurred())
// Status should remain completed (not changed to cancelled)
Expect(cancelledResponse.Status).To(Equal(schema.ORStatusCompleted))
})
It("should check if streaming is enabled", func() {
responseID := "resp_stream_check"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusQueued,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
enabled, err := store.IsStreamEnabled(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(enabled).To(BeTrue())
// Store another without streaming
responseID2 := "resp_no_stream"
store.StoreBackground(responseID2, request, response, cancel, false)
enabled2, err := store.IsStreamEnabled(responseID2)
Expect(err).ToNot(HaveOccurred())
Expect(enabled2).To(BeFalse())
})
It("should notify subscribers of new events", func() {
responseID := "resp_events_chan"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusInProgress,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
eventsChan, err := store.GetEventsChan(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(eventsChan).ToNot(BeNil())
// Append an event
event := &schema.ORStreamEvent{
Type: "response.output_text.delta",
SequenceNumber: 0,
}
go func() {
time.Sleep(10 * time.Millisecond)
store.AppendEvent(responseID, event)
}()
// Wait for notification
select {
case <-eventsChan:
// Event received
case <-time.After(1 * time.Second):
Fail("Timeout waiting for event notification")
}
})
})
})

View File

@@ -1,13 +1,33 @@
package http_test
import (
"os"
"path/filepath"
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var (
tmpdir string
modelDir string
)
func TestLocalAI(t *testing.T) {
RegisterFailHandler(Fail)
var err error
tmpdir, err = os.MkdirTemp("", "")
Expect(err).ToNot(HaveOccurred())
modelDir = filepath.Join(tmpdir, "models")
err = os.Mkdir(modelDir, 0750)
Expect(err).ToNot(HaveOccurred())
AfterSuite(func() {
err := os.RemoveAll(tmpdir)
Expect(err).ToNot(HaveOccurred())
})
RunSpecs(t, "LocalAI HTTP test suite")
}

View File

@@ -484,3 +484,103 @@ func mergeOpenAIRequestAndModelConfig(config *config.ModelConfig, input *schema.
}
return fmt.Errorf("unable to validate configuration after merging")
}
func (re *RequestExtractor) SetOpenResponsesRequest(c echo.Context) error {
input, ok := c.Get(CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.OpenResponsesRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
// Extract or generate the correlation ID (Open Responses uses x-request-id)
correlationID := c.Request().Header.Get("x-request-id")
if correlationID == "" {
correlationID = uuid.New().String()
}
c.Response().Header().Set("x-request-id", correlationID)
// Use the request context directly - Echo properly supports context cancellation!
reqCtx := c.Request().Context()
c1, cancel := context.WithCancel(re.applicationConfig.Context)
// Cancel when request context is cancelled (client disconnects)
go func() {
select {
case <-reqCtx.Done():
cancel()
case <-c1.Done():
// Already cancelled
}
}()
// Add the correlation ID to the new context
ctxWithCorrelationID := context.WithValue(c1, CorrelationIDKey, correlationID)
input.Context = ctxWithCorrelationID
input.Cancel = cancel
err := mergeOpenResponsesRequestAndModelConfig(cfg, input)
if err != nil {
return err
}
if cfg.Model == "" {
xlog.Debug("replacing empty cfg.Model with input value", "input.Model", input.Model)
cfg.Model = input.Model
}
c.Set(CONTEXT_LOCALS_KEY_LOCALAI_REQUEST, input)
c.Set(CONTEXT_LOCALS_KEY_MODEL_CONFIG, cfg)
return nil
}
func mergeOpenResponsesRequestAndModelConfig(config *config.ModelConfig, input *schema.OpenResponsesRequest) error {
// Temperature
if input.Temperature != nil {
config.Temperature = input.Temperature
}
// TopP
if input.TopP != nil {
config.TopP = input.TopP
}
// MaxOutputTokens -> Maxtokens
if input.MaxOutputTokens != nil {
config.Maxtokens = input.MaxOutputTokens
}
// Convert tools to functions - this will be handled in the endpoint handler
// We just validate that tools are present if needed
// Handle tool_choice
if input.ToolChoice != nil {
switch tc := input.ToolChoice.(type) {
case string:
// "auto", "required", or "none"
if tc == "required" {
config.SetFunctionCallString("required")
} else if tc == "none" {
// Don't use tools - handled in endpoint
}
// "auto" is default - let model decide
case map[string]interface{}:
// Specific tool: {type:"function", name:"..."}
if tcType, ok := tc["type"].(string); ok && tcType == "function" {
if name, ok := tc["name"].(string); ok {
config.SetFunctionCallString(name)
}
}
}
}
if valid, _ := config.Validate(); valid {
return nil
}
return fmt.Errorf("unable to validate configuration after merging")
}
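
Because ToolChoice is typed interface{}, encoding/json produces a string for "auto"/"required"/"none" and a map[string]interface{} for the object form, which is exactly what the type switch above distinguishes. A standalone sketch (the tool name is hypothetical):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The two JSON shapes tool_choice can take.
	for _, raw := range []string{`"required"`, `{"type":"function","name":"get_weather"}`} {
		var tc interface{}
		if err := json.Unmarshal([]byte(raw), &tc); err != nil {
			panic(err)
		}
		switch v := tc.(type) {
		case string:
			fmt.Println("string form:", v)
		case map[string]interface{}:
			fmt.Println("object form, name:", v["name"]) // hypothetical tool name
		}
	}
}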

View File

File diff suppressed because it is too large.

View File

@@ -0,0 +1,58 @@
package routes
import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/endpoints/openresponses"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
)
func RegisterOpenResponsesRoutes(app *echo.Echo,
re *middleware.RequestExtractor,
application *application.Application) {
// Open Responses API endpoint
responsesHandler := openresponses.ResponsesEndpoint(
application.ModelConfigLoader(),
application.ModelLoader(),
application.TemplatesEvaluator(),
application.ApplicationConfig(),
)
responsesMiddleware := []echo.MiddlewareFunc{
middleware.TraceMiddleware(application),
re.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_CHAT)),
re.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.OpenResponsesRequest) }),
setOpenResponsesRequestContext(re),
}
// Main Open Responses endpoint
app.POST("/v1/responses", responsesHandler, responsesMiddleware...)
// Also support without version prefix for compatibility
app.POST("/responses", responsesHandler, responsesMiddleware...)
// GET /responses/:id - Retrieve a response (for polling background requests)
getResponseHandler := openresponses.GetResponseEndpoint()
app.GET("/v1/responses/:id", getResponseHandler, middleware.TraceMiddleware(application))
app.GET("/responses/:id", getResponseHandler, middleware.TraceMiddleware(application))
// POST /responses/:id/cancel - Cancel a background response
cancelResponseHandler := openresponses.CancelResponseEndpoint()
app.POST("/v1/responses/:id/cancel", cancelResponseHandler, middleware.TraceMiddleware(application))
app.POST("/responses/:id/cancel", cancelResponseHandler, middleware.TraceMiddleware(application))
}
// setOpenResponsesRequestContext sets up the context and cancel function for Open Responses requests
func setOpenResponsesRequestContext(re *middleware.RequestExtractor) echo.MiddlewareFunc {
return func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
if err := re.SetOpenResponsesRequest(c); err != nil {
return err
}
return next(c)
}
}
}
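
A hedged client sketch against the routes registered above: create a background response, then poll it by ID. The base URL, model name, and response ID are assumptions; a real client would read the id field from the creation response.

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	base := "http://localhost:8080" // assumed LocalAI address

	// POST /v1/responses with background=true (fields from OpenResponsesRequest).
	body := bytes.NewBufferString(`{"model":"test-model","input":"Hello","background":true}`)
	resp, err := http.Post(base+"/v1/responses", "application/json", body)
	if err != nil {
		panic(err)
	}
	created, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Println("created:", string(created))

	// Poll GET /v1/responses/:id with the id returned above (placeholder here).
	poll, err := http.Get(base + "/v1/responses/resp_example")
	if err != nil {
		panic(err)
	}
	defer poll.Body.Close()
	polled, _ := io.ReadAll(poll.Body)
	fmt.Println("polled:", string(polled))
}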

View File

@@ -28,6 +28,9 @@
<a href="image/" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] px-2 py-2 rounded-lg transition duration-300 ease-in-out hover:bg-[var(--color-bg-secondary)] flex items-center group text-sm">
<i class="fas fa-image text-[var(--color-primary)] mr-1.5 text-sm group-hover:scale-110 transition-transform"></i>Images
</a>
<a href="video/" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] px-2 py-2 rounded-lg transition duration-300 ease-in-out hover:bg-[var(--color-bg-secondary)] flex items-center group text-sm">
<i class="fas fa-video text-[var(--color-primary)] mr-1.5 text-sm group-hover:scale-110 transition-transform"></i>Video
</a>
<a href="tts/" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] px-2 py-2 rounded-lg transition duration-300 ease-in-out hover:bg-[var(--color-bg-secondary)] flex items-center group text-sm">
<i class="fa-solid fa-music text-[var(--color-primary)] mr-1.5 text-sm group-hover:scale-110 transition-transform"></i>TTS
</a>
@@ -88,6 +91,9 @@
<a href="image/" class="block text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] hover:bg-[var(--color-bg-secondary)] px-3 py-2 rounded-lg transition duration-300 ease-in-out flex items-center text-sm">
<i class="fas fa-image text-[var(--color-primary)] mr-3 w-5 text-center text-sm"></i>Images
</a>
<a href="video/" class="block text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] hover:bg-[var(--color-bg-secondary)] px-3 py-2 rounded-lg transition duration-300 ease-in-out flex items-center text-sm">
<i class="fas fa-video text-[var(--color-primary)] mr-3 w-5 text-center text-sm"></i>Video
</a>
<a href="tts/" class="block text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] hover:bg-[var(--color-bg-secondary)] px-3 py-2 rounded-lg transition duration-300 ease-in-out flex items-center text-sm">
<i class="fa-solid fa-music text-[var(--color-primary)] mr-3 w-5 text-center text-sm"></i>TTS
</a>

View File

@@ -485,6 +485,28 @@
</div>
</div>
<!-- Open Responses Settings Section -->
<div class="bg-[var(--color-bg-secondary)] border border-[var(--color-accent)]/20 rounded-lg p-6">
<h2 class="text-xl font-semibold text-[var(--color-text-primary)] mb-4 flex items-center">
<i class="fas fa-database mr-2 text-[var(--color-accent)] text-sm"></i>
Open Responses Settings
</h2>
<p class="text-xs text-[var(--color-text-secondary)] mb-4">
Configure Open Responses API response storage
</p>
<div class="space-y-4">
<!-- Store TTL -->
<div>
<label class="block text-sm font-medium text-[var(--color-text-primary)] mb-2">Response Store TTL</label>
<p class="text-xs text-[var(--color-text-secondary)] mb-2">Time-to-live for stored responses (e.g., 1h, 30m, 0 = no expiration)</p>
<input type="text" x-model="settings.open_responses_store_ttl"
placeholder="0"
class="w-full px-3 py-2 bg-[var(--color-bg-primary)] border border-[var(--color-accent)]/20 rounded text-sm text-[var(--color-text-primary)] focus:outline-none focus:ring-2 focus:ring-[var(--color-accent)]/50">
</div>
</div>
</div>
<!-- API Keys Settings Section -->
<div class="bg-[var(--color-bg-secondary)] border border-[var(--color-error-light)] rounded-lg p-6">
<h2 class="text-xl font-semibold text-[var(--color-text-primary)] mb-4 flex items-center">
@@ -633,7 +655,8 @@ function settingsDashboard() {
galleries_json: '[]',
backend_galleries_json: '[]',
api_keys_text: '',
agent_job_retention_days: 30
agent_job_retention_days: 30,
open_responses_store_ttl: '0'
},
sourceInfo: '',
saving: false,
@@ -680,7 +703,8 @@ function settingsDashboard() {
galleries_json: JSON.stringify(data.galleries || [], null, 2),
backend_galleries_json: JSON.stringify(data.backend_galleries || [], null, 2),
api_keys_text: (data.api_keys || []).join('\n'),
agent_job_retention_days: data.agent_job_retention_days || 30
agent_job_retention_days: data.agent_job_retention_days || 30,
open_responses_store_ttl: data.open_responses_store_ttl || '0'
};
this.sourceInfo = data.source || 'default';
} else {
@@ -838,6 +862,9 @@ function settingsDashboard() {
if (this.settings.agent_job_retention_days !== undefined) {
payload.agent_job_retention_days = parseInt(this.settings.agent_job_retention_days) || 30;
}
if (this.settings.open_responses_store_ttl !== undefined) {
payload.open_responses_store_ttl = this.settings.open_responses_store_ttl;
}
const response = await fetch('/api/settings', {
method: 'POST',

View File

@@ -0,0 +1,306 @@
package schema
import (
"context"
)
// Open Responses status constants
const (
ORStatusQueued = "queued"
ORStatusInProgress = "in_progress"
ORStatusCompleted = "completed"
ORStatusFailed = "failed"
ORStatusIncomplete = "incomplete"
ORStatusCancelled = "cancelled"
)
// OpenResponsesRequest represents a request to the Open Responses API
// https://www.openresponses.org/specification
type OpenResponsesRequest struct {
Model string `json:"model"`
Input interface{} `json:"input"` // string or []ORItemParam
Tools []ORFunctionTool `json:"tools,omitempty"`
ToolChoice interface{} `json:"tool_choice,omitempty"` // "auto"|"required"|"none"|{type:"function",name:"..."}
Stream bool `json:"stream,omitempty"`
MaxOutputTokens *int `json:"max_output_tokens,omitempty"`
Temperature *float64 `json:"temperature,omitempty"`
TopP *float64 `json:"top_p,omitempty"`
Truncation string `json:"truncation,omitempty"` // "auto"|"disabled"
Instructions string `json:"instructions,omitempty"`
Reasoning *ORReasoningParam `json:"reasoning,omitempty"`
Metadata map[string]string `json:"metadata,omitempty"`
PreviousResponseID string `json:"previous_response_id,omitempty"`
// Additional parameters from spec
TextFormat interface{} `json:"text_format,omitempty"` // TextResponseFormat or JsonSchemaResponseFormatParam
ServiceTier string `json:"service_tier,omitempty"` // "auto"|"default"|priority hint
AllowedTools []string `json:"allowed_tools,omitempty"` // Restrict which tools can be invoked
Store *bool `json:"store,omitempty"` // Whether to store the response
Include []string `json:"include,omitempty"` // What to include in response
ParallelToolCalls *bool `json:"parallel_tool_calls,omitempty"` // Allow parallel tool calls
PresencePenalty *float64 `json:"presence_penalty,omitempty"` // Presence penalty (-2.0 to 2.0)
FrequencyPenalty *float64 `json:"frequency_penalty,omitempty"` // Frequency penalty (-2.0 to 2.0)
TopLogprobs *int `json:"top_logprobs,omitempty"` // Number of top logprobs to return
Background *bool `json:"background,omitempty"` // Run request in background
MaxToolCalls *int `json:"max_tool_calls,omitempty"` // Maximum number of tool calls
// OpenAI-compatible extensions (not in Open Responses spec)
LogitBias map[string]float64 `json:"logit_bias,omitempty"` // Map of token IDs to bias values (-100 to 100)
// Internal fields (like OpenAIRequest)
Context context.Context `json:"-"`
Cancel context.CancelFunc `json:"-"`
}
// ModelName implements the LocalAIRequest interface
func (r *OpenResponsesRequest) ModelName(s *string) string {
if s != nil {
r.Model = *s
}
return r.Model
}
// ORFunctionTool represents a function tool definition
type ORFunctionTool struct {
Type string `json:"type"` // always "function"
Name string `json:"name"`
Description string `json:"description,omitempty"`
Parameters map[string]interface{} `json:"parameters,omitempty"`
Strict bool `json:"strict"` // Always include in response
}
// ORReasoningParam represents reasoning configuration
type ORReasoningParam struct {
Effort string `json:"effort,omitempty"` // "none"|"low"|"medium"|"high"|"xhigh"
Summary string `json:"summary,omitempty"` // "auto"|"concise"|"detailed"
}
// ORItemParam represents an input/output item (discriminated union by type)
type ORItemParam struct {
Type string `json:"type"` // message|function_call|function_call_output|reasoning|item_reference
ID string `json:"id,omitempty"` // Present for all output items
Status string `json:"status,omitempty"` // in_progress|completed|incomplete
// Message fields
Role string `json:"role,omitempty"` // user|assistant|system|developer
Content interface{} `json:"content,omitempty"` // string or []ORContentPart for messages
// Function call fields
CallID string `json:"call_id,omitempty"`
Name string `json:"name,omitempty"`
Arguments string `json:"arguments,omitempty"`
// Function call output fields
Output interface{} `json:"output,omitempty"` // string or []ORContentPart
// Note: For item_reference type, use the ID field above to reference the item
}
// ORContentPart represents a content block (discriminated union by type)
// For output_text: type, text, annotations, logprobs are ALL REQUIRED per Open Responses spec
type ORContentPart struct {
Type string `json:"type"` // input_text|input_image|input_file|output_text|refusal
Text string `json:"text"` // REQUIRED for output_text - must always be present (even if empty)
Annotations []ORAnnotation `json:"annotations"` // REQUIRED for output_text - must always be present (use [])
Logprobs []ORLogProb `json:"logprobs"` // REQUIRED for output_text - must always be present (use [])
ImageURL string `json:"image_url,omitempty"`
FileURL string `json:"file_url,omitempty"`
Filename string `json:"filename,omitempty"`
FileData string `json:"file_data,omitempty"`
Refusal string `json:"refusal,omitempty"`
Detail string `json:"detail,omitempty"` // low|high|auto for images
}
// OROutputTextContentPart is an alias for ORContentPart used specifically for output_text
type OROutputTextContentPart = ORContentPart
// ORItemField represents an output item (same structure as ORItemParam)
type ORItemField = ORItemParam
// ORResponseResource represents the main response object
type ORResponseResource struct {
ID string `json:"id"`
Object string `json:"object"` // always "response"
CreatedAt int64 `json:"created_at"`
CompletedAt *int64 `json:"completed_at"` // Required: present as number or null
Status string `json:"status"` // queued|in_progress|completed|failed|incomplete|cancelled (see ORStatus* constants)
Model string `json:"model"`
Output []ORItemField `json:"output"`
Error *ORError `json:"error"` // Always present, null if no error
IncompleteDetails *ORIncompleteDetails `json:"incomplete_details"` // Always present, null if complete
PreviousResponseID *string `json:"previous_response_id"`
Instructions *string `json:"instructions"`
// Tool-related fields
Tools []ORFunctionTool `json:"tools"` // Always present, empty array if no tools
ToolChoice interface{} `json:"tool_choice"`
ParallelToolCalls bool `json:"parallel_tool_calls"`
MaxToolCalls *int `json:"max_tool_calls"` // nullable
// Sampling parameters (always required)
Temperature float64 `json:"temperature"`
TopP float64 `json:"top_p"`
PresencePenalty float64 `json:"presence_penalty"`
FrequencyPenalty float64 `json:"frequency_penalty"`
TopLogprobs int `json:"top_logprobs"` // Default to 0
MaxOutputTokens *int `json:"max_output_tokens"`
// Text format configuration
Text *ORTextConfig `json:"text"`
// Truncation and reasoning
Truncation string `json:"truncation"`
Reasoning *ORReasoning `json:"reasoning"` // nullable
// Usage statistics
Usage *ORUsage `json:"usage"` // nullable
// Metadata and operational flags
Metadata map[string]string `json:"metadata"`
Store bool `json:"store"`
Background bool `json:"background"`
ServiceTier string `json:"service_tier"`
// Safety and caching
SafetyIdentifier *string `json:"safety_identifier"` // nullable
PromptCacheKey *string `json:"prompt_cache_key"` // nullable
}
// ORTextConfig represents text format configuration
type ORTextConfig struct {
Format *ORTextFormat `json:"format,omitempty"`
}
// ORTextFormat represents the text format type
type ORTextFormat struct {
Type string `json:"type"` // "text" or "json_schema"
}
// ORError represents an error in the response
type ORError struct {
Type string `json:"type"` // invalid_request|not_found|server_error|model_error|too_many_requests
Code string `json:"code,omitempty"`
Message string `json:"message"`
Param string `json:"param,omitempty"`
}
// ORUsage represents token usage statistics
type ORUsage struct {
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
TotalTokens int `json:"total_tokens"`
InputTokensDetails *ORInputTokensDetails `json:"input_tokens_details"` // Always present
OutputTokensDetails *OROutputTokensDetails `json:"output_tokens_details"` // Always present
}
// ORInputTokensDetails represents input token breakdown
type ORInputTokensDetails struct {
CachedTokens int `json:"cached_tokens"` // Always include, even if 0
}
// OROutputTokensDetails represents output token breakdown
type OROutputTokensDetails struct {
ReasoningTokens int `json:"reasoning_tokens"` // Always include, even if 0
}
// ORReasoning represents reasoning configuration and metadata
type ORReasoning struct {
Effort string `json:"effort,omitempty"`
Summary string `json:"summary,omitempty"`
}
// ORIncompleteDetails represents details about why a response was incomplete
type ORIncompleteDetails struct {
Reason string `json:"reason"`
}
// ORStreamEvent represents a streaming event
// Note: Fields like delta, text, logprobs should be set explicitly for events that require them
// The sendSSEEvent function uses a custom serializer to handle conditional field inclusion
type ORStreamEvent struct {
Type string `json:"type"`
SequenceNumber int `json:"sequence_number"`
Response *ORResponseResource `json:"response,omitempty"`
OutputIndex *int `json:"output_index,omitempty"`
ContentIndex *int `json:"content_index,omitempty"`
SummaryIndex *int `json:"summary_index,omitempty"`
ItemID string `json:"item_id,omitempty"`
Item *ORItemField `json:"item,omitempty"`
Part *ORContentPart `json:"part,omitempty"`
Delta *string `json:"delta,omitempty"` // Pointer to distinguish unset from empty
Text *string `json:"text,omitempty"` // Pointer to distinguish unset from empty
Arguments *string `json:"arguments,omitempty"` // Pointer to distinguish unset from empty
Refusal string `json:"refusal,omitempty"`
Error *ORErrorPayload `json:"error,omitempty"`
Logprobs *[]ORLogProb `json:"logprobs,omitempty"` // Pointer to distinguish unset from empty
Obfuscation string `json:"obfuscation,omitempty"`
Annotation *ORAnnotation `json:"annotation,omitempty"`
AnnotationIndex *int `json:"annotation_index,omitempty"`
}
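// On the wire, a delta event rendered from this struct would look roughly like
// the following (event name and field set assumed from the Open Responses
// streaming spec, not shown in this diff):
//
//	event: response.output_text.delta
//	data: {"type":"response.output_text.delta","sequence_number":3,"item_id":"item_0","output_index":0,"content_index":0,"delta":"Hel"}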
// ORErrorPayload represents an error payload in streaming events
type ORErrorPayload struct {
Type string `json:"type"`
Code string `json:"code,omitempty"`
Message string `json:"message"`
Param string `json:"param,omitempty"`
Headers map[string]string `json:"headers,omitempty"`
}
// ORLogProb represents log probability information
type ORLogProb struct {
Token string `json:"token"`
Logprob float64 `json:"logprob"`
Bytes []int `json:"bytes"`
TopLogprobs []ORTopLogProb `json:"top_logprobs,omitempty"`
}
// ORTopLogProb represents a top log probability
type ORTopLogProb struct {
Token string `json:"token"`
Logprob float64 `json:"logprob"`
Bytes []int `json:"bytes"`
}
// ORAnnotation represents an annotation (e.g., URL citation)
type ORAnnotation struct {
Type string `json:"type"` // url_citation
StartIndex int `json:"start_index"`
EndIndex int `json:"end_index"`
URL string `json:"url"`
Title string `json:"title"`
}
// ORContentPartWithLogprobs creates an output_text content part with logprobs converted from OpenAI format
func ORContentPartWithLogprobs(text string, logprobs *Logprobs) ORContentPart {
orLogprobs := []ORLogProb{}
// Convert OpenAI-style logprobs to Open Responses format
if logprobs != nil && len(logprobs.Content) > 0 {
for _, lp := range logprobs.Content {
// Convert top logprobs
topLPs := []ORTopLogProb{}
for _, tlp := range lp.TopLogprobs {
topLPs = append(topLPs, ORTopLogProb{
Token: tlp.Token,
Logprob: tlp.Logprob,
Bytes: tlp.Bytes,
})
}
orLogprobs = append(orLogprobs, ORLogProb{
Token: lp.Token,
Logprob: lp.Logprob,
Bytes: lp.Bytes,
TopLogprobs: topLPs,
})
}
}
return ORContentPart{
Type: "output_text",
Text: text,
Annotations: []ORAnnotation{}, // REQUIRED - must always be present as array (empty if none)
Logprobs: orLogprobs, // REQUIRED - must always be present as array (empty if none)
}
}

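The schema above is self-contained enough to exercise directly. Below is a minimal sketch of how the request and content-part types compose and what they serialize to; the import path is an assumption, since the diff does not show where the package lives in the tree.

package main

import (
	"encoding/json"
	"fmt"

	// Assumed import path; not confirmed by this diff.
	"github.com/mudler/LocalAI/core/schema"
)

func main() {
	maxTokens := 256
	req := schema.OpenResponsesRequest{
		Model: "qwen3-coder-30b-a3b-instruct",
		// Input is an interface{}: either a plain string or []ORItemParam.
		Input: []schema.ORItemParam{
			{Type: "message", Role: "user", Content: "Write hello world in Go."},
		},
		MaxOutputTokens: &maxTokens,
	}
	body, _ := json.Marshal(req)
	fmt.Println(string(body))

	// The helper guarantees the spec invariant that output_text parts always
	// carry annotations and logprobs arrays, even when empty:
	part := schema.ORContentPartWithLogprobs("hello", nil)
	out, _ := json.Marshal(part)
	fmt.Println(string(out))
	// -> {"type":"output_text","text":"hello","annotations":[],"logprobs":[]}
}

Note the design choice visible here: pointer fields such as MaxOutputTokens and Temperature on the request distinguish "unset" from an explicit zero, while the response resource carries plain values because the spec requires them to always be present.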

@@ -3822,6 +3822,41 @@
- filename: boomerang-qwen3-4.9B.Q4_K_M.gguf
sha256: 11e6c068351d104dee31dd63550e5e2fc9be70467c1cfc07a6f84030cb701537
uri: huggingface://mradermacher/boomerang-qwen3-4.9B-GGUF/boomerang-qwen3-4.9B.Q4_K_M.gguf
- !!merge <<: *qwen3
name: "qwen3-coder-30b-a3b-instruct"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
url: "github:mudler/LocalAI/gallery/qwen3.yaml@master"
urls:
- https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
- https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
description: |
Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:
- Significant performance among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks.
- Long-context capabilities with native support for 256K tokens, extendable up to 1M tokens using YaRN, optimized for repository-scale understanding.
- Agentic coding support for most platforms such as Qwen Code and CLINE, featuring a specially designed function call format.
Model Overview:
Qwen3-Coder-30B-A3B-Instruct has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: 262,144 natively.
NOTE: This model supports only non-thinking mode and does not generate <think></think> blocks in its output. Additionally, specifying enable_thinking=False is no longer required.
overrides:
parameters:
model: Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
sha256: fadc3e5f8d42bf7e894a785b05082e47daee4df26680389817e2093056f088ad
uri: huggingface://unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
- &gemma3
url: "github:mudler/LocalAI/gallery/gemma.yaml@master"
name: "gemma-3-27b-it"