Compare commits

...

11 Commits

Author SHA1 Message Date
Ettore Di Giacinto
5f403b1631 chore: drop neutts for l4t (#8101)
Builds currently exhaust CI, and there are better backends at this
point in time. We will probably deprecate it in the future.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-18 21:55:56 +01:00
rampa3
897ad1729e chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request (#8082)
* chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

* added missing model config import URL

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>

---------

Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-01-18 09:23:07 +01:00
LocalAI [bot]
16a18a2e55 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9565c7f6bd5fcff124c589147b2621244f2c4aa1 (#8086)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 22:12:21 +01:00
Ettore Di Giacinto
3387bfaee0 feat(api): add support for open responses specification (#8063)
* feat: openresponses

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add ttl settings, fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: register cors middleware by default

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* satisfy schema

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Logitbias and logprobs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add grammar

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* SSE compliance

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* tool JSON conversion

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* support background mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* swagger

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop code. This is handled in the handler

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small refactorings

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* background mode for MCP

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 22:11:47 +01:00
LocalAI [bot]
1cd33047b4 chore: ⬆️ Update ggml-org/llama.cpp to 2fbde785bc106ae1c4102b0e82b9b41d9c466579 (#8087)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 21:10:18 +00:00
Ettore Di Giacinto
1de045311a chore(ui): add video generation link (#8079)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-17 09:49:47 +01:00
LocalAI [bot]
5fe9bf9f84 chore: ⬆️ Update ggml-org/whisper.cpp to f53dc74843e97f19f94a79241357f74ad5b691a6 (#8074)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-17 08:32:53 +01:00
LocalAI [bot]
d4fd0c0609 chore: ⬆️ Update ggml-org/llama.cpp to 388ce822415f24c60fcf164a321455f1e008cafb (#8073)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-01-16 21:22:33 +00:00
Ettore Di Giacinto
d16722ee13 Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory" (#8072)
Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/pyt…"

This reverts commit 1f10ab39a9.
2026-01-16 20:50:33 +01:00
dependabot[bot]
1f10ab39a9 chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory (#8066)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: [torch](https://github.com/pytorch/pytorch).


Updates `torch` from 2.3.1+cxx11.abi to 2.8.0
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.8.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.8.0
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-16 19:38:12 +00:00
Ettore Di Giacinto
4d36e393d1 fix(ci): use more beefy runner for expensive jobs (#8065)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-16 19:26:40 +01:00
22 changed files with 6048 additions and 37 deletions

View File

@@ -137,7 +137,7 @@ jobs:
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-nvidia-cuda-12-llama-cpp'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "llama-cpp"
@@ -699,7 +699,7 @@ jobs:
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-faster-whisper'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
skip-drivers: 'false'
backend: "faster-whisper"
@@ -712,7 +712,7 @@ jobs:
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-coqui'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "rocm/dev-ubuntu-24.04:6.4.4"
skip-drivers: 'false'
backend: "coqui"
@@ -963,7 +963,7 @@ jobs:
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-llama-cpp'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "llama-cpp"
@@ -989,7 +989,7 @@ jobs:
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-gpu-vulkan-llama-cpp'
-runs-on: 'ubuntu-latest'
+runs-on: 'bigger-runner'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "llama-cpp"
@@ -1330,19 +1330,6 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'l4t'
cuda-major-version: "12"
cuda-minor-version: "0"
platforms: 'linux/arm64'
skip-drivers: 'true'
tag-latest: 'auto'
tag-suffix: '-nvidia-l4t-arm64-neutts'
base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm'
backend: "neutts"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2204'
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""

View File

@@ -1,5 +1,5 @@
-LLAMA_VERSION?=785a71008573e2d84728fb0ba9e851d72d3f8fab
+LLAMA_VERSION?=2fbde785bc106ae1c4102b0e82b9b41d9c466579
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=7010bb4dff7bd55b03d35ef9772142c21699eba9
+STABLEDIFFUSION_GGML_VERSION?=9565c7f6bd5fcff124c589147b2621244f2c4aa1
CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=2eeeba56e9edd762b4b38467bab96c2517163158
+WHISPER_CPP_VERSION?=f53dc74843e97f19f94a79241357f74ad5b691a6
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

View File

@@ -537,18 +537,14 @@
default: "cpu-neutts"
nvidia: "cuda12-neutts"
amd: "rocm-neutts"
nvidia-l4t: "nvidia-l4t-neutts"
nvidia-cuda-12: "cuda12-neutts"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-neutts"
- !!merge <<: *neutts
name: "neutts-development"
capabilities:
default: "cpu-neutts-development"
nvidia: "cuda12-neutts-development"
amd: "rocm-neutts-development"
nvidia-l4t: "nvidia-l4t-neutts-development"
nvidia-cuda-12: "cuda12-neutts-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-neutts-development"
- !!merge <<: *llamacpp
name: "llama-cpp-development"
capabilities:
@@ -578,11 +574,6 @@
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-neutts"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-neutts
- !!merge <<: *neutts
name: "nvidia-l4t-arm64-neutts"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-neutts"
mirrors:
- localai/localai-backends:latest-nvidia-l4t-arm64-neutts
- !!merge <<: *neutts
name: "cpu-neutts-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-neutts"
@@ -598,11 +589,6 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-neutts"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-neutts
- !!merge <<: *neutts
name: "nvidia-l4t-arm64-neutts-development"
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-neutts"
mirrors:
- localai/localai-backends:master-nvidia-l4t-arm64-neutts
- !!merge <<: *mlx
name: "mlx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx"

View File

@@ -83,6 +83,7 @@ type RunCMD struct {
EnableTracing bool `env:"LOCALAI_ENABLE_TRACING,ENABLE_TRACING" help:"Enable API tracing" group:"api"`
TracingMaxItems int `env:"LOCALAI_TRACING_MAX_ITEMS" default:"1024" help:"Maximum number of traces to keep" group:"api"`
AgentJobRetentionDays int `env:"LOCALAI_AGENT_JOB_RETENTION_DAYS,AGENT_JOB_RETENTION_DAYS" default:"30" help:"Number of days to keep agent job history (default: 30)" group:"api"`
OpenResponsesStoreTTL string `env:"LOCALAI_OPEN_RESPONSES_STORE_TTL,OPEN_RESPONSES_STORE_TTL" default:"0" help:"TTL for Open Responses store (e.g., 1h, 30m, 0 = no expiration)" group:"api"`
Version bool
}
@@ -249,6 +250,15 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
opts = append(opts, config.WithLRUEvictionRetryInterval(dur))
}
// Handle Open Responses store TTL
if r.OpenResponsesStoreTTL != "" && r.OpenResponsesStoreTTL != "0" {
dur, err := time.ParseDuration(r.OpenResponsesStoreTTL)
if err != nil {
return fmt.Errorf("invalid Open Responses store TTL: %w", err)
}
opts = append(opts, config.WithOpenResponsesStoreTTL(dur))
}
// split ":" to get backend name and the uri
for _, v := range r.ExternalGRPCBackends {
backend := v[:strings.IndexByte(v, ':')]
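
The TTL flag above is parsed with Go's standard time.ParseDuration. As a standalone sketch (not part of the diff) of the duration formats the flag accepts:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Duration strings accepted for LOCALAI_OPEN_RESPONSES_STORE_TTL;
	// "0" is special-cased by the flag handling above to mean no expiration.
	for _, s := range []string{"1h", "30m", "1h30m", "90s"} {
		d, err := time.ParseDuration(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %v\n", s, d)
	}
}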

View File

@@ -86,6 +86,8 @@ type ApplicationConfig struct {
AgentJobRetentionDays int // Default: 30 days
OpenResponsesStoreTTL time.Duration // TTL for Open Responses store (0 = no expiration)
PathWithoutAuth []string
}
@@ -467,6 +469,12 @@ func WithAgentJobRetentionDays(days int) AppOption {
}
}
func WithOpenResponsesStoreTTL(ttl time.Duration) AppOption {
return func(o *ApplicationConfig) {
o.OpenResponsesStoreTTL = ttl
}
}
func WithEnforcedPredownloadScans(enforced bool) AppOption {
return func(o *ApplicationConfig) {
o.EnforcePredownloadScans = enforced
@@ -594,6 +602,12 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
} else {
lruEvictionRetryInterval = "1s" // default
}
var openResponsesStoreTTL string
if o.OpenResponsesStoreTTL > 0 {
openResponsesStoreTTL = o.OpenResponsesStoreTTL.String()
} else {
openResponsesStoreTTL = "0" // default: no expiration
}
return RuntimeSettings{
WatchdogEnabled: &watchdogEnabled,
@@ -628,6 +642,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
AutoloadBackendGalleries: &autoloadBackendGalleries,
ApiKeys: &apiKeys,
AgentJobRetentionDays: &agentJobRetentionDays,
OpenResponsesStoreTTL: &openResponsesStoreTTL,
}
}
@@ -769,6 +784,14 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req
if settings.AgentJobRetentionDays != nil {
o.AgentJobRetentionDays = *settings.AgentJobRetentionDays
}
if settings.OpenResponsesStoreTTL != nil {
if *settings.OpenResponsesStoreTTL == "0" || *settings.OpenResponsesStoreTTL == "" {
o.OpenResponsesStoreTTL = 0 // No expiration
} else if dur, err := time.ParseDuration(*settings.OpenResponsesStoreTTL); err == nil {
o.OpenResponsesStoreTTL = dur
}
// This setting doesn't require restart, can be updated dynamically
}
// Note: ApiKeys requires special handling (merging with startup keys) - handled in caller
return requireRestart

View File

@@ -60,4 +60,7 @@ type RuntimeSettings struct {
// Agent settings
AgentJobRetentionDays *int `json:"agent_job_retention_days,omitempty"`
// Open Responses settings
OpenResponsesStoreTTL *string `json:"open_responses_store_ttl,omitempty"` // TTL for stored responses (e.g., "1h", "30m", "0" = no expiration)
}

View File

@@ -193,6 +193,8 @@ func API(application *application.Application) (*echo.Echo, error) {
corsConfig.AllowOrigins = strings.Split(application.ApplicationConfig().CORSAllowOrigins, ",")
}
e.Use(middleware.CORSWithConfig(corsConfig))
} else {
e.Use(middleware.CORS())
}
// CSRF middleware
@@ -214,6 +216,7 @@ func API(application *application.Application) (*echo.Echo, error) {
routes.RegisterLocalAIRoutes(e, requestExtractor, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), application.GalleryService(), opcache, application.TemplatesEvaluator(), application)
routes.RegisterOpenAIRoutes(e, requestExtractor, application)
routes.RegisterAnthropicRoutes(e, requestExtractor, application)
routes.RegisterOpenResponsesRoutes(e, requestExtractor, application)
if !application.ApplicationConfig().DisableWebUI {
routes.RegisterUIAPIRoutes(e, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), application.GalleryService(), opcache, application)
routes.RegisterUIRoutes(e, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), application.GalleryService())
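
With the new else branch, CORS middleware is now always registered. A hedged sketch of what the fallback amounts to (echo's default CORS config allows all origins):

package main

import (
	"github.com/labstack/echo/v4"
	"github.com/labstack/echo/v4/middleware"
)

func main() {
	// Mirrors the fallback branch above: with no explicit CORSAllowOrigins
	// configured, the default CORS middleware (allow-all) is registered.
	e := echo.New()
	e.Use(middleware.CORS())
	_ = e // a real server would continue with e.Start(...) (assumption)
}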

View File

@@ -11,6 +11,7 @@ import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/endpoints/openresponses"
"github.com/mudler/LocalAI/core/p2p"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/xlog"
@@ -84,6 +85,16 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
})
}
}
if settings.OpenResponsesStoreTTL != nil {
if *settings.OpenResponsesStoreTTL != "0" && *settings.OpenResponsesStoreTTL != "" {
if _, err := time.ParseDuration(*settings.OpenResponsesStoreTTL); err != nil {
return c.JSON(http.StatusBadRequest, schema.SettingsResponse{
Success: false,
Error: "Invalid open_responses_store_ttl format: " + err.Error(),
})
}
}
}
// Save to file
if appConfig.DynamicConfigsDir == "" {
@@ -144,6 +155,22 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
xlog.Info("Updated LRU eviction retry settings", "maxRetries", maxRetries, "retryInterval", retryInterval)
}
// Update Open Responses store TTL dynamically
if settings.OpenResponsesStoreTTL != nil {
ttl := time.Duration(0)
if *settings.OpenResponsesStoreTTL != "0" && *settings.OpenResponsesStoreTTL != "" {
if dur, err := time.ParseDuration(*settings.OpenResponsesStoreTTL); err == nil {
ttl = dur
} else {
xlog.Warn("Invalid Open Responses store TTL format", "ttl", *settings.OpenResponsesStoreTTL, "error", err)
}
}
// Apply the new TTL to the global response store
store := openresponses.GetGlobalStore()
store.SetTTL(ttl)
xlog.Info("Updated Open Responses store TTL", "ttl", ttl)
}
// Check if agent job retention changed
agentJobChanged := settings.AgentJobRetentionDays != nil
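
Since the TTL is applied without a restart, a hedged sketch of driving this from a client (the /api/settings path is taken from the dashboard code further below; host and port are assumptions):

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Update the Open Responses store TTL at runtime to 45 minutes.
	body := bytes.NewBufferString(`{"open_responses_store_ttl": "45m"}`)
	resp, err := http.Post("http://localhost:8080/api/settings", "application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}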

View File

File diff suppressed because it is too large.

View File

@@ -0,0 +1,453 @@
package openresponses
import (
"context"
"encoding/json"
"fmt"
"sync"
"time"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/xlog"
)
// ResponseStore provides thread-safe storage for Open Responses API responses
type ResponseStore struct {
mu sync.RWMutex
responses map[string]*StoredResponse
ttl time.Duration // Time-to-live for stored responses (0 = no expiration)
cleanupCtx context.Context
cleanupCancel context.CancelFunc
}
// StreamedEvent represents a buffered SSE event for streaming resume
type StreamedEvent struct {
SequenceNumber int `json:"sequence_number"`
EventType string `json:"event_type"`
Data []byte `json:"data"` // JSON-serialized event
}
// StoredResponse contains a complete response with its input request and output items
type StoredResponse struct {
Request *schema.OpenResponsesRequest
Response *schema.ORResponseResource
Items map[string]*schema.ORItemField // item_id -> item mapping for quick lookup
StoredAt time.Time
ExpiresAt *time.Time // nil if no expiration
// Background execution support
CancelFunc context.CancelFunc // For cancellation of background tasks
StreamEvents []StreamedEvent // Buffered events for streaming resume
StreamEnabled bool // Was created with stream=true
IsBackground bool // Was created with background=true
EventsChan chan struct{} // Signals new events for live subscribers
mu sync.RWMutex // Protect concurrent access to this response
}
var (
globalStore *ResponseStore
storeOnce sync.Once
)
// GetGlobalStore returns the singleton response store instance
func GetGlobalStore() *ResponseStore {
storeOnce.Do(func() {
globalStore = NewResponseStore(0) // Default: no TTL, will be updated from appConfig
})
return globalStore
}
// SetTTL updates the TTL for the store
// This will affect all new responses stored after this call
func (s *ResponseStore) SetTTL(ttl time.Duration) {
s.mu.Lock()
defer s.mu.Unlock()
// Stop existing cleanup loop if running
if s.cleanupCancel != nil {
s.cleanupCancel()
s.cleanupCancel = nil
s.cleanupCtx = nil
}
s.ttl = ttl
// If TTL > 0, start cleanup loop
if ttl > 0 {
s.cleanupCtx, s.cleanupCancel = context.WithCancel(context.Background())
go s.cleanupLoop(s.cleanupCtx)
}
xlog.Debug("Updated Open Responses store TTL", "ttl", ttl, "cleanup_running", ttl > 0)
}
// NewResponseStore creates a new response store with optional TTL
// If ttl is 0, responses are stored indefinitely
func NewResponseStore(ttl time.Duration) *ResponseStore {
store := &ResponseStore{
responses: make(map[string]*StoredResponse),
ttl: ttl,
}
// Start cleanup goroutine if TTL is set
if ttl > 0 {
store.cleanupCtx, store.cleanupCancel = context.WithCancel(context.Background())
go store.cleanupLoop(store.cleanupCtx)
}
return store
}
// Store stores a response with its request and items
func (s *ResponseStore) Store(responseID string, request *schema.OpenResponsesRequest, response *schema.ORResponseResource) {
s.mu.Lock()
defer s.mu.Unlock()
// Build item index for quick lookup
items := make(map[string]*schema.ORItemField)
for i := range response.Output {
item := &response.Output[i]
if item.ID != "" {
items[item.ID] = item
}
}
stored := &StoredResponse{
Request: request,
Response: response,
Items: items,
StoredAt: time.Now(),
ExpiresAt: nil,
}
// Set expiration if TTL is configured
if s.ttl > 0 {
expiresAt := time.Now().Add(s.ttl)
stored.ExpiresAt = &expiresAt
}
s.responses[responseID] = stored
xlog.Debug("Stored Open Responses response", "response_id", responseID, "items_count", len(items))
}
// Get retrieves a stored response by ID
func (s *ResponseStore) Get(responseID string) (*StoredResponse, error) {
s.mu.RLock()
defer s.mu.RUnlock()
stored, exists := s.responses[responseID]
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
// Check expiration
if stored.ExpiresAt != nil && time.Now().After(*stored.ExpiresAt) {
// Expired: report it as expired; the cleanup loop will remove it
return nil, fmt.Errorf("response expired: %s", responseID)
}
return stored, nil
}
// GetItem retrieves a specific item from a stored response
func (s *ResponseStore) GetItem(responseID, itemID string) (*schema.ORItemField, error) {
stored, err := s.Get(responseID)
if err != nil {
return nil, err
}
item, exists := stored.Items[itemID]
if !exists {
return nil, fmt.Errorf("item not found: %s in response %s", itemID, responseID)
}
return item, nil
}
// FindItem searches for an item across all stored responses
// Returns the item and the response ID it was found in
func (s *ResponseStore) FindItem(itemID string) (*schema.ORItemField, string, error) {
s.mu.RLock()
defer s.mu.RUnlock()
now := time.Now()
for responseID, stored := range s.responses {
// Skip expired responses
if stored.ExpiresAt != nil && now.After(*stored.ExpiresAt) {
continue
}
if item, exists := stored.Items[itemID]; exists {
return item, responseID, nil
}
}
return nil, "", fmt.Errorf("item not found in any stored response: %s", itemID)
}
// Delete removes a response from storage
func (s *ResponseStore) Delete(responseID string) {
s.mu.Lock()
defer s.mu.Unlock()
delete(s.responses, responseID)
xlog.Debug("Deleted Open Responses response", "response_id", responseID)
}
// Cleanup removes expired responses
func (s *ResponseStore) Cleanup() int {
if s.ttl == 0 {
return 0
}
s.mu.Lock()
defer s.mu.Unlock()
now := time.Now()
count := 0
for id, stored := range s.responses {
if stored.ExpiresAt != nil && now.After(*stored.ExpiresAt) {
delete(s.responses, id)
count++
}
}
if count > 0 {
xlog.Debug("Cleaned up expired Open Responses", "count", count)
}
return count
}
// cleanupLoop runs periodic cleanup of expired responses
func (s *ResponseStore) cleanupLoop(ctx context.Context) {
if s.ttl == 0 {
return
}
ticker := time.NewTicker(s.ttl / 2) // Cleanup at half TTL interval
defer ticker.Stop()
for {
select {
case <-ctx.Done():
xlog.Debug("Stopped Open Responses store cleanup loop")
return
case <-ticker.C:
s.Cleanup()
}
}
}
// Count returns the number of stored responses
func (s *ResponseStore) Count() int {
s.mu.RLock()
defer s.mu.RUnlock()
return len(s.responses)
}
// StoreBackground stores a background response with cancel function and optional streaming support
func (s *ResponseStore) StoreBackground(responseID string, request *schema.OpenResponsesRequest, response *schema.ORResponseResource, cancelFunc context.CancelFunc, streamEnabled bool) {
s.mu.Lock()
defer s.mu.Unlock()
// Build item index for quick lookup
items := make(map[string]*schema.ORItemField)
for i := range response.Output {
item := &response.Output[i]
if item.ID != "" {
items[item.ID] = item
}
}
stored := &StoredResponse{
Request: request,
Response: response,
Items: items,
StoredAt: time.Now(),
ExpiresAt: nil,
CancelFunc: cancelFunc,
StreamEvents: []StreamedEvent{},
StreamEnabled: streamEnabled,
IsBackground: true,
EventsChan: make(chan struct{}, 100), // Buffered channel for event notifications
}
// Set expiration if TTL is configured
if s.ttl > 0 {
expiresAt := time.Now().Add(s.ttl)
stored.ExpiresAt = &expiresAt
}
s.responses[responseID] = stored
xlog.Debug("Stored background Open Responses response", "response_id", responseID, "stream_enabled", streamEnabled)
}
// UpdateStatus updates the status of a stored response
func (s *ResponseStore) UpdateStatus(responseID string, status string, completedAt *int64) error {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return fmt.Errorf("response not found: %s", responseID)
}
stored.mu.Lock()
defer stored.mu.Unlock()
stored.Response.Status = status
stored.Response.CompletedAt = completedAt
xlog.Debug("Updated response status", "response_id", responseID, "status", status)
return nil
}
// UpdateResponse updates the entire response object for a stored response
func (s *ResponseStore) UpdateResponse(responseID string, response *schema.ORResponseResource) error {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return fmt.Errorf("response not found: %s", responseID)
}
stored.mu.Lock()
defer stored.mu.Unlock()
// Rebuild item index
items := make(map[string]*schema.ORItemField)
for i := range response.Output {
item := &response.Output[i]
if item.ID != "" {
items[item.ID] = item
}
}
stored.Response = response
stored.Items = items
xlog.Debug("Updated response", "response_id", responseID, "status", response.Status, "items_count", len(items))
return nil
}
// AppendEvent appends a streaming event to the buffer for resume support
func (s *ResponseStore) AppendEvent(responseID string, event *schema.ORStreamEvent) error {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return fmt.Errorf("response not found: %s", responseID)
}
// Serialize the event
data, err := json.Marshal(event)
if err != nil {
return fmt.Errorf("failed to marshal event: %w", err)
}
stored.mu.Lock()
stored.StreamEvents = append(stored.StreamEvents, StreamedEvent{
SequenceNumber: event.SequenceNumber,
EventType: event.Type,
Data: data,
})
stored.mu.Unlock()
// Notify any subscribers of new event
select {
case stored.EventsChan <- struct{}{}:
default:
// Channel full, subscribers will catch up
}
return nil
}
// GetEventsAfter returns all events with sequence number greater than startingAfter
func (s *ResponseStore) GetEventsAfter(responseID string, startingAfter int) ([]StreamedEvent, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
stored.mu.RLock()
defer stored.mu.RUnlock()
var result []StreamedEvent
for _, event := range stored.StreamEvents {
if event.SequenceNumber > startingAfter {
result = append(result, event)
}
}
return result, nil
}
// Cancel cancels a background response if it's still in progress
func (s *ResponseStore) Cancel(responseID string) (*schema.ORResponseResource, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
stored.mu.Lock()
defer stored.mu.Unlock()
// If already in a terminal state, just return the response (idempotent)
status := stored.Response.Status
if status == schema.ORStatusCompleted || status == schema.ORStatusFailed ||
status == schema.ORStatusIncomplete || status == schema.ORStatusCancelled {
xlog.Debug("Response already in terminal state", "response_id", responseID, "status", status)
return stored.Response, nil
}
// Cancel the context if available
if stored.CancelFunc != nil {
stored.CancelFunc()
xlog.Debug("Cancelled background response", "response_id", responseID)
}
// Update status to cancelled
now := time.Now().Unix()
stored.Response.Status = schema.ORStatusCancelled
stored.Response.CompletedAt = &now
return stored.Response, nil
}
// GetEventsChan returns the events notification channel for a response
func (s *ResponseStore) GetEventsChan(responseID string) (chan struct{}, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return nil, fmt.Errorf("response not found: %s", responseID)
}
return stored.EventsChan, nil
}
// IsStreamEnabled checks if a response was created with streaming enabled
func (s *ResponseStore) IsStreamEnabled(responseID string) (bool, error) {
s.mu.RLock()
stored, exists := s.responses[responseID]
s.mu.RUnlock()
if !exists {
return false, fmt.Errorf("response not found: %s", responseID)
}
stored.mu.RLock()
defer stored.mu.RUnlock()
return stored.StreamEnabled, nil
}
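
A minimal usage sketch of the store API defined above (illustrative IDs and field values; the import paths follow the ones used elsewhere in this diff):

package main

import (
	"fmt"
	"time"

	"github.com/mudler/LocalAI/core/http/endpoints/openresponses"
	"github.com/mudler/LocalAI/core/schema"
)

func main() {
	// Create a store where responses expire after 30 minutes.
	store := openresponses.NewResponseStore(30 * time.Minute)
	resp := &schema.ORResponseResource{
		ID:     "resp_demo",
		Object: "response",
		Status: schema.ORStatusCompleted,
		Output: []schema.ORItemField{
			{Type: "message", ID: "msg_demo", Role: "assistant"},
		},
	}
	store.Store("resp_demo", &schema.OpenResponsesRequest{Model: "demo-model"}, resp)

	// Look up the response and one of its items.
	if stored, err := store.Get("resp_demo"); err == nil {
		fmt.Println("status:", stored.Response.Status)
	}
	if item, err := store.GetItem("resp_demo", "msg_demo"); err == nil {
		fmt.Println("item type:", item.Type)
	}

	// Switch to no expiration; this also stops the cleanup loop.
	store.SetTTL(0)
}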

View File

@@ -0,0 +1,13 @@
package openresponses
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestStore(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "ResponseStore Suite")
}

View File

@@ -0,0 +1,626 @@
package openresponses
import (
"context"
"fmt"
"time"
"github.com/mudler/LocalAI/core/schema"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("ResponseStore", func() {
var store *ResponseStore
BeforeEach(func() {
store = NewResponseStore(0) // No TTL for most tests
})
AfterEach(func() {
// Clean up
})
Describe("Store and Get", func() {
It("should store and retrieve a response", func() {
responseID := "resp_test123"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Hello",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
CreatedAt: time.Now().Unix(),
Status: "completed",
Model: "test-model",
Output: []schema.ORItemField{
{
Type: "message",
ID: "msg_123",
Status: "completed",
Role: "assistant",
Content: []schema.ORContentPart{{
Type: "output_text",
Text: "Hello, world!",
Annotations: []schema.ORAnnotation{},
Logprobs: []schema.ORLogProb{},
}},
},
},
}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
Expect(stored.Response.ID).To(Equal(responseID))
Expect(stored.Request.Model).To(Equal("test-model"))
Expect(len(stored.Items)).To(Equal(1))
Expect(stored.Items["msg_123"]).ToNot(BeNil())
Expect(stored.Items["msg_123"].ID).To(Equal("msg_123"))
})
It("should return error for non-existent response", func() {
_, err := store.Get("nonexistent")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("not found"))
})
It("should index all items by ID", func() {
responseID := "resp_test456"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: "msg_1",
Status: "completed",
Role: "assistant",
},
{
Type: "function_call",
ID: "fc_1",
Status: "completed",
CallID: "fc_1",
Name: "test_function",
Arguments: `{"arg": "value"}`,
},
{
Type: "message",
ID: "msg_2",
Status: "completed",
Role: "assistant",
},
},
}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(len(stored.Items)).To(Equal(3))
Expect(stored.Items["msg_1"]).ToNot(BeNil())
Expect(stored.Items["fc_1"]).ToNot(BeNil())
Expect(stored.Items["msg_2"]).ToNot(BeNil())
})
It("should handle items without IDs", func() {
responseID := "resp_test789"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: "", // No ID
Status: "completed",
Role: "assistant",
},
{
Type: "message",
ID: "msg_with_id",
Status: "completed",
Role: "assistant",
},
},
}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
// Only items with IDs are indexed
Expect(len(stored.Items)).To(Equal(1))
Expect(stored.Items["msg_with_id"]).ToNot(BeNil())
})
})
Describe("GetItem", func() {
It("should retrieve a specific item by ID", func() {
responseID := "resp_item_test"
itemID := "msg_specific"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: itemID,
Status: "completed",
Role: "assistant",
Content: []schema.ORContentPart{{
Type: "output_text",
Text: "Specific message",
Annotations: []schema.ORAnnotation{},
Logprobs: []schema.ORLogProb{},
}},
},
},
}
store.Store(responseID, request, response)
item, err := store.GetItem(responseID, itemID)
Expect(err).ToNot(HaveOccurred())
Expect(item).ToNot(BeNil())
Expect(item.ID).To(Equal(itemID))
Expect(item.Type).To(Equal("message"))
})
It("should return error for non-existent item", func() {
responseID := "resp_item_test2"
request := &schema.OpenResponsesRequest{
Model: "test-model",
Input: "Test",
}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{
Type: "message",
ID: "msg_existing",
Status: "completed",
},
},
}
store.Store(responseID, request, response)
_, err := store.GetItem(responseID, "nonexistent_item")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("item not found"))
})
It("should return error for non-existent response when getting item", func() {
_, err := store.GetItem("nonexistent_response", "any_item")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("response not found"))
})
})
Describe("FindItem", func() {
It("should find an item across all stored responses", func() {
// Store first response
responseID1 := "resp_find_1"
itemID1 := "msg_find_1"
store.Store(responseID1, &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{
ID: responseID1,
Object: "response",
Output: []schema.ORItemField{
{Type: "message", ID: itemID1, Status: "completed"},
},
})
// Store second response
responseID2 := "resp_find_2"
itemID2 := "msg_find_2"
store.Store(responseID2, &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{
ID: responseID2,
Object: "response",
Output: []schema.ORItemField{
{Type: "message", ID: itemID2, Status: "completed"},
},
})
// Find item from first response
item, foundResponseID, err := store.FindItem(itemID1)
Expect(err).ToNot(HaveOccurred())
Expect(item).ToNot(BeNil())
Expect(item.ID).To(Equal(itemID1))
Expect(foundResponseID).To(Equal(responseID1))
// Find item from second response
item, foundResponseID, err = store.FindItem(itemID2)
Expect(err).ToNot(HaveOccurred())
Expect(item).ToNot(BeNil())
Expect(item.ID).To(Equal(itemID2))
Expect(foundResponseID).To(Equal(responseID2))
})
It("should return error when item not found in any response", func() {
_, _, err := store.FindItem("nonexistent_item")
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("item not found in any stored response"))
})
})
Describe("Delete", func() {
It("should delete a stored response", func() {
responseID := "resp_delete_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
}
store.Store(responseID, request, response)
Expect(store.Count()).To(Equal(1))
store.Delete(responseID)
Expect(store.Count()).To(Equal(0))
_, err := store.Get(responseID)
Expect(err).To(HaveOccurred())
})
It("should handle deleting non-existent response gracefully", func() {
// Should not panic
store.Delete("nonexistent")
Expect(store.Count()).To(Equal(0))
})
})
Describe("Count", func() {
It("should return correct count of stored responses", func() {
Expect(store.Count()).To(Equal(0))
store.Store("resp_1", &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{ID: "resp_1", Object: "response"})
Expect(store.Count()).To(Equal(1))
store.Store("resp_2", &schema.OpenResponsesRequest{Model: "test"}, &schema.ORResponseResource{ID: "resp_2", Object: "response"})
Expect(store.Count()).To(Equal(2))
store.Delete("resp_1")
Expect(store.Count()).To(Equal(1))
})
})
Describe("TTL and Expiration", func() {
It("should set expiration when TTL is configured", func() {
ttlStore := NewResponseStore(100 * time.Millisecond)
responseID := "resp_ttl_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
ttlStore.Store(responseID, request, response)
stored, err := ttlStore.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored.ExpiresAt).ToNot(BeNil())
Expect(stored.ExpiresAt.After(time.Now())).To(BeTrue())
})
It("should not set expiration when TTL is 0", func() {
responseID := "resp_no_ttl"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
store.Store(responseID, request, response)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored.ExpiresAt).To(BeNil())
})
It("should clean up expired responses", func() {
ttlStore := NewResponseStore(50 * time.Millisecond)
responseID := "resp_expire_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
ttlStore.Store(responseID, request, response)
Expect(ttlStore.Count()).To(Equal(1))
// Wait for expiration (longer than TTL and cleanup interval)
time.Sleep(150 * time.Millisecond)
// Cleanup should remove expired response (may have already been cleaned by goroutine)
count := ttlStore.Cleanup()
// Count might be 0 if cleanup goroutine already ran, or 1 if we're first
Expect(count).To(BeNumerically(">=", 0))
Expect(ttlStore.Count()).To(Equal(0))
_, err := ttlStore.Get(responseID)
Expect(err).To(HaveOccurred())
})
It("should return error for expired response", func() {
ttlStore := NewResponseStore(50 * time.Millisecond)
responseID := "resp_expire_error"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
ttlStore.Store(responseID, request, response)
// Wait for expiration (but not long enough for cleanup goroutine to remove it)
time.Sleep(75 * time.Millisecond)
// Try to get before cleanup goroutine removes it
_, err := ttlStore.Get(responseID)
// Error could be "expired" or "not found" (if cleanup already ran)
Expect(err).To(HaveOccurred())
// Either error message is acceptable
errMsg := err.Error()
Expect(errMsg).To(Or(ContainSubstring("expired"), ContainSubstring("not found")))
})
})
Describe("Thread Safety", func() {
It("should handle concurrent stores and gets", func() {
// This is a basic concurrency test
done := make(chan bool, 10)
for i := 0; i < 10; i++ {
go func(id int) {
responseID := fmt.Sprintf("resp_concurrent_%d", id)
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Output: []schema.ORItemField{
{Type: "message", ID: fmt.Sprintf("msg_%d", id), Status: "completed"},
},
}
store.Store(responseID, request, response)
// Retrieve immediately
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
done <- true
}(i)
}
// Wait for all goroutines
for i := 0; i < 10; i++ {
<-done
}
Expect(store.Count()).To(Equal(10))
})
})
Describe("GetGlobalStore", func() {
It("should return singleton instance", func() {
store1 := GetGlobalStore()
store2 := GetGlobalStore()
Expect(store1).To(Equal(store2))
})
It("should persist data across GetGlobalStore calls", func() {
globalStore := GetGlobalStore()
responseID := "resp_global_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response"}
globalStore.Store(responseID, request, response)
// Get store again
globalStore2 := GetGlobalStore()
stored, err := globalStore2.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
})
})
Describe("Background Mode Support", func() {
It("should store background response with cancel function", func() {
responseID := "resp_bg_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusQueued,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored).ToNot(BeNil())
Expect(stored.IsBackground).To(BeTrue())
Expect(stored.StreamEnabled).To(BeTrue())
Expect(stored.CancelFunc).ToNot(BeNil())
})
It("should update status of stored response", func() {
responseID := "resp_status_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusQueued,
}
store.Store(responseID, request, response)
err := store.UpdateStatus(responseID, schema.ORStatusInProgress, nil)
Expect(err).ToNot(HaveOccurred())
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored.Response.Status).To(Equal(schema.ORStatusInProgress))
})
It("should append and retrieve streaming events", func() {
responseID := "resp_events_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusInProgress,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
// Append events
event1 := &schema.ORStreamEvent{
Type: "response.created",
SequenceNumber: 0,
}
event2 := &schema.ORStreamEvent{
Type: "response.in_progress",
SequenceNumber: 1,
}
event3 := &schema.ORStreamEvent{
Type: "response.output_text.delta",
SequenceNumber: 2,
}
err := store.AppendEvent(responseID, event1)
Expect(err).ToNot(HaveOccurred())
err = store.AppendEvent(responseID, event2)
Expect(err).ToNot(HaveOccurred())
err = store.AppendEvent(responseID, event3)
Expect(err).ToNot(HaveOccurred())
// Get all events after -1 (all events)
events, err := store.GetEventsAfter(responseID, -1)
Expect(err).ToNot(HaveOccurred())
Expect(events).To(HaveLen(3))
// Get events after sequence 1
events, err = store.GetEventsAfter(responseID, 1)
Expect(err).ToNot(HaveOccurred())
Expect(events).To(HaveLen(1))
Expect(events[0].SequenceNumber).To(Equal(2))
})
It("should cancel an in-progress response", func() {
responseID := "resp_cancel_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusInProgress,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, false)
// Cancel the response
cancelledResponse, err := store.Cancel(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(cancelledResponse.Status).To(Equal(schema.ORStatusCancelled))
Expect(cancelledResponse.CompletedAt).ToNot(BeNil())
})
It("should be idempotent when cancelling already completed response", func() {
responseID := "resp_idempotent_cancel"
request := &schema.OpenResponsesRequest{Model: "test"}
completedAt := time.Now().Unix()
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusCompleted,
CompletedAt: &completedAt,
}
store.Store(responseID, request, response)
// Try to cancel a completed response
cancelledResponse, err := store.Cancel(responseID)
Expect(err).ToNot(HaveOccurred())
// Status should remain completed (not changed to cancelled)
Expect(cancelledResponse.Status).To(Equal(schema.ORStatusCompleted))
})
It("should check if streaming is enabled", func() {
responseID := "resp_stream_check"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusQueued,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
enabled, err := store.IsStreamEnabled(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(enabled).To(BeTrue())
// Store another without streaming
responseID2 := "resp_no_stream"
store.StoreBackground(responseID2, request, response, cancel, false)
enabled2, err := store.IsStreamEnabled(responseID2)
Expect(err).ToNot(HaveOccurred())
Expect(enabled2).To(BeFalse())
})
It("should notify subscribers of new events", func() {
responseID := "resp_events_chan"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusInProgress,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
eventsChan, err := store.GetEventsChan(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(eventsChan).ToNot(BeNil())
// Append an event
event := &schema.ORStreamEvent{
Type: "response.output_text.delta",
SequenceNumber: 0,
}
go func() {
time.Sleep(10 * time.Millisecond)
store.AppendEvent(responseID, event)
}()
// Wait for notification
select {
case <-eventsChan:
// Event received
case <-time.After(1 * time.Second):
Fail("Timeout waiting for event notification")
}
})
})
})

View File

@@ -1,13 +1,33 @@
package http_test
import (
"os"
"path/filepath"
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var (
tmpdir string
modelDir string
)
func TestLocalAI(t *testing.T) {
RegisterFailHandler(Fail)
var err error
tmpdir, err = os.MkdirTemp("", "")
Expect(err).ToNot(HaveOccurred())
modelDir = filepath.Join(tmpdir, "models")
err = os.Mkdir(modelDir, 0750)
Expect(err).ToNot(HaveOccurred())
AfterSuite(func() {
err := os.RemoveAll(tmpdir)
Expect(err).ToNot(HaveOccurred())
})
RunSpecs(t, "LocalAI HTTP test suite")
}

View File

@@ -484,3 +484,103 @@ func mergeOpenAIRequestAndModelConfig(config *config.ModelConfig, input *schema.
}
return fmt.Errorf("unable to validate configuration after merging")
}
func (re *RequestExtractor) SetOpenResponsesRequest(c echo.Context) error {
input, ok := c.Get(CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.OpenResponsesRequest)
if !ok || input.Model == "" {
return echo.ErrBadRequest
}
cfg, ok := c.Get(CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig)
if !ok || cfg == nil {
return echo.ErrBadRequest
}
// Extract or generate the correlation ID (Open Responses uses x-request-id)
correlationID := c.Request().Header.Get("x-request-id")
if correlationID == "" {
correlationID = uuid.New().String()
}
c.Response().Header().Set("x-request-id", correlationID)
// Use the request context directly - Echo properly supports context cancellation!
reqCtx := c.Request().Context()
c1, cancel := context.WithCancel(re.applicationConfig.Context)
// Cancel when request context is cancelled (client disconnects)
go func() {
select {
case <-reqCtx.Done():
cancel()
case <-c1.Done():
// Already cancelled
}
}()
// Add the correlation ID to the new context
ctxWithCorrelationID := context.WithValue(c1, CorrelationIDKey, correlationID)
input.Context = ctxWithCorrelationID
input.Cancel = cancel
err := mergeOpenResponsesRequestAndModelConfig(cfg, input)
if err != nil {
return err
}
if cfg.Model == "" {
xlog.Debug("replacing empty cfg.Model with input value", "input.Model", input.Model)
cfg.Model = input.Model
}
c.Set(CONTEXT_LOCALS_KEY_LOCALAI_REQUEST, input)
c.Set(CONTEXT_LOCALS_KEY_MODEL_CONFIG, cfg)
return nil
}
func mergeOpenResponsesRequestAndModelConfig(config *config.ModelConfig, input *schema.OpenResponsesRequest) error {
// Temperature
if input.Temperature != nil {
config.Temperature = input.Temperature
}
// TopP
if input.TopP != nil {
config.TopP = input.TopP
}
// MaxOutputTokens -> Maxtokens
if input.MaxOutputTokens != nil {
config.Maxtokens = input.MaxOutputTokens
}
// Convert tools to functions - this will be handled in the endpoint handler
// We just validate that tools are present if needed
// Handle tool_choice
if input.ToolChoice != nil {
switch tc := input.ToolChoice.(type) {
case string:
// "auto", "required", or "none"
if tc == "required" {
config.SetFunctionCallString("required")
} else if tc == "none" {
// Don't use tools - handled in endpoint
}
// "auto" is default - let model decide
case map[string]interface{}:
// Specific tool: {type:"function", name:"..."}
if tcType, ok := tc["type"].(string); ok && tcType == "function" {
if name, ok := tc["name"].(string); ok {
config.SetFunctionCallString(name)
}
}
}
}
if valid, _ := config.Validate(); valid {
return nil
}
return fmt.Errorf("unable to validate configuration after merging")
}
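
Because ToolChoice is typed interface{}, encoding/json produces a string for "auto"/"required"/"none" and a map[string]interface{} for the object form, which is exactly what the type switch above distinguishes. A standalone sketch (the tool name is hypothetical):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The two JSON shapes tool_choice can take.
	for _, raw := range []string{`"required"`, `{"type":"function","name":"get_weather"}`} {
		var tc interface{}
		if err := json.Unmarshal([]byte(raw), &tc); err != nil {
			panic(err)
		}
		switch v := tc.(type) {
		case string:
			fmt.Println("string form:", v)
		case map[string]interface{}:
			fmt.Println("object form, name:", v["name"]) // hypothetical tool name
		}
	}
}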

View File

File diff suppressed because it is too large.

View File

@@ -0,0 +1,58 @@
package routes
import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/endpoints/openresponses"
"github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/schema"
)
func RegisterOpenResponsesRoutes(app *echo.Echo,
re *middleware.RequestExtractor,
application *application.Application) {
// Open Responses API endpoint
responsesHandler := openresponses.ResponsesEndpoint(
application.ModelConfigLoader(),
application.ModelLoader(),
application.TemplatesEvaluator(),
application.ApplicationConfig(),
)
responsesMiddleware := []echo.MiddlewareFunc{
middleware.TraceMiddleware(application),
re.BuildFilteredFirstAvailableDefaultModel(config.BuildUsecaseFilterFn(config.FLAG_CHAT)),
re.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.OpenResponsesRequest) }),
setOpenResponsesRequestContext(re),
}
// Main Open Responses endpoint
app.POST("/v1/responses", responsesHandler, responsesMiddleware...)
// Also support without version prefix for compatibility
app.POST("/responses", responsesHandler, responsesMiddleware...)
// GET /responses/:id - Retrieve a response (for polling background requests)
getResponseHandler := openresponses.GetResponseEndpoint()
app.GET("/v1/responses/:id", getResponseHandler, middleware.TraceMiddleware(application))
app.GET("/responses/:id", getResponseHandler, middleware.TraceMiddleware(application))
// POST /responses/:id/cancel - Cancel a background response
cancelResponseHandler := openresponses.CancelResponseEndpoint()
app.POST("/v1/responses/:id/cancel", cancelResponseHandler, middleware.TraceMiddleware(application))
app.POST("/responses/:id/cancel", cancelResponseHandler, middleware.TraceMiddleware(application))
}
// setOpenResponsesRequestContext sets up the context and cancel function for Open Responses requests
func setOpenResponsesRequestContext(re *middleware.RequestExtractor) echo.MiddlewareFunc {
return func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
if err := re.SetOpenResponsesRequest(c); err != nil {
return err
}
return next(c)
}
}
}
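
A hedged client sketch against the routes registered above: create a background response, then poll it by ID. The base URL, model name, and response ID are assumptions; a real client would read the id field from the creation response.

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	base := "http://localhost:8080" // assumed LocalAI address

	// POST /v1/responses with background=true (fields from OpenResponsesRequest).
	body := bytes.NewBufferString(`{"model":"test-model","input":"Hello","background":true}`)
	resp, err := http.Post(base+"/v1/responses", "application/json", body)
	if err != nil {
		panic(err)
	}
	created, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Println("created:", string(created))

	// Poll GET /v1/responses/:id with the id returned above (placeholder here).
	poll, err := http.Get(base + "/v1/responses/resp_example")
	if err != nil {
		panic(err)
	}
	defer poll.Body.Close()
	polled, _ := io.ReadAll(poll.Body)
	fmt.Println("polled:", string(polled))
}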

View File

@@ -28,6 +28,9 @@
<a href="image/" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] px-2 py-2 rounded-lg transition duration-300 ease-in-out hover:bg-[var(--color-bg-secondary)] flex items-center group text-sm">
<i class="fas fa-image text-[var(--color-primary)] mr-1.5 text-sm group-hover:scale-110 transition-transform"></i>Images
</a>
<a href="video/" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] px-2 py-2 rounded-lg transition duration-300 ease-in-out hover:bg-[var(--color-bg-secondary)] flex items-center group text-sm">
<i class="fas fa-video text-[var(--color-primary)] mr-1.5 text-sm group-hover:scale-110 transition-transform"></i>Video
</a>
<a href="tts/" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] px-2 py-2 rounded-lg transition duration-300 ease-in-out hover:bg-[var(--color-bg-secondary)] flex items-center group text-sm">
<i class="fa-solid fa-music text-[var(--color-primary)] mr-1.5 text-sm group-hover:scale-110 transition-transform"></i>TTS
</a>
@@ -88,6 +91,9 @@
<a href="image/" class="block text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] hover:bg-[var(--color-bg-secondary)] px-3 py-2 rounded-lg transition duration-300 ease-in-out flex items-center text-sm">
<i class="fas fa-image text-[var(--color-primary)] mr-3 w-5 text-center text-sm"></i>Images
</a>
<a href="video/" class="block text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] hover:bg-[var(--color-bg-secondary)] px-3 py-2 rounded-lg transition duration-300 ease-in-out flex items-center text-sm">
<i class="fas fa-video text-[var(--color-primary)] mr-3 w-5 text-center text-sm"></i>Video
</a>
<a href="tts/" class="block text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)] hover:bg-[var(--color-bg-secondary)] px-3 py-2 rounded-lg transition duration-300 ease-in-out flex items-center text-sm">
<i class="fa-solid fa-music text-[var(--color-primary)] mr-3 w-5 text-center text-sm"></i>TTS
</a>

View File

@@ -485,6 +485,28 @@
</div>
</div>
<!-- Open Responses Settings Section -->
<div class="bg-[var(--color-bg-secondary)] border border-[var(--color-accent)]/20 rounded-lg p-6">
<h2 class="text-xl font-semibold text-[var(--color-text-primary)] mb-4 flex items-center">
<i class="fas fa-database mr-2 text-[var(--color-accent)] text-sm"></i>
Open Responses Settings
</h2>
<p class="text-xs text-[var(--color-text-secondary)] mb-4">
Configure Open Responses API response storage
</p>
<div class="space-y-4">
<!-- Store TTL -->
<div>
<label class="block text-sm font-medium text-[var(--color-text-primary)] mb-2">Response Store TTL</label>
<p class="text-xs text-[var(--color-text-secondary)] mb-2">Time-to-live for stored responses (e.g., 1h, 30m, 0 = no expiration)</p>
<input type="text" x-model="settings.open_responses_store_ttl"
placeholder="0"
class="w-full px-3 py-2 bg-[var(--color-bg-primary)] border border-[var(--color-accent)]/20 rounded text-sm text-[var(--color-text-primary)] focus:outline-none focus:ring-2 focus:ring-[var(--color-accent)]/50">
</div>
</div>
</div>
<!-- API Keys Settings Section -->
<div class="bg-[var(--color-bg-secondary)] border border-[var(--color-error-light)] rounded-lg p-6">
<h2 class="text-xl font-semibold text-[var(--color-text-primary)] mb-4 flex items-center">
@@ -633,7 +655,8 @@ function settingsDashboard() {
galleries_json: '[]',
backend_galleries_json: '[]',
api_keys_text: '',
agent_job_retention_days: 30
agent_job_retention_days: 30,
open_responses_store_ttl: '0'
},
sourceInfo: '',
saving: false,
@@ -680,7 +703,8 @@ function settingsDashboard() {
galleries_json: JSON.stringify(data.galleries || [], null, 2),
backend_galleries_json: JSON.stringify(data.backend_galleries || [], null, 2),
api_keys_text: (data.api_keys || []).join('\n'),
agent_job_retention_days: data.agent_job_retention_days || 30
agent_job_retention_days: data.agent_job_retention_days || 30,
open_responses_store_ttl: data.open_responses_store_ttl || '0'
};
this.sourceInfo = data.source || 'default';
} else {
@@ -838,6 +862,9 @@ function settingsDashboard() {
if (this.settings.agent_job_retention_days !== undefined) {
payload.agent_job_retention_days = parseInt(this.settings.agent_job_retention_days) || 30;
}
if (this.settings.open_responses_store_ttl !== undefined) {
payload.open_responses_store_ttl = this.settings.open_responses_store_ttl;
}
const response = await fetch('/api/settings', {
method: 'POST',

View File

@@ -0,0 +1,306 @@
package schema
import (
"context"
)
// Open Responses status constants
const (
ORStatusQueued = "queued"
ORStatusInProgress = "in_progress"
ORStatusCompleted = "completed"
ORStatusFailed = "failed"
ORStatusIncomplete = "incomplete"
ORStatusCancelled = "cancelled"
)
// OpenResponsesRequest represents a request to the Open Responses API
// https://www.openresponses.org/specification
type OpenResponsesRequest struct {
Model string `json:"model"`
Input interface{} `json:"input"` // string or []ORItemParam
Tools []ORFunctionTool `json:"tools,omitempty"`
ToolChoice interface{} `json:"tool_choice,omitempty"` // "auto"|"required"|"none"|{type:"function",name:"..."}
Stream bool `json:"stream,omitempty"`
MaxOutputTokens *int `json:"max_output_tokens,omitempty"`
Temperature *float64 `json:"temperature,omitempty"`
TopP *float64 `json:"top_p,omitempty"`
Truncation string `json:"truncation,omitempty"` // "auto"|"disabled"
Instructions string `json:"instructions,omitempty"`
Reasoning *ORReasoningParam `json:"reasoning,omitempty"`
Metadata map[string]string `json:"metadata,omitempty"`
PreviousResponseID string `json:"previous_response_id,omitempty"`
// Additional parameters from spec
TextFormat interface{} `json:"text_format,omitempty"` // TextResponseFormat or JsonSchemaResponseFormatParam
ServiceTier string `json:"service_tier,omitempty"` // "auto"|"default"|priority hint
AllowedTools []string `json:"allowed_tools,omitempty"` // Restrict which tools can be invoked
Store *bool `json:"store,omitempty"` // Whether to store the response
Include []string `json:"include,omitempty"` // What to include in response
ParallelToolCalls *bool `json:"parallel_tool_calls,omitempty"` // Allow parallel tool calls
PresencePenalty *float64 `json:"presence_penalty,omitempty"` // Presence penalty (-2.0 to 2.0)
FrequencyPenalty *float64 `json:"frequency_penalty,omitempty"` // Frequency penalty (-2.0 to 2.0)
TopLogprobs *int `json:"top_logprobs,omitempty"` // Number of top logprobs to return
Background *bool `json:"background,omitempty"` // Run request in background
MaxToolCalls *int `json:"max_tool_calls,omitempty"` // Maximum number of tool calls
// OpenAI-compatible extensions (not in Open Responses spec)
LogitBias map[string]float64 `json:"logit_bias,omitempty"` // Map of token IDs to bias values (-100 to 100)
// Internal fields (like OpenAIRequest)
Context context.Context `json:"-"`
Cancel context.CancelFunc `json:"-"`
}
// ModelName implements the LocalAIRequest interface
func (r *OpenResponsesRequest) ModelName(s *string) string {
if s != nil {
r.Model = *s
}
return r.Model
}
// ORFunctionTool represents a function tool definition
type ORFunctionTool struct {
Type string `json:"type"` // always "function"
Name string `json:"name"`
Description string `json:"description,omitempty"`
Parameters map[string]interface{} `json:"parameters,omitempty"`
Strict bool `json:"strict"` // Always include in response
}
// ORReasoningParam represents reasoning configuration
type ORReasoningParam struct {
Effort string `json:"effort,omitempty"` // "none"|"low"|"medium"|"high"|"xhigh"
Summary string `json:"summary,omitempty"` // "auto"|"concise"|"detailed"
}
// ORItemParam represents an input/output item (discriminated union by type)
type ORItemParam struct {
Type string `json:"type"` // message|function_call|function_call_output|reasoning|item_reference
ID string `json:"id,omitempty"` // Present for all output items
Status string `json:"status,omitempty"` // in_progress|completed|incomplete
// Message fields
Role string `json:"role,omitempty"` // user|assistant|system|developer
Content interface{} `json:"content,omitempty"` // string or []ORContentPart for messages
// Function call fields
CallID string `json:"call_id,omitempty"`
Name string `json:"name,omitempty"`
Arguments string `json:"arguments,omitempty"`
// Function call output fields
Output interface{} `json:"output,omitempty"` // string or []ORContentPart
// Note: For item_reference type, use the ID field above to reference the item
}
// ORContentPart represents a content block (discriminated union by type)
// For output_text: type, text, annotations, logprobs are ALL REQUIRED per Open Responses spec
type ORContentPart struct {
Type string `json:"type"` // input_text|input_image|input_file|output_text|refusal
Text string `json:"text"` // REQUIRED for output_text - must always be present (even if empty)
Annotations []ORAnnotation `json:"annotations"` // REQUIRED for output_text - must always be present (use [])
Logprobs []ORLogProb `json:"logprobs"` // REQUIRED for output_text - must always be present (use [])
ImageURL string `json:"image_url,omitempty"`
FileURL string `json:"file_url,omitempty"`
Filename string `json:"filename,omitempty"`
FileData string `json:"file_data,omitempty"`
Refusal string `json:"refusal,omitempty"`
Detail string `json:"detail,omitempty"` // low|high|auto for images
}
// OROutputTextContentPart is an alias for ORContentPart used specifically for output_text
type OROutputTextContentPart = ORContentPart
// ORItemField represents an output item (same structure as ORItemParam)
type ORItemField = ORItemParam
// ORResponseResource represents the main response object
type ORResponseResource struct {
ID string `json:"id"`
Object string `json:"object"` // always "response"
CreatedAt int64 `json:"created_at"`
CompletedAt *int64 `json:"completed_at"` // Required: present as number or null
Status string `json:"status"` // queued|in_progress|completed|failed|incomplete|cancelled (see ORStatus* constants)
Model string `json:"model"`
Output []ORItemField `json:"output"`
Error *ORError `json:"error"` // Always present, null if no error
IncompleteDetails *ORIncompleteDetails `json:"incomplete_details"` // Always present, null if complete
PreviousResponseID *string `json:"previous_response_id"`
Instructions *string `json:"instructions"`
// Tool-related fields
Tools []ORFunctionTool `json:"tools"` // Always present, empty array if no tools
ToolChoice interface{} `json:"tool_choice"`
ParallelToolCalls bool `json:"parallel_tool_calls"`
MaxToolCalls *int `json:"max_tool_calls"` // nullable
// Sampling parameters (always required)
Temperature float64 `json:"temperature"`
TopP float64 `json:"top_p"`
PresencePenalty float64 `json:"presence_penalty"`
FrequencyPenalty float64 `json:"frequency_penalty"`
TopLogprobs int `json:"top_logprobs"` // Default to 0
MaxOutputTokens *int `json:"max_output_tokens"`
// Text format configuration
Text *ORTextConfig `json:"text"`
// Truncation and reasoning
Truncation string `json:"truncation"`
Reasoning *ORReasoning `json:"reasoning"` // nullable
// Usage statistics
Usage *ORUsage `json:"usage"` // nullable
// Metadata and operational flags
Metadata map[string]string `json:"metadata"`
Store bool `json:"store"`
Background bool `json:"background"`
ServiceTier string `json:"service_tier"`
// Safety and caching
SafetyIdentifier *string `json:"safety_identifier"` // nullable
PromptCacheKey *string `json:"prompt_cache_key"` // nullable
}
// ORTextConfig represents text format configuration
type ORTextConfig struct {
Format *ORTextFormat `json:"format,omitempty"`
}
// ORTextFormat represents the text format type
type ORTextFormat struct {
Type string `json:"type"` // "text" or "json_schema"
}
// ORError represents an error in the response
type ORError struct {
Type string `json:"type"` // invalid_request|not_found|server_error|model_error|too_many_requests
Code string `json:"code,omitempty"`
Message string `json:"message"`
Param string `json:"param,omitempty"`
}
// ORUsage represents token usage statistics
type ORUsage struct {
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
TotalTokens int `json:"total_tokens"`
InputTokensDetails *ORInputTokensDetails `json:"input_tokens_details"` // Always present
OutputTokensDetails *OROutputTokensDetails `json:"output_tokens_details"` // Always present
}
// ORInputTokensDetails represents input token breakdown
type ORInputTokensDetails struct {
CachedTokens int `json:"cached_tokens"` // Always include, even if 0
}
// OROutputTokensDetails represents output token breakdown
type OROutputTokensDetails struct {
ReasoningTokens int `json:"reasoning_tokens"` // Always include, even if 0
}
// ORReasoning represents reasoning configuration and metadata
type ORReasoning struct {
Effort string `json:"effort,omitempty"`
Summary string `json:"summary,omitempty"`
}
// ORIncompleteDetails represents details about why a response was incomplete
type ORIncompleteDetails struct {
Reason string `json:"reason"`
}
// ORStreamEvent represents a streaming event
// Note: Fields like delta, text, logprobs should be set explicitly for events that require them
// The sendSSEEvent function uses a custom serializer to handle conditional field inclusion
type ORStreamEvent struct {
Type string `json:"type"`
SequenceNumber int `json:"sequence_number"`
Response *ORResponseResource `json:"response,omitempty"`
OutputIndex *int `json:"output_index,omitempty"`
ContentIndex *int `json:"content_index,omitempty"`
SummaryIndex *int `json:"summary_index,omitempty"`
ItemID string `json:"item_id,omitempty"`
Item *ORItemField `json:"item,omitempty"`
Part *ORContentPart `json:"part,omitempty"`
Delta *string `json:"delta,omitempty"` // Pointer to distinguish unset from empty
Text *string `json:"text,omitempty"` // Pointer to distinguish unset from empty
Arguments *string `json:"arguments,omitempty"` // Pointer to distinguish unset from empty
Refusal string `json:"refusal,omitempty"`
Error *ORErrorPayload `json:"error,omitempty"`
Logprobs *[]ORLogProb `json:"logprobs,omitempty"` // Pointer to distinguish unset from empty
Obfuscation string `json:"obfuscation,omitempty"`
Annotation *ORAnnotation `json:"annotation,omitempty"`
AnnotationIndex *int `json:"annotation_index,omitempty"`
}
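// On the wire, a delta event rendered from this struct would look roughly like
// the following (event name and field set assumed from the Open Responses
// streaming spec, not shown in this diff):
//
//	event: response.output_text.delta
//	data: {"type":"response.output_text.delta","sequence_number":3,"item_id":"item_0","output_index":0,"content_index":0,"delta":"Hel"}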
// ORErrorPayload represents an error payload in streaming events
type ORErrorPayload struct {
Type string `json:"type"`
Code string `json:"code,omitempty"`
Message string `json:"message"`
Param string `json:"param,omitempty"`
Headers map[string]string `json:"headers,omitempty"`
}
// ORLogProb represents log probability information
type ORLogProb struct {
Token string `json:"token"`
Logprob float64 `json:"logprob"`
Bytes []int `json:"bytes"`
TopLogprobs []ORTopLogProb `json:"top_logprobs,omitempty"`
}
// ORTopLogProb represents a top log probability
type ORTopLogProb struct {
Token string `json:"token"`
Logprob float64 `json:"logprob"`
Bytes []int `json:"bytes"`
}
// ORAnnotation represents an annotation (e.g., URL citation)
type ORAnnotation struct {
Type string `json:"type"` // url_citation
StartIndex int `json:"start_index"`
EndIndex int `json:"end_index"`
URL string `json:"url"`
Title string `json:"title"`
}
// ORContentPartWithLogprobs creates an output_text content part with logprobs converted from OpenAI format
func ORContentPartWithLogprobs(text string, logprobs *Logprobs) ORContentPart {
orLogprobs := []ORLogProb{}
// Convert OpenAI-style logprobs to Open Responses format
if logprobs != nil && len(logprobs.Content) > 0 {
for _, lp := range logprobs.Content {
// Convert top logprobs
topLPs := []ORTopLogProb{}
for _, tlp := range lp.TopLogprobs {
topLPs = append(topLPs, ORTopLogProb{
Token: tlp.Token,
Logprob: tlp.Logprob,
Bytes: tlp.Bytes,
})
}
orLogprobs = append(orLogprobs, ORLogProb{
Token: lp.Token,
Logprob: lp.Logprob,
Bytes: lp.Bytes,
TopLogprobs: topLPs,
})
}
}
return ORContentPart{
Type: "output_text",
Text: text,
Annotations: []ORAnnotation{}, // REQUIRED - must always be present as array (empty if none)
Logprobs: orLogprobs, // REQUIRED - must always be present as array (empty if none)
}
}

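The schema above is self-contained enough to exercise directly. Below is a minimal sketch of how the request and content-part types compose and what they serialize to; the import path is an assumption, since the diff does not show where the package lives in the tree.

package main

import (
	"encoding/json"
	"fmt"

	// Assumed import path; not confirmed by this diff.
	"github.com/mudler/LocalAI/core/schema"
)

func main() {
	maxTokens := 256
	req := schema.OpenResponsesRequest{
		Model: "qwen3-coder-30b-a3b-instruct",
		// Input is an interface{}: either a plain string or []ORItemParam.
		Input: []schema.ORItemParam{
			{Type: "message", Role: "user", Content: "Write hello world in Go."},
		},
		MaxOutputTokens: &maxTokens,
	}
	body, _ := json.Marshal(req)
	fmt.Println(string(body))

	// The helper guarantees the spec invariant that output_text parts always
	// carry annotations and logprobs arrays, even when empty:
	part := schema.ORContentPartWithLogprobs("hello", nil)
	out, _ := json.Marshal(part)
	fmt.Println(string(out))
	// -> {"type":"output_text","text":"hello","annotations":[],"logprobs":[]}
}

Note the design choice visible here: pointer fields such as MaxOutputTokens and Temperature on the request distinguish "unset" from an explicit zero, while the response resource carries plain values because the spec requires them to always be present.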

@@ -3822,6 +3822,41 @@
- filename: boomerang-qwen3-4.9B.Q4_K_M.gguf
sha256: 11e6c068351d104dee31dd63550e5e2fc9be70467c1cfc07a6f84030cb701537
uri: huggingface://mradermacher/boomerang-qwen3-4.9B-GGUF/boomerang-qwen3-4.9B.Q4_K_M.gguf
- !!merge <<: *qwen3
name: "qwen3-coder-30b-a3b-instruct"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
url: "github:mudler/LocalAI/gallery/qwen3.yaml@master"
urls:
- https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
- https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF
description: |
Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:
- Significant performance among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks.
- Long-context capabilities with native support for 256K tokens, extendable up to 1M tokens using YaRN, optimized for repository-scale understanding.
- Agentic coding support for most platforms such as Qwen Code and CLINE, featuring a specially designed function call format.
Model Overview:
Qwen3-Coder-30B-A3B-Instruct has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
- Context Length: 262,144 natively.
NOTE: This model supports only non-thinking mode and does not generate <think></think> blocks in its output. Additionally, specifying enable_thinking=False is no longer required.
overrides:
parameters:
model: Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
files:
- filename: Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
sha256: fadc3e5f8d42bf7e894a785b05082e47daee4df26680389817e2093056f088ad
uri: huggingface://unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf
- &gemma3
url: "github:mudler/LocalAI/gallery/gemma.yaml@master"
name: "gemma-3-27b-it"