chore(deps): bump torch in /backend/python/vllm

Bumps torch from 2.9.1+cpu to 2.12.1+xpu.

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.12.1+xpu
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-22 15:49:12 -04:00 · 2026-06-22 18:33:32 +00:00
50 changed files with 167 additions and 1427 deletions
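
Review note: the requirements-cpu.txt hunk in this commit pins torch==2.12.1+xpu while the file still points at the https://download.pytorch.org/whl/cpu index. Assuming the last path segment of the index URL and the local version label of the pin are meant to agree, a small check along these lines could flag the mismatch before merge. This is only a sketch: localTag, indexVariant and checkPins are hypothetical helpers, not part of this repository, and the parsing ignores environment markers and other requirement syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// localTag returns the local version label of a "name==version+label" pin,
// e.g. "xpu" for "torch==2.12.1+xpu". It returns "" for unpinned lines or
// pins without a label. (Hypothetical helper for illustration.)
func localTag(requirement string) string {
	_, version, pinned := strings.Cut(requirement, "==")
	if !pinned {
		return ""
	}
	_, tag, _ := strings.Cut(strings.TrimSpace(version), "+")
	return tag
}

// indexVariant returns the last path segment of an --extra-index-url line,
// e.g. "cpu" for ".../whl/cpu". It returns "" for any other line.
func indexVariant(line string) string {
	url, ok := strings.CutPrefix(strings.TrimSpace(line), "--extra-index-url")
	if !ok {
		return ""
	}
	url = strings.TrimRight(strings.TrimSpace(url), "/")
	return url[strings.LastIndex(url, "/")+1:]
}

// checkPins reports each pin whose local label differs from the most
// recently seen index variant. Treating that as a problem is an assumption.
func checkPins(lines []string) []string {
	var variant string
	var problems []string
	for _, line := range lines {
		if v := indexVariant(line); v != "" {
			variant = v
			continue
		}
		tag := localTag(line)
		if tag != "" && variant != "" && tag != variant {
			problems = append(problems, fmt.Sprintf(
				"%s: local label %q does not match index variant %q",
				strings.TrimSpace(line), tag, variant))
		}
	}
	return problems
}

func main() {
	// Lines taken from the post-change requirements-cpu.txt in this diff.
	lines := []string{
		"--extra-index-url https://download.pytorch.org/whl/cpu",
		"accelerate",
		"torch==2.12.1+xpu",
		"torchvision",
	}
	for _, p := range checkPins(lines) {
		fmt.Println(p)
	}
}
```

Run against the pre-change pin (torch==2.9.1+cpu), the same check reports nothing, which is the behaviour a reviewer would likely want from a CPU-only requirements file.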
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
@@ -44,7 +44,7 @@ jobs:
      has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }}
    steps:
      - name: Checkout repository
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6

      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -101,7 +101,7 @@ jobs:
    steps:

      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true

--- a/.github/workflows/backend_build_darwin.yml
+++ b/.github/workflows/backend_build_darwin.yml
@@ -57,7 +57,7 @@ jobs:
      HOMEBREW_NO_ANALYTICS: '1'
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true

--- a/.github/workflows/backend_merge.yml
+++ b/.github/workflows/backend_merge.yml
@@ -49,7 +49,7 @@ jobs:
      # Sparse checkout: the merge job needs `.github/scripts/` (for the
      # keepalive cleanup script) but none of the source tree.
      - name: Checkout (.github/scripts only)
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          sparse-checkout: |
            .github/scripts
--- a/.github/workflows/backend_pr.yml
+++ b/.github/workflows/backend_pr.yml
@@ -23,7 +23,7 @@ jobs:
      has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }}
    steps:
      - name: Checkout repository
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6

      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
--- a/.github/workflows/base-images.yml
+++ b/.github/workflows/base-images.yml
@@ -127,7 +127,7 @@ jobs:
            # the original l4t matrix entry which set skip-drivers: 'true'.
            skip-drivers: 'true'
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6
        with:
          submodules: false
      - name: Free disk space
--- a/.github/workflows/build-test.yaml
+++ b/.github/workflows/build-test.yaml
@@ -11,7 +11,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Set up Go
@@ -25,7 +25,7 @@ jobs:
    runs-on: macos-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Set up Go
@@ -47,7 +47,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Configure apt mirror on runner
--- a/.github/workflows/bump-inference-defaults.yml
+++ b/.github/workflows/bump-inference-defaults.yml
@@ -14,7 +14,7 @@ jobs:
  bump:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6

      - uses: actions/setup-go@v5
        with:
--- a/.github/workflows/bump_deps.yaml
+++ b/.github/workflows/bump_deps.yaml
@@ -92,7 +92,7 @@ jobs:
            file: "backend/go/vibevoice-cpp/Makefile"
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6
      - name: Bump dependencies 🔧
        id: bump
        run: |
@@ -128,7 +128,7 @@ jobs:
    if: github.repository == 'mudler/LocalAI'
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6
      - name: Bump vLLM cu130 wheel pin 🔧
        id: bump
        run: |
--- a/.github/workflows/bump_docs.yaml
+++ b/.github/workflows/bump_docs.yaml
@@ -13,7 +13,7 @@ jobs:
          - repository: "mudler/LocalAI"
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6
      - name: Bump dependencies 🔧
        run: |
          bash .github/bump_docs.sh ${{ matrix.repository }}
--- a/.github/workflows/checksum_checker.yaml
+++ b/.github/workflows/checksum_checker.yaml
@@ -8,7 +8,7 @@ jobs:
    if: github.repository == 'mudler/LocalAI'
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6
      - name: Configure apt mirror on runner
        uses: ./.github/actions/configure-apt-mirror
      - name: Install dependencies
--- a/.github/workflows/deploy-explorer.yaml
+++ b/.github/workflows/deploy-explorer.yaml
@@ -16,7 +16,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - uses: actions/setup-go@v5
--- a/.github/workflows/gallery-agent.yaml
+++ b/.github/workflows/gallery-agent.yaml
@@ -31,7 +31,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

--- a/.github/workflows/generate_intel_image.yaml
+++ b/.github/workflows/generate_intel_image.yaml
@@ -44,7 +44,7 @@ jobs:
        uses: docker/setup-buildx-action@master

      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6

      - name: Cache Intel images
        uses: docker/build-push-action@v7
--- a/.github/workflows/gh-pages.yml
+++ b/.github/workflows/gh-pages.yml
@@ -28,7 +28,7 @@ jobs:
      HUGO_VERSION: "0.146.3"
    steps:
      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0  # needed for enableGitInfo
          submodules: true
--- a/.github/workflows/image_build.yml
+++ b/.github/workflows/image_build.yml
@@ -80,7 +80,7 @@ jobs:
    steps:

      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6

      - name: Configure apt mirror on runner
        id: apt_mirror
--- a/.github/workflows/image_merge.yml
+++ b/.github/workflows/image_merge.yml
@@ -36,7 +36,7 @@ jobs:
      # Sparse checkout: needed for .github/scripts/ (the keepalive cleanup
      # script). Skips the rest of the source tree.
      - name: Checkout (.github/scripts only)
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          sparse-checkout: |
            .github/scripts
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -20,7 +20,7 @@ jobs:
  golangci-lint:
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6
        with:
          # Full history so golangci-lint's new-from-merge-base can reach
          # origin/master and compute the diff against it.
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -10,7 +10,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Set up Go
@@ -28,7 +28,7 @@ jobs:
    runs-on: macos-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Set up Go
@@ -46,7 +46,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          fetch-depth: 0
      - name: Configure apt mirror on runner
--- a/.github/workflows/secscan.yaml
+++ b/.github/workflows/secscan.yaml
@@ -14,7 +14,7 @@ jobs:
      GO111MODULE: on
    steps:
      - name: Checkout Source
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        if: ${{ github.actor != 'dependabot[bot]' }}
      - name: Run Gosec Security Scanner
        if: ${{ github.actor != 'dependabot[bot]' }}
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -50,7 +50,7 @@ jobs:
      parakeet-cpp: ${{ steps.detect.outputs.parakeet-cpp }}
    steps:
      - name: Checkout repository
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
      - name: Setup Bun
        uses: oven-sh/setup-bun@v2
      - name: Install dependencies
@@ -67,7 +67,7 @@ jobs:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v7
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -90,7 +90,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -113,7 +113,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -137,7 +137,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -158,7 +158,7 @@ jobs:
  #  runs-on: ubuntu-latest
  #  steps:
  #    - name: Clone
-  #      uses: actions/checkout@v7
+  #      uses: actions/checkout@v6
  #      with:
  #        submodules: true
  #    - name: Dependencies
@@ -178,7 +178,7 @@ jobs:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v7
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -240,7 +240,7 @@ jobs:
  #           sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
  #           df -h
  #     - name: Clone
-  #       uses: actions/checkout@v7
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -265,7 +265,7 @@ jobs:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v7
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -288,7 +288,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -309,7 +309,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -330,7 +330,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -351,7 +351,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -373,7 +373,7 @@ jobs:
  #   timeout-minutes: 45
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v7
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -394,7 +394,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -415,7 +415,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -436,7 +436,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -462,7 +462,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -484,7 +484,7 @@ jobs:
    timeout-minutes: 30
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -513,7 +513,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -530,7 +530,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -552,7 +552,7 @@ jobs:
    timeout-minutes: 20
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -579,7 +579,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -604,7 +604,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -625,7 +625,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -645,7 +645,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -664,7 +664,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -681,7 +681,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -698,7 +698,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -741,7 +741,7 @@ jobs:
  #   timeout-minutes: 90
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v7
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -783,7 +783,7 @@ jobs:
  #   timeout-minutes: 90
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v7
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -808,7 +808,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -840,7 +840,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -876,7 +876,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -915,7 +915,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -952,7 +952,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -987,7 +987,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -1013,7 +1013,7 @@ jobs:
    timeout-minutes: 150
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -1042,7 +1042,7 @@ jobs:
    timeout-minutes: 60
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go
@@ -1058,7 +1058,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -1091,7 +1091,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -1114,7 +1114,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -1140,7 +1140,7 @@ jobs:
    timeout-minutes: 90
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -21,7 +21,7 @@ jobs:
        go-version: ['1.26.x']
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Free disk space
@@ -84,7 +84,7 @@ jobs:
        go-version: ['1.26.x']
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go ${{ matrix.go-version }}
--- a/.github/workflows/tests-aio.yml
+++ b/.github/workflows/tests-aio.yml
@@ -62,7 +62,7 @@ jobs:
          sudo rm -rfv build || true
          df -h
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
--- a/.github/workflows/tests-e2e.yml
+++ b/.github/workflows/tests-e2e.yml
@@ -21,7 +21,7 @@ jobs:
        go-version: ['1.25.x']
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Configure apt mirror on runner
--- a/.github/workflows/tests-pii-ner-e2e.yml
+++ b/.github/workflows/tests-pii-ner-e2e.yml
@@ -57,7 +57,7 @@ jobs:
        go-version: ['1.25.x']
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Free disk space
--- a/.github/workflows/tests-ui-e2e.yml
+++ b/.github/workflows/tests-ui-e2e.yml
@@ -23,7 +23,7 @@ jobs:
        go-version: ['1.26.x']
    steps:
      - name: Clone
-        uses: actions/checkout@v7
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Configure apt mirror on runner
--- a/.github/workflows/update_swagger.yaml
+++ b/.github/workflows/update_swagger.yaml
@@ -10,7 +10,7 @@ jobs:
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v7
+      - uses: actions/checkout@v6
      - name: Configure apt mirror on runner
        uses: ./.github/actions/configure-apt-mirror
      - uses: actions/setup-go@v5
--- a/backend/python/vllm/requirements-cpu.txt
+++ b/backend/python/vllm/requirements-cpu.txt
@@ -1,6 +1,6 @@
 --extra-index-url https://download.pytorch.org/whl/cpu
 accelerate
-torch==2.9.1+cpu
+torch==2.12.1+xpu
 torchvision
 torchaudio
 transformers
--- a/core/config/meta/registry.go
+++ b/core/config/meta/registry.go
@@ -537,36 +537,6 @@ func DefaultRegistry() map[string]FieldMetaOverride {
 			Component:   "number",
 			Order:       79,
 		},
-		"pipeline.compaction.enabled": {
-			Section:     "pipeline",
-			Label:       "Compaction Enabled",
-			Description: "Fold conversation items that age out of the live window (Max History Items) into a rolling summary instead of dropping them, so long realtime sessions stay cheap without losing earlier context. Off by default.",
-			Component:   "toggle",
-			Order:       80,
-		},
-		"pipeline.compaction.trigger_items": {
-			Section:     "pipeline",
-			Label:       "Compaction Trigger Items",
-			Description: "High-water mark: once the live conversation exceeds this many items, the overflow above Max History Items is summarized and evicted. Must be greater than Max History Items; defaults to twice it. The gap controls how often summarization runs.",
-			Component:   "number",
-			Order:       81,
-		},
-		"pipeline.compaction.summary_model": {
-			Section:     "pipeline",
-			Label:       "Compaction Summary Model",
-			Description: "Optional smaller/cheaper model used to produce the rolling summary. Empty reuses the pipeline's own LLM. On CPU, a tiny model here keeps compaction from competing with the conversation LLM.",
-			Component:   "input",
-			Advanced:    true,
-			Order:       82,
-		},
-		"pipeline.compaction.max_summary_tokens": {
-			Section:     "pipeline",
-			Label:       "Compaction Max Summary Tokens",
-			Description: "Advisory cap on the rolling summary length (fed to the summarizer prompt). Defaults to 512.",
-			Component:   "number",
-			Advanced:    true,
-			Order:       83,
-		},

 		// --- Functions ---
 		"function.grammar.parallel_calls": {
--- a/core/config/model_config.go
+++ b/core/config/model_config.go
@@ -641,32 +641,11 @@ type Pipeline struct {
 	// context fills.
 	MaxHistoryItems *int `yaml:"max_history_items,omitempty" json:"max_history_items,omitempty"`

-	// Compaction folds conversation items that age out of the live window
-	// (max_history_items) into a rolling summary instead of dropping them, so
-	// long realtime sessions stay cheap without losing earlier context. Nil
-	// (block absent) means disabled, preserving existing behavior.
-	Compaction *PipelineCompaction `yaml:"compaction,omitempty" json:"compaction,omitempty"`
-
 	// VoiceRecognition gates the pipeline behind speaker verification. Nil
 	// (block absent) means no gate, preserving existing behavior.
 	VoiceRecognition *PipelineVoiceRecognition `yaml:"voice_recognition,omitempty" json:"voice_recognition,omitempty"`
 }

-// PipelineCompaction configures summarize-then-drop for a realtime pipeline.
-type PipelineCompaction struct {
-	// Enabled turns summarize-then-drop on. Default false.
-	Enabled bool `yaml:"enabled,omitempty" json:"enabled,omitempty"`
-	// TriggerItems is the high-water mark: once live items exceed it, overflow
-	// above max_history_items is summarized and evicted. Must exceed
-	// max_history_items; clamped up if not. Default: 2x max_history_items.
-	TriggerItems int `yaml:"trigger_items,omitempty" json:"trigger_items,omitempty"`
-	// SummaryModel optionally names a smaller/cheaper model for the summary
-	// call. Empty uses the pipeline's own LLM.
-	SummaryModel string `yaml:"summary_model,omitempty" json:"summary_model,omitempty"`
-	// MaxSummaryTokens advises the summary length (fed to the prompt). Default 512.
-	MaxSummaryTokens int `yaml:"max_summary_tokens,omitempty" json:"max_summary_tokens,omitempty"`
-}
-
 // ApplyReasoningEffort resolves the effective reasoning effort — a per-request
 // value (requestEffort) overrides the config's own ReasoningEffort default —
 // stores it on the config so gRPCPredictOpts forwards it to the backend as the
--- a/core/http/endpoints/openai/realtime.go
+++ b/core/http/endpoints/openai/realtime.go
@@ -12,7 +12,6 @@ import (
 	"os"
 	"strconv"
 	"sync"
-	"sync/atomic"
 	"time"

 	"net/http"
@@ -135,18 +134,6 @@ type Session struct {
 	// pairs are kept together so we never feed an orphaned tool result.
 	MaxHistoryItems int

-	// Compaction settings resolved from pipeline.compaction (see resolveCompaction).
-	CompactionEnabled bool
-	CompactionTrigger int
-	SummaryModel      string
-	MaxSummaryTokens  int
-
-	// summarizerFactory lazily builds the model used for compaction summaries
-	// when summary_model is configured; nil means reuse the pipeline LLM.
-	summarizerFactory func() (Model, error)
-	summarizerOnce    sync.Once
-	summarizerCached  Model
-
 	// AssistantExecutor is non-nil when the session opted into the in-process
 	// LocalAI Assistant tool surface. Tool calls whose name matches this
 	// executor's catalog are run inproc and their output is fed back to the
@@ -254,12 +241,6 @@ type Conversation struct {
 	ID    string
 	Items []*types.MessageItemUnion
 	Lock  sync.Mutex
-	// Memory is the rolling summary of items already evicted by compaction. It
-	// is kept out of Items (so trimRealtimeItems never drops it) and rendered
-	// as a system message right after the session instructions.
-	Memory string
-	// compacting ensures at most one background compaction runs per conversation.
-	compacting atomic.Bool
 }

 func (c *Conversation) ToServer() types.Conversation {
@@ -559,12 +540,13 @@ func runRealtimeSession(application *application.Application, t Transport, model
 		SoundDetectionWindowMs:  cfg.Pipeline.SoundDetectionWindowMs,
 		SoundDetectionHopMs:     cfg.Pipeline.SoundDetectionHopMs,
 	}
-	session.CompactionEnabled, session.CompactionTrigger, session.MaxSummaryTokens, session.SummaryModel = resolveCompaction(cfg, session.MaxHistoryItems)

 	// Create a default conversation
 	conversationID := generateConversationID()
 	conversation := &Conversation{
-		ID:    conversationID,
+		ID: conversationID,
+		// TODO: We need to truncate the conversation items when a new item is added and we have run out of space. There are multiple places where items
+		//       can be added so we could use a datastructure here that enforces truncation upon addition
 		Items: []*types.MessageItemUnion{},
 	}
 	session.Conversations[conversationID] = conversation
@@ -595,18 +577,6 @@ func runRealtimeSession(application *application.Application, t Transport, model
 	}
 	session.ModelInterface = m

-	if session.SummaryModel != "" {
-		summaryModelName := session.SummaryModel
-		sid := sessionID
-		session.summarizerFactory = func() (Model, error) {
-			summaryCfg, lerr := application.ModelConfigLoader().LoadModelConfigFileByNameDefaultOptions(summaryModelName, application.ApplicationConfig())
-			if lerr != nil {
-				return nil, fmt.Errorf("load summary model config %q: %w", summaryModelName, lerr)
-			}
-			return newModel(&summaryCfg.Pipeline, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), evaluator, buildRealtimeRoutingContext(application, sid))
-		}
-	}
-
 	if cfg.Pipeline.VoiceGateEnabled() {
 		gate, gerr := newVoiceGate(
 			*cfg.Pipeline.VoiceRecognition,
@@ -837,15 +807,6 @@ func runRealtimeSession(application *application.Application, t Transport, model
 				commitUtterance(respCtx, allAudio, session, conversation, t)
 			}()

-		case types.InputAudioBufferClearEvent:
-			xlog.Debug("recv", "message", string(msg))
-			// Discard a partially-captured utterance so the client can restart
-			// input cleanly without the stale buffer leaking into the next commit.
-			clearInputAudio(session)
-			sendEvent(t, types.InputAudioBufferClearedEvent{
-				ServerEventBase: types.ServerEventBase{EventID: e.EventID},
-			})
-
 		case types.ConversationItemCreateEvent:
 			xlog.Debug("recv", "message", string(msg))
 			// Add the item to the conversation
@@ -880,39 +841,7 @@ func runRealtimeSession(application *application.Application, t Transport, model
 			})

 		case types.ConversationItemDeleteEvent:
-			xlog.Debug("recv", "message", string(msg))
-			if e.ItemID == "" {
-				sendError(t, "invalid_item_id", "Need item_id, but none specified", "", "event_TODO")
-				continue
-			}
-			conversation.Lock.Lock()
-			updated, ok := deleteItem(conversation.Items, e.ItemID)
-			conversation.Items = updated
-			conversation.Lock.Unlock()
-			if !ok {
-				sendError(t, "invalid_item_id", "Item to delete not found", "", "event_TODO")
-				continue
-			}
-			sendEvent(t, types.ConversationItemDeletedEvent{
-				ServerEventBase: types.ServerEventBase{EventID: e.EventID},
-				ItemID:          e.ItemID,
-			})
-
-		case types.ConversationItemTruncateEvent:
-			xlog.Debug("recv", "message", string(msg))
-			conversation.Lock.Lock()
-			ok := truncateAssistantText(conversation.Items, e.ItemID, e.ContentIndex)
-			conversation.Lock.Unlock()
-			if !ok {
-				sendError(t, "invalid_item_id", "Item to truncate not found", "", "event_TODO")
-				continue
-			}
-			sendEvent(t, types.ConversationItemTruncatedEvent{
-				ServerEventBase: types.ServerEventBase{EventID: e.EventID},
-				ItemID:          e.ItemID,
-				ContentIndex:    e.ContentIndex,
-				AudioEndMs:      e.AudioEndMs,
-			})
+			sendError(t, "not_implemented", "Deleting items not implemented", "", "event_TODO")

 		case types.ConversationItemRetrieveEvent:
 			xlog.Debug("recv", "message", string(msg))
@@ -925,7 +854,21 @@ func runRealtimeSession(application *application.Application, t Transport, model
 			conversation.Lock.Lock()
 			var retrievedItem types.MessageItemUnion
 			for _, item := range conversation.Items {
-				if itemID(item) == e.ItemID {
+				// We need to check ID in the union
+				var id string
+				if item.System != nil {
+					id = item.System.ID
+				} else if item.User != nil {
+					id = item.User.ID
+				} else if item.Assistant != nil {
+					id = item.Assistant.ID
+				} else if item.FunctionCall != nil {
+					id = item.FunctionCall.ID
+				} else if item.FunctionCallOutput != nil {
+					id = item.FunctionCallOutput.ID
+				}
+
+				if id == e.ItemID {
 					retrievedItem = *item
 					break
 				}
@@ -1723,9 +1666,6 @@ const maxAssistantToolTurns = 10

 func triggerResponse(ctx context.Context, session *Session, conv *Conversation, t Transport, overrides *types.ResponseCreateParams) {
 	triggerResponseAtTurn(ctx, session, conv, t, overrides, 0)
-	// Fold aged-out turns into the rolling memory off the critical path; the
-	// next turn reaps the smaller buffer.
-	session.maybeCompact(conv)
 }

 func triggerResponseAtTurn(ctx context.Context, session *Session, conv *Conversation, t Transport, overrides *types.ResponseCreateParams, toolTurn int) {
@@ -1781,7 +1721,6 @@ func triggerResponseAtTurn(ctx context.Context, session *Session, conv *Conversa
 	var lastUserSpeaker *types.Speaker
 	personalize := session.voiceGate != nil && session.voiceGate.cfg.PersonalizeEnabled()
 	conv.Lock.Lock()
-	conversationHistory = withMemory(conversationHistory, conv.Memory)
 	items := trimRealtimeItems(conv.Items, session.MaxHistoryItems)
 	for _, item := range items {
 		if item.User != nil {
--- a/core/http/endpoints/openai/realtime_compaction.go
+++ b/core/http/endpoints/openai/realtime_compaction.go
@@ -1,326 +0,0 @@
-package openai
-
-import (
-	"context"
-	"fmt"
-	"strings"
-	"time"
-
-	"github.com/mudler/LocalAI/core/config"
-	"github.com/mudler/LocalAI/core/http/endpoints/openai/types"
-	"github.com/mudler/LocalAI/core/schema"
-	"github.com/mudler/LocalAI/pkg/reasoning"
-	"github.com/mudler/xlog"
-)
-
-const (
-	defaultMaxSummaryTokens = 512
-	memoryPrefix            = "Summary of earlier conversation:\n"
-	// compactionTimeout bounds the summarizer call so a stuck model can't pin the
-	// compacting flag (and thus block all further compaction) forever.
-	compactionTimeout = 60 * time.Second
-)
-
-// withMemory inserts the rolling summary as a system message after the existing
-// (instructions) history. No-op when memory is empty.
-func withMemory(history schema.Messages, memory string) schema.Messages {
-	if memory == "" {
-		return history
-	}
-	content := memoryPrefix + memory
-	return append(history, schema.Message{
-		Role:          string(types.MessageRoleSystem),
-		StringContent: content,
-		Content:       content,
-	})
-}
-
-// renderItemsTranscript renders conversation items as a plain "role: text"
-// transcript for summarization. Non-text items (bare tool calls) are labelled
-// so the summarizer keeps track of actions taken.
-func renderItemsTranscript(items []*types.MessageItemUnion) string {
-	var b strings.Builder
-	for _, item := range items {
-		switch {
-		case item.User != nil:
-			b.WriteString("user: ")
-			for _, c := range item.User.Content {
-				if c.Text != "" {
-					b.WriteString(c.Text)
-				}
-				if c.Transcript != "" {
-					b.WriteString(c.Transcript)
-				}
-			}
-			b.WriteString("\n")
-		case item.Assistant != nil:
-			b.WriteString("assistant: ")
-			// Realtime assistant *audio* turns store the spoken words in
-			// .Transcript (not .Text), so emit both or spoken turns are dropped.
-			for _, c := range item.Assistant.Content {
-				if c.Text != "" {
-					b.WriteString(c.Text)
-				}
-				if c.Transcript != "" {
-					b.WriteString(c.Transcript)
-				}
-			}
-			b.WriteString("\n")
-		case item.FunctionCall != nil:
-			b.WriteString(fmt.Sprintf("assistant called tool %s(%s)\n", item.FunctionCall.Name, item.FunctionCall.Arguments))
-		case item.FunctionCallOutput != nil:
-			b.WriteString(fmt.Sprintf("tool result: %s\n", item.FunctionCallOutput.Output))
-		}
-	}
-	return strings.TrimSpace(b.String())
-}
-
-// buildSummaryMessages builds the chat messages for the summarizer LLM: a system
-// instruction plus prior memory and the new transcript to fold in. maxTokens is
-// advisory (fed to the prompt; not hard-enforced in v1).
-func buildSummaryMessages(priorMemory, transcript string, maxTokens int) schema.Messages {
-	system := fmt.Sprintf("You maintain a running memory of a live voice conversation. "+
-		"Merge the prior memory with the new exchanges into an updated memory. "+
-		"Keep names, decisions, facts, preferences, and open threads. Be concise "+
-		"(under ~%d tokens). Output only the updated memory, with no reasoning or tags.", maxTokens)
-	var user strings.Builder
-	if priorMemory != "" {
-		user.WriteString("Prior memory:\n")
-		user.WriteString(priorMemory)
-		user.WriteString("\n\n")
-	}
-	user.WriteString("New exchanges to fold in:\n")
-	user.WriteString(transcript)
-	return schema.Messages{
-		{Role: string(types.MessageRoleSystem), StringContent: system, Content: system},
-		{Role: string(types.MessageRoleUser), StringContent: user.String(), Content: user.String()},
-	}
-}
-
-// clearInputAudio resets the session's pending input audio buffer (the raw
-// PCM and any buffered Opus frames). Used by the input_audio_buffer.clear
-// realtime event so a client can discard a partially-captured utterance.
-func clearInputAudio(s *Session) {
-	s.AudioBufferLock.Lock()
-	s.InputAudioBuffer = nil
-	s.AudioBufferLock.Unlock()
-	s.OpusFramesLock.Lock()
-	s.OpusFrames = nil
-	s.OpusFramesLock.Unlock()
-}
-
-// itemID extracts the id from any MessageItemUnion variant ("" if none).
-func itemID(item *types.MessageItemUnion) string {
-	switch {
-	case item == nil:
-		return ""
-	case item.System != nil:
-		return item.System.ID
-	case item.User != nil:
-		return item.User.ID
-	case item.Assistant != nil:
-		return item.Assistant.ID
-	case item.FunctionCall != nil:
-		return item.FunctionCall.ID
-	case item.FunctionCallOutput != nil:
-		return item.FunctionCallOutput.ID
-	default:
-		return ""
-	}
-}
-
-// deleteItem removes the item with id from items, returning the new slice and
-// whether it was found.
-func deleteItem(items []*types.MessageItemUnion, id string) ([]*types.MessageItemUnion, bool) {
-	for i, item := range items {
-		if itemID(item) == id {
-			return append(items[:i:i], items[i+1:]...), true
-		}
-	}
-	return items, false
-}
-
-// truncateAssistantText clears the text of the assistant item's content part at
-// contentIndex. Minimal truncate: used to discard an interrupted/barge-in
-// response tail. Both .Text and .Transcript are cleared because realtime audio
-// turns store the spoken words in .Transcript (clearing only .Text would no-op).
-func truncateAssistantText(items []*types.MessageItemUnion, id string, contentIndex int) bool {
-	for _, item := range items {
-		if itemID(item) != id || item.Assistant == nil {
-			continue
-		}
-		if contentIndex >= 0 && contentIndex < len(item.Assistant.Content) {
-			item.Assistant.Content[contentIndex].Text = ""
-			item.Assistant.Content[contentIndex].Transcript = ""
-		}
-		return true
-	}
-	return false
-}
-
-// compactionCut returns the index splitting items into overflow (items[:cut],
-// to be summarized+evicted) and the kept live tail (items[cut:]), keeping the
-// last `keep` items. It mirrors trimRealtimeItems' pair-safety: the cut is
-// pulled left so a function_call and its function_call_output are never split
-// across the boundary (the whole pair lands in the kept tail). Returns 0 when
-// there is nothing to cut.
-func compactionCut(items []*types.MessageItemUnion, keep int) int {
-	// keep <= 0 means no live-window cap (the "unlimited history" sentinel, as
-	// in trimRealtimeItems): there is nothing to evict, so cut nothing. This
-	// also avoids indexing items[len(items)] in the pair-safety loop below.
-	if keep <= 0 {
-		return 0
-	}
-	cut := len(items) - keep
-	if cut <= 0 {
-		return 0
-	}
-	for cut > 0 && items[cut] != nil && items[cut].FunctionCallOutput != nil {
-		cut--
-	}
-	return cut
-}
-
-// resolveCompaction reads the pipeline.compaction block, applying defaults and
-// the trigger>max_history invariant. maxHistory is the already-resolved live
-// window size. Returns enabled=false (and zero values) when compaction is off.
-func resolveCompaction(cfg *config.ModelConfig, maxHistory int) (enabled bool, trigger, maxSummaryTokens int, summaryModel string) {
-	if cfg == nil || cfg.Pipeline.Compaction == nil || !cfg.Pipeline.Compaction.Enabled {
-		return false, 0, 0, ""
-	}
-	c := cfg.Pipeline.Compaction
-	trigger = c.TriggerItems
-	if trigger <= 0 {
-		trigger = maxHistory * 2
-	}
-	if trigger <= maxHistory {
-		trigger = maxHistory + 1
-	}
-	maxSummaryTokens = c.MaxSummaryTokens
-	if maxSummaryTokens <= 0 {
-		maxSummaryTokens = defaultMaxSummaryTokens
-	}
-	return true, trigger, maxSummaryTokens, c.SummaryModel
-}
-
-// prefixMatches reports whether items begins with the same ids, in order, as
-// snapshot — i.e. the overflow we summarized is still at the head (no concurrent
-// client delete reshuffled it).
-func prefixMatches(items, snapshot []*types.MessageItemUnion) bool {
-	if len(items) < len(snapshot) {
-		return false
-	}
-	for i := range snapshot {
-		if itemID(items[i]) != itemID(snapshot[i]) {
-			return false
-		}
-	}
-	return true
-}
-
-// compact folds overflow items into conv.Memory and evicts them. It never holds
-// conv.Lock across the summarizer call: snapshot under lock, summarize unlocked,
-// commit under lock (re-validating the head is unchanged). On any error it
-// leaves the conversation untouched — items are never dropped without a summary.
-func (s *Session) compact(conv *Conversation, model Model) {
-	if model == nil {
-		return
-	}
-	// Snapshot.
-	conv.Lock.Lock()
-	if len(conv.Items) <= s.CompactionTrigger {
-		conv.Lock.Unlock()
-		return
-	}
-	cut := compactionCut(conv.Items, s.MaxHistoryItems)
-	if cut <= 0 {
-		conv.Lock.Unlock()
-		return
-	}
-	overflow := append([]*types.MessageItemUnion(nil), conv.Items[:cut]...)
-	prior := conv.Memory
-	conv.Lock.Unlock()
-
-	// Summarize (unlocked).
-	msgs := buildSummaryMessages(prior, renderItemsTranscript(overflow), s.MaxSummaryTokens)
-	ctx, cancel := context.WithTimeout(context.Background(), compactionTimeout)
-	defer cancel()
-	predFunc, err := model.Predict(ctx, msgs, nil, nil, nil, nil, nil, nil, nil, nil, nil)
-	if err != nil {
-		xlog.Warn("realtime compaction: summarizer predict failed", "error", err)
-		return
-	}
-	pred, err := predFunc()
-	if err != nil {
-		xlog.Warn("realtime compaction: summarizer inference failed", "error", err)
-		return
-	}
-	// Strip any leaked reasoning/thinking spans using the same extractor the
-	// rest of the realtime path uses, rather than a bespoke regex.
-	rcfg := reasoning.Config{}
-	if mc := model.PredictConfig(); mc != nil {
-		rcfg = spokenReasoningConfig(mc.ReasoningConfig)
-	}
-	_, summary := reasoning.ExtractReasoningComplete(pred.Response, "", rcfg)
-	summary = strings.TrimSpace(summary)
-	if summary == "" {
-		xlog.Warn("realtime compaction: empty summary, skipping eviction")
-		return
-	}
-
-	// Commit.
-	conv.Lock.Lock()
-	defer conv.Lock.Unlock()
-	if !prefixMatches(conv.Items, overflow) {
-		xlog.Debug("realtime compaction: head changed during summary, skipping")
-		return
-	}
-	conv.Memory = summary
-	conv.Items = conv.Items[len(overflow):]
-	xlog.Debug("realtime compaction: evicted items into memory", "evicted", len(overflow), "remaining", len(conv.Items))
-}
-
-// summarizerModel resolves the model used to produce compaction summaries.
-// Without a configured summary_model (or factory) it reuses the pipeline LLM.
-func (s *Session) summarizerModel() Model {
-	if s.SummaryModel == "" || s.summarizerFactory == nil {
-		return s.ModelInterface
-	}
-	s.summarizerOnce.Do(func() {
-		m, err := s.summarizerFactory()
-		if err != nil {
-			xlog.Warn("realtime compaction: summary_model load failed, falling back to pipeline LLM", "model", s.SummaryModel, "error", err)
-			m = s.ModelInterface
-		}
-		s.summarizerCached = m
-	})
-	return s.summarizerCached
-}
-
-// maybeCompact schedules a background compaction when the live buffer has grown
-// past the trigger and none is already running. Returns immediately.
-func (s *Session) maybeCompact(conv *Conversation) {
-	if !s.CompactionEnabled {
-		return
-	}
-	conv.Lock.Lock()
-	over := len(conv.Items) > s.CompactionTrigger
-	conv.Lock.Unlock()
-	if !over {
-		return
-	}
-	if !conv.compacting.CompareAndSwap(false, true) {
-		return
-	}
-	go func() {
-		defer conv.compacting.Store(false)
-		// Resolve (and, for a configured summary_model, lazily load) the
-		// summarizer only when a compaction actually runs, off the response
-		// path — so the model load never blocks a user turn.
-		model := s.summarizerModel()
-		if model == nil {
-			return
-		}
-		s.compact(conv, model)
-	}()
-}
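The removed `compact` method follows a snapshot / summarize-unlocked / commit-with-revalidation shape. A minimal sketch of that pattern, assuming a simplified `conv` type that holds plain string ids instead of `types.MessageItemUnion`; `conv`, `hasPrefix`, `compactOnce` and the `summarize` callback are hypothetical names for illustration, not part of the LocalAI code:

```go
package main

import (
	"fmt"
	"sync"
)

// conv is a hypothetical, simplified stand-in for the Conversation type.
type conv struct {
	mu     sync.Mutex
	items  []string // item ids, oldest first
	memory string   // rolling summary
}

// hasPrefix reports whether items still begins with the snapshot ids, in order.
func hasPrefix(items, snapshot []string) bool {
	if len(items) < len(snapshot) {
		return false
	}
	for i := range snapshot {
		if items[i] != snapshot[i] {
			return false
		}
	}
	return true
}

// compactOnce copies the overflow under the lock, runs summarize with the
// lock released, then commits only if the head of items is unchanged.
// It reports whether the overflow was evicted.
func compactOnce(c *conv, keep int, summarize func(prior string, overflow []string) (string, error)) bool {
	// Snapshot.
	c.mu.Lock()
	cut := len(c.items) - keep
	if keep <= 0 || cut <= 0 {
		c.mu.Unlock()
		return false
	}
	overflow := append([]string(nil), c.items[:cut]...)
	prior := c.memory
	c.mu.Unlock()

	// Slow call, made without holding the lock.
	summary, err := summarize(prior, overflow)
	if err != nil || summary == "" {
		return false // nothing is dropped without a summary
	}

	// Commit, re-validating that the summarized items are still at the head.
	c.mu.Lock()
	defer c.mu.Unlock()
	if !hasPrefix(c.items, overflow) {
		return false
	}
	c.memory = summary
	c.items = c.items[len(overflow):]
	return true
}

func main() {
	c := &conv{items: []string{"1", "2", "3", "4"}}
	ok := compactOnce(c, 2, func(prior string, overflow []string) (string, error) {
		return fmt.Sprintf("%s%d items summarized", prior, len(overflow)), nil
	})
	fmt.Println(ok, c.memory, c.items)
}
```

Releasing the lock around the slow call keeps other writers unblocked; the price is that the commit step has to tolerate a changed head and skip the eviction rather than drop items that were never summarized.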
--- a/core/http/endpoints/openai/realtime_compaction_test.go
+++ b/core/http/endpoints/openai/realtime_compaction_test.go
@@ -1,308 +0,0 @@
-package openai
-
-import (
-	"errors"
-
-	. "github.com/onsi/ginkgo/v2"
-	. "github.com/onsi/gomega"
-
-	"github.com/mudler/LocalAI/core/backend"
-	"github.com/mudler/LocalAI/core/config"
-	"github.com/mudler/LocalAI/core/http/endpoints/openai/types"
-	"github.com/mudler/LocalAI/core/schema"
-)
-
-var _ = Describe("resolveCompaction", func() {
-	It("disables when the block is absent", func() {
-		enabled, _, _, _ := resolveCompaction(&config.ModelConfig{}, 6)
-		Expect(enabled).To(BeFalse())
-	})
-
-	It("defaults trigger to 2x max history and tokens to 512", func() {
-		cfg := &config.ModelConfig{Pipeline: config.Pipeline{Compaction: &config.PipelineCompaction{Enabled: true}}}
-		enabled, trigger, maxTok, _ := resolveCompaction(cfg, 6)
-		Expect(enabled).To(BeTrue())
-		Expect(trigger).To(Equal(12))
-		Expect(maxTok).To(Equal(512))
-	})
-
-	It("clamps trigger to max history + 1 when misconfigured", func() {
-		cfg := &config.ModelConfig{Pipeline: config.Pipeline{Compaction: &config.PipelineCompaction{Enabled: true, TriggerItems: 4}}}
-		_, trigger, _, _ := resolveCompaction(cfg, 6)
-		Expect(trigger).To(Equal(7))
-	})
-
-	It("honors explicit values", func() {
-		cfg := &config.ModelConfig{Pipeline: config.Pipeline{Compaction: &config.PipelineCompaction{
-			Enabled: true, TriggerItems: 20, MaxSummaryTokens: 256, SummaryModel: "tiny"}}}
-		enabled, trigger, maxTok, model := resolveCompaction(cfg, 6)
-		Expect(enabled).To(BeTrue())
-		Expect(trigger).To(Equal(20))
-		Expect(maxTok).To(Equal(256))
-		Expect(model).To(Equal("tiny"))
-	})
-})
-
-var _ = Describe("deleteItem", func() {
-	mk := func(ids ...string) []*types.MessageItemUnion {
-		out := make([]*types.MessageItemUnion, len(ids))
-		for i, id := range ids {
-			out[i] = &types.MessageItemUnion{User: &types.MessageItemUser{ID: id}}
-		}
-		return out
-	}
-
-	It("removes the item with the given id", func() {
-		items, ok := deleteItem(mk("a", "b", "c"), "b")
-		Expect(ok).To(BeTrue())
-		Expect(len(items)).To(Equal(2))
-		Expect(itemID(items[0])).To(Equal("a"))
-		Expect(itemID(items[1])).To(Equal("c"))
-	})
-
-	It("reports not found for an unknown id", func() {
-		_, ok := deleteItem(mk("a"), "zzz")
-		Expect(ok).To(BeFalse())
-	})
-})
-
-var _ = Describe("clearInputAudio", func() {
-	It("resets the pending PCM and buffered Opus frames", func() {
-		s := &Session{InputAudioBuffer: []byte{1, 2, 3}, OpusFrames: [][]byte{{9}}}
-		clearInputAudio(s)
-		Expect(s.InputAudioBuffer).To(BeNil())
-		Expect(s.OpusFrames).To(BeNil())
-	})
-})
-
-var _ = Describe("truncateAssistantText", func() {
-	It("clears the text of the assistant content part at the index", func() {
-		items := []*types.MessageItemUnion{{Assistant: &types.MessageItemAssistant{
-			ID:      "a1",
-			Content: []types.MessageContentOutput{{Type: types.MessageContentTypeText, Text: "hello world"}},
-		}}}
-		ok := truncateAssistantText(items, "a1", 0)
-		Expect(ok).To(BeTrue())
-		Expect(items[0].Assistant.Content[0].Text).To(Equal(""))
-	})
-
-	// Realtime assistant *audio* turns store the spoken words in .Transcript, not
-	// .Text, so a barge-in truncate must clear .Transcript too or it would no-op.
-	It("clears the transcript of an assistant audio content part", func() {
-		items := []*types.MessageItemUnion{{Assistant: &types.MessageItemAssistant{
-			ID:      "a1",
-			Content: []types.MessageContentOutput{{Type: types.MessageContentTypeAudio, Transcript: "hello world"}},
-		}}}
-		ok := truncateAssistantText(items, "a1", 0)
-		Expect(ok).To(BeTrue())
-		Expect(items[0].Assistant.Content[0].Transcript).To(Equal(""))
-	})
-
-	It("returns false for an unknown id", func() {
-		Expect(truncateAssistantText(nil, "nope", 0)).To(BeFalse())
-	})
-})
-
-var _ = Describe("compactionCut", func() {
-	user := func(id string) *types.MessageItemUnion {
-		return &types.MessageItemUnion{User: &types.MessageItemUser{ID: id}}
-	}
-	call := func(id string) *types.MessageItemUnion {
-		return &types.MessageItemUnion{FunctionCall: &types.MessageItemFunctionCall{ID: id}}
-	}
-	out := func(id string) *types.MessageItemUnion {
-		return &types.MessageItemUnion{FunctionCallOutput: &types.MessageItemFunctionCallOutput{ID: id}}
-	}
-
-	It("cuts exactly len-keep when no pairs straddle the boundary", func() {
-		items := []*types.MessageItemUnion{user("1"), user("2"), user("3"), user("4")}
-		Expect(compactionCut(items, 2)).To(Equal(2))
-	})
-
-	It("returns 0 when nothing to cut", func() {
-		Expect(compactionCut([]*types.MessageItemUnion{user("1")}, 2)).To(Equal(0))
-	})
-
-	It("returns 0 (cuts nothing) when keep is 0 — the unlimited-window sentinel", func() {
-		items := []*types.MessageItemUnion{user("1"), user("2"), user("3")}
-		Expect(compactionCut(items, 0)).To(Equal(0))
-	})
-
-	It("moves the boundary so a call/output pair is not split", func() {
-		// keep=2 -> naive cut=2, but items[2] is the output of items[1]'s call;
-		// pull the cut left so the whole pair stays in the kept tail.
-		items := []*types.MessageItemUnion{user("1"), call("c"), out("c"), user("4")}
-		Expect(compactionCut(items, 2)).To(Equal(1))
-	})
-})
-
-var _ = Describe("withMemory", func() {
-	It("inserts a memory system message when memory is non-empty", func() {
-		base := schema.Messages{{Role: "system", StringContent: "instructions"}}
-		out := withMemory(base, "user is Bob; wants pizza")
-		Expect(len(out)).To(Equal(2))
-		Expect(out[1].Role).To(Equal("system"))
-		Expect(out[1].StringContent).To(ContainSubstring("user is Bob"))
-		Expect(out[1].StringContent).To(ContainSubstring("Summary of earlier conversation"))
-	})
-
-	It("is a no-op when memory is empty", func() {
-		base := schema.Messages{{Role: "system", StringContent: "instructions"}}
-		Expect(withMemory(base, "")).To(HaveLen(1))
-	})
-})
-
-var _ = Describe("renderItemsTranscript", func() {
-	It("renders user and assistant text turns", func() {
-		items := []*types.MessageItemUnion{
-			{User: &types.MessageItemUser{Content: []types.MessageContentInput{{Type: types.MessageContentTypeInputText, Text: "hi"}}}},
-			{Assistant: &types.MessageItemAssistant{Content: []types.MessageContentOutput{{Type: types.MessageContentTypeText, Text: "hello"}}}},
-		}
-		out := renderItemsTranscript(items)
-		Expect(out).To(ContainSubstring("user: hi"))
-		Expect(out).To(ContainSubstring("assistant: hello"))
-	})
-
-	// Realtime assistant *audio* turns store the spoken words in .Transcript, not
-	// .Text, so the transcript builder must emit .Transcript too or spoken turns
-	// would be dropped from the summary.
-	It("renders an assistant audio turn from its transcript", func() {
-		items := []*types.MessageItemUnion{
-			{Assistant: &types.MessageItemAssistant{Content: []types.MessageContentOutput{{Type: types.MessageContentTypeAudio, Transcript: "spoken words"}}}},
-		}
-		Expect(renderItemsTranscript(items)).To(ContainSubstring("assistant: spoken words"))
-	})
-})
-
-var _ = Describe("buildSummaryMessages", func() {
-	It("includes prior memory and the new transcript", func() {
-		msgs := buildSummaryMessages("prior facts", "user: hi", 512)
-		Expect(len(msgs)).To(Equal(2))
-		Expect(msgs[0].Role).To(Equal("system"))
-		Expect(msgs[1].StringContent).To(ContainSubstring("prior facts"))
-		Expect(msgs[1].StringContent).To(ContainSubstring("user: hi"))
-	})
-})
-
-var _ = Describe("compact", func() {
-	user := func(id, text string) *types.MessageItemUnion {
-		return &types.MessageItemUnion{User: &types.MessageItemUser{ID: id,
-			Content: []types.MessageContentInput{{Type: types.MessageContentTypeInputText, Text: text}}}}
-	}
-
-	It("summarizes overflow into Memory and evicts it, keeping the live tail", func() {
-		conv := &Conversation{Items: []*types.MessageItemUnion{
-			user("1", "a"), user("2", "b"), user("3", "c"), user("4", "d"),
-			user("5", "e"), user("6", "f"), user("7", "g"), user("8", "h"),
-		}}
-		s := &Session{CompactionEnabled: true, CompactionTrigger: 7, MaxHistoryItems: 4, MaxSummaryTokens: 512}
-		m := &fakeModel{predictResp: backend.LLMResponse{Response: "ROLLED UP"}}
-
-		s.compact(conv, m)
-
-		Expect(conv.Memory).To(Equal("ROLLED UP"))
-		Expect(len(conv.Items)).To(Equal(4))
-		Expect(itemID(conv.Items[0])).To(Equal("5"))
-		// The summarizer saw the evicted turns.
-		Expect(m.lastMessages[1].StringContent).To(ContainSubstring("a"))
-	})
-
-	It("leaves Items and Memory untouched when the summarizer errors", func() {
-		items := []*types.MessageItemUnion{user("1", "a"), user("2", "b"), user("3", "c")}
-		conv := &Conversation{Items: items}
-		s := &Session{CompactionEnabled: true, CompactionTrigger: 2, MaxHistoryItems: 1, MaxSummaryTokens: 512}
-		m := &fakeModel{predictErr: errors.New("boom")}
-
-		s.compact(conv, m)
-
-		Expect(conv.Memory).To(Equal(""))
-		Expect(len(conv.Items)).To(Equal(3))
-	})
-
-	It("strips leaked reasoning tags from the summary via the shared extractor", func() {
-		conv := &Conversation{Items: []*types.MessageItemUnion{
-			user("1", "a"), user("2", "b"), user("3", "c"), user("4", "d"),
-			user("5", "e"), user("6", "f"), user("7", "g"), user("8", "h"),
-		}}
-		s := &Session{CompactionEnabled: true, CompactionTrigger: 7, MaxHistoryItems: 4, MaxSummaryTokens: 512}
-		m := &fakeModel{predictResp: backend.LLMResponse{Response: "<think>planning the summary</think>CLEAN SUMMARY"}}
-
-		s.compact(conv, m)
-
-		Expect(conv.Memory).To(Equal("CLEAN SUMMARY"))
-		Expect(conv.Memory).ToNot(ContainSubstring("planning"))
-	})
-
-	It("does nothing when items are at or below the trigger", func() {
-		conv := &Conversation{Items: []*types.MessageItemUnion{user("1", "a")}}
-		s := &Session{CompactionEnabled: true, CompactionTrigger: 7, MaxHistoryItems: 4}
-		s.compact(conv, &fakeModel{predictResp: backend.LLMResponse{Response: "x"}})
-		Expect(conv.Memory).To(Equal(""))
-		Expect(len(conv.Items)).To(Equal(1))
-	})
-})
-
-var _ = Describe("prefixMatches", func() {
-	user := func(id string) *types.MessageItemUnion {
-		return &types.MessageItemUnion{User: &types.MessageItemUser{ID: id}}
-	}
-
-	It("matches when items begins with the snapshot ids in order", func() {
-		items := []*types.MessageItemUnion{user("1"), user("2"), user("3")}
-		snap := []*types.MessageItemUnion{user("1"), user("2")}
-		Expect(prefixMatches(items, snap)).To(BeTrue())
-	})
-
-	It("matches an empty snapshot", func() {
-		Expect(prefixMatches([]*types.MessageItemUnion{user("1")}, nil)).To(BeTrue())
-	})
-
-	It("fails when items is shorter than the snapshot (a concurrent delete shrank the head)", func() {
-		items := []*types.MessageItemUnion{user("1")}
-		snap := []*types.MessageItemUnion{user("1"), user("2")}
-		Expect(prefixMatches(items, snap)).To(BeFalse())
-	})
-
-	It("fails when the head ids differ (a concurrent delete reordered the head)", func() {
-		items := []*types.MessageItemUnion{user("2"), user("3")}
-		snap := []*types.MessageItemUnion{user("1"), user("2")}
-		Expect(prefixMatches(items, snap)).To(BeFalse())
-	})
-})
-
-var _ = Describe("summarizerModel", func() {
-	It("returns the pipeline model when no summary_model is set", func() {
-		m := &fakeModel{}
-		s := &Session{ModelInterface: m}
-		Expect(s.summarizerModel()).To(Equal(m))
-	})
-
-	It("uses the factory (once) when summary_model is set", func() {
-		pipeline := &fakeModel{}
-		small := &fakeModel{}
-		calls := 0
-		s := &Session{ModelInterface: pipeline, SummaryModel: "tiny",
-			summarizerFactory: func() (Model, error) { calls++; return small, nil }}
-		Expect(s.summarizerModel()).To(Equal(small))
-		Expect(s.summarizerModel()).To(Equal(small))
-		Expect(calls).To(Equal(1))
-	})
-
-	It("falls back to the pipeline model when the factory errors", func() {
-		pipeline := &fakeModel{}
-		s := &Session{ModelInterface: pipeline, SummaryModel: "tiny",
-			summarizerFactory: func() (Model, error) { return nil, errors.New("nope") }}
-		Expect(s.summarizerModel()).To(Equal(pipeline))
-	})
-})
-
-var _ = Describe("itemID", func() {
-	It("returns the id for each variant and empty for nil", func() {
-		Expect(itemID(nil)).To(Equal(""))
-		Expect(itemID(&types.MessageItemUnion{User: &types.MessageItemUser{ID: "u1"}})).To(Equal("u1"))
-		Expect(itemID(&types.MessageItemUnion{Assistant: &types.MessageItemAssistant{ID: "a1"}})).To(Equal("a1"))
-		Expect(itemID(&types.MessageItemUnion{System: &types.MessageItemSystem{ID: "s1"}})).To(Equal("s1"))
-		Expect(itemID(&types.MessageItemUnion{FunctionCall: &types.MessageItemFunctionCall{ID: "f1"}})).To(Equal("f1"))
-		Expect(itemID(&types.MessageItemUnion{FunctionCallOutput: &types.MessageItemFunctionCallOutput{ID: "o1"}})).To(Equal("o1"))
-	})
-})
--- a/core/http/react-ui/e2e/role-mode-adaptive.spec.js
+++ b/core/http/react-ui/e2e/role-mode-adaptive.spec.js
@@ -1,100 +0,0 @@
-import { test, expect } from './coverage-fixtures.js'
-
-// These specs stub /api/features and /api/auth/status per cell. The test server
-// disables auth (isAdmin=true) and reports its own features, so we intercept
-// before navigation to simulate each role x mode cell.
-
-function stubFeatures(page, features) {
-  return page.route('**/api/features', route =>
-    route.fulfill({ contentType: 'application/json', body: JSON.stringify(features) }))
-}
-
-function stubNoP2P(page) {
-  // P2P token endpoint returns empty -> p2pEnabled=false.
-  return page.route('**/api/p2p/token', route =>
-    route.fulfill({ contentType: 'text/plain', body: '' }))
-}
-
-test.describe('Adaptive landing (HomeRoute)', () => {
-  test('admin + distributed redirects /app to Nodes', async ({ page }) => {
-    await stubFeatures(page, { distributed: true })
-    await stubNoP2P(page)
-    await page.goto('/app')
-    await expect(page).toHaveURL(/\/app\/nodes$/)
-    await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('admin + single-node stays on Home', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.goto('/app')
-    await expect(page).toHaveURL(/\/app$/)
-    await expect(page.locator('.home-greeting')).toBeVisible({ timeout: 15_000 })
-  })
-})
-
-test.describe('Adaptive sidebar', () => {
-  test('distributed pins the Cluster group with Nodes at the top', async ({ page }) => {
-    await stubFeatures(page, { distributed: true })
-    await stubNoP2P(page)
-    await page.goto('/app/chat') // any in-app page so the sidebar is mounted
-    const pinned = page.locator('.sidebar-nav .sidebar-section-items').first()
-    await expect(pinned.getByText('Nodes', { exact: false })).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('single-node does not pin a Cluster group', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    // Nodes is reachable only via the Operate rail, not pinned at the top.
-    await expect(page.locator('.sidebar-nav')).toBeVisible({ timeout: 15_000 })
-    await expect(page.locator('.sidebar-nav .sidebar-section-items').first()
-      .getByText('Nodes', { exact: false })).toHaveCount(0)
-  })
-})
-
-test.describe('Top navbar', () => {
-  test('admin sees the mode pill and settings cog', async ({ page }) => {
-    await stubFeatures(page, { distributed: true })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__mode')).toBeVisible({ timeout: 15_000 })
-    await expect(page.locator('.top-navbar__icon[aria-label]')).not.toHaveCount(0)
-  })
-
-  test('admin-via-chat jump shows when localai_assistant is enabled', async ({ page }) => {
-    await stubFeatures(page, { distributed: false, localai_assistant: true })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__assistant')).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('admin-via-chat jump hidden when localai_assistant is off', async ({ page }) => {
-    await stubFeatures(page, { distributed: false, localai_assistant: false })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__assistant')).toHaveCount(0)
-  })
-})
-
-test.describe('Token usage meter', () => {
-  test('renders when admin usage has data', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.route('**/api/auth/admin/usage**', route =>
-      route.fulfill({ contentType: 'application/json',
-        body: JSON.stringify({ buckets: [{ total_tokens: 1234 }] }) }))
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__meter')).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('hidden when admin usage is empty (graceful degrade)', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.route('**/api/auth/admin/usage**', route =>
-      route.fulfill({ contentType: 'application/json', body: JSON.stringify({ buckets: [] }) }))
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar')).toBeVisible({ timeout: 15_000 })
-    await expect(page.locator('.top-navbar__meter')).toHaveCount(0)
-  })
-})
--- a/core/http/react-ui/public/locales/en/nav.json
+++ b/core/http/react-ui/public/locales/en/nav.json
@@ -12,16 +12,6 @@
  "accountSettings": "Account settings",
  "account": "Account",
  "accountFor": "Account: {{name}}",
-  "topbar": {
-    "label": "Top bar",
-    "modeDistributed": "Distributed",
-    "modeSwarm": "Swarm",
-    "modeSingle": "Single-node",
-    "pickModel": "Models",
-    "adminViaChat": "Admin via chat",
-    "tokensToday": "Tokens today",
-    "usageDetail": "View usage detail"
-  },
  "sections": {
    "create": "Create",
    "recognition": "Recognition",
--- a/core/http/react-ui/src/App.css
+++ b/core/http/react-ui/src/App.css
@@ -184,50 +184,6 @@
  font-size: 1.5rem;
 }

-/* Desktop top bar: deployment + admin affordances on wide screens. Hidden on
-   mobile, where .mobile-header carries the equivalent actions. */
-.top-navbar {
-  display: flex;
-  align-items: center;
-  justify-content: space-between;
-  gap: var(--spacing-md);
-  padding: var(--spacing-sm) var(--spacing-lg);
-  border-bottom: 1px solid var(--color-border-default);
-  background: var(--color-bg-secondary);
-}
-.top-navbar__right { display: flex; align-items: center; gap: var(--spacing-sm); }
-.top-navbar__mode {
-  font-size: 0.75rem;
-  padding: 2px 10px;
-  border-radius: 999px;
-  border: 1px solid var(--color-border-default);
-  color: var(--color-text-secondary);
-}
-.top-navbar__mode.is-active { color: var(--color-success); border-color: var(--color-success); }
-.top-navbar__btn {
-  display: inline-flex; align-items: center; gap: 6px;
-  font-size: 0.8125rem; padding: 5px 10px; border-radius: 8px;
-  border: 1px solid var(--color-border-default); background: var(--color-bg-tertiary);
-  color: var(--color-text-primary); cursor: pointer;
-}
-.top-navbar__icon {
-  width: 32px; height: 32px; display: inline-flex; align-items: center;
-  justify-content: center; border-radius: 8px; border: 1px solid var(--color-border-default);
-  background: var(--color-bg-tertiary); color: var(--color-text-secondary); cursor: pointer;
-}
-.top-navbar__avatar img { width: 100%; height: 100%; border-radius: 50%; object-fit: cover; }
-.top-navbar__meter {
-  display: inline-flex; flex-direction: column; gap: 3px; align-items: flex-start;
-  padding: 4px 10px; border-radius: 8px; border: 1px solid var(--color-border-default);
-  background: var(--color-bg-tertiary); cursor: pointer; min-width: 150px;
-}
-.top-navbar__meter-label { font-size: 0.6875rem; color: var(--color-text-secondary); }
-.top-navbar__meter-bar { width: 100%; height: 5px; border-radius: 3px; background: var(--color-bg-secondary); overflow: hidden; }
-.top-navbar__meter-bar i { display: block; height: 100%; background: var(--color-primary); }
-@media (max-width: 639px) {
-  .top-navbar { display: none; }
-}
-
 /* Sidebar */
 .sidebar {
  position: fixed;
--- a/core/http/react-ui/src/App.jsx
+++ b/core/http/react-ui/src/App.jsx
@@ -3,7 +3,6 @@ import { Outlet, useLocation, useNavigate } from 'react-router-dom'
 import { useTranslation } from 'react-i18next'
 import Sidebar from './components/Sidebar'
 import OperationsBar from './components/OperationsBar'
-import TopNavbar from './components/TopNavbar'
 import { ToastContainer, useToast } from './components/Toast'
 import { systemApi } from './utils/api'
 import { useTheme } from './contexts/ThemeContext'
@@ -99,7 +98,6 @@ export default function App() {
      <Sidebar isOpen={sidebarOpen} onClose={() => setSidebarOpen(false)} />
      <main className="main-content" {...(sidebarOpen ? { 'aria-hidden': 'true', inert: '' } : {})}>
        <OperationsBar />
-        <TopNavbar />
        {/* Mobile header — primary actions reachable without opening the
            drawer. Hamburger is the only way to expand the nav on phones;
            theme toggle and account avatar are mirrored from the sidebar
--- a/core/http/react-ui/src/components/HomeRoute.jsx
+++ b/core/http/react-ui/src/components/HomeRoute.jsx
@@ -1,28 +0,0 @@
-import { lazy, Suspense } from 'react'
-import { Navigate } from 'react-router-dom'
-import { useAuth } from '../context/AuthContext'
-import { useDeployment } from '../contexts/DeploymentContext'
-import { resolveHome } from '../utils/resolveHome'
-import RouteFallback from './RouteFallback'
-
-const Home = lazy(() => import('../pages/Home'))
-
-// Index-route element. Waits for auth + deployment signals to load (so we never
-// flash the wrong landing), then either renders Home or redirects to the cell's
-// landing page. Redirecting (rather than rendering Nodes/Chat inline at /app)
-// keeps each target's own route guard, active-nav state, and deep-linkability.
-export default function HomeRoute() {
-  const { isAdmin, loading: authLoading } = useAuth()
-  const { distributed, p2pEnabled, loading: deployLoading } = useDeployment()
-
-  if (authLoading || deployLoading) return <RouteFallback />
-
-  const target = resolveHome({ isAdmin, distributed, p2pEnabled })
-  if (target) return <Navigate to={target} replace />
-
-  return (
-    <Suspense fallback={<RouteFallback />}>
-      <Home />
-    </Suspense>
-  )
-}
--- a/core/http/react-ui/src/components/Sidebar.jsx
+++ b/core/http/react-ui/src/components/Sidebar.jsx
@@ -5,11 +5,9 @@ import ThemeToggle from './ThemeToggle'
 import LanguageSwitcher from './LanguageSwitcher'
 import { useAuth } from '../context/AuthContext'
 import { useBranding } from '../contexts/BrandingContext'
-import { useDeployment } from '../contexts/DeploymentContext'
 import { apiUrl } from '../utils/basePath'
 import { preloadRoute } from '../router'
 import { consoles, firstVisiblePath, consolePaths } from './console/consoleConfig'
-import { clusterPinItems, shouldCollapseCreate } from '../utils/sidebarPolicy'

 const COLLAPSED_KEY = 'localai_sidebar_collapsed'
 const SECTIONS_KEY = 'localai_sidebar_sections'
@@ -60,13 +58,11 @@ function NavItem({ item, onClose, collapsed }) {
  )
 }

-function loadSectionState(collapseCreate = false) {
-  // Tiers render expanded by default; users can collapse any tier and the
-  // choice persists (stored values override defaults). In cluster cells we
-  // start Create collapsed so the pinned cluster group leads - but only when
-  // the user has not already expressed a preference.
+function loadSectionState() {
+  // Tiers render expanded by default (the redesign favours showing the few
+  // intent groups up front); users can still collapse any tier and the choice
+  // is persisted. Stored values override the defaults so a saved collapse wins.
  const defaults = Object.fromEntries(sections.map(s => [s.id, true]))
-  if (collapseCreate) defaults.create = false
  try {
    const stored = localStorage.getItem(SECTIONS_KEY)
    return stored ? { ...defaults, ...JSON.parse(stored) } : defaults
@@ -81,34 +77,20 @@ function saveSectionState(state) {

 export default function Sidebar({ isOpen, onClose }) {
  const { t } = useTranslation('nav')
-  const { isAdmin, authEnabled, user, logout, hasFeature } = useAuth()
-  // Deployment shape (server features + p2p) drives the adaptive sidebar; the
-  // shared context replaces the sidebar's own /api/features fetch so the
-  // landing resolver, navbar, and this policy agree on one snapshot.
-  const deployment = useDeployment()
-  const features = deployment.features
-  // Shared shape for the console gating helpers (consoleConfig.js); in scope for
-  // both the pinned cluster group and the console-tier rendering below.
-  const auth = { isAdmin, authEnabled, hasFeature, features }
-  const collapseCreate = shouldCollapseCreate(auth, deployment)
+  const [features, setFeatures] = useState({})
  const [collapsed, setCollapsed] = useState(() => {
    try { return localStorage.getItem(COLLAPSED_KEY) === 'true' } catch (_) { return false }
  })
  const [openSections, setOpenSections] = useState(loadSectionState)
+  const { isAdmin, authEnabled, user, logout, hasFeature } = useAuth()
  const branding = useBranding()
  const navigate = useNavigate()
  const location = useLocation()
  const closeBtnRef = useRef(null)

-  // Apply the cluster-cell Create-collapse default once, only when the user has
-  // no stored section preference (so we never override an explicit choice).
  useEffect(() => {
-    if (deployment.loading) return
-    let hasStored = false
-    try { hasStored = !!localStorage.getItem(SECTIONS_KEY) } catch { hasStored = false }
-    if (hasStored || !collapseCreate) return
-    setOpenSections(prev => (prev.create === false ? prev : { ...prev, create: false }))
-  }, [deployment.loading, collapseCreate])
+    fetch(apiUrl('/api/features')).then(r => r.json()).then(setFeatures).catch(() => {})
+  }, [])

  // Stay in sync with external collapse dispatches (e.g. the chat
  // page's focus mode). The collapse-toggle button still owns the
@@ -175,6 +157,8 @@ export default function Sidebar({ isOpen, onClose }) {
  }

  const visibleTopItems = topItems.filter(filterItem)
+  // Shared shape for the console gating helpers (consoleConfig.js).
+  const auth = { isAdmin, authEnabled, hasFeature, features }

  // Inline sections (Create) carry no gating; a plain filterItem pass suffices.
  const getVisibleSectionItems = (section) => section.items.filter(filterItem)
@@ -215,28 +199,6 @@ export default function Sidebar({ isOpen, onClose }) {
            ))}
          </div>

-          {/* Pinned Cluster quick-access (admin + distributed/p2p). Same gate
-              as the Operate rail; surfaced at the top for cluster operators. */}
-          {(() => {
-            const pinned = clusterPinItems(auth, deployment)
-            if (pinned.length === 0) return null
-            return (
-              <div className="sidebar-section">
-                <div className="sidebar-section-title">{t('operate.cluster')}</div>
-                <div className="sidebar-section-items">
-                  {pinned.map(item => (
-                    <NavItem
-                      key={item.path}
-                      item={{ path: item.path, icon: item.icon, labelKey: item.labelKey }}
-                      onClose={onClose}
-                      collapsed={collapsed}
-                    />
-                  ))}
-                </div>
-              </div>
-            )
-          })()}
-
          {/* Collapsible sections */}
          {sections.map(section => {
            const visibleItems = getVisibleSectionItems(section)
--- a/core/http/react-ui/src/components/TopNavbar.jsx
+++ b/core/http/react-ui/src/components/TopNavbar.jsx
@@ -1,96 +0,0 @@
-import { useNavigate } from 'react-router-dom'
-import { useTranslation } from 'react-i18next'
-import { useAuth } from '../context/AuthContext'
-import { useDeployment } from '../contexts/DeploymentContext'
-import { useTheme } from '../contexts/ThemeContext'
-import { launchAssistantChat } from '../utils/launchAssistantChat'
-import TokenUsageMeter from './navbar/TokenUsageMeter'
-
-// Desktop top bar. Complementary to the mobile-only header in App.jsx: this is
-// hidden on small screens (see .top-navbar CSS) and shows deployment/admin
-// affordances on wide screens where the sidebar footer is far from the content.
-export default function TopNavbar() {
-  const { t } = useTranslation('nav')
-  const navigate = useNavigate()
-  const { isAdmin, authEnabled, user } = useAuth()
-  const { features, distributed, p2pEnabled } = useDeployment()
-  const { theme, toggleTheme } = useTheme()
-
-  const modeLabel = distributed
-    ? t('topbar.modeDistributed')
-    : p2pEnabled
-      ? t('topbar.modeSwarm')
-      : t('topbar.modeSingle')
-
-  const showAssistantJump = isAdmin && !!features.localai_assistant
-  const showAvatar = authEnabled && user
-  const themeLabel = theme === 'dark' ? t('switchToLightMode') : t('switchToDarkMode')
-
-  return (
-    <div className="top-navbar" role="navigation" aria-label={t('topbar.label')}>
-      <div className="top-navbar__left">
-        {isAdmin && (
-          <span className={`top-navbar__mode ${distributed || p2pEnabled ? 'is-active' : ''}`}>
-            <i className="fas fa-circle-nodes" aria-hidden="true" /> {modeLabel}
-          </span>
-        )}
-      </div>
-      <div className="top-navbar__right">
-        {!isAdmin && (
-          <button
-            type="button"
-            className="top-navbar__btn"
-            onClick={() => navigate('/app/chat')}
-            title={t('topbar.pickModel')}
-          >
-            <i className="fas fa-cube" aria-hidden="true" /> {t('topbar.pickModel')}
-          </button>
-        )}
-        {showAssistantJump && (
-          <button
-            type="button"
-            className="top-navbar__btn top-navbar__assistant"
-            onClick={() => launchAssistantChat(navigate)}
-            title={t('topbar.adminViaChat')}
-          >
-            <i className="fas fa-user-shield" aria-hidden="true" /> {t('topbar.adminViaChat')}
-          </button>
-        )}
-        {isAdmin && <TokenUsageMeter />}
-        {isAdmin && (
-          <button
-            type="button"
-            className="top-navbar__icon"
-            onClick={() => navigate('/app/settings')}
-            aria-label={t('items.settings')}
-            title={t('items.settings')}
-          >
-            <i className="fas fa-cog" aria-hidden="true" />
-          </button>
-        )}
-        <button
-          type="button"
-          className="top-navbar__icon"
-          onClick={toggleTheme}
-          aria-label={themeLabel}
-          title={themeLabel}
-        >
-          <i className={`fas ${theme === 'dark' ? 'fa-sun' : 'fa-moon'}`} aria-hidden="true" />
-        </button>
-        {showAvatar && (
-          <button
-            type="button"
-            className="top-navbar__icon top-navbar__avatar"
-            onClick={() => navigate('/app/account')}
-            aria-label={user.name || user.email}
-            title={user.name || user.email}
-          >
-            {user.avatarUrl
-              ? <img src={user.avatarUrl} alt="" />
-              : <i className="fas fa-user-circle" aria-hidden="true" />}
-          </button>
-        )}
-      </div>
-    </div>
-  )
-}
--- a/core/http/react-ui/src/components/navbar/TokenUsageMeter.jsx
+++ b/core/http/react-ui/src/components/navbar/TokenUsageMeter.jsx
@@ -1,52 +0,0 @@
-import { useState, useEffect } from 'react'
-import { useNavigate } from 'react-router-dom'
-import { useTranslation } from 'react-i18next'
-import { usageApi } from '../../utils/api'
-
-// Compact admin-only usage glance: today's total tokens, optionally against a
-// quota cap, linking to the full /app/usage page. Self-contained data fetch so
-// a usage-API failure cannot break the navbar - it just renders nothing.
-function sumTotalTokens(res) {
-  const buckets = res?.buckets || res?.usage || (Array.isArray(res) ? res : [])
-  if (!Array.isArray(buckets) || buckets.length === 0) return null
-  return buckets.reduce((s, b) => s + (b.total_tokens || 0), 0)
-}
-
-export default function TokenUsageMeter() {
-  const { t } = useTranslation('nav')
-  const navigate = useNavigate()
-  const [tokens, setTokens] = useState(null)
-  const [cap, setCap] = useState(null)
-
-  useEffect(() => {
-    let cancelled = false
-    usageApi.getAdminUsage('day')
-      .then(res => { if (!cancelled) setTokens(sumTotalTokens(res)) })
-      .catch(() => { if (!cancelled) setTokens(null) })
-    usageApi.getMyQuotas()
-      .then(q => { if (!cancelled) setCap(q?.token_limit || q?.tokens?.limit || null) })
-      .catch(() => { if (!cancelled) setCap(null) })
-    return () => { cancelled = true }
-  }, [])
-
-  if (tokens === null) return null
-
-  const pct = cap ? Math.min(100, Math.round((tokens / cap) * 100)) : null
-
-  return (
-    <button
-      type="button"
-      className="top-navbar__meter"
-      onClick={() => navigate('/app/usage')}
-      title={t('topbar.usageDetail')}
-    >
-      <span className="top-navbar__meter-label">
-        {t('topbar.tokensToday')}: {Intl.NumberFormat().format(tokens)}
-        {cap ? ` / ${Intl.NumberFormat().format(cap)}` : ''}
-      </span>
-      {pct !== null && (
-        <span className="top-navbar__meter-bar"><i style={{ width: `${pct}%` }} /></span>
-      )}
-    </button>
-  )
-}
--- a/core/http/react-ui/src/contexts/DeploymentContext.jsx
+++ b/core/http/react-ui/src/contexts/DeploymentContext.jsx
@@ -1,55 +0,0 @@
-import { createContext, useContext, useState, useEffect } from 'react'
-import { apiUrl } from '../utils/basePath'
-import { p2pApi } from '../utils/api'
-
-const DeploymentContext = createContext(null)
-
-// One shared fetch of the deployment-shape signals the adaptive UI keys off:
-// server features (/api/features) and whether a P2P network token exists.
-// Components used to fetch /api/features independently (Sidebar, Home); this
-// centralises it so the landing resolver, sidebar policy, and navbar agree on
-// one snapshot and we issue a single request.
-export function DeploymentProvider({ children }) {
-  const [features, setFeatures] = useState({})
-  const [p2pEnabled, setP2pEnabled] = useState(false)
-  const [loading, setLoading] = useState(true)
-
-  useEffect(() => {
-    let cancelled = false
-    const featuresP = fetch(apiUrl('/api/features'))
-      .then(r => r.json())
-      .catch(() => ({}))
-    // P2P has no /api/features flag: it is "enabled" when a network token
-    // exists (mirrors pages/P2P.jsx). A 404/disabled endpoint throws and we
-    // treat that as not-enabled.
-    const p2pP = p2pApi.getToken()
-      .then(tok => (typeof tok === 'string' ? tok : (tok?.token || '')).trim())
-      .catch(() => '')
-    Promise.all([featuresP, p2pP]).then(([f, tok]) => {
-      if (cancelled) return
-      setFeatures(f || {})
-      setP2pEnabled(!!tok)
-      setLoading(false)
-    })
-    return () => { cancelled = true }
-  }, [])
-
-  const value = {
-    features,
-    distributed: !!features.distributed,
-    p2pEnabled,
-    loading,
-  }
-
-  return (
-    <DeploymentContext.Provider value={value}>
-      {children}
-    </DeploymentContext.Provider>
-  )
-}
-
-export function useDeployment() {
-  const ctx = useContext(DeploymentContext)
-  if (!ctx) throw new Error('useDeployment must be used within DeploymentProvider')
-  return ctx
-}
--- a/core/http/react-ui/src/main.jsx
+++ b/core/http/react-ui/src/main.jsx
@@ -4,7 +4,6 @@ import { RouterProvider } from 'react-router-dom'
 import { ThemeProvider } from './contexts/ThemeContext'
 import { BrandingProvider } from './contexts/BrandingContext'
 import { AuthProvider } from './context/AuthContext'
-import { DeploymentProvider } from './contexts/DeploymentContext'
 import { OperationsProvider } from './contexts/OperationsContext'
 import { router } from './router'
 import './i18n'
@@ -33,11 +32,9 @@ createRoot(document.getElementById('root')).render(
      <ThemeProvider>
        <BrandingProvider>
          <AuthProvider>
-            <DeploymentProvider>
-              <OperationsProvider>
-                <RouterProvider router={router} />
-              </OperationsProvider>
-            </DeploymentProvider>
+            <OperationsProvider>
+              <RouterProvider router={router} />
+            </OperationsProvider>
          </AuthProvider>
        </BrandingProvider>
      </ThemeProvider>
--- a/core/http/react-ui/src/pages/Chat.jsx
+++ b/core/http/react-ui/src/pages/Chat.jsx
@@ -541,73 +541,58 @@ export default function Chat() {
    updateChatSettings(activeChat.id, { clientMCPServers: next })
  }, [activeChat, updateChatSettings])

-  // Load initial message / assistant launch from the Home page or the navbar
-  // quick-jump. Factored into a callback so both the mount-time reader and the
-  // navbar re-trigger event below consume the same payload through one path.
+  // Load initial message from home page
  const homeDataProcessed = useRef(false)
-  const consumeHomeChatData = useCallback(() => {
-    const stored = localStorage.getItem('localai_index_chat_data')
-    if (!stored) return
-    try {
-      const data = JSON.parse(stored)
-      localStorage.removeItem('localai_index_chat_data')
-
-      // Two entry shapes from Home:
-      //   - "compose-and-send": data.message present → open new chat,
-      //     prefill the composer, click submit.
-      //   - "open-assistant": no message, just data.localaiAssistant → open
-      //     a fresh chat already in admin mode so the wizard can fire.
-      const hasMessage = !!data.message
-      const wantsAssistant = !!data.localaiAssistant
-
-      if (hasMessage || wantsAssistant) {
-        let targetChat = activeChat
-        if (data.newChat) {
-          targetChat = addChat(data.model || '', '', data.mcpMode || false)
-        } else {
-          if (data.model && activeChat) {
-            updateChatSettings(activeChat.id, { model: data.model })
-          }
-          if (data.mcpMode && activeChat) {
-            updateChatSettings(activeChat.id, { mcpMode: true })
-          }
-        }
-        if (data.mcpServers?.length > 0 && targetChat) {
-          updateChatSettings(targetChat.id, { mcpServers: data.mcpServers })
-        }
-        if (data.clientMCPServers?.length > 0 && targetChat) {
-          updateChatSettings(targetChat.id, { clientMCPServers: data.clientMCPServers })
-        }
-        if (wantsAssistant && targetChat) {
-          updateChatSettings(targetChat.id, { localaiAssistant: true })
-        }
-        if (hasMessage) {
-          setInput(data.message)
-          if (data.files) setFiles(data.files)
-          setTimeout(() => {
-            const submitBtn = document.getElementById('chat-submit-btn')
-            submitBtn?.click()
-          }, 100)
-        }
-      }
-    } catch (_e) { /* ignore */ }
-  }, [activeChat, addChat, updateChatSettings])
-
  useEffect(() => {
    if (homeDataProcessed.current) return
-    homeDataProcessed.current = true
-    consumeHomeChatData()
-  }, [consumeHomeChatData])
+    const stored = localStorage.getItem('localai_index_chat_data')
+    if (stored) {
+      homeDataProcessed.current = true
+      try {
+        const data = JSON.parse(stored)
+        localStorage.removeItem('localai_index_chat_data')

-  // Admins can re-trigger the assistant jump from the navbar while already on
-  // the chat page; navigate('/app/chat') does not remount Chat, so the
-  // mount-time reader above never fires. The launcher dispatches this event
-  // after writing the payload so we re-consume it and open a fresh assistant.
-  useEffect(() => {
-    const onOpenAssistant = () => consumeHomeChatData()
-    window.addEventListener('localai-open-assistant', onOpenAssistant)
-    return () => window.removeEventListener('localai-open-assistant', onOpenAssistant)
-  }, [consumeHomeChatData])
+        // Two entry shapes from Home:
+        //   - "compose-and-send": data.message present → open new chat,
+        //     prefill the composer, click submit.
+        //   - "open-assistant": no message, just data.localaiAssistant → open
+        //     a fresh chat already in admin mode so the wizard can fire.
+        const hasMessage = !!data.message
+        const wantsAssistant = !!data.localaiAssistant
+
+        if (hasMessage || wantsAssistant) {
+          let targetChat = activeChat
+          if (data.newChat) {
+            targetChat = addChat(data.model || '', '', data.mcpMode || false)
+          } else {
+            if (data.model && activeChat) {
+              updateChatSettings(activeChat.id, { model: data.model })
+            }
+            if (data.mcpMode && activeChat) {
+              updateChatSettings(activeChat.id, { mcpMode: true })
+            }
+          }
+          if (data.mcpServers?.length > 0 && targetChat) {
+            updateChatSettings(targetChat.id, { mcpServers: data.mcpServers })
+          }
+          if (data.clientMCPServers?.length > 0 && targetChat) {
+            updateChatSettings(targetChat.id, { clientMCPServers: data.clientMCPServers })
+          }
+          if (wantsAssistant && targetChat) {
+            updateChatSettings(targetChat.id, { localaiAssistant: true })
+          }
+          if (hasMessage) {
+            setInput(data.message)
+            if (data.files) setFiles(data.files)
+            setTimeout(() => {
+              const submitBtn = document.getElementById('chat-submit-btn')
+              submitBtn?.click()
+            }, 100)
+          }
+        }
+      } catch (_e) { /* ignore */ }
+    }
+  }, [])

  // Track whether the user is pinned to the bottom. If they scroll up
  // while a response is streaming, stop forcing them back down.
--- a/core/http/react-ui/src/pages/Home.jsx
+++ b/core/http/react-ui/src/pages/Home.jsx
@@ -13,7 +13,6 @@ import { useResources } from '../hooks/useResources'
 import { fileToBase64, backendControlApi, systemApi, modelsApi, mcpApi, nodesApi } from '../utils/api'
 import { API_CONFIG } from '../utils/config'
 import { greetingKey } from '../utils/greeting'
-import { launchAssistantChat } from '../utils/launchAssistantChat'
 import StatusPill from '../components/StatusPill'
 import Skeleton from '../components/Skeleton'
 import SectionHeading from '../components/SectionHeading'
@@ -229,8 +228,16 @@ export default function Home() {
  // requiring an initial message or model selection. Useful when an admin
  // wants to start the assistant from a cold home page.
  const openAssistantChat = useCallback(() => {
-    launchAssistantChat(navigate, selectedModel)
+    const chatData = {
+      model: selectedModel || '',
+      mcpMode: false,
+      localaiAssistant: true,
+      newChat: true,
+    }
+    localStorage.setItem('localai_index_chat_data', JSON.stringify(chatData))
+    try { localStorage.setItem('localai_assistant_used', '1') } catch { /* ignore */ }
    setAssistantUsed(true)
+    navigate('/app/chat')
  }, [navigate, selectedModel])

  const handleSubmit = (e) => {
--- a/core/http/react-ui/src/router.jsx
+++ b/core/http/react-ui/src/router.jsx
@@ -6,7 +6,6 @@ import RequireAdmin from './components/RequireAdmin'
 import RequireAuth from './components/RequireAuth'
 import RequireAuthEnabled from './components/RequireAuthEnabled'
 import RequireFeature from './components/RequireFeature'
-import HomeRoute from './components/HomeRoute'

 // Pages are code-split: each becomes its own chunk loaded on demand, so a route
 // no longer drags every other page (and its heavy deps — CodeMirror, the MCP
@@ -33,7 +32,7 @@ export function preloadRoute(path) {
  preloaders[m[1] ?? '']?.().catch(() => { /* network blip — real click will retry */ })
 }

-page('', () => import('./pages/Home'))
+const Home = page('', () => import('./pages/Home'))
 const Chat = page('chat', () => import('./pages/Chat'))
 const Models = page('models', () => import('./pages/Models'))
 const Manage = page('manage', () => import('./pages/Manage'))
@@ -97,7 +96,7 @@ function Feature({ feature, children }) {
 }

 const appChildren = [
-  { index: true, element: <HomeRoute /> },
+  { index: true, element: <Home /> },
  { path: 'chat', element: <Chat /> },
  { path: 'chat/:model', element: <Chat /> },
  { path: 'image', element: <ImageGen /> },
--- a/core/http/react-ui/src/utils/launchAssistantChat.js
+++ b/core/http/react-ui/src/utils/launchAssistantChat.js
@@ -1,19 +0,0 @@
-// Opens a fresh chat already in LocalAI Assistant ("manage") mode. Chat.jsx
-// reads localai_index_chat_data on mount and enables localaiAssistant for the
-// new chat. Shared by the Home CTA and the top navbar quick-jump so there is
-// one definition of how the assistant is launched.
-export function launchAssistantChat(navigate, model = '') {
-  const chatData = {
-    model: model || '',
-    mcpMode: false,
-    localaiAssistant: true,
-    newChat: true,
-  }
-  try { localStorage.setItem('localai_index_chat_data', JSON.stringify(chatData)) } catch { /* ignore */ }
-  try { localStorage.setItem('localai_assistant_used', '1') } catch { /* ignore */ }
-  navigate('/app/chat')
-  // When already on /app/chat, navigate() does not remount Chat, so its
-  // mount-time reader would never see the payload above. Signal the mounted
-  // Chat to re-consume it; harmless elsewhere since Chat reads on mount anyway.
-  try { window.dispatchEvent(new CustomEvent('localai-open-assistant')) } catch { /* ignore */ }
-}
--- a/core/http/react-ui/src/utils/resolveHome.js
+++ b/core/http/react-ui/src/utils/resolveHome.js
@@ -1,11 +0,0 @@
-// Pure landing-page resolver for the index route. Returns a target path, or ''
-// meaning "render the default Home". Admin precedence is distributed > p2p >
-// plain; non-admins always go to Chat (distributed/p2p are admin-only and
-// invisible to them). Visibility gates are enforced elsewhere - this only
-// chooses where /app lands.
-export function resolveHome({ isAdmin, distributed, p2pEnabled }) {
-  if (!isAdmin) return '/app/chat'
-  if (distributed) return '/app/nodes'
-  if (p2pEnabled) return '/app/p2p'
-  return ''
-}
--- a/core/http/react-ui/src/utils/sidebarPolicy.js
+++ b/core/http/react-ui/src/utils/sidebarPolicy.js
@@ -1,20 +0,0 @@
-import { operateConsole, isConsoleItemVisible } from '../components/console/consoleConfig'
-
-// The Operate > Cluster group, surfaced as a pinned top-of-sidebar quick-access
-// group when the admin is running a cluster (NATS-distributed) or a P2P swarm.
-// Items are filtered through the SAME gate as everywhere else, so e.g. in a
-// p2p-only deployment Nodes/Scheduling (feature: 'distributed') drop out and
-// only Swarm remains. Returns [] when the pin does not apply.
-export function clusterPinItems(auth, deployment) {
-  if (!auth.isAdmin) return []
-  if (!deployment.distributed && !deployment.p2pEnabled) return []
-  const group = operateConsole.groups.find(g => g.titleKey === 'operate.cluster')
-  if (!group) return []
-  return group.items.filter(item => isConsoleItemVisible(item, auth))
-}
-
-// In the cluster cells the Create group defaults collapsed so the pinned
-// cluster group leads. Users can still expand it; their stored choice wins.
-export function shouldCollapseCreate(auth, deployment) {
-  return !!auth.isAdmin && (!!deployment.distributed || !!deployment.p2pEnabled)
-}
--- a/docs/content/features/openai-realtime.md
+++ b/docs/content/features/openai-realtime.md
@@ -68,33 +68,6 @@ pipeline:

 This is applied only to the realtime session's copy of the LLM config, so it does not affect other users of the same model. Leave it unset to use the LLM model config's own reasoning settings.

-### Conversation compaction (long sessions on CPU)
-
-By default a realtime session feeds only the last `max_history_items` turns to the LLM; older turns are dropped and forgotten. On CPU, long calls also grow expensive as the prompt fills with verbatim history. Enable `compaction` to instead fold older turns into a rolling summary, so long calls stay cheap without losing earlier context.
-
-Compaction works with two numbers:
-
-- **`max_history_items`** is the *live window* — the recent turns kept verbatim in the prompt.
-- **`compaction.trigger_items`** is the *high-water mark* — let the buffer grow to here, then summarize the overflow (everything above `max_history_items`) into a rolling memory and evict it. It must be greater than `max_history_items`; if it is not, it is clamped up.
-
-The gap between the two controls how often summarization runs: a summary call fires roughly every `(trigger_items - max_history_items)` turns (here, about every 6 turns).
-
-```yaml
-pipeline:
-  max_history_items: 6        # live window — recent turns kept verbatim
-  compaction:
-    enabled: true
-    trigger_items: 12         # summarize overflow back down to max_history_items
-    summary_model: ""         # optional: a small model for the summary (CPU); default = pipeline LLM
-    max_summary_tokens: 512
-```
-
-{{% notice tip %}}
-On CPU, set `summary_model` to a small, fast model so compaction never competes with the conversation LLM for compute. Left empty, the pipeline's own LLM produces the summary.
-{{% /notice %}}
-
-Clients can also manage history directly via the now-supported `conversation.item.delete`, `conversation.item.truncate`, and `input_audio_buffer.clear` realtime events.
-
 ## Transports

 The Realtime API supports two transports: **WebSocket** and **WebRTC**.