LocalAI/core/http/react-ui/e2e/model-config.spec.js
Richard Palethorpe 085fc53bbc fix(router): production-ready request router + auto-size batch for embedding/rerank (#10104)
* fix(router): score classifier production-readiness

Conversation trimming runs through the classifier model's chat template
and trims by exact token count, sized to the model's n_batch which is
now scaled to context so long probes can't crash the backend. Missing
chat_message templates are a hard error at router build time. Router-
facing factories (Embedder/Scorer/Reranker/TokenCounter) re-resolve
ModelConfig per call so a model installed post-startup doesn't bind a
stub Backend="" config and silently fall into the loader's auto-
iterate path.
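
A minimal sketch of the trimming loop, written in JavaScript for
illustration only (the router itself is Go). `renderTemplate` and
`countTokens` are hypothetical stand-ins for the classifier model's
chat template and tokenizer. The drop order (oldest non-system message
first, never the final message) is an assumption, not the router's
documented policy.

```javascript
// Hypothetical sketch: trim a conversation until its rendered prompt fits
// in nBatch tokens. renderTemplate and countTokens are placeholders for
// the model's chat template and tokenizer; the drop order is assumed.
function trimConversation(messages, nBatch, renderTemplate, countTokens) {
  const kept = [...messages] // copy so the caller's array is not mutated
  // Count tokens on the rendered prompt, so template overhead is included.
  while (countTokens(renderTemplate(kept)) > nBatch) {
    // Drop the oldest non-system message, but never the final one.
    const idx = kept.findIndex((m) => m.role !== 'system')
    if (idx === -1 || idx === kept.length - 1) {
      throw new Error('conversation cannot be trimmed to fit n_batch')
    }
    kept.splice(idx, 1)
  }
  return kept
}

// Toy template and whitespace "tokenizer", for illustration only.
const render = (msgs) => msgs.map((m) => `${m.role}: ${m.content}`).join('\n')
const count = (text) => text.split(/\s+/).filter(Boolean).length

const conversation = [
  { role: 'system', content: 'route the request' },
  { role: 'user', content: 'one two three four' },
  { role: 'assistant', content: 'five six' },
  { role: 'user', content: 'write some code' },
]
// Keeps the system message and the latest user turn.
const trimmed = trimConversation(conversation, 10, render, count)
```

Counting tokens on the rendered prompt rather than on raw message text
is what keeps the count exact: the role markers and separators added by
the template consume tokens too.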

New 'vector_store' backend trace recorded inside localVectorStore on
every Search/Insert — including the backend-load-failure path that
previously vanished into an xlog.Warn — with outcome tagging
(hit/miss/empty_store/backend_load_error/find_error/insert_error/ok).
Companion cleanup drops misleading similarity:0 and input_tokens_count:0
from non-hit and text-mode traces.
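
The outcome tagging can be sketched as below. Only the outcome strings
come from this change; the function names, the `state` fields, and the
similarity threshold that separates a hit from a miss are assumptions
made for illustration.

```javascript
// Hypothetical sketch of outcome tagging for a vector_store trace. The
// outcome strings match the list above; the state fields and the
// hit/miss similarity threshold are assumptions.
function vectorStoreOutcome(op, state) {
  const { loadError, opError, storeSize, bestSimilarity, threshold } = state
  // A backend that failed to load is reported, not swallowed.
  if (loadError) return 'backend_load_error'
  if (op === 'insert') return opError ? 'insert_error' : 'ok'
  if (opError) return 'find_error'
  if (storeSize === 0) return 'empty_store'
  return bestSimilarity >= threshold ? 'hit' : 'miss'
}

function buildTrace(op, state) {
  const outcome = vectorStoreOutcome(op, state)
  const trace = { backend: 'vector_store', op, outcome }
  // Only a hit carries a meaningful similarity; omit it otherwise
  // rather than recording a misleading similarity: 0.
  if (outcome === 'hit') trace.similarity = state.bestSimilarity
  return trace
}

const hit = buildTrace('search', { storeSize: 3, bestSimilarity: 0.95, threshold: 0.9 })
const miss = buildTrace('search', { storeSize: 3, bestSimilarity: 0.4, threshold: 0.9 })
```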

Gallery local-store-development aliases to 'local-store' so the master
image satisfies pkg/model.LocalStoreBackend lookups from the embedding
cache.

Misc: llama-cpp TokenizeString reads the correct 'prompt' JSON key
(the original bug); ModelTokenize nil-guard; non-fatal mitm proxy
startup; PII 'route_local' renamed to 'allow' with docs/UI in sync;
model-editor footer no longer eats the edit area on small screens;
several config-editor template/dropdown/section fixes.

Tests: e2e router specs (casual/code-hint + long-conversation trim),
vector_store trace specs, lazy-factory specs, gallery dev-alias
resolution, Playwright trace badge + scroll regression.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* feat(backend): auto-size batch to context for embedding and rerank models

Embedding and rerank models pool over the whole input in a single physical batch (n_ubatch). With batch left at the 512 default, the backend rejects longer inputs with "input is too large to process", silently capping a large-context embedder (e.g. 8k/32k) at 512 tokens. Size n_batch to the context for these single-pass usecases, mirroring the existing FLAG_SCORE behaviour; an explicit batch: still wins.

Extracts EffectiveContextSize/EffectiveBatchSize from grpcModelOpts so the effective decode window has one home for other callers to reuse.
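
A hedged JavaScript sketch of the sizing rule, for illustration only. The real logic lives in Go helpers such as EffectiveBatchSize, and the argument names and usecase labels below are assumptions. The 512 fallback is the default mentioned above.

```javascript
// Hypothetical sketch only: the real logic is Go; the argument names and
// usecase labels here are assumptions.
const DEFAULT_BATCH = 512

// Usecases that process the whole input in one physical batch.
const SINGLE_PASS_USECASES = new Set(['embedding', 'rerank', 'score'])

function effectiveBatchSize({ batch, contextSize, usecase }) {
  // An explicit batch setting always wins.
  if (batch > 0) return batch
  // Single-pass usecases need room for the whole input, so size the
  // batch to the context window when one is known.
  if (SINGLE_PASS_USECASES.has(usecase) && contextSize > 0) return contextSize
  // Everything else keeps the default.
  return DEFAULT_BATCH
}

const embedBatch = effectiveBatchSize({ contextSize: 2048, usecase: 'embedding' }) // 2048
```

With this rule, an embedder configured with a 2048-token context and no explicit batch gets a 2048-token batch, while setting `batch: 256` keeps it at 256.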

Adds an e2e-aio regression test that embeds a >512-token input. The AIO embedding model is switched to nomic-embed-text-v1.5 (2048 context) because the previous granite model was capped at 512 tokens and could not exercise the larger batch.

Assisted-by: claude-code:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(gallery): raise arch-router scoring output cap via parallel:64

Scoring decodes the whole prompt+candidate in a single llama_decode and
reads one logit row per candidate token. The vendored llama.cpp server
caps causal output rows at n_parallel, so the default of 1 aborts with
GGML_ASSERT(n_outputs_max <= cparams.n_outputs_max) on multi-token route
labels. Set options: [parallel:64] on both arch-router quant entries to
lift the cap; kv_unified (the grpc-server default) keeps the full context
per sequence, so this does not split the KV cache.

Assisted-by: claude-code:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-06-12 16:21:15 +02:00

262 lines
13 KiB
JavaScript

import { test, expect } from './coverage-fixtures.js'

const MOCK_METADATA = {
  sections: [
    { id: 'general', label: 'General', icon: 'settings', order: 0 },
    { id: 'parameters', label: 'Parameters', icon: 'sliders', order: 20 },
  ],
  fields: [
    { path: 'name', yaml_key: 'name', go_type: 'string', ui_type: 'string', section: 'general', label: 'Model Name', description: 'Unique identifier for this model', component: 'input', order: 0 },
    { path: 'backend', yaml_key: 'backend', go_type: 'string', ui_type: 'string', section: 'general', label: 'Backend', description: 'Inference backend to use', component: 'select', autocomplete_provider: 'backends', order: 10 },
    { path: 'context_size', yaml_key: 'context_size', go_type: '*int', ui_type: 'int', section: 'general', label: 'Context Size', description: 'Maximum context window in tokens', component: 'number', vram_impact: true, order: 20 },
    { path: 'cuda', yaml_key: 'cuda', go_type: 'bool', ui_type: 'bool', section: 'general', label: 'CUDA', description: 'Enable CUDA GPU acceleration', component: 'toggle', order: 30 },
    { path: 'parameters.temperature', yaml_key: 'temperature', go_type: '*float64', ui_type: 'float', section: 'parameters', label: 'Temperature', description: 'Sampling temperature', component: 'slider', min: 0, max: 2, step: 0.1, order: 0 },
    { path: 'parameters.top_p', yaml_key: 'top_p', go_type: '*float64', ui_type: 'float', section: 'parameters', label: 'Top P', description: 'Nucleus sampling threshold', component: 'slider', min: 0, max: 1, step: 0.05, order: 10 },
  ],
}

// Mock raw YAML (what the edit endpoint returns): only fields actually in the file.
// `model` must stay indented so it nests under `parameters`.
const MOCK_YAML = `name: mock-model
backend: mock-backend
parameters:
  model: mock-model.bin
`

const MOCK_AUTOCOMPLETE_BACKENDS = { values: ['mock-backend', 'llama-cpp', 'vllm'] }

test.describe('Model Editor - Interactive Tab', () => {
  test.beforeEach(async ({ page }) => {
    // Mock config metadata
    await page.route('**/api/models/config-metadata*', (route) => {
      route.fulfill({
        contentType: 'application/json',
        body: JSON.stringify(MOCK_METADATA),
      })
    })

    // Mock raw YAML edit endpoint (GET for loading, POST for saving)
    await page.route('**/api/models/edit/mock-model', (route) => {
      if (route.request().method() === 'POST') {
        route.fulfill({
          contentType: 'application/json',
          body: JSON.stringify({ message: 'Configuration file saved' }),
        })
      } else {
        route.fulfill({
          contentType: 'application/json',
          body: JSON.stringify({ config: MOCK_YAML, name: 'mock-model' }),
        })
      }
    })

    // Mock PATCH config-json for interactive save
    await page.route('**/api/models/config-json/mock-model', (route) => {
      if (route.request().method() === 'PATCH') {
        route.fulfill({
          contentType: 'application/json',
          body: JSON.stringify({ success: true, message: "Model 'mock-model' updated successfully" }),
        })
      } else {
        route.fulfill({ contentType: 'application/json', body: '{}' })
      }
    })

    // Mock autocomplete for backends
    await page.route('**/api/models/config-metadata/autocomplete/backends', (route) => {
      route.fulfill({
        contentType: 'application/json',
        body: JSON.stringify(MOCK_AUTOCOMPLETE_BACKENDS),
      })
    })

    await page.goto('/app/model-editor/mock-model')
    // Wait for the page to load
    await expect(page.locator('h1', { hasText: 'Model Editor' })).toBeVisible({ timeout: 10_000 })
  })

  test('page loads and shows model name in header', async ({ page }) => {
    await expect(page.locator('text=mock-model')).toBeVisible()
    await expect(page.locator('h1', { hasText: 'Model Editor' })).toBeVisible()
  })

  test('interactive tab is active by default', async ({ page }) => {
    // The field browser should be visible (interactive tab content)
    await expect(page.locator('input[placeholder="Search fields to add..."]')).toBeVisible()
  })

  test('existing config fields from YAML are populated', async ({ page }) => {
    // The mock YAML has name and backend, so they should be active fields
    await expect(page.locator('text=Model Name')).toBeVisible()
    await expect(page.locator('span', { hasText: /^Backend$/ }).first()).toBeVisible()
  })

  test('section sidebar shows sections with active fields', async ({ page }) => {
    const sidebar = page.locator('nav')
    await expect(sidebar.locator('text=General')).toBeVisible()
  })

  test('typing in field browser shows matching fields', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Temperature')
    await expect(page.locator('text=Temperature').first()).toBeVisible()
  })

  test('clicking a field result adds it to the config', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Temperature')
    // Result rows are looked up under the search input's grandparent element
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'Temperature' }).first().click()
    // Temperature belongs to the Parameters section, which should now appear
    await expect(page.locator('h3', { hasText: 'Parameters' })).toBeVisible()
  })

  test('toggle field renders a toggle switch', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('CUDA')
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'CUDA' }).first().click()
    await expect(page.locator('text=CUDA').first()).toBeVisible()
    const cudaSection = page.locator('div', { has: page.locator('span', { hasText: /^CUDA$/ }) }).first()
    await expect(cudaSection.locator('input[type="checkbox"]')).toHaveCount(1)
  })

  test('number field renders a numeric input', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Context Size')
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'Context Size' }).first().click()
    await expect(page.locator('input[type="number"]')).toBeVisible()
  })

  test('changing a field value enables the Save button', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Context Size')
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'Context Size' }).first().click()
    const numberInput = page.locator('input[type="number"]')
    await numberInput.fill('4096')
    // The test name promises an enabled button, so check that as well as visibility
    const saveButton = page.locator('button', { hasText: 'Save Changes' })
    await expect(saveButton).toBeVisible()
    await expect(saveButton).toBeEnabled()
  })

  test('removing a field with X button removes it from the form', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Temperature')
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'Temperature' }).first().click()
    const paramsHeader = page.locator('h3', { hasText: 'Parameters' })
    await expect(paramsHeader).toBeVisible()
    const paramsSection = paramsHeader.locator('..')
    await paramsSection.locator('button[title="Remove field"]').first().click()
    await expect(paramsHeader).not.toBeVisible()
  })

  test('save sends PATCH and shows success toast', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Context Size')
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'Context Size' }).first().click()
    const numberInput = page.locator('input[type="number"]')
    await numberInput.fill('8192')
    // Start waiting before clicking so the PATCH request cannot be missed
    const patchRequest = page.waitForRequest(
      (req) => req.url().includes('/api/models/config-json/mock-model') && req.method() === 'PATCH',
    )
    await page.locator('button', { hasText: 'Save Changes' }).click()
    await patchRequest
    await expect(page.locator('text=Configuration saved')).toBeVisible({ timeout: 5_000 })
  })

  test('added field is no longer shown in field browser results', async ({ page }) => {
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Temperature')
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'Temperature' }).first().click()
    await searchInput.fill('Temperature')
    // Give the results list time to render; without this pause the
    // toHaveCount(0) check below could pass before any results appear.
    await page.waitForTimeout(200)
    const results = dropdown.locator('div[style*="cursor: pointer"]', { hasText: 'Temperature' })
    await expect(results).toHaveCount(0)
  })

  test('switching to YAML tab shows code editor', async ({ page }) => {
    await page.locator('button', { hasText: 'YAML' }).click()
    // The CodeMirror editor should be visible
    await expect(page.locator('.cm-editor').first()).toBeVisible()
    // The field browser should NOT be visible
    await expect(page.locator('input[placeholder="Search fields to add..."]')).not.toBeVisible()
  })

  test('switching back to Interactive tab restores fields', async ({ page }) => {
    // Go to YAML tab
    await page.locator('button', { hasText: 'YAML' }).click()
    await expect(page.locator('input[placeholder="Search fields to add..."]')).not.toBeVisible()
    // Go back to Interactive tab
    await page.locator('button', { hasText: 'Interactive' }).click()
    await expect(page.locator('input[placeholder="Search fields to add..."]')).toBeVisible()
    await expect(page.locator('text=Model Name')).toBeVisible()
  })

  test('shows the estimated VRAM annotation when the model has a context size', async ({ page }) => {
    // Regression: the editor reads the /api/models/vram-estimate response,
    // whose shape is snake_case (vram_display). The hook previously read
    // camelCase (vramDisplay) and silently showed nothing.
    await page.route('**/api/models/edit/mock-model', (route) => {
      route.fulfill({
        contentType: 'application/json',
        body: JSON.stringify({
          config: 'name: mock-model\nbackend: mock-backend\ncontext_size: 4096\nparameters:\n  model: mock-model.bin\n',
          name: 'mock-model',
        }),
      })
    })

    let estimateCalled = false
    await page.route('**/api/models/vram-estimate', (route) => {
      estimateCalled = true
      route.fulfill({
        contentType: 'application/json',
        body: JSON.stringify({
          size_bytes: 4294967296,
          size_display: '4 GiB',
          vram_bytes: 5583457484,
          vram_display: '5.2 GiB',
          context_length: 4096,
        }),
      })
    })

    await page.goto('/app/model-editor/mock-model')
    await expect(page.locator('h1', { hasText: 'Model Editor' })).toBeVisible({ timeout: 10_000 })
    await expect(page.getByText(/~\s*5\.2 GiB VRAM/)).toBeVisible({ timeout: 10_000 })
    expect(estimateCalled).toBe(true)
  })

  test('interactive tab scrolls at body height (no inner overflow pane) and tracks the active section', async ({ page }) => {
    // Regression: the form sections used to live inside an overflow:auto pane
    // with maxHeight: calc(100vh - 340px), which kept the global footer in
    // view on every screen and ate ~50px of editing room on short windows.
    // Pin two pieces of the fix:
    // 1. The two-column container (sticky nav + content) has no scrollable
    //    inner element on its content side; the body scroll handles overflow.
    // 2. The active-section tracker now listens to window scroll. Scrolling
    //    the window exercises the tracker, and the `<nav>` sidebar must
    //    still render afterwards.
    const contentOverflowY = await page.evaluate(() => {
      const sidebar = document.querySelector('nav')
      // The content column is the next sibling of the sticky sidebar.
      const content = sidebar?.nextElementSibling
      return content ? getComputedStyle(content).overflowY : 'no-content'
    })
    // The content column must not be its own scroll container: only
    // 'visible' is accepted, which rules out 'auto' (the old pane) and
    // 'scroll'. 'no-content' means the sidebar had no sibling to inspect.
    expect(['visible', 'no-content']).toContain(contentOverflowY)

    // Add a field to give the page a little more height, then force a
    // window scroll. The tracker should run and the sidebar should remain
    // visible.
    const searchInput = page.locator('input[placeholder="Search fields to add..."]')
    await searchInput.fill('Temperature')
    const dropdown = searchInput.locator('..').locator('..')
    await dropdown.locator('div', { hasText: 'Temperature' }).first().click()
    await page.evaluate(() => window.scrollTo(0, 200))
    // Give the scroll listener a moment to run before asserting.
    await page.waitForTimeout(50)
    await expect(page.locator('nav').first()).toBeVisible()
  })
})