## Motivation

Adds uncertainty visualization to the chat interface, allowing users to see token-level confidence scores and regenerate responses from any point in the generation. This enables users to:

- Understand model confidence at each token
- Explore alternative completions by regenerating from uncertain tokens
- Debug and analyze model behavior

## Changes

### Uncertainty Visualization

- Add a `TokenHeatmap` component that colors tokens by their probability
- Toggle the uncertainty view per message with a bar chart icon
- Show a tooltip with probability, logprob, and the top alternative tokens on hover

### Regenerate from Token

- Add a "Regenerate from here" button to the token tooltip
- Use `continue_final_message` in the chat template to continue within the same turn (no EOS tokens)
- Add a `continue_from_prefix` flag to `ChatCompletionTaskParams`

### Request Cancellation

- Add an `AbortController` to cancel in-flight requests when regenerating mid-generation
- Handle `BrokenResourceError` server-side so client disconnects are handled gracefully

### Additional APIs

- Add Claude Messages API support (`/v1/messages`) (see the request sketch at the end of this description)
- Add OpenAI Responses API support (`/v1/responses`)

## Why It Works

- **Proper continuation**: Using `continue_final_message=True` instead of `add_generation_prompt=True` keeps the assistant turn open, so the model continues naturally from the prefix without end-of-turn markers (see the continuation sketch at the end of this description)
- **Clean cancellation**: The `AbortController` aborts the HTTP request, and the server catches `BrokenResourceError` to avoid crashing (see the server-side sketch at the end of this description)
- **Stable hover during generation**: `TokenHeatmap` tracks hover by token index, which is stable across re-renders, and uses a longer hide delay while generation is in progress

## Test Plan

### Manual Testing

<!-- Hardware: MacBook Pro M1 -->

- Send a message and verify that logprobs are collected
- Enable the uncertainty view and verify that tokens are colored by probability
- Hover over tokens to see the tooltip with alternatives
- Click "Regenerate from here" on a token mid-response
- Verify that the response continues naturally from that point
- Verify that aborting mid-generation and regenerating works without a server crash

### Automated Testing

- Added tests for the Claude Messages API adapter
- Added tests for the OpenAI Responses API adapter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Evan <evanev7@gmail.com>
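
For reference, a minimal sketch of the continuation behavior described under "Why It Works", assuming a Hugging Face-style tokenizer with `apply_chat_template`; the model id and messages are illustrative, not exo's actual code path:

```python
from transformers import AutoTokenizer

# Illustrative model id; any chat model with a chat template behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    {"role": "user", "content": "Explain beam search."},
    # Assistant text truncated at the token the user clicked "Regenerate from here" on.
    {"role": "assistant", "content": "Beam search keeps the k most promising"},
]

# continue_final_message=True renders the prompt with the assistant turn still
# open: no EOS / end-of-turn marker is appended after the prefix, so the model
# resumes generating inside the same message.
prompt_ids = tokenizer.apply_chat_template(
    messages,
    continue_final_message=True,
    add_generation_prompt=False,
    tokenize=True,
)

# add_generation_prompt=True would instead close the last message and open a
# brand-new assistant turn, which is what breaks mid-message continuation.
```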
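
The server-side disconnect handling can be sketched as follows; `stream_completion`, `send_chunk`, and `token_stream` are hypothetical names, and only the `anyio.BrokenResourceError` handling mirrors what this PR adds:

```python
import anyio

async def stream_completion(send_chunk, token_stream) -> None:
    """Stream generated chunks to the client, tolerating mid-stream aborts."""
    try:
        async for chunk in token_stream:
            await send_chunk(chunk)
    except anyio.BrokenResourceError:
        # The client aborted the request (e.g. the AbortController fired when
        # the user hit "Regenerate from here"); stop streaming quietly instead
        # of letting the exception crash the server.
        pass
```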
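
Finally, a hedged request sketch for the new Claude Messages endpoint; only the `/v1/messages` path comes from this PR, while the base URL, model id, and payload shape are assumptions based on the standard Claude Messages API:

```python
import requests

BASE_URL = "http://localhost:52415"  # assumed local exo API address

response = requests.post(
    f"{BASE_URL}/v1/messages",
    json={
        "model": "llama-3.2-1b",  # illustrative model id
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(response.json())
```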