## Motivation

Adds uncertainty visualization to the chat interface, allowing users to see token-level confidence scores and regenerate responses from any point in the generation. This enables users to:

- Understand model confidence at each token
- Explore alternative completions by regenerating from uncertain tokens
- Debug and analyze model behavior

## Changes

### Uncertainty Visualization

- Add a `TokenHeatmap` component that colors tokens by their probability
- Toggle the uncertainty view per message with a bar chart icon
- Show a tooltip with probability, logprob, and the top alternative tokens on hover

### Regenerate from Token

- Add a "Regenerate from here" button to the token tooltip
- Use `continue_final_message` in the chat template to continue within the same turn (no EOS tokens)
- Add a `continue_from_prefix` flag to `ChatCompletionTaskParams`

### Request Cancellation

- Add an `AbortController` to cancel in-flight requests when regenerating mid-generation
- Handle `BrokenResourceError` server-side so client disconnects are handled gracefully

### Additional APIs

- Add Claude Messages API support (`/v1/messages`) (see the request sketch at the end of this description)
- Add OpenAI Responses API support (`/v1/responses`)

## Why It Works

- **Proper continuation**: Using `continue_final_message=True` instead of `add_generation_prompt=True` keeps the assistant turn open, so the model continues naturally from the prefix without end-of-turn markers (see the continuation sketch at the end of this description)
- **Clean cancellation**: The `AbortController` aborts the HTTP request, and the server catches `BrokenResourceError` to avoid crashing (see the server-side sketch at the end of this description)
- **Stable hover during generation**: `TokenHeatmap` tracks hover by token index, which is stable across re-renders, and uses a longer hide delay while generation is in progress

## Test Plan

### Manual Testing

<!-- Hardware: MacBook Pro M1 -->

- Send a message and verify that logprobs are collected
- Enable the uncertainty view and verify that tokens are colored by probability
- Hover over tokens to see the tooltip with alternatives
- Click "Regenerate from here" on a token mid-response
- Verify that the response continues naturally from that point
- Verify that aborting mid-generation and regenerating works without a server crash

### Automated Testing

- Added tests for the Claude Messages API adapter
- Added tests for the OpenAI Responses API adapter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Evan <evanev7@gmail.com>
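
For reference, a minimal sketch of the continuation behavior described under "Why It Works", assuming a Hugging Face-style tokenizer with `apply_chat_template`; the model id and messages are illustrative, not exo's actual code path:

```python
from transformers import AutoTokenizer

# Illustrative model id; any chat model with a chat template behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

messages = [
    {"role": "user", "content": "Explain beam search."},
    # Assistant text truncated at the token the user clicked "Regenerate from here" on.
    {"role": "assistant", "content": "Beam search keeps the k most promising"},
]

# continue_final_message=True renders the prompt with the assistant turn still
# open: no EOS / end-of-turn marker is appended after the prefix, so the model
# resumes generating inside the same message.
prompt_ids = tokenizer.apply_chat_template(
    messages,
    continue_final_message=True,
    add_generation_prompt=False,
    tokenize=True,
)

# add_generation_prompt=True would instead close the last message and open a
# brand-new assistant turn, which is what breaks mid-message continuation.
```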
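
The server-side disconnect handling can be sketched as follows; `stream_completion`, `send_chunk`, and `token_stream` are hypothetical names, and only the `anyio.BrokenResourceError` handling mirrors what this PR adds:

```python
import anyio

async def stream_completion(send_chunk, token_stream) -> None:
    """Stream generated chunks to the client, tolerating mid-stream aborts."""
    try:
        async for chunk in token_stream:
            await send_chunk(chunk)
    except anyio.BrokenResourceError:
        # The client aborted the request (e.g. the AbortController fired when
        # the user hit "Regenerate from here"); stop streaming quietly instead
        # of letting the exception crash the server.
        pass
```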
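
Finally, a hedged request sketch for the new Claude Messages endpoint; only the `/v1/messages` path comes from this PR, while the base URL, model id, and payload shape are assumptions based on the standard Claude Messages API:

```python
import requests

BASE_URL = "http://localhost:52415"  # assumed local exo API address

response = requests.post(
    f"{BASE_URL}/v1/messages",
    json={
        "model": "llama-3.2-1b",  # illustrative model id
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(response.json())
```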