mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-04-18 13:00:59 -04:00

Author	SHA1	Message	Date
rltakashige	2efbb8ab4f	Improve exo harness with path state (#1815)	2026-03-30 11:20:46 +01:00
Jake Hillion	42e1e7322b	bench: restore --danger-delete-downloads planning phase (#1542) `c2f2111b` extracted shared utilities from exo_bench.py into harness.py but accidentally dropped the run_planning_phase function and --danger-delete-downloads CLI argument in the process. Restored run_planning_phase in harness.py (where its dependencies now live) and re-added the --danger-delete-downloads argument to add_common_instance_args. Re-wired the planning phase call in exo_bench.py's main() before the benchmark loop.	2026-02-19 15:42:02 +00:00
Alex Cheema	025ed9fd82	feat: add prefill progress bar for long prompts (#1181) ## Motivation Users processing long prompts have no visibility into when token generation will start. This feature adds a progress bar showing prefill progress, giving users real-time feedback during prompt processing. ## Changes ### Backend - Added `PrefillProgress` event type with `command_id`, `processed_tokens`, `total_tokens` - Added `PrefillProgressResponse` type (though now using direct callback approach) - Wired `prompt_progress_callback` through MLX's `stream_generate()` - Progress events sent directly from callback for real-time updates (not batched) - API generates SSE named events: `event: prefill_progress\ndata: {...}` - Added `PrefillProgressData` dataclass and `StreamEvent` union type in API ### Dashboard - Added `PrefillProgress` interface to store - Updated SSE parsing to handle `event:` lines (named events) - Created `PrefillProgressBar.svelte` with animated progress bar - Shows "Processing prompt: X/Y tokens" with percentage - Progress bar disappears when first token arrives ## Why It Works MLX's `stream_generate()` accepts a `prompt_progress_callback(processed, total)` that's called after each prefill chunk. By sending events directly from this callback (rather than yielding from the generator), progress updates are sent in real-time during prefill. Using SSE named events (`event: prefill_progress`) maintains full OpenAI/Claude API compatibility - standard clients ignore named events they don't recognize, while the exo dashboard explicitly listens for them.
## Test Plan ### Manual Testing - Hardware: MacBook Pro M3 Max - Set `prefill_step_size=256` for more frequent updates - Tested with long prompts (pasted large documents) - Verified progress bar updates incrementally during prefill - Confirmed progress bar disappears when generation starts - Tested with curl - standard `data:` events still work normally Here it is working: https://github.com/user-attachments/assets/5cc6f075-c5b2-4a44-bb4d-9efb246bc5fe ### Automated Testing - Type checker passes (0 errors) - All 192 tests pass - Dashboard builds successfully ### API Compatibility - Named SSE events are ignored by OpenAI SDK clients - Regular token data uses standard `data: {...}` format - `[DONE]` sentinel works as expected --- Note: `prefill_step_size` is temporarily set to 256 for testing. Should be changed back to 2048 before merging for production performance. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Evan <evanev7@gmail.com> Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>	2026-02-19 03:18:25 +00:00
rltakashige	c2f2111b88	Fix tool calling (#1529) ## Motivation GPT OSS tool calling issues. ## Changes Fixes those and adds a bunch of evals for tool calling. Fixes GLM5 prefix caching, where CacheList wasn't getting handled properly. Extracts a bunch of the setup functionality of exo bench to a harness that can be reused elsewhere, such as in the tool calling eval. ## Test Plan ### Automated Testing Let's run the evals for all models	2026-02-18 20:29:18 +00:00

4 Commits
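
The prefill progress commit (025ed9fd82) relies on SSE named events staying invisible to clients that only read default `data:` messages. The following is a minimal, self-contained sketch of that idea, not exo's actual code: the helpers `format_sse` and `parse_sse` and the sample payloads are hypothetical, and only the `prefill_progress` event name, the `processed_tokens` / `total_tokens` fields, and the `[DONE]` sentinel come from the commit message.

```python
import json
from typing import Iterator, Optional, Tuple


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE message; `event` adds the optional `event:` field."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates each SSE message.
    return "\n".join(lines) + "\n\n"


def parse_sse(stream: str) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs; unnamed messages default to 'message'."""
    for block in stream.split("\n\n"):
        name, data_lines = "message", []
        for line in block.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield name, "\n".join(data_lines)


# Hypothetical stream: two prefill updates, one token chunk, then the sentinel.
stream = (
    format_sse({"processed_tokens": 256, "total_tokens": 1024}, event="prefill_progress")
    + format_sse({"processed_tokens": 1024, "total_tokens": 1024}, event="prefill_progress")
    + format_sse({"choices": [{"delta": {"content": "Hi"}}]})
    + "data: [DONE]\n\n"
)

events = list(parse_sse(stream))

# A client that handles only default messages never sees the progress events.
token_payloads = [data for name, data in events if name == "message"]

# A dashboard-style client opts in to the named event explicitly.
progress = [json.loads(data) for name, data in events if name == "prefill_progress"]
```

Under these assumptions, a consumer that filters on the default event name receives only the token chunk and `[DONE]`, while a consumer that listens for `prefill_progress` can compute a percentage from `processed_tokens / total_tokens`.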