mirror of
https://github.com/exo-explore/exo.git
synced 2026-06-02 19:27:55 -04:00
## Motivation

Extend bench/eval tooling with robustness features, streaming support, and align model configs with vllm eval for reproducible comparisons.

## Changes

- **exo_eval**: Checkpoint/resume (JSONL), instance health monitoring + early abort, `top_k`/`min_p`/`enable_thinking` params, LCB `--release-version`/`--offset`
- **exo_bench**: Streaming SSE (`--stream`), Kimi tokenizer fix for transformers 5.x
- **Both tools**: Auto-detect running instances instead of requiring `--skip-instance-setup`; `--fresh-instance` to override
- **harness**: SSE streaming client, `find_existing_instance()` shared helper, removed download timeout, settle-timeout default 0→7200s
- **models.toml**: Added `enable_thinking`, aligned `max_tokens`/temps with vllm, added new models
- **API**: Streaming SSE for `/bench/chat/completions`

## Why It Works

- Checkpoint/resume uses append-only JSONL + skip-on-load so interrupted evals resume without re-running completed questions
- Health monitoring races an `asyncio.Event` against API calls for fast abort when the instance dies
- Auto-detection queries `/state` for existing instances matching the model ID before attempting placement
- Streaming reuses the existing `generate_chat_stream` infrastructure from the regular chat endpoint
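
The checkpoint/resume idea (append-only JSONL plus skip-on-load) can be sketched roughly as follows. This is an illustrative sketch, not the code in `exo_eval`: the function names `load_completed`, `append_result`, and `run_eval`, and the record fields `id` and `answer`, are assumptions made for the example.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable


def load_completed(path: Path) -> dict[str, dict]:
    """Read previously saved results, keyed by question id (hypothetical field)."""
    completed: dict[str, dict] = {}
    if not path.exists():
        return completed
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Skip lines that do not parse, e.g. a write cut short by a crash.
                continue
            completed[record["id"]] = record
    return completed


def append_result(path: Path, record: dict) -> None:
    """Append one result as a single JSON line and push it to disk."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def run_eval(
    questions: Iterable[dict],
    path: Path,
    answer_fn: Callable[[dict], str],
) -> dict[str, dict]:
    """Answer every question not already present in the checkpoint file."""
    completed = load_completed(path)
    for q in questions:
        if q["id"] in completed:
            continue  # skip-on-load: this question finished in an earlier run
        record = {"id": q["id"], "answer": answer_fn(q)}
        append_result(path, record)
        completed[q["id"]] = record
    return completed
```

Because each result is written as soon as it is produced, an interrupted run loses at most the question that was in flight; rerunning `run_eval` with the same path picks up from there.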
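
Racing an `asyncio.Event` against an API call could look something like the sketch below. The helper name `call_with_abort` and the exception `InstanceDied` are hypothetical; the sketch assumes a separate health monitor sets the event when the instance disappears.

```python
import asyncio
from typing import Any, Awaitable


class InstanceDied(Exception):
    """Raised when the health event fires before the API call finishes."""


async def call_with_abort(call: Awaitable[Any], dead: asyncio.Event) -> Any:
    """Await `call`, but abort as soon as `dead` is set."""
    call_task = asyncio.ensure_future(call)
    dead_task = asyncio.ensure_future(dead.wait())
    try:
        done, _pending = await asyncio.wait(
            {call_task, dead_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if call_task in done:
            # The call finished (possibly with an exception, which re-raises here).
            return call_task.result()
        raise InstanceDied("instance became unhealthy during the request")
    finally:
        # Whichever task lost the race is cancelled so nothing is left running.
        for task in (call_task, dead_task):
            if not task.done():
                task.cancel()
```

The point of the race is that a long generation request does not have to hit its own timeout before the eval notices a dead instance; the abort latency is bounded by how quickly the monitor sets the event.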
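
On the client side, consuming an SSE stream comes down to collecting `data:` fields until a blank line ends each event. The sketch below is not the harness client. It assumes the endpoint emits OpenAI-style chunks (`choices[0].delta.content`) and a `[DONE]` sentinel, which the description above does not confirm; `iter_sse_data` and `collect_text` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            buf.append(value)
    # An event not terminated by a blank line is discarded.


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate streamed content, assuming OpenAI-style chunk payloads."""
    parts: list[str] = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Any line iterator works as input, for example the line stream of a streaming HTTP response.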