The bench script downloads models during the planning phase but doesn't
record how long the download took, making it difficult to track download
performance for a given model over time.
Modified `run_planning_phase` to return download metadata: whether a
fresh download occurred, the wall-clock duration, and the model size in
bytes. These fields are included in every JSON output row alongside the
existing per-run metrics, and a summary line is logged to the console.
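As a rough sketch, the returned metadata might look like the following dataclass. The field names here are illustrative assumptions based on the description above and the JSON output in the test plan, not the actual types in the exo source:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape of the download metadata returned by
# run_planning_phase; names are illustrative, not from the real code.
@dataclass
class DownloadMetadata:
    download_occurred: bool                # False when the model was already cached
    download_duration_s: Optional[float]   # wall-clock seconds; None if cached
    model_bytes: Optional[int]             # model size in bytes; None if unknown

fresh = DownloadMetadata(True, 54.88, 5_116_952_079)
cached = DownloadMetadata(False, None, None)
```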
This allows filtering bench results by `download_occurred` and grouping
by `model_id` to compute average download times across runs.
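A minimal sketch of that aggregation, assuming rows shaped like the `bench/results.json` output below (cached runs simply omit the download field; the sample rows here are made up for illustration, and real usage would `json.load` the results file instead):

```python
from collections import defaultdict

# Illustrative rows mirroring the bench output shape; in practice these
# would come from json.load(open("bench/results.json")).
rows = [
    {"model_id": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
     "download_duration_s": 54.88},
    {"model_id": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
     "download_duration_s": 49.12},
    {"model_id": "mlx-community/gpt-oss-120b-MXFP4-Q8"},  # cached run: no field
]

# Keep only rows where a fresh download occurred, then average per model.
durations = defaultdict(list)
for row in rows:
    if "download_duration_s" in row:
        durations[row["model_id"]].append(row["download_duration_s"])

averages = {model: sum(d) / len(d) for model, d in durations.items()}
print(averages)
```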
Test plan:
```
# existing model
jake@maverick:/data/users/jake/repos/exo/ > nix run .#exo-bench -- --host s1 --model mlx-community/gpt-oss-120b-MXFP4-Q8 --pp 128 --tg 128
...
2026-02-20 15:23:49.081 | INFO | __main__:main:340 - Planning phase: checking downloads...
2026-02-20 15:23:49.152 | INFO | harness:run_planning_phase:402 - Started download on 12D3KooWKx41iikn188ozrxSdoG26g88jFCfie9wEA1eQR8csbPm
2026-02-20 15:23:49.184 | INFO | __main__:main:352 - Download: model already cached
...
Wrote results JSON: bench/results.json
jake@maverick:/data/users/jake/repos/exo/ > cat bench/results.json
[
{
"elapsed_s": 2.9446684420108795,
"output_text_preview": "The user just typed a long series of \"a\". Possibly they are testing. There's no explicit question. Could be they want a response? Might be a test of handling long input. We can respond politely, ask i",
"stats": {
"prompt_tps": 117.7872141515621,
"generation_tps": 85.49598231498028,
"prompt_tokens": 129,
"generation_tokens": 128,
"peak_memory_usage": {
"inBytes": 68215145744
}
},
"model_short_id": "gpt-oss-120b-MXFP4-Q8",
"model_id": "mlx-community/gpt-oss-120b-MXFP4-Q8",
"placement_sharding": "Pipeline",
"placement_instance_meta": "MlxRing",
"placement_nodes": 1,
"instance_id": "68babc2a-6e94-4c70-aa07-7ec681f7c856",
"pp_tokens": 128,
"tg": 128,
"repeat_index": 0
}
]%
# no change to output
```
```
# missing model
jake@maverick:/data/users/jake/repos/exo/ > nix run .#exo-bench -- --host s1 --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --pp 128 --tg 128
...
2026-02-20 15:24:42.553 | INFO | __main__:main:340 - Planning phase: checking downloads...
2026-02-20 15:24:42.625 | INFO | harness:run_planning_phase:402 - Started download on 12D3KooWKx41iikn188ozrxSdoG26g88jFCfie9wEA1eQR8csbPm
2026-02-20 15:25:37.494 | INFO | __main__:main:350 - Download: 54.9s (freshly downloaded)
...
Wrote results JSON: bench/results.json
jake@maverick:/data/users/jake/repos/exo/ > cat bench/results.json
[
{
"elapsed_s": 1.500349276990164,
"output_text_preview": "It seems like you've entered a large number of 'a's. If you'd like to discuss something or ask a question, I'm here to help. If not, is there anything else I can assist you with? \n\nIf you're intereste",
"stats": {
"prompt_tps": 395.43264952543666,
"generation_tps": 128.03520443181478,
"prompt_tokens": 129,
"generation_tokens": 128,
"peak_memory_usage": {
"inBytes": 5116952079
}
},
"model_short_id": "Meta-Llama-3.1-8B-Instruct-4bit",
"model_id": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
"placement_sharding": "Pipeline",
"placement_instance_meta": "MlxRing",
"placement_nodes": 1,
"instance_id": "ccd9bd71-d4cc-4b75-a37f-98090544626a",
"pp_tokens": 128,
"tg": 128,
"repeat_index": 0,
"download_duration_s": 54.88322358299047
}
]%
# one new field
```