Jake Hillion d484b062e8 bench: add download timing to bench output (#1566)
The bench script downloads models during the planning phase but doesn't
record how long the download took, making it difficult to track download
performance for a given model over time.

Modified `run_planning_phase` to return download metadata: whether a
fresh download occurred, the wall-clock duration, and the model size in
bytes. Fields that are set are included in the JSON output row alongside
the existing per-run metrics, and a summary line is logged to the
console.

This allows filtering bench results by `download_occurred` and grouping
by `model_id` to compute average download times across runs.
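The metadata shape described above can be sketched roughly as follows. This is illustrative, not the actual harness code: the field names (`download_occurred`, `download_duration_s`, `model_size_bytes`) follow the prose and the sample output below, and the `timed_download` helper is a hypothetical stand-in for the timing added inside `run_planning_phase`:

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class DownloadMeta:
    """Download metadata attached to a bench result row."""
    download_occurred: bool      # False when the model was already cached
    download_duration_s: float   # wall-clock seconds spent in the download step
    model_size_bytes: int        # size of the model on disk, in bytes

def timed_download(already_cached: bool, download: Callable[[], int]) -> DownloadMeta:
    """Run `download` (which returns the model size in bytes) under a
    monotonic wall clock and package the result as DownloadMeta.

    For an already-cached model the download step is effectively a no-op,
    so the recorded duration is near zero.
    """
    start = time.monotonic()
    size = download()
    duration = time.monotonic() - start
    return DownloadMeta(
        download_occurred=not already_cached,
        download_duration_s=duration,
        model_size_bytes=size,
    )
```

`time.monotonic()` is used rather than `time.time()` so the duration is immune to system clock adjustments during a long download.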

Test plan:

```
# existing model
jake@maverick:/data/users/jake/repos/exo/ > nix run .#exo-bench -- --host s1 --model mlx-community/gpt-oss-120b-MXFP4-Q8 --pp 128 --tg 128
...
2026-02-20 15:23:49.081 | INFO     | __main__:main:340 - Planning phase: checking downloads...
2026-02-20 15:23:49.152 | INFO     | harness:run_planning_phase:402 - Started download on 12D3KooWKx41iikn188ozrxSdoG26g88jFCfie9wEA1eQR8csbPm
2026-02-20 15:23:49.184 | INFO     | __main__:main:352 - Download: model already cached
...
Wrote results JSON: bench/results.json
jake@maverick:/data/users/jake/repos/exo/ > cat bench/results.json
[
  {
    "elapsed_s": 2.9446684420108795,
    "output_text_preview": "The user just typed a long series of \"a\". Possibly they are testing. There's no explicit question. Could be they want a response? Might be a test of handling long input. We can respond politely, ask i",
    "stats": {
      "prompt_tps": 117.7872141515621,
      "generation_tps": 85.49598231498028,
      "prompt_tokens": 129,
      "generation_tokens": 128,
      "peak_memory_usage": {
        "inBytes": 68215145744
      }
    },
    "model_short_id": "gpt-oss-120b-MXFP4-Q8",
    "model_id": "mlx-community/gpt-oss-120b-MXFP4-Q8",
    "placement_sharding": "Pipeline",
    "placement_instance_meta": "MlxRing",
    "placement_nodes": 1,
    "instance_id": "68babc2a-6e94-4c70-aa07-7ec681f7c856",
    "pp_tokens": 128,
    "tg": 128,
    "repeat_index": 0
  }
]%
# no change to output
```

```
# missing model
jake@maverick:/data/users/jake/repos/exo/ > nix run .#exo-bench -- --host s1 --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --pp 128 --tg 128
...
2026-02-20 15:24:42.553 | INFO     | __main__:main:340 - Planning phase: checking downloads...
2026-02-20 15:24:42.625 | INFO     | harness:run_planning_phase:402 - Started download on 12D3KooWKx41iikn188ozrxSdoG26g88jFCfie9wEA1eQR8csbPm
2026-02-20 15:25:37.494 | INFO     | __main__:main:350 - Download: 54.9s (freshly downloaded)
...
Wrote results JSON: bench/results.json
jake@maverick:/data/users/jake/repos/exo/ > cat bench/results.json
[
  {
    "elapsed_s": 1.500349276990164,
    "output_text_preview": "It seems like you've entered a large number of 'a's. If you'd like to discuss something or ask a question, I'm here to help. If not, is there anything else I can assist you with? \n\nIf you're intereste",
    "stats": {
      "prompt_tps": 395.43264952543666,
      "generation_tps": 128.03520443181478,
      "prompt_tokens": 129,
      "generation_tokens": 128,
      "peak_memory_usage": {
        "inBytes": 5116952079
      }
    },
    "model_short_id": "Meta-Llama-3.1-8B-Instruct-4bit",
    "model_id": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
    "placement_sharding": "Pipeline",
    "placement_instance_meta": "MlxRing",
    "placement_nodes": 1,
    "instance_id": "ccd9bd71-d4cc-4b75-a37f-98090544626a",
    "pp_tokens": 128,
    "tg": 128,
    "repeat_index": 0,
    "download_duration_s": 54.88322358299047
  }
]%
# one new field
```
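The grouping described earlier can be done with a few lines over `bench/results.json`. A minimal sketch, assuming the row shape shown above, where cached runs simply omit `download_duration_s` (the `avg_download_times` helper is illustrative, not part of the repo):

```python
import json
from collections import defaultdict

def avg_download_times(rows: list[dict]) -> dict[str, float]:
    """Average download_duration_s per model_id, over rows where a fresh
    download occurred. Cached runs carry no download_duration_s field
    (see the first test-plan output above) and are skipped."""
    samples: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        if "download_duration_s" in row:
            samples[row["model_id"]].append(row["download_duration_s"])
    return {model: sum(v) / len(v) for model, v in samples.items()}

# Typical use: rows = json.load(open("bench/results.json"))
rows = [
    {"model_id": "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
     "download_duration_s": 54.88},
    {"model_id": "mlx-community/gpt-oss-120b-MXFP4-Q8"},  # cached run, no field
]
print(avg_download_times(rows))
# → {'mlx-community/Meta-Llama-3.1-8B-Instruct-4bit': 54.88}
```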