Files
exo/bench
Jake Hillion 131fb141a6 bench: add --danger-delete-downloads flag with planning phase
exo bench previously relied on the worker's plan loop to download
models, which could fail silently or run into disk space issues during
benchmarking. This made it difficult to diagnose download failures.

Added a planning phase that runs before benchmarking to explicitly
handle downloads. It checks available disk space on each node via the
/state endpoint and starts downloads via POST /download/start. When
the --danger-delete-downloads flag is set and there's insufficient
space, it deletes existing models from smallest to largest until
there's room for the benchmark model.

Test plan:
- CI

```
jake@maverick:/data/users/jake/repos/exo/ > nix run .#exo-bench -- --pp 128,2048,4096 --tg 128 --stdout --settle-timeout 10 --host s1 --model mlx-community/gpt-oss-120b-MXFP4-Q8
PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2026-02-16 12:12:11.807 | INFO     | __main__:main:710 - pp/tg mode: combinations (product) - 3 pairs
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
2026-02-16 12:12:13.455 | DEBUG    | __main__:main:725 - [exo-bench] loaded tokenizer: mlx-community/gpt-oss-120b-MXFP4-Q8 for prompt sizer
2026-02-16 12:12:13.473 | DEBUG    | __main__:main:761 - exo-bench model: short_id=gpt-oss-120b-MXFP4-Q8 full_id=mlx-community/gpt-oss-120b-MXFP4-Q8
2026-02-16 12:12:13.473 | INFO     | __main__:main:762 - placements: 1
2026-02-16 12:12:13.474 | INFO     | __main__:main:764 -   - Pipeline / MlxRing / nodes=1
2026-02-16 12:12:13.474 | INFO     | __main__:main:771 - Planning phase: checking downloads...
Traceback (most recent call last):
  File "/nix/store/q31kmbcfr5bf97290bvbnhrvpc3fv824-source/bench/exo_bench.py", line 885, in <module>
    raise SystemExit(main())
                     ~~~~^^
  File "/nix/store/q31kmbcfr5bf97290bvbnhrvpc3fv824-source/bench/exo_bench.py", line 772, in main
    run_planning_phase(
    ~~~~~~~~~~~~~~~~~~^
        client,
        ^^^^^^^
    ...<4 lines>...
        settle_deadline,
        ^^^^^^^^^^^^^^^^
    )
    ^
  File "/nix/store/q31kmbcfr5bf97290bvbnhrvpc3fv824-source/bench/exo_bench.py", line 367, in run_planning_phase
    raise RuntimeError(
    ...<2 lines>...
    )
RuntimeError: Insufficient disk on 12D3KooWE2C7dzC9d9YJMEfWK3g8og7JdZj3HHXZ8VmGrXYAEnEj: need 65GB, have 55GB. Use --danger-delete-downloads to free space.
jake@maverick:/data/users/jake/repos/exo/ > nix run .#exo-bench -- --pp 128,2048,4096 --tg 128 --stdout --settle-timeout 10 --host s1 --model mlx-community/gpt-oss-120b-MXFP4-Q8 --danger-delete-downloads
PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2026-02-16 12:12:19.626 | INFO     | __main__:main:710 - pp/tg mode: combinations (product) - 3 pairs
2026-02-16 12:12:21.262 | DEBUG    | __main__:main:725 - [exo-bench] loaded tokenizer: mlx-community/gpt-oss-120b-MXFP4-Q8 for prompt sizer
2026-02-16 12:12:21.280 | DEBUG    | __main__:main:761 - exo-bench model: short_id=gpt-oss-120b-MXFP4-Q8 full_id=mlx-community/gpt-oss-120b-MXFP4-Q8
2026-02-16 12:12:21.280 | INFO     | __main__:main:762 - placements: 1
2026-02-16 12:12:21.280 | INFO     | __main__:main:764 -   - Pipeline / MlxRing / nodes=1
2026-02-16 12:12:21.280 | INFO     | __main__:main:771 - Planning phase: checking downloads...
2026-02-16 12:12:21.336 | INFO     | __main__:run_planning_phase:386 - Deleting mlx-community/Qwen3-0.6B-4bit from 12D3KooWE2C7dzC9d9YJMEfWK3g8og7JdZj3HHXZ8VmGrXYAEnEj (335MB)
2026-02-16 12:12:21.350 | INFO     | __main__:run_planning_phase:386 - Deleting mlx-community/Llama-3.2-1B-Instruct-4bit from 12D3KooWE2C7dzC9d9YJMEfWK3g8og7JdZj3HHXZ8VmGrXYAEnEj (679MB)
2026-02-16 12:12:21.363 | INFO     | __main__:run_planning_phase:386 - Deleting mlx-community/Llama-3.2-3B-Instruct-4bit from 12D3KooWE2C7dzC9d9YJMEfWK3g8og7JdZj3HHXZ8VmGrXYAEnEj (1740MB)
2026-02-16 12:12:21.373 | INFO     | __main__:run_planning_phase:386 - Deleting mlx-community/Llama-3.2-3B-Instruct-8bit from 12D3KooWE2C7dzC9d9YJMEfWK3g8og7JdZj3HHXZ8VmGrXYAEnEj (3264MB)
2026-02-16 12:12:21.384 | INFO     | __main__:run_planning_phase:386 - Deleting mlx-community/GLM-4.7-Flash-8bit from 12D3KooWE2C7dzC9d9YJMEfWK3g8og7JdZj3HHXZ8VmGrXYAEnEj (30366MB)
2026-02-16 12:12:21.413 | INFO     | __main__:run_planning_phase:407 - Started download on 12D3KooWE2C7dzC9d9YJMEfWK3g8og7JdZj3HHXZ8VmGrXYAEnEj
```

It's not pretty but it works!
2026-02-16 13:06:38 +00:00
..