exo/TODO.md
Evan Quiney 8314a2aa78 cleaning up the todos (#1406)
kinda closes #1400 ( a bit )
2026-02-10 12:35:29 +00:00


  1. Task cancellation. When an API HTTP request is cancelled, the corresponding task should be cancelled too.
  2. I'd like to see profiled network latency / bandwidth.
  3. I'd like to see how much bandwidth each link is using.
  4. Solve the continuous batching problem where a newly arriving prompt blocks decode of the current batch until its prefill is complete.
  5. We want people to be able to copy models over to a new device without ever connecting EXO to the internet. Right now EXO requires an internet connection once to cache some files used to check whether a download is complete. Instead, we should simply check whether there is a non-empty local model folder with no .partial files. This indicates a fully downloaded model that can be loaded.
  6. Use memory pressure instead of memory used.
  7. Show the type of each connection (TB5, Ethernet, etc.) in the UI. Refer to old exo: 56f783b38d/exo/helpers.py (L251)
  8. Prioritise certain connection types (or by latency). TB5 > Ethernet > WiFi. Refer to old exo: 56f783b38d/exo/helpers.py (L251)
  9. Dynamically switch to higher priority connection when it becomes available. Probably bring back InstanceReplacedAtomically.
  10. Faster model loads by streaming the model from other devices in the cluster.
  11. Add support for specifying the type of network connection to use in a test. Depends on 15/16.
  12. Rethink retry logic.
  13. Log cleanup: per-module log filters and default to DEBUG log levels.
  14. Validate RDMA connections with ibv_devinfo in the info gatherer.
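
The offline check described in item 5 could be sketched roughly as below. This is only an illustration under stated assumptions: `is_model_fully_downloaded` is a hypothetical helper name, not an existing EXO function, and it assumes in-progress downloads are marked purely by a `.partial` filename suffix anywhere under the model folder.

```python
from pathlib import Path


def is_model_fully_downloaded(model_dir: Path) -> bool:
    """Return True if model_dir is a non-empty folder with no .partial files.

    Hypothetical helper for illustration; not part of EXO's actual API.
    """
    if not model_dir.is_dir():
        return False
    # Walk the folder recursively, since model files may sit in subfolders.
    files = [p for p in model_dir.rglob("*") if p.is_file()]
    if not files:
        # An empty folder is treated as "not downloaded".
        return False
    # Any leftover .partial file means a download did not finish.
    return not any(p.name.endswith(".partial") for p in files)
```

A check like this needs no network access, so a model folder copied from another machine would be accepted as long as it contains files and none of them end in `.partial`.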
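
For item 8, one possible shape for the prioritisation is sketched below. The names `ConnectionType`, `Link`, and `pick_link` are hypothetical and do not come from the EXO codebase. The sketch also assumes that connection type is compared first (TB5 > Ethernet > WiFi) and that measured latency only breaks ties, which is one reading of "or by latency" rather than a settled design.

```python
from dataclasses import dataclass
from enum import IntEnum


class ConnectionType(IntEnum):
    # Lower value means higher priority: TB5 > Ethernet > WiFi.
    TB5 = 0
    ETHERNET = 1
    WIFI = 2


@dataclass(frozen=True)
class Link:
    # Hypothetical description of one link between two devices.
    name: str
    kind: ConnectionType
    latency_ms: float


def pick_link(links: list[Link]) -> Link:
    """Pick the preferred link: best connection type first, then lowest latency."""
    if not links:
        raise ValueError("no links available")
    return min(links, key=lambda link: (link.kind, link.latency_ms))
```

Re-running `pick_link` whenever the set of available links changes would be one simple way to approach item 9 as well, although replacing a running instance atomically is a separate problem that this sketch does not cover.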