mirror of
https://github.com/exo-explore/exo.git
synced 2026-02-20 07:46:42 -05:00
- Task cancellation. When an API HTTP request is cancelled, the corresponding task should be cancelled too.
- I'd like to see profiled network latency / bandwidth.
- I'd like to see how much bandwidth each link is using.
- Solve the continuous batching problem where a new incoming prompt blocks decode of the current batch until its prefill is complete.
- We want people to be able to copy models over to a new device without ever connecting EXO to the internet. Right now EXO requires an internet connection once to cache some files that it uses to check whether a download is complete. Instead, we should simply check whether there is a non-empty model folder locally with no .partial files. This indicates a fully downloaded model that can be loaded.
- Memory pressure instead of memory used.
- Show the type of each connection (TB5, Ethernet, etc.) in the UI. Refer to old exo: 56f783b38d/exo/helpers.py (L251)
- Prioritise certain connection types (or by latency). TB5 > Ethernet > WiFi. Refer to old exo: 56f783b38d/exo/helpers.py (L251)
- Dynamically switch to higher priority connection when it becomes available. Probably bring back InstanceReplacedAtomically.
- Faster model loads by streaming model from other devices in cluster.
- Add support for specifying the type of network connection to use in a test. Depends on 15/16.
- Rethink retry logic
- Log cleanup - per-module log filters and default to DEBUG log levels
- Validate RDMA connections with ibv_devinfo in the info gatherer
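
The offline model check described above (a non-empty local model folder with no .partial files) could look roughly like the sketch below. The function name `is_model_fully_downloaded` and the recursive directory scan are assumptions made for illustration, not existing EXO code.

```python
from pathlib import Path


def is_model_fully_downloaded(model_dir: Path) -> bool:
    """Hypothetical helper: True if model_dir looks like a complete local copy.

    "Complete" here means only what the TODO item states: the folder exists,
    contains at least one file, and none of its files end in ".partial".
    """
    if not model_dir.is_dir():
        return False
    # Walk the folder recursively so nested shards are checked as well
    # (an assumption; a flat scan may be enough in practice).
    files = [p for p in model_dir.rglob("*") if p.is_file()]
    if not files:
        return False
    return not any(p.name.endswith(".partial") for p in files)
```

This needs no network access, so a model folder copied over by hand would pass as long as no interrupted download left a .partial file behind.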
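
For the connection priority item (TB5 > Ethernet > WiFi, or by latency), one possible reading is to rank links by connection type first and break ties by measured latency. The names `ConnectionType`, `Link`, and `pick_link` below are hypothetical and are not taken from the old exo `helpers.py`.

```python
from dataclasses import dataclass
from enum import IntEnum


class ConnectionType(IntEnum):
    # Lower value means higher priority: TB5 > Ethernet > WiFi.
    TB5 = 0
    ETHERNET = 1
    WIFI = 2


@dataclass(frozen=True)
class Link:
    peer: str                # hypothetical identifier of the remote node
    kind: ConnectionType
    latency_ms: float        # assumed to come from a latency profiler


def pick_link(links: list[Link]) -> Link:
    """Pick the preferred link: best connection type, then lowest latency."""
    if not links:
        raise ValueError("no links available")
    return min(links, key=lambda link: (link.kind, link.latency_ms))
```

Re-running `pick_link` whenever the set of available links changes is one simple way to approach the dynamic switching item, although replacing a running instance atomically is a separate problem.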