working on it

fix instance port assignment (#1268)
we were over-assigning port 52414 to instances because of an error in placement
2026-01-23 21:41:21 -05:00 · 2026-01-24 02:39:38 +00:00 · 2026-01-23 18:37:40 +00:00 · 2026-01-23 18:11:17 +00:00 · 2026-01-23 18:04:09 +00:00 · 2026-01-23 16:33:01 +00:00
32 changed files with 845 additions and 360 deletions
--- a/MISSED_THINGS.md
+++ b/MISSED_THINGS.md
@@ -5,16 +5,16 @@
 [X] Fetching download status of all models on start
 [X] Deduplication of tasks in plan_step.
 [X] resolve_allow_patterns should just be wildcard now.
-[] no mx_barrier in genreate.py mlx_generate at the end.
+[X] no mx_barrier in genreate.py mlx_generate at the end.
 [] cache assertion not needed in auto_parallel.py PipelineLastLayer.
-[] GPTOSS support dropped in auto_parallel.py.
-[] sharding changed "all-to-sharded" became _all_to_sharded in auto_parallel.py.
-[] same as above with "sharded-to-all" became _sharded_to_all in auto_parallel.py.
-[] Dropped support for Ministral3Model, DeepseekV32Model, Glm4MoeModel, Qwen3NextModel, GptOssMode in auto_parallel.py.
+[X] GPTOSS support dropped in auto_parallel.py.
+[X] sharding changed "all-to-sharded" became _all_to_sharded in auto_parallel.py.
+[X] same as above with "sharded-to-all" became _sharded_to_all in auto_parallel.py.
+[X] Dropped support for Ministral3Model, DeepseekV32Model, Glm4MoeModel, Qwen3NextModel, GptOssMode in auto_parallel.py.
 [] Dropped prefill/decode code in auto_parallel.py and utils_mlx.py.
 [X] KV_CACHE_BITS should be None to disable quantized KV cache.
-[] Dropped _set_nofile_limit in utils_mlx.py.
-[] We have group optional in load_mlx_items in utils_mlx.py.
+[X] Dropped _set_nofile_limit in utils_mlx.py.
+[X] We have group optional in load_mlx_items in utils_mlx.py.
 [] Dropped add_missing_chat_templates for GptOss in load_mlx_items in utils_mlx.py.
 [] Dropped model.make_cache in make_kv_cache in utils_mlx.py.
 [X] We put cache limit back in utils_mlx.py.
--- a/dashboard/src/lib/stores/app.svelte.ts
+++ b/dashboard/src/lib/stores/app.svelte.ts
@@ -216,6 +216,8 @@ export interface Message {
  attachments?: MessageAttachment[];
  ttftMs?: number; // Time to first token in ms (for assistant messages)
  tps?: number; // Tokens per second (for assistant messages)
+  requestType?: "chat" | "image-generation" | "image-editing";
+  sourceImageDataUrl?: string; // For image editing regeneration
 }

 export interface Conversation {
@@ -1270,10 +1272,46 @@ class AppStore {

    if (lastUserIndex === -1) return;

-    // Remove any messages after the user message
-    this.messages = this.messages.slice(0, lastUserIndex + 1);
+    const lastUserMessage = this.messages[lastUserIndex];
+    const requestType = lastUserMessage.requestType || "chat";
+    const prompt = lastUserMessage.content;

-    // Resend the message to get a new response
+    // Remove messages after user message (including the user message for image requests
+    // since generateImage/editImage will re-add it)
+    this.messages = this.messages.slice(0, lastUserIndex);
+
+    switch (requestType) {
+      case "image-generation":
+        await this.generateImage(prompt);
+        break;
+      case "image-editing":
+        if (lastUserMessage.sourceImageDataUrl) {
+          await this.editImage(prompt, lastUserMessage.sourceImageDataUrl);
+        } else {
+          // Can't regenerate edit without source image - restore user message and show error
+          this.messages.push(lastUserMessage);
+          const errorMessage = this.addMessage("assistant", "");
+          const idx = this.messages.findIndex((m) => m.id === errorMessage.id);
+          if (idx !== -1) {
+            this.messages[idx].content =
+              "Error: Cannot regenerate image edit - source image not found";
+          }
+          this.updateActiveConversation();
+        }
+        break;
+      case "chat":
+      default:
+        // Restore the user message for chat regeneration
+        this.messages.push(lastUserMessage);
+        await this.regenerateChatCompletion();
+        break;
+    }
+  }
+
+  /**
+   * Helper method to regenerate a chat completion response
+   */
+  private async regenerateChatCompletion(): Promise<void> {
    this.isLoading = true;
    this.currentResponse = "";

@@ -1788,6 +1826,7 @@ class AppStore {
      role: "user",
      content: prompt,
      timestamp: Date.now(),
+      requestType: "image-generation",
    };
    this.messages.push(userMessage);

@@ -1998,6 +2037,8 @@ class AppStore {
      role: "user",
      content: prompt,
      timestamp: Date.now(),
+      requestType: "image-editing",
+      sourceImageDataUrl: imageDataUrl,
    };
    this.messages.push(userMessage);

@@ -2187,6 +2228,54 @@ class AppStore {
      this.conversations.find((c) => c.id === this.activeConversationId) || null
    );
  }
+
+  /**
+   * Start a download on a specific node
+   */
+  async startDownload(nodeId: string, shardMetadata: object): Promise<void> {
+    try {
+      const response = await fetch("/download/start", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({
+          targetNodeId: nodeId,
+          shardMetadata: shardMetadata,
+        }),
+      });
+      if (!response.ok) {
+        const errorText = await response.text();
+        throw new Error(
+          `Failed to start download: ${response.status} - ${errorText}`,
+        );
+      }
+    } catch (error) {
+      console.error("Error starting download:", error);
+      throw error;
+    }
+  }
+
+  /**
+   * Delete a downloaded model from a specific node
+   */
+  async deleteDownload(nodeId: string, modelId: string): Promise<void> {
+    try {
+      const response = await fetch(
+        `/download/${encodeURIComponent(nodeId)}/${encodeURIComponent(modelId)}`,
+        {
+          method: "DELETE",
+        },
+      );
+      if (!response.ok) {
+        const errorText = await response.text();
+        throw new Error(
+          `Failed to delete download: ${response.status} - ${errorText}`,
+        );
+      }
+    } catch (error) {
+      console.error("Error deleting download:", error);
+      throw error;
+    }
+  }
 }

 export const appStore = new AppStore();
@@ -2292,3 +2381,9 @@ export const setImageGenerationParams = (
 ) => appStore.setImageGenerationParams(params);
 export const resetImageGenerationParams = () =>
  appStore.resetImageGenerationParams();
+
+// Download actions
+export const startDownload = (nodeId: string, shardMetadata: object) =>
+  appStore.startDownload(nodeId, shardMetadata);
+export const deleteDownload = (nodeId: string, modelId: string) =>
+  appStore.deleteDownload(nodeId, modelId);
--- a/dashboard/src/routes/downloads/+page.svelte
+++ b/dashboard/src/routes/downloads/+page.svelte
@@ -6,6 +6,8 @@
    type DownloadProgress,
    refreshState,
    lastUpdate as lastUpdateStore,
+    startDownload,
+    deleteDownload,
  } from "$lib/stores/app.svelte";
  import HeaderNav from "$lib/components/HeaderNav.svelte";

@@ -28,6 +30,7 @@
    etaMs: number;
    status: "completed" | "downloading";
    files: FileProgress[];
+    shardMetadata?: Record<string, unknown>;
  };

  type NodeEntry = {
@@ -269,6 +272,12 @@
            }
          }

+          // Extract shard_metadata for use with download actions
+          const shardMetadata = (downloadPayload.shard_metadata ??
+            downloadPayload.shardMetadata) as
+            | Record<string, unknown>
+            | undefined;
+
          const entry: ModelEntry = {
            modelId,
            prettyName,
@@ -285,6 +294,7 @@
                ? "completed"
                : "downloading",
            files,
+            shardMetadata,
          };

          const existing = modelMap.get(modelId);
@@ -469,6 +479,52 @@
                    >
                      {pct.toFixed(1)}%
                    </span>
+                    {#if model.status !== "completed" && model.shardMetadata}
+                      <button
+                        type="button"
+                        class="text-exo-light-gray hover:text-exo-yellow transition-colors"
+                        onclick={() =>
+                          startDownload(node.nodeId, model.shardMetadata!)}
+                        title="Start download"
+                      >
+                        <svg
+                          class="w-4 h-4"
+                          viewBox="0 0 20 20"
+                          fill="none"
+                          stroke="currentColor"
+                          stroke-width="2"
+                        >
+                          <path
+                            d="M10 3v10m0 0l-3-3m3 3l3-3M3 17h14"
+                            stroke-linecap="round"
+                            stroke-linejoin="round"
+                          ></path>
+                        </svg>
+                      </button>
+                    {/if}
+                    {#if model.status === "completed"}
+                      <button
+                        type="button"
+                        class="text-exo-light-gray hover:text-red-400 transition-colors"
+                        onclick={() =>
+                          deleteDownload(node.nodeId, model.modelId)}
+                        title="Delete download"
+                      >
+                        <svg
+                          class="w-4 h-4"
+                          viewBox="0 0 20 20"
+                          fill="none"
+                          stroke="currentColor"
+                          stroke-width="2"
+                        >
+                          <path
+                            d="M4 6h12M8 6V4h4v2m1 0v10a1 1 0 01-1 1H8a1 1 0 01-1-1V6h6"
+                            stroke-linecap="round"
+                            stroke-linejoin="round"
+                          ></path>
+                        </svg>
+                      </button>
+                    {/if}
                    <button
                      type="button"
                      class="text-exo-light-gray hover:text-exo-yellow transition-colors"
--- a/src/exo/download/coordinator.py
+++ b/src/exo/download/coordinator.py
@@ -0,0 +1,284 @@
+import asyncio
+from dataclasses import dataclass, field
+from typing import Iterator
+
+import anyio
+from anyio import current_time
+from anyio.abc import TaskGroup
+from loguru import logger
+
+from exo.download.download_utils import (
+    RepoDownloadProgress,
+    delete_model,
+    map_repo_download_progress_to_download_progress_data,
+)
+from exo.download.shard_downloader import ShardDownloader
+from exo.shared.models.model_cards import ModelId
+from exo.shared.types.commands import (
+    DeleteDownload,
+    ForwarderDownloadCommand,
+    StartDownload,
+)
+from exo.shared.types.common import NodeId, SessionId
+from exo.shared.types.events import (
+    Event,
+    ForwarderEvent,
+    NodeDownloadProgress,
+)
+from exo.shared.types.worker.downloads import (
+    DownloadCompleted,
+    DownloadFailed,
+    DownloadOngoing,
+    DownloadPending,
+    DownloadProgress,
+)
+from exo.shared.types.worker.shards import ShardMetadata
+from exo.utils.channels import Receiver, Sender, channel
+
+
+@dataclass
+class DownloadCoordinator:
+    node_id: NodeId
+    session_id: SessionId
+    shard_downloader: ShardDownloader
+    download_command_receiver: Receiver[ForwarderDownloadCommand]
+    local_event_sender: Sender[ForwarderEvent]
+    event_index_counter: Iterator[int]
+
+    # Local state
+    download_status: dict[ModelId, DownloadProgress] = field(default_factory=dict)
+    active_downloads: dict[ModelId, asyncio.Task[None]] = field(default_factory=dict)
+
+    # Internal event channel for forwarding (initialized in __post_init__)
+    event_sender: Sender[Event] = field(init=False)
+    event_receiver: Receiver[Event] = field(init=False)
+    _tg: TaskGroup = field(init=False)
+
+    def __post_init__(self) -> None:
+        self.event_sender, self.event_receiver = channel[Event]()
+        self._tg = anyio.create_task_group()
+
+    async def run(self) -> None:
+        logger.info("Starting DownloadCoordinator")
+        async with self._tg as tg:
+            tg.start_soon(self._command_processor)
+            tg.start_soon(self._forward_events)
+            tg.start_soon(self._emit_existing_download_progress)
+
+    def shutdown(self) -> None:
+        self._tg.cancel_scope.cancel()
+
+    async def _command_processor(self) -> None:
+        with self.download_command_receiver as commands:
+            async for cmd in commands:
+                # Only process commands targeting this node
+                if cmd.command.target_node_id != self.node_id:
+                    continue
+
+                match cmd.command:
+                    case StartDownload(shard_metadata=shard):
+                        await self._start_download(shard)
+                    case DeleteDownload(model_id=model_id):
+                        await self._delete_download(model_id)
+
+    async def _start_download(self, shard: ShardMetadata) -> None:
+        model_id = shard.model_card.model_id
+
+        # Check if already downloading or complete
+        if model_id in self.download_status:
+            status = self.download_status[model_id]
+            if isinstance(status, (DownloadOngoing, DownloadCompleted)):
+                logger.debug(
+                    f"Download for {model_id} already in progress or complete, skipping"
+                )
+                return
+
+        # Emit pending status
+        progress = DownloadPending(shard_metadata=shard, node_id=self.node_id)
+        self.download_status[model_id] = progress
+        await self.event_sender.send(NodeDownloadProgress(download_progress=progress))
+
+        # Check initial status from downloader
+        initial_progress = (
+            await self.shard_downloader.get_shard_download_status_for_shard(shard)
+        )
+
+        if initial_progress.status == "complete":
+            completed = DownloadCompleted(
+                shard_metadata=shard,
+                node_id=self.node_id,
+                total_bytes=initial_progress.total_bytes,
+            )
+            self.download_status[model_id] = completed
+            await self.event_sender.send(
+                NodeDownloadProgress(download_progress=completed)
+            )
+            return
+
+        # Start actual download
+        self._start_download_task(shard, initial_progress)
+
+    def _start_download_task(
+        self, shard: ShardMetadata, initial_progress: RepoDownloadProgress
+    ) -> None:
+        model_id = shard.model_card.model_id
+
+        # Emit ongoing status
+        status = DownloadOngoing(
+            node_id=self.node_id,
+            shard_metadata=shard,
+            download_progress=map_repo_download_progress_to_download_progress_data(
+                initial_progress
+            ),
+        )
+        self.download_status[model_id] = status
+        self.event_sender.send_nowait(NodeDownloadProgress(download_progress=status))
+
+        last_progress_time = 0.0
+        throttle_interval_secs = 1.0
+
+        async def download_progress_callback(
+            callback_shard: ShardMetadata, progress: RepoDownloadProgress
+        ) -> None:
+            nonlocal last_progress_time
+
+            if progress.status == "complete":
+                completed = DownloadCompleted(
+                    shard_metadata=callback_shard,
+                    node_id=self.node_id,
+                    total_bytes=progress.total_bytes,
+                )
+                self.download_status[callback_shard.model_card.model_id] = completed
+                await self.event_sender.send(
+                    NodeDownloadProgress(download_progress=completed)
+                )
+                # Clean up active download tracking
+                if callback_shard.model_card.model_id in self.active_downloads:
+                    del self.active_downloads[callback_shard.model_card.model_id]
+            elif (
+                progress.status == "in_progress"
+                and current_time() - last_progress_time > throttle_interval_secs
+            ):
+                ongoing = DownloadOngoing(
+                    node_id=self.node_id,
+                    shard_metadata=callback_shard,
+                    download_progress=map_repo_download_progress_to_download_progress_data(
+                        progress
+                    ),
+                )
+                self.download_status[callback_shard.model_card.model_id] = ongoing
+                await self.event_sender.send(
+                    NodeDownloadProgress(download_progress=ongoing)
+                )
+                last_progress_time = current_time()
+
+        self.shard_downloader.on_progress(download_progress_callback)
+
+        async def download_wrapper() -> None:
+            try:
+                await self.shard_downloader.ensure_shard(shard)
+            except Exception as e:
+                logger.error(f"Download failed for {model_id}: {e}")
+                failed = DownloadFailed(
+                    shard_metadata=shard,
+                    node_id=self.node_id,
+                    error_message=str(e),
+                )
+                self.download_status[model_id] = failed
+                await self.event_sender.send(
+                    NodeDownloadProgress(download_progress=failed)
+                )
+            finally:
+                if model_id in self.active_downloads:
+                    del self.active_downloads[model_id]
+
+        task = asyncio.create_task(download_wrapper())
+        self.active_downloads[model_id] = task
+
+    async def _delete_download(self, model_id: ModelId) -> None:
+        # Cancel if active
+        if model_id in self.active_downloads:
+            logger.info(f"Cancelling active download for {model_id} before deletion")
+            self.active_downloads[model_id].cancel()
+            del self.active_downloads[model_id]
+
+        # Delete from disk
+        logger.info(f"Deleting model files for {model_id}")
+        deleted = await delete_model(model_id)
+
+        if deleted:
+            logger.info(f"Successfully deleted model {model_id}")
+        else:
+            logger.warning(f"Model {model_id} was not found on disk")
+
+        # Emit pending status to reset UI state, then remove from local tracking
+        if model_id in self.download_status:
+            current_status = self.download_status[model_id]
+            pending = DownloadPending(
+                shard_metadata=current_status.shard_metadata,
+                node_id=self.node_id,
+            )
+            await self.event_sender.send(
+                NodeDownloadProgress(download_progress=pending)
+            )
+            del self.download_status[model_id]
+
+    async def _forward_events(self) -> None:
+        with self.event_receiver as events:
+            async for event in events:
+                idx = next(self.event_index_counter)
+                fe = ForwarderEvent(
+                    origin_idx=idx,
+                    origin=self.node_id,
+                    session=self.session_id,
+                    event=event,
+                )
+                logger.debug(
+                    f"DownloadCoordinator published event {idx}: {str(event)[:100]}"
+                )
+                await self.local_event_sender.send(fe)
+
+    async def _emit_existing_download_progress(self) -> None:
+        try:
+            while True:
+                logger.info(
+                    "DownloadCoordinator: Fetching and emitting existing download progress..."
+                )
+                async for (
+                    _,
+                    progress,
+                ) in self.shard_downloader.get_shard_download_status():
+                    if progress.status == "complete":
+                        status: DownloadProgress = DownloadCompleted(
+                            node_id=self.node_id,
+                            shard_metadata=progress.shard,
+                            total_bytes=progress.total_bytes,
+                        )
+                    elif progress.status in ["in_progress", "not_started"]:
+                        if progress.downloaded_bytes_this_session.in_bytes == 0:
+                            status = DownloadPending(
+                                node_id=self.node_id, shard_metadata=progress.shard
+                            )
+                        else:
+                            status = DownloadOngoing(
+                                node_id=self.node_id,
+                                shard_metadata=progress.shard,
+                                download_progress=map_repo_download_progress_to_download_progress_data(
+                                    progress
+                                ),
+                            )
+                    else:
+                        continue
+
+                    self.download_status[progress.shard.model_card.model_id] = status
+                    await self.event_sender.send(
+                        NodeDownloadProgress(download_progress=status)
+                    )
+                logger.info(
+                    "DownloadCoordinator: Done emitting existing download progress."
+                )
+                await anyio.sleep(5 * 60)  # 5 minutes
+        except Exception as e:
+            logger.error(
+                f"DownloadCoordinator: Error emitting existing download progress: {e}"
+            )
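
The progress callback in coordinator.py above forwards an `in_progress` update only when more than `throttle_interval_secs` has passed since the last one, while a `complete` update always goes through. A minimal standalone sketch of that pattern follows; the `ProgressThrottle` class and the injected fake clock are hypothetical and are not part of this change.

```python
from typing import Callable


class ProgressThrottle:
    """Hypothetical helper: decides whether a progress update should be emitted."""

    def __init__(self, interval_secs: float, clock: Callable[[], float]) -> None:
        self.interval_secs = interval_secs
        self.clock = clock
        # Same starting value as last_progress_time in the diff.
        self.last_emit = 0.0

    def should_emit(self, status: str) -> bool:
        # Completion is never throttled.
        if status == "complete":
            return True
        # In-progress updates pass only after the interval has elapsed.
        if (
            status == "in_progress"
            and self.clock() - self.last_emit > self.interval_secs
        ):
            self.last_emit = self.clock()
            return True
        return False


# A fake clock keeps the example deterministic (assumption: the real code
# reads anyio.current_time() instead).
now = [10.0]
throttle = ProgressThrottle(1.0, lambda: now[0])

emitted: list[tuple[float, str]] = []
updates = [
    (10.0, "in_progress"),
    (10.5, "in_progress"),
    (11.2, "in_progress"),
    (11.3, "complete"),
]
for t, status in updates:
    now[0] = t
    if throttle.should_emit(status):
        emitted.append((t, status))

print(emitted)
```

In this run the update at 10.5 is dropped because it arrives within one second of the previous emit. The final `complete` is emitted even though it follows the 11.2 update almost immediately.
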
--- a/src/exo/worker/download/download_utils.py
+++ b/src/exo/worker/download/download_utils.py
@@ -24,6 +24,13 @@ from pydantic import (
    TypeAdapter,
 )

+from exo.download.huggingface_utils import (
+    filter_repo_objects,
+    get_allow_patterns,
+    get_auth_headers,
+    get_hf_endpoint,
+    get_hf_token,
+)
 from exo.shared.constants import EXO_MODELS_DIR
 from exo.shared.types.common import ModelId
 from exo.shared.types.memory import Memory
@@ -35,13 +42,6 @@ from exo.shared.types.worker.downloads import (
    RepoFileDownloadProgress,
 )
 from exo.shared.types.worker.shards import ShardMetadata
-from exo.worker.download.huggingface_utils import (
-    filter_repo_objects,
-    get_allow_patterns,
-    get_auth_headers,
-    get_hf_endpoint,
-    get_hf_token,
-)


 class HuggingFaceAuthenticationError(Exception):
--- a/src/exo/worker/download/huggingface_utils.py
+++ b/src/exo/worker/download/huggingface_utils.py
--- a/src/exo/worker/download/impl_shard_downloader.py
+++ b/src/exo/worker/download/impl_shard_downloader.py
@@ -5,13 +5,13 @@ from typing import AsyncIterator, Callable

 from loguru import logger

+from exo.download.download_utils import RepoDownloadProgress, download_shard
+from exo.download.shard_downloader import ShardDownloader
 from exo.shared.models.model_cards import MODEL_CARDS, ModelCard, ModelId
 from exo.shared.types.worker.shards import (
    PipelineShardMetadata,
    ShardMetadata,
 )
-from exo.worker.download.download_utils import RepoDownloadProgress, download_shard
-from exo.worker.download.shard_downloader import ShardDownloader


 def exo_shard_downloader(max_parallel_downloads: int = 8) -> ShardDownloader:
--- a/src/exo/worker/download/shard_downloader.py
+++ b/src/exo/worker/download/shard_downloader.py
@@ -5,13 +5,13 @@ from datetime import timedelta
 from pathlib import Path
 from typing import AsyncIterator, Callable

+from exo.download.download_utils import RepoDownloadProgress
 from exo.shared.models.model_cards import ModelCard, ModelId, ModelTask
 from exo.shared.types.memory import Memory
 from exo.shared.types.worker.shards import (
    PipelineShardMetadata,
    ShardMetadata,
 )
-from exo.worker.download.download_utils import RepoDownloadProgress


 # TODO: the PipelineShardMetadata getting reinstantiated is a bit messy. Should this be a classmethod?
--- a/src/exo/main.py
+++ b/src/exo/main.py
@@ -1,10 +1,11 @@
 import argparse
+import itertools
 import multiprocessing as mp
 import os
 import resource
 import signal
 from dataclasses import dataclass, field
-from typing import Self
+from typing import Iterator, Self

 import anyio
 from anyio.abc import TaskGroup
@@ -12,6 +13,8 @@ from loguru import logger
 from pydantic import PositiveInt

 import exo.routing.topics as topics
+from exo.download.coordinator import DownloadCoordinator
+from exo.download.impl_shard_downloader import exo_shard_downloader
 from exo.master.api import API  # TODO: should API be in master?
 from exo.master.main import Master
 from exo.routing.router import Router, get_node_id_keypair
@@ -21,7 +24,6 @@ from exo.shared.logging import logger_cleanup, logger_setup
 from exo.shared.types.common import NodeId, SessionId
 from exo.utils.channels import Receiver, channel
 from exo.utils.pydantic_ext import CamelCaseModel
-from exo.worker.download.impl_shard_downloader import exo_shard_downloader
 from exo.worker.main import Worker


@@ -29,6 +31,7 @@ from exo.worker.main import Worker
 @dataclass
 class Node:
    router: Router
+    download_coordinator: DownloadCoordinator | None
    worker: Worker | None
    election: Election  # Every node participates in election, as we do want a node to become master even if it isn't a master candidate if no master candidates are present.
    election_result_receiver: Receiver[ElectionResult]
@@ -36,6 +39,7 @@ class Node:
    api: API | None

    node_id: NodeId
+    event_index_counter: Iterator[int]
    _tg: TaskGroup = field(init=False, default_factory=anyio.create_task_group)

    @classmethod
@@ -49,8 +53,26 @@ class Node:
        await router.register_topic(topics.COMMANDS)
        await router.register_topic(topics.ELECTION_MESSAGES)
        await router.register_topic(topics.CONNECTION_MESSAGES)
+        await router.register_topic(topics.DOWNLOAD_COMMANDS)

        logger.info(f"Starting node {node_id}")
+
+        # Create shared event index counter for Worker and DownloadCoordinator
+        event_index_counter = itertools.count()
+
+        # Create DownloadCoordinator (unless --no-downloads)
+        if not args.no_downloads:
+            download_coordinator = DownloadCoordinator(
+                node_id,
+                session_id,
+                exo_shard_downloader(),
+                download_command_receiver=router.receiver(topics.DOWNLOAD_COMMANDS),
+                local_event_sender=router.sender(topics.LOCAL_EVENTS),
+                event_index_counter=event_index_counter,
+            )
+        else:
+            download_coordinator = None
+
        if args.spawn_api:
            api = API(
                node_id,
@@ -58,6 +80,7 @@ class Node:
                port=args.api_port,
                global_event_receiver=router.receiver(topics.GLOBAL_EVENTS),
                command_sender=router.sender(topics.COMMANDS),
+                download_command_sender=router.sender(topics.DOWNLOAD_COMMANDS),
                election_receiver=router.receiver(topics.ELECTION_MESSAGES),
            )
        else:
@@ -67,11 +90,12 @@ class Node:
            worker = Worker(
                node_id,
                session_id,
-                exo_shard_downloader(),
                connection_message_receiver=router.receiver(topics.CONNECTION_MESSAGES),
                global_event_receiver=router.receiver(topics.GLOBAL_EVENTS),
                local_event_sender=router.sender(topics.LOCAL_EVENTS),
                command_sender=router.sender(topics.COMMANDS),
+                download_command_sender=router.sender(topics.DOWNLOAD_COMMANDS),
+                event_index_counter=event_index_counter,
            )
        else:
            worker = None
@@ -99,13 +123,25 @@ class Node:
            election_result_sender=er_send,
        )

-        return cls(router, worker, election, er_recv, master, api, node_id)
+        return cls(
+            router,
+            download_coordinator,
+            worker,
+            election,
+            er_recv,
+            master,
+            api,
+            node_id,
+            event_index_counter,
+        )

    async def run(self):
        async with self._tg as tg:
            signal.signal(signal.SIGINT, lambda _, __: self.shutdown())
            tg.start_soon(self.router.run)
            tg.start_soon(self.election.run)
+            if self.download_coordinator:
+                tg.start_soon(self.download_coordinator.run)
            if self.worker:
                tg.start_soon(self.worker.run)
            if self.master:
@@ -170,13 +206,27 @@ class Node:
                    )
                if result.is_new_master:
                    await anyio.sleep(0)
+                    # Fresh counter for new session (buffer expects indices from 0)
+                    self.event_index_counter = itertools.count()
+                    if self.download_coordinator:
+                        self.download_coordinator.shutdown()
+                        self.download_coordinator = DownloadCoordinator(
+                            self.node_id,
+                            result.session_id,
+                            exo_shard_downloader(),
+                            download_command_receiver=self.router.receiver(
+                                topics.DOWNLOAD_COMMANDS
+                            ),
+                            local_event_sender=self.router.sender(topics.LOCAL_EVENTS),
+                            event_index_counter=self.event_index_counter,
+                        )
+                        self._tg.start_soon(self.download_coordinator.run)
                    if self.worker:
                        self.worker.shutdown()
                        # TODO: add profiling etc to resource monitor
                        self.worker = Worker(
                            self.node_id,
                            result.session_id,
-                            exo_shard_downloader(),
                            connection_message_receiver=self.router.receiver(
                                topics.CONNECTION_MESSAGES
                            ),
@@ -185,6 +235,10 @@ class Node:
                            ),
                            local_event_sender=self.router.sender(topics.LOCAL_EVENTS),
                            command_sender=self.router.sender(topics.COMMANDS),
+                            download_command_sender=self.router.sender(
+                                topics.DOWNLOAD_COMMANDS
+                            ),
+                            event_index_counter=self.event_index_counter,
                        )
                        self._tg.start_soon(self.worker.run)
                    if self.api:
@@ -226,6 +280,7 @@ class Args(CamelCaseModel):
    api_port: PositiveInt = 52415
    tb_only: bool = False
    no_worker: bool = False
+    no_downloads: bool = False
    fast_synch: bool | None = None  # None = auto, True = force on, False = force off

    @classmethod
@@ -268,6 +323,11 @@ class Args(CamelCaseModel):
            "--no-worker",
            action="store_true",
        )
+        parser.add_argument(
+            "--no-downloads",
+            action="store_true",
+            help="Disable the download coordinator (node won't download models)",
+        )
        fast_synch_group = parser.add_mutually_exclusive_group()
        fast_synch_group.add_argument(
            "--fast-synch",
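
The main.py changes above hand one `itertools.count()` to both the Worker and the DownloadCoordinator, and swap in a fresh counter when a new session starts. As a rough illustration of why sharing a single iterator gives the two publishers one gap-free index sequence, here is a sketch in which the `Publisher` class is a hypothetical stand-in for the real components:

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Publisher:
    """Hypothetical stand-in for Worker / DownloadCoordinator."""

    name: str
    counter: Iterator[int]
    published: list[tuple[int, str]] = field(default_factory=list)

    def publish(self, event: str) -> int:
        # Each publish draws the next index from the shared iterator.
        idx = next(self.counter)
        self.published.append((idx, event))
        return idx


# One counter shared by both publishers, as in Node.create.
counter = itertools.count()
worker = Publisher("worker", counter)
downloads = Publisher("downloads", counter)

indices = [
    worker.publish("a"),
    downloads.publish("b"),
    downloads.publish("c"),
    worker.publish("d"),
]
print(indices)  # [0, 1, 2, 3]

# A new session gets a fresh counter, so indices restart from 0.
fresh = itertools.count()
worker = Publisher("worker", fresh)
first_after_reset = worker.publish("e")
print(first_after_reset)  # 0
```

Separate counters would have both publishers emit index 0, which a consumer expecting one ordered sequence per origin could not tell apart.
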
--- a/src/exo/master/api.py
+++ b/src/exo/master/api.py
@@ -44,6 +44,7 @@ from exo.shared.types.api import (
    ChatCompletionResponse,
    CreateInstanceParams,
    CreateInstanceResponse,
+    DeleteDownloadResponse,
    DeleteInstanceResponse,
    ErrorInfo,
    ErrorResponse,
@@ -61,6 +62,8 @@ from exo.shared.types.api import (
    PlaceInstanceParams,
    PlacementPreview,
    PlacementPreviewResponse,
+    StartDownloadParams,
+    StartDownloadResponse,
    StreamingChoiceResponse,
    ToolCall,
 )
@@ -72,15 +75,20 @@ from exo.shared.types.chunks import (
    ToolCallChunk,
 )
 from exo.shared.types.commands import (
+    CancelTask,
    ChatCompletion,
    Command,
    CreateInstance,
+    DeleteDownload,
    DeleteInstance,
+    DownloadCommand,
    ForwarderCommand,
+    ForwarderDownloadCommand,
    ImageEdits,
    ImageGeneration,
    PlaceInstance,
    SendInputChunk,
+    StartDownload,
    TaskFinished,
 )
 from exo.shared.types.common import CommandId, Id, NodeId, SessionId
@@ -156,12 +164,14 @@ class API:
        # Ideally this would be a MasterForwarderEvent but type system says no :(
        global_event_receiver: Receiver[ForwarderEvent],
        command_sender: Sender[ForwarderCommand],
+        download_command_sender: Sender[ForwarderDownloadCommand],
        # This lets us pause the API if an election is running
        election_receiver: Receiver[ElectionMessage],
    ) -> None:
        self.state = State()
        self._event_log: list[Event] = []
        self.command_sender = command_sender
+        self.download_command_sender = download_command_sender
        self.global_event_receiver = global_event_receiver
        self.election_receiver = election_receiver
        self.event_buffer: OrderedBuffer[Event] = OrderedBuffer[Event]()
@@ -260,6 +270,8 @@ class API:
        self.app.get("/images/{image_id}")(self.get_image)
        self.app.get("/state")(lambda: self.state)
        self.app.get("/events")(lambda: self._event_log)
+        self.app.post("/download/start")(self.start_download)
+        self.app.delete("/download/{node_id}/{model_id:path}")(self.delete_download)

    async def place_instance(self, payload: PlaceInstanceParams):
        command = PlaceInstance(
@@ -490,12 +502,10 @@ class API:
                        break

        except anyio.get_cancelled_exc_class():
-            # TODO: TaskCancelled
-            """
+            command = CancelTask(cancelled_command_id=command_id)
            self.command_sender.send_nowait(
                ForwarderCommand(origin=self.node_id, command=command)
            )
-            """
            raise
        finally:
            command = TaskFinished(finished_command_id=command_id)
@@ -883,6 +893,10 @@ class API:
                        del image_metadata[key]

        except anyio.get_cancelled_exc_class():
+            command = CancelTask(cancelled_command_id=command_id)
+            self.command_sender.send_nowait(
+                ForwarderCommand(origin=self.node_id, command=command)
+            )
            raise
        finally:
            await self._send(TaskFinished(finished_command_id=command_id))
@@ -964,6 +978,10 @@ class API:

            return (images, stats if capture_stats else None)
        except anyio.get_cancelled_exc_class():
+            command = CancelTask(cancelled_command_id=command_id)
+            self.command_sender.send_nowait(
+                ForwarderCommand(origin=self.node_id, command=command)
+            )
            raise
        finally:
            await self._send(TaskFinished(finished_command_id=command_id))
@@ -1292,3 +1310,28 @@ class API:
        await self.command_sender.send(
            ForwarderCommand(origin=self.node_id, command=command)
        )
+
+    async def _send_download(self, command: DownloadCommand):
+        await self.download_command_sender.send(
+            ForwarderDownloadCommand(origin=self.node_id, command=command)
+        )
+
+    async def start_download(
+        self, payload: StartDownloadParams
+    ) -> StartDownloadResponse:
+        command = StartDownload(
+            target_node_id=payload.target_node_id,
+            shard_metadata=payload.shard_metadata,
+        )
+        await self._send_download(command)
+        return StartDownloadResponse(command_id=command.command_id)
+
+    async def delete_download(
+        self, node_id: NodeId, model_id: ModelId
+    ) -> DeleteDownloadResponse:
+        command = DeleteDownload(
+            target_node_id=node_id,
+            model_id=ModelId(model_id),
+        )
+        await self._send_download(command)
+        return DeleteDownloadResponse(command_id=command.command_id)
--- a/src/exo/master/main.py
+++ b/src/exo/master/main.py
@@ -12,6 +12,7 @@ from exo.master.placement import (
 )
 from exo.shared.apply import apply
 from exo.shared.types.commands import (
+    CancelTask,
    ChatCompletion,
    CreateInstance,
    DeleteInstance,
@@ -35,6 +36,7 @@ from exo.shared.types.events import (
    NodeTimedOut,
    TaskCreated,
    TaskDeleted,
+    TaskStatusUpdated,
 )
 from exo.shared.types.state import State
 from exo.shared.types.tasks import (
@@ -278,6 +280,15 @@ class Master:
                                    chunk=chunk,
                                )
                            )
+                        case CancelTask():
+                            generated_events.append(
+                                TaskStatusUpdated(
+                                    task_status=TaskStatus.Cancelled,
+                                    task_id=self.command_task_mapping[
+                                        command.cancelled_command_id
+                                    ],
+                                )
+                            )
                        case TaskFinished():
                            generated_events.append(
                                TaskDeleted(
@@ -286,10 +297,7 @@ class Master:
                                    ]
                                )
                            )
-                            if command.finished_command_id in self.command_task_mapping:
-                                del self.command_task_mapping[
-                                    command.finished_command_id
-                                ]
+                            del self.command_task_mapping[command.finished_command_id]
                        case RequestEventLog():
                            # We should just be able to send everything, since other buffers will ignore old messages
                            for i in range(command.since_idx, len(self._event_log)):
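Review note on the `CancelTask` / `TaskFinished` pair above: a minimal sketch of the bookkeeping, assuming the API sends `CancelTask` first and `TaskFinished` afterwards from its `finally` block (as the `api.py` hunks suggest). The dictionary, the string ids, and the handler names are hypothetical stand-ins, not the Master's real types.

```python
# Hypothetical stand-in for Master.command_task_mapping (command id -> task id).
command_task_mapping: dict[str, str] = {"cmd-1": "task-1"}
events: list[tuple[str, str]] = []


def handle_cancel(command_id: str) -> None:
    """Mirror of the CancelTask branch: look up the task, keep the mapping."""
    events.append(("Cancelled", command_task_mapping[command_id]))


def handle_finished(command_id: str) -> None:
    """Mirror of the TaskFinished branch: emit a deletion, then drop the mapping."""
    events.append(("Deleted", command_task_mapping[command_id]))
    del command_task_mapping[command_id]


handle_cancel("cmd-1")    # sent from the API's cancellation handler
handle_finished("cmd-1")  # sent from the API's finally block
```

Both branches index the mapping directly, so a command id that is unknown or already finished raises `KeyError` in either one. In this sketch the unguarded `del` adds no new failure mode, because the lookup just before it would raise first.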
--- a/src/exo/master/placement.py
+++ b/src/exo/master/placement.py
@@ -35,7 +35,7 @@ from exo.shared.types.worker.shards import Sharding

 def random_ephemeral_port() -> int:
    port = random.randint(49153, 65535)
-    return port - 1 if port <= 52415 else 52414
+    return port - 1 if port <= 52415 else port


 def add_instance_to_placements(
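Review note on the one-token fix above: a minimal sketch of its effect, assuming the intent is to pick any port in the range except the default API port 52415 (the `api_port` default in `Args`). `old_port` and `new_port` are hypothetical names for the before and after bodies of `random_ephemeral_port`; they take the drawn value as a parameter so the whole range can be checked exhaustively instead of sampled.

```python
API_PORT = 52415          # default api_port; assumed to be the port being avoided
LOW, HIGH = 49153, 65535  # bounds passed to random.randint in placement.py


def old_port(port: int) -> int:
    """Pre-fix mapping: every draw above the API port collapses to 52414."""
    return port - 1 if port <= API_PORT else 52414


def new_port(port: int) -> int:
    """Post-fix mapping: shift the low range down by one, keep the high range."""
    return port - 1 if port <= API_PORT else port


draws = range(LOW, HIGH + 1)
old_results = [old_port(p) for p in draws]
new_results = [new_port(p) for p in draws]

# Number of possible draws that land on 52414 under each mapping.
old_collisions = old_results.count(52414)
new_collisions = new_results.count(52414)
```

Under the old mapping 13121 of the 16383 possible draws (roughly four in five) return 52414, which matches the over-assignment described in the commit message. Under the new mapping only one draw does, every draw maps to a distinct port, and 52415 is never returned.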
--- a/src/exo/routing/topics.py
+++ b/src/exo/routing/topics.py
@@ -3,7 +3,7 @@ from enum import Enum

 from exo.routing.connection_message import ConnectionMessage
 from exo.shared.election import ElectionMessage
-from exo.shared.types.commands import ForwarderCommand
+from exo.shared.types.commands import ForwarderCommand, ForwarderDownloadCommand
 from exo.shared.types.events import (
    ForwarderEvent,
 )
@@ -45,3 +45,6 @@ ELECTION_MESSAGES = TypedTopic(
 CONNECTION_MESSAGES = TypedTopic(
    "connection_messages", PublishPolicy.Never, ConnectionMessage
 )
+DOWNLOAD_COMMANDS = TypedTopic(
+    "download_commands", PublishPolicy.Always, ForwarderDownloadCommand
+)
--- a/src/exo/shared/models/model_cards.py
+++ b/src/exo/shared/models/model_cards.py
@@ -621,7 +621,7 @@ class ConfigData(BaseModel):

 async def get_config_data(model_id: ModelId) -> ConfigData:
    """Downloads and parses config.json for a model."""
-    from exo.worker.download.download_utils import (
+    from exo.download.download_utils import (
        download_file_with_retry,
        ensure_models_dir,
    )
@@ -643,11 +643,11 @@ async def get_config_data(model_id: ModelId) -> ConfigData:

 async def get_safetensors_size(model_id: ModelId) -> Memory:
    """Gets model size from safetensors index or falls back to HF API."""
-    from exo.shared.types.worker.downloads import ModelSafetensorsIndex
-    from exo.worker.download.download_utils import (
+    from exo.download.download_utils import (
        download_file_with_retry,
        ensure_models_dir,
    )
+    from exo.shared.types.worker.downloads import ModelSafetensorsIndex

    target_dir = (await ensure_models_dir()) / model_id.normalize()
    await aios.makedirs(target_dir, exist_ok=True)
--- a/src/exo/shared/types/api.py
+++ b/src/exo/shared/types/api.py
@@ -7,10 +7,11 @@ from pydantic import BaseModel, Field, field_validator
 from pydantic_core import PydanticUseDefault

 from exo.shared.models.model_cards import ModelCard, ModelId
-from exo.shared.types.common import CommandId
+from exo.shared.types.common import CommandId, NodeId
 from exo.shared.types.memory import Memory
 from exo.shared.types.worker.instances import Instance, InstanceId, InstanceMeta
-from exo.shared.types.worker.shards import Sharding
+from exo.shared.types.worker.shards import Sharding, ShardMetadata
+from exo.utils.pydantic_ext import CamelCaseModel

 FinishReason = Literal[
    "stop", "length", "tool_calls", "content_filter", "function_call", "error"
@@ -352,3 +353,16 @@ class ImageListItem(BaseModel, frozen=True):

 class ImageListResponse(BaseModel, frozen=True):
    data: list[ImageListItem]
+
+
+class StartDownloadParams(CamelCaseModel):
+    target_node_id: NodeId
+    shard_metadata: ShardMetadata
+
+
+class StartDownloadResponse(CamelCaseModel):
+    command_id: CommandId
+
+
+class DeleteDownloadResponse(CamelCaseModel):
+    command_id: CommandId
--- a/src/exo/shared/types/commands.py
+++ b/src/exo/shared/types/commands.py
@@ -1,6 +1,6 @@
 from pydantic import Field

-from exo.shared.models.model_cards import ModelCard
+from exo.shared.models.model_cards import ModelCard, ModelId
 from exo.shared.types.api import (
    ChatCompletionTaskParams,
    ImageEditsInternalParams,
@@ -9,7 +9,7 @@ from exo.shared.types.api import (
 from exo.shared.types.chunks import InputImageChunk
 from exo.shared.types.common import CommandId, NodeId
 from exo.shared.types.worker.instances import Instance, InstanceId, InstanceMeta
-from exo.shared.types.worker.shards import Sharding
+from exo.shared.types.worker.shards import Sharding, ShardMetadata
 from exo.utils.pydantic_ext import CamelCaseModel, TaggedModel


@@ -48,6 +48,10 @@ class DeleteInstance(BaseCommand):
    instance_id: InstanceId


+class CancelTask(BaseCommand):
+    cancelled_command_id: CommandId
+
+
 class TaskFinished(BaseCommand):
    finished_command_id: CommandId

@@ -62,6 +66,19 @@ class RequestEventLog(BaseCommand):
    since_idx: int


+class StartDownload(BaseCommand):
+    target_node_id: NodeId
+    shard_metadata: ShardMetadata
+
+
+class DeleteDownload(BaseCommand):
+    target_node_id: NodeId
+    model_id: ModelId
+
+
+DownloadCommand = StartDownload | DeleteDownload
+
+
 Command = (
    TestCommand
    | RequestEventLog
@@ -71,6 +88,7 @@ Command = (
    | PlaceInstance
    | CreateInstance
    | DeleteInstance
+    | CancelTask
    | TaskFinished
    | SendInputChunk
 )
@@ -79,3 +97,8 @@ Command = (
 class ForwarderCommand(CamelCaseModel):
    origin: NodeId
    command: Command
+
+
+class ForwarderDownloadCommand(CamelCaseModel):
+    origin: NodeId
+    command: DownloadCommand
--- a/src/exo/shared/types/tasks.py
+++ b/src/exo/shared/types/tasks.py
@@ -24,6 +24,7 @@ class TaskStatus(str, Enum):
    Complete = "Complete"
    TimedOut = "TimedOut"
    Failed = "Failed"
+    Cancelled = "Cancelled"


 class BaseTask(TaggedModel):
@@ -60,6 +61,10 @@ class ChatCompletion(BaseTask):  # emitted by Master
    error_message: str | None = Field(default=None)


+class CancelTask(BaseTask):
+    cancelled_task_id: TaskId
+
+
 class ImageGeneration(BaseTask):  # emitted by Master
    command_id: CommandId
    task_params: ImageGenerationTaskParams
@@ -87,6 +92,7 @@ Task = (
    | LoadModel
    | StartWarmup
    | ChatCompletion
+    | CancelTask
    | ImageGeneration
    | ImageEdits
    | Shutdown
--- a/src/exo/utils/keyed_backoff.py
+++ b/src/exo/utils/keyed_backoff.py
@@ -0,0 +1,32 @@
+import time
+from typing import Generic, TypeVar
+
+K = TypeVar("K")
+
+
+class KeyedBackoff(Generic[K]):
+    """Tracks exponential backoff state per key."""
+
+    def __init__(self, base: float = 0.5, cap: float = 10.0):
+        self._base = base
+        self._cap = cap
+        self._attempts: dict[K, int] = {}
+        self._last_time: dict[K, float] = {}
+
+    def should_proceed(self, key: K) -> bool:
+        """Returns True if enough time has elapsed since last attempt."""
+        now = time.monotonic()
+        last = self._last_time.get(key, 0.0)
+        attempts = self._attempts.get(key, 0)
+        delay = min(self._cap, self._base * (2.0**attempts))
+        return now - last >= delay
+
+    def record_attempt(self, key: K) -> None:
+        """Record that an attempt was made for this key."""
+        self._last_time[key] = time.monotonic()
+        self._attempts[key] = self._attempts.get(key, 0) + 1
+
+    def reset(self, key: K) -> None:
+        """Reset backoff state for a key (e.g., on success)."""
+        self._attempts.pop(key, None)
+        self._last_time.pop(key, None)
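Review note on `KeyedBackoff`: a minimal sketch of the delay schedule implied by `should_proceed`, using the defaults `base=0.5` and `cap=10.0`. `backoff_delay` is a hypothetical helper that only reproduces the `min(cap, base * 2**attempts)` expression; it is not part of the module.

```python
def backoff_delay(attempts: int, base: float = 0.5, cap: float = 10.0) -> float:
    """Seconds that must elapse after the last attempt, given `attempts` recorded so far."""
    return min(cap, base * (2.0 ** attempts))


# Wait required after the 1st, 2nd, ... 6th recorded attempt for one key.
schedule = [backoff_delay(n) for n in range(1, 7)]
```

Because `record_attempt` increments the counter before the next `should_proceed` check, the first enforced wait is 1.0 s rather than 0.5 s. The 0.5 s delay for zero attempts is compared against a default last-attempt time of 0.0, so a key that has never been tried proceeds immediately as long as the monotonic clock reads at least 0.5 s. With the defaults the wait reaches the 10 s cap after the fifth attempt.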
--- a/src/exo/worker/engines/image/distributed_model.py
+++ b/src/exo/worker/engines/image/distributed_model.py
@@ -6,10 +6,10 @@ import mlx.core as mx
 from mflux.models.common.config.config import Config
 from PIL import Image

+from exo.download.download_utils import build_model_path
 from exo.shared.types.api import AdvancedImageParams
 from exo.shared.types.worker.instances import BoundInstance
 from exo.shared.types.worker.shards import PipelineShardMetadata
-from exo.worker.download.download_utils import build_model_path
 from exo.worker.engines.image.config import ImageModelConfig
 from exo.worker.engines.image.models import (
    create_adapter_for_model,
--- a/src/exo/worker/engines/mlx/generator/generate.py
+++ b/src/exo/worker/engines/mlx/generator/generate.py
@@ -23,7 +23,6 @@ from exo.worker.engines.mlx.constants import KV_BITS, KV_GROUP_SIZE, MAX_TOKENS
 from exo.worker.engines.mlx.utils_mlx import (
    apply_chat_template,
    make_kv_cache,
-    mx_barrier,
 )
 from exo.worker.runner.bootstrap import logger

@@ -90,10 +89,6 @@ def warmup_inference(

    logger.info("Generated ALL warmup tokens")

-    # TODO: Do we want an mx_barrier?
-    #  At least this version is actively incorrect, as it should use mx_barrier(group)
-    mx_barrier()
-
    return tokens_generated


@@ -186,5 +181,3 @@ def mlx_generate(

        if out.finish_reason is not None:
            break
-
-        # TODO: Do we want an mx_barrier?
--- a/src/exo/worker/engines/mlx/utils_mlx.py
+++ b/src/exo/worker/engines/mlx/utils_mlx.py
@@ -41,6 +41,7 @@ import mlx.nn as nn
 from mlx_lm.utils import load_model
 from pydantic import RootModel

+from exo.download.download_utils import build_model_path
 from exo.shared.types.api import ChatCompletionMessageText
 from exo.shared.types.common import Host
 from exo.shared.types.memory import Memory
@@ -55,7 +56,6 @@ from exo.shared.types.worker.shards import (
    ShardMetadata,
    TensorShardMetadata,
 )
-from exo.worker.download.download_utils import build_model_path
 from exo.worker.engines.mlx import Model
 from exo.worker.engines.mlx.auto_parallel import (
    TimeoutCallback,
@@ -70,8 +70,6 @@ Group = mx.distributed.Group
 resource.setrlimit(resource.RLIMIT_NOFILE, (2048, 4096))


-# TODO: Test this
-#  ALSO https://github.com/exo-explore/exo/pull/233#discussion_r2549683673
 def get_weights_size(model_shard_meta: ShardMetadata) -> Memory:
    return Memory.from_float_kb(
        (model_shard_meta.end_layer - model_shard_meta.start_layer)
@@ -89,30 +87,6 @@ class ModelLoadingTimeoutError(Exception):
    pass


-def mx_barrier(group: Group | None = None):
-    mx.eval(
-        mx.distributed.all_sum(
-            mx.array(1.0),
-            stream=mx.default_stream(mx.Device(mx.cpu)),
-            group=group,
-        )
-    )
-
-
-def broadcast_from_zero(value: int, group: Group | None = None):
-    if group is None:
-        return value
-
-    if group.rank() == 0:
-        a = mx.array([value], dtype=mx.int32)
-    else:
-        a = mx.array([0], dtype=mx.int32)
-
-    m = mx.distributed.all_sum(a, stream=mx.Device(mx.DeviceType.cpu), group=group)
-    mx.eval(m)
-    return int(m.item())
-
-
 class HostList(RootModel[list[str]]):
    @classmethod
    def from_hosts(cls, hosts: list[Host]) -> "HostList":
@@ -536,3 +510,33 @@ def mlx_cleanup(
    import gc

    gc.collect()
+
+
+def mx_any(bool_: bool, group: Group | None) -> bool:
+    if group is None:
+        return bool_
+    num_true = mx.distributed.all_sum(
+        mx.array(bool_), group=group, stream=mx.default_stream(mx.Device(mx.cpu))
+    )
+    mx.eval(num_true)
+    return num_true.item() > 0
+
+
+def mx_all(bool_: bool, group: Group | None) -> bool:
+    if group is None:
+        return bool_
+    num_true = mx.distributed.all_sum(
+        mx.array(bool_), group=group, stream=mx.default_stream(mx.Device(mx.cpu))
+    )
+    mx.eval(num_true)
+    return num_true.item() == group.size()
+
+
+def mx_barrier(group: Group | None):
+    if group is None:
+        return
+    mx.eval(
+        mx.distributed.all_sum(
+            mx.array(1.0), group=group, stream=mx.default_stream(mx.Device(mx.cpu))
+        )
+    )
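Review note on `mx_any` and `mx_all`: both reduce a per-rank boolean with a single all-reduce sum and differ only in the final comparison. The sketch below illustrates that logic in plain Python without MLX; `simulated_all_sum`, `any_across`, and `all_across` are hypothetical names, and the list stands in for the values contributed by each rank in a group.

```python
def simulated_all_sum(flags: list[bool]) -> int:
    """Stand-in for the all-reduce: every rank would observe this same total."""
    return sum(int(flag) for flag in flags)


def any_across(flags: list[bool]) -> bool:
    """True if at least one rank contributed True (total > 0)."""
    return simulated_all_sum(flags) > 0


def all_across(flags: list[bool]) -> bool:
    """True only if every rank contributed True (total == group size)."""
    return simulated_all_sum(flags) == len(flags)


mixed = [False, True, False]  # three ranks, one of them votes True
any_mixed = any_across(mixed)
all_mixed = all_across(mixed)
```

Since every rank computes the same total, all ranks reach the same decision, which is what makes such a helper usable for coordinated control flow. The single-process shortcut (`group is None`) in the real functions corresponds to a one-element list here.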
--- a/src/exo/worker/main.py
+++ b/src/exo/worker/main.py
@@ -1,8 +1,9 @@
 from datetime import datetime, timezone
 from random import random
+from typing import Iterator

 import anyio
-from anyio import CancelScope, create_task_group, current_time, fail_after
+from anyio import CancelScope, create_task_group, fail_after
 from anyio.abc import TaskGroup
 from loguru import logger

@@ -10,7 +11,12 @@ from exo.routing.connection_message import ConnectionMessage, ConnectionMessageT
 from exo.shared.apply import apply
 from exo.shared.models.model_cards import ModelId
 from exo.shared.types.api import ImageEditsInternalParams
-from exo.shared.types.commands import ForwarderCommand, RequestEventLog
+from exo.shared.types.commands import (
+    ForwarderCommand,
+    ForwarderDownloadCommand,
+    RequestEventLog,
+    StartDownload,
+)
 from exo.shared.types.common import CommandId, NodeId, SessionId
 from exo.shared.types.events import (
    Event,
@@ -18,7 +24,6 @@ from exo.shared.types.events import (
    ForwarderEvent,
    IndexedEvent,
    InputChunkReceived,
-    NodeDownloadProgress,
    NodeGatheredInfo,
    TaskCreated,
    TaskStatusUpdated,
@@ -28,6 +33,7 @@ from exo.shared.types.events import (
 from exo.shared.types.multiaddr import Multiaddr
 from exo.shared.types.state import State
 from exo.shared.types.tasks import (
+    CancelTask,
    CreateRunner,
    DownloadModel,
    ImageEdits,
@@ -36,23 +42,12 @@ from exo.shared.types.tasks import (
    TaskStatus,
 )
 from exo.shared.types.topology import Connection, SocketConnection
-from exo.shared.types.worker.downloads import (
-    DownloadCompleted,
-    DownloadFailed,
-    DownloadOngoing,
-    DownloadPending,
-    DownloadProgress,
-)
 from exo.shared.types.worker.runners import RunnerId
-from exo.shared.types.worker.shards import ShardMetadata
 from exo.utils.channels import Receiver, Sender, channel
 from exo.utils.event_buffer import OrderedBuffer
 from exo.utils.info_gatherer.info_gatherer import GatheredInfo, InfoGatherer
 from exo.utils.info_gatherer.net_profile import check_reachable
-from exo.worker.download.download_utils import (
-    map_repo_download_progress_to_download_progress_data,
-)
-from exo.worker.download.shard_downloader import RepoDownloadProgress, ShardDownloader
+from exo.utils.keyed_backoff import KeyedBackoff
 from exo.worker.plan import plan
 from exo.worker.runner.runner_supervisor import RunnerSupervisor

@@ -62,7 +57,6 @@ class Worker:
        self,
        node_id: NodeId,
        session_id: SessionId,
-        shard_downloader: ShardDownloader,
        *,
        connection_message_receiver: Receiver[ConnectionMessage],
        global_event_receiver: Receiver[ForwarderEvent],
@@ -70,23 +64,22 @@ class Worker:
        # This is for requesting updates. It doesn't need to be a general command sender right now,
        # but I think it's the correct way to be thinking about commands
        command_sender: Sender[ForwarderCommand],
+        download_command_sender: Sender[ForwarderDownloadCommand],
+        event_index_counter: Iterator[int],
    ):
        self.node_id: NodeId = node_id
        self.session_id: SessionId = session_id

-        self.shard_downloader: ShardDownloader = shard_downloader
-        self._pending_downloads: dict[RunnerId, ShardMetadata] = {}
-
        self.global_event_receiver = global_event_receiver
        self.local_event_sender = local_event_sender
-        self.local_event_index = 0
+        self.event_index_counter = event_index_counter
        self.command_sender = command_sender
+        self.download_command_sender = download_command_sender
        self.connection_message_receiver = connection_message_receiver
        self.event_buffer = OrderedBuffer[Event]()
        self.out_for_delivery: dict[EventId, ForwarderEvent] = {}

        self.state: State = State()
-        self.download_status: dict[ModelId, DownloadProgress] = {}
        self.runners: dict[RunnerId, RunnerSupervisor] = {}
        self._tg: TaskGroup = create_task_group()

@@ -101,6 +94,8 @@ class Worker:
        self.input_chunk_buffer: dict[CommandId, dict[int, str]] = {}
        self.input_chunk_counts: dict[CommandId, int] = {}

+        self._download_backoff: KeyedBackoff[ModelId] = KeyedBackoff(base=0.5, cap=10.0)
+
    async def run(self):
        logger.info("Starting Worker")

@@ -111,7 +106,6 @@ class Worker:
            tg.start_soon(info_gatherer.run)
            tg.start_soon(self._forward_info, info_recv)
            tg.start_soon(self.plan_step)
-            tg.start_soon(self._emit_existing_download_progress)
            tg.start_soon(self._connection_message_event_writer)
            tg.start_soon(self._resend_out_for_delivery)
            tg.start_soon(self._event_applier)
@@ -121,6 +115,7 @@ class Worker:
        # Actual shutdown code - waits for all tasks to complete before executing.
        self.local_event_sender.close()
        self.command_sender.close()
+        self.download_command_sender.close()
        for runner in self.runners.values():
            runner.shutdown()

@@ -179,11 +174,9 @@ class Worker:
    async def plan_step(self):
        while True:
            await anyio.sleep(0.1)
-            # 3. based on the updated state, we plan & execute an operation.
            task: Task | None = plan(
                self.node_id,
                self.runners,
-                self.download_status,
                self.state.downloads,
                self.state.instances,
                self.state.runners,
@@ -207,42 +200,26 @@ class Worker:
                        )
                    )
                case DownloadModel(shard_metadata=shard):
-                    if shard.model_card.model_id not in self.download_status:
-                        progress = DownloadPending(
-                            shard_metadata=shard, node_id=self.node_id
-                        )
-                        self.download_status[shard.model_card.model_id] = progress
-                        await self.event_sender.send(
-                            NodeDownloadProgress(download_progress=progress)
-                        )
-                    initial_progress = (
-                        await self.shard_downloader.get_shard_download_status_for_shard(
-                            shard
+                    model_id = shard.model_card.model_id
+                    if not self._download_backoff.should_proceed(model_id):
+                        continue
+
+                    self._download_backoff.record_attempt(model_id)
+
+                    await self.download_command_sender.send(
+                        ForwarderDownloadCommand(
+                            origin=self.node_id,
+                            command=StartDownload(
+                                target_node_id=self.node_id,
+                                shard_metadata=shard,
+                            ),
                        )
                    )
-                    if initial_progress.status == "complete":
-                        progress = DownloadCompleted(
-                            shard_metadata=shard,
-                            node_id=self.node_id,
-                            total_bytes=initial_progress.total_bytes,
+                    await self.event_sender.send(
+                        TaskStatusUpdated(
+                            task_id=task.task_id, task_status=TaskStatus.Running
                        )
-                        self.download_status[shard.model_card.model_id] = progress
-                        await self.event_sender.send(
-                            NodeDownloadProgress(download_progress=progress)
-                        )
-                        await self.event_sender.send(
-                            TaskStatusUpdated(
-                                task_id=task.task_id,
-                                task_status=TaskStatus.Complete,
-                            )
-                        )
-                    else:
-                        await self.event_sender.send(
-                            TaskStatusUpdated(
-                                task_id=task.task_id, task_status=TaskStatus.Running
-                            )
-                        )
-                        self._handle_shard_download_process(task, initial_progress)
+                    )
                case Shutdown(runner_id=runner_id):
                    try:
                        with fail_after(3):
@@ -253,6 +230,10 @@ class Worker:
                                task_id=task.task_id, task_status=TaskStatus.TimedOut
                            )
                        )
+                case CancelTask(cancelled_task_id=cancelled_task_id):
+                    await self.runners[self._task_to_runner_id(task)].cancel_task(
+                        cancelled_task_id
+                    )
                case ImageEdits() if task.task_params.total_input_chunks > 0:
                    # Assemble image from chunks and inject into task
                    cmd_id = task.command_id
@@ -375,8 +356,6 @@ class Worker:
            for event in self.out_for_delivery.copy().values():
                await self.local_event_sender.send(event)

-    ## Op Executors
-
    def _create_supervisor(self, task: CreateRunner) -> RunnerSupervisor:
        """Creates and stores a new AssignedRunner with initial downloading status."""
        runner = RunnerSupervisor.create(
@@ -387,104 +366,17 @@ class Worker:
        self._tg.start_soon(runner.run)
        return runner

-    def _handle_shard_download_process(
-        self,
-        task: DownloadModel,
-        initial_progress: RepoDownloadProgress,
-    ):
-        """Manages the shard download process with progress tracking."""
-        status = DownloadOngoing(
-            node_id=self.node_id,
-            shard_metadata=task.shard_metadata,
-            download_progress=map_repo_download_progress_to_download_progress_data(
-                initial_progress
-            ),
-        )
-        self.download_status[task.shard_metadata.model_card.model_id] = status
-        self.event_sender.send_nowait(NodeDownloadProgress(download_progress=status))
-
-        last_progress_time = 0.0
-        throttle_interval_secs = 1.0
-
-        async def download_progress_callback(
-            shard: ShardMetadata, progress: RepoDownloadProgress
-        ) -> None:
-            nonlocal self
-            nonlocal last_progress_time
-            if progress.status == "complete":
-                status = DownloadCompleted(
-                    shard_metadata=shard,
-                    node_id=self.node_id,
-                    total_bytes=progress.total_bytes,
-                )
-                self.download_status[shard.model_card.model_id] = status
-                await self.event_sender.send(
-                    NodeDownloadProgress(download_progress=status)
-                )
-                await self.event_sender.send(
-                    TaskStatusUpdated(
-                        task_id=task.task_id, task_status=TaskStatus.Complete
-                    )
-                )
-            elif (
-                progress.status == "in_progress"
-                and current_time() - last_progress_time > throttle_interval_secs
-            ):
-                status = DownloadOngoing(
-                    node_id=self.node_id,
-                    shard_metadata=shard,
-                    download_progress=map_repo_download_progress_to_download_progress_data(
-                        progress
-                    ),
-                )
-                self.download_status[shard.model_card.model_id] = status
-                await self.event_sender.send(
-                    NodeDownloadProgress(download_progress=status)
-                )
-                last_progress_time = current_time()
-
-        self.shard_downloader.on_progress(download_progress_callback)
-
-        async def download_with_error_handling() -> None:
-            try:
-                await self.shard_downloader.ensure_shard(task.shard_metadata)
-            except Exception as e:
-                error_message = str(e)
-                logger.error(
-                    f"Download failed for {task.shard_metadata.model_card.model_id}: {error_message}"
-                )
-                failed_status = DownloadFailed(
-                    node_id=self.node_id,
-                    shard_metadata=task.shard_metadata,
-                    error_message=error_message,
-                )
-                self.download_status[task.shard_metadata.model_card.model_id] = (
-                    failed_status
-                )
-                await self.event_sender.send(
-                    NodeDownloadProgress(download_progress=failed_status)
-                )
-                await self.event_sender.send(
-                    TaskStatusUpdated(
-                        task_id=task.task_id, task_status=TaskStatus.Failed
-                    )
-                )
-
-        self._tg.start_soon(download_with_error_handling)
-
    async def _forward_events(self) -> None:
        with self.event_receiver as events:
            async for event in events:
+                idx = next(self.event_index_counter)
                fe = ForwarderEvent(
-                    origin_idx=self.local_event_index,
+                    origin_idx=idx,
                    origin=self.node_id,
                    session=self.session_id,
                    event=event,
                )
-                logger.debug(
-                    f"Worker published event {self.local_event_index}: {str(event)[:100]}"
-                )
-                self.local_event_index += 1
+                logger.debug(f"Worker published event {idx}: {str(event)[:100]}")
                await self.local_event_sender.send(fe)
                self.out_for_delivery[event.event_id] = fe

@@ -532,42 +424,3 @@ class Worker:
                    await self.event_sender.send(TopologyEdgeDeleted(conn=conn))

            await anyio.sleep(10)
-
-    async def _emit_existing_download_progress(self) -> None:
-        try:
-            while True:
-                logger.debug("Fetching and emitting existing download progress...")
-                async for (
-                    _,
-                    progress,
-                ) in self.shard_downloader.get_shard_download_status():
-                    if progress.status == "complete":
-                        status = DownloadCompleted(
-                            node_id=self.node_id,
-                            shard_metadata=progress.shard,
-                            total_bytes=progress.total_bytes,
-                        )
-                    elif progress.status in ["in_progress", "not_started"]:
-                        if progress.downloaded_bytes_this_session.in_bytes == 0:
-                            status = DownloadPending(
-                                node_id=self.node_id, shard_metadata=progress.shard
-                            )
-                        else:
-                            status = DownloadOngoing(
-                                node_id=self.node_id,
-                                shard_metadata=progress.shard,
-                                download_progress=map_repo_download_progress_to_download_progress_data(
-                                    progress
-                                ),
-                            )
-                    else:
-                        continue
-
-                    self.download_status[progress.shard.model_card.model_id] = status
-                    await self.event_sender.send(
-                        NodeDownloadProgress(download_progress=status)
-                    )
-                logger.debug("Done emitting existing download progress.")
-                await anyio.sleep(5 * 60)  # 5 minutes
-        except Exception as e:
-            logger.error(f"Error emitting existing download progress: {e}")
--- a/src/exo/worker/plan.py
+++ b/src/exo/worker/plan.py
@@ -2,9 +2,11 @@

 from collections.abc import Mapping, Sequence

-from exo.shared.models.model_cards import ModelId
+from loguru import logger
+
 from exo.shared.types.common import CommandId, NodeId
 from exo.shared.types.tasks import (
+    CancelTask,
    ChatCompletion,
    ConnectToGroup,
    CreateRunner,
@@ -45,25 +47,23 @@ def plan(
    node_id: NodeId,
    # Runners is expected to be FRESH and so should not come from state
    runners: Mapping[RunnerId, RunnerSupervisor],
-    # DL_status is expected to be FRESH and so should not come from state
-    download_status: Mapping[ModelId, DownloadProgress],
-    # gdls is not expected to be fresh
    global_download_status: Mapping[NodeId, Sequence[DownloadProgress]],
    instances: Mapping[InstanceId, Instance],
    all_runners: Mapping[RunnerId, RunnerStatus],  # all global
    tasks: Mapping[TaskId, Task],
-    input_chunk_buffer: Mapping[CommandId, dict[int, str]] | None = None,
-    input_chunk_counts: Mapping[CommandId, int] | None = None,
+    input_chunk_buffer: Mapping[CommandId, dict[int, str]] = {},
+    input_chunk_counts: Mapping[CommandId, int] = {},
 ) -> Task | None:
    # Python short circuiting OR logic should evaluate these sequentially.
    return (
        _kill_runner(runners, all_runners, instances)
        or _create_runner(node_id, runners, instances)
-        or _model_needs_download(runners, download_status)
+        or _model_needs_download(node_id, runners, global_download_status)
        or _init_distributed_backend(runners, all_runners)
        or _load_model(runners, all_runners, global_download_status)
        or _ready_to_warmup(runners, all_runners)
        or _pending_tasks(runners, tasks, all_runners, input_chunk_buffer)
+        or _cancel_tasks(runners, tasks)
    )


@@ -115,9 +115,15 @@ def _create_runner(


 def _model_needs_download(
+    node_id: NodeId,
    runners: Mapping[RunnerId, RunnerSupervisor],
-    download_status: Mapping[ModelId, DownloadProgress],
+    global_download_status: Mapping[NodeId, Sequence[DownloadProgress]],
 ) -> DownloadModel | None:
+    local_downloads = global_download_status.get(node_id, [])
+    download_status = {
+        dp.shard_metadata.model_card.model_id: dp for dp in local_downloads
+    }
+
    for runner in runners.values():
        model_id = runner.bound_instance.bound_shard.model_card.model_id
        if isinstance(runner.status, RunnerIdle) and (
@@ -268,7 +274,7 @@ def _pending_tasks(
    runners: Mapping[RunnerId, RunnerSupervisor],
    tasks: Mapping[TaskId, Task],
    all_runners: Mapping[RunnerId, RunnerStatus],
-    input_chunk_buffer: Mapping[CommandId, dict[int, str]] | None = None,
+    input_chunk_buffer: Mapping[CommandId, dict[int, str]],
 ) -> Task | None:
    for task in tasks.values():
        # for now, just forward chat completions
@@ -282,7 +288,7 @@ def _pending_tasks(
        if isinstance(task, ImageEdits) and task.task_params.total_input_chunks > 0:
            cmd_id = task.command_id
            expected = task.task_params.total_input_chunks
-            received = len((input_chunk_buffer or {}).get(cmd_id, {}))
+            received = len(input_chunk_buffer.get(cmd_id, {}))
            if received < expected:
                continue  # Wait for all chunks to arrive

@@ -290,16 +296,33 @@ def _pending_tasks(
            if task.instance_id != runner.bound_instance.instance.instance_id:
                continue

-            # I have a design point here; this is a state race in disguise as the task status doesn't get updated to completed fast enough
-            # however, realistically the task status should be set to completed by the LAST runner, so this is a true race
-            # the actual solution is somewhat deeper than this bypass - TODO!
+            # The task status should be set to completed by the LAST runner,
+            # but it is currently set by the first. Skipping tasks this runner
+            # has already completed is a workaround for that.
            if task.task_id in runner.completed:
                continue

-            # TODO: Check ordering aligns with MLX distributeds expectations.
-
            if isinstance(runner.status, RunnerReady) and all(
                isinstance(all_runners[global_runner_id], (RunnerReady, RunnerRunning))
                for global_runner_id in runner.bound_instance.instance.shard_assignments.runner_to_shard
            ):
                return task
+
+
+def _cancel_tasks(
+    runners: Mapping[RunnerId, RunnerSupervisor],
+    tasks: Mapping[TaskId, Task],
+) -> Task | None:
+    for task in tasks.values():
+        if task.task_status != TaskStatus.Cancelled:
+            continue
+        logger.info(f"{task.task_id} is cancelled!")
+        for runner in runners.values():
+            if task.instance_id != runner.bound_instance.instance.instance_id:
+                continue
+            logger.info(f"{task.task_id} is mine!")
+            if task.task_id in runner.cancelled:
+                logger.info(f"{task.task_id} already cancelled")
+                continue
+            logger.info(f"cancelling {task.task_id}")
+            return CancelTask(instance_id=task.instance_id, cancelled_task_id=task.task_id)
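Review note on the planner: `plan` chains its helpers with `or`, so the first helper that returns a truthy task wins and a helper only schedules work if it returns the task it builds. Below is a minimal sketch of that pattern. `FakeTask`, `FakeCancel`, `FakeRunner` and the boolean `cancelled` field are hypothetical stand-ins for illustration, not exo's real types.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeTask:
    """Hypothetical stand-in for a pending task."""
    task_id: str
    instance_id: str
    cancelled: bool = False


@dataclass(frozen=True)
class FakeCancel:
    """Hypothetical stand-in for a cancel request handed to a runner."""
    instance_id: str
    cancelled_task_id: str


@dataclass
class FakeRunner:
    """Hypothetical stand-in for a runner supervisor."""
    instance_id: str
    cancelled: set[str] = field(default_factory=set)


def pending(runners, tasks):
    # Return the first live task that belongs to one of our runners.
    for task in tasks:
        if task.cancelled:
            continue
        if any(r.instance_id == task.instance_id for r in runners):
            return task
    return None


def cancel(runners, tasks):
    # Return a cancel request for the first cancelled task that one of our
    # runners has not been told about yet.
    for task in tasks:
        if not task.cancelled:
            continue
        for runner in runners:
            if runner.instance_id != task.instance_id:
                continue
            if task.task_id in runner.cancelled:
                continue
            return FakeCancel(task.instance_id, task.task_id)
    return None


def plan(runners, tasks):
    # `or` evaluates left to right and stops at the first truthy result.
    return pending(runners, tasks) or cancel(runners, tasks)
```

Because the helpers run in priority order, putting cancellation last means a cancel request is only planned once nothing else is waiting on this node.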
--- a/src/exo/worker/runner/bootstrap.py
+++ b/src/exo/worker/runner/bootstrap.py
@@ -3,7 +3,7 @@ import os
 import loguru

 from exo.shared.types.events import Event, RunnerStatusUpdated
-from exo.shared.types.tasks import Task
+from exo.shared.types.tasks import Task, TaskId
 from exo.shared.types.worker.instances import BoundInstance, MlxJacclInstance
 from exo.shared.types.worker.runners import RunnerFailed
 from exo.utils.channels import ClosedResourceError, MpReceiver, MpSender
@@ -15,6 +15,7 @@ def entrypoint(
    bound_instance: BoundInstance,
    event_sender: MpSender[Event],
    task_receiver: MpReceiver[Task],
+    cancel_receiver: MpReceiver[TaskId],
    _logger: "loguru.Logger",
 ) -> None:
    fast_synch_override = os.environ.get("EXO_FAST_SYNCH")
@@ -38,7 +39,7 @@ def entrypoint(
    try:
        from exo.worker.runner.runner import main

-        main(bound_instance, event_sender, task_receiver)
+        main(bound_instance, event_sender, task_receiver, cancel_receiver)
    except ClosedResourceError:
        logger.warning("Runner communication closed unexpectedly")
    except Exception as e:
--- a/src/exo/worker/runner/runner.py
+++ b/src/exo/worker/runner/runner.py
@@ -1,4 +1,5 @@
 import base64
+import contextlib
 import json
 import time
 from collections.abc import Generator
@@ -37,6 +38,7 @@ from exo.shared.types.tasks import (
    Shutdown,
    StartWarmup,
    Task,
+    TaskId,
    TaskStatus,
 )
 from exo.shared.types.worker.instances import BoundInstance
@@ -62,7 +64,7 @@ from exo.shared.types.worker.runners import (
    RunnerWarmingUp,
 )
 from exo.shared.types.worker.shards import ShardMetadata
-from exo.utils.channels import MpReceiver, MpSender
+from exo.utils.channels import MpReceiver, MpSender, WouldBlock
 from exo.worker.engines.image import (
    DistributedImageModel,
    generate_image,
@@ -77,6 +79,7 @@ from exo.worker.engines.mlx.utils_mlx import (
    initialize_mlx,
    load_mlx_items,
    mlx_force_oom,
+    mx_any,
 )
 from exo.worker.runner.bootstrap import logger

@@ -85,6 +88,7 @@ def main(
    bound_instance: BoundInstance,
    event_sender: MpSender[Event],
    task_receiver: MpReceiver[Task],
+    cancel_receiver: MpReceiver[TaskId],
 ):
    instance, runner_id, shard_metadata = (
        bound_instance.instance,
@@ -99,8 +103,11 @@ def main(
        time.sleep(timeout)

    setup_start_time = time.time()
+    cancelled_tasks = set[TaskId]()

-    model: Model | DistributedImageModel | None = None
+    # Separate variables per model kind; a single union-typed variable failed type checking.
+    inference_model: Model | None = None
+    image_model: DistributedImageModel | None = None
    tokenizer = None
    group = None

@@ -111,6 +118,10 @@ def main(
    )
    with task_receiver as tasks:
        for task in tasks:
+            with contextlib.suppress(WouldBlock):
+                cancelled_tasks.add(cancel_receiver.receive_nowait())
+            if task.task_id in cancelled_tasks:
+                continue
            event_sender.send(
                TaskStatusUpdated(task_id=task.task_id, task_status=TaskStatus.Running)
            )
@@ -155,7 +166,7 @@ def main(
                        time.sleep(0.5)

                    if ModelTask.TextGeneration in shard_metadata.model_card.tasks:
-                        model, tokenizer = load_mlx_items(
+                        inference_model, tokenizer = load_mlx_items(
                            bound_instance, group, on_timeout=on_model_load_timeout
                        )
                        logger.info(
@@ -165,7 +176,7 @@ def main(
                        ModelTask.TextToImage in shard_metadata.model_card.tasks
                        or ModelTask.ImageToImage in shard_metadata.model_card.tasks
                    ):
-                        model = initialize_image_model(bound_instance)
+                        image_model = initialize_image_model(bound_instance)
                    else:
                        raise ValueError(
                            f"Unknown model task(s): {shard_metadata.model_card.tasks}"
@@ -174,8 +185,6 @@ def main(
                    current_status = RunnerLoaded()
                    logger.info("runner loaded")
                case StartWarmup() if isinstance(current_status, RunnerLoaded):
-                    assert model
-
                    current_status = RunnerWarmingUp()
                    logger.info("runner warming up")
                    event_sender.send(
@@ -186,11 +195,11 @@ def main(

                    logger.info(f"warming up inference for instance: {instance}")
                    if ModelTask.TextGeneration in shard_metadata.model_card.tasks:
-                        assert not isinstance(model, DistributedImageModel)
+                        assert inference_model
                        assert tokenizer

                        toks = warmup_inference(
-                            model=model,
+                            model=inference_model,
                            tokenizer=tokenizer,
                            # kv_prefix_cache=kv_prefix_cache,  # supply for warmup-time prefix caching
                        )
@@ -202,8 +211,8 @@ def main(
                        ModelTask.TextToImage in shard_metadata.model_card.tasks
                        or ModelTask.ImageToImage in shard_metadata.model_card.tasks
                    ):
-                        assert isinstance(model, DistributedImageModel)
-                        image = warmup_image_generator(model=model)
+                        assert image_model
+                        image = warmup_image_generator(model=image_model)
                        if image is not None:
                            logger.info(f"warmed up by generating {image.size} image")
                        else:
@@ -222,7 +231,7 @@ def main(
                            runner_id=runner_id, runner_status=current_status
                        )
                    )
-                    assert model and not isinstance(model, DistributedImageModel)
+                    assert inference_model
                    assert tokenizer
                    assert task_params.messages[0].content is not None

@@ -234,14 +243,14 @@ def main(

                        # Generate responses using the actual MLX generation
                        mlx_generator = mlx_generate(
-                            model=model,
+                            model=inference_model,
                            tokenizer=tokenizer,
                            task=task_params,
                            prompt=prompt,
                        )

                        # GPT-OSS specific parsing to match other model formats.
-                        if isinstance(model, GptOssModel):
+                        if isinstance(inference_model, GptOssModel):
                            mlx_generator = parse_gpt_oss(mlx_generator)

                        # For other thinking models (GLM, etc.), check if we need to
@@ -272,6 +281,12 @@ def main(
                            )

                        for response in mlx_generator:
+                            with contextlib.suppress(WouldBlock):
+                                cancelled_tasks.add(cancel_receiver.receive_nowait())
+                            want_to_cancel = task.task_id in cancelled_tasks
+                            if mx_any(want_to_cancel, group):
+                                break
+
                            match response:
                                case GenerationResponse():
                                    if (
@@ -338,7 +353,7 @@ def main(
                case ImageGeneration(
                    task_params=task_params, command_id=command_id
                ) if isinstance(current_status, RunnerReady):
-                    assert isinstance(model, DistributedImageModel)
+                    assert image_model
                    logger.info(f"received image generation request: {str(task)[:500]}")
                    current_status = RunnerRunning()
                    logger.info("runner running")
@@ -352,7 +367,9 @@ def main(
                        # Generate images using the image generation backend
                        # Track image_index for final images only
                        image_index = 0
-                        for response in generate_image(model=model, task=task_params):
+                        for response in generate_image(
+                            model=image_model, task=task_params
+                        ):
                            if (
                                shard_metadata.device_rank
                                == shard_metadata.world_size - 1
@@ -399,7 +416,7 @@ def main(
                case ImageEdits(task_params=task_params, command_id=command_id) if (
                    isinstance(current_status, RunnerReady)
                ):
-                    assert isinstance(model, DistributedImageModel)
+                    assert image_model
                    logger.info(f"received image edits request: {str(task)[:500]}")
                    current_status = RunnerRunning()
                    logger.info("runner running")
@@ -411,7 +428,9 @@ def main(

                    try:
                        image_index = 0
-                        for response in generate_image(model=model, task=task_params):
+                        for response in generate_image(
+                            model=image_model, task=task_params
+                        ):
                            if (
                                shard_metadata.device_rank
                                == shard_metadata.world_size - 1
@@ -474,7 +493,7 @@ def main(
                RunnerStatusUpdated(runner_id=runner_id, runner_status=current_status)
            )
            if isinstance(current_status, RunnerShutdown):
-                del model, tokenizer, group
+                del inference_model, image_model, tokenizer, group
                mx.clear_cache()
                import gc

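Review note on the cancellation path in runner.py: the loop polls the cancel channel without blocking and stops generating once the current task id shows up. The sketch below mirrors that idea with the standard library only. `queue.Queue` stands in for the `MpReceiver`, `queue.Empty` for `WouldBlock`, and the helper names are hypothetical. It does not model the cross-rank agreement that `mx_any` appears to provide in the diff.

```python
import contextlib
import queue
from collections.abc import Iterable, Iterator


def drain_cancellations(cancel_queue: "queue.Queue[str]", cancelled: set[str]) -> None:
    """Move every queued task id into `cancelled` without blocking."""
    with contextlib.suppress(queue.Empty):
        while True:
            cancelled.add(cancel_queue.get_nowait())


def generate_until_cancelled(
    task_id: str,
    tokens: Iterable[str],
    cancel_queue: "queue.Queue[str]",
    cancelled: set[str],
) -> Iterator[str]:
    """Yield tokens until `task_id` has been cancelled."""
    for token in tokens:
        # Check for new cancellations before emitting each token.
        drain_cancellations(cancel_queue, cancelled)
        if task_id in cancelled:
            break
        yield token
```

One design difference from the diff: this sketch drains every queued id per iteration, whereas the diff reads at most one id per iteration. Draining may notice a cancellation sooner when several ids are queued at once.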
--- a/src/exo/worker/runner/runner_supervisor.py
+++ b/src/exo/worker/runner/runner_supervisor.py
@@ -49,10 +49,12 @@ class RunnerSupervisor:
    _ev_recv: MpReceiver[Event]
    _task_sender: MpSender[Task]
    _event_sender: Sender[Event]
+    _cancel_sender: MpSender[TaskId]
    _tg: TaskGroup | None = field(default=None, init=False)
    status: RunnerStatus = field(default_factory=RunnerIdle, init=False)
    pending: dict[TaskId, anyio.Event] = field(default_factory=dict, init=False)
    completed: set[TaskId] = field(default_factory=set, init=False)
+    cancelled: set[TaskId] = field(default_factory=set, init=False)

    @classmethod
    def create(
@@ -63,8 +65,8 @@ class RunnerSupervisor:
        initialize_timeout: float = 400,
    ) -> Self:
        ev_send, ev_recv = mp_channel[Event]()
-        # A task is kind of a runner command
        task_sender, task_recv = mp_channel[Task]()
+        cancel_sender, cancel_recv = mp_channel[TaskId]()

        runner_process = Process(
            target=entrypoint,
@@ -72,6 +74,7 @@ class RunnerSupervisor:
                bound_instance,
                ev_send,
                task_recv,
+                cancel_recv,
                logger,
            ),
            daemon=True,
@@ -86,6 +89,7 @@ class RunnerSupervisor:
            initialize_timeout=initialize_timeout,
            _ev_recv=ev_recv,
            _task_sender=task_sender,
+            _cancel_sender=cancel_sender,
            _event_sender=event_sender,
        )

@@ -131,6 +135,7 @@ class RunnerSupervisor:
            logger.info(
                f"Skipping invalid task {task} as it has already been completed"
            )
+            return
        logger.info(f"Starting task {task}")
        event = anyio.Event()
        self.pending[task.task_id] = event
@@ -140,7 +145,13 @@ class RunnerSupervisor:
            logger.warning(f"Task {task} dropped, runner closed communication.")
            return
        await event.wait()
-        logger.info(f"Finished task {task}")
+
+    async def cancel_task(self, task_id: TaskId):
+        if task_id in self.completed:
+            logger.info(f"Unable to cancel {task_id} as it has been completed")
+            return
+        self.cancelled.add(task_id)
+        await self._cancel_sender.send_async(task_id)

    async def _forward_events(self):
        with self._ev_recv as events:
--- a/src/exo/worker/tests/unittests/test_mlx/test_tokenizers.py
+++ b/src/exo/worker/tests/unittests/test_mlx/test_tokenizers.py
@@ -11,12 +11,12 @@ from pathlib import Path

 import pytest

-from exo.shared.models.model_cards import MODEL_CARDS, ModelCard, ModelId
-from exo.worker.download.download_utils import (
+from exo.download.download_utils import (
    download_file_with_retry,
    ensure_models_dir,
    fetch_file_list_with_cache,
 )
+from exo.shared.models.model_cards import MODEL_CARDS, ModelCard, ModelId
 from exo.worker.engines.mlx.utils_mlx import (
    get_eos_token_ids_for_model,
    load_tokenizer_for_model_id,
--- a/src/exo/worker/tests/unittests/test_plan/test_download_and_loading.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_download_and_loading.py
@@ -1,5 +1,5 @@
 import exo.worker.plan as plan_mod
-from exo.shared.types.common import ModelId, NodeId
+from exo.shared.types.common import NodeId
 from exo.shared.types.memory import Memory
 from exo.shared.types.tasks import LoadModel
 from exo.shared.types.worker.downloads import DownloadCompleted, DownloadProgress
@@ -45,13 +45,9 @@ def test_plan_requests_download_when_waiting_and_shard_not_downloaded():
    instances = {INSTANCE_1_ID: instance}
    all_runners = {RUNNER_1_ID: RunnerIdle()}

-    # No entry for this shard -> should trigger DownloadModel
-    download_status: dict[ModelId, DownloadProgress] = {}
-
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status=download_status,
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -92,14 +88,6 @@ def test_plan_loads_model_when_all_shards_downloaded_and_waiting():
        RUNNER_2_ID: RunnerConnected(),
    }

-    # Local node has already marked its shard as downloaded (not actually used by _load_model)
-    local_download_status = {
-        MODEL_A_ID: DownloadCompleted(
-            shard_metadata=shard1, node_id=NODE_A, total_bytes=Memory()
-        )
-    }
-
-    # Global view has completed downloads for both nodes
    global_download_status = {
        NODE_A: [
            DownloadCompleted(
@@ -116,7 +104,6 @@ def test_plan_loads_model_when_all_shards_downloaded_and_waiting():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status=local_download_status,
        global_download_status=global_download_status,
        instances=instances,
        all_runners=all_runners,
@@ -148,23 +135,19 @@ def test_plan_does_not_request_download_when_shard_already_downloaded():
    instances = {INSTANCE_1_ID: instance}
    all_runners = {RUNNER_1_ID: RunnerIdle()}

-    # Local status claims the shard is downloaded already
-    local_download_status = {
-        MODEL_A_ID: DownloadCompleted(
-            shard_metadata=shard, node_id=NODE_A, total_bytes=Memory()
-        )
-    }
-
-    # Global view hasn't caught up yet (no completed shards recorded for NODE_A)
+    # Global state shows shard is downloaded for NODE_A
    global_download_status: dict[NodeId, list[DownloadProgress]] = {
-        NODE_A: [],
+        NODE_A: [
+            DownloadCompleted(
+                shard_metadata=shard, node_id=NODE_A, total_bytes=Memory()
+            )
+        ],
        NODE_B: [],
    }

    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status=local_download_status,
        global_download_status=global_download_status,
        instances=instances,
        all_runners=all_runners,
@@ -202,12 +185,6 @@ def test_plan_does_not_load_model_until_all_shards_downloaded_globally():
        RUNNER_2_ID: RunnerConnected(),
    }

-    # Only NODE_A's shard is recorded as downloaded globally
-    local_download_status = {
-        MODEL_A_ID: DownloadCompleted(
-            shard_metadata=shard1, node_id=NODE_A, total_bytes=Memory()
-        )
-    }
    global_download_status = {
        NODE_A: [
            DownloadCompleted(
@@ -220,7 +197,6 @@ def test_plan_does_not_load_model_until_all_shards_downloaded_globally():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status=local_download_status,
        global_download_status=global_download_status,
        instances=instances,
        all_runners=all_runners,
@@ -245,7 +221,6 @@ def test_plan_does_not_load_model_until_all_shards_downloaded_globally():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status=local_download_status,
        global_download_status=global_download_status,
        instances=instances,
        all_runners=all_runners,
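Review note on the test changes above: with the `download_status` argument removed, the planner derives this node's download view from `global_download_status`. A minimal sketch of that derivation follows. `FakeProgress` and both function names are hypothetical, and the "needs download" rule is simplified to "no entry for this model", since the full condition is not visible in the hunk.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeProgress:
    """Hypothetical stand-in for a DownloadProgress entry."""
    model_id: str
    status: str


def local_download_view(
    node_id: str,
    global_status: Mapping[str, Sequence[FakeProgress]],
) -> dict[str, FakeProgress]:
    # A node missing from the global map is treated as having no downloads.
    return {dp.model_id: dp for dp in global_status.get(node_id, [])}


def needs_download(
    node_id: str,
    model_id: str,
    global_status: Mapping[str, Sequence[FakeProgress]],
) -> bool:
    # Simplified assumption: any missing entry means a download is required.
    return model_id not in local_download_view(node_id, global_status)
```

This matches the two cases the tests cover: an empty list for the node leads to a download request, and a completed entry for the node does not.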
--- a/src/exo/worker/tests/unittests/test_plan/test_runner_lifecycle.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_runner_lifecycle.py
@@ -47,8 +47,7 @@ def test_plan_kills_runner_when_instance_missing():

    result = plan_mod.plan(
        node_id=NODE_A,
-        runners=runners,  # type: ignore
-        download_status={},
+        runners=runners,  # type: ignore[arg-type]
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -87,8 +86,7 @@ def test_plan_kills_runner_when_sibling_failed():

    result = plan_mod.plan(
        node_id=NODE_A,
-        runners=runners,  # type: ignore
-        download_status={},
+        runners=runners,  # type: ignore[arg-type]
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -120,7 +118,6 @@ def test_plan_creates_runner_when_missing_for_node():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -158,8 +155,7 @@ def test_plan_does_not_create_runner_when_supervisor_already_present():

    result = plan_mod.plan(
        node_id=NODE_A,
-        runners=runners,  # type: ignore
-        download_status={},
+        runners=runners,  # type: ignore[arg-type]
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -189,7 +185,6 @@ def test_plan_does_not_create_runner_for_unassigned_node():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
--- a/src/exo/worker/tests/unittests/test_plan/test_task_forwarding.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_task_forwarding.py
@@ -65,7 +65,6 @@ def test_plan_forwards_pending_chat_completion_when_runner_ready():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -113,7 +112,6 @@ def test_plan_does_not_forward_chat_completion_if_any_runner_not_ready():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
        instances=instances,
        all_runners=all_runners,
@@ -158,7 +156,6 @@ def test_plan_does_not_forward_tasks_for_other_instances():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -221,7 +218,6 @@ def test_plan_ignores_non_pending_or_non_chat_tasks():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
        instances=instances,
        all_runners=all_runners,
@@ -261,7 +257,6 @@ def test_plan_returns_none_when_nothing_to_do():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
        instances=instances,
        all_runners=all_runners,
--- a/src/exo/worker/tests/unittests/test_plan/test_warmup.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_warmup.py
@@ -57,7 +57,6 @@ def test_plan_starts_warmup_for_accepting_rank_when_all_loaded_or_warming():
    result = plan_mod.plan(
        node_id=NODE_B,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -99,7 +98,6 @@ def test_plan_starts_warmup_for_rank_zero_after_others_warming():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -140,7 +138,6 @@ def test_plan_does_not_start_warmup_for_non_zero_rank_until_all_loaded_or_warmin
    result = plan_mod.plan(
        node_id=NODE_B,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
        instances=instances,
        all_runners=all_runners,
@@ -185,7 +182,6 @@ def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -202,7 +198,6 @@ def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
@@ -246,7 +241,6 @@ def test_plan_starts_warmup_for_connecting_rank_after_others_warming():
    result = plan_mod.plan(
        node_id=NODE_B,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_B: []},
        instances=instances,
        all_runners=all_runners,
@@ -289,7 +283,6 @@ def test_plan_does_not_start_warmup_for_accepting_rank_until_all_loaded_or_warmi
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
        instances=instances,
        all_runners=all_runners,
@@ -331,7 +324,6 @@ def test_plan_does_not_start_warmup_for_connecting_rank_until_others_warming():
    result = plan_mod.plan(
        node_id=NODE_A,
        runners=runners,  # type: ignore
-        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
        instances=instances,
        all_runners=all_runners,
--- a/tests/headless_runner.py
+++ b/tests/headless_runner.py
@@ -11,6 +11,10 @@ from hypercorn.asyncio import serve  # pyright: ignore[reportUnknownVariableType
 from loguru import logger
 from pydantic import BaseModel

+from exo.download.impl_shard_downloader import (
+    build_full_shard,
+    exo_shard_downloader,
+)
 from exo.shared.logging import InterceptLogger, logger_setup
 from exo.shared.models.model_cards import MODEL_CARDS, ModelId
 from exo.shared.types.api import ChatCompletionMessage, ChatCompletionTaskParams
@@ -36,10 +40,6 @@ from exo.shared.types.worker.runners import RunnerId, ShardAssignments
 from exo.shared.types.worker.shards import PipelineShardMetadata, TensorShardMetadata
 from exo.utils.channels import MpReceiver, MpSender, channel, mp_channel
 from exo.utils.info_gatherer.info_gatherer import GatheredInfo, InfoGatherer
-from exo.worker.download.impl_shard_downloader import (
-    build_full_shard,
-    exo_shard_downloader,
-)
 from exo.worker.runner.bootstrap import entrypoint
Author	SHA1	Message	Date
Evan	45d519cde7	workin on it	2026-01-24 02:39:38 +00:00
Evan Quiney	771a86331b	fix instance port assignment (#1268) we were overassigning the port 52414 to instances because of an error in placement	2026-01-23 18:37:40 +00:00
Jake Hillion	6dbbe7797b	downloads: add download and delete buttons to downloads UI The downloads page showed model download progress but provided no way for users to trigger downloads or remove completed models from disk. Added API endpoints (POST /download/start, DELETE /download/{node_id}/{model_id}) that send StartDownload and DeleteDownload commands via the download_command_sender. Updated the dashboard downloads page with per-model buttons: a download button for incomplete downloads and a delete button for completed ones. This allows users to manage downloads directly from the UI without needing to trigger downloads through other means. Test plan: - Deployed on a 3 machine cluster. Did several downloads/deletions - all work and the dashboard updates relatively fluently. It takes roughly 5 seconds to render a 131GB model deletion which isn't too bad.	2026-01-23 18:11:17 +00:00
Jake Hillion	9357503c6f	downloads: refactor to run at node level The Worker previously owned the ShardDownloader directly via dependency injection, which prevented --no-worker nodes from downloading and made it impossible for multiple Workers to share a single downloader instance. Moved download functionality to a new DownloadCoordinator component at the Node level that communicates via the DOWNLOAD_COMMANDS pub/sub topic. Workers now send StartDownload commands instead of calling the downloader directly, and receive progress updates through the event-sourced state. This decouples downloads from the Worker lifecycle and enables future features like UI-triggered downloads to specific nodes and multi-worker download sharing. Test plan: - Mostly tested in the next PR that adds explicit downloads/deletions to the dashboard. - Started a model that isn't downloaded - it works.	2026-01-23 18:04:09 +00:00
ciaranbor	ba19940828	Fix regenerate for image models (#1263 ) ## Motivation The 'regenerate' button was hardcoded to chat completion. Clicking 'regenerate' for image request would result in an error after the model is loaded ## Changes Store request type and dispatch to appropriate request upon regeneration ## Why It Works We make sure to repeat the same request type as was performed originally ## Test Plan ### Manual Testing Checked 'regenerate' works for chat completion, image generation, image editing	2026-01-23 16:33:01 +00:00