localhost ring fixes

tmp: add lots of RING3DBG logging
placement: generate per-node host lists for MLX ring backend
2026-01-01 18:48:26 -05:00 · 2025-12-23 23:32:04 +00:00 · 2025-12-23 22:52:40 +00:00 · 2025-12-23 22:26:25 +00:00 · 2025-12-23 19:28:42 +00:00
58 changed files with 857 additions and 2924 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -7,8 +7,6 @@ digest.txt
 # nix
 .direnv/

-# IDEA (PyCharm)
-.idea

 # xcode / macos
 *.xcuserstate
@@ -16,7 +14,6 @@ digest.txt
 *.xcuserdatad/
 **/.DS_Store
 app/EXO/build/
-dist/


 # rust
--- a/README.md
+++ b/README.md
@@ -61,10 +61,10 @@ Devices running exo automatically discover each other, without needing any manua

 There are two ways to run exo:

-### Run from Source (macOS)
+### Run from Source (Mac & Linux)

 **Prerequisites:**
-- [brew](https://github.com/Homebrew/brew) (for simple package management on macOS)
+- [brew](https://github.com/Homebrew/brew) (for simple package management on MacOS)
  
  ```bash
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
@@ -98,62 +98,6 @@ uv run exo

 This starts the exo dashboard and API at http://localhost:52415/

-### Run from Source (Linux)
-
-**Prerequisites:**
-
-- [uv](https://github.com/astral-sh/uv) (for Python dependency management)
-- [node](https://github.com/nodejs/node) (for building the dashboard) - version 18 or higher
-- [rust](https://github.com/rust-lang/rustup) (to build Rust bindings, nightly for now)
-
-**Installation methods:**
-
-**Option 1: Using system package manager (Ubuntu/Debian example):**
-```bash
-# Install Node.js and npm
-sudo apt update
-sudo apt install nodejs npm
-
-# Install uv
-curl -LsSf https://astral.sh/uv/install.sh | sh
-
-# Install Rust (using rustup)
-curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-rustup toolchain install nightly
-```
-
-**Option 2: Using Homebrew on Linux (if preferred):**
-```bash
-# Install Homebrew on Linux
-/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
-
-# Install dependencies
-brew install uv node
-
-# Install Rust (using rustup)
-curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-rustup toolchain install nightly
-```
-
-**Note:** The `macmon` package is macOS-only and not required for Linux.
-
-Clone the repo, build the dashboard, and run exo:
-
-```bash
-# Clone exo
-git clone https://github.com/exo-explore/exo
-
-# Build dashboard
-cd exo/dashboard && npm install && npm run build && cd ..
-
-# Run exo
-uv run exo
-```
-
-This starts the exo dashboard and API at http://localhost:52415/
-
-**Important note for Linux users:** Currently, exo runs on CPU on Linux. GPU support for Linux platforms is under development. If you'd like to see support for your specific Linux hardware, please [search for existing feature requests](https://github.com/exo-explore/exo/issues) or create a new one.
-
 ### macOS App

 exo ships a macOS app that runs in the background on your Mac.
@@ -166,47 +110,6 @@ Download the latest build here: [EXO-latest.dmg](https://assets.exolabs.net/EXO-

 The app will ask for permission to modify system settings and install a new Network profile. Improvements to this are being worked on.

-#### Uninstalling the macOS App
-
-The recommended way to uninstall is through the app itself: click the menu bar icon → Advanced → Uninstall. This cleanly removes all system components.
-
-If you've already deleted the app, you can run the standalone uninstaller script:
-
-```bash
-sudo ./app/EXO/uninstall-exo.sh
-```
-
-This removes:
-- Network setup LaunchDaemon
-- Network configuration script
-- Log files
-- The "exo" network location
-
-**Note:** You'll need to manually remove EXO from Login Items in System Settings → General → Login Items.
-
----
-
-### Enabling RDMA on macOS
-
-RDMA is a new capability added to macOS 26.2. It works on any Mac with Thunderbolt 5 (M4 Pro Mac Mini, M4 Max Mac Studio, M4 Max MacBook Pro, M3 Ultra Mac Studio).
-
-Note that on Mac Studio, you cannot use the Thunderbolt 5 port next to the Ethernet port.
-
-To enable RDMA on macOS, follow these steps:
-
-1. Shut down your Mac.
-2. Hold down the power button for 10 seconds until the boot menu appears.
-3. Select "Options" to enter Recovery mode.
-4. When the Recovery UI appears, open the Terminal from the Utilities menu.
-5. In the Terminal, type:
-   ```
-   rdma_ctl enable
-   ```
-   and press Enter.
-6. Reboot your Mac.
-
-After that, RDMA will be enabled in macOS and exo will take care of the rest.
-
 ---

 ### Using the API
--- a/app/EXO/EXO/ContentView.swift
+++ b/app/EXO/EXO/ContentView.swift
@@ -17,11 +17,9 @@ struct ContentView: View {
    @State private var deletingInstanceIDs: Set<String> = []
    @State private var showAllNodes = false
    @State private var showAllInstances = false
-    @State private var showAdvanced = false
    @State private var showDebugInfo = false
    @State private var bugReportInFlight = false
    @State private var bugReportMessage: String?
-    @State private var uninstallInProgress = false

    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
@@ -51,7 +49,7 @@ struct ContentView: View {

    private var topologySection: some View {
        Group {
-            if let topology = stateService.latestSnapshot?.topologyViewModel(localNodeId: stateService.localNodeId), !topology.nodes.isEmpty {
+            if let topology = stateService.latestSnapshot?.topologyViewModel(), !topology.nodes.isEmpty {
                TopologyMiniView(topology: topology)
            }
        }
@@ -195,7 +193,11 @@ struct ContentView: View {
                Divider()
                    .padding(.vertical, 4)
            }
-            advancedSection
+            controlButton(title: "Check for Updates") {
+                updater.checkForUpdates()
+            }
+            .padding(.bottom, 8)
+            debugSection
                .padding(.bottom, 8)
            controlButton(title: "Quit", tint: .secondary) {
                controller.stop()
@@ -204,33 +206,6 @@ struct ContentView: View {
        }
    }

-    private var advancedSection: some View {
-        VStack(alignment: .leading, spacing: 6) {
-            HStack {
-                Text("Advanced")
-                    .font(.caption)
-                    .foregroundColor(.secondary)
-                Spacer()
-                collapseButton(isExpanded: $showAdvanced)
-            }
-            .animation(nil, value: showAdvanced)
-            if showAdvanced {
-                VStack(alignment: .leading, spacing: 2) {
-                    HoverButton(title: "Check for Updates", small: true) {
-                        updater.checkForUpdates()
-                    }
-                    debugSection
-                    HoverButton(title: "Uninstall", tint: .red, small: true) {
-                        showUninstallConfirmationAlert()
-                    }
-                    .disabled(uninstallInProgress)
-                }
-                .transition(.opacity)
-            }
-        }
-        .animation(.easeInOut(duration: 0.25), value: showAdvanced)
-    }
-
    private func controlButton(title: String, tint: Color = .primary, action: @escaping () -> Void) -> some View {
        HoverButton(title: title, tint: tint, trailingSystemImage: nil, action: action)
    }
@@ -353,15 +328,15 @@ struct ContentView: View {
    }

    private var debugSection: some View {
-        VStack(alignment: .leading, spacing: 4) {
-            HoverButton(
-                title: "Debug Info",
-                tint: .primary,
-                trailingSystemImage: showDebugInfo ? "chevron.up" : "chevron.down",
-                small: true
-            ) {
-                showDebugInfo.toggle()
+        VStack(alignment: .leading, spacing: 6) {
+            HStack {
+                Text("Debug Info")
+                    .font(.caption)
+                    .foregroundColor(.secondary)
+                Spacer()
+                collapseButton(isExpanded: $showDebugInfo)
            }
+            .animation(nil, value: showDebugInfo)
            if showDebugInfo {
                VStack(alignment: .leading, spacing: 4) {
                    Text("Version: \(buildTag)")
@@ -377,7 +352,6 @@ struct ContentView: View {
                    sendBugReportButton
                        .padding(.top, 6)
                }
-                .padding(.leading, 8)
                .transition(.opacity)
            }
        }
@@ -473,88 +447,6 @@ struct ContentView: View {
        bugReportInFlight = false
    }

-    private func showUninstallConfirmationAlert() {
-        let alert = NSAlert()
-        alert.messageText = "Uninstall EXO"
-        alert.informativeText = """
-            This will remove EXO and all its system components:
-
-            • Network configuration daemon
-            • Launch at login registration
-            • EXO network location
-
-            The app will be moved to Trash.
-            """
-        alert.alertStyle = .warning
-        alert.addButton(withTitle: "Uninstall")
-        alert.addButton(withTitle: "Cancel")
-
-        // Style the Uninstall button as destructive
-        if let uninstallButton = alert.buttons.first {
-            uninstallButton.hasDestructiveAction = true
-        }
-
-        let response = alert.runModal()
-        if response == .alertFirstButtonReturn {
-            performUninstall()
-        }
-    }
-
-    private func performUninstall() {
-        uninstallInProgress = true
-
-        // Stop EXO process first
-        controller.cancelPendingLaunch()
-        controller.stop()
-        stateService.stopPolling()
-
-        // Run the privileged uninstall on a background thread
-        // Using .utility QoS to avoid priority inversion with NSAppleScript's subprocess
-        DispatchQueue.global(qos: .utility).async {
-            do {
-                // Remove network setup daemon and components (requires admin privileges)
-                try NetworkSetupHelper.uninstall()
-
-                DispatchQueue.main.async {
-                    // Unregister from launch at login
-                    LaunchAtLoginHelper.disable()
-
-                    // Move app to trash
-                    self.moveAppToTrash()
-
-                    // Quit the app
-                    DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
-                        NSApplication.shared.terminate(nil)
-                    }
-                }
-            } catch {
-                DispatchQueue.main.async {
-                    self.showErrorAlert(message: error.localizedDescription)
-                    self.uninstallInProgress = false
-                }
-            }
-        }
-    }
-
-    private func showErrorAlert(message: String) {
-        let alert = NSAlert()
-        alert.messageText = "Uninstall Failed"
-        alert.informativeText = message
-        alert.alertStyle = .critical
-        alert.addButton(withTitle: "OK")
-        alert.runModal()
-    }
-
-    private func moveAppToTrash() {
-        guard let appURL = Bundle.main.bundleURL as URL? else { return }
-        do {
-            try FileManager.default.trashItem(at: appURL, resultingItemURL: nil)
-        } catch {
-            // If we can't trash the app, that's OK - user can do it manually
-            // The important system components have already been cleaned up
-        }
-    }
-
    private var buildTag: String {
        Bundle.main.infoDictionary?["EXOBuildTag"] as? String ?? "unknown"
    }
@@ -568,24 +460,14 @@ private struct HoverButton: View {
    let title: String
    let tint: Color
    let trailingSystemImage: String?
-    let small: Bool
    let action: () -> Void

-    init(title: String, tint: Color = .primary, trailingSystemImage: String? = nil, small: Bool = false, action: @escaping () -> Void) {
-        self.title = title
-        self.tint = tint
-        self.trailingSystemImage = trailingSystemImage
-        self.small = small
-        self.action = action
-    }
-
    @State private var isHovering = false

    var body: some View {
        Button(action: action) {
            HStack {
                Text(title)
-                    .font(small ? .caption : nil)
                Spacer()
                if let systemName = trailingSystemImage {
                    Image(systemName: systemName)
@@ -593,8 +475,8 @@ private struct HoverButton: View {
                }
            }
            .frame(maxWidth: .infinity, alignment: .leading)
-            .padding(.vertical, small ? 4 : 6)
-            .padding(.horizontal, small ? 6 : 8)
+            .padding(.vertical, 6)
+            .padding(.horizontal, 8)
            .background(
                RoundedRectangle(cornerRadius: 6)
                    .fill(
--- a/app/EXO/EXO/EXOApp.swift
+++ b/app/EXO/EXO/EXOApp.swift
@@ -125,22 +125,6 @@ struct EXOApp: App {
    }
 }

-/// Helper for managing EXO's launch-at-login registration
-enum LaunchAtLoginHelper {
-    private static let logger = Logger(subsystem: "io.exo.EXO", category: "LaunchAtLogin")
-
-    /// Unregisters EXO from launching at login
-    static func disable() {
-        guard SMAppService.mainApp.status == .enabled else { return }
-        do {
-            try SMAppService.mainApp.unregister()
-            logger.info("Unregistered EXO from launch at login")
-        } catch {
-            logger.error("Failed to unregister EXO from launch at login: \(error.localizedDescription, privacy: .public)")
-        }
-    }
-}
-
 final class SparkleUpdater: NSObject, ObservableObject {
    private let controller: SPUStandardUpdaterController
    private let delegateProxy: ExoUpdaterDelegate
--- a/app/EXO/EXO/Services/BugReportService.swift
+++ b/app/EXO/EXO/Services/BugReportService.swift
@@ -82,6 +82,7 @@ struct BugReportService {
    }

    private func loadCredentials() throws -> AWSConfig {
+        // These credentials are write-only and necessary to receive bug reports from users
        return AWSConfig(
            accessKey: "AKIAYEKP5EMXTOBYDGHX",
            secretKey: "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",
--- a/app/EXO/EXO/Services/ClusterStateService.swift
+++ b/app/EXO/EXO/Services/ClusterStateService.swift
@@ -7,7 +7,6 @@ final class ClusterStateService: ObservableObject {
    @Published private(set) var lastError: String?
    @Published private(set) var lastActionMessage: String?
    @Published private(set) var modelOptions: [ModelOption] = []
-    @Published private(set) var localNodeId: String?

    private var timer: Timer?
    private let decoder: JSONDecoder
@@ -30,7 +29,6 @@ final class ClusterStateService: ObservableObject {
    func startPolling(interval: TimeInterval = 0.5) {
        stopPolling()
        Task {
-            await fetchLocalNodeId()
            await fetchModels()
            await fetchSnapshot()
        }
@@ -48,31 +46,9 @@ final class ClusterStateService: ObservableObject {
        latestSnapshot = nil
        lastError = nil
        lastActionMessage = nil
-        localNodeId = nil
-    }
-
-    private func fetchLocalNodeId() async {
-        do {
-            let url = baseURL.appendingPathComponent("node_id")
-            var request = URLRequest(url: url)
-            request.cachePolicy = .reloadIgnoringLocalCacheData
-            let (data, response) = try await session.data(for: request)
-            guard let httpResponse = response as? HTTPURLResponse, (200..<300).contains(httpResponse.statusCode) else {
-                return
-            }
-            if let nodeId = try? decoder.decode(String.self, from: data) {
-                localNodeId = nodeId
-            }
-        } catch {
-            // Silently ignore - localNodeId will remain nil and retry on next poll
-        }
    }

    private func fetchSnapshot() async {
-        // Retry fetching local node ID if not yet set
-        if localNodeId == nil {
-            await fetchLocalNodeId()
-        }
        do {
            var request = URLRequest(url: endpoint)
            request.cachePolicy = .reloadIgnoringLocalCacheData
--- a/app/EXO/EXO/Services/NetworkSetupHelper.swift
+++ b/app/EXO/EXO/Services/NetworkSetupHelper.swift
@@ -62,8 +62,7 @@ networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
 """

    static func ensureLaunchDaemonInstalled() {
-        // Use .utility priority to match NSAppleScript's internal QoS and avoid priority inversion
-        Task.detached(priority: .utility) {
+        Task.detached {
            do {
                if daemonAlreadyInstalled() {
                    return
@@ -76,63 +75,6 @@ networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
        }
    }

-    /// Removes all EXO network setup components from the system.
-    /// This includes the LaunchDaemon, scripts, logs, and network location.
-    /// Requires admin privileges.
-    static func uninstall() throws {
-        let uninstallScript = makeUninstallScript()
-        try runShellAsAdmin(uninstallScript)
-        logger.info("EXO network setup components removed successfully")
-    }
-
-    /// Checks if there are any EXO network components installed that need cleanup
-    static func hasInstalledComponents() -> Bool {
-        let manager = FileManager.default
-        let scriptExists = manager.fileExists(atPath: scriptDestination)
-        let plistExists = manager.fileExists(atPath: plistDestination)
-        return scriptExists || plistExists
-    }
-
-    private static func makeUninstallScript() -> String {
-        """
-set -euo pipefail
-
-LABEL="\(daemonLabel)"
-SCRIPT_DEST="\(scriptDestination)"
-PLIST_DEST="\(plistDestination)"
-LOG_OUT="/var/log/\(daemonLabel).log"
-LOG_ERR="/var/log/\(daemonLabel).err.log"
-
-# Unload the LaunchDaemon if running
-launchctl bootout system/"$LABEL" 2>/dev/null || true
-
-# Remove LaunchDaemon plist
-rm -f "$PLIST_DEST"
-
-# Remove the script and parent directory if empty
-rm -f "$SCRIPT_DEST"
-rmdir "$(dirname "$SCRIPT_DEST")" 2>/dev/null || true
-
-# Remove log files
-rm -f "$LOG_OUT" "$LOG_ERR"
-
-# Switch back to Automatic network location
-networksetup -switchtolocation Automatic 2>/dev/null || true
-
-# Delete the exo network location if it exists
-networksetup -listlocations | grep -q '^exo$' && {
-  networksetup -deletelocation exo 2>/dev/null || true
-} || true
-
-# Re-enable Thunderbolt Bridge if it exists
-networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
-  networksetup -setnetworkserviceenabled "Thunderbolt Bridge" on 2>/dev/null || true
-} || true
-
-echo "EXO network components removed successfully"
-"""
-    }
-
    private static func daemonAlreadyInstalled() -> Bool {
        let manager = FileManager.default
        let scriptExists = manager.fileExists(atPath: scriptDestination)
--- a/app/EXO/EXO/ViewModels/NodeViewModel.swift
+++ b/app/EXO/EXO/ViewModels/NodeViewModel.swift
@@ -85,7 +85,7 @@ struct TopologyViewModel {
 }

 extension ClusterState {
-    func topologyViewModel(localNodeId: String?) -> TopologyViewModel? {
+    func topologyViewModel() -> TopologyViewModel? {
        let topologyNodeIds = Set(topology?.nodes.map(\.nodeId) ?? [])
        let allNodes = nodeViewModels().filter { topologyNodeIds.isEmpty || topologyNodeIds.contains($0.id) }
        guard !allNodes.isEmpty else { return nil }
@@ -105,11 +105,6 @@ extension ClusterState {
            orderedNodes = allNodes
        }

-        // Rotate so the local node (from /node_id API) is first
-        if let localId = localNodeId, let index = orderedNodes.firstIndex(where: { $0.id == localId }) {
-            orderedNodes = Array(orderedNodes[index...]) + Array(orderedNodes[..<index])
-        }
-
        let nodeIds = Set(orderedNodes.map(\.id))
        let edgesArray: [TopologyEdgeViewModel] = topology?.connections?.compactMap { connection in
            guard nodeIds.contains(connection.localNodeId), nodeIds.contains(connection.sendBackNodeId) else { return nil }
@@ -117,7 +112,10 @@ extension ClusterState {
        } ?? []
        let edges = Set(edgesArray)

-        return TopologyViewModel(nodes: orderedNodes, edges: Array(edges), currentNodeId: localNodeId)
+        let topologyRootId = topology?.nodes.first?.nodeId
+        let currentId = orderedNodes.first(where: { $0.id == topologyRootId })?.id ?? orderedNodes.first?.id
+
+        return TopologyViewModel(nodes: orderedNodes, edges: Array(edges), currentNodeId: currentId)
    }
 }

--- a/app/EXO/uninstall-exo.sh
+++ b/app/EXO/uninstall-exo.sh
@@ -1,154 +0,0 @@
-#!/usr/bin/env bash
-#
-# EXO Uninstaller Script
-#
-# This script removes all EXO system components that persist after deleting the app.
-# Run with: sudo ./uninstall-exo.sh
-#
-# Components removed:
-# - LaunchDaemon: /Library/LaunchDaemons/io.exo.networksetup.plist
-# - Network script: /Library/Application Support/EXO/
-# - Log files: /var/log/io.exo.networksetup.*
-# - Network location: "exo"
-# - Launch at login registration
-#
-
-set -euo pipefail
-
-LABEL="io.exo.networksetup"
-SCRIPT_DEST="/Library/Application Support/EXO/disable_bridge_enable_dhcp.sh"
-PLIST_DEST="/Library/LaunchDaemons/io.exo.networksetup.plist"
-LOG_OUT="/var/log/${LABEL}.log"
-LOG_ERR="/var/log/${LABEL}.err.log"
-APP_BUNDLE_ID="io.exo.EXO"
-
-# Colors for output
-RED='\033[0;31m'
-GREEN='\033[0;32m'
-YELLOW='\033[1;33m'
-NC='\033[0m' # No Color
-
-echo_info() {
-    echo -e "${GREEN}[INFO]${NC} $1"
-}
-
-echo_warn() {
-    echo -e "${YELLOW}[WARN]${NC} $1"
-}
-
-echo_error() {
-    echo -e "${RED}[ERROR]${NC} $1"
-}
-
-# Check if running as root
-if [[ $EUID -ne 0 ]]; then
-    echo_error "This script must be run as root (use sudo)"
-    exit 1
-fi
-
-echo ""
-echo "========================================"
-echo "        EXO Uninstaller"
-echo "========================================"
-echo ""
-
-# Unload the LaunchDaemon if running
-echo_info "Stopping network setup daemon..."
-if launchctl list | grep -q "$LABEL"; then
-    launchctl bootout system/"$LABEL" 2>/dev/null || true
-    echo_info "Daemon stopped"
-else
-    echo_warn "Daemon was not running"
-fi
-
-# Remove LaunchDaemon plist
-if [[ -f "$PLIST_DEST" ]]; then
-    rm -f "$PLIST_DEST"
-    echo_info "Removed LaunchDaemon plist"
-else
-    echo_warn "LaunchDaemon plist not found (already removed?)"
-fi
-
-# Remove the script and parent directory
-if [[ -f "$SCRIPT_DEST" ]]; then
-    rm -f "$SCRIPT_DEST"
-    echo_info "Removed network setup script"
-else
-    echo_warn "Network setup script not found (already removed?)"
-fi
-
-# Remove EXO directory if empty
-if [[ -d "/Library/Application Support/EXO" ]]; then
-    rmdir "/Library/Application Support/EXO" 2>/dev/null && \
-        echo_info "Removed EXO support directory" || \
-        echo_warn "EXO support directory not empty, leaving in place"
-fi
-
-# Remove log files
-if [[ -f "$LOG_OUT" ]] || [[ -f "$LOG_ERR" ]]; then
-    rm -f "$LOG_OUT" "$LOG_ERR"
-    echo_info "Removed log files"
-else
-    echo_warn "Log files not found (already removed?)"
-fi
-
-# Switch back to Automatic network location
-echo_info "Restoring network configuration..."
-if networksetup -listlocations | grep -q "^Automatic$"; then
-    networksetup -switchtolocation Automatic 2>/dev/null || true
-    echo_info "Switched to Automatic network location"
-else
-    echo_warn "Automatic network location not found"
-fi
-
-# Delete the exo network location if it exists
-if networksetup -listlocations | grep -q "^exo$"; then
-    networksetup -deletelocation exo 2>/dev/null || true
-    echo_info "Deleted 'exo' network location"
-else
-    echo_warn "'exo' network location not found (already removed?)"
-fi
-
-# Re-enable Thunderbolt Bridge if it exists
-if networksetup -listnetworkservices 2>/dev/null | grep -q "Thunderbolt Bridge"; then
-    networksetup -setnetworkserviceenabled "Thunderbolt Bridge" on 2>/dev/null || true
-    echo_info "Re-enabled Thunderbolt Bridge"
-fi
-
-# Note about launch at login registration
-# SMAppService-based login items cannot be removed from a shell script.
-# They can only be unregistered from within the app itself or manually via System Settings.
-echo_warn "Launch at login must be removed manually:"
-echo_warn "  System Settings → General → Login Items → Remove EXO"
-
-# Check if EXO.app exists in common locations
-APP_FOUND=false
-for app_path in "/Applications/EXO.app" "$HOME/Applications/EXO.app"; do
-    if [[ -d "$app_path" ]]; then
-        if [[ "$APP_FOUND" == false ]]; then
-            echo ""
-            APP_FOUND=true
-        fi
-        echo_warn "EXO.app found at: $app_path"
-        echo_warn "You may want to move it to Trash manually."
-    fi
-done
-
-echo ""
-echo "========================================"
-echo_info "EXO uninstall complete!"
-echo "========================================"
-echo ""
-echo "The following have been removed:"
-echo "  • Network setup LaunchDaemon"
-echo "  • Network configuration script"
-echo "  • Log files"
-echo "  • 'exo' network location"
-echo ""
-echo "Your network has been restored to use the 'Automatic' location."
-echo "Thunderbolt Bridge has been re-enabled (if present)."
-echo ""
-echo "Manual step required:"
-echo "  Remove EXO from Login Items in System Settings → General → Login Items"
-echo ""
-
--- a/dashboard/package-lock.json
+++ b/dashboard/package-lock.json
@@ -9,8 +9,6 @@
 			"version": "1.0.0",
 			"dependencies": {
 				"highlight.js": "^11.11.1",
-				"katex": "^0.16.27",
-				"marked": "^17.0.1",
 				"mode-watcher": "^1.1.0"
 			},
 			"devDependencies": {
@@ -863,6 +861,7 @@
 			"integrity": "sha512-oH8tXw7EZnie8FdOWYrF7Yn4IKrqTFHhXvl8YxXxbKwTMcD/5NNCryUSEXRk2ZR4ojnub0P8rNrsVGHXWqIDtA==",
 			"dev": true,
 			"license": "MIT",
+			"peer": true,
 			"dependencies": {
 				"@standard-schema/spec": "^1.0.0",
 				"@sveltejs/acorn-typescript": "^1.0.5",
@@ -902,6 +901,7 @@
 			"integrity": "sha512-Y1Cs7hhTc+a5E9Va/xwKlAJoariQyHY+5zBgCZg4PFWNYQ1nMN9sjK1zhw1gK69DuqVP++sht/1GZg1aRwmAXQ==",
 			"dev": true,
 			"license": "MIT",
+			"peer": true,
 			"dependencies": {
 				"@sveltejs/vite-plugin-svelte-inspector": "^4.0.1",
 				"debug": "^4.4.1",
@@ -1518,6 +1518,7 @@
 			"integrity": "sha512-LCCV0HdSZZZb34qifBsyWlUmok6W7ouER+oQIGBScS8EsZsQbrtFTUrDX4hOl+CS6p7cnNC4td+qrSVGSCTUfQ==",
 			"dev": true,
 			"license": "MIT",
+			"peer": true,
 			"dependencies": {
 				"undici-types": "~6.21.0"
 			}
@@ -1527,6 +1528,7 @@
 			"resolved": "https://registry.npmjs.org/acorn/-/acorn-8.15.0.tgz",
 			"integrity": "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==",
 			"license": "MIT",
+			"peer": true,
 			"bin": {
 				"acorn": "bin/acorn"
 			},
@@ -1939,6 +1941,7 @@
 			"integrity": "sha512-fmTRWbNMmsmWq6xJV8D19U/gw/bwrHfNXxrIN+HfZgnzqTHp9jOmKMhsTUjXOJnZOdZY9Q28y4yebKzqDKlxlQ==",
 			"dev": true,
 			"license": "ISC",
+			"peer": true,
 			"engines": {
 				"node": ">=12"
 			}
@@ -2251,31 +2254,6 @@
 				"jiti": "lib/jiti-cli.mjs"
 			}
 		},
-		"node_modules/katex": {
-			"version": "0.16.27",
-			"resolved": "https://registry.npmjs.org/katex/-/katex-0.16.27.tgz",
-			"integrity": "sha512-aeQoDkuRWSqQN6nSvVCEFvfXdqo1OQiCmmW1kc9xSdjutPv7BGO7pqY9sQRJpMOGrEdfDgF2TfRXe5eUAD2Waw==",
-			"funding": [
-				"https://opencollective.com/katex",
-				"https://github.com/sponsors/katex"
-			],
-			"license": "MIT",
-			"dependencies": {
-				"commander": "^8.3.0"
-			},
-			"bin": {
-				"katex": "cli.js"
-			}
-		},
-		"node_modules/katex/node_modules/commander": {
-			"version": "8.3.0",
-			"resolved": "https://registry.npmjs.org/commander/-/commander-8.3.0.tgz",
-			"integrity": "sha512-OkTL9umf+He2DZkUq8f8J9of7yL6RJKI24dVITBmNfZBmri9zYZQrKkuXiKhyfPSu8tUhnVBB1iKXevvnlR4Ww==",
-			"license": "MIT",
-			"engines": {
-				"node": ">= 12"
-			}
-		},
 		"node_modules/kleur": {
 			"version": "4.1.5",
 			"resolved": "https://registry.npmjs.org/kleur/-/kleur-4.1.5.tgz",
@@ -2562,18 +2540,6 @@
 				"@jridgewell/sourcemap-codec": "^1.5.5"
 			}
 		},
-		"node_modules/marked": {
-			"version": "17.0.1",
-			"resolved": "https://registry.npmjs.org/marked/-/marked-17.0.1.tgz",
-			"integrity": "sha512-boeBdiS0ghpWcSwoNm/jJBwdpFaMnZWRzjA6SkUMYb40SVaN1x7mmfGKp0jvexGcx+7y2La5zRZsYFZI6Qpypg==",
-			"license": "MIT",
-			"bin": {
-				"marked": "bin/marked.js"
-			},
-			"engines": {
-				"node": ">= 20"
-			}
-		},
 		"node_modules/mode-watcher": {
 			"version": "1.1.0",
 			"resolved": "https://registry.npmjs.org/mode-watcher/-/mode-watcher-1.1.0.tgz",
@@ -2646,6 +2612,7 @@
 			"integrity": "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q==",
 			"dev": true,
 			"license": "MIT",
+			"peer": true,
 			"engines": {
 				"node": ">=12"
 			},
@@ -2833,6 +2800,7 @@
 			"resolved": "https://registry.npmjs.org/svelte/-/svelte-5.45.3.tgz",
 			"integrity": "sha512-ngKXNhNvwPzF43QqEhDOue7TQTrG09em1sd4HBxVF0Wr2gopAmdEWan+rgbdgK4fhBtSOTJO8bYU4chUG7VXZQ==",
 			"license": "MIT",
+			"peer": true,
 			"dependencies": {
 				"@jridgewell/remapping": "^2.3.4",
 				"@jridgewell/sourcemap-codec": "^1.5.0",
@@ -2977,6 +2945,7 @@
 			"integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
 			"dev": true,
 			"license": "Apache-2.0",
+			"peer": true,
 			"bin": {
 				"tsc": "bin/tsc",
 				"tsserver": "bin/tsserver"
@@ -2998,6 +2967,7 @@
 			"integrity": "sha512-+Oxm7q9hDoLMyJOYfUYBuHQo+dkAloi33apOPP56pzj+vsdJDzr+j1NISE5pyaAuKL4A3UD34qd0lx5+kfKp2g==",
 			"dev": true,
 			"license": "MIT",
+			"peer": true,
 			"dependencies": {
 				"esbuild": "^0.25.0",
 				"fdir": "^6.4.4",
--- a/dashboard/package.json
+++ b/dashboard/package.json
@@ -27,8 +27,7 @@
 	},
 	"dependencies": {
 		"highlight.js": "^11.11.1",
-		"katex": "^0.16.27",
-		"marked": "^17.0.1",
 		"mode-watcher": "^1.1.0"
 	}
 }
+
--- a/dashboard/src/lib/components/ChatForm.svelte
+++ b/dashboard/src/lib/components/ChatForm.svelte
@@ -139,11 +139,6 @@
 	}

 	function handleKeydown(event: KeyboardEvent) {
-		// Prevent form submission during IME composition (e.g., Chinese, Japanese, Korean input)
-		if (event.isComposing || event.keyCode === 229) {
-			return;
-		}
-		
 		if (event.key === 'Enter' && !event.shiftKey) {
 			event.preventDefault();
 			handleSubmit();
--- a/dashboard/src/lib/components/ChatMessages.svelte
+++ b/dashboard/src/lib/components/ChatMessages.svelte
@@ -8,80 +8,89 @@
 		regenerateLastResponse
 	} from '$lib/stores/app.svelte';
 	import type { MessageAttachment } from '$lib/stores/app.svelte';
-	import MarkdownContent from './MarkdownContent.svelte';
+import { tick, onDestroy } from 'svelte';

-	interface Props {
-		class?: string;
-		scrollParent?: HTMLElement | null;
-	}
+interface Props {
+	class?: string;
+	scrollParent?: HTMLElement | null;
+}

-	let { class: className = '', scrollParent = null }: Props = $props();
+let { class: className = '', scrollParent = null }: Props = $props();

 	const messageList = $derived(messages());
 	const response = $derived(currentResponse());
 	const loading = $derived(isLoading());

-	// Scroll management - user controls scroll, show button when not at bottom
-	const SCROLL_THRESHOLD = 100;
-	let showScrollButton = $state(false);
-	let lastMessageCount = 0;
-	let containerRef: HTMLDivElement | undefined = $state();
+// Ref for scroll anchor at bottom
+let scrollAnchorRef: HTMLDivElement | undefined = $state();

-	function getScrollContainer(): HTMLElement | null {
-		if (scrollParent) return scrollParent;
-		return containerRef?.parentElement ?? null;
+// Scroll management
+const SCROLL_BOTTOM_THRESHOLD = 120;
+let autoScrollEnabled = true;
+let currentScrollEl: HTMLElement | null = null;
+
+function resolveScrollElement(): HTMLElement | null {
+	if (scrollParent) return scrollParent;
+	let node: HTMLElement | null = scrollAnchorRef?.parentElement as HTMLElement | null;
+	while (node) {
+		const isScrollable = node.scrollHeight > node.clientHeight + 1;
+		if (isScrollable) return node;
+		node = node.parentElement;
 	}
+	return null;
+}

-	function isNearBottom(el: HTMLElement): boolean {
-		return el.scrollHeight - el.scrollTop - el.clientHeight < SCROLL_THRESHOLD;
+function handleScroll() {
+	if (!currentScrollEl) return;
+	const distanceFromBottom = currentScrollEl.scrollHeight - currentScrollEl.scrollTop - currentScrollEl.clientHeight;
+	const isNearBottom = distanceFromBottom < SCROLL_BOTTOM_THRESHOLD;
+	autoScrollEnabled = isNearBottom;
+}
+
+function attachScrollListener() {
+	const nextEl = resolveScrollElement();
+	if (currentScrollEl === nextEl) return;
+	if (currentScrollEl) {
+		currentScrollEl.removeEventListener('scroll', handleScroll);
 	}
+	currentScrollEl = nextEl;
+	if (currentScrollEl) {
+		currentScrollEl.addEventListener('scroll', handleScroll);
+		// Initialize state based on current position
+		handleScroll();
+	}
+}

-	function scrollToBottom() {
-		const el = getScrollContainer();
-		if (el) {
-			el.scrollTo({ top: el.scrollHeight, behavior: 'smooth' });
+onDestroy(() => {
+	if (currentScrollEl) {
+		currentScrollEl.removeEventListener('scroll', handleScroll);
+	}
+});
+
+$effect(() => {
+	// Re-evaluate scroll container if prop changes or after mount
+	scrollParent;
+	attachScrollListener();
+});
+
+// Auto-scroll to bottom when messages change or response updates, but only if user is near bottom
+$effect(() => {
+	// Track these values to trigger effect
+	const _ = messageList.length;
+	const __ = response;
+	const ___ = loading;
+	
+	tick().then(() => {
+		const el = currentScrollEl ?? resolveScrollElement();
+		if (!el || !scrollAnchorRef) return;
+		const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
+		const isNearBottom = distanceFromBottom < SCROLL_BOTTOM_THRESHOLD;
+		if (autoScrollEnabled || isNearBottom) {
+			scrollAnchorRef.scrollIntoView({ behavior: 'smooth', block: 'end' });
+			autoScrollEnabled = true;
 		}
-	}
-
-	function updateScrollButtonVisibility() {
-		const el = getScrollContainer();
-		if (!el) return;
-		showScrollButton = !isNearBottom(el);
-	}
-
-	// Attach scroll listener
-	$effect(() => {
-		const el = scrollParent ?? containerRef?.parentElement;
-		if (!el) return;
-		
-		el.addEventListener('scroll', updateScrollButtonVisibility, { passive: true });
-		// Initial check
-		updateScrollButtonVisibility();
-		return () => el.removeEventListener('scroll', updateScrollButtonVisibility);
-	});
-
-	// Auto-scroll when user sends a new message
-	$effect(() => {
-		const count = messageList.length;
-		if (count > lastMessageCount) {
-			const el = getScrollContainer();
-			if (el) {
-				requestAnimationFrame(() => {
-					el.scrollTo({ top: el.scrollHeight, behavior: 'smooth' });
-				});
-			}
-		}
-		lastMessageCount = count;
-	});
-
-	// Update scroll button visibility when content changes
-	$effect(() => {
-		// Track response to trigger re-check during streaming
-		const _ = response;
-		
-		// Small delay to let DOM update
-		requestAnimationFrame(() => updateScrollButtonVisibility());
 	});
+});

 	// Edit state
 	let editingMessageId = $state<string | null>(null);
@@ -222,7 +231,7 @@ function isThinkingExpanded(messageId: string): boolean {
 <div class="flex flex-col gap-4 sm:gap-6 {className}">
 	{#each messageList as message (message.id)}
 		<div class="group flex {message.role === 'user' ? 'justify-end' : 'justify-start'}">
-			<div class="{message.role === 'user' ? 'max-w-[85%] sm:max-w-[70%] flex flex-col items-end' : 'w-full max-w-[98%] sm:max-w-[95%]'}">
+			<div class="{message.role === 'user' ? 'max-w-[85%] sm:max-w-[70%] flex flex-col items-end' : 'max-w-[95%] sm:max-w-[85%]'}">
 				{#if message.role === 'assistant'}
 					<!-- Assistant message header -->
 					<div class="flex items-center gap-1.5 sm:gap-2 mb-1.5 sm:mb-2">
@@ -296,7 +305,7 @@ function isThinkingExpanded(messageId: string): boolean {
 				{:else}
 					<div class="{message.role === 'user' 
 						? 'command-panel rounded-lg rounded-tr-sm inline-block' 
-						: 'command-panel rounded-lg rounded-tl-sm border-l-2 border-l-exo-yellow/50 block w-full'}">
+						: 'command-panel rounded-lg rounded-tl-sm border-l-2 border-l-exo-yellow/50 inline-block'}">
 						
 						{#if message.role === 'user'}
 							<!-- User message styling -->
@@ -322,7 +331,7 @@ function isThinkingExpanded(messageId: string): boolean {
 								{/if}
 								
 								{#if message.content}
-									<div class="text-xs text-foreground font-mono tracking-wide whitespace-pre-wrap break-words leading-relaxed">
+									<div class="text-sm text-foreground font-mono tracking-wide whitespace-pre-wrap break-words leading-relaxed">
 										{message.content}
 									</div>
 								{/if}
@@ -351,7 +360,7 @@ function isThinkingExpanded(messageId: string): boolean {
 												</svg>
 												<span>Thinking...</span>
 											</span>
-											<span class="text-[10px] tracking-[0.2em] text-exo-light-gray/60 ml-4">
+											<span class="text-[10px] tracking-[0.2em] text-exo-light-gray/60">
 												{isThinkingExpanded(message.id) ? 'HIDE' : 'SHOW'}
 											</span>
 										</button>
@@ -365,8 +374,8 @@ function isThinkingExpanded(messageId: string): boolean {
 										{/if}
 									</div>
 								{/if}
-								<div class="text-xs text-foreground">
-									<MarkdownContent content={message.content || (loading ? response : '')} />
+								<div class="text-sm text-foreground font-mono tracking-wide whitespace-pre-wrap break-words leading-relaxed">
+									{message.content || (loading ? response : '')}
 									{#if loading && !message.content}
 										<span class="inline-block w-2 h-4 bg-exo-yellow/70 ml-1 cursor-blink"></span>
 									{/if}
@@ -448,20 +457,6 @@ function isThinkingExpanded(messageId: string): boolean {
 		</div>
 	{/if}
 	
-	<!-- Invisible element for container reference -->
-	<div bind:this={containerRef}></div>
-
-	<!-- Scroll to bottom button -->
-	{#if showScrollButton}
-		<button
-			type="button"
-			onclick={scrollToBottom}
-			class="sticky bottom-4 left-1/2 -translate-x-1/2 w-10 h-10 rounded-full bg-exo-dark-gray/90 border border-exo-medium-gray/50 flex items-center justify-center text-exo-light-gray hover:text-exo-yellow hover:border-exo-yellow/50 transition-all shadow-lg cursor-pointer z-10"
-			title="Scroll to bottom"
-		>
-			<svg class="w-5 h-5" fill="none" viewBox="0 0 24 24" stroke="currentColor">
-				<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M19 14l-7 7m0 0l-7-7m7 7V3" />
-			</svg>
-		</button>
-	{/if}
+	<!-- Scroll anchor for auto-scroll -->
+	<div bind:this={scrollAnchorRef}></div>
 </div>
--- a/dashboard/src/lib/components/ChatSidebar.svelte
+++ b/dashboard/src/lib/components/ChatSidebar.svelte
@@ -10,9 +10,7 @@ import {
 		clearChat,
 		instances,
 		debugMode,
-		toggleDebugMode,
-		topologyOnlyMode,
-		toggleTopologyOnlyMode
+		toggleDebugMode
 	} from '$lib/stores/app.svelte';

 	interface Props {
@@ -25,7 +23,6 @@ import {
 	const activeId = $derived(activeConversationId());
 const instanceData = $derived(instances());
 const debugEnabled = $derived(debugMode());
-const topologyOnlyEnabled = $derived(topologyOnlyMode());

 	let searchQuery = $state('');
 	let editingId = $state<string | null>(null);
@@ -427,19 +424,6 @@ const topologyOnlyEnabled = $derived(topologyOnlyMode());
 		<div class="text-xs text-white/60 font-mono tracking-wider text-center">
 			{conversationList.length} CONVERSATION{conversationList.length !== 1 ? 'S' : ''}
 		</div>
-		<button
-			type="button"
-			onclick={toggleTopologyOnlyMode}
-			class="p-1.5 rounded border border-exo-medium-gray/40 hover:border-exo-yellow/50 transition-colors cursor-pointer"
-			title="Toggle topology only mode"
-		>
-			<svg class="w-4 h-4 {topologyOnlyEnabled ? 'text-exo-yellow' : 'text-exo-medium-gray'}" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-				<circle cx="12" cy="5" r="2" fill="currentColor" />
-				<circle cx="5" cy="19" r="2" fill="currentColor" />
-				<circle cx="19" cy="19" r="2" fill="currentColor" />
-				<path stroke-linecap="round" d="M12 7v5m0 0l-5 5m5-5l5 5" />
-			</svg>
-		</button>
 	</div>
 	</div>
 </aside>
--- a/dashboard/src/lib/components/HeaderNav.svelte
+++ b/dashboard/src/lib/components/HeaderNav.svelte
@@ -3,9 +3,6 @@

 	export let showHome = true;
 	export let onHome: (() => void) | null = null;
-	export let showSidebarToggle = false;
-	export let sidebarVisible = true;
-	export let onToggleSidebar: (() => void) | null = null;

 	function handleHome(): void {
 		if (onHome) {
@@ -17,38 +14,13 @@
 			window.location.hash = '/';
 		}
 	}
-
-	function handleToggleSidebar(): void {
-		if (onToggleSidebar) {
-			onToggleSidebar();
-		}
-	}
 </script>

 <header class="relative z-20 flex items-center justify-center px-6 pt-8 pb-4 bg-exo-dark-gray">
-	<!-- Left: Sidebar Toggle -->
-	{#if showSidebarToggle}
-	<div class="absolute left-6 top-1/2 -translate-y-1/2">
-		<button
-			onclick={handleToggleSidebar}
-			class="p-2 rounded border border-exo-medium-gray/40 hover:border-exo-yellow/50 transition-colors cursor-pointer"
-			title={sidebarVisible ? 'Hide sidebar' : 'Show sidebar'}
-		>
-			<svg class="w-5 h-5 {sidebarVisible ? 'text-exo-yellow' : 'text-exo-medium-gray'}" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-				{#if sidebarVisible}
-					<path stroke-linecap="round" stroke-linejoin="round" d="M11 19l-7-7 7-7m8 14l-7-7 7-7" />
-				{:else}
-					<path stroke-linecap="round" stroke-linejoin="round" d="M13 5l7 7-7 7M5 5l7 7-7 7" />
-				{/if}
-			</svg>
-		</button>
-	</div>
-	{/if}
-
 	<!-- Center: Logo (clickable to go home) -->
 	<button
 		onclick={handleHome}
-		class="bg-transparent border-none outline-none focus:outline-none transition-opacity duration-200 hover:opacity-90 {showHome ? 'cursor-pointer' : 'cursor-default'}"
+		class="hover:opacity-80 transition-opacity {showHome ? 'cursor-pointer' : 'cursor-default'}"
 		title={showHome ? 'Go to home' : ''}
 		disabled={!showHome}
 	>
--- a/dashboard/src/lib/components/MarkdownContent.svelte
+++ b/dashboard/src/lib/components/MarkdownContent.svelte
@@ -1,451 +0,0 @@
-<script lang="ts">
-	import { marked } from 'marked';
-	import hljs from 'highlight.js';
-	import katex from 'katex';
-	import 'katex/dist/katex.min.css';
-	import { browser } from '$app/environment';
-
-	interface Props {
-		content: string;
-		class?: string;
-	}
-
-	let { content, class: className = '' }: Props = $props();
-
-	let containerRef = $state<HTMLDivElement>();
-	let processedHtml = $state('');
-
-	// Configure marked with syntax highlighting
-	marked.setOptions({
-		gfm: true,
-		breaks: true
-	});
-
-	// Custom renderer for code blocks
-	const renderer = new marked.Renderer();
-
-	renderer.code = function ({ text, lang }: { text: string; lang?: string }) {
-		const language = lang && hljs.getLanguage(lang) ? lang : 'plaintext';
-		const highlighted = hljs.highlight(text, { language }).value;
-		const codeId = `code-${Date.now()}-${Math.random().toString(36).slice(2, 9)}`;
-
-		return `
-			<div class="code-block-wrapper">
-				<div class="code-block-header">
-					<span class="code-language">${language}</span>
-					<button type="button" class="copy-code-btn" data-code="${encodeURIComponent(text)}" title="Copy code">
-						<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
-							<rect width="14" height="14" x="8" y="8" rx="2" ry="2"/>
-							<path d="M4 16c-1.1 0-2-.9-2-2V4c0-1.1.9-2 2-2h10c1.1 0 2 .9 2 2"/>
-						</svg>
-					</button>
-				</div>
-				<pre><code class="hljs language-${language}" data-code-id="${codeId}">${highlighted}</code></pre>
-			</div>
-		`;
-	};
-
-	// Inline code
-	renderer.codespan = function ({ text }: { text: string }) {
-		return `<code class="inline-code">${text}</code>`;
-	};
-
-	marked.use({ renderer });
-
-	/**
-	 * Preprocess LaTeX: convert \(...\) to $...$ and \[...\] to $$...$$
-	 * Also protect code blocks from LaTeX processing
-	 */
-	function preprocessLaTeX(text: string): string {
-		// Protect code blocks
-		const codeBlocks: string[] = [];
-		let processed = text.replace(/```[\s\S]*?```|`[^`]+`/g, (match) => {
-			codeBlocks.push(match);
-			return `<<CODE_${codeBlocks.length - 1}>>`;
-		});
-
-		// Convert \(...\) to $...$
-		processed = processed.replace(/\\\((.+?)\\\)/g, '$$$1$');
-		
-		// Convert \[...\] to $$...$$
-		processed = processed.replace(/\\\[([\s\S]*?)\\\]/g, '$$$$$1$$$$');
-
-		// Restore code blocks
-		processed = processed.replace(/<<CODE_(\d+)>>/g, (_, index) => codeBlocks[parseInt(index)]);
-
-		return processed;
-	}
-
-	/**
-	 * Render math expressions with KaTeX after HTML is generated
-	 */
-	function renderMath(html: string): string {
-		// Render display math ($$...$$)
-		html = html.replace(/\$\$([\s\S]*?)\$\$/g, (_, math) => {
-			try {
-				return katex.renderToString(math.trim(), {
-					displayMode: true,
-					throwOnError: false,
-					output: 'html'
-				});
-			} catch {
-				return `<span class="math-error">$$${math}$$</span>`;
-			}
-		});
-
-		// Render inline math ($...$) but avoid matching currency like $5
-		html = html.replace(/\$([^\$\n]+?)\$/g, (match, math) => {
-			// Skip if it looks like currency ($ followed by number)
-			if (/^\d/.test(math.trim())) {
-				return match;
-			}
-			try {
-				return katex.renderToString(math.trim(), {
-					displayMode: false,
-					throwOnError: false,
-					output: 'html'
-				});
-			} catch {
-				return `<span class="math-error">$${math}$</span>`;
-			}
-		});
-
-		return html;
-	}
-
-	function processMarkdown(text: string): string {
-		try {
-			// Preprocess LaTeX notation
-			const preprocessed = preprocessLaTeX(text);
-			// Parse markdown
-			let html = marked.parse(preprocessed) as string;
-			// Render math expressions
-			html = renderMath(html);
-			return html;
-		} catch (error) {
-			console.error('Markdown processing error:', error);
-			return text.replace(/\n/g, '<br>');
-		}
-	}
-
-	async function handleCopyClick(event: Event) {
-		const target = event.currentTarget as HTMLButtonElement;
-		const encodedCode = target.getAttribute('data-code');
-		if (!encodedCode) return;
-
-		const code = decodeURIComponent(encodedCode);
-
-		try {
-			await navigator.clipboard.writeText(code);
-			// Show copied feedback
-			const originalHtml = target.innerHTML;
-			target.innerHTML = `
-				<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
-					<path d="M20 6L9 17l-5-5"/>
-				</svg>
-			`;
-			target.classList.add('copied');
-			setTimeout(() => {
-				target.innerHTML = originalHtml;
-				target.classList.remove('copied');
-			}, 2000);
-		} catch (error) {
-			console.error('Failed to copy:', error);
-		}
-	}
-
-	function setupCopyButtons() {
-		if (!containerRef || !browser) return;
-
-		const buttons = containerRef.querySelectorAll<HTMLButtonElement>('.copy-code-btn');
-		for (const button of buttons) {
-			if (button.dataset.listenerBound !== 'true') {
-				button.dataset.listenerBound = 'true';
-				button.addEventListener('click', handleCopyClick);
-			}
-		}
-	}
-
-	$effect(() => {
-		if (content) {
-			processedHtml = processMarkdown(content);
-		} else {
-			processedHtml = '';
-		}
-	});
-
-	$effect(() => {
-		if (containerRef && processedHtml) {
-			setupCopyButtons();
-		}
-	});
-</script>
-
-<div bind:this={containerRef} class="markdown-content {className}">
-	{@html processedHtml}
-</div>
-
-<style>
-	.markdown-content {
-		line-height: 1.6;
-	}
-
-	/* Paragraphs */
-	.markdown-content :global(p) {
-		margin-bottom: 1rem;
-	}
-
-	.markdown-content :global(p:last-child) {
-		margin-bottom: 0;
-	}
-
-	/* Headers */
-	.markdown-content :global(h1) {
-		font-size: 1.5rem;
-		font-weight: 700;
-		margin: 1.5rem 0 0.75rem 0;
-		color: var(--exo-yellow, #ffd700);
-	}
-
-	.markdown-content :global(h2) {
-		font-size: 1.25rem;
-		font-weight: 600;
-		margin: 1.25rem 0 0.5rem 0;
-		color: var(--exo-yellow, #ffd700);
-	}
-
-	.markdown-content :global(h3) {
-		font-size: 1.125rem;
-		font-weight: 600;
-		margin: 1rem 0 0.5rem 0;
-	}
-
-	.markdown-content :global(h4),
-	.markdown-content :global(h5),
-	.markdown-content :global(h6) {
-		font-size: 1rem;
-		font-weight: 600;
-		margin: 0.75rem 0 0.25rem 0;
-	}
-
-	/* Bold and italic */
-	.markdown-content :global(strong) {
-		font-weight: 600;
-	}
-
-	.markdown-content :global(em) {
-		font-style: italic;
-	}
-
-	/* Inline code */
-	.markdown-content :global(.inline-code) {
-		background: rgba(255, 215, 0, 0.1);
-		color: var(--exo-yellow, #ffd700);
-		padding: 0.125rem 0.375rem;
-		border-radius: 0.25rem;
-		font-family: ui-monospace, SFMono-Regular, 'SF Mono', Monaco, Consolas, monospace;
-		font-size: 0.875em;
-	}
-
-	/* Links */
-	.markdown-content :global(a) {
-		color: var(--exo-yellow, #ffd700);
-		text-decoration: underline;
-		text-underline-offset: 2px;
-	}
-
-	.markdown-content :global(a:hover) {
-		opacity: 0.8;
-	}
-
-	/* Lists */
-	.markdown-content :global(ul) {
-		list-style-type: disc;
-		margin-left: 1.5rem;
-		margin-bottom: 1rem;
-	}
-
-	.markdown-content :global(ol) {
-		list-style-type: decimal;
-		margin-left: 1.5rem;
-		margin-bottom: 1rem;
-	}
-
-	.markdown-content :global(li) {
-		margin-bottom: 0.25rem;
-	}
-
-	.markdown-content :global(li::marker) {
-		color: var(--exo-light-gray, #9ca3af);
-	}
-
-	/* Blockquotes */
-	.markdown-content :global(blockquote) {
-		border-left: 3px solid var(--exo-yellow, #ffd700);
-		padding: 0.5rem 1rem;
-		margin: 1rem 0;
-		background: rgba(255, 215, 0, 0.05);
-		border-radius: 0 0.25rem 0.25rem 0;
-	}
-
-	/* Tables */
-	.markdown-content :global(table) {
-		width: 100%;
-		margin: 1rem 0;
-		border-collapse: collapse;
-		font-size: 0.875rem;
-	}
-
-	.markdown-content :global(th) {
-		background: rgba(255, 215, 0, 0.1);
-		border: 1px solid rgba(255, 215, 0, 0.2);
-		padding: 0.5rem;
-		text-align: left;
-		font-weight: 600;
-	}
-
-	.markdown-content :global(td) {
-		border: 1px solid rgba(255, 255, 255, 0.1);
-		padding: 0.5rem;
-	}
-
-	/* Horizontal rule */
-	.markdown-content :global(hr) {
-		border: none;
-		border-top: 1px solid rgba(255, 255, 255, 0.1);
-		margin: 1.5rem 0;
-	}
-
-	/* Code block wrapper */
-	.markdown-content :global(.code-block-wrapper) {
-		margin: 1rem 0;
-		border-radius: 0.5rem;
-		overflow: hidden;
-		border: 1px solid rgba(255, 215, 0, 0.2);
-		background: rgba(0, 0, 0, 0.4);
-	}
-
-	.markdown-content :global(.code-block-header) {
-		display: flex;
-		justify-content: space-between;
-		align-items: center;
-		padding: 0.5rem 0.75rem;
-		background: rgba(255, 215, 0, 0.05);
-		border-bottom: 1px solid rgba(255, 215, 0, 0.1);
-	}
-
-	.markdown-content :global(.code-language) {
-		color: var(--exo-yellow, #ffd700);
-		font-size: 0.7rem;
-		font-weight: 500;
-		text-transform: uppercase;
-		letter-spacing: 0.1em;
-		font-family: ui-monospace, SFMono-Regular, 'SF Mono', Monaco, Consolas, monospace;
-	}
-
-	.markdown-content :global(.copy-code-btn) {
-		display: flex;
-		align-items: center;
-		justify-content: center;
-		padding: 0.25rem;
-		background: transparent;
-		border: none;
-		color: var(--exo-light-gray, #9ca3af);
-		cursor: pointer;
-		transition: color 0.2s;
-		border-radius: 0.25rem;
-	}
-
-	.markdown-content :global(.copy-code-btn:hover) {
-		color: var(--exo-yellow, #ffd700);
-	}
-
-	.markdown-content :global(.copy-code-btn.copied) {
-		color: #22c55e;
-	}
-
-	.markdown-content :global(.code-block-wrapper pre) {
-		margin: 0;
-		padding: 1rem;
-		overflow-x: auto;
-		background: transparent;
-	}
-
-	.markdown-content :global(.code-block-wrapper code) {
-		font-family: ui-monospace, SFMono-Regular, 'SF Mono', Monaco, Consolas, monospace;
-		font-size: 0.8125rem;
-		line-height: 1.5;
-		background: transparent;
-	}
-
-	/* Syntax highlighting - dark theme matching EXO style */
-	.markdown-content :global(.hljs) {
-		color: #e5e7eb;
-	}
-
-	.markdown-content :global(.hljs-keyword),
-	.markdown-content :global(.hljs-selector-tag),
-	.markdown-content :global(.hljs-literal),
-	.markdown-content :global(.hljs-section),
-	.markdown-content :global(.hljs-link) {
-		color: #c084fc;
-	}
-
-	.markdown-content :global(.hljs-string),
-	.markdown-content :global(.hljs-title),
-	.markdown-content :global(.hljs-name),
-	.markdown-content :global(.hljs-type),
-	.markdown-content :global(.hljs-attribute),
-	.markdown-content :global(.hljs-symbol),
-	.markdown-content :global(.hljs-bullet),
-	.markdown-content :global(.hljs-addition),
-	.markdown-content :global(.hljs-variable),
-	.markdown-content :global(.hljs-template-tag),
-	.markdown-content :global(.hljs-template-variable) {
-		color: #fbbf24;
-	}
-
-	.markdown-content :global(.hljs-comment),
-	.markdown-content :global(.hljs-quote),
-	.markdown-content :global(.hljs-deletion),
-	.markdown-content :global(.hljs-meta) {
-		color: #6b7280;
-	}
-
-	.markdown-content :global(.hljs-number),
-	.markdown-content :global(.hljs-regexp),
-	.markdown-content :global(.hljs-literal),
-	.markdown-content :global(.hljs-built_in) {
-		color: #34d399;
-	}
-
-	.markdown-content :global(.hljs-function),
-	.markdown-content :global(.hljs-class .hljs-title) {
-		color: #60a5fa;
-	}
-
-	/* KaTeX math styling */
-	.markdown-content :global(.katex) {
-		font-size: 1.1em;
-	}
-
-	.markdown-content :global(.katex-display) {
-		margin: 1rem 0;
-		overflow-x: auto;
-		overflow-y: hidden;
-		padding: 0.5rem 0;
-	}
-
-	.markdown-content :global(.katex-display > .katex) {
-		text-align: center;
-	}
-
-	.markdown-content :global(.math-error) {
-		color: #f87171;
-		font-family: ui-monospace, SFMono-Regular, 'SF Mono', Monaco, Consolas, monospace;
-		font-size: 0.875em;
-		background: rgba(248, 113, 113, 0.1);
-		padding: 0.125rem 0.25rem;
-		border-radius: 0.25rem;
-	}
-</style>
--- a/dashboard/src/lib/components/ModelCard.svelte
+++ b/dashboard/src/lib/components/ModelCard.svelte
@@ -1,6 +1,5 @@
 <script lang="ts">
-	import type { DownloadProgress, NodeInfo, PlacementPreview, TopologyEdge } from '$lib/stores/app.svelte';
-	import { debugMode, topologyData } from '$lib/stores/app.svelte';
+	import type { DownloadProgress, NodeInfo, PlacementPreview } from '$lib/stores/app.svelte';

 interface Props {
 		model: { id: string; name?: string; storage_size_megabytes?: number };
@@ -207,8 +206,12 @@ function toggleNodeDetails(nodeId: string): void {
 		const centerY = topoHeight / 2;
 		const radius = numNodes === 1 ? 0 : numNodes === 2 ? 45 : Math.min(topoWidth, topoHeight) * 0.32;
 		
-		// Only use API preview data - no local estimation
+		// Use API preview data if available
 		const hasApiPreview = apiPreview !== null && apiPreview.error === null && apiPreview.memory_delta_by_node !== null;
+		const canFit = hasApiPreview ? true : (() => {
+			const totalAvailable = nodeArray.reduce((sum, n) => sum + n.availableGB, 0);
+			return totalAvailable >= estimatedMemory;
+		})();
 		const error = apiPreview?.error ?? null;
 		
 		let placementNodes: Array<{ 
@@ -229,140 +232,135 @@ function toggleNodeDetails(nodeId: string): void {
 			modelFillHeight: number;
 		}> = [];
 		
-		// Use API placement data directly
-		const memoryDelta = apiPreview?.memory_delta_by_node ?? {};
-		placementNodes = nodeArray.map((n, i) => {
-			const deltaBytes = memoryDelta[n.id] ?? 0;
-			const modelUsageGB = deltaBytes / (1024 * 1024 * 1024);
-			const isUsed = deltaBytes > 0;
-			const angle = numNodes === 1 ? 0 : (i / numNodes) * Math.PI * 2 - Math.PI / 2;
-			const safeTotal = Math.max(n.totalGB, 0.001);
-			const currentPercent = clampPercent((n.usedGB / safeTotal) * 100);
-			const newPercent = clampPercent(((n.usedGB + modelUsageGB) / safeTotal) * 100);
-			const screenHeight = iconSize * 0.58;
+		if (hasApiPreview && apiPreview.memory_delta_by_node) {
+			// Use API placement data
+			const memoryDelta = apiPreview.memory_delta_by_node;
+			placementNodes = nodeArray.map((n, i) => {
+				const deltaBytes = memoryDelta[n.id] ?? 0;
+				const modelUsageGB = deltaBytes / (1024 * 1024 * 1024);
+				const isUsed = deltaBytes > 0;
+				const angle = numNodes === 1 ? 0 : (i / numNodes) * Math.PI * 2 - Math.PI / 2;
+				const safeTotal = Math.max(n.totalGB, 0.001);
+				const currentPercent = clampPercent((n.usedGB / safeTotal) * 100);
+				const newPercent = clampPercent(((n.usedGB + modelUsageGB) / safeTotal) * 100);
+				const screenHeight = iconSize * 0.58;
+				
+				return {
+					id: n.id,
+					deviceName: n.deviceName,
+					deviceType: n.deviceType,
+					totalGB: n.totalGB,
+					currentUsedGB: n.usedGB,
+					modelUsageGB,
+					currentPercent,
+					newPercent,
+					isUsed,
+					x: centerX + Math.cos(angle) * radius,
+					y: centerY + Math.sin(angle) * radius,
+					iconSize,
+					screenHeight,
+					currentFillHeight: screenHeight * (currentPercent / 100),
+					modelFillHeight: screenHeight * ((newPercent - currentPercent) / 100)
+				};
+			});
+		} else if (apiPreview?.error) {
+			// API returned an error - model can't fit, show all nodes as unused
+			placementNodes = nodeArray.map((n, i) => {
+				const angle = numNodes === 1 ? 0 : (i / numNodes) * Math.PI * 2 - Math.PI / 2;
+				const safeTotal = Math.max(n.totalGB, 0.001);
+				const currentPercent = clampPercent((n.usedGB / safeTotal) * 100);
+				const screenHeight = iconSize * 0.58;
+				
+				return {
+					id: n.id,
+					deviceName: n.deviceName,
+					deviceType: n.deviceType,
+					totalGB: n.totalGB,
+					currentUsedGB: n.usedGB,
+					modelUsageGB: 0,
+					currentPercent,
+					newPercent: currentPercent,
+					isUsed: false,
+					x: centerX + Math.cos(angle) * radius,
+					y: centerY + Math.sin(angle) * radius,
+					iconSize,
+					screenHeight,
+					currentFillHeight: screenHeight * (currentPercent / 100),
+					modelFillHeight: 0
+				};
+			});
+		} else {
+			// Fallback: local estimation based on sharding strategy
+			const memoryNeeded = estimatedMemory;
 			
-			return {
-				id: n.id,
-				deviceName: n.deviceName,
-				deviceType: n.deviceType,
-				totalGB: n.totalGB,
-				currentUsedGB: n.usedGB,
-				modelUsageGB,
-				currentPercent,
-				newPercent,
-				isUsed,
-				x: centerX + Math.cos(angle) * radius,
-				y: centerY + Math.sin(angle) * radius,
-				iconSize,
-				screenHeight,
-				currentFillHeight: screenHeight * (currentPercent / 100),
-				modelFillHeight: screenHeight * ((newPercent - currentPercent) / 100)
-			};
-		});
+			if (sharding === 'Pipeline') {
+				const memoryPerNode = memoryNeeded / numNodes;
+				placementNodes = nodeArray.map((n, i) => {
+					const angle = numNodes === 1 ? 0 : (i / numNodes) * Math.PI * 2 - Math.PI / 2;
+					const safeTotal = Math.max(n.totalGB, 0.001);
+					const currentPercent = clampPercent((n.usedGB / safeTotal) * 100);
+					const newPercent = clampPercent(((n.usedGB + memoryPerNode) / safeTotal) * 100);
+					const screenHeight = iconSize * 0.58;
+					
+					return {
+						id: n.id,
+						deviceName: n.deviceName,
+						deviceType: n.deviceType,
+						totalGB: n.totalGB,
+						currentUsedGB: n.usedGB,
+						modelUsageGB: memoryPerNode,
+						currentPercent,
+						newPercent,
+						isUsed: true,
+						x: centerX + Math.cos(angle) * radius,
+						y: centerY + Math.sin(angle) * radius,
+						iconSize,
+						screenHeight,
+						currentFillHeight: screenHeight * (currentPercent / 100),
+						modelFillHeight: screenHeight * ((newPercent - currentPercent) / 100)
+					};
+				});
+			} else {
+				let remaining = memoryNeeded;
+				placementNodes = nodeArray.map((n, i) => {
+					const allocated = Math.min(remaining, n.availableGB);
+					remaining -= allocated;
+					const isUsed = allocated > 0;
+					const angle = numNodes === 1 ? 0 : (i / numNodes) * Math.PI * 2 - Math.PI / 2;
+					const safeTotal = Math.max(n.totalGB, 0.001);
+					const currentPercent = clampPercent((n.usedGB / safeTotal) * 100);
+					const newPercent = clampPercent(((n.usedGB + allocated) / safeTotal) * 100);
+					const screenHeight = iconSize * 0.58;
+					
+					return {
+						id: n.id,
+						deviceName: n.deviceName,
+						deviceType: n.deviceType,
+						totalGB: n.totalGB,
+						currentUsedGB: n.usedGB,
+						modelUsageGB: allocated,
+						currentPercent,
+						newPercent,
+						isUsed,
+						x: centerX + Math.cos(angle) * radius,
+						y: centerY + Math.sin(angle) * radius,
+						iconSize,
+						screenHeight,
+						currentFillHeight: screenHeight * (currentPercent / 100),
+						modelFillHeight: screenHeight * ((newPercent - currentPercent) / 100)
+					};
+				});
+			}
+		}
 		
 		const totalAvailable = nodeArray.reduce((sum, n) => sum + n.availableGB, 0);
-		return { nodes: placementNodes, canFit: hasApiPreview, totalAvailable, topoWidth, topoHeight, error };
+		return { nodes: placementNodes, canFit: hasApiPreview || canFit, totalAvailable, topoWidth, topoHeight, error };
 	});
 	
 	const canFit = $derived(apiPreview ? apiPreview.error === null : placementPreview().canFit);
 	const placementError = $derived(apiPreview?.error ?? null);
 	const nodeCount = $derived(nodeList().length);
 	const filterId = $derived(model.id.replace(/[^a-zA-Z0-9]/g, ''));
-	
-	// Debug mode state
-	const isDebugMode = $derived(debugMode());
-	const topology = $derived(topologyData());
-	const isRdma = $derived(runtime === 'MlxIbv' || runtime === 'MlxJaccl');
-	
-	// Get interface name for an IP from node data
-	function getInterfaceForIp(nodeId: string, ip?: string): string | null {
-		if (!ip || !topology?.nodes) return null;
-		
-		// Strip port if present
-		const cleanIp = ip.includes(':') && !ip.includes('[') ? ip.split(':')[0] : ip;
-		
-		// Check specified node first
-		const node = topology.nodes[nodeId];
-		if (node) {
-			const match = node.network_interfaces?.find((iface) =>
-				(iface.addresses || []).some((addr) => addr === cleanIp || addr === ip)
-			);
-			if (match?.name) return match.name;
-			
-			const mapped = node.ip_to_interface?.[cleanIp] || node.ip_to_interface?.[ip];
-			if (mapped) return mapped;
-		}
-		
-		// Fallback: check all nodes
-		for (const [, otherNode] of Object.entries(topology.nodes)) {
-			if (!otherNode) continue;
-			const match = otherNode.network_interfaces?.find((iface) =>
-				(iface.addresses || []).some((addr) => addr === cleanIp || addr === ip)
-			);
-			if (match?.name) return match.name;
-			
-			const mapped = otherNode.ip_to_interface?.[cleanIp] || otherNode.ip_to_interface?.[ip];
-			if (mapped) return mapped;
-		}
-		
-		return null;
-	}
-	
-	// Get directional arrow based on node positions
-	function getArrow(fromNode: { x: number; y: number }, toNode: { x: number; y: number }): string {
-		const dx = toNode.x - fromNode.x;
-		const dy = toNode.y - fromNode.y;
-		const absX = Math.abs(dx);
-		const absY = Math.abs(dy);
-		
-		if (absX > absY * 2) {
-			return dx > 0 ? '→' : '←';
-		} else if (absY > absX * 2) {
-			return dy > 0 ? '↓' : '↑';
-		} else {
-			if (dx > 0 && dy > 0) return '↘';
-			if (dx > 0 && dy < 0) return '↗';
-			if (dx < 0 && dy > 0) return '↙';
-			return '↖';
-		}
-	}
-
-	// Get connection info for edges between two nodes
-	// Returns exactly one connection per direction (A→B and B→A), preferring non-loopback
-	function getConnectionInfo(nodeId1: string, nodeId2: string): Array<{ ip: string; iface: string | null; from: string; to: string }> {
-		if (!topology?.edges) return [];
-		
-		// Collect candidates for each direction
-		const aToBCandidates: Array<{ ip: string; iface: string | null }> = [];
-		const bToACandidates: Array<{ ip: string; iface: string | null }> = [];
-		
-		for (const edge of topology.edges) {
-			const ip = edge.sendBackIp || '?';
-			const iface = edge.sendBackInterface || getInterfaceForIp(edge.source, ip);
-			
-			if (edge.source === nodeId1 && edge.target === nodeId2) {
-				aToBCandidates.push({ ip, iface });
-			} else if (edge.source === nodeId2 && edge.target === nodeId1) {
-				bToACandidates.push({ ip, iface });
-			}
-		}
-		
-		// Pick best (prefer non-loopback)
-		const pickBest = (candidates: Array<{ ip: string; iface: string | null }>) => {
-			if (candidates.length === 0) return null;
-			return candidates.find(c => !c.ip.startsWith('127.')) || candidates[0];
-		};
-		
-		const result: Array<{ ip: string; iface: string | null; from: string; to: string }> = [];
-		
-		const bestAtoB = pickBest(aToBCandidates);
-		if (bestAtoB) result.push({ ...bestAtoB, from: nodeId1, to: nodeId2 });
-		
-		const bestBtoA = pickBest(bToACandidates);
-		if (bestBtoA) result.push({ ...bestBtoA, from: nodeId2, to: nodeId1 });
-		
-		return result;
-	}
 </script>

 <div class="relative group">
@@ -455,26 +453,6 @@ function toggleNodeDetails(nodeId: string): void {
 					
 					<!-- Connection lines between nodes (if multiple) -->
 					{#if preview.nodes.length > 1}
-						{@const usedNodes = preview.nodes.filter(n => n.isUsed)}
-						{@const nodePositions = Object.fromEntries(preview.nodes.map(n => [n.id, { x: n.x, y: n.y }]))}
-						{@const allConnections = isDebugMode && usedNodes.length > 1 ? (() => {
-							const conns: Array<{ ip: string; iface: string | null; from: string; to: string; midX: number; midY: number; arrow: string }> = [];
-							for (let i = 0; i < usedNodes.length; i++) {
-								for (let j = i + 1; j < usedNodes.length; j++) {
-									const n1 = usedNodes[i];
-									const n2 = usedNodes[j];
-									const midX = (n1.x + n2.x) / 2;
-									const midY = (n1.y + n2.y) / 2;
-									for (const c of getConnectionInfo(n1.id, n2.id)) {
-										const fromPos = nodePositions[c.from];
-										const toPos = nodePositions[c.to];
-										const arrow = fromPos && toPos ? getArrow(fromPos, toPos) : '→';
-										conns.push({ ...c, midX, midY, arrow });
-									}
-								}
-							}
-							return conns;
-						})() : []}
 						{#each preview.nodes as node, i}
 							{#each preview.nodes.slice(i + 1) as node2}
 								<line 
@@ -486,43 +464,6 @@ function toggleNodeDetails(nodeId: string): void {
 								/>
 							{/each}
 						{/each}
-						<!-- Debug: Show connection IPs/interfaces in corners -->
-						{#if isDebugMode && allConnections.length > 0}
-							{@const centerX = preview.topoWidth / 2}
-							{@const centerY = preview.topoHeight / 2}
-							{@const quadrants = {
-								topLeft: allConnections.filter(c => c.midX < centerX && c.midY < centerY),
-								topRight: allConnections.filter(c => c.midX >= centerX && c.midY < centerY),
-								bottomLeft: allConnections.filter(c => c.midX < centerX && c.midY >= centerY),
-								bottomRight: allConnections.filter(c => c.midX >= centerX && c.midY >= centerY)
-							}}
-							{@const padding = 4}
-							{@const lineHeight = 8}
-							<!-- Top Left -->
-							{#each quadrants.topLeft as conn, idx}
-								<text x={padding} y={padding + idx * lineHeight} text-anchor="start" dominant-baseline="hanging" font-size="6" font-family="SF Mono, Monaco, monospace" fill={conn.iface ? 'rgba(255,255,255,0.85)' : 'rgba(248,113,113,0.85)'}>
-									{conn.arrow} {isRdma ? (conn.iface || '?') : `${conn.ip}${conn.iface ? ` (${conn.iface})` : ''}`}
-								</text>
-							{/each}
-							<!-- Top Right -->
-							{#each quadrants.topRight as conn, idx}
-								<text x={preview.topoWidth - padding} y={padding + idx * lineHeight} text-anchor="end" dominant-baseline="hanging" font-size="6" font-family="SF Mono, Monaco, monospace" fill={conn.iface ? 'rgba(255,255,255,0.85)' : 'rgba(248,113,113,0.85)'}>
-									{conn.arrow} {isRdma ? (conn.iface || '?') : `${conn.ip}${conn.iface ? ` (${conn.iface})` : ''}`}
-								</text>
-							{/each}
-							<!-- Bottom Left -->
-							{#each quadrants.bottomLeft as conn, idx}
-								<text x={padding} y={preview.topoHeight - padding - (quadrants.bottomLeft.length - 1 - idx) * lineHeight} text-anchor="start" dominant-baseline="auto" font-size="6" font-family="SF Mono, Monaco, monospace" fill={conn.iface ? 'rgba(255,255,255,0.85)' : 'rgba(248,113,113,0.85)'}>
-									{conn.arrow} {isRdma ? (conn.iface || '?') : `${conn.ip}${conn.iface ? ` (${conn.iface})` : ''}`}
-								</text>
-							{/each}
-							<!-- Bottom Right -->
-							{#each quadrants.bottomRight as conn, idx}
-								<text x={preview.topoWidth - padding} y={preview.topoHeight - padding - (quadrants.bottomRight.length - 1 - idx) * lineHeight} text-anchor="end" dominant-baseline="auto" font-size="6" font-family="SF Mono, Monaco, monospace" fill={conn.iface ? 'rgba(255,255,255,0.85)' : 'rgba(248,113,113,0.85)'}>
-									{conn.arrow} {isRdma ? (conn.iface || '?') : `${conn.ip}${conn.iface ? ` (${conn.iface})` : ''}`}
-								</text>
-							{/each}
-						{/if}
 					{/if}
 					
 					{#each preview.nodes as node}
--- a/dashboard/src/lib/components/TopologyGraph.svelte
+++ b/dashboard/src/lib/components/TopologyGraph.svelte
@@ -24,36 +24,19 @@ function getNodeLabel(nodeId: string): string {

 function getInterfaceLabel(nodeId: string, ip?: string): { label: string; missing: boolean } {
 	if (!ip) return { label: '?', missing: true };
-	
-	// Strip port if present (e.g., "192.168.1.1:8080" -> "192.168.1.1")
-	const cleanIp = ip.includes(':') && !ip.includes('[') ? ip.split(':')[0] : ip;
-	
-	// Helper to check a node's interfaces
-	function checkNode(node: typeof data.nodes[string]): string | null {
-		if (!node) return null;
-		
-		const matchFromInterfaces = node.network_interfaces?.find((iface) =>
-			(iface.addresses || []).some((addr) => addr === cleanIp || addr === ip)
-		);
-		if (matchFromInterfaces?.name) {
-			return matchFromInterfaces.name;
-		}
+	const node = data?.nodes?.[nodeId];
+	if (!node) return { label: '?', missing: true };

-		const mapped = node.ip_to_interface?.[cleanIp] || node.ip_to_interface?.[ip];
-		if (mapped && mapped.trim().length > 0) {
-			return mapped;
-		}
-		return null;
+	const matchFromInterfaces = node.network_interfaces?.find((iface) =>
+		(iface.addresses || []).some((addr) => addr === ip)
+	);
+	if (matchFromInterfaces?.name) {
+		return { label: matchFromInterfaces.name, missing: false };
 	}
-	
-	// Try specified node first
-	const result = checkNode(data?.nodes?.[nodeId]);
-	if (result) return { label: result, missing: false };
-	
-	// Fallback: search all nodes for this IP
-	for (const [, otherNode] of Object.entries(data?.nodes || {})) {
-		const otherResult = checkNode(otherNode);
-		if (otherResult) return { label: otherResult, missing: false };
+
+	const mapped = node.ip_to_interface?.[ip];
+	if (mapped && mapped.trim().length > 0) {
+		return { label: mapped, missing: false };
 	}

 	return { label: '?', missing: true };
@@ -84,7 +67,6 @@ function wrapLine(text: string, maxLen: number): string[] {
 	return lines;
 }

-
 	// Apple logo path for MacBook Pro screen
 	const APPLE_LOGO_PATH = "M788.1 340.9c-5.8 4.5-108.2 62.2-108.2 190.5 0 148.4 130.3 200.9 134.2 202.2-.6 3.2-20.7 71.9-68.7 141.9-42.8 61.6-87.5 123.1-155.5 123.1s-85.5-39.5-164-39.5c-76.5 0-103.7 40.8-165.9 40.8s-105.6-57-155.5-127C46.7 790.7 0 663 0 541.8c0-194.4 126.4-297.5 250.8-297.5 66.1 0 121.2 43.4 162.7 43.4 39.5 0 101.1-46 176.3-46 28.5 0 130.9 2.6 198.3 99.2zm-234-181.5c31.1-36.9 53.1-88.1 53.1-139.3 0-7.1-.6-14.3-1.9-20.1-50.6 1.9-110.8 33.7-147.1 75.8-28.5 32.4-55.1 83.6-55.1 135.5 0 7.8 1.3 15.6 1.9 18.1 3.2.6 8.4 1.3 13.6 1.3 45.4 0 102.5-30.4 135.5-71.3z";
 	const LOGO_NATIVE_WIDTH = 814;
@@ -256,7 +238,6 @@ function wrapLine(text: string, maxLen: number): string[] {
 		const debugLabelsGroup = svg.append('g').attr('class', 'debug-edge-labels');

 		const pairMap = new Map<string, { a: string; b: string; aToB: boolean; bToA: boolean; connections: Array<{ from: string; to: string; ip: string; ifaceLabel: string; missingIface: boolean }> }>();
-		let debugEdgeLabels: Array<{ connections: typeof pairMap extends Map<string, infer V> ? V['connections'] : never; isLeft: boolean; isTop: boolean; mx: number; my: number }> | null = null;
 		edges.forEach(edge => {
 			if (!edge.source || !edge.target || edge.source === edge.target) return;
 			if (!positionById[edge.source] || !positionById[edge.target]) return;
@@ -333,97 +314,109 @@ function wrapLine(text: string, maxLen: number): string[] {
 					.attr('marker-end', 'url(#arrowhead)');
 			}

-			// Collect debug labels for later positioning at edges
 			if (debugEnabled && entry.connections.length > 0) {
-				// Determine which side of viewport based on edge midpoint
-				const isLeft = mx < centerX;
-				const isTop = my < safeCenterY;
-				
-				// Store for batch rendering after all edges processed
-				if (!debugEdgeLabels) debugEdgeLabels = [];
-				debugEdgeLabels.push({
-					connections: entry.connections,
-					isLeft,
-					isTop,
-					mx,
-					my
-				});
-			}
-		});
+				const maxBoxes = 6;
+				const fontSize = isMinimized ? 8 : 9;
+				const lineGap = 2;
+				const labelOffsetOut = Math.max(140, minDimension * 0.38);
+				const labelOffsetSide = isMinimized ? 16 : 20;
+				const boxWidth = 170;
+				const maxLineLen = 26;

-		// Render debug labels at viewport edges/corners
-		if (debugEdgeLabels && debugEdgeLabels.length > 0) {
-			const fontSize = isMinimized ? 10 : 12;
-			const lineHeight = fontSize + 4;
-			const padding = 10;
-			
-			// Helper to get arrow based on direction vector
-			function getArrow(fromId: string, toId: string): string {
-				const fromPos = positionById[fromId];
-				const toPos = positionById[toId];
-				if (!fromPos || !toPos) return '→';
-				
-				const dirX = toPos.x - fromPos.x;
-				const dirY = toPos.y - fromPos.y;
-				const absX = Math.abs(dirX);
-				const absY = Math.abs(dirY);
-				
-				if (absX > absY * 2) {
-					return dirX > 0 ? '→' : '←';
-				} else if (absY > absX * 2) {
-					return dirY > 0 ? '↓' : '↑';
-				} else {
-					if (dirX > 0 && dirY > 0) return '↘';
-					if (dirX > 0 && dirY < 0) return '↗';
-					if (dirX < 0 && dirY > 0) return '↙';
-					return '↖';
+				const connections = entry.connections.slice(0, maxBoxes);
+				if (entry.connections.length > maxBoxes) {
+					const remaining = entry.connections.length - maxBoxes;
+					connections.push({
+						from: '',
+						to: '',
+						ip: `(+${remaining} more)`,
+						ifaceLabel: '',
+						missingIface: false
+					});
 				}
-			}
-			
-			// Group by quadrant: topLeft, topRight, bottomLeft, bottomRight
-			const quadrants: Record<string, typeof debugEdgeLabels> = {
-				topLeft: [],
-				topRight: [],
-				bottomLeft: [],
-				bottomRight: []
-			};
-			
-			debugEdgeLabels.forEach(edge => {
-				const key = (edge.isTop ? 'top' : 'bottom') + (edge.isLeft ? 'Left' : 'Right');
-				quadrants[key].push(edge);
-			});
-			
-			// Render each quadrant
-			Object.entries(quadrants).forEach(([quadrant, edges]) => {
-				if (edges.length === 0) return;
-				
-				const isLeft = quadrant.includes('Left');
-				const isTop = quadrant.includes('top');
-				
-				let baseX = isLeft ? padding : width - padding;
-				let baseY = isTop ? padding : height - padding;
-				const textAnchor = isLeft ? 'start' : 'end';
-				
-				let currentY = baseY;
-				
-				edges.forEach(edge => {
-					edge.connections.forEach(conn => {
-						const arrow = getArrow(conn.from, conn.to);
-						const label = `${arrow} ${conn.ip} ${conn.ifaceLabel}`;
-						debugLabelsGroup.append('text')
-							.attr('x', baseX)
-							.attr('y', currentY)
-							.attr('text-anchor', textAnchor)
-							.attr('dominant-baseline', isTop ? 'hanging' : 'auto')
+
+				let dirX = mx - centerX;
+				let dirY = my - centerY;
+				const dirLen = Math.hypot(dirX, dirY);
+				if (dirLen < 1) {
+					dirX = -uy;
+					dirY = ux;
+				} else {
+					dirX /= dirLen;
+					dirY /= dirLen;
+				}
+
+				const nx = -dirY;
+				const ny = dirX;
+
+				const labelXRaw = mx + dirX * labelOffsetOut + nx * labelOffsetSide;
+				const labelYRaw = my + dirY * labelOffsetOut + ny * labelOffsetSide;
+				const clampPad = Math.min(120, minDimension * 0.12);
+				const labelX = Math.max(clampPad, Math.min(width - clampPad, labelXRaw));
+				const labelY = Math.max(clampPad, Math.min(height - clampPad, labelYRaw));
+
+				const labelGroup = debugLabelsGroup.append('g')
+					.attr('transform', `translate(${labelX}, ${labelY})`);
+
+				const textGroup = labelGroup.append('g');
+
+				connections.forEach((conn, idx) => {
+					const rawLines = conn.from && conn.to
+						? [
+							`${getNodeLabel(conn.from)}→${getNodeLabel(conn.to)}`,
+							`${conn.ip}`,
+							`${conn.ifaceLabel}`
+						]
+						: [conn.ip];
+
+					const wrapped = rawLines.flatMap(line => wrapLine(line, maxLineLen));
+
+					wrapped.forEach((line, lineIdx) => {
+						textGroup.append('text')
+							.attr('x', 0)
+							.attr('y', (idx * (wrapped.length * (fontSize + lineGap))) + lineIdx * (fontSize + lineGap))
+							.attr('text-anchor', 'middle')
+							.attr('dominant-baseline', 'hanging')
 							.attr('font-size', fontSize)
 							.attr('font-family', 'SF Mono, monospace')
-							.attr('fill', conn.missingIface ? 'rgba(248,113,113,0.9)' : 'rgba(255,255,255,0.85)')
-							.text(label);
-						currentY += isTop ? lineHeight : -lineHeight;
+							.attr('fill', conn.missingIface ? 'rgba(248,113,113,0.9)' : 'rgba(255,255,255,0.9)')
+							.text(line);
 					});
 				});
-			});
-		}
+
+				const bbox = textGroup.node()?.getBBox();
+				if (bbox) {
+					const paddedWidth = Math.max(boxWidth, bbox.width + 14);
+					const boxHeight = bbox.height + 8;
+					const boxMinX = labelX - paddedWidth / 2;
+					const boxMaxX = labelX + paddedWidth / 2;
+					const boxMinY = labelY + bbox.y - 4;
+					const boxMaxY = boxMinY + boxHeight;
+
+					const clampPadDynamic = Math.min(140, minDimension * 0.18);
+					let shiftX = 0;
+					let shiftY = 0;
+					if (boxMinX < clampPadDynamic) shiftX = clampPadDynamic - boxMinX;
+					if (boxMaxX > width - clampPadDynamic) shiftX = (width - clampPadDynamic) - boxMaxX;
+					if (boxMinY < clampPadDynamic) shiftY = clampPadDynamic - boxMinY;
+					if (boxMaxY > height - clampPadDynamic) shiftY = (height - clampPadDynamic) - boxMaxY;
+
+					const finalX = labelX + shiftX;
+					const finalY = labelY + shiftY;
+					labelGroup.attr('transform', `translate(${finalX}, ${finalY})`);
+
+					labelGroup.insert('rect', 'g')
+						.attr('x', -paddedWidth / 2)
+						.attr('y', bbox.y - 4)
+						.attr('width', paddedWidth)
+						.attr('height', boxHeight)
+						.attr('rx', 4)
+						.attr('fill', 'rgba(0,0,0,0.75)')
+						.attr('stroke', 'rgba(255,255,255,0.12)')
+						.attr('stroke-width', 0.6);
+				}
+			}
+		});

 		// Draw nodes
 		const nodesGroup = svg.append('g').attr('class', 'nodes-group');
@@ -975,5 +968,4 @@ function wrapLine(text: string, maxLen: number): string[] {
 		from { stroke-dashoffset: 0; }
 		to { stroke-dashoffset: -10; }
 	}
-
 </style>
--- a/dashboard/src/lib/components/index.ts
+++ b/dashboard/src/lib/components/index.ts
@@ -4,5 +4,4 @@ export { default as ChatMessages } from './ChatMessages.svelte';
 export { default as ChatAttachments } from './ChatAttachments.svelte';
 export { default as ChatSidebar } from './ChatSidebar.svelte';
 export { default as ModelCard } from './ModelCard.svelte';
-export { default as MarkdownContent } from './MarkdownContent.svelte';

--- a/dashboard/src/lib/stores/app.svelte.ts
+++ b/dashboard/src/lib/stores/app.svelte.ts
@@ -327,8 +327,6 @@ class AppStore {
 	isTopologyMinimized = $state(false);
 	isSidebarOpen = $state(false); // Hidden by default, shown when in chat mode
 	debugMode = $state(false);
-	topologyOnlyMode = $state(false);
-	chatSidebarVisible = $state(true); // Shown by default
 	
 	private fetchInterval: ReturnType<typeof setInterval> | null = null;
 	private previewsInterval: ReturnType<typeof setInterval> | null = null;
@@ -339,8 +337,6 @@ class AppStore {
 			this.startPolling();
 			this.loadConversationsFromStorage();
 			this.loadDebugModeFromStorage();
-			this.loadTopologyOnlyModeFromStorage();
-			this.loadChatSidebarVisibleFromStorage();
 		}
 	}

@@ -398,44 +394,6 @@ class AppStore {
 		}
 	}

-	private loadTopologyOnlyModeFromStorage() {
-		try {
-			const stored = localStorage.getItem('exo-topology-only-mode');
-			if (stored !== null) {
-				this.topologyOnlyMode = stored === 'true';
-			}
-		} catch (error) {
-			console.error('Failed to load topology only mode:', error);
-		}
-	}
-
-	private saveTopologyOnlyModeToStorage() {
-		try {
-			localStorage.setItem('exo-topology-only-mode', this.topologyOnlyMode ? 'true' : 'false');
-		} catch (error) {
-			console.error('Failed to save topology only mode:', error);
-		}
-	}
-
-	private loadChatSidebarVisibleFromStorage() {
-		try {
-			const stored = localStorage.getItem('exo-chat-sidebar-visible');
-			if (stored !== null) {
-				this.chatSidebarVisible = stored === 'true';
-			}
-		} catch (error) {
-			console.error('Failed to load chat sidebar visibility:', error);
-		}
-	}
-
-	private saveChatSidebarVisibleToStorage() {
-		try {
-			localStorage.setItem('exo-chat-sidebar-visible', this.chatSidebarVisible ? 'true' : 'false');
-		} catch (error) {
-			console.error('Failed to save chat sidebar visibility:', error);
-		}
-	}
-
 	/**
 	 * Create a new conversation
 	 */
@@ -740,34 +698,6 @@ class AppStore {
 		this.saveDebugModeToStorage();
 	}

-	getTopologyOnlyMode(): boolean {
-		return this.topologyOnlyMode;
-	}
-
-	setTopologyOnlyMode(enabled: boolean) {
-		this.topologyOnlyMode = enabled;
-		this.saveTopologyOnlyModeToStorage();
-	}
-
-	toggleTopologyOnlyMode() {
-		this.topologyOnlyMode = !this.topologyOnlyMode;
-		this.saveTopologyOnlyModeToStorage();
-	}
-
-	getChatSidebarVisible(): boolean {
-		return this.chatSidebarVisible;
-	}
-
-	setChatSidebarVisible(visible: boolean) {
-		this.chatSidebarVisible = visible;
-		this.saveChatSidebarVisibleToStorage();
-	}
-
-	toggleChatSidebarVisible() {
-		this.chatSidebarVisible = !this.chatSidebarVisible;
-		this.saveChatSidebarVisibleToStorage();
-	}
-
 	startPolling() {
 		this.fetchState();
 		this.fetchInterval = setInterval(() => this.fetchState(), 1000);
@@ -958,6 +888,8 @@ class AppStore {
 		
 		if (lastUserIndex === -1) return;
 		
+		const lastUserMessage = this.messages[lastUserIndex];
+		
 		// Remove any messages after the user message
 		this.messages = this.messages.slice(0, lastUserIndex + 1);
 		
@@ -998,10 +930,7 @@ class AppStore {
 			}
 			
 			if (!modelToUse) {
-				const idx = this.messages.findIndex(m => m.id === assistantMessage.id);
-				if (idx !== -1) {
-					this.messages[idx].content = 'Error: No model available. Please launch an instance first.';
-				}
+				assistantMessage.content = 'Error: No model available. Please launch an instance first.';
 				this.isLoading = false;
 				this.updateActiveConversation();
 				return;
@@ -1019,10 +948,7 @@ class AppStore {
 			
 			if (!response.ok) {
 				const errorText = await response.text();
-				const idx = this.messages.findIndex(m => m.id === assistantMessage.id);
-				if (idx !== -1) {
-					this.messages[idx].content = `Error: ${response.status} - ${errorText}`;
-				}
+				assistantMessage.content = `Error: ${response.status} - ${errorText}`;
 				this.isLoading = false;
 				this.updateActiveConversation();
 				return;
@@ -1030,10 +956,7 @@ class AppStore {
 			
 			const reader = response.body?.getReader();
 			if (!reader) {
-				const idx = this.messages.findIndex(m => m.id === assistantMessage.id);
-				if (idx !== -1) {
-					this.messages[idx].content = 'Error: No response stream available';
-				}
+				assistantMessage.content = 'Error: No response stream available';
 				this.isLoading = false;
 				this.updateActiveConversation();
 				return;
@@ -1061,16 +984,9 @@ class AppStore {
 							const delta = json.choices?.[0]?.delta?.content;
 							if (delta) {
 								fullContent += delta;
-								const { displayContent, thinkingContent } = this.stripThinkingTags(fullContent);
+								const { displayContent } = this.stripThinkingTags(fullContent);
 								this.currentResponse = displayContent;
-								
-								// Update the assistant message in place (triggers Svelte reactivity)
-								const idx = this.messages.findIndex(m => m.id === assistantMessage.id);
-								if (idx !== -1) {
-									this.messages[idx].content = displayContent;
-									this.messages[idx].thinking = thinkingContent || undefined;
-								}
-								this.persistActiveConversation();
+								assistantMessage.content = displayContent;
 							}
 						} catch {
 							// Skip malformed JSON
@@ -1079,25 +995,16 @@ class AppStore {
 				}
 			}
 			
-			// Final cleanup of the message
-			const { displayContent, thinkingContent } = this.stripThinkingTags(fullContent);
-			const idx = this.messages.findIndex(m => m.id === assistantMessage.id);
-			if (idx !== -1) {
-				this.messages[idx].content = displayContent;
-				this.messages[idx].thinking = thinkingContent || undefined;
-			}
-			this.persistActiveConversation();
-			
-		} catch (error) {
-			const idx = this.messages.findIndex(m => m.id === assistantMessage.id);
-			if (idx !== -1) {
-				this.messages[idx].content = `Error: ${error instanceof Error ? error.message : 'Unknown error'}`;
-			}
-			this.persistActiveConversation();
-		} finally {
-			this.isLoading = false;
+			const { displayContent } = this.stripThinkingTags(fullContent);
+			assistantMessage.content = displayContent;
 			this.currentResponse = '';
 			this.updateActiveConversation();
+			
+		} catch (error) {
+			assistantMessage.content = `Error: ${error instanceof Error ? error.message : 'Unknown error'}`;
+			this.updateActiveConversation();
+		} finally {
+			this.isLoading = false;
 		}
 	}

@@ -1457,8 +1364,6 @@ export const lastUpdate = () => appStore.lastUpdate;
 export const isTopologyMinimized = () => appStore.isTopologyMinimized;
 export const selectedChatModel = () => appStore.selectedChatModel;
 export const debugMode = () => appStore.getDebugMode();
-export const topologyOnlyMode = () => appStore.getTopologyOnlyMode();
-export const chatSidebarVisible = () => appStore.getChatSidebarVisible();

 // Actions
 export const startChat = () => appStore.startChat();
@@ -1486,9 +1391,5 @@ export const isSidebarOpen = () => appStore.isSidebarOpen;
 export const toggleSidebar = () => appStore.toggleSidebar();
 export const toggleDebugMode = () => appStore.toggleDebugMode();
 export const setDebugMode = (enabled: boolean) => appStore.setDebugMode(enabled);
-export const toggleTopologyOnlyMode = () => appStore.toggleTopologyOnlyMode();
-export const setTopologyOnlyMode = (enabled: boolean) => appStore.setTopologyOnlyMode(enabled);
-export const toggleChatSidebarVisible = () => appStore.toggleChatSidebarVisible();
-export const setChatSidebarVisible = (visible: boolean) => appStore.setChatSidebarVisible(visible);
 export const refreshState = () => appStore.fetchState();

--- a/dashboard/src/routes/+page.svelte
+++ b/dashboard/src/routes/+page.svelte
@@ -18,10 +18,6 @@
 		selectedChatModel,
 	debugMode,
 	toggleDebugMode,
-	topologyOnlyMode,
-	toggleTopologyOnlyMode,
-	chatSidebarVisible,
-	toggleChatSidebarVisible,
 		type DownloadProgress,
 		type PlacementPreview
 	} from '$lib/stores/app.svelte';
@@ -41,8 +37,6 @@
 	const selectedModelId = $derived(selectedPreviewModelId());
 	const loadingPreviews = $derived(isLoadingPreviews());
 const debugEnabled = $derived(debugMode());
-const topologyOnlyEnabled = $derived(topologyOnlyMode());
-const sidebarVisible = $derived(chatSidebarVisible());

 	let mounted = $state(false);

@@ -478,7 +472,6 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 				
 				const progress = parseDownloadProgress(downloadPayload);
 				if (progress) {
-					// Sum all values across nodes - each node downloads independently
 					totalBytes += progress.totalBytes;
 					downloadedBytes += progress.downloadedBytes;
 					totalSpeed += progress.speed;
@@ -496,17 +489,13 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 			return { isDownloading: false, progress: null, perNode: [] };
 		}

-		// ETA = total remaining bytes / total speed across all nodes
-		const remainingBytes = totalBytes - downloadedBytes;
-		const etaMs = totalSpeed > 0 ? (remainingBytes / totalSpeed) * 1000 : 0;
-
 		return {
 			isDownloading: true,
 			progress: {
 				totalBytes,
 				downloadedBytes,
 				speed: totalSpeed,
-				etaMs,
+				etaMs: totalSpeed > 0 ? ((totalBytes - downloadedBytes) / totalSpeed) * 1000 : 0,
 				percentage: totalBytes > 0 ? (downloadedBytes / totalBytes) * 100 : 0,
 				completedFiles,
 				totalFiles,
@@ -587,7 +576,6 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 					
 					const progress = parseDownloadProgress(downloadPayload);
 					if (progress) {
-						// Sum all values across nodes - each node downloads independently
 						totalBytes += progress.totalBytes;
 						downloadedBytes += progress.downloadedBytes;
 						totalSpeed += progress.speed;
@@ -608,17 +596,13 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 			return { isDownloading: false, progress: null, statusText: statusInfo.statusText, perNode: [] };
 		}

-		// ETA = total remaining bytes / total speed across all nodes
-		const remainingBytes = totalBytes - downloadedBytes;
-		const etaMs = totalSpeed > 0 ? (remainingBytes / totalSpeed) * 1000 : 0;
-
 		return {
 			isDownloading: true,
 			progress: {
 				totalBytes,
 				downloadedBytes,
 				speed: totalSpeed,
-				etaMs,
+				etaMs: totalSpeed > 0 ? ((totalBytes - downloadedBytes) / totalSpeed) * 1000 : 0,
 				percentage: totalBytes > 0 ? (downloadedBytes / totalBytes) * 100 : 0,
 				completedFiles,
 				totalFiles,
@@ -634,12 +618,10 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 	function getStatusColor(statusText: string): string {
 		switch (statusText) {
 			case 'FAILED': return 'text-red-400';
-			case 'SHUTDOWN': return 'text-gray-400';
 			case 'DOWNLOADING': return 'text-blue-400';
 			case 'LOADING': 
 			case 'WARMING UP': 
-			case 'WAITING':
-			case 'INITIALIZING': return 'text-yellow-400';
+			case 'WAITING': return 'text-yellow-400';
 			case 'RUNNING': return 'text-teal-400';
 			case 'READY': 
 			case 'LOADED': return 'text-green-400';
@@ -662,15 +644,12 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 				if (!r) return null;
 				const [kind] = getTagged(r);
 				const statusMap: Record<string, string> = {
-					RunnerWaitingForInitialization: 'WaitingForInitialization',
-					RunnerInitializingBackend: 'InitializingBackend',
 					RunnerWaitingForModel: 'WaitingForModel',
 					RunnerLoading: 'Loading',
 					RunnerLoaded: 'Loaded',
 					RunnerWarmingUp: 'WarmingUp',
 					RunnerReady: 'Ready',
 					RunnerRunning: 'Running',
-					RunnerShutdown: 'Shutdown',
 					RunnerFailed: 'Failed',
 				};
 				return kind ? statusMap[kind] || null : null;
@@ -681,15 +660,12 @@ function toggleInstanceDownloadDetails(nodeId: string): void {

 		if (statuses.length === 0) return { statusText: 'UNKNOWN', statusClass: 'inactive' };
 		if (has('Failed')) return { statusText: 'FAILED', statusClass: 'failed' };
-		if (has('Shutdown')) return { statusText: 'SHUTDOWN', statusClass: 'inactive' };
 		if (has('Loading')) return { statusText: 'LOADING', statusClass: 'starting' };
 		if (has('WarmingUp')) return { statusText: 'WARMING UP', statusClass: 'starting' };
 		if (has('Running')) return { statusText: 'RUNNING', statusClass: 'running' };
 		if (has('Ready')) return { statusText: 'READY', statusClass: 'loaded' };
 		if (has('Loaded')) return { statusText: 'LOADED', statusClass: 'loaded' };
 		if (has('WaitingForModel')) return { statusText: 'WAITING', statusClass: 'starting' };
-		if (has('InitializingBackend')) return { statusText: 'INITIALIZING', statusClass: 'starting' };
-		if (has('WaitingForInitialization')) return { statusText: 'INITIALIZING', statusClass: 'starting' };

 		return { statusText: 'RUNNING', statusClass: 'active' };
 	}
@@ -1131,47 +1107,16 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 		<div class="shooting-star" style="top: 50%; left: 40%; --duration: 45s; --delay: 30s;"></div>
 	</div>

-	{#if !topologyOnlyEnabled}
-	<HeaderNav 
-		showHome={chatStarted} 
-		onHome={handleGoHome} 
-		showSidebarToggle={true}
-		sidebarVisible={sidebarVisible}
-		onToggleSidebar={toggleChatSidebarVisible}
-	/>
-	{/if}
+	<HeaderNav showHome={chatStarted} onHome={handleGoHome} />

 	<!-- Main Content -->
 	<main class="flex-1 flex overflow-hidden relative">
-		<!-- Left: Conversation History Sidebar (hidden in topology-only mode or when toggled off) -->
-		{#if !topologyOnlyEnabled && sidebarVisible}
+		<!-- Left: Conversation History Sidebar (always visible) -->
 		<div class="w-80 flex-shrink-0 border-r border-exo-yellow/10">
 			<ChatSidebar class="h-full" />
 		</div>
-		{/if}

-		{#if topologyOnlyEnabled}
-			<!-- TOPOLOGY ONLY MODE: Full-screen topology -->
-			<div class="flex-1 flex flex-col min-h-0 min-w-0 p-4" in:fade={{ duration: 300 }}>
-				<div class="flex-1 relative bg-exo-dark-gray/40 rounded-lg overflow-hidden">
-					<TopologyGraph class="w-full h-full" highlightedNodes={highlightedNodes()} />
-					<!-- Exit topology-only mode button -->
-					<button
-						type="button"
-						onclick={toggleTopologyOnlyMode}
-						class="absolute bottom-4 right-4 p-2 rounded border border-exo-yellow/30 bg-exo-dark-gray/80 hover:border-exo-yellow/50 hover:bg-exo-dark-gray transition-colors cursor-pointer backdrop-blur-sm"
-						title="Exit topology only mode"
-					>
-						<svg class="w-5 h-5 text-exo-yellow" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2">
-							<circle cx="12" cy="5" r="2" fill="currentColor" />
-							<circle cx="5" cy="19" r="2" fill="currentColor" />
-							<circle cx="19" cy="19" r="2" fill="currentColor" />
-							<path stroke-linecap="round" d="M12 7v5m0 0l-5 5m5-5l5 5" />
-						</svg>
-					</button>
-				</div>
-			</div>
-		{:else if !chatStarted}
+		{#if !chatStarted}
 			<!-- WELCOME STATE: Topology + Instance Controls (no left sidebar for cleaner look) -->
 			<div class="flex-1 flex overflow-visible relative" in:fade={{ duration: 300 }} out:fade={{ duration: 200 }}>
 				
@@ -1355,15 +1300,14 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 																			{:else}
 																				{#each nodeProg.progress.files as f}
 																					{@const filePercent = Math.min(100, Math.max(0, f.percentage ?? 0))}
-																					{@const isFileComplete = filePercent >= 100}
 																					<div class="rounded border border-exo-medium-gray/30 bg-exo-black/40 p-2">
 																						<div class="flex items-center justify-between text-[10px] font-mono text-exo-light-gray/90">
 																							<span class="truncate pr-2">{f.name}</span>
-																							<span class={isFileComplete ? 'text-green-400' : 'text-white/80'}>{filePercent.toFixed(1)}%</span>
+																							<span class="text-white/80">{filePercent.toFixed(1)}%</span>
 																						</div>
 																						<div class="relative h-1 bg-exo-black/60 rounded-sm overflow-hidden mt-1">
 																							<div 
-																								class="absolute inset-y-0 left-0 bg-gradient-to-r {isFileComplete ? 'from-green-500 to-green-400' : 'from-exo-yellow to-exo-yellow/70'} transition-all duration-300"
+																								class="absolute inset-y-0 left-0 bg-gradient-to-r from-exo-yellow to-exo-yellow/70 transition-all duration-300"
 																								style="width: {filePercent.toFixed(1)}%"
 																							></div>
 																						</div>
@@ -1667,13 +1611,13 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 					in:fade={{ duration: 300, delay: 100 }}
 				>
 					<div class="flex-1 overflow-y-auto px-8 py-6" bind:this={chatScrollRef}>
-						<div class="max-w-7xl mx-auto">
+						<div class="max-w-3xl mx-auto">
 							<ChatMessages scrollParent={chatScrollRef} />
 						</div>
 					</div>
 					
 					<div class="flex-shrink-0 px-8 pb-6 pt-4 bg-gradient-to-t from-exo-black via-exo-black to-transparent">
-						<div class="max-w-7xl mx-auto">
+						<div class="max-w-3xl mx-auto">
 							<ChatForm placeholder="Ask anything" showModelSelector={true} />
 						</div>
 					</div>
@@ -1711,7 +1655,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 							<!-- Panel Header -->
 							<div class="flex items-center gap-2 mb-4">
 								<div class="w-2 h-2 bg-exo-yellow rounded-full shadow-[0_0_8px_rgba(255,215,0,0.6)] animate-pulse"></div>
-								<h3 class="text-xs text-exo-yellow font-mono tracking-[0.2em] uppercase">Instances</h3>
+								<h3 class="text-sm text-exo-yellow font-mono tracking-[0.2em] uppercase">Instances</h3>
 								<div class="flex-1 h-px bg-gradient-to-r from-exo-yellow/30 to-transparent"></div>
 							</div>
 								<div class="space-y-3 max-h-72 overflow-y-auto pr-1">
@@ -1757,28 +1701,28 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 											<div class="flex justify-between items-start mb-2 pl-2">
 												<div class="flex items-center gap-2">
 													<div class="w-1.5 h-1.5 {isDownloading ? 'bg-blue-400 animate-pulse' : isFailed ? 'bg-red-400' : isLoading ? 'bg-yellow-400 animate-pulse' : isReady ? 'bg-green-400' : 'bg-teal-400'} rounded-full shadow-[0_0_6px_currentColor]"></div>
-													<span class="text-exo-light-gray font-mono text-sm tracking-wider">{id.slice(0, 8).toUpperCase()}</span>
+													<span class="text-exo-light-gray font-mono text-xs tracking-wider">{id.slice(0, 8).toUpperCase()}</span>
 												</div>
 												<button 
 													onclick={() => deleteInstance(id)}
-													class="text-xs px-2 py-1 font-mono tracking-wider uppercase border border-red-500/30 text-red-400 hover:bg-red-500/20 hover:text-red-400 hover:border-red-500/50 transition-all duration-200 cursor-pointer"
+													class="text-xs px-2 py-1 font-mono tracking-wider uppercase border border-red-500/30 text-red-400/80 hover:bg-red-500/20 hover:text-red-400 hover:border-red-500/50 transition-all duration-200 cursor-pointer"
 												>
 													DELETE
 												</button>
 												</div>
 												<div class="pl-2">
-													<div class="text-exo-yellow text-xs font-mono tracking-wide truncate">{getInstanceModelId(instance)}</div>
+													<div class="text-exo-yellow text-sm font-mono tracking-wide truncate">{getInstanceModelId(instance)}</div>
 													<div class="text-white/60 text-xs font-mono">Strategy: <span class="text-white/80">{instanceInfo.sharding} ({instanceInfo.instanceType})</span></div>
 														{#if instanceModelId && instanceModelId !== 'Unknown' && instanceModelId !== 'Unknown Model'}
 															<a
-																class="inline-flex items-center gap-1 text-[11px] text-white/60 hover:text-exo-yellow transition-colors mt-1"
+																class="inline-flex items-center gap-1 text-[10px] text-white/60 hover:text-exo-yellow transition-colors mt-0.5"
 																href={`https://huggingface.co/${instanceModelId}`}
 																target="_blank"
 																rel="noreferrer noopener"
 																aria-label="View model on Hugging Face"
 															>
 																<span>Hugging Face</span>
-																<svg class="w-3.5 h-3.5" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
+																<svg class="w-3 h-3" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
 																	<path d="M14 3h7v7"/>
 																	<path d="M10 14l11-11"/>
 																	<path d="M21 14v6a1 1 0 0 1-1 1h-16a1 1 0 0 1-1-1v-16a1 1 0 0 1 1-1h6"/>
@@ -1789,84 +1733,68 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 														<div class="text-white/60 text-xs font-mono">{instanceInfo.nodeNames.join(', ')}</div>
 													{/if}
 													{#if debugEnabled && instanceConnections.length > 0}
-													<div class="mt-2 space-y-1">
-														{#each instanceConnections as conn}
-															<div class="text-[11px] leading-snug font-mono text-white/70">
-																<span>{conn.from} -> {conn.to}: {conn.ip}</span>
-																<span class="{conn.missingIface ? 'text-red-400' : 'text-white/60'}"> ({conn.ifaceLabel})</span>
+														<div class="mt-1 space-y-0.5">
+															{#each instanceConnections as conn}
+																<div class="text-[10px] leading-snug font-mono text-white/70">
+																	<span>{conn.from} -> {conn.to}: {conn.ip}</span>
+																	<span class="{conn.missingIface ? 'text-red-400' : 'text-white/60'}"> ({conn.ifaceLabel})</span>
+																</div>
+															{/each}
+														</div>
+													{/if}
+													
+													<!-- Download Progress -->
+													{#if downloadInfo.isDownloading && downloadInfo.progress}
+														<div class="mt-2 space-y-1">
+															<div class="flex justify-between text-sm font-mono">
+																<span class="text-blue-400">{downloadInfo.progress.percentage.toFixed(1)}%</span>
+																<span class="text-exo-light-gray">{formatBytes(downloadInfo.progress.downloadedBytes)}/{formatBytes(downloadInfo.progress.totalBytes)}</span>
+															</div>
+															<div class="relative h-1 bg-exo-black/60 rounded-sm overflow-hidden">
+																<div 
+																	class="absolute inset-y-0 left-0 bg-gradient-to-r from-blue-500 to-blue-400 transition-all duration-300"
+																	style="width: {downloadInfo.progress.percentage}%"
+																></div>
+															</div>
+															<div class="flex justify-between text-xs font-mono text-exo-light-gray">
+																<span>{formatSpeed(downloadInfo.progress.speed)}</span>
+																<span>ETA: {formatEta(downloadInfo.progress.etaMs)}</span>
+																<span>{downloadInfo.progress.completedFiles}/{downloadInfo.progress.totalFiles} files</span>
 															</div>
-														{/each}
-													</div>
-												{/if}
-												
-												<!-- Download Progress -->
-												{#if downloadInfo.isDownloading && downloadInfo.progress}
-													<div class="mt-2 space-y-1">
-														<div class="flex justify-between text-xs font-mono">
-															<span class="text-blue-400">{downloadInfo.progress.percentage.toFixed(1)}%</span>
-															<span class="text-exo-light-gray">{formatBytes(downloadInfo.progress.downloadedBytes)}/{formatBytes(downloadInfo.progress.totalBytes)}</span>
 														</div>
-														<div class="relative h-1.5 bg-exo-black/60 rounded-sm overflow-hidden">
-															<div 
-																class="absolute inset-y-0 left-0 bg-gradient-to-r from-blue-500 to-blue-400 transition-all duration-300"
-																style="width: {downloadInfo.progress.percentage}%"
-															></div>
-														</div>
-														<div class="flex justify-between text-xs font-mono text-exo-light-gray">
-															<span>{formatSpeed(downloadInfo.progress.speed)}</span>
-															<span>ETA: {formatEta(downloadInfo.progress.etaMs)}</span>
-															<span>{downloadInfo.progress.completedFiles}/{downloadInfo.progress.totalFiles} files</span>
-														</div>
-													</div>
-													{#if downloadInfo.perNode.length > 0}
-														<div class="mt-2 space-y-2 max-h-48 overflow-y-auto pr-1">
-															{#each downloadInfo.perNode as nodeProg}
-																{@const nodePercent = Math.min(100, Math.max(0, nodeProg.progress.percentage))}
-																{@const isExpanded = instanceDownloadExpandedNodes.has(nodeProg.nodeId)}
-																<div class="rounded border border-exo-medium-gray/40 bg-exo-black/30 p-2">
-																	<button
-																		type="button"
-																		class="w-full text-left space-y-1.5"
-																		onclick={() => toggleInstanceDownloadDetails(nodeProg.nodeId)}
-																	>
-																		<div class="flex items-center justify-between text-[11px] font-mono text-exo-light-gray">
+														{#if downloadInfo.perNode.length > 0}
+															<div class="mt-2 space-y-1.5 max-h-48 overflow-y-auto pr-1">
+																{#each downloadInfo.perNode as nodeProg}
+																	<div class="rounded border border-exo-medium-gray/40 bg-exo-black/30 p-2">
+																		<div class="flex items-center justify-between text-[11px] font-mono text-exo-light-gray mb-1">
 																			<span class="text-white/80 truncate pr-2">{nodeProg.nodeName}</span>
-																			<span class="flex items-center gap-1 text-blue-300">
-																				{nodePercent.toFixed(1)}%
-																				<svg class="w-3 h-3 text-exo-light-gray" viewBox="0 0 20 20" fill="none" stroke="currentColor" stroke-width="2">
-																					<path d="M6 8l4 4 4-4" class={isExpanded ? 'transform rotate-180 origin-center transition-transform duration-150' : 'transition-transform duration-150'}></path>
-																				</svg>
-																			</span>
+																			<span class="text-blue-300">{Math.min(100, Math.max(0, nodeProg.progress.percentage)).toFixed(1)}%</span>
 																		</div>
-																		<div class="relative h-1.5 bg-exo-black/60 rounded-sm overflow-hidden">
+																		<div class="relative h-1 bg-exo-black/60 rounded-sm overflow-hidden mb-1.5">
 																			<div 
-																				class="absolute inset-y-0 left-0 bg-gradient-to-r from-blue-500 to-blue-400 transition-all duration-300"
-																				style="width: {nodePercent.toFixed(1)}%"
+																				class="absolute inset-y-0 left-0 bg-blue-500/80 transition-all duration-300"
+																				style="width: {Math.min(100, Math.max(0, nodeProg.progress.percentage)).toFixed(1)}%"
 																			></div>
 																		</div>
-																		<div class="flex items-center justify-between text-[11px] font-mono text-exo-light-gray">
+																		<div class="flex items-center justify-between text-[11px] font-mono text-exo-light-gray mb-1">
 																			<span>{formatBytes(nodeProg.progress.downloadedBytes)} / {formatBytes(nodeProg.progress.totalBytes)}</span>
 																			<span>{formatSpeed(nodeProg.progress.speed)} • ETA {formatEta(nodeProg.progress.etaMs)}</span>
 																		</div>
-																	</button>
-
-																	{#if isExpanded}
-																		<div class="mt-2 space-y-1.5">
-																			{#if nodeProg.progress.files.length === 0}
-																				<div class="text-[11px] font-mono text-exo-light-gray/70">No file details reported.</div>
-																			{:else}
-																				{#each nodeProg.progress.files as f}
-																					{@const filePercent = Math.min(100, Math.max(0, f.percentage ?? 0))}
-																					{@const isFileComplete = filePercent >= 100}
-																					<div class="rounded border border-exo-medium-gray/30 bg-exo-black/40 p-2">
-																						<div class="flex items-center justify-between text-[10px] font-mono text-exo-light-gray/90">
+																	{#if nodeProg.progress.files.length > 0}
+																		{@const inProgressFiles = nodeProg.progress.files.filter(f => (f.percentage ?? 0) < 100)}
+																		{@const completedFiles = nodeProg.progress.files.filter(f => (f.percentage ?? 0) >= 100)}
+																		{#if inProgressFiles.length > 0}
+																			<div class="space-y-1">
+																				{#each inProgressFiles as f}
+																					<div class="text-[10px] font-mono text-exo-light-gray/80">
+																						<div class="flex items-center justify-between">
 																							<span class="truncate pr-2">{f.name}</span>
-																							<span class={isFileComplete ? 'text-green-400' : 'text-white/80'}>{filePercent.toFixed(1)}%</span>
+																							<span class="text-white/70">{Math.min(100, Math.max(0, f.percentage ?? 0)).toFixed(1)}%</span>
 																						</div>
-																						<div class="relative h-1 bg-exo-black/60 rounded-sm overflow-hidden mt-1">
+																						<div class="relative h-1 bg-exo-black/50 rounded-sm overflow-hidden mt-0.5">
 																							<div 
-																								class="absolute inset-y-0 left-0 bg-gradient-to-r {isFileComplete ? 'from-green-500 to-green-400' : 'from-exo-yellow to-exo-yellow/70'} transition-all duration-300"
-																								style="width: {filePercent.toFixed(1)}%"
+																								class="absolute inset-y-0 left-0 bg-gradient-to-r from-exo-yellow to-exo-yellow/70"
+																								style="width: {Math.min(100, Math.max(0, f.percentage ?? 0)).toFixed(1)}%"
 																							></div>
 																						</div>
 																						<div class="flex items-center justify-between text-[10px] text-exo-light-gray/70 mt-0.5">
@@ -1875,17 +1803,27 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 																						</div>
 																					</div>
 																				{/each}
-																			{/if}
-																		</div>
+																			</div>
+																		{/if}
+																		{#if completedFiles.length > 0}
+																			<div class="pt-1 space-y-0.5">
+																				{#each completedFiles as f}
+																					<div class="text-[10px] font-mono text-exo-light-gray/70 flex items-center justify-between">
+																						<span class="truncate pr-2">{f.name}</span>
+																						<span class="text-white/60">100%</span>
+																					</div>
+																				{/each}
+																			</div>
+																		{/if}
 																	{/if}
-																</div>
-															{/each}
-														</div>
+																	</div>
+																{/each}
+															</div>
+														{/if}
+														<div class="text-sm text-blue-400 font-mono tracking-wider mt-1">DOWNLOADING</div>
+													{:else}
+														<div class="text-sm {getStatusColor(downloadInfo.statusText)} font-mono tracking-wider mt-1">{downloadInfo.statusText}</div>
 													{/if}
-													<div class="text-xs text-blue-400 font-mono tracking-wider mt-1">DOWNLOADING</div>
-												{:else}
-													<div class="text-xs {getStatusColor(downloadInfo.statusText)} font-mono tracking-wider mt-1">{downloadInfo.statusText}</div>
-												{/if}
 												</div>
 											</div>
 										</div>
--- a/dashboard/src/routes/downloads/+page.svelte
+++ b/dashboard/src/routes/downloads/+page.svelte
@@ -345,19 +345,13 @@
 							<div class="rounded border border-exo-medium-gray/30 bg-exo-dark-gray/60 p-3 space-y-2">
 								<div class="flex items-center justify-between gap-3">
 									<div class="min-w-0 space-y-0.5">
-										<div 
-											class="text-xs font-mono text-white truncate"
-											title={model.prettyName ?? model.modelId}
-										>{model.prettyName ?? model.modelId}</div>
-										<div 
-											class="text-[10px] text-exo-light-gray font-mono truncate"
-											title={model.modelId}
-										>{model.modelId}</div>
-										{#if model.status !== 'completed'}
-											<div class="text-[11px] text-exo-light-gray font-mono">
-												{formatBytes(model.downloadedBytes)} / {formatBytes(model.totalBytes)}
-											</div>
-										{/if}
+										<div class="text-sm font-mono text-white truncate">{model.prettyName ?? model.modelId}</div>
+										<div class="text-[11px] text-exo-light-gray font-mono truncate">
+											{model.modelId}
+										</div>
+										<div class="text-[11px] text-exo-light-gray font-mono">
+											{formatBytes(model.downloadedBytes)} / {formatBytes(model.totalBytes)}
+										</div>
 									</div>
 									<div class="flex items-center gap-2">
 										<span class="text-xs font-mono {pct >= 100 ? 'text-green-400' : pct <= 0 ? 'text-red-400' : 'text-exo-yellow'}">
@@ -432,14 +426,14 @@
 <style>
 	.downloads-grid {
 		display: grid;
-		grid-template-columns: repeat(auto-fill, minmax(320px, 1fr));
+		grid-template-columns: repeat(auto-fill, minmax(260px, 1fr));
 	}
 	@media (min-width: 1024px) {
 		.downloads-grid {
 			grid-template-columns: repeat(3, minmax(0, 1fr));
 		}
 	}
-	@media (min-width: 1600px) {
+	@media (min-width: 1440px) {
 		.downloads-grid {
 			grid-template-columns: repeat(4, minmax(0, 1fr));
 		}
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -29,12 +29,10 @@ dependencies = [
    "exo_pyo3_bindings", # rust bindings
    "anyio==4.11.0",
    "bidict>=0.23.1",
-    "mlx>=0.30.1; sys_platform == 'darwin'",
-    "mlx[cpu]>=0.30.1; sys_platform == 'linux'",
+    "mlx>=0.30.1",
    "mlx-lm>=0.28.3",
    "tiktoken>=0.12.0", # required for kimi k2 tokenizer
    "hypercorn>=0.18.0",
-    "openai-harmony>=0.0.8",
 ]

 [project.scripts]
--- a/src/exo/main.py
+++ b/src/exo/main.py
@@ -1,6 +1,5 @@
 import argparse
 import multiprocessing as mp
-import os
 import signal
 from dataclasses import dataclass, field
 from typing import Self
@@ -195,7 +194,6 @@ def main():
    # TODO: Refactor the current verbosity system
    logger_setup(EXO_LOG, args.verbosity)
    logger.info("Starting EXO")
-    logger.info(f"EXO_LIBP2P_NAMESPACE: {os.getenv('EXO_LIBP2P_NAMESPACE')}")

    node = anyio.run(Node.create, args)
    anyio.run(node.run)
--- a/src/exo/master/api.py
+++ b/src/exo/master/api.py
@@ -13,12 +13,6 @@ from hypercorn.asyncio import serve  # pyright: ignore[reportUnknownVariableType
 from hypercorn.config import Config
 from hypercorn.typing import ASGIFramework
 from loguru import logger
-from openai_harmony import (  # pyright: ignore[reportMissingTypeStubs]
-    HarmonyEncodingName,
-    Role,
-    StreamableParser,
-    load_harmony_encoding,
-)

 from exo.master.placement import place_instance as get_instance_placements
 from exo.shared.apply import apply
@@ -27,13 +21,11 @@ from exo.shared.logging import InterceptLogger
 from exo.shared.models.model_cards import MODEL_CARDS
 from exo.shared.models.model_meta import get_model_meta
 from exo.shared.types.api import (
-    ChatCompletionChoice,
    ChatCompletionMessage,
    ChatCompletionResponse,
    CreateInstanceParams,
    CreateInstanceResponse,
    DeleteInstanceResponse,
-    FinishReason,
    ModelList,
    ModelListModel,
    PlaceInstanceParams,
@@ -64,7 +56,7 @@ from exo.utils.channels import Receiver, Sender, channel
 from exo.utils.dashboard_path import find_dashboard
 from exo.utils.event_buffer import OrderedBuffer

-encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
+HIDE_THINKING = False


 def chunk_to_response(
@@ -169,9 +161,7 @@ class API:
        self.app.delete("/instance/{instance_id}")(self.delete_instance)
        self.app.get("/models")(self.get_models)
        self.app.get("/v1/models")(self.get_models)
-        self.app.post("/v1/chat/completions", response_model=None)(
-            self.chat_completions
-        )
+        self.app.post("/v1/chat/completions")(self.chat_completions)
        self.app.get("/state")(lambda: self.state)
        self.app.get("/events")(lambda: self._event_log)

@@ -187,32 +177,17 @@ class API:
        return CreateInstanceResponse(
            message="Command received.",
            command_id=command.command_id,
-            model_meta=command.model_meta,
        )

    async def create_instance(
        self, payload: CreateInstanceParams
    ) -> CreateInstanceResponse:
-        instance = payload.instance
-        model_meta = await resolve_model_meta(instance.shard_assignments.model_id)
-        required_memory = model_meta.storage_size
-        available_memory = self._calculate_total_available_memory()
-
-        if required_memory > available_memory:
-            raise HTTPException(
-                status_code=400,
-                detail=f"Insufficient memory to create instance. Required: {required_memory.in_gb:.1f}GB, Available: {available_memory.in_gb:.1f}GB",
-            )
-
-        command = CreateInstance(
-            instance=instance,
-        )
+        command = CreateInstance(instance=payload.instance)
        await self._send(command)

        return CreateInstanceResponse(
            message="Command received.",
            command_id=command.command_id,
-            model_meta=model_meta,
        )

    async def get_placement(
@@ -377,52 +352,32 @@ class API:
            instance_id=instance_id,
        )

-    async def _process_gpt_oss(self, token_chunks: Receiver[TokenChunk]):
-        stream = StreamableParser(encoding, role=Role.ASSISTANT)
-        thinking = False
-
-        async for chunk in token_chunks:
-            stream.process(chunk.token_id)
-
-            delta = stream.last_content_delta
-            ch = stream.current_channel
-
-            if ch == "analysis" and not thinking:
-                thinking = True
-                yield chunk.model_copy(update={"text": "<think>"})
-
-            if ch != "analysis" and thinking:
-                thinking = False
-                yield chunk.model_copy(update={"text": "</think>"})
-
-            if delta:
-                yield chunk.model_copy(update={"text": delta})
-
-            if chunk.finish_reason is not None:
-                if thinking:
-                    yield chunk.model_copy(update={"text": "</think>"})
-                yield chunk
-                break
-
-    async def _chat_chunk_stream(
-        self, command_id: CommandId, parse_gpt_oss: bool
-    ) -> AsyncGenerator[TokenChunk, None]:
-        """Yield `TokenChunk`s for a given command until completion."""
+    async def _generate_chat_stream(
+        self, command_id: CommandId
+    ) -> AsyncGenerator[str, None]:
+        """Generate chat completion stream as JSON strings."""

        try:
            self._chat_completion_queues[command_id], recv = channel[TokenChunk]()

+            is_thinking = False
            with recv as token_chunks:
-                if parse_gpt_oss:
-                    async for chunk in self._process_gpt_oss(token_chunks):
-                        yield chunk
-                        if chunk.finish_reason is not None:
-                            break
-                else:
-                    async for chunk in token_chunks:
-                        yield chunk
-                        if chunk.finish_reason is not None:
-                            break
+                async for chunk in token_chunks:
+                    if HIDE_THINKING:
+                        if chunk.text == "<think>":
+                            is_thinking = True
+                        if chunk.text == "</think>":
+                            is_thinking = False
+                    chunk_response: ChatCompletionResponse = chunk_to_response(
+                        chunk, command_id
+                    )
+                    if not (is_thinking and HIDE_THINKING):
+                        logger.debug(f"chunk_response: {chunk_response}")
+                        yield f"data: {chunk_response.model_dump_json()}\n\n"
+
+                    if chunk.finish_reason is not None:
+                        yield "data: [DONE]\n\n"
+                        break

        except anyio.get_cancelled_exc_class():
            # TODO: TaskCancelled
@@ -437,59 +392,6 @@ class API:
            await self._send(command)
            del self._chat_completion_queues[command_id]

-    async def _generate_chat_stream(
-        self, command_id: CommandId, parse_gpt_oss: bool
-    ) -> AsyncGenerator[str, None]:
-        """Generate chat completion stream as JSON strings."""
-
-        async for chunk in self._chat_chunk_stream(command_id, parse_gpt_oss):
-            chunk_response: ChatCompletionResponse = chunk_to_response(
-                chunk, command_id
-            )
-            logger.debug(f"chunk_response: {chunk_response}")
-
-            yield f"data: {chunk_response.model_dump_json()}\n\n"
-
-            if chunk.finish_reason is not None:
-                yield "data: [DONE]\n\n"
-
-    async def _collect_chat_completion(
-        self, command_id: CommandId, parse_gpt_oss: bool
-    ) -> ChatCompletionResponse:
-        """Collect all token chunks for a chat completion and return a single response."""
-
-        text_parts: list[str] = []
-        model: str | None = None
-        finish_reason: FinishReason | None = None
-
-        async for chunk in self._chat_chunk_stream(command_id, parse_gpt_oss):
-            if model is None:
-                model = chunk.model
-
-            text_parts.append(chunk.text)
-
-            if chunk.finish_reason is not None:
-                finish_reason = chunk.finish_reason
-
-        combined_text = "".join(text_parts)
-        assert model is not None
-
-        return ChatCompletionResponse(
-            id=command_id,
-            created=int(time.time()),
-            model=model,
-            choices=[
-                ChatCompletionChoice(
-                    index=0,
-                    message=ChatCompletionMessage(
-                        role="assistant",
-                        content=combined_text,
-                    ),
-                    finish_reason=finish_reason,
-                )
-            ],
-        )
-
    async def _trigger_notify_user_to_download_model(self, model_id: str) -> None:
        logger.warning(
            "TODO: we should send a notification to the user to download the model"
@@ -497,12 +399,10 @@ class API:

    async def chat_completions(
        self, payload: ChatCompletionTaskParams
-    ) -> ChatCompletionResponse | StreamingResponse:
-        """Handle chat completions, supporting both streaming and non-streaming responses."""
+    ) -> StreamingResponse:
+        """Handle chat completions, always returning a server-sent-event stream."""
        model_meta = await resolve_model_meta(payload.model)
        payload.model = model_meta.model_id
-        parse_gpt_oss = "gpt-oss" in model_meta.model_id.lower()
-        logger.info(f"{parse_gpt_oss=}")

        if not any(
            instance.shard_assignments.model_id == payload.model
@@ -517,13 +417,10 @@ class API:
            request_params=payload,
        )
        await self._send(command)
-        if payload.stream:
-            return StreamingResponse(
-                self._generate_chat_stream(command.command_id, parse_gpt_oss),
-                media_type="text/event-stream",
-            )
-
-        return await self._collect_chat_completion(command.command_id, parse_gpt_oss)
+        return StreamingResponse(
+            self._generate_chat_stream(command.command_id),
+            media_type="text/event-stream",
+        )

    def _calculate_total_available_memory(self) -> Memory:
        """Calculate total available memory across all nodes in bytes."""
@@ -545,8 +442,6 @@ class API:
                    name=card.name,
                    description=card.description,
                    tags=card.tags,
-                    storage_size_megabytes=int(card.metadata.storage_size.in_mb),
-                    supports_tensor=card.metadata.supports_tensor,
                )
                for card in MODEL_CARDS.values()
            ]
@@ -563,7 +458,7 @@ class API:
        async with create_task_group() as tg:
            self._tg = tg
            logger.info("Starting API")
-            tg.start_soon(self._apply_state)
+            tg.start_soon(self._applystate)
            tg.start_soon(self._pause_on_new_election)
            print_startup_banner(self.port)
            await serve(
@@ -575,7 +470,7 @@ class API:
        self.command_sender.close()
        self.global_event_receiver.close()

-    async def _apply_state(self):
+    async def _applystate(self):
        with self.global_event_receiver as events:
            async for f_event in events:
                if f_event.origin != self.session_id.master_node_id:
--- a/src/exo/master/placement.py
+++ b/src/exo/master/placement.py
@@ -21,7 +21,6 @@ from exo.shared.types.commands import (
 )
 from exo.shared.types.events import Event, InstanceCreated, InstanceDeleted
 from exo.shared.types.memory import Memory
-from exo.shared.types.models import ModelId
 from exo.shared.types.topology import NodeInfo
 from exo.shared.types.worker.instances import (
    Instance,
@@ -30,7 +29,6 @@ from exo.shared.types.worker.instances import (
    MlxJacclInstance,
    MlxRingInstance,
 )
-from exo.shared.types.worker.shards import Sharding


 def random_ephemeral_port() -> int:
@@ -67,28 +65,6 @@ def place_instance(
    if not cycles_with_sufficient_memory:
        raise ValueError("No cycles found with sufficient memory")

-    if command.sharding == Sharding.Tensor:
-        if not command.model_meta.supports_tensor:
-            raise ValueError(
-                f"Requested Tensor sharding but this model does not support tensor parallelism: {command.model_meta.model_id}"
-            )
-        # TODO: the condition here for tensor parallel is not correct, but it works good enough for now.
-        cycles_with_sufficient_memory = [
-            cycle
-            for cycle in cycles_with_sufficient_memory
-            if command.model_meta.hidden_size % len(cycle) == 0
-        ]
-        if not cycles_with_sufficient_memory:
-            raise ValueError(
-                f"No tensor sharding found for model with hidden_size {command.model_meta.hidden_size} candidate cycles"
-            )
-    if command.sharding == Sharding.Pipeline and command.model_meta.model_id == ModelId(
-        "mlx-community/DeepSeek-V3.1-8bit"
-    ):
-        raise ValueError(
-            "Pipeline parallelism is not supported for DeepSeek V3.1 (8-bit)"
-        )
-
    smallest_cycles = get_smallest_cycles(cycles_with_sufficient_memory)

    smallest_tb_cycles = [
--- a/src/exo/master/placement_utils.py
+++ b/src/exo/master/placement_utils.py
@@ -217,9 +217,7 @@ def get_mlx_ibv_devices_matrix(
            # Find the IP J uses to talk to I
            for connection_ip, _ in _find_connection_ip(node_j, node_i, cycle_digraph):
                # This is a local IP on I, which is attached to an interface: find that interface
-                if interface_name := _find_rdma_interface_name_for_ip(
-                    connection_ip, node_i
-                ):
+                if interface_name := _find_interface_name_for_ip(connection_ip, node_i):
                    matrix[i][j] = interface_name
                    logger.info(
                        f"Interface name for {connection_ip} on {node_i.node_id}: {interface_name}"
@@ -250,7 +248,7 @@ def _find_connection_ip(
            yield connection.send_back_multiaddr.ip_address, connection.is_thunderbolt()


-def _find_rdma_interface_name_for_ip(
+def _find_interface_name_for_ip(
    ip_address: str,
    node_info: NodeInfo,
 ) -> str | None:
@@ -271,7 +269,7 @@ def _find_rdma_interface_name_for_ip(
    return None


-def _find_interface_name_for_ip(
+def _find_general_interface_name_for_ip(
    ip_address: str,
    node_info: NodeInfo,
 ) -> str | None:
@@ -289,7 +287,6 @@ def _find_interface_name_for_ip(
 def _find_ip_prioritised(
    node: NodeInfo, other_node: NodeInfo, cycle_digraph: Topology
 ) -> str | None:
-    # TODO: Actually prioritize in the correct Ethernet > Wifi > Non-TB > TB order.
    """Find an IP address between nodes with prioritization.

    Priority order:
@@ -299,21 +296,35 @@ def _find_ip_prioritised(
    4. Any other IP address
    """
    ips = list(_find_connection_ip(node, other_node, cycle_digraph))
-    # We expect a unique iface -> ip mapping
-    iface_map = {_find_interface_name_for_ip(ip, other_node): ip for ip, _ in ips}
+    interface_names = [
+        _find_general_interface_name_for_ip(ip, other_node) for ip, _ in ips
+    ]

-    en0_ip = iface_map.get("en0")
+    en0_ip = next(
+        (
+            ip
+            for (ip, _), interface_name in zip(ips, interface_names)
+            if interface_name == "en0"
+        ),
+        None,
+    )
    if en0_ip:
        return en0_ip

-    en1_ip = iface_map.get("en1")
+    en1_ip = next(
+        (
+            ip
+            for (ip, _), interface_name in zip(ips, interface_names)
+            if interface_name == "en1"
+        ),
+        None,
+    )
    if en1_ip:
        return en1_ip

    non_thunderbolt_ip = next(
        (ip for (ip, is_thunderbolt) in ips if not is_thunderbolt), None
    )
-
    if non_thunderbolt_ip:
        return non_thunderbolt_ip

@@ -339,12 +350,16 @@ def get_mlx_ring_hosts_by_node(
    if world_size == 0:
        return {}

+    logger.info(f"[RING3DBG] get_mlx_ring_hosts_by_node: world_size={world_size}, ephemeral_port={ephemeral_port}")
+    logger.info(f"[RING3DBG] cycle node_ids: {[n.node_id for n in selected_cycle]}")
+
    hosts_by_node: dict[NodeId, list[Host]] = {}

    for rank, node in enumerate(selected_cycle):
        node_id = node.node_id
        left_rank = (rank - 1) % world_size
        right_rank = (rank + 1) % world_size
+        logger.info(f"[RING3DBG] rank={rank} node_id={node_id} left_rank={left_rank} right_rank={right_rank}")

        hosts_for_node: list[Host] = []

@@ -359,6 +374,7 @@ def get_mlx_ring_hosts_by_node(
                continue

            connection_ip = _find_ip_prioritised(node, other_node, cycle_digraph)
+            logger.info(f"[RING3DBG] rank={rank} idx={idx} connection_ip={connection_ip}")
            if connection_ip is None:
                logger.warning(
                    f"Failed to find prioritised connection IP between {node_id} and {other_node.node_id}"
@@ -369,6 +385,7 @@ def get_mlx_ring_hosts_by_node(

            hosts_for_node.append(Host(ip=connection_ip, port=ephemeral_port))

+        logger.info(f"[RING3DBG] rank={rank} final hosts_for_node={hosts_for_node}")
        hosts_by_node[node_id] = hosts_for_node

    return hosts_by_node
@@ -385,14 +402,13 @@ def get_mlx_jaccl_coordinators(
    address in format "X.X.X.X:PORT" per node.
    """
    rank_0_node = selected_cycle[0]
-    logger.debug(f"Selecting coordinator from rank 0 node: {rank_0_node.node_id}")
+    logger.info(f"Selecting coordinator from rank 0 node: {rank_0_node.node_id}")

    def get_ip_for_node(n: NodeInfo) -> str:
        if n.node_id == rank_0_node.node_id:
            return "0.0.0.0"

-        ip = _find_ip_prioritised(n, rank_0_node, cycle_digraph)
-        if ip:
+        for ip, _ in _find_connection_ip(n, rank_0_node, cycle_digraph):
            return ip

        logger.warning(
--- a/src/exo/master/tests/test_master.py
+++ b/src/exo/master/tests/test_master.py
@@ -123,8 +123,6 @@ async def test_master():
                            pretty_name="Llama 3.2 1B",
                            n_layers=16,
                            storage_size=Memory.from_bytes(678948),
-                            hidden_size=7168,
-                            supports_tensor=True,
                        ),
                        sharding=Sharding.Pipeline,
                        instance_meta=InstanceMeta.MlxRing,
@@ -181,8 +179,6 @@ async def test_master():
                        pretty_name="Llama 3.2 1B",
                        n_layers=16,
                        storage_size=Memory.from_bytes(678948),
-                        hidden_size=7168,
-                        supports_tensor=True,
                    ),
                    device_rank=0,
                    world_size=1,
--- a/src/exo/master/tests/test_placement.py
+++ b/src/exo/master/tests/test_placement.py
@@ -50,8 +50,6 @@ def model_meta() -> ModelMetadata:
        storage_size=Memory.from_kb(1000),
        pretty_name="Test Model",
        n_layers=10,
-        hidden_size=30,
-        supports_tensor=True,
    )


@@ -142,8 +140,6 @@ def test_get_instance_placements_one_node_exact_fit(
            storage_size=Memory.from_kb(1000),
            pretty_name="Test Model",
            n_layers=10,
-            hidden_size=1000,
-            supports_tensor=True,
        ),
    )
    placements = place_instance(cic, topology, {})
@@ -169,8 +165,6 @@ def test_get_instance_placements_one_node_fits_with_extra_memory(
            storage_size=Memory.from_kb(1000),
            pretty_name="Test Model",
            n_layers=10,
-            hidden_size=1000,
-            supports_tensor=True,
        ),
    )
    placements = place_instance(cic, topology, {})
@@ -196,8 +190,6 @@ def test_get_instance_placements_one_node_not_fit(
            storage_size=Memory.from_kb(1001),
            pretty_name="Test Model",
            n_layers=10,
-            hidden_size=1000,
-            supports_tensor=True,
        ),
    )

--- a/src/exo/master/tests/test_placement_utils.py
+++ b/src/exo/master/tests/test_placement_utils.py
@@ -198,8 +198,6 @@ def test_get_shard_assignments(
        pretty_name="Test Model",
        n_layers=total_layers,
        storage_size=Memory.from_kb(1000),
-        hidden_size=1000,
-        supports_tensor=True,
    )
    cycles = topology.get_cycles()
    selected_cycle = cycles[0]
--- a/src/exo/shared/models/model_cards.py
+++ b/src/exo/shared/models/model_cards.py
@@ -51,8 +51,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="DeepSeek V3.1 (4-bit)",
            storage_size=Memory.from_gb(378),
            n_layers=61,
-            hidden_size=7168,
-            supports_tensor=True,
        ),
    ),
    "deepseek-v3.1-8bit": ModelCard(
@@ -66,8 +64,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="DeepSeek V3.1 (8-bit)",
            storage_size=Memory.from_gb(713),
            n_layers=61,
-            hidden_size=7168,
-            supports_tensor=True,
        ),
    ),
    # "deepseek-v3.2": ModelCard(
@@ -139,8 +135,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Kimi K2 Instruct (4-bit)",
            storage_size=Memory.from_gb(578),
            n_layers=61,
-            hidden_size=7168,
-            supports_tensor=True,
        ),
    ),
    "kimi-k2-thinking": ModelCard(
@@ -154,8 +148,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Kimi K2 Thinking (4-bit)",
            storage_size=Memory.from_gb(658),
            n_layers=61,
-            hidden_size=7168,
-            supports_tensor=True,
        ),
    ),
    # llama-3.1
@@ -170,38 +162,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.1 8B (4-bit)",
            storage_size=Memory.from_mb(4423),
            n_layers=32,
-            hidden_size=4096,
-            supports_tensor=True,
-        ),
-    ),
-    "llama-3.1-8b-8bit": ModelCard(
-        short_id="llama-3.1-8b-8bit",
-        model_id=ModelId("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit"),
-        name="Llama 3.1 8B (8-bit)",
-        description="""Llama 3.1 is a large language model trained on the Llama 3.1 dataset.""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/Meta-Llama-3.1-8B-Instruct-8bit"),
-            pretty_name="Llama 3.1 8B (8-bit)",
-            storage_size=Memory.from_mb(8540),
-            n_layers=32,
-            hidden_size=4096,
-            supports_tensor=True,
-        ),
-    ),
-    "llama-3.1-8b-bf16": ModelCard(
-        short_id="llama-3.1-8b-bf16",
-        model_id=ModelId("mlx-community/Meta-Llama-3.1-8B-Instruct-bf16"),
-        name="Llama 3.1 8B (BF16)",
-        description="""Llama 3.1 is a large language model trained on the Llama 3.1 dataset.""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/Meta-Llama-3.1-8B-Instruct-bf16"),
-            pretty_name="Llama 3.1 8B (BF16)",
-            storage_size=Memory.from_mb(16100),
-            n_layers=32,
-            hidden_size=4096,
-            supports_tensor=True,
        ),
    ),
    "llama-3.1-70b": ModelCard(
@@ -215,8 +175,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.1 70B (4-bit)",
            storage_size=Memory.from_mb(38769),
            n_layers=80,
-            hidden_size=8192,
-            supports_tensor=True,
        ),
    ),
    # llama-3.2
@@ -231,8 +189,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.2 1B (4-bit)",
            storage_size=Memory.from_mb(696),
            n_layers=16,
-            hidden_size=2048,
-            supports_tensor=True,
        ),
    ),
    "llama-3.2-3b": ModelCard(
@@ -246,8 +202,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.2 3B (4-bit)",
            storage_size=Memory.from_mb(1777),
            n_layers=28,
-            hidden_size=3072,
-            supports_tensor=True,
        ),
    ),
    "llama-3.2-3b-8bit": ModelCard(
@@ -261,8 +215,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.2 3B (8-bit)",
            storage_size=Memory.from_mb(3339),
            n_layers=28,
-            hidden_size=3072,
-            supports_tensor=True,
        ),
    ),
    # llama-3.3
@@ -277,8 +229,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.3 70B",
            storage_size=Memory.from_mb(38769),
            n_layers=80,
-            hidden_size=8192,
-            supports_tensor=True,
        ),
    ),
    "llama-3.3-70b-8bit": ModelCard(
@@ -292,8 +242,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.3 70B (8-bit)",
            storage_size=Memory.from_mb(73242),
            n_layers=80,
-            hidden_size=8192,
-            supports_tensor=True,
        ),
    ),
    "llama-3.3-70b-fp16": ModelCard(
@@ -307,8 +255,20 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Llama 3.3 70B (FP16)",
            storage_size=Memory.from_mb(137695),
            n_layers=80,
-            hidden_size=8192,
-            supports_tensor=True,
+        ),
+    ),
+    # phi-3
+    "phi-3-mini": ModelCard(
+        short_id="phi-3-mini",
+        model_id=ModelId("mlx-community/Phi-3-mini-128k-instruct-4bit"),
+        name="Phi 3 Mini 128k (4-bit)",
+        description="""Phi 3 Mini is a large language model trained on the Phi 3 Mini dataset.""",
+        tags=[],
+        metadata=ModelMetadata(
+            model_id=ModelId("mlx-community/Phi-3-mini-128k-instruct-4bit"),
+            pretty_name="Phi 3 Mini 128k (4-bit)",
+            storage_size=Memory.from_mb(2099),
+            n_layers=32,
        ),
    ),
    # qwen3
@@ -323,8 +283,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 0.6B (4-bit)",
            storage_size=Memory.from_mb(327),
            n_layers=28,
-            hidden_size=1024,
-            supports_tensor=False,
        ),
    ),
    "qwen3-0.6b-8bit": ModelCard(
@@ -338,8 +296,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 0.6B (8-bit)",
            storage_size=Memory.from_mb(666),
            n_layers=28,
-            hidden_size=1024,
-            supports_tensor=False,
        ),
    ),
    "qwen3-30b": ModelCard(
@@ -353,8 +309,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 30B A3B (4-bit)",
            storage_size=Memory.from_mb(16797),
            n_layers=48,
-            hidden_size=2048,
-            supports_tensor=True,
        ),
    ),
    "qwen3-30b-8bit": ModelCard(
@@ -368,68 +322,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 30B A3B (8-bit)",
            storage_size=Memory.from_mb(31738),
            n_layers=48,
-            hidden_size=2048,
-            supports_tensor=True,
-        ),
-    ),
-    "qwen3-80b-a3B-4bit": ModelCard(
-        short_id="qwen3-80b-a3B-4bit",
-        model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit"),
-        name="Qwen3 80B A3B (4-bit)",
-        description="""Qwen3 80B""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit"),
-            pretty_name="Qwen3 80B A3B (4-bit)",
-            storage_size=Memory.from_mb(44800),
-            n_layers=48,
-            hidden_size=2048,
-            supports_tensor=True,
-        ),
-    ),
-    "qwen3-80b-a3B-8bit": ModelCard(
-        short_id="qwen3-80b-a3B-8bit",
-        model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit"),
-        name="Qwen3 80B A3B (8-bit)",
-        description="""Qwen3 80B""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Instruct-8bit"),
-            pretty_name="Qwen3 80B A3B (8-bit)",
-            storage_size=Memory.from_mb(84700),
-            n_layers=48,
-            hidden_size=2048,
-            supports_tensor=True,
-        ),
-    ),
-    "qwen3-80b-a3B-thinking-4bit": ModelCard(
-        short_id="qwen3-80b-a3B-thinking-4bit",
-        model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Thinking-4bit"),
-        name="Qwen3 80B A3B Thinking (4-bit)",
-        description="""Qwen3 80B Reasoning model""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Thinking-4bit"),
-            pretty_name="Qwen3 80B A3B (4-bit)",
-            storage_size=Memory.from_mb(84700),
-            n_layers=48,
-            hidden_size=2048,
-            supports_tensor=True,
-        ),
-    ),
-    "qwen3-80b-a3B-thinking-8bit": ModelCard(
-        short_id="qwen3-80b-a3B-thinking-8bit",
-        model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Thinking-8bit"),
-        name="Qwen3 80B A3B Thinking (8-bit)",
-        description="""Qwen3 80B Reasoning model""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/Qwen3-Next-80B-A3B-Thinking-8bit"),
-            pretty_name="Qwen3 80B A3B (8-bit)",
-            storage_size=Memory.from_mb(84700),
-            n_layers=48,
-            hidden_size=2048,
-            supports_tensor=True,
        ),
    ),
    "qwen3-235b-a22b-4bit": ModelCard(
@@ -443,8 +335,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 235B A22B (4-bit)",
            storage_size=Memory.from_gb(132),
            n_layers=94,
-            hidden_size=4096,
-            supports_tensor=True,
        ),
    ),
    "qwen3-235b-a22b-8bit": ModelCard(
@@ -458,8 +348,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 235B A22B (8-bit)",
            storage_size=Memory.from_gb(250),
            n_layers=94,
-            hidden_size=4096,
-            supports_tensor=True,
        ),
    ),
    "qwen3-coder-480b-a35b-4bit": ModelCard(
@@ -473,8 +361,6 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 Coder 480B A35B (4-bit)",
            storage_size=Memory.from_gb(270),
            n_layers=62,
-            hidden_size=6144,
-            supports_tensor=True,
        ),
    ),
    "qwen3-coder-480b-a35b-8bit": ModelCard(
@@ -488,84 +374,77 @@ MODEL_CARDS: dict[str, ModelCard] = {
            pretty_name="Qwen3 Coder 480B A35B (8-bit)",
            storage_size=Memory.from_gb(540),
            n_layers=62,
-            hidden_size=6144,
-            supports_tensor=True,
        ),
    ),
-    # gpt-oss
-    "gpt-oss-120b-MXFP4-Q8": ModelCard(
-        short_id="gpt-oss-120b-MXFP4-Q8",
-        model_id=ModelId("mlx-community/gpt-oss-120b-MXFP4-Q8"),
-        name="GPT-OSS 120B (MXFP4-Q8, MLX)",
-        description="""OpenAI's GPT-OSS 120B is a 117B-parameter Mixture-of-Experts model designed for high-reasoning and general-purpose use; this variant is a 4-bit MLX conversion for Apple Silicon.""",
+    # granite
+    "granite-3.3-2b": ModelCard(
+        short_id="granite-3.3-2b",
+        model_id=ModelId("mlx-community/granite-3.3-2b-instruct-fp16"),
+        name="Granite 3.3 2B (FP16)",
+        description="""Granite-3.3-2B-Instruct is a 2-billion parameter 128K context length language model fine-tuned for improved reasoning and instruction-following capabilities.""",
        tags=[],
        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/gpt-oss-120b-MXFP4-Q8"),
-            pretty_name="GPT-OSS 120B (MXFP4-Q8, MLX)",
-            storage_size=Memory.from_kb(68_996_301),
-            n_layers=36,
-            hidden_size=2880,
-            supports_tensor=True,
+            model_id=ModelId("mlx-community/granite-3.3-2b-instruct-fp16"),
+            pretty_name="Granite 3.3 2B (FP16)",
+            storage_size=Memory.from_mb(4951),
+            n_layers=40,
        ),
    ),
-    "gpt-oss-20b-4bit": ModelCard(
-        short_id="gpt-oss-20b-4bit",
-        model_id=ModelId("mlx-community/gpt-oss-20b-MXFP4-Q4"),
-        name="GPT-OSS 20B (MXFP4-Q4, MLX)",
-        description="""OpenAI's GPT-OSS 20B is a medium-sized MoE model for lower-latency and local or specialized use cases; this MLX variant uses MXFP4 4-bit quantization.""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/gpt-oss-20b-MXFP4-Q4"),
-            pretty_name="GPT-OSS 20B (MXFP4-Q4, MLX)",
-            storage_size=Memory.from_kb(11_744_051),
-            n_layers=24,
-            hidden_size=2880,
-            supports_tensor=True,
-        ),
-    ),
-    # Needs to be quantized g32 or g16.
-    "glm-4.5-air-8bit": ModelCard(
-        short_id="glm-4.5-air-8bit",
-        model_id=ModelId("mlx-community/GLM-4.5-Air-8bit"),
-        name="GLM 4.5 Air 8bit",
-        description="""GLM 4.5 Air 8bit""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/GLM-4.5-Air-8bit"),
-            pretty_name="GLM 4.5 Air 8bit",
-            storage_size=Memory.from_gb(114),
-            n_layers=46,
-            hidden_size=4096,
-            supports_tensor=False,
-        ),
-    ),
-    "glm-4.5-air-bf16": ModelCard(
-        short_id="glm-4.5-air-bf16",
-        model_id=ModelId("mlx-community/GLM-4.5-Air-bf16"),
-        name="GLM 4.5 Air bf16",
-        description="""GLM 4.5 Air bf16""",
-        tags=[],
-        metadata=ModelMetadata(
-            model_id=ModelId("mlx-community/GLM-4.5-Air-bf16"),
-            pretty_name="GLM 4.5 Air bf16",
-            storage_size=Memory.from_gb(214),
-            n_layers=46,
-            hidden_size=4096,
-            supports_tensor=True,
-        ),
-    ),
-    # "devstral-2-123b-instruct-2512-8bit": ModelCard(
-    #     short_id="devstral-2-123b-instruct-2512-8bit",
-    #     model_id=ModelId("mlx-community/Devstral-2-123B-Instruct-2512-8bit"),
-    #     name="Devstral 2 123B Instruct 2512 (8-bit, MLX)",
-    #     description="""Mistral AI's Devstral 2 123B Instruct (2512) is an agentic coding model.""",
+    # "granite-3.3-8b": ModelCard(
+    #     short_id="granite-3.3-8b",
+    #     model_id=ModelId("mlx-community/granite-3.3-8b-instruct-fp16"),
+    #     name="Granite 3.3 8B",
+    #     description="""Granite-3.3-8B-Instruct is an 8-billion parameter 128K context length language model fine-tuned for improved reasoning and instruction-following capabilities.""",
    #     tags=[],
    #     metadata=ModelMetadata(
-    #         model_id=ModelId("mlx-community/Devstral-2-123B-Instruct-2512-8bit"),
-    #         pretty_name="Devstral 2 123B Instruct 2512 (8-bit, MLX)",
-    #         storage_size=Memory.from_kb(133_000_000),
-    #         n_layers=88,
-    #         hidden_size=12288,
+    #         model_id=ModelId("mlx-community/granite-3.3-8b-instruct-fp16"),
+    #         pretty_name="Granite 3.3 8B",
+    #         storage_size=Memory.from_kb(15958720),
+    #         n_layers=40,
+    #     ),
+    # ),
+    # smol-lm
+    # "smol-lm-135m": ModelCard(
+    #     short_id="smol-lm-135m",
+    #     model_id="mlx-community/SmolLM-135M-4bit",
+    #     name="Smol LM 135M",
+    #     description="""SmolLM is a series of state-of-the-art small language models available in three sizes: 135M, 360M, and 1.7B parameters.""",
+    #     tags=[],
+    #     metadata=ModelMetadata(
+    #         model_id=ModelId("mlx-community/SmolLM-135M-4bit"),
+    #         pretty_name="Smol LM 135M",
+    #         storage_size=Memory.from_kb(73940),
+    #         n_layers=30,
+    #     ),
+    # ),
+    # gpt-oss
+    # "gpt-oss-120b-MXFP4-Q8": ModelCard(
+    #     short_id="gpt-oss-120b-MXFP4-Q8",
+    #     model_id=ModelId("mlx-community/gpt-oss-120b-MXFP4-Q8"),
+    #     name="GPT-OSS 120B (MXFP4-Q8, MLX)",
+    #     description="""OpenAI's GPT-OSS 120B is a 117B-parameter Mixture-of-Experts model designed for high-reasoning and general-purpose use; this variant is a 4-bit MLX conversion for Apple Silicon.""",
+    #     tags=[],
+    #     metadata=ModelMetadata(
+    #         model_id=ModelId("mlx-community/gpt-oss-120b-MXFP4-Q8"),
+    #         pretty_name="GPT-OSS 120B (MXFP4-Q8, MLX)",
+    #         storage_size=Memory.from_kb(68_996_301),
+    #         n_layers=36,
+    #         hidden_size=2880,
+    #         supports_tensor=True,
+    #     ),
+    # ),
+    # "gpt-oss-20b-4bit": ModelCard(
+    #     short_id="gpt-oss-20b-4bit",
+    #     model_id=ModelId("mlx-community/gpt-oss-20b-MXFP4-Q4"),
+    #     name="GPT-OSS 20B (MXFP4-Q4, MLX)",
+    #     description="""OpenAI's GPT-OSS 20B is a medium-sized MoE model for lower-latency and local or specialized use cases; this MLX variant uses MXFP4 4-bit quantization.""",
+    #     tags=[],
+    #     metadata=ModelMetadata(
+    #         model_id=ModelId("mlx-community/gpt-oss-20b-MXFP4-Q4"),
+    #         pretty_name="GPT-OSS 20B (MXFP4-Q4, MLX)",
+    #         storage_size=Memory.from_kb(11_744_051),
+    #         n_layers=24,
+    #         hidden_size=2880,
    #         supports_tensor=True,
    #     ),
    # ),
--- a/src/exo/shared/models/model_meta.py
+++ b/src/exo/shared/models/model_meta.py
@@ -6,7 +6,6 @@ from huggingface_hub import model_info
 from loguru import logger
 from pydantic import BaseModel, Field

-from exo.shared.models.model_cards import MODEL_CARDS
 from exo.shared.types.memory import Memory
 from exo.shared.types.models import ModelId, ModelMetadata
 from exo.worker.download.download_utils import (
@@ -26,7 +25,6 @@ class ConfigData(BaseModel):
    n_layers: Annotated[int, Field(ge=0)] | None = None  # Sometimes used
    num_decoder_layers: Annotated[int, Field(ge=0)] | None = None  # Transformer models
    decoder_layers: Annotated[int, Field(ge=0)] | None = None  # Some architectures
-    hidden_size: Annotated[int, Field(ge=0)] | None = None

    @property
    def layer_count(self) -> int:
@@ -108,19 +106,10 @@ async def _get_model_meta(model_id: str) -> ModelMetadata:
    config_data = await get_config_data(model_id)
    num_layers = config_data.layer_count
    mem_size_bytes = await get_safetensors_size(model_id)
-    model_card = next(
-        (card for card in MODEL_CARDS.values() if card.model_id == ModelId(model_id)),
-        None,
-    )

    return ModelMetadata(
        model_id=ModelId(model_id),
-        pretty_name=model_card.name if model_card is not None else model_id,
+        pretty_name=model_id,
        storage_size=mem_size_bytes,
        n_layers=num_layers,
-        hidden_size=config_data.hidden_size or 0,
-        # TODO: all custom models currently do not support tensor. We could add a dynamic test for this?
-        supports_tensor=model_card.metadata.supports_tensor
-        if model_card is not None
-        else False,
    )
--- a/src/exo/shared/tests/conftest.py
+++ b/src/exo/shared/tests/conftest.py
@@ -36,8 +36,6 @@ def get_pipeline_shard_metadata(
            pretty_name=str(model_id),
            storage_size=Memory.from_mb(100000),
            n_layers=32,
-            hidden_size=1000,
-            supports_tensor=True,
        ),
        device_rank=device_rank,
        world_size=world_size,
--- a/src/exo/shared/tests/test_apply/test_apply_node_download.py
+++ b/src/exo/shared/tests/test_apply/test_apply_node_download.py
@@ -19,7 +19,7 @@ def test_apply_node_download_progress():
        NodeDownloadProgress(download_progress=event), state
    )

-    assert new_state.downloads == {NodeId("node-1"): [event]}
+    assert new_state == State(downloads={NodeId("node-1"): [event]})


 def test_apply_two_node_download_progress():
@@ -42,4 +42,4 @@ def test_apply_two_node_download_progress():
    # TODO: This test is failing. We should support the following:
    # 1. Downloading multiple models concurrently on the same node (one per runner is fine).
    # 2. Downloading a model, it completes, then downloading a different model on the same node.
-    assert new_state.downloads == {NodeId("node-1"): [event1, event2]}
+    assert new_state == State(downloads={NodeId("node-1"): [event1, event2]})
--- a/src/exo/shared/types/api.py
+++ b/src/exo/shared/types/api.py
@@ -5,7 +5,7 @@ from pydantic import BaseModel, Field, field_validator
 from pydantic_core import PydanticUseDefault

 from exo.shared.types.common import CommandId
-from exo.shared.types.models import ModelId, ModelMetadata
+from exo.shared.types.models import ModelId
 from exo.shared.types.worker.instances import Instance, InstanceId, InstanceMeta
 from exo.shared.types.worker.shards import Sharding

@@ -174,7 +174,6 @@ class DeleteInstanceTaskParams(BaseModel):
 class CreateInstanceResponse(BaseModel):
    message: str
    command_id: CommandId
-    model_meta: ModelMetadata


 class DeleteInstanceResponse(BaseModel):
--- a/src/exo/shared/types/models.py
+++ b/src/exo/shared/types/models.py
@@ -14,5 +14,3 @@ class ModelMetadata(CamelCaseModel):
    pretty_name: str
    storage_size: Memory
    n_layers: PositiveInt
-    hidden_size: PositiveInt
-    supports_tensor: bool
--- a/src/exo/shared/types/tasks.py
+++ b/src/exo/shared/types/tasks.py
@@ -40,10 +40,6 @@ class LoadModel(BaseTask):  # emitted by Worker
    pass


-class ConnectToGroup(BaseTask):  # emitted by Worker
-    pass
-
-
 class StartWarmup(BaseTask):  # emitted by Worker
    pass

@@ -61,11 +57,5 @@ class Shutdown(BaseTask):  # emitted by Worker


 Task = (
-    CreateRunner
-    | DownloadModel
-    | ConnectToGroup
-    | LoadModel
-    | StartWarmup
-    | ChatCompletion
-    | Shutdown
+    CreateRunner | DownloadModel | LoadModel | StartWarmup | ChatCompletion | Shutdown
 )
--- a/src/exo/shared/types/worker/runners.py
+++ b/src/exo/shared/types/worker/runners.py
@@ -21,15 +21,7 @@ class BaseRunnerStatus(TaggedModel):
        return isinstance(self, RunnerRunning)


-class RunnerIdle(BaseRunnerStatus):
-    pass
-
-
-class RunnerConnecting(BaseRunnerStatus):
-    pass
-
-
-class RunnerConnected(BaseRunnerStatus):
+class RunnerWaitingForModel(BaseRunnerStatus):
    pass


@@ -62,9 +54,7 @@ class RunnerFailed(BaseRunnerStatus):


 RunnerStatus = (
-    RunnerIdle
-    | RunnerConnecting
-    | RunnerConnected
+    RunnerWaitingForModel
    | RunnerLoading
    | RunnerLoaded
    | RunnerWarmingUp
--- a/src/exo/worker/download/download_utils.py
+++ b/src/exo/worker/download/download_utils.py
@@ -450,11 +450,6 @@ async def get_weight_map(repo_id: str, revision: str = "main") -> dict[str, str]


 async def resolve_allow_patterns(shard: ShardMetadata) -> list[str]:
-    # TODO: 'Smart' downloads are disabled because:
-    #  (i) We don't handle all kinds of files;
-    # (ii) We don't have sticky sessions.
-    # (iii) Tensor parallel requires all files.
-    return ["*"]
    try:
        weight_map = await get_weight_map(str(shard.model_meta.model_id))
        return get_allow_patterns(weight_map, shard)
--- a/src/exo/worker/download/huggingface_utils.py
+++ b/src/exo/worker/download/huggingface_utils.py
@@ -95,15 +95,7 @@ def extract_layer_num(tensor_name: str) -> int | None:

 def get_allow_patterns(weight_map: dict[str, str], shard: ShardMetadata) -> list[str]:
    default_patterns = set(
-        [
-            "*.json",
-            "*.py",
-            "tokenizer.model",
-            "tiktoken.model",
-            "*.tiktoken",
-            "*.txt",
-            "*.jinja",
-        ]
+        ["*.json", "*.py", "tokenizer.model", "*.tiktoken", "*.txt", "*.jinja"]
    )
    shard_specific_patterns: set[str] = set()
    if weight_map:
--- a/src/exo/worker/download/shard_downloader.py
+++ b/src/exo/worker/download/shard_downloader.py
@@ -1,5 +1,4 @@
 from abc import ABC, abstractmethod
-from copy import copy
 from datetime import timedelta
 from pathlib import Path
 from typing import AsyncIterator, Callable
@@ -13,7 +12,7 @@ from exo.shared.types.worker.shards import (
 from exo.worker.download.download_utils import RepoDownloadProgress


-# TODO: the PipelineShardMetadata getting reinstantiated is a bit messy. Should this be a classmethod?
+# TODO: the PipelineShardMetadata getting reinstantiated is a bit messy. Should this be a classmethod?
 class ShardDownloader(ABC):
    @abstractmethod
    async def ensure_shard(
@@ -44,7 +43,34 @@ class ShardDownloader(ABC):
        Yields:
            tuple[Path, RepoDownloadProgress]: The path and progress of a shard download.
        """
-        yield (Path("/tmp/noop_shard"), NOOP_DOWNLOAD_PROGRESS)
+        yield (
+            Path("/tmp/noop_shard"),
+            RepoDownloadProgress(
+                repo_id="noop",
+                repo_revision="noop",
+                shard=PipelineShardMetadata(
+                    model_meta=ModelMetadata(
+                        model_id=ModelId("noop"),
+                        pretty_name="noope",
+                        storage_size=Memory.from_bytes(0),
+                        n_layers=1,
+                    ),
+                    device_rank=0,
+                    world_size=1,
+                    start_layer=0,
+                    end_layer=1,
+                    n_layers=1,
+                ),
+                completed_files=0,
+                total_files=0,
+                downloaded_bytes=Memory.from_bytes(0),
+                downloaded_bytes_this_session=Memory.from_bytes(0),
+                total_bytes=Memory.from_bytes(0),
+                overall_speed=0,
+                overall_eta=timedelta(seconds=0),
+                status="complete",
+            ),
+        )

    @abstractmethod
    async def get_shard_download_status_for_shard(
@@ -68,41 +94,46 @@ class NoopShardDownloader(ShardDownloader):
    ) -> AsyncIterator[tuple[Path, RepoDownloadProgress]]:
        yield (
            Path("/tmp/noop_shard"),
-            NOOP_DOWNLOAD_PROGRESS,
+            RepoDownloadProgress(
+                repo_id="noop",
+                repo_revision="noop",
+                shard=PipelineShardMetadata(
+                    model_meta=ModelMetadata(
+                        model_id=ModelId("noop"),
+                        pretty_name="noope",
+                        storage_size=Memory.from_bytes(0),
+                        n_layers=1,
+                    ),
+                    device_rank=0,
+                    world_size=1,
+                    start_layer=0,
+                    end_layer=1,
+                    n_layers=1,
+                ),
+                completed_files=0,
+                total_files=0,
+                downloaded_bytes=Memory.from_bytes(0),
+                downloaded_bytes_this_session=Memory.from_bytes(0),
+                total_bytes=Memory.from_bytes(0),
+                overall_speed=0,
+                overall_eta=timedelta(seconds=0),
+                status="complete",
+            ),
        )

    async def get_shard_download_status_for_shard(
        self, shard: ShardMetadata
    ) -> RepoDownloadProgress:
-        dp = copy(NOOP_DOWNLOAD_PROGRESS)
-        dp.shard = shard
-        return dp
-
-
-NOOP_DOWNLOAD_PROGRESS = RepoDownloadProgress(
-    repo_id="noop",
-    repo_revision="noop",
-    shard=PipelineShardMetadata(
-        model_meta=ModelMetadata(
-            model_id=ModelId("noop"),
-            pretty_name="noope",
-            storage_size=Memory.from_bytes(0),
-            n_layers=1,
-            hidden_size=1,
-            supports_tensor=False,
-        ),
-        device_rank=0,
-        world_size=1,
-        start_layer=0,
-        end_layer=1,
-        n_layers=1,
-    ),
-    completed_files=0,
-    total_files=0,
-    downloaded_bytes=Memory.from_bytes(0),
-    downloaded_bytes_this_session=Memory.from_bytes(0),
-    total_bytes=Memory.from_bytes(0),
-    overall_speed=0,
-    overall_eta=timedelta(seconds=0),
-    status="complete",
-)
+        return RepoDownloadProgress(
+            repo_id="noop",
+            repo_revision="noop",
+            shard=shard,
+            completed_files=0,
+            total_files=0,
+            downloaded_bytes=Memory.from_bytes(0),
+            downloaded_bytes_this_session=Memory.from_bytes(0),
+            total_bytes=Memory.from_bytes(0),
+            overall_speed=0,
+            overall_eta=timedelta(seconds=0),
+            status="complete",
+        )
--- a/src/exo/worker/engines/mlx/auto_parallel.py
+++ b/src/exo/worker/engines/mlx/auto_parallel.py
@@ -22,6 +22,7 @@ from mlx_lm.models.qwen3_moe import Qwen3MoeSparseMoeBlock
 from exo.shared.types.worker.shards import (
    PipelineShardMetadata,
 )
+from exo.worker.runner.bootstrap import logger


 class _LayerCallable(Protocol):
@@ -170,6 +171,8 @@ def pipeline_auto_parallel(

    start_layer, end_layer = model_shard_meta.start_layer, model_shard_meta.end_layer
    device_rank, world_size = model_shard_meta.device_rank, model_shard_meta.world_size
+    logger.info(f"[RING3DBG] pipeline_auto_parallel: device_rank={device_rank} world_size={world_size}")
+    logger.info(f"[RING3DBG] layers: start={start_layer} end={end_layer} count={len(layers)}")

    layers = layers[start_layer:end_layer]
    layers[0] = PipelineFirstLayer(layers[0], device_rank, group=group)
--- a/src/exo/worker/engines/mlx/constants.py
+++ b/src/exo/worker/engines/mlx/constants.py
@@ -9,7 +9,8 @@ MAX_KV_SIZE: int | None = 3200
 KEEP_KV_SIZE: int | None = 1600
 QUANTIZE_MODEL_MODE: str | None = "affine"
 CACHE_GROUP_SIZE: int = 64
-KV_CACHE_BITS: int | None = None
+KV_CACHE_BITS: int | None = 8
+TEMPERATURE: float = 1.0

 # TODO: We should really make this opt-in, but Kimi requires trust_remote_code=True
 TRUST_REMOTE_CODE: bool = True
--- a/src/exo/worker/engines/mlx/utils_mlx.py
+++ b/src/exo/worker/engines/mlx/utils_mlx.py
@@ -13,6 +13,7 @@ from mlx_lm.tokenizer_utils import TokenizerWrapper
 from exo.worker.engines.mlx.constants import (
    CACHE_GROUP_SIZE,
    KV_CACHE_BITS,
+    TEMPERATURE,
    TRUST_REMOTE_CODE,
 )

@@ -20,8 +21,6 @@ try:
    from mlx_lm.tokenizer_utils import load_tokenizer
 except ImportError:
    from mlx_lm.tokenizer_utils import load as load_tokenizer  # type: ignore
-import contextlib
-
 import mlx.core as mx
 import mlx.nn as nn
 from mlx_lm.utils import load_model
@@ -49,7 +48,6 @@ from exo.worker.engines.mlx.auto_parallel import (
 )
 from exo.worker.runner.bootstrap import logger

-Group = mx.distributed.Group
 # Needed for 8 bit model
 resource.setrlimit(resource.RLIMIT_NOFILE, (2048, 4096))

@@ -69,7 +67,7 @@ def get_weights_size(model_shard_meta: ShardMetadata) -> Memory:
    )


-def mx_barrier(group: Group | None = None):
+def mx_barrier(group: mx.distributed.Group | None = None):
    mx.eval(
        mx.distributed.all_sum(
            mx.array(1.0),
@@ -79,7 +77,7 @@ def mx_barrier(group: Group | None = None):
    )


-def broadcast_from_zero(value: int, group: Group | None = None):
+def broadcast_from_zero(value: int, group: mx.distributed.Group | None = None):
    if group is None:
        return value

@@ -101,97 +99,97 @@ class HostList(RootModel[list[str]]):

 def mlx_distributed_init(
    bound_instance: BoundInstance,
-) -> Group:
+) -> mx.distributed.Group:
    """
-    Initialize MLX distributed.
+    Initialize the MLX distributed backend (runs in thread pool).
+
+    The backend is chosen from the type of bound_instance.instance:
+    - MlxRingInstance: host-based connectivity; this node's host list is written to a file passed via MLX_HOSTFILE
+    - MlxJacclInstance: RDMA connectivity; the device matrix is written to a file passed via MLX_IBV_DEVICES
+    - MLX_JACCL_COORDINATOR: coordinator address for this node's RDMA setup (jaccl backend only)
+    - both backends use strict=True, so an error is raised if the distributed backend is not available
    """
    rank = bound_instance.bound_shard.device_rank
    logger.info(f"Starting initialization for rank {rank}")
+    logger.info(f"[RING3DBG] mlx_distributed_init: bound_node_id={bound_instance.bound_node_id}")
+    logger.info(f"[RING3DBG] device_rank={rank}, world_size={bound_instance.bound_shard.world_size}")

-    coordination_file = None
-    try:
-        # TODO: singleton instances
-        match bound_instance.instance:
-            case MlxRingInstance(hosts_by_node=hosts_by_node, ephemeral_port=_):
-                coordination_file = (
-                    f"./hosts_{bound_instance.instance.instance_id}_{rank}.json"
-                )
-                hosts_for_node = hosts_by_node[bound_instance.bound_node_id]
-                hosts_json = HostList.from_hosts(hosts_for_node).model_dump_json()
+    # TODO: singleton instances
+    match bound_instance.instance:
+        case MlxRingInstance(hosts_by_node=hosts_by_node, ephemeral_port=_):
+            hostfile = f"./hosts_{rank}.json"
+            hosts_for_node = hosts_by_node[bound_instance.bound_node_id]
+            logger.info(f"[RING3DBG] hosts_by_node keys: {list(hosts_by_node.keys())}")
+            logger.info(f"[RING3DBG] hosts_for_node (len={len(hosts_for_node)}): {hosts_for_node}")
+            hosts_json = HostList.from_hosts(hosts_for_node).model_dump_json()

-                with open(coordination_file, "w") as f:
-                    _ = f.write(hosts_json)
+            with open(hostfile, "w") as f:
+                _ = f.write(hosts_json)

-                logger.info(
-                    f"rank {rank} hostfile: {coordination_file} hosts: {hosts_json}"
-                )
+            logger.info(f"rank {rank} hostfile: {hostfile} hosts: {hosts_json}")

-                os.environ["MLX_HOSTFILE"] = coordination_file
-                os.environ["MLX_RANK"] = str(rank)
-                os.environ["MLX_RING_VERBOSE"] = "1"
-                group = mx.distributed.init(backend="ring", strict=True)
+            os.environ["MLX_HOSTFILE"] = hostfile
+            os.environ["MLX_RANK"] = str(rank)
+            os.environ["MLX_RING_VERBOSE"] = "1"
+            group = mx.distributed.init(backend="ring", strict=True)

-            case MlxJacclInstance(
-                ibv_devices=ibv_devices, jaccl_coordinators=jaccl_coordinators
-            ):
-                # Use RDMA connectivity matrix
-                coordination_file = (
-                    f"./hosts_{bound_instance.instance.instance_id}_{rank}.json"
-                )
-                ibv_devices_json = json.dumps(ibv_devices)
+        case MlxJacclInstance(
+            ibv_devices=ibv_devices, jaccl_coordinators=jaccl_coordinators
+        ):
+            # Use RDMA connectivity matrix
+            devices_file = f"./hosts_{rank}.json"
+            ibv_devices_json = json.dumps(ibv_devices)

-                with open(coordination_file, "w") as f:
-                    _ = f.write(ibv_devices_json)
+            with open(devices_file, "w") as f:
+                _ = f.write(ibv_devices_json)

-                jaccl_coordinator = jaccl_coordinators[bound_instance.bound_node_id]
+            jaccl_coordinator = jaccl_coordinators[bound_instance.bound_node_id]

-                logger.info(f"rank {rank} MLX_IBV_DEVICES: {ibv_devices_json}")
-                logger.info(f"rank {rank} MLX_JACCL_COORDINATOR: {jaccl_coordinator}")
-                os.environ["MLX_IBV_DEVICES"] = coordination_file
-                os.environ["MLX_RANK"] = str(rank)
-                os.environ["MLX_JACCL_COORDINATOR"] = jaccl_coordinator
-                group = mx.distributed.init(backend="jaccl", strict=True)
+            logger.info(f"rank {rank} MLX_IBV_DEVICES: {ibv_devices_json}")
+            logger.info(f"rank {rank} MLX_JACCL_COORDINATOR: {jaccl_coordinator}")
+            os.environ["MLX_IBV_DEVICES"] = devices_file
+            os.environ["MLX_RANK"] = str(rank)
+            os.environ["MLX_JACCL_COORDINATOR"] = jaccl_coordinator
+            group = mx.distributed.init(backend="jaccl", strict=True)

-        logger.info(f"Rank {rank} mlx distributed initialization complete")
+    logger.info(f"Rank {rank} mlx distributed initialization complete")
+    logger.info(f"[RING3DBG] ring init complete: group.rank()={group.rank()} group.size()={group.size()}")

-        return group
-    finally:
-        with contextlib.suppress(FileNotFoundError):
-            if coordination_file:
-                os.remove(coordination_file)
+    return group


 def initialize_mlx(
    bound_instance: BoundInstance,
-) -> Group:
-    # should we unseed it?
-    # TODO: pass in seed from params
+) -> tuple[Model, TokenizerWrapper, Callable[[mx.array], mx.array]]:
+    """
+    Initialize the MLX model, tokenizer, and sampler. Runs in the MLX thread.
+    """
    mx.random.seed(42)

-    assert len(bound_instance.instance.shard_assignments.node_to_runner) > 1, (
-        "Tried to initialize mlx for a single node instance"
-    )
-    return mlx_distributed_init(bound_instance)
+    set_wired_limit_for_model(get_weights_size(bound_instance.bound_shard))

-
-def load_mlx_items(
-    bound_instance: BoundInstance, group: Group | None
-) -> tuple[Model, TokenizerWrapper, Callable[[mx.array], mx.array]]:
-    # TODO: pass temperature
-    sampler: Callable[[mx.array], mx.array] = make_sampler(temp=0.7)
+    sampler: Callable[[mx.array], mx.array] = make_sampler(temp=TEMPERATURE)
    logger.info("Created a sampler")

-    if group is None:
+    if len(bound_instance.instance.shard_assignments.node_to_runner) <= 1:
        logger.info(f"Single device used for {bound_instance.instance}")
        model_path = build_model_path(bound_instance.bound_shard.model_meta.model_id)
        start_time = time.perf_counter()
        model, _ = load_model(model_path, strict=True)
        end_time = time.perf_counter()
        logger.info(f"Time taken to load model: {(end_time - start_time):.2f}s")
+        if hasattr(model, "model") and isinstance(model.model, DeepseekV3Model):  # type: ignore
+            pass
+            # model, config = quantize_model(
+            #    model, config, group_size=KV_GROUP_SIZE, bits=ATTENTION_KV_BITS, quant_predicate=quant_predicate, mode=QUANTIZE_MODEL_MODE
+            # )
+
        tokenizer = get_tokenizer(model_path, bound_instance.bound_shard)

    else:
        logger.info("Starting distributed init")
+        group = mlx_distributed_init(bound_instance)
+
        start_time = time.perf_counter()
        model, tokenizer = shard_and_load(bound_instance.bound_shard, group=group)
        end_time = time.perf_counter()
@@ -201,12 +199,14 @@ def load_mlx_items(

    set_wired_limit_for_model(get_weights_size(bound_instance.bound_shard))

+    logger.debug(model)
+
    return cast(Model, model), tokenizer, sampler


 def shard_and_load(
    shard_metadata: ShardMetadata,
-    group: Group,
+    group: mx.distributed.Group,
 ) -> tuple[nn.Module, TokenizerWrapper]:
    model_path = build_model_path(shard_metadata.model_meta.model_id)

@@ -234,6 +234,8 @@ def shard_and_load(
    tokenizer = get_tokenizer(model_path, shard_metadata)

    logger.info(f"Group size: {group.size()}, group rank: {group.rank()}")
+    logger.info(f"[RING3DBG] shard_and_load: expected device_rank={shard_metadata.device_rank} world_size={shard_metadata.world_size}")
+    logger.info(f"[RING3DBG] actual group.rank()={group.rank()} group.size()={group.size()}")

    match shard_metadata:
        case TensorShardMetadata():
@@ -395,5 +397,11 @@ def set_wired_limit_for_model(model_size: Memory):
            "MB. This can be slow. See the documentation for possible work-arounds: "
            "https://github.com/ml-explore/mlx-lm/tree/main#large-models"
        )
+    kv_bytes = int(0.02 * model_bytes)
+    target_cache = int(1.10 * (model_bytes + kv_bytes))
+    target_cache = min(target_cache, max_rec_size)
+    mx.set_cache_limit(target_cache)
    mx.set_wired_limit(max_rec_size)
-    logger.info(f"Wired limit set to {max_rec_size}.")
+    logger.info(
+        f"Wired limit set to {max_rec_size}. Cache limit set to {target_cache}."
+    )
--- a/src/exo/worker/main.py
+++ b/src/exo/worker/main.py
@@ -23,7 +23,6 @@ from exo.shared.types.events import (
    TopologyEdgeCreated,
    TopologyEdgeDeleted,
 )
-from exo.shared.types.models import ModelId
 from exo.shared.types.multiaddr import Multiaddr
 from exo.shared.types.profiling import MemoryPerformanceProfile, NodePerformanceProfile
 from exo.shared.types.state import State
@@ -84,7 +83,7 @@ class Worker:
        self.out_for_delivery: dict[EventId, ForwarderEvent] = {}

        self.state: State = State()
-        self.download_status: dict[ModelId, DownloadProgress] = {}
+        self.download_status: dict[ShardMetadata, DownloadProgress] = {}
        self.runners: dict[RunnerId, RunnerSupervisor] = {}
        self._tg: TaskGroup | None = None

@@ -129,7 +128,6 @@ class Worker:
            tg.start_soon(start_polling_node_metrics, resource_monitor_callback)

            tg.start_soon(start_polling_memory_metrics, memory_monitor_callback)
-            tg.start_soon(self._emit_existing_download_progress)
            tg.start_soon(self._connection_message_event_writer)
            tg.start_soon(self._resend_out_for_delivery)
            tg.start_soon(self._event_applier)
@@ -202,11 +200,11 @@ class Worker:
                        )
                    )
                case DownloadModel(shard_metadata=shard):
-                    if shard.model_meta.model_id not in self.download_status:
+                    if shard not in self.download_status:
                        progress = DownloadPending(
                            shard_metadata=shard, node_id=self.node_id
                        )
-                        self.download_status[shard.model_meta.model_id] = progress
+                        self.download_status[shard] = progress
                        await self.event_sender.send(
                            NodeDownloadProgress(download_progress=progress)
                        )
@@ -219,7 +217,7 @@ class Worker:
                        progress = DownloadCompleted(
                            shard_metadata=shard, node_id=self.node_id
                        )
-                        self.download_status[shard.model_meta.model_id] = progress
+                        self.download_status[shard] = progress
                        await self.event_sender.send(
                            NodeDownloadProgress(download_progress=progress)
                        )
@@ -230,7 +228,7 @@ class Worker:
                            )
                        )
                    else:
-                        await self.event_sender.send(
+                        self.event_sender.send_nowait(
                            TaskStatusUpdated(
                                task_id=task.task_id, task_status=TaskStatus.Running
                            )
@@ -351,7 +349,7 @@ class Worker:
                initial_progress
            ),
        )
-        self.download_status[task.shard_metadata.model_meta.model_id] = status
+        self.download_status[task.shard_metadata] = status
        self.event_sender.send_nowait(NodeDownloadProgress(download_progress=status))

        last_progress_time = 0.0
@@ -365,7 +363,7 @@ class Worker:
            nonlocal last_progress_time
            if progress.status == "complete":
                status = DownloadCompleted(shard_metadata=shard, node_id=self.node_id)
-                self.download_status[shard.model_meta.model_id] = status
+                self.download_status[shard] = status
                # Footgun!
                self.event_sender.send_nowait(
                    NodeDownloadProgress(download_progress=status)
@@ -386,7 +384,7 @@ class Worker:
                        progress
                    ),
                )
-                self.download_status[shard.model_meta.model_id] = status
+                self.download_status[shard] = status
                self.event_sender.send_nowait(
                    NodeDownloadProgress(download_progress=status)
                )
@@ -446,40 +444,3 @@ class Worker:
                    await self.event_sender.send(TopologyEdgeDeleted(edge=conn))

            await anyio.sleep(10)
-
-    async def _emit_existing_download_progress(self) -> None:
-        try:
-            while True:
-                logger.info("Fetching and emitting existing download progress...")
-                async for (
-                    _,
-                    progress,
-                ) in self.shard_downloader.get_shard_download_status():
-                    if progress.status == "complete":
-                        status = DownloadCompleted(
-                            node_id=self.node_id, shard_metadata=progress.shard
-                        )
-                    elif progress.status in ["in_progress", "not_started"]:
-                        if progress.downloaded_bytes_this_session.in_bytes == 0:
-                            status = DownloadPending(
-                                node_id=self.node_id, shard_metadata=progress.shard
-                            )
-                        else:
-                            status = DownloadOngoing(
-                                node_id=self.node_id,
-                                shard_metadata=progress.shard,
-                                download_progress=map_repo_download_progress_to_download_progress_data(
-                                    progress
-                                ),
-                            )
-                    else:
-                        continue
-
-                    self.download_status[progress.shard.model_meta.model_id] = status
-                    await self.event_sender.send(
-                        NodeDownloadProgress(download_progress=status)
-                    )
-                logger.info("Done emitting existing download progress.")
-                await anyio.sleep(5 * 60)  # 5 minutes
-        except Exception as e:
-            logger.error(f"Error emitting existing download progress: {e}")
--- a/src/exo/worker/plan.py
+++ b/src/exo/worker/plan.py
@@ -3,10 +3,8 @@
 from collections.abc import Mapping, Sequence

 from exo.shared.types.common import NodeId
-from exo.shared.types.models import ModelId
 from exo.shared.types.tasks import (
    ChatCompletion,
-    ConnectToGroup,
    CreateRunner,
    DownloadModel,
    LoadModel,
@@ -16,25 +14,20 @@ from exo.shared.types.tasks import (
    TaskId,
    TaskStatus,
 )
-from exo.shared.types.worker.downloads import (
-    DownloadCompleted,
-    DownloadOngoing,
-    DownloadProgress,
-)
+from exo.shared.types.worker.downloads import DownloadCompleted, DownloadProgress
 from exo.shared.types.worker.instances import BoundInstance, Instance, InstanceId
 from exo.shared.types.worker.runners import (
-    RunnerConnected,
-    RunnerConnecting,
    RunnerFailed,
    RunnerId,
-    RunnerIdle,
    RunnerLoaded,
    RunnerLoading,
    RunnerReady,
    RunnerRunning,
    RunnerStatus,
+    RunnerWaitingForModel,
    RunnerWarmingUp,
 )
+from exo.shared.types.worker.shards import ShardMetadata
 from exo.worker.runner.runner_supervisor import RunnerSupervisor


@@ -43,7 +36,7 @@ def plan(
    # Runners is expected to be FRESH and so should not come from state
    runners: Mapping[RunnerId, RunnerSupervisor],
    # DL_status is expected to be FRESH and so should not come from state
-    download_status: Mapping[ModelId, DownloadProgress],
+    download_status: Mapping[ShardMetadata, DownloadProgress],
    # gdls is not expected to be fresh
    global_download_status: Mapping[NodeId, Sequence[DownloadProgress]],
    instances: Mapping[InstanceId, Instance],
@@ -55,7 +48,6 @@ def plan(
        _kill_runner(runners, all_runners, instances)
        or _create_runner(node_id, runners, instances)
        or _model_needs_download(runners, download_status)
-        or _init_distributed_backend(runners, all_runners)
        or _load_model(runners, all_runners, global_download_status)
        or _ready_to_warmup(runners, all_runners)
        or _pending_tasks(runners, tasks, all_runners)
@@ -111,15 +103,12 @@ def _create_runner(

 def _model_needs_download(
    runners: Mapping[RunnerId, RunnerSupervisor],
-    download_status: Mapping[ModelId, DownloadProgress],
+    download_status: Mapping[ShardMetadata, DownloadProgress],
 ) -> DownloadModel | None:
    for runner in runners.values():
-        model_id = runner.bound_instance.bound_shard.model_meta.model_id
-        if isinstance(runner.status, RunnerIdle) and (
-            model_id not in download_status
-            or not isinstance(
-                download_status[model_id], (DownloadOngoing, DownloadCompleted)
-            )
+        if (
+            isinstance(runner.status, RunnerWaitingForModel)
+            and runner.bound_instance.bound_shard not in download_status
        ):
            # We don't invalidate download_status randomly in case a file gets deleted on disk
            return DownloadModel(
@@ -128,54 +117,14 @@ def _model_needs_download(
            )


-def _init_distributed_backend(
+""" --- TODO!
+def _init_backend(
    runners: Mapping[RunnerId, RunnerSupervisor],
    all_runners: Mapping[RunnerId, RunnerStatus],
-):
-    for runner in runners.values():
-        instance = runner.bound_instance.instance
-        shard_assignments = instance.shard_assignments
-
-        is_single_node_instance = len(shard_assignments.runner_to_shard) == 1
-        if is_single_node_instance:
-            continue
-
-        runner_is_idle = isinstance(runner.status, RunnerIdle)
-        all_runners_connecting = all(
-            isinstance(
-                all_runners.get(global_runner_id),
-                (RunnerConnecting, RunnerIdle),
-            )
-            for global_runner_id in shard_assignments.runner_to_shard
-        )
-
-        if not (runner_is_idle and all_runners_connecting):
-            continue
-
-        runner_id = runner.bound_instance.bound_runner_id
-
-        shard = runner.bound_instance.bound_shard
-        device_rank = shard.device_rank
-        world_size = shard.world_size
-
-        assert device_rank < world_size
-        assert device_rank >= 0
-
-        accepting_ranks = device_rank < world_size - 1
-
-        # Rank = n-1
-        connecting_rank_ready = device_rank == world_size - 1 and all(
-            isinstance(all_runners.get(global_runner_id, None), RunnerConnecting)
-            for global_runner_id in shard_assignments.runner_to_shard
-            if global_runner_id != runner_id
-        )
-
-        if not (accepting_ranks or connecting_rank_ready):
-            continue
-
-        return ConnectToGroup(instance_id=instance.instance_id)
-
-    return None
+) -> LoadModel | None:
+    for runner in runners.values():
+        pass
+"""


 def _load_model(
@@ -187,33 +136,31 @@ def _load_model(
        instance = runner.bound_instance.instance
        shard_assignments = instance.shard_assignments

-        all_local_downloads_complete = all(
+        all_downloads_complete_local = all(
            nid in global_download_status
            and any(
                isinstance(dp, DownloadCompleted)
-                and dp.shard_metadata.model_meta.model_id == shard_assignments.model_id
+                and dp.shard_metadata == shard_assignments.runner_to_shard[rid]
                for dp in global_download_status[nid]
            )
-            for nid in shard_assignments.node_to_runner
+            for nid, rid in shard_assignments.node_to_runner.items()
        )
-        if not all_local_downloads_complete:
-            continue

-        is_single_node_instance = len(instance.shard_assignments.runner_to_shard) == 1
-        if is_single_node_instance and isinstance(runner.status, RunnerIdle):
-            return LoadModel(instance_id=instance.instance_id)
+        runner_is_waiting = isinstance(runner.status, RunnerWaitingForModel)

-        is_runner_waiting = isinstance(runner.status, RunnerConnected)
-
-        all_ready_for_model = all(
+        all_runners_expecting_model = all(
            isinstance(
-                all_runners.get(global_runner_id, None),
-                (RunnerConnected, RunnerLoading, RunnerLoaded),
+                all_runners.get(global_runner_id),
+                (RunnerWaitingForModel, RunnerLoading, RunnerLoaded),
            )
            for global_runner_id in shard_assignments.runner_to_shard
        )

-        if is_runner_waiting and all_ready_for_model:
+        if (
+            all_downloads_complete_local
+            and runner_is_waiting
+            and all_runners_expecting_model
+        ):
            return LoadModel(instance_id=instance.instance_id)

    return None
@@ -236,8 +183,8 @@ def _ready_to_warmup(
        assert device_rank < world_size
        assert device_rank >= 0

-        # Rank != 0
-        accepting_ranks_ready = device_rank > 0 and all(
+        # Rank != n-1
+        accepting_ranks_ready = device_rank != world_size - 1 and all(
            isinstance(
                all_runners.get(global_runner_id, None),
                (RunnerLoaded, RunnerWarmingUp),
@@ -245,8 +192,8 @@ def _ready_to_warmup(
            for global_runner_id in shard_assignments.runner_to_shard
        )

-        # Rank = 0
-        connecting_rank_ready = device_rank == 0 and all(
+        # Rank = n-1
+        connecting_rank_ready = device_rank == world_size - 1 and all(
            isinstance(all_runners.get(global_runner_id, None), RunnerWarmingUp)
            for global_runner_id in shard_assignments.runner_to_shard
            if global_runner_id != runner_id
@@ -274,8 +221,6 @@ def _pending_tasks(
            if task.instance_id != runner.bound_instance.instance.instance_id:
                continue

-            # TODO: Check ordering aligns with MLX distributeds expectations.
-
            if isinstance(runner.status, RunnerReady) and all(
                isinstance(all_runners[global_runner_id], (RunnerReady, RunnerRunning))
                for global_runner_id in runner.bound_instance.instance.shard_assignments.runner_to_shard
--- a/src/exo/worker/runner/bootstrap.py
+++ b/src/exo/worker/runner/bootstrap.py
@@ -2,13 +2,16 @@ import os

 import loguru

-from exo.shared.types.events import Event, RunnerStatusUpdated
+from exo.shared.types.events import Event
 from exo.shared.types.tasks import Task
 from exo.shared.types.worker.instances import BoundInstance, MlxJacclInstance
-from exo.shared.types.worker.runners import RunnerFailed
 from exo.utils.channels import MpReceiver, MpSender

-logger: "loguru.Logger" = loguru.logger
+logger: "loguru.Logger"
+
+
+if os.getenv("EXO_TESTS") == "1":
+    logger = loguru.logger


 def entrypoint(
@@ -27,23 +30,6 @@ def entrypoint(
    logger = _logger

    # Import main after setting global logger - this lets us just import logger from this module
-    try:
-        from exo.worker.runner.runner import main
+    from exo.worker.runner.runner import main

-        main(bound_instance, event_sender, task_receiver)
-    except Exception as e:
-        logger.opt(exception=e).warning(
-            f"Runner {bound_instance.bound_runner_id} crashed with critical exception {e}"
-        )
-        event_sender.send(
-            RunnerStatusUpdated(
-                runner_id=bound_instance.bound_runner_id,
-                runner_status=RunnerFailed(error_message=str(e)),
-            )
-        )
-    finally:
-        event_sender.close()
-        task_receiver.close()
-        event_sender.join()
-        task_receiver.join()
-        logger.info("bye from the runner")
+    main(bound_instance, event_sender, task_receiver)
--- a/src/exo/worker/runner/runner.py
+++ b/src/exo/worker/runner/runner.py
@@ -11,7 +11,6 @@ from exo.shared.types.events import (
 )
 from exo.shared.types.tasks import (
    ChatCompletion,
-    ConnectToGroup,
    LoadModel,
    Shutdown,
    StartWarmup,
@@ -23,23 +22,20 @@ from exo.shared.types.worker.runner_response import (
    GenerationResponse,
 )
 from exo.shared.types.worker.runners import (
-    RunnerConnected,
-    RunnerConnecting,
    RunnerFailed,
-    RunnerIdle,
    RunnerLoaded,
    RunnerLoading,
    RunnerReady,
    RunnerRunning,
    RunnerShutdown,
    RunnerStatus,
+    RunnerWaitingForModel,
    RunnerWarmingUp,
 )
 from exo.utils.channels import ClosedResourceError, MpReceiver, MpSender
 from exo.worker.engines.mlx.generator.generate import mlx_generate, warmup_inference
 from exo.worker.engines.mlx.utils_mlx import (
    initialize_mlx,
-    load_mlx_items,
    mlx_force_oom,
 )
 from exo.worker.runner.bootstrap import logger
@@ -67,10 +63,9 @@ def main(
        model = None
        tokenizer = None
        sampler = None
-        group = None

-        current_status: RunnerStatus = RunnerIdle()
-        logger.info("runner created")
+        current_status: RunnerStatus = RunnerWaitingForModel()
+        logger.info("runner waiting for model")
        event_sender.send(
            RunnerStatusUpdated(runner_id=runner_id, runner_status=current_status)
        )
@@ -83,26 +78,9 @@ def main(
                )
                event_sender.send(TaskAcknowledged(task_id=task.task_id))
                match task:
-                    case ConnectToGroup() if isinstance(
-                        current_status, (RunnerIdle, RunnerFailed)
+                    case LoadModel() if isinstance(
+                        current_status, (RunnerWaitingForModel, RunnerFailed)
                    ):
-                        logger.info("runner connecting")
-                        current_status = RunnerConnecting()
-                        event_sender.send(
-                            RunnerStatusUpdated(
-                                runner_id=runner_id, runner_status=current_status
-                            )
-                        )
-                        group = initialize_mlx(bound_instance)
-
-                        logger.info("runner connected")
-                        current_status = RunnerConnected()
-
-                    # we load the model if it's connected with a group, or idle without a group. we should never tell a model to connect if it doesn't need to
-                    case LoadModel() if (
-                        isinstance(current_status, RunnerConnected)
-                        and group is not None
-                    ) or (isinstance(current_status, RunnerIdle) and group is None):
                        current_status = RunnerLoading()
                        logger.info("runner loading")
                        event_sender.send(
@@ -111,12 +89,15 @@ def main(
                            )
                        )

-                        model, tokenizer, sampler = load_mlx_items(
-                            bound_instance, group
-                        )
+                        model, tokenizer, sampler = initialize_mlx(bound_instance)

                        current_status = RunnerLoaded()
                        logger.info("runner loaded")
+                        event_sender.send(
+                            RunnerStatusUpdated(
+                                runner_id=runner_id, runner_status=current_status
+                            )
+                        )
                    case StartWarmup() if isinstance(current_status, RunnerLoaded):
                        assert model
                        assert tokenizer
@@ -142,6 +123,11 @@ def main(
                        )
                        current_status = RunnerReady()
                        logger.info("runner ready")
+                        event_sender.send(
+                            RunnerStatusUpdated(
+                                runner_id=runner_id, runner_status=RunnerReady()
+                            )
+                        )
                    case ChatCompletion(
                        task_params=task_params, command_id=command_id
                    ) if isinstance(current_status, RunnerReady):
@@ -186,6 +172,11 @@ def main(

                        current_status = RunnerReady()
                        logger.info("runner ready")
+                        event_sender.send(
+                            RunnerStatusUpdated(
+                                runner_id=runner_id, runner_status=RunnerReady()
+                            )
+                        )
                    case Shutdown():
                        logger.info("runner shutting down")
                        event_sender.send(
@@ -195,19 +186,12 @@ def main(
                        )
                        break
                    case _:
-                        raise ValueError(
-                            f"Received {task.__class__.__name__} outside of state machine in {current_status=}"
-                        )
+                        raise ValueError("Received task outside of state machine")
                event_sender.send(
                    TaskStatusUpdated(
                        task_id=task.task_id, task_status=TaskStatus.Complete
                    )
                )
-                event_sender.send(
-                    RunnerStatusUpdated(
-                        runner_id=runner_id, runner_status=current_status
-                    )
-                )
        event_sender.send(
            RunnerStatusUpdated(runner_id=runner_id, runner_status=RunnerShutdown())
        )
--- a/src/exo/worker/runner/runner_supervisor.py
+++ b/src/exo/worker/runner/runner_supervisor.py
@@ -19,8 +19,8 @@ from exo.shared.types.tasks import Task, TaskId
 from exo.shared.types.worker.instances import BoundInstance
 from exo.shared.types.worker.runners import (
    RunnerFailed,
-    RunnerIdle,
    RunnerStatus,
+    RunnerWaitingForModel,
 )
 from exo.shared.types.worker.shards import ShardMetadata
 from exo.utils.channels import MpReceiver, MpSender, Sender, mp_channel
@@ -41,7 +41,7 @@ class RunnerSupervisor:
    _event_sender: Sender[Event]
    # err_path: str
    _tg: TaskGroup | None = field(default=None, init=False)
-    status: RunnerStatus = field(default_factory=RunnerIdle, init=False)
+    status: RunnerStatus = field(default_factory=RunnerWaitingForModel, init=False)
    pending: dict[TaskId, anyio.Event] = field(default_factory=dict, init=False)

    @classmethod
--- a/src/exo/worker/tests/constants.py
+++ b/src/exo/worker/tests/constants.py
@@ -9,11 +9,9 @@ MASTER_NODE_ID = NodeId("ffffffff-aaaa-4aaa-8aaa-aaaaaaaaaaaa")

 NODE_A: Final[NodeId] = NodeId("aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa")
 NODE_B: Final[NodeId] = NodeId("bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb")
-NODE_C: Final[NodeId] = NodeId("cccccccc-cccc-4ccc-8ccc-cccccccccccc")

 RUNNER_1_ID: Final[RunnerId] = RunnerId("11111111-1111-4111-8111-111111111111")
 RUNNER_2_ID: Final[RunnerId] = RunnerId("33333333-3333-4333-8333-333333333333")
-RUNNER_3_ID: Final[RunnerId] = RunnerId("Runner3")

 INSTANCE_1_ID: Final[InstanceId] = InstanceId("22222222-2222-4222-8222-222222222222")
 INSTANCE_2_ID: Final[InstanceId] = InstanceId("44444444-4444-4444-8444-444444444444")
@@ -26,9 +24,3 @@ TASK_2_ID: Final[TaskId] = TaskId("66666666-6666-4666-8666-666666666666")

 COMMAND_1_ID: Final[CommandId] = CommandId("77777777-7777-4777-8777-777777777777")
 COMMAND_2_ID: Final[CommandId] = CommandId("88888888-8888-4888-8888-888888888888")
-
-SHUTDOWN_TASK_ID = TaskId("shutdown")
-CHAT_COMPLETION_TASK_ID = TaskId("chat-completion")
-INITIALIZATION_TASK_ID = TaskId("initialisation")
-LOAD_TASK_ID = TaskId("load")
-WARMUP_TASK_ID = TaskId("warmup")
--- a/src/exo/worker/tests/unittests/conftest.py
+++ b/src/exo/worker/tests/unittests/conftest.py
@@ -1,5 +1,3 @@
-from __future__ import annotations
-
 from dataclasses import dataclass

 from exo.shared.types.common import NodeId
@@ -16,7 +14,6 @@ from exo.shared.types.worker.runners import RunnerId, RunnerStatus, ShardAssignm
 from exo.shared.types.worker.shards import PipelineShardMetadata, ShardMetadata


-# Runner supervisor without multiprocessing logic.
 @dataclass(frozen=True)
 class FakeRunnerSupervisor:
    bound_instance: BoundInstance
@@ -38,8 +35,6 @@ def get_pipeline_shard_metadata(
            pretty_name=str(model_id),
            storage_size=Memory.from_mb(100000),
            n_layers=32,
-            hidden_size=2048,
-            supports_tensor=False,
        ),
        device_rank=device_rank,
        world_size=world_size,
@@ -75,24 +70,3 @@ def get_mlx_ring_instance(
        hosts_by_node={},
        ephemeral_port=50000,
    )
-
-
-def get_bound_mlx_ring_instance(
-    instance_id: InstanceId, model_id: ModelId, runner_id: RunnerId, node_id: NodeId
-) -> BoundInstance:
-    shard = get_pipeline_shard_metadata(model_id=model_id, device_rank=0, world_size=2)
-    other_shard = get_pipeline_shard_metadata(
-        model_id=model_id, device_rank=1, world_size=2
-    )
-    instance = get_mlx_ring_instance(
-        instance_id=instance_id,
-        model_id=model_id,
-        node_to_runner={
-            node_id: runner_id,
-            NodeId("other_node"): RunnerId("other_runner"),
-        },
-        runner_to_shard={runner_id: shard, RunnerId("other_runner"): other_shard},
-    )
-    return BoundInstance(
-        instance=instance, bound_runner_id=runner_id, bound_node_id=node_id
-    )
--- a/src/exo/worker/tests/unittests/test_plan/test_download_and_loading.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_download_and_loading.py
@@ -1,13 +1,12 @@
 import exo.worker.plan as plan_mod
 from exo.shared.types.common import NodeId
-from exo.shared.types.models import ModelId
 from exo.shared.types.tasks import LoadModel
 from exo.shared.types.worker.downloads import DownloadCompleted, DownloadProgress
 from exo.shared.types.worker.instances import BoundInstance
 from exo.shared.types.worker.runners import (
-    RunnerConnected,
-    RunnerIdle,
+    RunnerWaitingForModel,
 )
+from exo.shared.types.worker.shards import ShardMetadata
 from exo.worker.tests.constants import (
    INSTANCE_1_ID,
    MODEL_A_ID,
@@ -39,14 +38,16 @@ def test_plan_requests_download_when_waiting_and_shard_not_downloaded():
    bound_instance = BoundInstance(
        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
    )
-    runner = FakeRunnerSupervisor(bound_instance=bound_instance, status=RunnerIdle())
+    runner = FakeRunnerSupervisor(
+        bound_instance=bound_instance, status=RunnerWaitingForModel()
+    )

    runners = {RUNNER_1_ID: runner}
    instances = {INSTANCE_1_ID: instance}
-    all_runners = {RUNNER_1_ID: RunnerIdle()}
+    all_runners = {RUNNER_1_ID: RunnerWaitingForModel()}

    # No entry for this shard -> should trigger DownloadModel
-    download_status: dict[ModelId, DownloadProgress] = {}
+    download_status: dict[ShardMetadata, DownloadProgress] = {}

    result = plan_mod.plan(
        node_id=NODE_A,
@@ -81,20 +82,20 @@ def test_plan_loads_model_when_all_shards_downloaded_and_waiting():
        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
    )
    local_runner = FakeRunnerSupervisor(
-        bound_instance=bound_instance, status=RunnerConnected()
+        bound_instance=bound_instance, status=RunnerWaitingForModel()
    )

    runners = {RUNNER_1_ID: local_runner}
    instances = {INSTANCE_1_ID: instance}

    all_runners = {
-        RUNNER_1_ID: RunnerConnected(),
-        RUNNER_2_ID: RunnerConnected(),
+        RUNNER_1_ID: RunnerWaitingForModel(),
+        RUNNER_2_ID: RunnerWaitingForModel(),
    }

    # Local node has already marked its shard as downloaded (not actually used by _load_model)
    local_download_status = {
-        MODEL_A_ID: DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)
+        shard1: DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)  # type: ignore[reportUnhashable]
    }

    # Global view has completed downloads for both nodes
@@ -132,15 +133,17 @@ def test_plan_does_not_request_download_when_shard_already_downloaded():
    bound_instance = BoundInstance(
        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
    )
-    runner = FakeRunnerSupervisor(bound_instance=bound_instance, status=RunnerIdle())
+    runner = FakeRunnerSupervisor(
+        bound_instance=bound_instance, status=RunnerWaitingForModel()
+    )

    runners = {RUNNER_1_ID: runner}
    instances = {INSTANCE_1_ID: instance}
-    all_runners = {RUNNER_1_ID: RunnerIdle()}
+    all_runners = {RUNNER_1_ID: RunnerWaitingForModel()}

    # Local status claims the shard is downloaded already
    local_download_status = {
-        MODEL_A_ID: DownloadCompleted(shard_metadata=shard, node_id=NODE_A)
+        shard: DownloadCompleted(shard_metadata=shard, node_id=NODE_A)  # type: ignore[reportUnhashable]
    }

    # Global view hasn't caught up yet (no completed shards recorded for NODE_A)
@@ -180,19 +183,19 @@ def test_plan_does_not_load_model_until_all_shards_downloaded_globally():
        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
    )
    local_runner = FakeRunnerSupervisor(
-        bound_instance=bound_instance, status=RunnerConnected()
+        bound_instance=bound_instance, status=RunnerWaitingForModel()
    )

    runners = {RUNNER_1_ID: local_runner}
    instances = {INSTANCE_1_ID: instance}
    all_runners = {
-        RUNNER_1_ID: RunnerConnected(),
-        RUNNER_2_ID: RunnerConnected(),
+        RUNNER_1_ID: RunnerWaitingForModel(),
+        RUNNER_2_ID: RunnerWaitingForModel(),
    }

    # Only NODE_A's shard is recorded as downloaded globally
    local_download_status = {
-        MODEL_A_ID: DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)
+        shard1: DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)  # type: ignore[reportUnhashable]
    }
    global_download_status = {
        NODE_A: [DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)],
@@ -210,22 +213,3 @@ def test_plan_does_not_load_model_until_all_shards_downloaded_globally():
    )

    assert result is None
-
-    global_download_status = {
-        NODE_A: [DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)],
-        NODE_B: [
-            DownloadCompleted(shard_metadata=shard2, node_id=NODE_B)
-        ],  # NODE_B has no downloads completed yet
-    }
-
-    result = plan_mod.plan(
-        node_id=NODE_A,
-        runners=runners,  # type: ignore
-        download_status=local_download_status,
-        global_download_status=global_download_status,
-        instances=instances,
-        all_runners=all_runners,
-        tasks={},
-    )
-
-    assert result is not None
--- a/src/exo/worker/tests/unittests/test_plan/test_task_forwarding.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_task_forwarding.py
@@ -5,9 +5,9 @@ from exo.shared.types.api import ChatCompletionTaskParams
 from exo.shared.types.tasks import ChatCompletion, Task, TaskId, TaskStatus
 from exo.shared.types.worker.instances import BoundInstance, InstanceId
 from exo.shared.types.worker.runners import (
-    RunnerIdle,
    RunnerReady,
    RunnerRunning,
+    RunnerWaitingForModel,
 )
 from exo.worker.tests.constants import (
    COMMAND_1_ID,
@@ -99,7 +99,7 @@ def test_plan_does_not_forward_chat_completion_if_any_runner_not_ready():
    instances = {INSTANCE_1_ID: instance}
    all_runners = {
        RUNNER_1_ID: RunnerReady(),
-        RUNNER_2_ID: RunnerIdle(),
+        RUNNER_2_ID: RunnerWaitingForModel(),
    }

    task = ChatCompletion(
--- a/src/exo/worker/tests/unittests/test_plan/test_warmup.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_warmup.py
@@ -2,9 +2,8 @@ import exo.worker.plan as plan_mod
 from exo.shared.types.tasks import StartWarmup
 from exo.shared.types.worker.instances import BoundInstance
 from exo.shared.types.worker.runners import (
-    RunnerIdle,
    RunnerLoaded,
-    RunnerLoading,
+    RunnerWaitingForModel,
    RunnerWarmingUp,
 )
 from exo.worker.tests.constants import (
@@ -12,10 +11,8 @@ from exo.worker.tests.constants import (
    MODEL_A_ID,
    NODE_A,
    NODE_B,
-    NODE_C,
    RUNNER_1_ID,
    RUNNER_2_ID,
-    RUNNER_3_ID,
 )
 from exo.worker.tests.unittests.conftest import (
    FakeRunnerSupervisor,
@@ -26,137 +23,9 @@ from exo.worker.tests.unittests.conftest import (

 def test_plan_starts_warmup_for_accepting_rank_when_all_loaded_or_warming():
    """
-    For non-zero device_rank shards, StartWarmup should be emitted when all
-    shards in the instance are Loaded/WarmingUp.
-    """
-    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=3)
-    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=3)
-    shard2 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=2, world_size=3)
-    instance = get_mlx_ring_instance(
-        instance_id=INSTANCE_1_ID,
-        model_id=MODEL_A_ID,
-        node_to_runner={NODE_A: RUNNER_1_ID, NODE_B: RUNNER_2_ID, NODE_C: RUNNER_3_ID},
-        runner_to_shard={RUNNER_1_ID: shard0, RUNNER_2_ID: shard1, RUNNER_3_ID: shard2},
-    )
-
-    bound_instance = BoundInstance(
-        instance=instance, bound_runner_id=RUNNER_2_ID, bound_node_id=NODE_B
-    )
-    local_runner = FakeRunnerSupervisor(
-        bound_instance=bound_instance, status=RunnerLoaded()
-    )
-
-    runners = {RUNNER_2_ID: local_runner}
-    instances = {INSTANCE_1_ID: instance}
-    all_runners = {
-        RUNNER_1_ID: RunnerLoaded(),
-        RUNNER_2_ID: RunnerLoaded(),
-        RUNNER_3_ID: RunnerWarmingUp(),
-    }
-
-    result = plan_mod.plan(
-        node_id=NODE_B,
-        runners=runners,  # type: ignore
-        download_status={},
-        global_download_status={NODE_A: []},
-        instances=instances,
-        all_runners=all_runners,
-        tasks={},
-    )
-
-    assert isinstance(result, StartWarmup)
-    assert result.instance_id == INSTANCE_1_ID
-
-
-def test_plan_starts_warmup_for_rank_zero_after_others_warming():
-    """
-    For device_rank == 0, StartWarmup should only be emitted once all the
-    other runners in the instance are already warming up.
-    """
-    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=2)
-    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=2)
-    instance = get_mlx_ring_instance(
-        instance_id=INSTANCE_1_ID,
-        model_id=MODEL_A_ID,
-        node_to_runner={NODE_A: RUNNER_1_ID, NODE_B: RUNNER_2_ID},
-        runner_to_shard={RUNNER_1_ID: shard0, RUNNER_2_ID: shard1},
-    )
-
-    bound_instance = BoundInstance(
-        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
-    )
-    local_runner = FakeRunnerSupervisor(
-        bound_instance=bound_instance, status=RunnerLoaded()
-    )
-
-    runners = {RUNNER_1_ID: local_runner}
-    instances = {INSTANCE_1_ID: instance}
-    all_runners = {
-        RUNNER_1_ID: RunnerLoaded(),
-        RUNNER_2_ID: RunnerWarmingUp(),
-    }
-
-    result = plan_mod.plan(
-        node_id=NODE_A,
-        runners=runners,  # type: ignore
-        download_status={},
-        global_download_status={NODE_A: []},
-        instances=instances,
-        all_runners=all_runners,
-        tasks={},
-    )
-
-    assert isinstance(result, StartWarmup)
-    assert result.instance_id == INSTANCE_1_ID
-
-
-def test_plan_does_not_start_warmup_for_non_zero_rank_until_all_loaded_or_warming():
-    """
-    Non-zero rank should not start warmup while any shard is not Loaded/WarmingUp.
-    """
-    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=2)
-    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=2)
-    instance = get_mlx_ring_instance(
-        instance_id=INSTANCE_1_ID,
-        model_id=MODEL_A_ID,
-        node_to_runner={NODE_A: RUNNER_1_ID, NODE_B: RUNNER_2_ID},
-        runner_to_shard={RUNNER_1_ID: shard0, RUNNER_2_ID: shard1},
-    )
-
-    bound_instance = BoundInstance(
-        instance=instance, bound_runner_id=RUNNER_2_ID, bound_node_id=NODE_B
-    )
-    local_runner = FakeRunnerSupervisor(
-        bound_instance=bound_instance, status=RunnerLoaded()
-    )
-
-    runners = {RUNNER_2_ID: local_runner}
-    instances = {INSTANCE_1_ID: instance}
-    all_runners = {
-        RUNNER_1_ID: RunnerIdle(),
-        RUNNER_2_ID: RunnerLoaded(),
-    }
-
-    result = plan_mod.plan(
-        node_id=NODE_B,
-        runners=runners,  # type: ignore
-        download_status={},
-        global_download_status={NODE_A: [], NODE_B: []},
-        instances=instances,
-        all_runners=all_runners,
-        tasks={},
-    )
-
-    assert result is None
-
-
-def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
-    """
-    Rank-zero shard should not start warmup until all non-zero ranks are
-    already WarmingUp.
-    For accepting ranks (device_rank != 0), StartWarmup should be
+    For accepting ranks (device_rank != world_size - 1), StartWarmup should be
    emitted when all shards in the instance are Loaded/WarmingUp.
-    In a 2-node setup, rank 1 is the accepting rank.
+    In a 2-node setup, rank 0 is the accepting rank.
    """
    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=2)
    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=2)
@@ -167,7 +36,7 @@ def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
        runner_to_shard={RUNNER_1_ID: shard0, RUNNER_2_ID: shard1},
    )

-    # Rank 1 is the accepting rank
+    # Rank 0 is the accepting rank
    bound_instance = BoundInstance(
        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
    )
@@ -192,23 +61,6 @@ def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
        tasks={},
    )

-    assert result is None
-
-    all_runners = {
-        RUNNER_1_ID: RunnerLoaded(),
-        RUNNER_2_ID: RunnerWarmingUp(),
-    }
-
-    result = plan_mod.plan(
-        node_id=NODE_A,
-        runners=runners,  # type: ignore
-        download_status={},
-        global_download_status={NODE_A: []},
-        instances=instances,
-        all_runners=all_runners,
-        tasks={},
-    )
-
    assert isinstance(result, StartWarmup)
    assert result.instance_id == INSTANCE_1_ID

@@ -283,7 +135,7 @@ def test_plan_does_not_start_warmup_for_accepting_rank_until_all_loaded_or_warmi
    instances = {INSTANCE_1_ID: instance}
    all_runners = {
        RUNNER_1_ID: RunnerLoaded(),
-        RUNNER_2_ID: RunnerLoading(),
+        RUNNER_2_ID: RunnerWaitingForModel(),
    }

    result = plan_mod.plan(
@@ -301,8 +153,9 @@ def test_plan_does_not_start_warmup_for_accepting_rank_until_all_loaded_or_warmi

 def test_plan_does_not_start_warmup_for_connecting_rank_until_others_warming():
    """
-    Connecting rank (device_rank == 0) should not start warmup
+    Connecting rank (device_rank == world_size - 1) should not start warmup
    until all other ranks are already WarmingUp.
+    In a 2-node setup, rank 1 is the connecting rank.
    """
    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=2)
    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=2)
@@ -315,13 +168,13 @@ def test_plan_does_not_start_warmup_for_connecting_rank_until_others_warming():

    # Rank 1 is the connecting rank
    bound_instance = BoundInstance(
-        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
+        instance=instance, bound_runner_id=RUNNER_2_ID, bound_node_id=NODE_B
    )
    local_runner = FakeRunnerSupervisor(
        bound_instance=bound_instance, status=RunnerLoaded()
    )

-    runners = {RUNNER_1_ID: local_runner}
+    runners = {RUNNER_2_ID: local_runner}
    instances = {INSTANCE_1_ID: instance}
    all_runners = {
        RUNNER_1_ID: RunnerLoaded(),
@@ -329,7 +182,7 @@ def test_plan_does_not_start_warmup_for_connecting_rank_until_others_warming():
    }

    result = plan_mod.plan(
-        node_id=NODE_A,
+        node_id=NODE_B,
        runners=runners,  # type: ignore
        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
--- a/src/exo/worker/tests/unittests/test_runner/test_event_ordering.py
+++ b/src/exo/worker/tests/unittests/test_runner/test_event_ordering.py
@@ -1,208 +0,0 @@
-# Check tasks are complete before runner is ever ready.
-from collections.abc import Iterable
-from typing import Callable
-
-import pytest
-
-import exo.worker.runner.runner as mlx_runner
-from exo.shared.types.api import ChatCompletionMessage
-from exo.shared.types.chunks import TokenChunk
-from exo.shared.types.events import (
-    ChunkGenerated,
-    Event,
-    RunnerStatusUpdated,
-    TaskAcknowledged,
-    TaskStatusUpdated,
-)
-from exo.shared.types.tasks import (
-    ChatCompletion,
-    ChatCompletionTaskParams,
-    ConnectToGroup,
-    LoadModel,
-    Shutdown,
-    StartWarmup,
-    Task,
-    TaskStatus,
-)
-from exo.shared.types.worker.runner_response import GenerationResponse
-from exo.shared.types.worker.runners import (
-    RunnerConnected,
-    RunnerConnecting,
-    RunnerIdle,
-    RunnerLoaded,
-    RunnerLoading,
-    RunnerReady,
-    RunnerRunning,
-    RunnerShutdown,
-    RunnerWarmingUp,
-)
-from exo.utils.channels import mp_channel
-
-from ...constants import (
-    CHAT_COMPLETION_TASK_ID,
-    COMMAND_1_ID,
-    INITIALIZATION_TASK_ID,
-    INSTANCE_1_ID,
-    LOAD_TASK_ID,
-    MODEL_A_ID,
-    NODE_A,
-    RUNNER_1_ID,
-    SHUTDOWN_TASK_ID,
-    WARMUP_TASK_ID,
-)
-from ..conftest import get_bound_mlx_ring_instance
-
-
-def make_nothin[T, U, V](res: T) -> Callable[[], T]:
-    def nothin(*_1: U, **_2: V) -> T:
-        return res
-
-    return nothin
-
-
-nothin = make_nothin(None)
-
-
-INIT_TASK = ConnectToGroup(
-    task_id=INITIALIZATION_TASK_ID,
-    instance_id=INSTANCE_1_ID,
-)
-
-LOAD_TASK = LoadModel(
-    task_id=LOAD_TASK_ID,
-    instance_id=INSTANCE_1_ID,
-)
-
-WARMUP_TASK = StartWarmup(
-    task_id=WARMUP_TASK_ID,
-    instance_id=INSTANCE_1_ID,
-)
-
-SHUTDOWN_TASK = Shutdown(
-    task_id=SHUTDOWN_TASK_ID,
-    instance_id=INSTANCE_1_ID,
-    runner_id=RUNNER_1_ID,
-)
-
-CHAT_PARAMS = ChatCompletionTaskParams(
-    model=str(MODEL_A_ID),
-    messages=[ChatCompletionMessage(role="user", content="hello")],
-    stream=True,
-    max_tokens=4,
-    temperature=0.0,
-)
-
-CHAT_TASK = ChatCompletion(
-    task_id=CHAT_COMPLETION_TASK_ID,
-    command_id=COMMAND_1_ID,
-    task_params=CHAT_PARAMS,
-    instance_id=INSTANCE_1_ID,
-)
-
-
-def assert_events_equal(test_events: Iterable[Event], true_events: Iterable[Event]):
-    for test_event, true_event in zip(test_events, true_events, strict=True):
-        test_event.event_id = true_event.event_id
-        assert test_event == true_event, f"{test_event} != {true_event}"
-
-
-@pytest.fixture
-def patch_out_mlx(monkeypatch: pytest.MonkeyPatch):
-    # initialize_mlx returns a "group" equal to 1
-    monkeypatch.setattr(mlx_runner, "initialize_mlx", make_nothin(1))
-    monkeypatch.setattr(mlx_runner, "load_mlx_items", make_nothin((1, 1, 1)))
-    monkeypatch.setattr(mlx_runner, "warmup_inference", make_nothin(1))
-    monkeypatch.setattr(mlx_runner, "_check_for_debug_prompts", nothin)
-
-    def fake_generate(*_1: object, **_2: object):
-        yield GenerationResponse(token=0, text="hi", finish_reason="stop")
-
-    monkeypatch.setattr(mlx_runner, "mlx_generate", fake_generate)
-
-
-def _run(tasks: Iterable[Task]):
-    bound_instance = get_bound_mlx_ring_instance(
-        instance_id=INSTANCE_1_ID,
-        model_id=MODEL_A_ID,
-        runner_id=RUNNER_1_ID,
-        node_id=NODE_A,
-    )
-
-    task_sender, task_receiver = mp_channel[Task]()
-    event_sender, event_receiver = mp_channel[Event]()
-
-    with task_sender, event_receiver:
-        for t in tasks:
-            task_sender.send(t)
-
-        # worst monkeypatch known to man
-        # this is some c++ nonsense
-        event_sender.close = nothin
-        event_sender.join = nothin
-        task_receiver.close = nothin
-        task_receiver.join = nothin
-
-        mlx_runner.main(bound_instance, event_sender, task_receiver)
-
-        return event_receiver.collect()
-
-
-def test_events_processed_in_correct_order(patch_out_mlx: pytest.MonkeyPatch):
-    events = _run([INIT_TASK, LOAD_TASK, WARMUP_TASK, CHAT_TASK, SHUTDOWN_TASK])
-
-    expected_chunk = ChunkGenerated(
-        command_id=COMMAND_1_ID,
-        chunk=TokenChunk(
-            idx=0,
-            model=MODEL_A_ID,
-            text="hi",
-            token_id=0,
-            finish_reason="stop",
-        ),
-    )
-
-    assert_events_equal(
-        events,
-        [
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerIdle()),
-            TaskStatusUpdated(
-                task_id=INITIALIZATION_TASK_ID, task_status=TaskStatus.Running
-            ),
-            TaskAcknowledged(task_id=INITIALIZATION_TASK_ID),
-            RunnerStatusUpdated(
-                runner_id=RUNNER_1_ID, runner_status=RunnerConnecting()
-            ),
-            TaskStatusUpdated(
-                task_id=INITIALIZATION_TASK_ID, task_status=TaskStatus.Complete
-            ),
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerConnected()),
-            TaskStatusUpdated(task_id=LOAD_TASK_ID, task_status=TaskStatus.Running),
-            TaskAcknowledged(task_id=LOAD_TASK_ID),
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerLoading()),
-            TaskStatusUpdated(task_id=LOAD_TASK_ID, task_status=TaskStatus.Complete),
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerLoaded()),
-            TaskStatusUpdated(task_id=WARMUP_TASK_ID, task_status=TaskStatus.Running),
-            TaskAcknowledged(task_id=WARMUP_TASK_ID),
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerWarmingUp()),
-            TaskStatusUpdated(task_id=WARMUP_TASK_ID, task_status=TaskStatus.Complete),
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerReady()),
-            TaskStatusUpdated(
-                task_id=CHAT_COMPLETION_TASK_ID, task_status=TaskStatus.Running
-            ),
-            TaskAcknowledged(task_id=CHAT_COMPLETION_TASK_ID),
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerRunning()),
-            expected_chunk,
-            TaskStatusUpdated(
-                task_id=CHAT_COMPLETION_TASK_ID, task_status=TaskStatus.Complete
-            ),
-            # CHAT COMPLETION TASK SHOULD COMPLETE BEFORE RUNNER READY
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerReady()),
-            TaskStatusUpdated(task_id=SHUTDOWN_TASK_ID, task_status=TaskStatus.Running),
-            TaskAcknowledged(task_id=SHUTDOWN_TASK_ID),
-            TaskStatusUpdated(
-                task_id=SHUTDOWN_TASK_ID, task_status=TaskStatus.Complete
-            ),
-            # SPECIAL EXCEPTION FOR RUNNER SHUTDOWN
-            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerShutdown()),
-        ],
-    )
--- a/src/exo/worker/tests/unittests/test_runner/test_runner_supervisor.py
+++ b/src/exo/worker/tests/unittests/test_runner/test_runner_supervisor.py
@@ -1 +0,0 @@
-# TODO:
--- a/src/exo/worker/utils/net_profile.py
+++ b/src/exo/worker/utils/net_profile.py
@@ -15,7 +15,7 @@ async def check_reachability(
 ) -> None:
    """Check if a node is reachable at the given IP and verify its identity."""

-    def _fetch_remote_node_id() -> NodeId | None:
+    def _fetch_remote_node_id() -> str | None:
        connection = http.client.HTTPConnection(target_ip, 52415, timeout=1)
        try:
            connection.request("GET", "/node_id")
@@ -29,17 +29,19 @@ async def check_reachability(
            if body.startswith('"') and body.endswith('"') and len(body) >= 2:
                body = body[1:-1]

-            return NodeId(body) or None
+            return body or None
        except OSError:
            return None
        finally:
            connection.close()

-    remote_node_id = await to_thread.run_sync(_fetch_remote_node_id)
-    if remote_node_id is None:
+    remote_node_id_raw = await to_thread.run_sync(_fetch_remote_node_id)
+    if remote_node_id_raw is None:
        return

+    remote_node_id = NodeId(remote_node_id_raw)
    if remote_node_id == self_node_id:
+        # Connected to ourselves via loopback - skip
        return

    if remote_node_id != expected_node_id:
--- a/uv.lock
+++ b/uv.lock
@@ -334,10 +334,8 @@ dependencies = [
    { name = "hypercorn", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "loguru", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "mlx", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
-    { name = "mlx", extra = ["cpu"], marker = "sys_platform == 'linux'" },
    { name = "mlx-lm", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "networkx", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
-    { name = "openai-harmony", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "protobuf", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "psutil", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
    { name = "pydantic", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -376,11 +374,9 @@ requires-dist = [
    { name = "huggingface-hub", specifier = ">=0.33.4" },
    { name = "hypercorn", specifier = ">=0.18.0" },
    { name = "loguru", specifier = ">=0.7.3" },
-    { name = "mlx", marker = "sys_platform == 'darwin'", specifier = ">=0.30.1" },
-    { name = "mlx", extras = ["cpu"], marker = "sys_platform == 'linux'", specifier = ">=0.30.1" },
+    { name = "mlx", specifier = ">=0.30.1" },
    { name = "mlx-lm", specifier = ">=0.28.3" },
    { name = "networkx", specifier = ">=3.5" },
-    { name = "openai-harmony", specifier = ">=0.0.8" },
    { name = "protobuf", specifier = ">=6.32.0" },
    { name = "psutil", specifier = ">=7.0.0" },
    { name = "pydantic", specifier = ">=2.11.7" },
@@ -805,20 +801,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/d4/ff/1e1968f107b4221a98dc26832586b1f646b27ddf3e55c95051c09d751f0a/mlx-0.30.1-cp314-cp314-manylinux_2_35_x86_64.whl", hash = "sha256:d18012d5cf0f013bc4a405cfd1e9d2d28e798f4d2dc4f15aa0fbffff73c02ba2", size = 687114, upload-time = "2025-12-18T01:55:56.506Z" },
 ]

-[package.optional-dependencies]
-cpu = [
-    { name = "mlx-cpu", marker = "sys_platform == 'linux'" },
-]
-
-[[package]]
-name = "mlx-cpu"
-version = "0.30.1"
-source = { registry = "https://pypi.org/simple" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/64/51/32903727a68a61e972383e28a775c1f5e5f0628552c85cbc6103d68c0dc4/mlx_cpu-0.30.1-py3-none-manylinux_2_35_aarch64.whl", hash = "sha256:3f5dc2e4d0849181f8253508bb6a0854250483fc63d43ac79ec614b19824b172", size = 8992394, upload-time = "2025-12-18T00:16:13.696Z" },
-    { url = "https://files.pythonhosted.org/packages/0c/74/69c21bb907f3c4064881ab0653029c939ae15fc4e63a5301ef8643cb1d68/mlx_cpu-0.30.1-py3-none-manylinux_2_35_x86_64.whl", hash = "sha256:c9ea6992d8c001e1123dfd3b4d4405ff576c787eec52656ad405e3d033a8be60", size = 10553055, upload-time = "2025-12-18T00:16:16.104Z" },
-]
-
 [[package]]
 name = "mlx-lm"
 version = "0.28.3"
@@ -964,27 +946,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e2/c1/6dba12fdf68b02a21ac411c9df19afa66bed2540f467150ca64d246b463d/numpy-2.3.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:e1708fac43ef8b419c975926ce1eaf793b0c13b7356cfab6ab0dc34c0a02ac0f", size = 18652691, upload-time = "2025-10-15T16:17:46.247Z" },
 ]

-[[package]]
-name = "openai-harmony"
-version = "0.0.8"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "pydantic", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/3e/92/2d038d096f29179c7c9571b431f9e739f87a487121901725e23fe338dd9d/openai_harmony-0.0.8.tar.gz", hash = "sha256:6e43f98e6c242fa2de6f8ea12eab24af63fa2ed3e89c06341fb9d92632c5cbdf", size = 284777, upload-time = "2025-11-05T19:07:06.727Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/45/c6/2502f416d46be3ec08bb66d696cccffb57781a499e3ff2e4d7c174af4e8f/openai_harmony-0.0.8-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:029ec25ca74abe48fdb58eb9fdd2a8c1618581fc33ce8e5653f8a1ffbfbd9326", size = 2627806, upload-time = "2025-11-05T19:06:57.063Z" },
-    { url = "https://files.pythonhosted.org/packages/d3/d2/ce6953ca87db9cae3e775024184da7d1c5cb88cead19a2d75b42f00a959c/openai_harmony-0.0.8-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e4f709815924ec325b9a890e6ab2bbb0ceec8e319a4e257328eb752cf36b2efc", size = 2948463, upload-time = "2025-11-05T19:06:48.17Z" },
-    { url = "https://files.pythonhosted.org/packages/fa/4c/b553c9651662d6ce102ca7f3629d268b23df1abe5841e24bed81e8a8e949/openai_harmony-0.0.8-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5cfcfd963b50a41fc656c84d3440ca6eecdccd6c552158ce790b8f2e33dfb5a9", size = 2704083, upload-time = "2025-11-05T19:06:50.205Z" },
-    { url = "https://files.pythonhosted.org/packages/9b/af/4eec8f9ab9c27bcdb444460c72cf43011d176fc44c79d6e113094ca1e152/openai_harmony-0.0.8-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0a3a16972aa1cee38ea958470cd04ac9a2d5ac38fdcf77ab686611246220c158", size = 2959765, upload-time = "2025-11-05T19:06:53.62Z" },
-    { url = "https://files.pythonhosted.org/packages/11/3c/33f3374e4624e0e776f6b13b73c45a7ead7f9c4529f8369ed5bfcaa30cac/openai_harmony-0.0.8-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:b4d5cfa168e74d08f8ba6d58a7e49bc7daef4d58951ec69b66b0d56f4927a68d", size = 3427031, upload-time = "2025-11-05T19:06:51.829Z" },
-    { url = "https://files.pythonhosted.org/packages/25/3f/1a192b93bb47c6b44cd98ba8cc1d3d2a9308f1bb700c3017e6352da11bda/openai_harmony-0.0.8-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c007d277218a50db8839e599ed78e0fffe5130f614c3f6d93ae257f282071a29", size = 2953260, upload-time = "2025-11-05T19:06:55.406Z" },
-    { url = "https://files.pythonhosted.org/packages/5b/f8/93b582cad3531797c3db7c2db5400fd841538ccddfd9f5e3df61be99a630/openai_harmony-0.0.8-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:8565d4f5a0638da1bffde29832ed63c9e695c558611053add3b2dc0b56c92dbc", size = 3127044, upload-time = "2025-11-05T19:06:59.553Z" },
-    { url = "https://files.pythonhosted.org/packages/1d/10/4327dbf87f75ae813405fd9a9b4a5cde63d506ffed0a096a440a4cabd89c/openai_harmony-0.0.8-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:cbaa3bda75ef0d8836e1f8cc84af62f971b1d756d740efc95c38c3e04c0bfde2", size = 2932931, upload-time = "2025-11-05T19:07:01.437Z" },
-    { url = "https://files.pythonhosted.org/packages/8a/c8/1774eec4f6f360ef57618fb8f52e3d3af245b2491bd0297513aa09eec04b/openai_harmony-0.0.8-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:772922a9bd24e133950fad71eb1550836f415a88e8c77870e12d0c3bd688ddc2", size = 2996140, upload-time = "2025-11-05T19:07:03.438Z" },
-    { url = "https://files.pythonhosted.org/packages/60/c3/3d1e01e2dba517a91760e4a03e4f20ffc75039a6fe584d0e6f9b5c78fd15/openai_harmony-0.0.8-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:007b0476a1f331f8130783f901f1da6f5a7057af1a4891f1b6a31dec364189b5", size = 3205080, upload-time = "2025-11-05T19:07:05.078Z" },
-]
-
 [[package]]
 name = "packaging"
 version = "25.0"
Author	SHA1	Message	Date
Jake Hillion	305b02eeb6	localhost ring fixes	2025-12-23 23:32:04 +00:00
Jake Hillion	a25fd21b49	tmp: add lots of RING3DBG logging	2025-12-23 22:52:40 +00:00
Jake Hillion	9e0c1ac8c8	placement: generate per-node host lists for MLX ring backend	2025-12-23 22:26:25 +00:00

    Pipeline + MLX Ring worked with 2 nodes but failed to initialize with 3 or more nodes. The MLX ring backend requires each node to know its specific left and right neighbors in the ring, but the previous implementation provided a single flat host list shared by all nodes.

    With 2 nodes, a flat list [host0, host1] accidentally worked because each node could find its only neighbor. With 3+ nodes, each node needs a customized view:
    - Rank 0: [self, right_neighbor, placeholder]
    - Rank 1: [left_neighbor, self, right_neighbor]
    - Rank 2: [placeholder, left_neighbor, self]

    Changed MlxRingInstance from `hosts: list[Host]` to `hosts_by_node: dict[NodeId, list[Host]]` with `ephemeral_port: int`.

    Added `get_mlx_ring_hosts_by_node()` which generates per-node host lists where:
    - Self position uses 0.0.0.0 for local binding
    - Left/right neighbors use actual connection IPs
    - Non-neighbors use 198.51.100.1 (RFC 5737 TEST-NET-2 placeholder)

    Also added IP prioritization (en0 > en1 > non-Thunderbolt > any) to prefer stable network interfaces.
Jake Hillion	6e76212cac	mlx: update to 0.30.1 and align coordinator naming with MLX conventions	2025-12-23 19:28:42 +00:00

    The Jaccl distributed backend requires MLX 0.30.1+, which includes the RDMA over Thunderbolt support. The previous minimum version (0.29.3) would fail at runtime with "The only valid values for backend are 'any', 'mpi' and 'ring' but 'jaccl' was provided."

    Bump MLX dependency to >=0.30.1 and rename ibv_coordinators to jaccl_coordinators to match MLX's naming conventions. This includes the environment variable change from MLX_IBV_COORDINATOR to MLX_JACCL_COORDINATOR.

    Test plan:

    Hardware setup: 3x Mac Studio M3 Ultra connected all-to-all with TB5
    - Built a DMG [0]
    - Installed on all Macs and started cluster.
    - Requested a 2 node Tensor + MLX RDMA instance of Llama 3.3 70B (FP16).
    - It started successfully.
    - Queried the chat a few times. All was good. This didn't work previously.
    - Killed the instance and spawned Pipeline + MLX Ring Llama 3.3 70B (FP16). Also started successfully on two nodes and could be queried.

    Still not working:
    - Pipeline + MLX Ring on 3 nodes is failing. Haven't debugged that yet.

    [0] https://github.com/exo-explore/exo/actions/runs/20467656904/job/58815275013
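
The per-node host lists described in commit 9e0c1ac8c8 can be illustrated with a minimal sketch. This is not the project's `get_mlx_ring_hosts_by_node()`. The `Host` dataclass, the function name `ring_hosts_by_node`, the `wrap` parameter, the single IP per node, and the reuse of one port for every entry are all assumptions made for illustration. By default the sketch reproduces the three-rank layout listed in the commit message literally (neighbors are rank - 1 and rank + 1). Passing `wrap=True` treats the ranks as a closed ring instead. The commit message does not say which of the two the real code uses.

```python
from dataclasses import dataclass

BIND_ALL = "0.0.0.0"             # self position: bind locally
PLACEHOLDER_IP = "198.51.100.1"  # RFC 5737 TEST-NET-2, reserved for documentation


@dataclass(frozen=True)
class Host:
    # Hypothetical stand-in for the project's Host type.
    ip: str
    port: int


def ring_hosts_by_node(
    node_ips: dict[str, str], port: int, wrap: bool = False
) -> dict[str, list[Host]]:
    """Build one host list per node; rank is the dict's insertion order.

    Assumption: each node is reachable at a single IP. The real code may
    pick a different connection IP per pair of nodes.
    """
    node_ids = list(node_ips)
    n = len(node_ids)
    result: dict[str, list[Host]] = {}
    for rank, node_id in enumerate(node_ids):
        if wrap:
            # Closed ring: the last rank is the left neighbor of rank 0.
            neighbors = {(rank - 1) % n, (rank + 1) % n}
        else:
            # Literal layout from the commit message: no wraparound.
            neighbors = {rank - 1, rank + 1}
        hosts: list[Host] = []
        for idx, other_id in enumerate(node_ids):
            if idx == rank:
                hosts.append(Host(BIND_ALL, port))
            elif idx in neighbors:
                hosts.append(Host(node_ips[other_id], port))
            else:
                hosts.append(Host(PLACEHOLDER_IP, port))
        result[node_id] = hosts
    return result


if __name__ == "__main__":
    ips = {"node-a": "10.0.0.1", "node-b": "10.0.0.2", "node-c": "10.0.0.3"}
    for node_id, hosts in ring_hosts_by_node(ips, port=5000).items():
        print(node_id, [h.ip for h in hosts])
```

Every list has the same length and the same rank ordering, so a node's own rank is the index of the `0.0.0.0` entry. Only the addresses differ from node to node.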