add glm-47 and deepseek-v32

docs: add AGENTS.md for AI coding agents guidance (#1132)
## Motivation

Add documentation to help AI coding agents (Claude Code, Cursor, GitHub Copilot, etc.) understand the exo codebase and contribute effectively.

## Changes

- Add `AGENTS.md` with guidance for AI agents working on the codebase
- Add symlink `CLAUDE.md -> AGENTS.md` for backwards compatibility with Claude Code

## Why It Works

`AGENTS.md` is becoming a standard convention for AI agent instructions. The symlink ensures Claude Code (which looks for `CLAUDE.md`) continues to work while supporting the broader `AGENTS.md` convention.

## Test Plan

### Manual Testing

- Verified symlink works correctly

### Automated Testing

- N/A (documentation only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 08:29:21 -05:00 · 2026-01-13 13:24:25 +00:00 · 2026-01-13 13:05:47 +00:00 · 2026-01-13 12:06:04 +00:00 · 2026-01-13 12:42:04 +01:00 · 2026-01-13 12:37:12 +01:00
84 changed files with 5279 additions and 6365 deletions
--- a/.github/benchmark-dashboard/README.md
+++ b/.github/benchmark-dashboard/README.md
@@ -1,159 +0,0 @@
-# EXO Benchmark Dashboard
-
-A fully self-contained, browser-based dashboard for tracking EXO benchmark performance over time.
-
-## Features
-
- 📊 **Success Rate Tracking**: Monitor cluster reliability across commits
- ⚡ **Response Time Analysis**: Track average request completion times  
- 🎯 **Throughput Metrics**: Tokens per second visualization
- 📈 **Request Distribution**: Success/failure breakdown over time
- 🔄 **Auto-Refresh**: Updates every 60 seconds
- 📺 **TV-Ready**: Large, clear visualizations perfect for display
- 🔐 **Secure**: Credentials stored in browser localStorage only
- 🌐 **No Backend**: Directly accesses S3 from the browser
-
-## Quick Start
-
-### Option 1: Direct File Access (Simplest)
-
-Just open the HTML file directly in your browser:
-
-```bash
-open .github/benchmark-dashboard/index.html
-```
-
-Then click "Configure AWS Credentials" and enter your keys.
-
-### Option 2: URL Parameters (For Quick Setup)
-
-```bash
-# Serve with credentials in URL (they'll be moved to localStorage)
-open ".github/benchmark-dashboard/index.html?accessKey=YOUR_KEY&secretKey=YOUR_SECRET&region=us-east-1"
-```
-
-The credentials will be saved to localStorage and removed from the URL immediately.
-
-### Option 3: Simple HTTP Server
-
-```bash
-# From repo root
-python3 -m http.server 8080
-
-# Then open: http://localhost:8080/.github/benchmark-dashboard/
-```
-
-## AWS Credentials
-
-The dashboard needs read-only access to the `exo-benchmark-results` S3 bucket.
-
-### Required IAM Permissions
-
-```json
-{
-  "Version": "2012-10-17",
-  "Statement": [
-    {
-      "Effect": "Allow",
-      "Action": [
-        "s3:GetObject",
-        "s3:ListBucket"
-      ],
-      "Resource": [
-        "arn:aws:s3:::exo-benchmark-results",
-        "arn:aws:s3:::exo-benchmark-results/*"
-      ]
-    }
-  ]
-}
-```
-
-### Security Notes
-
- ✅ Credentials stored in browser `localStorage` only
- ✅ Never sent to any server (except AWS)
- ✅ All S3 access happens client-side
- ✅ Use read-only IAM credentials
- ⚠️ Don't commit credentials to git
- ⚠️ Use a dedicated read-only IAM user
-
-## TV/Kiosk Mode
-
-For permanent display on a TV:
-
-### macOS
-```bash
-open -a "Google Chrome" --args --kiosk ".github/benchmark-dashboard/index.html"
-```
-
-### Linux
-```bash
-chromium-browser --kiosk --app="file://$(pwd)/.github/benchmark-dashboard/index.html"
-```
-
-### Auto-start on Boot
-
-Create a simple startup script:
-
-```bash
-#!/bin/bash
-# /usr/local/bin/start-benchmark-dashboard.sh
-
-cd /path/to/exo
-python3 -m http.server 8080 &
-sleep 2
-chromium-browser --kiosk http://localhost:8080/.github/benchmark-dashboard/
-```
-
-## Data Displayed
-
-### Summary Cards
- **Latest Success Rate**: Most recent benchmark success percentage with trend
- **Avg Response Time**: Latest average response time in ms with trend
- **Total Benchmarks**: Count of all benchmarks run
- **Active Configurations**: Number of unique benchmark configs
-
-### Charts
-1. **Success Rate Over Time**: Line chart showing reliability trends
-2. **Average Response Time**: Performance over time (lower is better)
-3. **Throughput**: Tokens/second metric (higher is better)
-4. **Request Distribution**: Stacked bar chart of successes/failures
-
-## How It Works
-
-1. **Loads AWS SDK**: Uses AWS SDK for JavaScript (browser version)
-2. **Lists S3 Objects**: Fetches all files from `s3://exo-benchmark-results/bench/`
-3. **Downloads Results**: Fetches each JSON result file
-4. **Parses & Visualizes**: Uses Chart.js to create interactive charts
-5. **Auto-Refreshes**: Polls S3 every 60 seconds for new results
-
-## Customization
-
-To modify the dashboard:
-
-1. Edit `index.html` 
-2. Adjust `REFRESH_INTERVAL` for different polling frequency
-3. Modify chart colors/styles in the Chart.js configuration
-4. Add new metrics by extending the results parsing
-
-## Troubleshooting
-
-**"AWS credentials not configured"**
- Click "Configure AWS Credentials" and enter your keys
-
-**"Error loading benchmark data"**
- Check AWS credentials are correct
- Verify S3 bucket name is `exo-benchmark-results`
- Ensure IAM user has read permissions
- Check browser console for detailed errors
-
-**"No benchmark results found"**
- Wait for benchmark workflows to run
- Verify results are being uploaded to S3
- Check S3 bucket has files in `bench/` prefix
-
-**Charts not updating**
- Check browser console for errors
- Verify network connectivity to S3
- Try refreshing the page manually
-
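Note on the removed dashboard README: the summary cards it describes (success rate, throughput, trend) reduce to a few small calculations. The sketch below is only an illustration in Python, not the dashboard's JavaScript. The `BenchmarkRun` record and its field names are hypothetical, since the JSON schema of the stored results is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One benchmark result. Field names are illustrative, not the real schema."""

    commit: str
    total_requests: int
    successful_requests: int
    total_tokens: int
    total_time_s: float


def success_rate(run: BenchmarkRun) -> float:
    """Percentage of requests that succeeded (0.0 when nothing was sent)."""
    if run.total_requests == 0:
        return 0.0
    return 100.0 * run.successful_requests / run.total_requests


def throughput(run: BenchmarkRun) -> float:
    """Generated tokens per second (0.0 when no time was recorded)."""
    if run.total_time_s <= 0:
        return 0.0
    return run.total_tokens / run.total_time_s


def trend(previous: float, latest: float) -> str:
    """Direction of change between the two most recent values."""
    if latest > previous:
        return "up"
    if latest < previous:
        return "down"
    return "flat"


if __name__ == "__main__":
    # Two made-up runs, oldest first.
    runs = [
        BenchmarkRun("aaa111", 10, 8, 800, 20.0),
        BenchmarkRun("bbb222", 10, 10, 1000, 20.0),
    ]
    previous, latest = runs[-2], runs[-1]
    print("success rate:", success_rate(latest), trend(success_rate(previous), success_rate(latest)))
    print("tokens/s:", throughput(latest))
```

For success rate and throughput "up" is the good direction; for response time the same `trend` helper applies with the interpretation reversed.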
--- a/.github/benchmark-dashboard/index.html
+++ b/.github/benchmark-dashboard/index.html
--- a/.github/configs/README.md
+++ b/.github/configs/README.md
@@ -1,186 +0,0 @@
-# EXO Benchmark Configurations
-
-This directory contains configuration files for the EXO staged benchmark system.
-
-## Overview
-
-The staged benchmark system allows you to run complex, multi-stage load tests against EXO clusters. Each stage can have different characteristics:
-
- **Prompt Length**: Number of tokens in the input prompt
- **Generation Length**: Maximum tokens to generate in the response
- **Time Between Requests**: Delay (in seconds) between firing consecutive requests
- **Iterations**: Number of requests to send in this stage
-
-Requests are **fire-and-forget** - they don't wait for the previous request to complete. This allows you to test overlapping request handling and measure success rates under load.
-
-## Configuration Files
-
-### `bench_simple.yaml`
-A minimal configuration that replicates the behavior of the original `bench.py` script:
- Single stage with 1 iteration
- Short prompt (~20 tokens)
- Generates up to 100 tokens
-
-This is useful for quick smoke tests.
-
-### `bench_config.yaml`
-A comprehensive multi-stage benchmark with:
-1. **Warmup** (10 requests): Light load with short prompts
-2. **Medium Load** (20 requests): Moderate load with medium prompts
-3. **Stress Test** (30 requests): Heavy overlapping requests with long prompts
-4. **Cooldown** (5 requests): Light load to wind down
-
-This tests the cluster's behavior under varying load patterns.
-
-## Configuration Schema
-
-```yaml
-# Hardware configuration - maps runner labels to instance counts
-hardware_plan:
-  M3ULTRA_GPU80_512GB: 4
-
-# Environment variables to set on each node (optional)
-environment:
-  OVERRIDE_MEMORY_MB: 512
-
-# Timeout for instance and runner readiness (seconds)
-timeout_seconds: 600
-
-# Model instances to run concurrently
-model_ids:
-  - "mlx-community/Llama-3.2-1B-Instruct-4bit"
-
-# Benchmark stages
-stages:
-  - name: "stage_name"              # Human-readable name for this stage
-    prompt_length: 100               # Target prompt length in tokens
-    generation_length: 200           # Max tokens to generate
-    time_between_requests: 2.0       # Seconds between firing requests
-    iterations: 10                   # Number of requests in this stage
-```
-
-## Running Benchmarks
-
-### Via GitHub Actions
-
-**Automatic (every commit):**
- The **`bench`** workflow runs automatically on every push
- Uses `bench_simple.yaml` as the default configuration
- All settings (hardware plan, timeout, environment variables, models, stages) are defined in the config file
-
-**Manual (on-demand):**
-1. Go to **Actions** → **bench** workflow
-2. Click **Run workflow**
-3. Configure:
-   - **Config File**: Path to your YAML config (default: `.github/configs/bench_simple.yaml`)
-     - `.github/configs/bench_simple.yaml` for quick tests
-     - `.github/configs/bench_config.yaml` for complex multi-stage tests
-   
-All other settings (hardware plan, timeout, environment variables, models, stages) are read from the specified config file.
-
-### Via Command Line
-
-```bash
-# Start EXO on localhost:8000
-uv run exo --api-port 8000
-
-# Run simple benchmark (1 stage, 1 iteration)
-python3 .github/scripts/bench.py \
-  --api-port 8000 \
-  --config .github/configs/bench_simple.yaml \
-  --expected-nodes 1 \
-  --is-primary true \
-  --timeout-seconds 600
-
-# Run complex staged benchmark (4 stages, multiple iterations)
-python3 .github/scripts/bench.py \
-  --api-port 8000 \
-  --config .github/configs/bench_config.yaml \
-  --expected-nodes 1 \
-  --is-primary true \
-  --timeout-seconds 600
-```
-
-## Output Metrics
-
-For each stage, the benchmark reports:
-
- **Total Requests**: Number of requests fired
- **Successful Requests**: Requests that completed successfully
- **Failed Requests**: Requests that encountered errors
- **Success Rate**: Percentage of successful requests
- **Total Tokens**: Sum of all tokens generated across successful requests
- **Avg Tokens/Request**: Average tokens per successful request
- **Avg Time/Request**: Average completion time per successful request
-
-A JSON summary is also printed for easy parsing and storage.
-
-## Creating Custom Benchmarks
-
-To create a custom benchmark:
-
-1. Copy an existing config file (e.g., `bench_config.yaml`)
-2. Modify the stages to match your test scenario
-3. Save it in this directory with a descriptive name
-4. Run it using the workflow or command line
-
-### Example: Sustained Load Test
-
-```yaml
-hardware_plan:
-  M3ULTRA_GPU80_512GB: 2
-
-environment:
-  OVERRIDE_MEMORY_MB: 1024
-
-timeout_seconds: 600
-
-model_ids:
-  - "mlx-community/Llama-3.2-1B-Instruct-4bit"
-
-stages:
-  - name: "sustained_load"
-    prompt_length: 200
-    generation_length: 150
-    time_between_requests: 0.5     # Very fast - 2 requests/second
-    iterations: 100                 # Run for ~50 seconds
-```
-
-### Example: Varying Prompt Sizes
-
-```yaml
-hardware_plan:
-  M4PRO_GPU16_24GB: 3
-
-timeout_seconds: 900
-
-model_ids:
-  - "mlx-community/Llama-3.2-1B-Instruct-4bit"
-
-stages:
-  - name: "tiny_prompts"
-    prompt_length: 10
-    generation_length: 100
-    time_between_requests: 1.0
-    iterations: 10
-    
-  - name: "medium_prompts"
-    prompt_length: 200
-    generation_length: 100
-    time_between_requests: 1.0
-    iterations: 10
-    
-  - name: "large_prompts"
-    prompt_length: 1000
-    generation_length: 100
-    time_between_requests: 1.0
-    iterations: 10
-```
-
-## Tips
-
- **Overlapping Requests**: Set `time_between_requests` < expected completion time to test concurrent request handling
- **Sequential Requests**: Set `time_between_requests` > expected completion time to ensure requests don't overlap
- **Realistic Load**: Model real usage patterns by varying prompt/generation lengths across stages
- **Success Rate**: A 100% success rate indicates the cluster handled the load well; lower rates suggest capacity limits
-
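Note on the removed configs README: the fire-and-forget behaviour it describes can be sketched with `asyncio`. This is a minimal illustration, not the code from `bench.py` (whose contents are not shown in this diff). `run_stage`, `summarize`, and `fake_request` are hypothetical names, and `fake_request` stands in for a real HTTP call.

```python
import asyncio


async def run_stage(send_request, iterations: int, time_between_requests: float) -> list:
    """Fire `iterations` requests, one every `time_between_requests` seconds,
    without waiting for earlier requests to finish."""
    tasks = []
    for i in range(iterations):
        # create_task schedules the request and returns immediately.
        tasks.append(asyncio.create_task(send_request(i)))
        if i < iterations - 1:
            await asyncio.sleep(time_between_requests)
    # Exceptions are returned instead of raised so one failure does not hide the rest.
    return await asyncio.gather(*tasks, return_exceptions=True)


def summarize(results: list) -> dict:
    """Compute the per-stage counts and success rate listed under Output Metrics."""
    total = len(results)
    failed = sum(isinstance(r, Exception) for r in results)
    successful = total - failed
    return {
        "total_requests": total,
        "successful_requests": successful,
        "failed_requests": failed,
        "success_rate": 100.0 * successful / total if total else 0.0,
    }


async def fake_request(i: int) -> int:
    """Stand-in for a chat completion: takes 0.05 s and fails on every fourth call."""
    await asyncio.sleep(0.05)
    if i % 4 == 3:
        raise RuntimeError(f"request {i} failed")
    return 10  # pretend 10 tokens were generated


if __name__ == "__main__":
    # The interval (0.01 s) is shorter than the request time (0.05 s), so requests overlap.
    results = asyncio.run(run_stage(fake_request, iterations=8, time_between_requests=0.01))
    print(summarize(results))
```

Setting `time_between_requests` above the request duration gives the sequential pattern from the Tips section; setting it below gives overlapping requests.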
--- a/.github/configs/bench_config.yaml
+++ b/.github/configs/bench_config.yaml
@@ -1,49 +0,0 @@
-# EXO Staged Benchmark Configuration
-# This configuration defines a multi-stage load test for EXO clusters
-
-# Hardware configuration - maps runner labels to instance counts
-hardware_plan:
-  M3ULTRA_GPU80_512GB: 4
-
-# Environment variables to set on each node (optional)
-environment:
-  OVERRIDE_MEMORY_MB: 512
-
-# Timeout for instance and runner readiness (seconds)
-timeout_seconds: 600
-
-# Multiple instances run concurrently on the cluster
-model_ids:
-  - "mlx-community/Qwen3-0.6B-4bit"
-  - "mlx-community/Qwen3-0.6B-4bit"
-
-# Stages run sequentially, each with its own characteristics
-stages:
-  # Stage 1: Light load with short prompts
-  - name: "warmup"
-    prompt_length: 50          # Number of tokens in prompt
-    generation_length: 100     # Max tokens to generate
-    time_between_requests: 5.0 # Seconds between firing requests
-    iterations: 10             # Number of requests to send in this stage
-    
-  # Stage 2: Medium load with medium prompts
-  - name: "medium_load"
-    prompt_length: 200
-    generation_length: 150
-    time_between_requests: 3.0
-    iterations: 20
-    
-  # Stage 3: Heavy load with long prompts - requests will overlap
-  - name: "stress_test"
-    prompt_length: 500
-    generation_length: 200
-    time_between_requests: 1.0  # Fast firing - will definitely overlap
-    iterations: 30
-    
-  # Stage 4: Cool down with simple prompts
-  - name: "cooldown"
-    prompt_length: 50
-    generation_length: 50
-    time_between_requests: 10.0
-    iterations: 5
-
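Note on the removed `bench_config.yaml`: a rough firing window for each stage can be estimated as `iterations * time_between_requests`. The sketch below applies that approximation to the four stages above. It is an assumption-laden estimate only: it ignores how long each request takes to complete, and `firing_window_s` is a hypothetical helper name.

```python
# Stage values copied from the removed bench_config.yaml.
STAGES = [
    {"name": "warmup", "time_between_requests": 5.0, "iterations": 10},
    {"name": "medium_load", "time_between_requests": 3.0, "iterations": 20},
    {"name": "stress_test", "time_between_requests": 1.0, "iterations": 30},
    {"name": "cooldown", "time_between_requests": 10.0, "iterations": 5},
]


def firing_window_s(stage: dict) -> float:
    """Approximate time spent firing requests in one stage, in seconds."""
    return stage["iterations"] * stage["time_between_requests"]


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage['name']}: ~{firing_window_s(stage):.0f} s")
    total_requests = sum(s["iterations"] for s in STAGES)
    total_window = sum(firing_window_s(s) for s in STAGES)
    print(f"total: {total_requests} requests over ~{total_window:.0f} s")
```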
--- a/.github/configs/bench_simple.yaml
+++ b/.github/configs/bench_simple.yaml
@@ -1,125 +0,0 @@
-# Simple single-shot benchmark
-# Tests 2 instances concurrently on 2 nodes
-
-# Hardware configuration - maps runner labels to instance counts
-hardware_plan:
-  puffin4: 1
-  puffin8: 1
-
-# Environment variables to set on each node
-environment:
-  PLACEHOLDER: "placeholder"
-  # OVERRIDE_MEMORY_MB: 50000
-  MLX_METAL_FAST_SYNCH: 1
-
-# Timeout for instance and runner readiness (seconds)
-timeout_seconds: 1800
-
-# Model instances to run concurrently
-model_ids:
-  # - "mlx-community/DeepSeek-V3.1-8bit"
-  # - "mlx-community/Kimi-K2-Instruct-4bit"
-  - "mlx-community/Kimi-K2-Thinking"
-  # - "mlx-community/Qwen3-235B-A22B-4bit"
-  # - "mlx-community/Llama-3.3-70B-Instruct-4bit"
-  # - "mlx-community/Llama-3.3-70B-Instruct-8bit"
-  # - "mlx-community/Llama-3.2-1B-Instruct-4bit"
-
-# Sharding strategy: "Pipeline" or "Tensor"
-sharding: "Tensor"
-
-# Instance type: "MlxRing" or "MlxIbv"
-instance_meta: "MlxIbv"
-
-# If true, run requests sequentially (no overlap); if false, fire-and-forget (default: false)
-no_overlap: true
-
-# Benchmark stages
-# pp: 64, 256, 1024, 2048, 4096, 8192, 16384
-# g: 64, 512
-stages:
-  # - name: "simple"
-  #   prompt_length: 512
-  #   generation_length: 10
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  # - name: "pp64_g64"
-  #   prompt_length: 64
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  # - name: "pp64_g64"
-  #   prompt_length: 64
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  # - name: "pp64_g512"
-  #   prompt_length: 64
-  #   generation_length: 512
-  #   time_between_requests: 2.0
-  #   iterations: 10
-  # - name: "pp256_g64"
-  #   prompt_length: 256
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  - name: "pp256_g64"
-    prompt_length: 256
-    generation_length: 64
-    time_between_requests: 2.0
-    iterations: 5
-  # - name: "pp256_g512"
-  #   prompt_length: 256
-  #   generation_length: 512
-  #   time_between_requests: 2.0
-  #   iterations: 10
-  # - name: "pp1024_g64"
-  #   prompt_length: 1024
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  # - name: "pp1024_g512"
-  #   prompt_length: 1024
-  #   generation_length: 512
-  #   time_between_requests: 2.0
-  #   iterations: 10
-  # - name: "pp2048_g64"
-  #   prompt_length: 2048
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  # - name: "pp2048_g512"
-  #   prompt_length: 2048
-  #   generation_length: 512
-  #   time_between_requests: 2.0
-  #   iterations: 10
-  # - name: "pp4096_g64"
-  #   prompt_length: 4096
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 4
-  # - name: "pp4096_g512"
-  #   prompt_length: 4096
-  #   generation_length: 512
-  #   time_between_requests: 2.0
-  #   iterations: 10
-  # - name: "pp8192_g64"
-  #   prompt_length: 8192
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  # - name: "pp8192_g512"
-  #   prompt_length: 8192
-  #   generation_length: 512
-  #   time_between_requests: 2.0
-  #   iterations: 5
-  # - name: "pp16384_g64"
-  #   prompt_length: 16384
-  #   generation_length: 64
-  #   time_between_requests: 2.0
-  #   iterations: 10
-  # - name: "pp16384_g512"
-  #   prompt_length: 16384
-  #   generation_length: 512
-  #   time_between_requests: 2.0
-  #   iterations: 10
--- a/.github/scripts/bench.py
+++ b/.github/scripts/bench.py
--- a/.github/scripts/build_matrix.py
+++ b/.github/scripts/build_matrix.py
@@ -1,70 +0,0 @@
-#!/usr/bin/env python3
-import json
-import os
-from typing import NotRequired, TypedDict, cast
-
-import yaml
-
-
-class MatrixEntry(TypedDict):
-    label: str
-    index: int
-
-
-class MatrixInclude(TypedDict):
-    label: str
-    index: int
-    is_primary: bool
-    expected_nodes: int
-
-
-class Config(TypedDict):
-    hardware_plan: dict[str, int]
-    timeout_seconds: NotRequired[int]
-    environment: NotRequired[dict[str, str]]
-
-
-# Read the config file
-config_file: str = os.environ["CONFIG_FILE"]
-with open(config_file, "r") as f:
-    config: Config = cast(Config, yaml.safe_load(f))
-
-# Extract hardware plan from config
-plan: dict[str, int] = config["hardware_plan"]
-if not plan:
-    raise ValueError(f"No hardware_plan found in {config_file}")
-
-# Build matrix entries
-entries: list[MatrixEntry] = []
-for label, count in plan.items():
-    for idx in range(count):
-        entries.append({"label": label, "index": idx})
-
-total_nodes: int = len(entries)
-matrix: dict[str, list[MatrixInclude]] = {
-    "include": [
-        {
-            "label": e["label"],
-            "index": e["index"],
-            "is_primary": (i == 0),
-            "expected_nodes": total_nodes,
-        }
-        for i, e in enumerate(entries)
-    ]
-}
-
-# Extract other config values
-timeout_seconds: int = config.get("timeout_seconds", 600)
-environment: dict[str, str] = config.get("environment", {})
-
-# Output to GitHub Actions
-with open(os.environ["GITHUB_OUTPUT"], "a") as f:
-    f.write(f"matrix={json.dumps(matrix)}\n")
-    f.write(f"config_file={config_file}\n")
-    f.write(f"timeout_seconds={timeout_seconds}\n")
-    f.write(f"environment={json.dumps(environment)}\n")
-
-print(f"Matrix: {json.dumps(matrix)}")
-print(f"Config file: {config_file}")
-print(f"Timeout: {timeout_seconds}")
-print(f"Environment: {json.dumps(environment)}")
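Note on the removed `build_matrix.py`: the matrix expansion is easier to see when wrapped in a function and run on a sample plan. The sketch below follows the same logic as the script above but drops the `CONFIG_FILE` and `GITHUB_OUTPUT` handling; the function name `build_matrix` is introduced here for illustration and does not appear in the original.

```python
import json


def build_matrix(plan: dict[str, int]) -> dict[str, list[dict]]:
    """Expand {label: count} into one matrix entry per node.

    The first entry is marked primary and every entry carries the total node count.
    """
    entries = [
        {"label": label, "index": idx}
        for label, count in plan.items()
        for idx in range(count)
    ]
    total_nodes = len(entries)
    return {
        "include": [
            {**entry, "is_primary": i == 0, "expected_nodes": total_nodes}
            for i, entry in enumerate(entries)
        ]
    }


if __name__ == "__main__":
    # Sample plan taken from the hardware plan example in BENCH_USAGE.md.
    matrix = build_matrix({"M4PRO_GPU16_24GB": 2, "M3ULTRA_GPU80_512GB": 1})
    print(f"matrix={json.dumps(matrix)}")
```

With that plan the result has three entries, indices restart at 0 for each label, and only the first entry has `is_primary` set.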
--- a/.github/workflows/BENCH_USAGE.md
+++ b/.github/workflows/BENCH_USAGE.md
@@ -1,156 +0,0 @@
-# Benchmark Workflow Usage
-
-## Overview
-
-The `bench_matrix.yml` workflow enables distributed benchmarking of models across multiple self-hosted macOS runners with different hardware configurations.
-
-## Workflow Inputs
-
-| Input | Description | Default | Required |
-|-------|-------------|---------|----------|
-| `model_id` | Model ID to benchmark | `mlx-community/Llama-3.2-1B-Instruct-4bit` | Yes |
-| `hardware_plan` | JSON mapping of runner labels to counts | `{"M4PRO_GPU16_24GB": 1}` | Yes |
-| `prompt` | Benchmark prompt text | `What is the capital of France?` | No |
-| `timeout_seconds` | Timeout for instance/runner readiness | `600` | No |
-
-## Hardware Plan Format
-
-The `hardware_plan` input is a JSON object mapping runner labels to the number of machines:
-
-```json
-{
-  "M4PRO_GPU16_24GB": 2,
-  "M3ULTRA_GPU80_512GB": 1
-}
-```
-
-This example would:
- Start 2 runners with the `M4PRO_GPU16_24GB` label
- Start 1 runner with the `M3ULTRA_GPU80_512GB` label
- Total of 3 runners coordinating on a single distributed inference instance
-
-## How It Works
-
-1. **Planning Job** (`plan`)
-   - Runs on `ubuntu-latest`
-   - Parses the `hardware_plan` JSON
-   - Generates a dynamic matrix with one entry per runner
-   - Only the first runner (index 0) is marked as `is_primary`
-
-2. **Benchmark Worker Jobs** (`bench_worker`)
-   - Each job runs on a self-hosted macOS runner with the specified label
-   - All runners start EXO in parallel
-   - The primary runner creates the model instance
-   - All runners wait for their assigned runner to be ready (Loaded/Running status)
-   - The primary runner executes the benchmark and prints results
-   - The primary runner deletes the instance
-
-## Example Usage
-
-### Single Machine Benchmark
-
-```yaml
-model_id: mlx-community/Llama-3.2-1B-Instruct-4bit
-hardware_plan: '{"M4PRO_GPU16_24GB": 1}'
-prompt: What is the capital of France?
-timeout_seconds: 600
-```
-
-### Multi-Machine Distributed Benchmark
-
-```yaml
-model_id: mlx-community/Llama-3.2-3B-Instruct-4bit
-hardware_plan: '{"M4PRO_GPU16_24GB": 2, "M3ULTRA_GPU80_512GB": 1}'
-prompt: Explain quantum computing in simple terms.
-timeout_seconds: 900
-```
-
-## Benchmark Output
-
-The primary runner outputs a JSON object with benchmark results:
-
-```json
-{
-  "model_id": "mlx-community/Llama-3.2-1B-Instruct-4bit",
-  "instance_id": "abc-123-def",
-  "tokens": 42,
-  "elapsed_s": 2.451,
-  "tps": 17.136
-}
-```
-
-Where:
- `tokens`: Number of chunks/tokens generated
- `elapsed_s`: Total elapsed time in seconds
- `tps`: Tokens per second (tokens / elapsed_s)
-
-## Runner Requirements
-
-Each self-hosted runner must:
- Be labeled with appropriate hardware tags (e.g., `M4PRO_GPU16_24GB`)
- Have the `self-hosted` and `macOS` labels
- Have Nix installed with flakes enabled
- Have network connectivity to other runners in the same job
-
-## Architecture
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ GitHub Actions Workflow (bench_matrix.yml)                  │
-├─────────────────────────────────────────────────────────────┤
-│                                                              │
-│  ┌────────────────┐                                         │
-│  │  Plan Job      │                                         │
-│  │  (ubuntu)      │──┬─► Matrix: [{label, index, primary}] │
-│  └────────────────┘  │                                      │
-│                      │                                      │
-│  ┌───────────────────▼──────────────────────────────────┐  │
-│  │  Bench Worker Jobs (Matrix)                         │  │
-│  ├──────────────────────────────────────────────────────┤  │
-│  │                                                       │  │
-│  │  Runner 0 (Primary)     Runner 1         Runner 2    │  │
-│  │  ┌─────────────┐       ┌─────────────┐ ┌──────────┐ │  │
-│  │  │ Start EXO   │       │ Start EXO   │ │ Start EXO│ │  │
-│  │  │ Create Inst │       │ Wait...     │ │ Wait...  │ │  │
-│  │  │ Wait Ready  │       │ Wait Ready  │ │ Wait...  │ │  │
-│  │  │ Run Bench   │       │ (idle)      │ │ (idle)   │ │  │
-│  │  │ Print TPS   │       │             │ │          │ │  │
-│  │  │ Delete Inst │       │             │ │          │ │  │
-│  │  └─────────────┘       └─────────────┘ └──────────┘ │  │
-│  └───────────────────────────────────────────────────────┘  │
-└─────────────────────────────────────────────────────────────┘
-```
-
-## Implementation Details
-
-### `scripts/bench.py`
-
-A standalone Python script that:
- Creates instance (primary only)
- Polls `/state` endpoint until instance and all runners are ready
- Executes chat completion with timing (primary only)
- Parses SSE stream and counts tokens
- Computes TPS metrics
- Cleans up instance (primary only)
-
-### Key Functions
-
- `wait_for_instance()`: Polls until instance with model_id appears
- `wait_for_runners_ready()`: Polls until expected number of runners reach Loaded/Running status
- `run_benchmark()`: Executes chat completion, measures time, counts tokens
-
-## Troubleshooting
-
-### Instance never becomes ready
- Check EXO logs in the workflow output
- Verify model_id is valid and accessible
- Increase `timeout_seconds`
-
-### Runner mismatch
- Ensure hardware_plan counts match available labeled runners
- Check runner labels match exactly (case-sensitive)
-
-### Network issues
- Verify runners can communicate on the network
- Check firewall rules between runner hosts
-
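Note on the removed `BENCH_USAGE.md`: the "parse SSE stream, count tokens, compute TPS" steps can be sketched as below. This is not the code from `scripts/bench.py`, which is not shown in this diff. The chunk shape (`choices[0].delta.content`, terminated by `data: [DONE]`) is an assumption based on the common OpenAI-style streaming format, and both function names are hypothetical.

```python
import json


def count_sse_chunks(lines) -> int:
    """Count streamed chunks that carry non-empty content.

    Assumes OpenAI-style events: 'data: {json}' lines ending with 'data: [DONE]'.
    """
    count = 0
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta", {})
        if delta.get("content"):
            count += 1
    return count


def tokens_per_second(tokens: int, elapsed_s: float) -> float:
    """TPS as defined in the README: tokens / elapsed_s, rounded to 3 places."""
    return round(tokens / elapsed_s, 3) if elapsed_s > 0 else 0.0


if __name__ == "__main__":
    stream = [
        'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "Paris"}}]}',
        'data: {"choices": [{"delta": {"content": " is"}}]}',
        'data: {"choices": [{"delta": {"content": " the capital."}}]}',
        "data: [DONE]",
    ]
    print("chunks:", count_sse_chunks(stream))
    # Figures from the sample output in the README above.
    print("tps:", tokens_per_second(42, 2.451))
```

As the README notes, the count is of chunks, which only equals the token count when the server emits one token per chunk.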
--- a/.github/workflows/bench.yml
+++ b/.github/workflows/bench.yml
@@ -1,305 +0,0 @@
-name: bench
-
-on: [push]
-
-jobs:
-  plan:
-    if: contains(github.event.head_commit.message, '/bench')
-    runs-on: ubuntu-latest
-    outputs:
-      matrix: ${{ steps.build.outputs.matrix }}
-      config_file: ${{ steps.build.outputs.config_file }}
-      timeout_seconds: ${{ steps.build.outputs.timeout_seconds }}
-      environment: ${{ steps.build.outputs.environment }}
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - name: Build matrix from config file
-        id: build
-        shell: bash
-        run: |
-          set -euo pipefail
-          CONFIG_FILE='.github/configs/bench_simple.yaml'
-          export CONFIG_FILE
-          echo "Config file: $CONFIG_FILE"
-          python3 .github/scripts/build_matrix.py
-
-  bench_worker:
-    needs: plan
-    strategy:
-      fail-fast: false
-      matrix: ${{ fromJSON(needs.plan.outputs.matrix) }}
-    name: "bench on ${{ matrix.label }} [${{ matrix.index }}]"
-    runs-on: [self-hosted, macOS, "${{ matrix.label }}"]
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-        with:
-          lfs: false
-
-      - name: Configure git user
-        run: |
-          git config --local user.email "github-actions@users.noreply.github.com"
-          git config --local user.name  "github-actions bot"
-        shell: bash
-
-      # TODO: this is mega hacky and I'd like a simpler solution.
-      - name: Setup Nix Environment
-        run: |
-          echo "Checking for nix installation..."
-          
-          # Check if nix is already available
-          if command -v nix >/dev/null 2>&1; then
-            echo "Nix already in PATH"
-          # Try sourcing profile scripts to set up environment properly
-          elif [ -f /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh ]; then
-            echo "Sourcing multi-user nix-daemon profile script"
-            source /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh
-          elif [ -f "$HOME/.nix-profile/etc/profile.d/nix.sh" ]; then
-            echo "Sourcing single-user nix profile script"
-            source "$HOME/.nix-profile/etc/profile.d/nix.sh"
-          elif [ -f /nix/var/nix/profiles/per-user/$USER/profile/etc/profile.d/nix.sh ]; then
-            echo "Sourcing per-user nix profile script"
-            source /nix/var/nix/profiles/per-user/$USER/profile/etc/profile.d/nix.sh
-          elif [ -f /etc/profile.d/nix.sh ]; then
-            echo "Sourcing system-wide nix profile script"
-            source /etc/profile.d/nix.sh
-          # Fallback: manually add nix to PATH if binary exists
-          elif [ -f /nix/var/nix/profiles/default/bin/nix ]; then
-            echo "Found nix binary, manually adding to PATH"
-            export PATH="/nix/var/nix/profiles/default/bin:$PATH"
-          elif [ -f "$HOME/.nix-profile/bin/nix" ]; then
-            echo "Found nix binary in user profile, manually adding to PATH"
-            export PATH="$HOME/.nix-profile/bin:$PATH"
-          else
-            echo "Nix not found. Debugging info:"
-            echo "USER: $USER"
-            echo "HOME: $HOME"
-            echo "Current PATH: $PATH"
-            echo ""
-            echo "Checking common Nix locations:"
-            echo "  /nix/var/nix/profiles/default/bin/nix:"
-            ls -la /nix/var/nix/profiles/default/bin/nix 2>/dev/null || echo "    Not found"
-            echo "  /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh:"
-            ls -la /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh 2>/dev/null || echo "    Not found"
-            echo "  ~/.nix-profile/etc/profile.d/nix.sh:"
-            ls -la "$HOME/.nix-profile/etc/profile.d/nix.sh" 2>/dev/null || echo "    Not found"
-            echo "  /nix/var/nix/profiles/per-user/$USER/profile/etc/profile.d/nix.sh:"
-            ls -la "/nix/var/nix/profiles/per-user/$USER/profile/etc/profile.d/nix.sh" 2>/dev/null || echo "    Not found"
-            echo ""
-            echo "/nix directory structure:"
-            ls -la /nix 2>/dev/null || echo "    /nix directory not found"
-            echo ""
-            echo "/nix/var:"
-            ls -la /nix/var 2>/dev/null || echo "    /nix/var not found"
-            echo ""
-            echo "/nix/store:"
-            ls -la /nix/store 2>/dev/null | head -20 || echo "    /nix/store not found"
-            echo ""
-            echo "GitHub Actions runner is running as user '$USER'."
-            echo "If Nix is installed for a different user, either:"
-            echo "  1. Install Nix for user '$USER' (multi-user install recommended)"
-            echo "  2. Configure the runner service to run as the user with Nix installed"
-            echo "  3. Ensure Nix is installed system-wide with proper daemon setup"
-            exit 1
-          fi
-          
-          # Verify nix is available and persist to GITHUB_ENV
-          if command -v nix >/dev/null 2>&1; then
-            echo "✓ Nix is available"
-            nix --version
-            echo "PATH=$PATH" >> $GITHUB_ENV
-            if [ -n "$NIX_PATH" ]; then
-              echo "NIX_PATH=$NIX_PATH" >> $GITHUB_ENV
-            fi
-          else
-            echo "ERROR: Failed to set up Nix"
-            echo "PATH after setup attempt: $PATH"
-            exit 1
-          fi
-        shell: bash
-
-      - name: Setup EXO_HOME and API_PORT
-        run: |
-          EXO_HOME=$(mktemp -d -t exo-e2e-XXXXXXXX)
-          API_PORT=$((49152 + RANDOM % (65535 - 49152 + 1)))
-          EXO_MODELS_DIR="$HOME/.exo/models"
-          EXO_LIBP2P_NAMESPACE="bench-${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
-          echo "EXO_HOME=$EXO_HOME" >> "$GITHUB_ENV"
-          echo "API_PORT=$API_PORT" >> "$GITHUB_ENV"
-          echo "EXO_MODELS_DIR=$EXO_MODELS_DIR" >> "$GITHUB_ENV"
-          echo "EXO_LIBP2P_NAMESPACE=$EXO_LIBP2P_NAMESPACE" >> "$GITHUB_ENV"
-          echo "Created EXO_HOME: $EXO_HOME"
-          echo "Generated API_PORT: $API_PORT"
-          echo "Using models from: $EXO_MODELS_DIR"
-          echo "Using libp2p namespace: $EXO_LIBP2P_NAMESPACE"
-        shell: bash
-
-      - name: Configure local MLX if available
-        run: |
-          echo "=== DEBUG: Checking for local MLX configuration ==="
-          MODIFIED=false
-          
-          echo "Checking for /Users/Shared/mlx directory..."
-          if [ -d "/Users/Shared/mlx" ]; then
-            echo "✓ Found /Users/Shared/mlx"
-            ls -la /Users/Shared/mlx | head -5
-            echo "Enabling local mlx path in pyproject.toml"
-            sed -i.bak 's|^# mlx = { path = "/Users/Shared/mlx", editable=true }$|mlx = { path = "/Users/Shared/mlx", editable=true }|' pyproject.toml
-            MODIFIED=true
-          else
-            echo "✗ /Users/Shared/mlx not found, will use PyPI version"
-          fi
-          
-          echo "Checking for /Users/Shared/mlx-lm directory..."
-          if [ -d "/Users/Shared/mlx-lm" ]; then
-            echo "✓ Found /Users/Shared/mlx-lm"
-            ls -la /Users/Shared/mlx-lm | head -5
-            echo "Enabling local mlx-lm path in pyproject.toml"
-            sed -i.bak 's|^# mlx-lm = { path = "/Users/Shared/mlx-lm", editable=true }$|mlx-lm = { path = "/Users/Shared/mlx-lm", editable=true }|' pyproject.toml
-            MODIFIED=true
-          else
-            echo "✗ /Users/Shared/mlx-lm not found, will use PyPI version"
-          fi
-          
-          if [ "$MODIFIED" = true ]; then
-            echo "=== Modified pyproject.toml [tool.uv.sources] section: ==="
-            sed -n '/\[tool\.uv\.sources\]/,/^\[/{/^\[tool\.uv\.sources\]/p; /^\[/!p;}' pyproject.toml
-            echo "=== Regenerating uv.lock with local MLX paths... ==="
-            nix --extra-experimental-features nix-command --extra-experimental-features flakes develop --command uv lock --upgrade-package mlx --upgrade-package mlx-lm
-            echo "✓ Lock file regenerated"
-          else
-            echo "⚠ No local MLX directories found, using PyPI packages"
-          fi
-          echo "=== DEBUG: Local MLX configuration complete ==="
-        shell: bash
-
-      - name: Sync dependencies
-        run: |
-          if [ -d "/Users/Shared/test" ]; then
-            pushd /Users/Shared/test
-            uv sync --reinstall
-            popd
-          fi
-          echo "Running just sync to ensure clean dependencies..."
-          nix --extra-experimental-features nix-command --extra-experimental-features flakes develop --command just sync
-        shell: bash
-
-      - name: Start EXO and run bench script
-        shell: bash
-        env:
-          IS_PRIMARY: ${{ matrix.is_primary }}
-          EXPECTED_NODES: ${{ matrix.expected_nodes }}
-          HARDWARE_LABEL: ${{ matrix.label }}
-          CONFIG_FILE: ${{ needs.plan.outputs.config_file }}
-          TIMEOUT_SECONDS: ${{ needs.plan.outputs.timeout_seconds }}
-          ENVIRONMENT_JSON: ${{ needs.plan.outputs.environment }}
-        run: |
-          set -euo pipefail
-
-          # Parse environment variables from config
-          ENV_VARS=""
-          if [ -n "$ENVIRONMENT_JSON" ] && [ "$ENVIRONMENT_JSON" != "{}" ]; then
-            ENV_VARS=$(echo "$ENVIRONMENT_JSON" | python3 -c "import sys, json; env = json.load(sys.stdin); print(' '.join([f'{k}={v}' for k, v in env.items()]))")
-          fi
-
-          echo "Starting EXO with API_PORT=${API_PORT} EXO_HOME=${EXO_HOME} EXO_LIBP2P_NAMESPACE=${EXO_LIBP2P_NAMESPACE}"
-          echo "Environment variables from config: $ENV_VARS"
-          LOG_FILE=/tmp/exo.log
-          : > "$LOG_FILE"
-
-          MASTER_FLAG=""
-          if [ "$IS_PRIMARY" = "true" ]; then
-            MASTER_FLAG="-m"
-          fi
-
-          nix --extra-experimental-features nix-command --extra-experimental-features flakes develop --command bash -c \
-            "EXO_HOME=$EXO_HOME EXO_MODELS_DIR=$EXO_MODELS_DIR EXO_LIBP2P_NAMESPACE=$EXO_LIBP2P_NAMESPACE $ENV_VARS PYTHONUNBUFFERED=1 PYTHONDEBUG=1 PYTHONPATH=. uv run exo $MASTER_FLAG --api-port $API_PORT" \
-            >> "$LOG_FILE" 2>&1 &
-
-          EXO_PID=$!
-          echo "Started EXO in background with PID: $EXO_PID"
-          echo "Log file: $LOG_FILE"
-
-          cleanup() {
-            echo '=== EXO log (tail) ==='
-            tail -n 300 "$LOG_FILE" || true
-            if ps -p "$EXO_PID" >/dev/null 2>&1; then
-              echo "Killing EXO (PID $EXO_PID)"
-              kill "$EXO_PID" || true
-            fi
-          }
-          trap cleanup EXIT
-
-          for i in $(seq 1 60); do
-            if curl -s "http://localhost:${API_PORT}/state" >/dev/null 2>&1; then
-              echo "EXO API ready"
-              break
-            fi
-            if ! ps -p "$EXO_PID" >/dev/null 2>&1; then
-              echo "EXO terminated early"; sed -n '1,200p' "$LOG_FILE" || true; exit 1
-            fi
-            sleep 1
-          done
-
-          RESULTS_FILE="/tmp/bench_results_${GITHUB_RUN_ID}_${GITHUB_RUN_ATTEMPT}_$(date +%s).json"
-          echo "Results will be saved to: $RESULTS_FILE"
-          echo "RESULTS_FILE=$RESULTS_FILE" >> "$GITHUB_ENV"
-
-          echo "Running bench script with config: $CONFIG_FILE, timeout: $TIMEOUT_SECONDS"
-          nix --extra-experimental-features nix-command --extra-experimental-features flakes develop --command bash -c \
-            "PYTHONUNBUFFERED=1 uv run --no-project --with pyyaml --with pydantic python .github/scripts/bench.py \
-              --api-port $API_PORT \
-              --config $CONFIG_FILE \
-              --expected-nodes ${EXPECTED_NODES} \
-              --is-primary ${IS_PRIMARY} \
-              --timeout-seconds ${TIMEOUT_SECONDS} \
-              --output $RESULTS_FILE \
-              --git-commit ${GITHUB_SHA} \
-              --hardware-labels ${HARDWARE_LABEL}"
-
-      - name: Install AWS CLI
-        if: always() && env.RESULTS_FILE && matrix.is_primary
-        run: |
-          if ! command -v aws &> /dev/null; then
-            echo "AWS CLI not found, installing..."
-            brew install awscli
-          else
-            echo "AWS CLI already installed"
-          fi
-        shell: bash
-
-      - name: Upload results to S3
-        if: always() && env.RESULTS_FILE && matrix.is_primary
-        env:
-          AWS_ACCESS_KEY_ID: ${{ secrets.S3_BENCHMARKS_AWS_ACCESS_KEY_ID }}
-          AWS_SECRET_ACCESS_KEY: ${{ secrets.S3_BENCHMARKS_AWS_SECRET_ACCESS_KEY }}
-          AWS_DEFAULT_REGION: us-east-1
-        run: |
-          echo "Checking for results file: $RESULTS_FILE"
-          echo "Is primary: ${{ matrix.is_primary }}"
-
-          if [ -f "$RESULTS_FILE" ]; then
-            TIMESTAMP=$(date -u +%Y/%m/%d/%H%M%S)
-            S3_KEY="bench/${TIMESTAMP}_${GITHUB_SHA:0:8}_${GITHUB_RUN_ID}.json"
-            echo "Uploading results to s3://exo-benchmark-results/$S3_KEY"
-
-            aws s3 cp "$RESULTS_FILE" "s3://exo-benchmark-results/$S3_KEY" \
-              --content-type application/json \
-              --metadata "commit=${GITHUB_SHA},run_id=${GITHUB_RUN_ID},branch=${GITHUB_REF_NAME}"
-
-            echo "Results uploaded successfully"
-            echo "View at: https://exo-benchmark-results.s3.amazonaws.com/$S3_KEY"
-          else
-            echo "Results file not found at: $RESULTS_FILE"
-            echo "Skipping upload"
-          fi
-        shell: bash
-
-      - name: Cleanup EXO_HOME
-        run: |
-          echo "Cleaning up EXO_HOME: $EXO_HOME"
-          rm -rf "$EXO_HOME"
-        shell: bash
-        if: always()
--- a/.github/workflows/build-app.yml
+++ b/.github/workflows/build-app.yml
@@ -1,6 +1,7 @@
 name: Build EXO macOS DMG

 on:
+  workflow_dispatch:
  push:
    tags:
      - "v*"
@@ -18,6 +19,7 @@ jobs:
      SPARKLE_ED25519_PRIVATE: ${{ secrets.SPARKLE_ED25519_PRIVATE }}
      SPARKLE_S3_BUCKET: ${{ secrets.SPARKLE_S3_BUCKET }}
      SPARKLE_S3_PREFIX: ${{ secrets.SPARKLE_S3_PREFIX }}
+      EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT: ${{ secrets.EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT }}
      AWS_REGION: ${{ secrets.AWS_REGION }}
      EXO_BUILD_NUMBER: ${{ github.run_number }}
      EXO_LIBP2P_NAMESPACE: ${{ github.ref_name }}
@@ -34,7 +36,7 @@ jobs:

      - name: Derive release version from tag
        run: |
-          if [[ "$GITHUB_REF_NAME" == "test-app" ]]; then
+          if [[ "$GITHUB_REF_NAME" == "test-app" || "${{ github.event_name }}" == "workflow_dispatch" ]]; then
            VERSION="0.0.0-alpha.0"
            echo "IS_ALPHA=true" >> $GITHUB_ENV
          else
@@ -47,6 +49,32 @@ jobs:
          fi
          echo "RELEASE_VERSION=$VERSION" >> $GITHUB_ENV

+      - name: Compute build version from semver
+        run: |
+          VERSION="$RELEASE_VERSION"
+          # Extract major.minor.patch (strip prerelease suffix)
+          BASE_VERSION="${VERSION%%-*}"
+          MAJOR=$(echo "$BASE_VERSION" | cut -d. -f1)
+          MINOR=$(echo "$BASE_VERSION" | cut -d. -f2)
+          PATCH=$(echo "$BASE_VERSION" | cut -d. -f3)
+
+          # Extract prerelease number (e.g., "alpha.2" -> 2, or 999 for releases)
+          if [[ "$VERSION" == *-* ]]; then
+            PRERELEASE_PART="${VERSION#*-}"
+            PRERELEASE_NUM="${PRERELEASE_PART##*.}"
+            # Default to 0 if not a number
+            if ! [[ "$PRERELEASE_NUM" =~ ^[0-9]+$ ]]; then
+              PRERELEASE_NUM=0
+            fi
+          else
+            PRERELEASE_NUM=999
+          fi
+
+          # Compute: PRERELEASE + (1000 * PATCH) + (1_000_000 * MINOR) + (1_000_000_000 * MAJOR)
+          BUILD_VERSION=$((PRERELEASE_NUM + 1000 * PATCH + 1000000 * MINOR + 1000000000 * MAJOR))
+          echo "EXO_BUILD_VERSION=$BUILD_VERSION" >> $GITHUB_ENV
+          echo "Computed build version: $BUILD_VERSION from $VERSION"
+
      - name: Ensure tag commit is on main
        if: github.ref_type == 'tag'
        run: |
@@ -162,11 +190,12 @@ jobs:
            -configuration Release \
            -derivedDataPath build \
            MARKETING_VERSION="$RELEASE_VERSION" \
-            CURRENT_PROJECT_VERSION="$EXO_BUILD_NUMBER" \
+            CURRENT_PROJECT_VERSION="$EXO_BUILD_VERSION" \
            EXO_BUILD_TAG="$RELEASE_VERSION" \
            EXO_BUILD_COMMIT="$GITHUB_SHA" \
            SPARKLE_FEED_URL="$SPARKLE_FEED_URL" \
            SPARKLE_ED25519_PUBLIC="$SPARKLE_ED25519_PUBLIC" \
+            EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT="$EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT" \
            CODE_SIGNING_IDENTITY="$SIGNING_IDENTITY" \
            CODE_SIGN_INJECT_BASE_ENTITLEMENTS=YES
          mkdir -p ../../output
@@ -294,5 +323,5 @@ jobs:
          aws s3 cp "$DMG_NAME" "s3://${SPARKLE_S3_BUCKET}/${PREFIX}${DMG_NAME}"
          if [[ "$IS_ALPHA" != "true" ]]; then
            aws s3 cp "$DMG_NAME" "s3://${SPARKLE_S3_BUCKET}/${PREFIX}EXO-latest.dmg"
+            aws s3 cp appcast.xml "s3://${SPARKLE_S3_BUCKET}/${PREFIX}appcast.xml" --content-type application/xml --cache-control no-cache
          fi
-          aws s3 cp appcast.xml "s3://${SPARKLE_S3_BUCKET}/${PREFIX}appcast.xml" --content-type application/xml --cache-control no-cache
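The "Compute build version from semver" step added to build-app.yml above packs a semver string into a single monotonically comparable integer. The following is a minimal Python sketch of the same arithmetic, for illustration only; the function name `compute_build_version` is hypothetical and does not exist in the repository.

```python
import re


def compute_build_version(version: str) -> int:
    """Sketch of the workflow's bash arithmetic (not project code)."""
    # Split "1.2.3-alpha.2" into "1.2.3" and "alpha.2" at the first hyphen.
    base, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in base.split("."))

    if "-" in version:
        # Take whatever follows the last dot of the prerelease suffix.
        last = prerelease.rsplit(".", 1)[-1]
        # Fall back to 0 when that piece is not purely numeric.
        prerelease_num = int(last) if re.fullmatch(r"[0-9]+", last) else 0
    else:
        # Full releases use 999 so they sort above their prereleases.
        prerelease_num = 999

    return prerelease_num + 1000 * patch + 1_000_000 * minor + 1_000_000_000 * major


if __name__ == "__main__":
    for tag in ("0.0.0-alpha.0", "1.2.3-alpha.2", "1.2.3"):
        print(tag, compute_build_version(tag))
```

Under this scheme a release build number exceeds every prerelease of the same version, provided the prerelease number stays below 999 and the minor and patch components stay below 1000.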
--- a/.github/workflows/pipeline.yml
+++ b/.github/workflows/pipeline.yml
@@ -20,6 +20,12 @@ jobs:
        with:
          nix_path: nixpkgs=channel:nixos-unstable

+      - uses: cachix/cachix-action@v14
+        name: Configure Cachix
+        with:
+          name: exo
+          authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"
+
      - name: Configure git user
        run: |
          git config --local user.email "github-actions@users.noreply.github.com"
@@ -88,9 +94,19 @@ jobs:

      - uses: ./.github/actions/typecheck

-  nix-flake-check:
-    name: Check Nix flake
-    runs-on: ubuntu-latest
+  nix:
+    name: Build and check (${{ matrix.system }})
+    runs-on: ${{ matrix.runner }}
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - runner: macos-26
+            system: aarch64-darwin
+          - runner: ubuntu-latest
+            system: x86_64-linux
+          - runner: ubuntu-24.04-arm
+            system: aarch64-linux
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
@@ -101,83 +117,20 @@ jobs:
        with:
          nix_path: nixpkgs=channel:nixos-unstable

-      - name: Run nix flake check
-        run: |
-          nix flake check
-        shell: bash
+      - uses: cachix/cachix-action@v14
+        name: Configure Cachix
+        with:
+          name: exo
+          authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"

-#  ci:
-#    needs: typecheck
-#    runs-on: ubuntu-latest
-#    permissions:
-#      contents: read
-#    env:
-#      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-#    steps:
-#      - name: Checkout repository
-#        uses: actions/checkout@v4
-#        with:
-#          fetch-depth: 0
-#          token: ${{ secrets.GITHUB_TOKEN }}
-#          lfs: true
-#
-#      - name: Configure git user
-#        run: |
-#          git config --local user.email "github-actions@users.noreply.github.com"
-#          git config --local user.name  "github-actions bot"
-#        shell: bash
-#
-#      - name: Pull LFS files
-#        run: |
-#          echo "Pulling Git LFS files..."
-#          git lfs pull
-#        shell: bash
-#
-#      - name: Setup EXO_HOME and API_PORT
-#        run: |
-#          EXO_HOME=$(mktemp -d -t exo-ci-XXXXXXXX)
-#          # Generate random port (macOS compatible method)
-#          API_PORT=$((49152 + RANDOM % (65535 - 49152 + 1)))
-#          echo "EXO_HOME=$EXO_HOME" >> $GITHUB_ENV
-#          echo "API_PORT=$API_PORT" >> $GITHUB_ENV
-#          echo "Created EXO_HOME: $EXO_HOME"
-#          echo "Generated API_PORT: $API_PORT"
-#        shell: bash
-#
-#      - name: Setup Nix Environment
-#        run: |
-#          echo "Checking for nix installation..."
-#          
-#          # Check if nix binary exists directly
-#          if [ -f /nix/var/nix/profiles/default/bin/nix ]; then
-#            echo "Found nix binary at /nix/var/nix/profiles/default/bin/nix"
-#            export PATH="/nix/var/nix/profiles/default/bin:$PATH"
-#            echo "PATH=$PATH" >> $GITHUB_ENV
-#            nix --version
-#          elif [ -f /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh ]; then
-#            echo "Found nix profile script, sourcing..."
-#            source /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh
-#            nix --version
-#          elif command -v nix >/dev/null 2>&1; then
-#            echo "Nix already in PATH"
-#            nix --version
-#          else
-#            echo "Nix not found. Debugging info:"
-#            echo "Contents of /nix/var/nix/profiles/default/:"
-#            ls -la /nix/var/nix/profiles/default/ 2>/dev/null || echo "Directory not found"
-#            echo "Contents of /nix/var/nix/profiles/default/bin/:"
-#            ls -la /nix/var/nix/profiles/default/bin/ 2>/dev/null || echo "Directory not found"
-#            exit 1
-#          fi
-#        shell: bash
-#
-#      - uses: ./.github/actions/lint-check
-#
-#      - uses: ./.github/actions/unit-test
-#
-#      - name: Cleanup EXO_HOME
-#        run: |
-#          echo "Cleaning up EXO_HOME: $EXO_HOME"
-#          rm -rf "$EXO_HOME"
-#        shell: bash
-#        if: always()
+      - name: Build all Nix outputs
+        run: |
+          nix flake show --json | jq -r '
+            [
+              (.packages."${{ matrix.system }}" // {} | keys[] | ".#packages.${{ matrix.system }}.\(.)"),
+              (.devShells."${{ matrix.system }}" // {} | keys[] | ".#devShells.${{ matrix.system }}.\(.)")
+            ] | .[]
+          ' | xargs nix build
+
+      - name: Run nix flake check
+        run: nix flake check
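The jq filter in the "Build all Nix outputs" step above turns the JSON from `nix flake show --json` into a list of installables for one system. A rough Python equivalent is sketched below as an illustration; the function name `flake_installables` and the sample package names are hypothetical, and the sample dictionary only imitates the shape the filter relies on.

```python
from typing import Any


def flake_installables(flake_show: dict[str, Any], system: str) -> list[str]:
    """Sketch of the jq filter: list packages and devShells for one system."""
    installables: list[str] = []
    for output in ("packages", "devShells"):
        # Mirror jq's `// {}`: a missing output or system yields no entries.
        names = (flake_show.get(output) or {}).get(system) or {}
        # jq's `keys` returns names in sorted order, so sort here as well.
        installables.extend(f".#{output}.{system}.{name}" for name in sorted(names))
    return installables


if __name__ == "__main__":
    # Hypothetical flake layout, for demonstration only.
    sample = {
        "packages": {"aarch64-darwin": {"exo": {}, "dashboard": {}}},
        "devShells": {"aarch64-darwin": {"default": {}}},
    }
    for installable in flake_installables(sample, "aarch64-darwin"):
        print(installable)
```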
--- a/.gitignore
+++ b/.gitignore
@@ -16,6 +16,7 @@ digest.txt
 *.xcuserdatad/
 **/.DS_Store
 app/EXO/build/
+dist/


 # rust
--- a/.mlx_typings/mlx_lm/models/deepseek_v3.pyi
+++ b/.mlx_typings/mlx_lm/models/deepseek_v3.pyi
@@ -0,0 +1,156 @@
+"""Type stubs for mlx_lm.models.deepseek_v3"""
+
+from dataclasses import dataclass
+from typing import Any, Dict, Optional
+
+import mlx.core as mx
+import mlx.nn as nn
+
+from .base import BaseModelArgs
+from .switch_layers import SwitchGLU
+
+@dataclass
+class ModelArgs(BaseModelArgs):
+    model_type: str
+    vocab_size: int
+    hidden_size: int
+    intermediate_size: int
+    moe_intermediate_size: int
+    num_hidden_layers: int
+    num_attention_heads: int
+    num_key_value_heads: int
+    n_shared_experts: Optional[int]
+    n_routed_experts: Optional[int]
+    routed_scaling_factor: float
+    kv_lora_rank: int
+    q_lora_rank: Optional[int]
+    qk_rope_head_dim: int
+    v_head_dim: int
+    qk_nope_head_dim: int
+    topk_method: str
+    scoring_func: str
+    norm_topk_prob: bool
+    n_group: int
+    topk_group: int
+    num_experts_per_tok: int
+    moe_layer_freq: int
+    first_k_dense_replace: int
+    max_position_embeddings: int
+    rms_norm_eps: float
+    rope_theta: float
+    rope_scaling: Optional[Dict[str, Any]]
+    attention_bias: bool
+
+class DeepseekV3Attention(nn.Module):
+    config: ModelArgs
+    hidden_size: int
+    num_heads: int
+    max_position_embeddings: int
+    rope_theta: float
+    q_lora_rank: Optional[int]
+    qk_rope_head_dim: int
+    kv_lora_rank: int
+    v_head_dim: int
+    qk_nope_head_dim: int
+    q_head_dim: int
+    scale: float
+    q_proj: nn.Linear
+    q_a_proj: nn.Linear
+    q_a_layernorm: nn.RMSNorm
+    q_b_proj: nn.Linear
+    kv_a_proj_with_mqa: nn.Linear
+    kv_a_layernorm: nn.RMSNorm
+    kv_b_proj: nn.Linear
+    o_proj: nn.Linear
+    rope: Any
+
+    def __init__(self, config: ModelArgs) -> None: ...
+    def __call__(
+        self,
+        x: mx.array,
+        mask: Optional[mx.array] = None,
+        cache: Optional[Any] = None,
+    ) -> mx.array: ...
+
+class DeepseekV3MLP(nn.Module):
+    config: ModelArgs
+    hidden_size: int
+    intermediate_size: int
+    gate_proj: nn.Linear
+    up_proj: nn.Linear
+    down_proj: nn.Linear
+
+    def __init__(
+        self,
+        config: ModelArgs,
+        hidden_size: Optional[int] = None,
+        intermediate_size: Optional[int] = None,
+    ) -> None: ...
+    def __call__(self, x: mx.array) -> mx.array: ...
+
+class MoEGate(nn.Module):
+    config: ModelArgs
+    top_k: int
+    norm_topk_prob: bool
+    n_routed_experts: Optional[int]
+    routed_scaling_factor: float
+    n_group: int
+    topk_group: int
+    weight: mx.array
+    e_score_correction_bias: mx.array
+
+    def __init__(self, config: ModelArgs) -> None: ...
+    def __call__(self, x: mx.array) -> tuple[mx.array, mx.array]: ...
+
+class DeepseekV3MoE(nn.Module):
+    config: ModelArgs
+    num_experts_per_tok: int
+    switch_mlp: SwitchGLU
+    gate: MoEGate
+    shared_experts: DeepseekV3MLP
+    sharding_group: Optional[mx.distributed.Group]
+
+    def __init__(self, config: ModelArgs) -> None: ...
+    def __call__(self, x: mx.array) -> mx.array: ...
+
+class DeepseekV3DecoderLayer(nn.Module):
+    self_attn: DeepseekV3Attention
+    mlp: DeepseekV3MLP | DeepseekV3MoE
+    input_layernorm: nn.RMSNorm
+    post_attention_layernorm: nn.RMSNorm
+
+    def __init__(self, config: ModelArgs, layer_idx: int) -> None: ...
+    def __call__(
+        self,
+        x: mx.array,
+        mask: Optional[mx.array] = None,
+        cache: Optional[Any] = None,
+    ) -> mx.array: ...
+
+class DeepseekV3Model(nn.Module):
+    vocab_size: int
+    embed_tokens: nn.Embedding
+    layers: list[DeepseekV3DecoderLayer]
+    norm: nn.RMSNorm
+
+    def __init__(self, config: ModelArgs) -> None: ...
+    def __call__(
+        self,
+        x: mx.array,
+        cache: Optional[Any] = None,
+    ) -> mx.array: ...
+
+class Model(nn.Module):
+    model_type: str
+    model: DeepseekV3Model
+    lm_head: nn.Linear
+
+    def __init__(self, config: ModelArgs) -> None: ...
+    def __call__(
+        self,
+        inputs: mx.array,
+        cache: Optional[Any] = None,
+    ) -> mx.array: ...
+    def sanitize(self, weights: dict[str, Any]) -> dict[str, Any]: ...
+    @property
+    def layers(self) -> list[DeepseekV3DecoderLayer]: ...
--- a/.mlx_typings/mlx_lm/models/switch_layers.pyi
+++ b/.mlx_typings/mlx_lm/models/switch_layers.pyi
@@ -57,6 +57,11 @@ class SwiGLU(nn.Module):
    def __call__(self, x, gate): ...

 class SwitchGLU(nn.Module):
+    gate_proj: SwitchLinear
+    up_proj: SwitchLinear
+    down_proj: SwitchLinear
+    activation: SwiGLU
+
    def __init__(
        self,
        input_dims: int,
--- a/.mlx_typings/mlx_lm/tokenizer_utils.pyi
+++ b/.mlx_typings/mlx_lm/tokenizer_utils.pyi
@@ -4,6 +4,7 @@ This type stub file was generated by pyright.

 from functools import partial
 from pathlib import Path
+from typing import Any

 from transformers import PreTrainedTokenizerFast

@@ -103,37 +104,55 @@ class TokenizerWrapper:
    Accessing any attribute other than the ``detokenizer`` is forwarded to the
    huggingface tokenizer.
    """
-    def __init__(self, tokenizer, detokenizer_class=..., eos_token_ids=...) -> None: ...
-    def add_eos_token(self, token: str):  # -> None:
-        ...
-    @property
-    def has_thinking(self):  # -> bool:
-        ...
-    @property
-    def think_start(self):  # -> str | None:
-        ...
-    @property
-    def think_end(self):  # -> str | None:
-        ...
-    @property
-    def has_tool_calling(self):  # -> bool:
-        ...
-    @property
-    def tool_call_start(self):  # -> str | None:
-        ...
-    @property
-    def tool_call_end(self):  # -> str | None:
-        ...
-    @property
-    def detokenizer(self):  # -> NaiveStreamingDetokenizer:
-        """
-        Get a stateful streaming detokenizer.
-        """

-    def __getattr__(self, attr):  # -> set[Any] | Any:
-        ...
-    def __setattr__(self, attr, value):  # -> None:
-        ...
+    _tokenizer: PreTrainedTokenizerFast
+    eos_token_id: int | None
+    eos_token: str | None
+    bos_token_id: int | None
+    bos_token: str | None
+    vocab_size: int
+    all_special_tokens: list[str]
+
+    def __init__(
+        self,
+        tokenizer: Any,
+        detokenizer_class: Any = ...,
+        eos_token_ids: list[int] | None = ...,
+        chat_template: Any = ...,
+        tool_parser: Any = ...,
+        tool_call_start: str | None = ...,
+        tool_call_end: str | None = ...,
+    ) -> None: ...
+    def encode(self, text: str, **kwargs: Any) -> list[int]: ...
+    def decode(self, token_ids: list[int], **kwargs: Any) -> str: ...
+    def apply_chat_template(
+        self,
+        messages: list[dict[str, Any]],
+        tokenize: bool = False,
+        add_generation_prompt: bool = False,
+        tools: Any = None,
+        **kwargs: Any,
+    ) -> str: ...
+    def get_vocab(self) -> dict[str, int]: ...
+    def add_eos_token(self, token: str) -> None: ...
+    @property
+    def has_thinking(self) -> bool: ...
+    @property
+    def think_start(self) -> str | None: ...
+    @property
+    def think_end(self) -> str | None: ...
+    @property
+    def has_tool_calling(self) -> bool: ...
+    @property
+    def tool_call_start(self) -> str | None: ...
+    @property
+    def tool_call_end(self) -> str | None: ...
+    @property
+    def detokenizer(self) -> NaiveStreamingDetokenizer:
+        """Get a stateful streaming detokenizer."""
+
+    def __getattr__(self, attr: str) -> Any: ...
+    def __setattr__(self, attr: str, value: Any) -> None: ...

 class NewlineTokenizer(PreTrainedTokenizerFast):
    """A tokenizer that replaces newlines with <n> and <n> with new line."""
@@ -146,18 +165,11 @@ class NewlineTokenizer(PreTrainedTokenizerFast):
    def batch_decode(self, *args, **kwargs):  # -> list[str]:
        ...

-def load_tokenizer(
+def load(
    model_path: Path,
-    tokenizer_config_extra=...,
-    return_tokenizer=...,
-    eos_token_ids=...,
-) -> (
-    TokenizerWrapper
-    | type[SPMStreamingDetokenizer]
-    | partial[SPMStreamingDetokenizer]
-    | type[BPEStreamingDetokenizer]
-    | type[NaiveStreamingDetokenizer]
-):
+    tokenizer_config_extra: dict[str, Any] | None = None,
+    eos_token_ids: list[int] | int | None = None,
+) -> TokenizerWrapper:
    """Load a huggingface tokenizer and try to infer the type of streaming
    detokenizer to use.

@@ -165,4 +177,7 @@ def load_tokenizer(
    a Hugging Face repo ID.
    """

-def no_bos_or_eos(sequence: list, bos: int, eos: int) -> list: ...
+# Alias for backward compatibility
+load_tokenizer = load
+
+def no_bos_or_eos(sequence: list[int], bos: int, eos: int) -> list[int]: ...
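The `TokenizerWrapper` docstring in the stub above says that any attribute other than `detokenizer` is forwarded to the Hugging Face tokenizer. The sketch below shows that delegation pattern in isolation. It is not mlx_lm's implementation: `FakeTokenizer` and `ForwardingWrapper` are hypothetical stand-ins, and the `__setattr__` side of the stub is left out.

```python
from typing import Any


class FakeTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer."""

    eos_token = "</s>"

    def encode(self, text: str) -> list[int]:
        return [ord(character) for character in text]


class ForwardingWrapper:
    """Owns `detokenizer`; forwards every other lookup to the wrapped object."""

    def __init__(self, tokenizer: Any) -> None:
        self._tokenizer = tokenizer

    @property
    def detokenizer(self) -> str:
        # Resolved on the wrapper itself, so __getattr__ is never consulted.
        return "wrapper-owned detokenizer"

    def __getattr__(self, attr: str) -> Any:
        # __getattr__ runs only after normal attribute lookup has failed.
        if attr == "_tokenizer":
            # Avoid infinite recursion if the wrapper is not initialised yet.
            raise AttributeError(attr)
        return getattr(self._tokenizer, attr)


if __name__ == "__main__":
    wrapper = ForwardingWrapper(FakeTokenizer())
    print(wrapper.encode("ab"), wrapper.eos_token, wrapper.detokenizer)
```

Declaring the forwarded attributes explicitly in the stub, as the diff does for `eos_token_id`, `vocab_size`, and the others, lets a strict type checker see members that are only resolved dynamically at runtime.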
--- a/.prettierrc
+++ b/.prettierrc
@@ -0,0 +1,3 @@
+{
+  "useTabs": true
+}
--- a/.swift-format
+++ b/.swift-format
@@ -0,0 +1,6 @@
+{
+  "version": 1,
+  "indentation": {
+    "spaces": 4
+  }
+}
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,96 @@
+# AGENTS.md
+
+This file provides guidance to AI coding agents when working with code in this repository.
+
+## Project Overview
+
+exo is a distributed AI inference system that connects multiple devices into a cluster. It enables running large language models across multiple machines using MLX as the inference backend and libp2p for peer-to-peer networking.
+
+## Build & Run Commands
+
+```bash
+# Build the dashboard (required before running exo)
+cd dashboard && npm install && npm run build && cd ..
+
+# Run exo (starts both master and worker with API at http://localhost:52415)
+uv run exo
+
+# Run with verbose logging
+uv run exo -v   # or -vv for more verbose
+
+# Run tests (excludes slow tests by default)
+uv run pytest
+
+# Run all tests including slow tests
+uv run pytest -m ""
+
+# Run a specific test file
+uv run pytest src/exo/shared/tests/test_election.py
+
+# Run a specific test function
+uv run pytest src/exo/shared/tests/test_election.py::test_function_name
+
+# Type checking (strict mode)
+uv run basedpyright
+
+# Linting
+uv run ruff check
+
+# Format code (using nix)
+nix fmt
+```
+
+## Architecture
+
+### Node Composition
+A single exo `Node` (src/exo/main.py) runs multiple components:
+- **Router**: libp2p-based pub/sub messaging via Rust bindings (exo_pyo3_bindings)
+- **Worker**: Handles inference tasks, downloads models, manages runner processes
+- **Master**: Coordinates cluster state, places model instances across nodes
+- **Election**: Bully algorithm for master election
+- **API**: FastAPI server for OpenAI-compatible chat completions
+
+### Message Flow
+Components communicate via typed pub/sub topics (src/exo/routing/topics.py):
+- `GLOBAL_EVENTS`: Master broadcasts indexed events to all workers
+- `LOCAL_EVENTS`: Workers send events to master for indexing
+- `COMMANDS`: Workers/API send commands to master
+- `ELECTION_MESSAGES`: Election protocol messages
+- `CONNECTION_MESSAGES`: libp2p connection updates
+
+### Event Sourcing
+The system uses event sourcing for state management:
+- `State` (src/exo/shared/types/state.py): Immutable state object
+- `apply()` (src/exo/shared/apply.py): Pure function that applies events to state
+- Master indexes events and broadcasts; workers apply indexed events
+
+### Key Type Hierarchy
+- `src/exo/shared/types/`: Pydantic models for all shared types
+  - `events.py`: Event types (discriminated union)
+  - `commands.py`: Command types
+  - `tasks.py`: Task types for worker execution
+  - `state.py`: Cluster state model
+
+### Rust Components
+Rust code in `rust/` provides:
+- `networking`: libp2p networking (gossipsub, peer discovery)
+- `exo_pyo3_bindings`: PyO3 bindings exposing Rust to Python
+- `system_custodian`: System-level operations
+
+### Dashboard
+Svelte 5 + TypeScript frontend in `dashboard/`. Build output goes to `dashboard/build/` and is served by the API.
+
+## Code Style Requirements
+
+From .cursorrules:
+- Strict, exhaustive typing - never bypass the type-checker
+- Use `Literal[...]` for enum-like sets, `typing.NewType` for primitives
+- Pydantic models with `frozen=True` and `strict=True`
+- Pure functions with injectable effect handlers for side-effects
+- Descriptive names - no abbreviations or 3-letter acronyms
+- Catch exceptions only where you can handle them meaningfully
+- Use `@final` and immutability wherever applicable
+
+## Testing
+
+Tests use pytest-asyncio with `asyncio_mode = "auto"`. Tests are in `tests/` subdirectories alongside the code they test. The `EXO_TESTS=1` env var is set during tests.
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
--- a/MISSED_THINGS.md
+++ b/MISSED_THINGS.md
@@ -0,0 +1,41 @@
+# Missed things
+[X] Log EXO_LIBP2P_NAMESPACE on start in exo/main.py
+[X] Ordering of warmup was changed, which is wrong. It was changed to rank < n-1, then rank=n-1. It should be rank!=0, then rank=0 (this matches the auto_parallel implementation). NOTE: we use a different convention to mlx-lm: our terminal rank is rank=n-1 whereas mlx-lm's is rank=0, hence I can see why this was changed wrongly.
+[X] Downloads keying by model_id not shard_metadata (worker/plan.py, worker/main.py).
+[X] Fetching download status of all models on start
+[X] Deduplication of tasks in plan_step.
+[X] resolve_allow_patterns should just be wildcard now.
+[] No mx_barrier at the end of mlx_generate in generate.py.
+[] cache assertion not needed in auto_parallel.py PipelineLastLayer.
+[] GPTOSS support dropped in auto_parallel.py.
+[] Sharding changed: "all-to-sharded" became _all_to_sharded in auto_parallel.py.
+[] Same as above: "sharded-to-all" became _sharded_to_all in auto_parallel.py.
+[] Dropped support for Ministral3Model, DeepseekV32Model, Glm4MoeModel, Qwen3NextModel, GptOssModel in auto_parallel.py.
+[] Dropped prefill/decode code in auto_parallel.py and utils_mlx.py.
+[X] KV_CACHE_BITS should be None to disable quantized KV cache.
+[] Dropped _set_nofile_limit in utils_mlx.py.
+[] We have group optional in load_mlx_items in utils_mlx.py.
+[] Dropped add_missing_chat_templates for GptOss in load_mlx_items in utils_mlx.py.
+[] Dropped model.make_cache in make_kv_cache in utils_mlx.py.
+[X] We put cache limit back in utils_mlx.py.
+[] topology.py remove_node removes the connections after checking if node is in self._node_id_to_rx_id_map. On beta_1 it checks after, so would remove stale connections I guess?
+[] Missing Glm 4.7 model cards (this isn't ready yet but should be picked up, probably create an issue... the blocker is that the transformers version doesn't support the tokenizer for Glm 4.7. rc-1 does but we can't upgrade as it breaks other things.)
+[] try-except in _command_processor only catches ValueError. This was silently failing, leading to un-debuggable errors (we had a KeyError that was happening). Changed this to catch Exception instead of ValueError. See exo-v2 89ae38405e0052e3c22405daf094b065878aa873 and fb99fea69b5a39017efc90c5dad0072e677455f0.
+[X] In placement.py, place_instance no longer looks at model_meta.supports_tensor or checks whether this tensor-parallel number of nodes is supported by the model's tensor dimensions.
+[X] In placement.py, place_instance, we no longer have the special case to exclude DeepSeek v3.1 pipeline parallel (it doesn't work).
+[] logger.warning("You have likely selected ibv for a single node instance; falling back to MlxRing") was changed to debug. That will spam this warning since it happens every time we query instance previews.
+[X] In placement_utils.py, get_mlx_jaccl_coordinators, We no longer prioritise Jaccl Coordinator IP. Now it picks the first one, which is unstable (Jaccl coordinator over TB5 is unstable).
+
+
+
+[X] Downloads keying by model_id not shard_metadata (worker/plan.py, worker/main.py).
+[X] Fetching download status of all models on start
+[X] Deduplication of tasks in plan_step.
+[X] resolve_allow_patterns should just be wildcard now.
+[X] KV_CACHE_BITS should be None to disable quantized KV cache.
+[X] We put cache limit back in utils_mlx.py.
+[X] In placement.py, place_instance no longer looks at model_meta.supports_tensor or checks whether this tensor-parallel number of nodes is supported by the model's tensor dimensions.
+[X] In placement.py, place_instance, we no longer have the special case to exclude DeepSeek v3.1 pipeline parallel (it doesn't work).
+[X] In placement_utils.py, get_mlx_jaccl_coordinators, We no longer prioritise Jaccl Coordinator IP. Now it picks the first one, which is unstable (Jaccl coordinator over TB5 is unstable).
+
+
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@
 exo: Run your own AI cluster at home with everyday devices. Maintained by [exo labs](https://x.com/exolabs).

 <p align="center">
-  <a href="https://discord.gg/72NsF6ux" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Discord-Join%20Server-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
+  <a href="https://discord.gg/TJ4P57arEm" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Discord-Join%20Server-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
  <a href="https://x.com/exolabs" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/twitter/follow/exolabs?style=social" alt="X"></a>
  <a href="https://www.apache.org/licenses/LICENSE-2.0.html" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/License-Apache2.0-blue.svg" alt="License: Apache-2.0"></a>
 </p>
@@ -166,6 +166,24 @@ Download the latest build here: [EXO-latest.dmg](https://assets.exolabs.net/EXO-

 The app will ask for permission to modify system settings and install a new Network profile. Improvements to this are being worked on.

+#### Uninstalling the macOS App
+
+The recommended way to uninstall is through the app itself: click the menu bar icon → Advanced → Uninstall. This cleanly removes all system components.
+
+If you've already deleted the app, you can run the standalone uninstaller script:
+
+```bash
+sudo ./app/EXO/uninstall-exo.sh
+```
+
+This removes:
+- Network setup LaunchDaemon
+- Network configuration script
+- Log files
+- The "exo" network location
+
+**Note:** You'll need to manually remove EXO from Login Items in System Settings → General → Login Items.
+
 ---

 ### Enabling RDMA on macOS
@@ -287,7 +305,10 @@ curl -X DELETE http://localhost:52415/instance/YOUR_INSTANCE_ID
 - List all models: `curl http://localhost:52415/models`
 - Inspect instance IDs and deployment state: `curl http://localhost:52415/state`

-For further details, see API types and endpoints in [src/exo/master/api.py](src/exo/master/api.py).
+For further details, see:
+
+- Basic API documentation in [docs/api.md](docs/api.md).
+- API types and endpoints in [src/exo/master/api.py](src/exo/master/api.py).

 ---

--- a/app/EXO/EXO/ContentView.swift
+++ b/app/EXO/EXO/ContentView.swift
@@ -12,18 +12,25 @@ struct ContentView: View {
    @EnvironmentObject private var controller: ExoProcessController
    @EnvironmentObject private var stateService: ClusterStateService
    @EnvironmentObject private var networkStatusService: NetworkStatusService
+    @EnvironmentObject private var localNetworkChecker: LocalNetworkChecker
    @EnvironmentObject private var updater: SparkleUpdater
    @State private var focusedNode: NodeViewModel?
    @State private var deletingInstanceIDs: Set<String> = []
    @State private var showAllNodes = false
    @State private var showAllInstances = false
+    @State private var showAdvanced = false
    @State private var showDebugInfo = false
    @State private var bugReportInFlight = false
    @State private var bugReportMessage: String?
+    @State private var uninstallInProgress = false
+    @State private var pendingNamespace: String = ""

    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
            statusSection
+            if shouldShowLocalNetworkWarning {
+                localNetworkWarningBanner
+            }
            if shouldShowClusterDetails {
                Divider()
                overviewSection
@@ -38,6 +45,7 @@ struct ContentView: View {
        }
        .animation(.easeInOut(duration: 0.3), value: shouldShowClusterDetails)
        .animation(.easeInOut(duration: 0.3), value: shouldShowInstances)
+        .animation(.easeInOut(duration: 0.3), value: shouldShowLocalNetworkWarning)
        .padding()
        .frame(width: 340)
        .onAppear {
@@ -47,9 +55,62 @@ struct ContentView: View {
        }
    }

+    private var shouldShowLocalNetworkWarning: Bool {
+        if case .notWorking = localNetworkChecker.status {
+            return controller.status != .stopped
+        }
+        return false
+    }
+
+    private var localNetworkWarningBanner: some View {
+        VStack(alignment: .leading, spacing: 6) {
+            HStack(spacing: 6) {
+                Image(systemName: "exclamationmark.triangle.fill")
+                    .foregroundColor(.orange)
+                Text("Local Network Access Issue")
+                    .font(.caption)
+                    .fontWeight(.semibold)
+            }
+            Text(
+                "Device discovery won't work. To fix:\n1. Quit EXO\n2. Open System Settings → Privacy & Security → Local Network\n3. Toggle EXO off, then back on\n4. Relaunch EXO"
+            )
+            .font(.caption2)
+            .foregroundColor(.secondary)
+            .fixedSize(horizontal: false, vertical: true)
+            Button {
+                openLocalNetworkSettings()
+            } label: {
+                Text("Open Settings")
+                    .font(.caption2)
+            }
+            .buttonStyle(.bordered)
+            .controlSize(.small)
+        }
+        .padding(8)
+        .background(
+            RoundedRectangle(cornerRadius: 8)
+                .fill(Color.orange.opacity(0.1))
+        )
+        .overlay(
+            RoundedRectangle(cornerRadius: 8)
+                .stroke(Color.orange.opacity(0.3), lineWidth: 1)
+        )
+    }
+
+    private func openLocalNetworkSettings() {
+        // Open Privacy & Security settings - Local Network section
+        if let url = URL(
+            string: "x-apple.systempreferences:com.apple.preference.security?Privacy_LocalNetwork")
+        {
+            NSWorkspace.shared.open(url)
+        }
+    }
+
    private var topologySection: some View {
        Group {
-            if let topology = stateService.latestSnapshot?.topologyViewModel(localNodeId: stateService.localNodeId), !topology.nodes.isEmpty {
+            if let topology = stateService.latestSnapshot?.topologyViewModel(
+                localNodeId: stateService.localNodeId), !topology.nodes.isEmpty
+            {
                TopologyMiniView(topology: topology)
            }
        }
@@ -83,8 +144,10 @@ struct ContentView: View {
                VStack(alignment: .leading, spacing: 4) {
                    HStack {
                        VStack(alignment: .leading) {
-                            Text("\(overview.usedRam, specifier: "%.0f") / \(overview.totalRam, specifier: "%.0f") GB")
-                                .font(.headline)
+                            Text(
+                                "\(overview.usedRam, specifier: "%.0f") / \(overview.totalRam, specifier: "%.0f") GB"
+                            )
+                            .font(.headline)
                            Text("Memory")
                                .font(.caption)
                                .foregroundColor(.secondary)
@@ -193,11 +256,7 @@ struct ContentView: View {
                Divider()
                    .padding(.vertical, 4)
            }
-            controlButton(title: "Check for Updates") {
-                updater.checkForUpdates()
-            }
-            .padding(.bottom, 8)
-            debugSection
+            advancedSection
                .padding(.bottom, 8)
            controlButton(title: "Quit", tint: .secondary) {
                controller.stop()
@@ -206,7 +265,57 @@ struct ContentView: View {
        }
    }

-    private func controlButton(title: String, tint: Color = .primary, action: @escaping () -> Void) -> some View {
+    private var advancedSection: some View {
+        VStack(alignment: .leading, spacing: 6) {
+            HStack {
+                Text("Advanced")
+                    .font(.caption)
+                    .foregroundColor(.secondary)
+                Spacer()
+                collapseButton(isExpanded: $showAdvanced)
+            }
+            .animation(nil, value: showAdvanced)
+            if showAdvanced {
+                VStack(alignment: .leading, spacing: 8) {
+                    VStack(alignment: .leading, spacing: 4) {
+                        Text("Cluster Namespace")
+                            .font(.caption2)
+                            .foregroundColor(.secondary)
+                        HStack {
+                            TextField("optional", text: $pendingNamespace)
+                                .textFieldStyle(.roundedBorder)
+                                .font(.caption2)
+                                .onAppear {
+                                    pendingNamespace = controller.customNamespace
+                                }
+                            Button("Save & Restart") {
+                                controller.customNamespace = pendingNamespace
+                                if controller.status == .running || controller.status == .starting {
+                                    controller.restart()
+                                }
+                            }
+                            .font(.caption2)
+                            .disabled(pendingNamespace == controller.customNamespace)
+                        }
+                    }
+                    HoverButton(title: "Check for Updates", small: true) {
+                        updater.checkForUpdates()
+                    }
+                    debugSection
+                    HoverButton(title: "Uninstall", tint: .red, small: true) {
+                        showUninstallConfirmationAlert()
+                    }
+                    .disabled(uninstallInProgress)
+                }
+                .transition(.opacity)
+            }
+        }
+        .animation(.easeInOut(duration: 0.25), value: showAdvanced)
+    }
+
+    private func controlButton(title: String, tint: Color = .primary, action: @escaping () -> Void)
+        -> some View
+    {
        HoverButton(title: title, tint: tint, trailingSystemImage: nil, action: action)
    }

@@ -237,9 +346,12 @@ struct ContentView: View {
        Button {
            isExpanded.wrappedValue.toggle()
        } label: {
-            Label(isExpanded.wrappedValue ? "Hide" : "Show All", systemImage: isExpanded.wrappedValue ? "chevron.up" : "chevron.down")
-                .labelStyle(.titleAndIcon)
-                .contentTransition(.symbolEffect(.replace))
+            Label(
+                isExpanded.wrappedValue ? "Hide" : "Show All",
+                systemImage: isExpanded.wrappedValue ? "chevron.up" : "chevron.down"
+            )
+            .labelStyle(.titleAndIcon)
+            .contentTransition(.symbolEffect(.replace))
        }
        .buttonStyle(.plain)
        .font(.caption2)
@@ -328,15 +440,15 @@ struct ContentView: View {
    }

    private var debugSection: some View {
-        VStack(alignment: .leading, spacing: 6) {
-            HStack {
-                Text("Debug Info")
-                    .font(.caption)
-                    .foregroundColor(.secondary)
-                Spacer()
-                collapseButton(isExpanded: $showDebugInfo)
+        VStack(alignment: .leading, spacing: 4) {
+            HoverButton(
+                title: "Debug Info",
+                tint: .primary,
+                trailingSystemImage: showDebugInfo ? "chevron.up" : "chevron.down",
+                small: true
+            ) {
+                showDebugInfo.toggle()
            }
-            .animation(nil, value: showDebugInfo)
            if showDebugInfo {
                VStack(alignment: .leading, spacing: 4) {
                    Text("Version: \(buildTag)")
@@ -349,15 +461,63 @@ struct ContentView: View {
                        .font(.caption2)
                        .foregroundColor(thunderboltStatusColor)
                    interfaceIpList
+                    rdmaStatusView
                    sendBugReportButton
                        .padding(.top, 6)
                }
+                .padding(.leading, 8)
                .transition(.opacity)
            }
        }
        .animation(.easeInOut(duration: 0.25), value: showDebugInfo)
    }

+    private var rdmaStatusView: some View {
+        let rdma = networkStatusService.status.rdmaStatus
+        return VStack(alignment: .leading, spacing: 1) {
+            Text("RDMA: \(rdmaStatusText(rdma))")
+                .font(.caption2)
+                .foregroundColor(rdmaStatusColor(rdma))
+            if !rdma.devices.isEmpty {
+                Text("  Devices: \(rdma.devices.joined(separator: ", "))")
+                    .font(.caption2)
+                    .foregroundColor(.secondary)
+            }
+            if !rdma.activePorts.isEmpty {
+                Text("  Active Ports:")
+                    .font(.caption2)
+                    .foregroundColor(.secondary)
+                ForEach(rdma.activePorts, id: \.device) { port in
+                    Text("    \(port.device) port \(port.port): \(port.state)")
+                        .font(.caption2)
+                        .foregroundColor(.green)
+                }
+            }
+        }
+    }
+
+    private func rdmaStatusText(_ rdma: RDMAStatus) -> String {
+        switch rdma.rdmaCtlEnabled {
+        case .some(true):
+            return "Enabled"
+        case .some(false):
+            return "Disabled"
+        case nil:
+            return rdma.devices.isEmpty ? "Not Available" : "Available"
+        }
+    }
+
+    private func rdmaStatusColor(_ rdma: RDMAStatus) -> Color {
+        switch rdma.rdmaCtlEnabled {
+        case .some(true):
+            return .green
+        case .some(false):
+            return .orange
+        case nil:
+            return rdma.devices.isEmpty ? .secondary : .green
+        }
+    }
+
    private var sendBugReportButton: some View {
        VStack(alignment: .leading, spacing: 4) {
            Button {
@@ -447,6 +607,88 @@ struct ContentView: View {
        bugReportInFlight = false
    }

+    private func showUninstallConfirmationAlert() {
+        let alert = NSAlert()
+        alert.messageText = "Uninstall EXO"
+        alert.informativeText = """
+            This will remove EXO and all its system components:
+
+            • Network configuration daemon
+            • Launch at login registration
+            • EXO network location
+
+            The app will be moved to Trash.
+            """
+        alert.alertStyle = .warning
+        alert.addButton(withTitle: "Uninstall")
+        alert.addButton(withTitle: "Cancel")
+
+        // Style the Uninstall button as destructive
+        if let uninstallButton = alert.buttons.first {
+            uninstallButton.hasDestructiveAction = true
+        }
+
+        let response = alert.runModal()
+        if response == .alertFirstButtonReturn {
+            performUninstall()
+        }
+    }
+
+    private func performUninstall() {
+        uninstallInProgress = true
+
+        // Stop EXO process first
+        controller.cancelPendingLaunch()
+        controller.stop()
+        stateService.stopPolling()
+
+        // Run the privileged uninstall on a background thread
+        // Using .utility QoS to avoid priority inversion with NSAppleScript's subprocess
+        DispatchQueue.global(qos: .utility).async {
+            do {
+                // Remove network setup daemon and components (requires admin privileges)
+                try NetworkSetupHelper.uninstall()
+
+                DispatchQueue.main.async {
+                    // Unregister from launch at login
+                    LaunchAtLoginHelper.disable()
+
+                    // Move app to trash
+                    self.moveAppToTrash()
+
+                    // Quit the app
+                    DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
+                        NSApplication.shared.terminate(nil)
+                    }
+                }
+            } catch {
+                DispatchQueue.main.async {
+                    self.showErrorAlert(message: error.localizedDescription)
+                    self.uninstallInProgress = false
+                }
+            }
+        }
+    }
+
+    private func showErrorAlert(message: String) {
+        let alert = NSAlert()
+        alert.messageText = "Uninstall Failed"
+        alert.informativeText = message
+        alert.alertStyle = .critical
+        alert.addButton(withTitle: "OK")
+        alert.runModal()
+    }
+
+    private func moveAppToTrash() {
+        guard let appURL = Bundle.main.bundleURL as URL? else { return }
+        do {
+            try FileManager.default.trashItem(at: appURL, resultingItemURL: nil)
+        } catch {
+            // If we can't trash the app, that's OK - user can do it manually
+            // The important system components have already been cleaned up
+        }
+    }
+
    private var buildTag: String {
        Bundle.main.infoDictionary?["EXOBuildTag"] as? String ?? "unknown"
    }
@@ -460,14 +702,27 @@ private struct HoverButton: View {
    let title: String
    let tint: Color
    let trailingSystemImage: String?
+    let small: Bool
    let action: () -> Void

+    init(
+        title: String, tint: Color = .primary, trailingSystemImage: String? = nil,
+        small: Bool = false, action: @escaping () -> Void
+    ) {
+        self.title = title
+        self.tint = tint
+        self.trailingSystemImage = trailingSystemImage
+        self.small = small
+        self.action = action
+    }
+
    @State private var isHovering = false

    var body: some View {
        Button(action: action) {
            HStack {
                Text(title)
+                    .font(small ? .caption : nil)
                Spacer()
                if let systemName = trailingSystemImage {
                    Image(systemName: systemName)
@@ -475,8 +730,8 @@ private struct HoverButton: View {
                }
            }
            .frame(maxWidth: .infinity, alignment: .leading)
-            .padding(.vertical, 6)
-            .padding(.horizontal, 8)
+            .padding(.vertical, small ? 4 : 6)
+            .padding(.horizontal, small ? 6 : 8)
            .background(
                RoundedRectangle(cornerRadius: 6)
                    .fill(
@@ -491,4 +746,3 @@ private struct HoverButton: View {
        .onHover { isHovering = $0 }
    }
 }
-
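Reviewer note on the RDMA label added in ContentView.swift: `rdmaStatusText` treats `rdmaCtlEnabled` as a tri-state value, and only falls back to the device list when the flag is unknown. The sketch below restates that mapping in isolation so the cases are easy to check. `RDMAStatusSketch` and `rdmaLabel` are hypothetical names used for illustration only; the real `RDMAStatus` type is not shown in this diff, so its exact shape is an assumption.

```swift
import Foundation

// Hypothetical stand-in for the app's RDMAStatus (assumption: only the two
// fields read by rdmaStatusText are modelled here).
struct RDMAStatusSketch {
    var rdmaCtlEnabled: Bool?
    var devices: [String]
}

// Same three-way decision as rdmaStatusText in the diff:
// an explicit flag wins; an unknown flag falls back to device presence.
func rdmaLabel(_ rdma: RDMAStatusSketch) -> String {
    switch rdma.rdmaCtlEnabled {
    case .some(true):
        return "Enabled"
    case .some(false):
        return "Disabled"
    case .none:
        return rdma.devices.isEmpty ? "Not Available" : "Available"
    }
}
```

One consequence worth noting: when the flag is explicitly `false`, the label is "Disabled" even if devices are present, so the device list only influences the label in the unknown case.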
--- a/app/EXO/EXO/EXOApp.swift
+++ b/app/EXO/EXO/EXOApp.swift
@@ -8,9 +8,9 @@
 import AppKit
 import CoreImage
 import CoreImage.CIFilterBuiltins
+import ServiceManagement
 import Sparkle
 import SwiftUI
-import ServiceManagement
 import UserNotifications
 import os.log

@@ -19,6 +19,7 @@ struct EXOApp: App {
    @StateObject private var controller: ExoProcessController
    @StateObject private var stateService: ClusterStateService
    @StateObject private var networkStatusService: NetworkStatusService
+    @StateObject private var localNetworkChecker: LocalNetworkChecker
    @StateObject private var updater: SparkleUpdater
    private let terminationObserver: TerminationObserver
    private let ciContext = CIContext(options: nil)
@@ -37,9 +38,13 @@ struct EXOApp: App {
        _stateService = StateObject(wrappedValue: service)
        let networkStatus = NetworkStatusService()
        _networkStatusService = StateObject(wrappedValue: networkStatus)
+        let localNetwork = LocalNetworkChecker()
+        _localNetworkChecker = StateObject(wrappedValue: localNetwork)
        _updater = StateObject(wrappedValue: updater)
        enableLaunchAtLoginIfNeeded()
        NetworkSetupHelper.ensureLaunchDaemonInstalled()
+        // Check local network access BEFORE launching exo
+        localNetwork.check()
        controller.scheduleLaunch(after: 15)
        service.startPolling()
        networkStatus.startPolling()
@@ -51,6 +56,7 @@ struct EXOApp: App {
                .environmentObject(controller)
                .environmentObject(stateService)
                .environmentObject(networkStatusService)
+                .environmentObject(localNetworkChecker)
                .environmentObject(updater)
        } label: {
            menuBarIcon
@@ -107,7 +113,7 @@ struct EXOApp: App {
        filter.contrast = 0.9

        guard let output = filter.outputImage,
-              let rendered = ciContext.createCGImage(output, from: output.extent)
+            let rendered = ciContext.createCGImage(output, from: output.extent)
        else {
            return nil
        }
@@ -120,7 +126,26 @@ struct EXOApp: App {
        do {
            try SMAppService.mainApp.register()
        } catch {
-            Logger().error("Failed to register EXO for launch at login: \(error.localizedDescription)")
+            Logger().error(
+                "Failed to register EXO for launch at login: \(error.localizedDescription)")
+        }
+    }
+}
+
+/// Helper for managing EXO's launch-at-login registration
+enum LaunchAtLoginHelper {
+    private static let logger = Logger(subsystem: "io.exo.EXO", category: "LaunchAtLogin")
+
+    /// Unregisters EXO from launching at login
+    static func disable() {
+        guard SMAppService.mainApp.status == .enabled else { return }
+        do {
+            try SMAppService.mainApp.unregister()
+            logger.info("Unregistered EXO from launch at login")
+        } catch {
+            logger.error(
+                "Failed to unregister EXO from launch at login: \(error.localizedDescription, privacy: .public)"
+            )
        }
    }
 }
@@ -145,7 +170,7 @@ final class SparkleUpdater: NSObject, ObservableObject {
        center.requestAuthorization(options: [.alert, .sound]) { _, _ in }
        controller.updater.automaticallyChecksForUpdates = true
        controller.updater.automaticallyDownloadsUpdates = false
-        controller.updater.updateCheckInterval = 900 // 15 minutes
+        controller.updater.updateCheckInterval = 900  // 15 minutes
        DispatchQueue.main.asyncAfter(deadline: .now() + 5) { [weak controller] in
            controller?.updater.checkForUpdatesInBackground()
        }
@@ -212,7 +237,8 @@ private final class ExoNotificationDelegate: NSObject, UNUserNotificationCenterD
    func userNotificationCenter(
        _ center: UNUserNotificationCenter,
        willPresent notification: UNNotification,
-        withCompletionHandler completionHandler: @escaping (UNNotificationPresentationOptions) -> Void
+        withCompletionHandler completionHandler: @escaping (UNNotificationPresentationOptions) ->
+            Void
    ) {
        completionHandler([.banner, .list, .sound])
    }
--- a/app/EXO/EXO/ExoProcessController.swift
+++ b/app/EXO/EXO/ExoProcessController.swift
@@ -2,6 +2,8 @@ import AppKit
 import Combine
 import Foundation

+private let customNamespaceKey = "EXOCustomNamespace"
+
 @MainActor
 final class ExoProcessController: ObservableObject {
    enum Status: Equatable {
@@ -27,6 +29,14 @@ final class ExoProcessController: ObservableObject {
    @Published private(set) var status: Status = .stopped
    @Published private(set) var lastError: String?
    @Published private(set) var launchCountdownSeconds: Int?
+    @Published var customNamespace: String = {
+        return UserDefaults.standard.string(forKey: customNamespaceKey) ?? ""
+    }()
+    {
+        didSet {
+            UserDefaults.standard.set(customNamespace, forKey: customNamespaceKey)
+        }
+    }

    private var process: Process?
    private var runtimeDirectoryURL: URL?
@@ -180,7 +190,7 @@ final class ExoProcessController: ObservableObject {
    private func makeEnvironment(for runtimeURL: URL) -> [String: String] {
        var environment = ProcessInfo.processInfo.environment
        environment["EXO_RUNTIME_DIR"] = runtimeURL.path
-        environment["EXO_LIBP2P_NAMESPACE"] = buildTag()
+        environment["EXO_LIBP2P_NAMESPACE"] = computeNamespace()

        var paths: [String] = []
        if let existing = environment["PATH"], !existing.isEmpty {
@@ -212,11 +222,19 @@ final class ExoProcessController: ObservableObject {
        if let tag = Bundle.main.infoDictionary?["EXOBuildTag"] as? String, !tag.isEmpty {
            return tag
        }
-        if let short = Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String, !short.isEmpty {
+        if let short = Bundle.main.infoDictionary?["CFBundleShortVersionString"] as? String,
+            !short.isEmpty
+        {
            return short
        }
        return "dev"
    }
+
+    private func computeNamespace() -> String {
+        let base = buildTag()
+        let custom = customNamespace.trimmingCharacters(in: .whitespaces)
+        return custom.isEmpty ? base : custom
+    }
 }

 struct RuntimeError: LocalizedError {
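Reviewer note on the namespace change in ExoProcessController.swift: `computeNamespace()` lets a non-blank custom namespace override the build tag when setting `EXO_LIBP2P_NAMESPACE`. The sketch below is a minimal standalone restatement of that rule, not the controller's actual code. `resolveNamespace` is a hypothetical free function introduced for illustration, and the build tag is passed in as a parameter rather than read from the bundle.

```swift
import Foundation

// Hypothetical free-function version of computeNamespace():
// a custom value that is non-empty after trimming wins,
// otherwise the build tag is used unchanged.
func resolveNamespace(buildTag: String, custom: String) -> String {
    // .whitespaces trims spaces and tabs but not newlines,
    // matching the character set used in the diff.
    let trimmed = custom.trimmingCharacters(in: .whitespaces)
    return trimmed.isEmpty ? buildTag : trimmed
}
```

Because the custom value replaces the build tag rather than being appended to it, nodes running different builds that share a custom namespace would presumably end up in the same libp2p namespace; whether that is intended is a question for the authors.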
--- a/app/EXO/EXO/Info.plist
+++ b/app/EXO/EXO/Info.plist
@@ -8,5 +8,15 @@
 	<string>$(EXO_BUILD_TAG)</string>
 	<key>EXOBuildCommit</key>
 	<string>$(EXO_BUILD_COMMIT)</string>
+	<key>EXOBugReportPresignedUrlEndpoint</key>
+	<string>$(EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT)</string>
+	<key>NSLocalNetworkUsageDescription</key>
+	<string>EXO needs local network access to discover and connect to other devices in your cluster for distributed AI inference.</string>
+	<key>NSBonjourServices</key>
+	<array>
+		<string>_p2p._tcp</string>
+		<string>_p2p._udp</string>
+		<string>_libp2p._udp</string>
+	</array>
 </dict>
 </plist>
--- a/app/EXO/EXO/Models/ClusterState.swift
+++ b/app/EXO/EXO/Models/ClusterState.swift
@@ -16,10 +16,13 @@ struct ClusterState: Decodable {
        self.instances = rawInstances.mapValues(\.instance)
        self.runners = try container.decode([String: RunnerStatusSummary].self, forKey: .runners)
        self.nodeProfiles = try container.decode([String: NodeProfile].self, forKey: .nodeProfiles)
-        let rawTasks = try container.decodeIfPresent([String: TaggedTask].self, forKey: .tasks) ?? [:]
+        let rawTasks =
+            try container.decodeIfPresent([String: TaggedTask].self, forKey: .tasks) ?? [:]
        self.tasks = rawTasks.compactMapValues(\.task)
        self.topology = try container.decodeIfPresent(Topology.self, forKey: .topology)
-        let rawDownloads = try container.decodeIfPresent([String: [TaggedNodeDownload]].self, forKey: .downloads) ?? [:]
+        let rawDownloads =
+            try container.decodeIfPresent([String: [TaggedNodeDownload]].self, forKey: .downloads)
+            ?? [:]
        self.downloads = rawDownloads.mapValues { $0.compactMap(\.status) }
    }

@@ -41,7 +44,8 @@ private struct TaggedInstance: Decodable {
        let payloads = try container.decode([String: ClusterInstancePayload].self)
        guard let entry = payloads.first else {
            throw DecodingError.dataCorrupted(
-                DecodingError.Context(codingPath: decoder.codingPath, debugDescription: "Empty instance payload")
+                DecodingError.Context(
+                    codingPath: decoder.codingPath, debugDescription: "Empty instance payload")
            )
        }
        self.instance = ClusterInstance(
@@ -77,7 +81,8 @@ struct RunnerStatusSummary: Decodable {
        let payloads = try container.decode([String: RunnerStatusDetail].self)
        guard let entry = payloads.first else {
            throw DecodingError.dataCorrupted(
-                DecodingError.Context(codingPath: decoder.codingPath, debugDescription: "Empty runner status payload")
+                DecodingError.Context(
+                    codingPath: decoder.codingPath, debugDescription: "Empty runner status payload")
            )
        }
        self.status = entry.key
@@ -257,7 +262,9 @@ struct ChatCompletionTaskParameters: Decodable, Equatable {

    func promptPreview() -> String? {
        guard let messages else { return nil }
-        if let userMessage = messages.last(where: { $0.role?.lowercased() == "user" && ($0.content?.isEmpty == false) }) {
+        if let userMessage = messages.last(where: {
+            $0.role?.lowercased() == "user" && ($0.content?.isEmpty == false)
+        }) {
            return userMessage.content
        }
        return messages.last?.content
@@ -365,5 +372,3 @@ extension ClusterState {

    func availableModels() -> [ModelOption] { [] }
 }
-
-
--- a/app/EXO/EXO/Services/BugReportService.swift
+++ b/app/EXO/EXO/Services/BugReportService.swift
@@ -1,4 +1,3 @@
-import CryptoKit
 import Foundation

 struct BugReportOutcome: Equatable {
@@ -7,17 +6,17 @@ struct BugReportOutcome: Equatable {
 }

 enum BugReportError: LocalizedError {
-    case missingCredentials
    case invalidEndpoint
+    case presignedUrlFailed(String)
    case uploadFailed(String)
    case collectFailed(String)

    var errorDescription: String? {
        switch self {
-        case .missingCredentials:
-            return "Bug report upload credentials are not set."
        case .invalidEndpoint:
            return "Bug report endpoint is invalid."
+        case .presignedUrlFailed(let message):
+            return "Failed to get presigned URLs: \(message)"
        case .uploadFailed(let message):
            return "Bug report upload failed: \(message)"
        case .collectFailed(let message):
@@ -27,11 +26,13 @@ enum BugReportError: LocalizedError {
 }

 struct BugReportService {
-    struct AWSConfig {
-        let accessKey: String
-        let secretKey: String
-        let region: String
-        let bucket: String
+    private struct PresignedUrlsRequest: Codable {
+        let keys: [String]
+    }
+
+    private struct PresignedUrlsResponse: Codable {
+        let urls: [String: String]
+        let expiresIn: Int?
    }

    func sendReport(
@@ -39,9 +40,9 @@ struct BugReportService {
        now: Date = Date(),
        isManual: Bool = false
    ) async throws -> BugReportOutcome {
-        let credentials = try loadCredentials()
-        let timestamp = ISO8601DateFormatter().string(from: now)
-        let prefix = "reports/\(timestamp)/"
+        let timestamp = Self.runTimestampString(now)
+        let dayPrefix = Self.dayPrefixString(now)
+        let prefix = "reports/\(dayPrefix)/\(timestamp)/"

        let logData = readLog()
        let ifconfigText = try await captureIfconfig()
@@ -66,28 +67,82 @@ struct BugReportService {
            ("\(prefix)exo.log", logData),
            ("\(prefix)state.json", stateData),
            ("\(prefix)events.json", eventsData),
-            ("\(prefix)report.json", reportJSON)
+            ("\(prefix)report.json", reportJSON),
        ]

-        let uploader = try S3Uploader(config: credentials)
-        for item in uploads {
-            guard let data = item.data else { continue }
-            try await uploader.upload(
-                objectPath: item.path,
-                body: data
-            )
+        let uploadItems: [(key: String, body: Data)] = uploads.compactMap { item in
+            guard let body = item.data else { return nil }
+            return (key: item.path, body: body)
        }

-        return BugReportOutcome(success: true, message: "Bug Report sent. Thank you for helping to improve EXO 1.0.")
+        guard !uploadItems.isEmpty else {
+            return BugReportOutcome(success: false, message: "No data to upload")
+        }
+
+        let presignedUrls = try await fetchPresignedUploadUrls(keys: uploadItems.map(\.key))
+        for item in uploadItems {
+            guard let urlString = presignedUrls[item.key], let url = URL(string: urlString) else {
+                throw BugReportError.uploadFailed("Missing presigned URL for \(item.key)")
+            }
+            try await uploadToPresignedUrl(url: url, body: item.body)
+        }
+
+        return BugReportOutcome(
+            success: true, message: "Bug Report sent. Thank you for helping to improve EXO 1.0.")
    }

-    private func loadCredentials() throws -> AWSConfig {
-        return AWSConfig(
-            accessKey: "AKIAYEKP5EMXTOBYDGHX",
-            secretKey: "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",
-            region: "us-east-1",
-            bucket: "exo-bug-reports"
-        )
+    private static func dayPrefixString(_ date: Date) -> String {
+        var calendar = Calendar(identifier: .gregorian)
+        calendar.timeZone = TimeZone(secondsFromGMT: 0) ?? .current
+        let components = calendar.dateComponents([.year, .month, .day], from: date)
+        let year = components.year ?? 0
+        let month = components.month ?? 0
+        let day = components.day ?? 0
+        return String(format: "%04d/%02d/%02d", year, month, day)
+    }
+
+    private static func runTimestampString(_ date: Date) -> String {
+        let formatter = DateFormatter()
+        formatter.locale = Locale(identifier: "en_US_POSIX")
+        formatter.timeZone = TimeZone(secondsFromGMT: 0) ?? .current
+        formatter.dateFormat = "yyyy-MM-dd'T'HHmmss.SSS'Z'"
+        return formatter.string(from: date)
+    }
+
+    private func fetchPresignedUploadUrls(keys: [String], bundle: Bundle = .main) async throws
+        -> [String: String]
+    {
+        guard
+            let endpointString = bundle.infoDictionary?["EXOBugReportPresignedUrlEndpoint"]
+                as? String
+        else {
+            throw BugReportError.invalidEndpoint
+        }
+        let trimmedEndpointString = endpointString.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !trimmedEndpointString.isEmpty, let endpoint = URL(string: trimmedEndpointString)
+        else {
+            throw BugReportError.invalidEndpoint
+        }
+
+        var request = URLRequest(url: endpoint)
+        request.httpMethod = "POST"
+        request.timeoutInterval = 10
+        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
+
+        let encoder = JSONEncoder()
+        request.httpBody = try encoder.encode(PresignedUrlsRequest(keys: keys))
+
+        let (data, response) = try await URLSession.shared.data(for: request)
+        guard let http = response as? HTTPURLResponse else {
+            throw BugReportError.presignedUrlFailed("Non-HTTP response")
+        }
+        guard (200..<300).contains(http.statusCode) else {
+            throw BugReportError.presignedUrlFailed("HTTP status \(http.statusCode)")
+        }
+
+        let decoder = JSONDecoder()
+        let decoded = try decoder.decode(PresignedUrlsResponse.self, from: data)
+        return decoded.urls
    }

    private func readLog() -> Data? {
@@ -100,7 +155,8 @@ struct BugReportService {
    private func captureIfconfig() async throws -> String {
        let result = runCommand(["/sbin/ifconfig"])
        guard result.exitCode == 0 else {
-            throw BugReportError.collectFailed(result.error.isEmpty ? "ifconfig failed" : result.error)
+            throw BugReportError.collectFailed(
+                result.error.isEmpty ? "ifconfig failed" : result.error)
        }
        return result.output
    }
@@ -108,12 +164,23 @@ struct BugReportService {
    private func readDebugInfo() -> DebugInfo {
        DebugInfo(
            thunderboltBridgeDisabled: readThunderboltBridgeDisabled(),
-            interfaces: readInterfaces()
+            interfaces: readInterfaces(),
+            rdma: readRDMADebugInfo()
+        )
+    }
+
+    private func readRDMADebugInfo() -> DebugInfo.RDMADebugInfo {
+        DebugInfo.RDMADebugInfo(
+            rdmaCtlStatus: safeRunCommand(["/usr/bin/rdma_ctl", "status"]),
+            ibvDevices: safeRunCommand(["/usr/bin/ibv_devices"]),
+            ibvDevinfo: safeRunCommand(["/usr/bin/ibv_devinfo"])
        )
    }

    private func readThunderboltBridgeDisabled() -> Bool? {
-        let result = runCommand(["/usr/sbin/networksetup", "-getnetworkserviceenabled", "Thunderbolt Bridge"])
+        let result = runCommand([
+            "/usr/sbin/networksetup", "-getnetworkserviceenabled", "Thunderbolt Bridge",
+        ])
        guard result.exitCode == 0 else { return nil }
        let output = result.output.lowercased()
        if output.contains("enabled") {
@@ -156,7 +223,8 @@ struct BugReportService {
        request.timeoutInterval = 5
        do {
            let (data, response) = try await URLSession.shared.data(for: request)
-            guard let http = response as? HTTPURLResponse, (200..<300).contains(http.statusCode) else {
+            guard let http = response as? HTTPURLResponse, (200..<300).contains(http.statusCode)
+            else {
                return nil
            }
            return data
@@ -165,6 +233,36 @@ struct BugReportService {
        }
    }

+    private func uploadToPresignedUrl(url: URL, body: Data) async throws {
+        let maxAttempts = 2
+        var lastError: Error?
+
+        for attempt in 1...maxAttempts {
+            do {
+                var request = URLRequest(url: url)
+                request.httpMethod = "PUT"
+                request.httpBody = body
+                request.timeoutInterval = 30
+
+                let (_, response) = try await URLSession.shared.data(for: request)
+                guard let http = response as? HTTPURLResponse else {
+                    throw BugReportError.uploadFailed("Non-HTTP response")
+                }
+                guard (200..<300).contains(http.statusCode) else {
+                    throw BugReportError.uploadFailed("HTTP status \(http.statusCode)")
+                }
+                return
+            } catch {
+                lastError = error
+                if attempt < maxAttempts {
+                    try await Task.sleep(nanoseconds: 400_000_000)
+                }
+            }
+        }
+
+        throw BugReportError.uploadFailed(lastError?.localizedDescription ?? "Unknown error")
+    }
+
    private func makeReportJson(
        timestamp: String,
        hostName: String,
@@ -182,7 +280,7 @@ struct BugReportService {
            "system": system,
            "exo_version": exo.version as Any,
            "exo_commit": exo.commit as Any,
-            "report_type": isManual ? "manual" : "automated"
+            "report_type": isManual ? "manual" : "automated",
        ]
        return try? JSONSerialization.data(withJSONObject: payload, options: [.prettyPrinted])
    }
@@ -213,10 +311,13 @@ struct BugReportService {
        let user = safeRunCommand(["/usr/bin/whoami"])
        let consoleUser = safeRunCommand(["/usr/bin/stat", "-f%Su", "/dev/console"])
        let uptime = safeRunCommand(["/usr/bin/uptime"])
-        let diskRoot = safeRunCommand(["/bin/sh", "-c", "/bin/df -h / | awk 'NR==2 {print $1, $2, $3, $4, $5}'"])
+        let diskRoot = safeRunCommand([
+            "/bin/sh", "-c", "/bin/df -h / | awk 'NR==2 {print $1, $2, $3, $4, $5}'",
+        ])

        let interfacesList = safeRunCommand(["/usr/sbin/ipconfig", "getiflist"])
-        let interfacesAndIPs = interfacesList?
+        let interfacesAndIPs =
+            interfacesList?
            .split(whereSeparator: { $0 == " " || $0 == "\n" })
            .compactMap { iface -> [String: Any]? in
                let name = String(iface)
@@ -227,7 +328,8 @@ struct BugReportService {
            } ?? []

        let wifiSSID: String?
-        let airportPath = "/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport"
+        let airportPath =
+            "/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport"
        if FileManager.default.isExecutableFile(atPath: airportPath) {
            wifiSSID = safeRunCommand([airportPath, "-I"]).flatMap(parseWifiSSID)
        } else {
@@ -255,7 +357,7 @@ struct BugReportService {
            "disk_root": diskRoot as Any,
            "interfaces_and_ips": interfacesAndIPs,
            "ipconfig_getiflist": interfacesList as Any,
-            "wifi_ssid": wifiSSID as Any
+            "wifi_ssid": wifiSSID as Any,
        ]
    }

@@ -313,7 +415,8 @@ struct BugReportService {
        for line in airportOutput.split(separator: "\n") {
            let trimmed = line.trimmingCharacters(in: .whitespaces)
            if trimmed.hasPrefix("SSID:") {
-                return trimmed.replacingOccurrences(of: "SSID:", with: "").trimmingCharacters(in: .whitespaces)
+                return trimmed.replacingOccurrences(of: "SSID:", with: "").trimmingCharacters(
+                    in: .whitespaces)
            }
        }
        return nil
@@ -350,6 +453,7 @@ struct BugReportService {
 private struct DebugInfo {
    let thunderboltBridgeDisabled: Bool?
    let interfaces: [InterfaceStatus]
+    let rdma: RDMADebugInfo

    struct InterfaceStatus {
        let name: String
@@ -358,7 +462,21 @@ private struct DebugInfo {
        func toDictionary() -> [String: Any] {
            [
                "name": name,
-                "ip": ip as Any
+                "ip": ip as Any,
+            ]
+        }
+    }
+
+    struct RDMADebugInfo {
+        let rdmaCtlStatus: String?
+        let ibvDevices: String?
+        let ibvDevinfo: String?
+
+        func toDictionary() -> [String: Any] {
+            [
+                "rdma_ctl_status": rdmaCtlStatus as Any,
+                "ibv_devices": ibvDevices as Any,
+                "ibv_devinfo": ibvDevinfo as Any,
            ]
        }
    }
@@ -366,7 +484,8 @@ private struct DebugInfo {
    func toDictionary() -> [String: Any] {
        [
            "thunderbolt_bridge_disabled": thunderboltBridgeDisabled as Any,
-            "interfaces": interfaces.map { $0.toDictionary() }
+            "interfaces": interfaces.map { $0.toDictionary() },
+            "rdma": rdma.toDictionary(),
        ]
    }
 }
@@ -376,163 +495,3 @@ private struct CommandResult {
    let output: String
    let error: String
 }
-
-private struct S3Uploader {
-    let config: BugReportService.AWSConfig
-
-    init(config: BugReportService.AWSConfig) throws {
-        self.config = config
-    }
-
-    func upload(objectPath: String, body: Data) async throws {
-        let host = "\(config.bucket).s3.amazonaws.com"
-        guard let url = URL(string: "https://\(host)/\(objectPath)") else {
-            throw BugReportError.invalidEndpoint
-        }
-
-        let now = Date()
-        let amzDate = awsTimestamp(now)
-        let dateStamp = dateStamp(now)
-        let payloadHash = sha256Hex(body)
-
-        let headers = [
-            "host": host,
-            "x-amz-content-sha256": payloadHash,
-            "x-amz-date": amzDate
-        ]
-
-        let canonicalRequest = buildCanonicalRequest(
-            method: "PUT",
-            url: url,
-            headers: headers,
-            payloadHash: payloadHash
-        )
-
-        let stringToSign = buildStringToSign(
-            amzDate: amzDate,
-            dateStamp: dateStamp,
-            canonicalRequestHash: sha256Hex(canonicalRequest.data(using: .utf8) ?? Data())
-        )
-
-        let signingKey = deriveKey(secret: config.secretKey, dateStamp: dateStamp, region: config.region, service: "s3")
-        let signature = hmacHex(key: signingKey, data: Data(stringToSign.utf8))
-
-        let signedHeaders = "host;x-amz-content-sha256;x-amz-date"
-        let authorization = """
-AWS4-HMAC-SHA256 Credential=\(config.accessKey)/\(dateStamp)/\(config.region)/s3/aws4_request, SignedHeaders=\(signedHeaders), Signature=\(signature)
-"""
-
-        var request = URLRequest(url: url)
-        request.httpMethod = "PUT"
-        request.httpBody = body
-        request.setValue(headers["x-amz-content-sha256"], forHTTPHeaderField: "x-amz-content-sha256")
-        request.setValue(headers["x-amz-date"], forHTTPHeaderField: "x-amz-date")
-        request.setValue(host, forHTTPHeaderField: "Host")
-        request.setValue(authorization, forHTTPHeaderField: "Authorization")
-
-        let (data, response) = try await URLSession.shared.data(for: request)
-        guard let http = response as? HTTPURLResponse, (200..<300).contains(http.statusCode) else {
-            let statusText = (response as? HTTPURLResponse)?.statusCode ?? -1
-            _ = data // ignore response body for UX
-            throw BugReportError.uploadFailed("HTTP status \(statusText)")
-        }
-    }
-
-    private func buildCanonicalRequest(
-        method: String,
-        url: URL,
-        headers: [String: String],
-        payloadHash: String
-    ) -> String {
-        let canonicalURI = encodePath(url.path)
-        let canonicalQuery = url.query ?? ""
-        let sortedHeaders = headers.sorted { $0.key < $1.key }
-        let canonicalHeaders = sortedHeaders
-            .map { "\($0.key.lowercased()):\($0.value)\n" }
-            .joined()
-        let signedHeaders = sortedHeaders.map { $0.key.lowercased() }.joined(separator: ";")
-
-        return [
-            method,
-            canonicalURI,
-            canonicalQuery,
-            canonicalHeaders,
-            signedHeaders,
-            payloadHash
-        ].joined(separator: "\n")
-    }
-
-    private func encodePath(_ path: String) -> String {
-        return path
-            .split(separator: "/")
-            .map { segment in
-                segment.addingPercentEncoding(withAllowedCharacters: Self.rfc3986) ?? String(segment)
-            }
-            .joined(separator: "/")
-            .prependSlashIfNeeded()
-    }
-
-    private func buildStringToSign(
-        amzDate: String,
-        dateStamp: String,
-        canonicalRequestHash: String
-    ) -> String {
-        """
-AWS4-HMAC-SHA256
-\(amzDate)
-\(dateStamp)/\(config.region)/s3/aws4_request
-\(canonicalRequestHash)
-"""
-    }
-
-    private func deriveKey(secret: String, dateStamp: String, region: String, service: String) -> Data {
-        let kDate = hmac(key: Data(("AWS4" + secret).utf8), data: Data(dateStamp.utf8))
-        let kRegion = hmac(key: kDate, data: Data(region.utf8))
-        let kService = hmac(key: kRegion, data: Data(service.utf8))
-        return hmac(key: kService, data: Data("aws4_request".utf8))
-    }
-
-    private func hmac(key: Data, data: Data) -> Data {
-        let keySym = SymmetricKey(data: key)
-        let mac = HMAC<SHA256>.authenticationCode(for: data, using: keySym)
-        return Data(mac)
-    }
-
-    private func hmacHex(key: Data, data: Data) -> String {
-        hmac(key: key, data: data).map { String(format: "%02x", $0) }.joined()
-    }
-
-    private func sha256Hex(_ data: Data) -> String {
-        let digest = SHA256.hash(data: data)
-        return digest.compactMap { String(format: "%02x", $0) }.joined()
-    }
-
-    private func awsTimestamp(_ date: Date) -> String {
-        let formatter = DateFormatter()
-        formatter.dateFormat = "yyyyMMdd'T'HHmmss'Z'"
-        formatter.timeZone = TimeZone(abbreviation: "UTC")
-        return formatter.string(from: date)
-    }
-
-    private func dateStamp(_ date: Date) -> String {
-        let formatter = DateFormatter()
-        formatter.dateFormat = "yyyyMMdd"
-        formatter.timeZone = TimeZone(abbreviation: "UTC")
-        return formatter.string(from: date)
-    }
-
-    private static let rfc3986: CharacterSet = {
-        var set = CharacterSet.alphanumerics
-        set.insert(charactersIn: "-._~")
-        return set
-    }()
-}
-
-private extension String {
-    func prependSlashIfNeeded() -> String {
-        if hasPrefix("/") {
-            return self
-        }
-        return "/" + self
-    }
-}
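Review note on the presigned-URL flow in `BugReportService.swift`: this hunk uses `PresignedUrlsRequest(keys:)` and `PresignedUrlsResponse.urls`, but their definitions are not visible here. The sketch below is a guess at their shape, assuming the JSON field names match the property names (`keys`, `urls`). `ResolveError` and `resolveUploadUrls` are hypothetical names that restate the per-item lookup loop as a pure function. It is illustrative only, not the code in this change.

```swift
import Foundation

// Hypothetical wire types: only `PresignedUrlsRequest(keys:)` and
// `PresignedUrlsResponse.urls` appear in the diff. The JSON field names
// are assumed to match the property names.
struct PresignedUrlsRequest: Codable {
    let keys: [String]
}

struct PresignedUrlsResponse: Codable {
    let urls: [String: String]
}

// Hypothetical error type used only by this sketch.
enum ResolveError: Error, Equatable {
    case missingUrl(key: String)
}

/// Pairs every requested key with its presigned URL and throws on the first
/// key that the response omits or whose value does not parse as a URL.
func resolveUploadUrls(
    keys: [String],
    response: PresignedUrlsResponse
) throws -> [(key: String, url: URL)] {
    try keys.map { key -> (key: String, url: URL) in
        guard let raw = response.urls[key], let url = URL(string: raw) else {
            throw ResolveError.missingUrl(key: key)
        }
        return (key: key, url: url)
    }
}
```

Resolving all keys before the first `PUT` would let a partial response fail before anything is uploaded, whereas the loop in the diff checks each key just before uploading it. Whether that matters depends on how the endpoint behaves, which this diff does not show.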
--- a/app/EXO/EXO/Services/ClusterStateService.swift
+++ b/app/EXO/EXO/Services/ClusterStateService.swift
@@ -57,7 +57,9 @@ final class ClusterStateService: ObservableObject {
            var request = URLRequest(url: url)
            request.cachePolicy = .reloadIgnoringLocalCacheData
            let (data, response) = try await session.data(for: request)
-            guard let httpResponse = response as? HTTPURLResponse, (200..<300).contains(httpResponse.statusCode) else {
+            guard let httpResponse = response as? HTTPURLResponse,
+                (200..<300).contains(httpResponse.statusCode)
+            else {
                return
            }
            if let nodeId = try? decoder.decode(String.self, from: data) {
@@ -113,7 +115,9 @@ final class ClusterStateService: ObservableObject {
        }
    }

-    func launchInstance(modelId: String, sharding: String, instanceMeta: String, minNodes: Int) async {
+    func launchInstance(modelId: String, sharding: String, instanceMeta: String, minNodes: Int)
+        async
+    {
        do {
            var request = URLRequest(url: baseURL.appendingPathComponent("instance"))
            request.httpMethod = "POST"
@@ -122,7 +126,7 @@ final class ClusterStateService: ObservableObject {
                "model_id": modelId,
                "sharding": sharding,
                "instance_meta": instanceMeta,
-                "min_nodes": minNodes
+                "min_nodes": minNodes,
            ]
            request.httpBody = try JSONSerialization.data(withJSONObject: payload, options: [])
            let (_, response) = try await session.data(for: request)
@@ -143,7 +147,9 @@ final class ClusterStateService: ObservableObject {
        do {
            let url = baseURL.appendingPathComponent("models")
            let (data, response) = try await session.data(from: url)
-            guard let httpResponse = response as? HTTPURLResponse, (200..<300).contains(httpResponse.statusCode) else {
+            guard let httpResponse = response as? HTTPURLResponse,
+                (200..<300).contains(httpResponse.statusCode)
+            else {
                throw URLError(.badServerResponse)
            }
            let list = try decoder.decode(ModelListResponse.self, from: data)
--- a/app/EXO/EXO/Services/LocalNetworkChecker.swift
+++ b/app/EXO/EXO/Services/LocalNetworkChecker.swift
@@ -0,0 +1,150 @@
+import Foundation
+import Network
+import os.log
+
+/// Checks if the app's local network permission is actually functional.
+///
+/// macOS local network permission can appear enabled in System Settings but not
+/// actually work after a restart. This service detects this by creating a UDP
+/// connection to the mDNS multicast address (224.0.0.251:5353).
+@MainActor
+final class LocalNetworkChecker: ObservableObject {
+    enum Status: Equatable {
+        case unknown
+        case checking
+        case working
+        case notWorking(reason: String)
+
+        var isHealthy: Bool {
+            if case .working = self { return true }
+            return false
+        }
+
+        var displayText: String {
+            switch self {
+            case .unknown:
+                return "Unknown"
+            case .checking:
+                return "Checking..."
+            case .working:
+                return "Working"
+            case .notWorking(let reason):
+                return reason
+            }
+        }
+    }
+
+    private static let logger = Logger(subsystem: "io.exo.EXO", category: "LocalNetworkChecker")
+
+    @Published private(set) var status: Status = .unknown
+    @Published private(set) var lastConnectionState: String = "none"
+
+    private var connection: NWConnection?
+    private var checkTask: Task<Void, Never>?
+
+    /// Checks if local network access is working.
+    func check() {
+        checkTask?.cancel()
+        status = .checking
+        lastConnectionState = "connecting"
+
+        checkTask = Task { [weak self] in
+            guard let self else { return }
+            let result = await self.performCheck()
+            self.status = result
+            Self.logger.info("Local network check complete: \(result.displayText)")
+        }
+    }
+
+    private func performCheck() async -> Status {
+        Self.logger.info("Checking local network access via UDP multicast")
+
+        connection?.cancel()
+        connection = nil
+
+        // mDNS multicast address - same as libp2p uses for peer discovery
+        let host = NWEndpoint.Host("224.0.0.251")
+        let port = NWEndpoint.Port(integerLiteral: 5353)
+
+        let params = NWParameters.udp
+        params.allowLocalEndpointReuse = true
+
+        let conn = NWConnection(host: host, port: port, using: params)
+        connection = conn
+
+        return await withCheckedContinuation { continuation in
+            var hasResumed = false
+            let lock = NSLock()
+
+            let resumeOnce: (Status) -> Void = { status in
+                lock.lock()
+                defer { lock.unlock() }
+                guard !hasResumed else { return }
+                hasResumed = true
+                continuation.resume(returning: status)
+            }
+
+            conn.stateUpdateHandler = { [weak self] state in
+                let stateStr: String
+                switch state {
+                case .setup: stateStr = "setup"
+                case .preparing: stateStr = "preparing"
+                case .ready: stateStr = "ready"
+                case .waiting(let e): stateStr = "waiting(\(e))"
+                case .failed(let e): stateStr = "failed(\(e))"
+                case .cancelled: stateStr = "cancelled"
+                @unknown default: stateStr = "unknown"
+                }
+
+                Task { @MainActor in
+                    self?.lastConnectionState = stateStr
+                }
+
+                switch state {
+                case .ready:
+                    resumeOnce(.working)
+                case .waiting(let error):
+                    let errorStr = "\(error)"
+                    if errorStr.contains("54") || errorStr.contains("ECONNRESET") {
+                        resumeOnce(.notWorking(reason: "Connection blocked"))
+                    }
+                case .failed(let error):
+                    let errorStr = "\(error)"
+                    if errorStr.contains("65") || errorStr.contains("EHOSTUNREACH")
+                        || errorStr.contains("permission") || errorStr.contains("denied")
+                    {
+                        resumeOnce(.notWorking(reason: "Permission denied"))
+                    } else {
+                        resumeOnce(.notWorking(reason: "Failed: \(error.localizedDescription)"))
+                    }
+                case .cancelled, .setup, .preparing:
+                    break
+                @unknown default:
+                    break
+                }
+            }
+
+            conn.start(queue: .main)
+
+            Task {
+                try? await Task.sleep(nanoseconds: 3_000_000_000)
+                let state = conn.state
+                switch state {
+                case .ready:
+                    resumeOnce(.working)
+                case .waiting, .preparing, .setup:
+                    resumeOnce(.notWorking(reason: "Timeout (may be blocked)"))
+                default:
+                    resumeOnce(.notWorking(reason: "Timeout"))
+                }
+            }
+        }
+    }
+
+    func stop() {
+        checkTask?.cancel()
+        checkTask = nil
+        connection?.cancel()
+        connection = nil
+    }
+}
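Review note on `LocalNetworkChecker.performCheck()`: both the state handler and the 3-second timeout task can try to deliver a result, and a `CheckedContinuation` must be resumed exactly once. The `resumeOnce` closure guards that with a lock and a flag. The sketch below extracts the same guard into a small reusable type so it can be tested on its own. `OnceGate` is a hypothetical name and does not appear in this change.

```swift
import Foundation

/// Hypothetical helper (not in the diff): runs `action` for the first call
/// to `fire` and ignores every later call, regardless of the calling thread.
final class OnceGate<Value> {
    private let lock = NSLock()
    private var hasFired = false
    private let action: (Value) -> Void

    init(action: @escaping (Value) -> Void) {
        self.action = action
    }

    func fire(_ value: Value) {
        lock.lock()
        defer { lock.unlock() }
        guard !hasFired else { return }  // a result was already delivered
        hasFired = true
        action(value)
    }
}
```

With a gate built as `OnceGate<Status> { continuation.resume(returning: $0) }`, the state handler and the timeout task could both call `fire` and only the first result would reach the continuation. Like the closure in the diff, the action runs while the lock is held, so it should stay short.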
--- a/app/EXO/EXO/Services/NetworkSetupHelper.swift
+++ b/app/EXO/EXO/Services/NetworkSetupHelper.swift
@@ -5,64 +5,66 @@ import os.log
 enum NetworkSetupHelper {
    private static let logger = Logger(subsystem: "io.exo.EXO", category: "NetworkSetup")
    private static let daemonLabel = "io.exo.networksetup"
-    private static let scriptDestination = "/Library/Application Support/EXO/disable_bridge_enable_dhcp.sh"
+    private static let scriptDestination =
+        "/Library/Application Support/EXO/disable_bridge_enable_dhcp.sh"
    private static let plistDestination = "/Library/LaunchDaemons/io.exo.networksetup.plist"
    private static let requiredStartInterval: Int = 1791

    private static let setupScript = """
-#!/usr/bin/env bash
+        #!/usr/bin/env bash

-set -euo pipefail
+        set -euo pipefail

-PREFS="/Library/Preferences/SystemConfiguration/preferences.plist"
+        PREFS="/Library/Preferences/SystemConfiguration/preferences.plist"

-# Remove bridge0 interface
-ifconfig bridge0 &>/dev/null && {
-  ifconfig bridge0 | grep -q 'member' && {
-    ifconfig bridge0 | awk '/member/ {print $2}' | xargs -n1 ifconfig bridge0 deletem 2>/dev/null || true
-  }
-  ifconfig bridge0 destroy 2>/dev/null || true
-}
+        # Remove bridge0 interface
+        ifconfig bridge0 &>/dev/null && {
+          ifconfig bridge0 | grep -q 'member' && {
+            ifconfig bridge0 | awk '/member/ {print $2}' | xargs -n1 ifconfig bridge0 deletem 2>/dev/null || true
+          }
+          ifconfig bridge0 destroy 2>/dev/null || true
+        }

-# Remove Thunderbolt Bridge from VirtualNetworkInterfaces in preferences.plist
-/usr/libexec/PlistBuddy -c "Delete :VirtualNetworkInterfaces:Bridge:bridge0" "$PREFS" 2>/dev/null || true
+        # Remove Thunderbolt Bridge from VirtualNetworkInterfaces in preferences.plist
+        /usr/libexec/PlistBuddy -c "Delete :VirtualNetworkInterfaces:Bridge:bridge0" "$PREFS" 2>/dev/null || true

-networksetup -listlocations | grep -q exo || {
-  networksetup -createlocation exo
-}
+        networksetup -listlocations | grep -q exo || {
+          networksetup -createlocation exo
+        }

-networksetup -switchtolocation exo
-networksetup -listallhardwareports \\
-  | awk -F': ' '/Hardware Port: / {print $2}' \\
-  | while IFS=":" read -r name; do
-      case "$name" in
-        "Ethernet Adapter"*)
-                ;;
-        "Thunderbolt Bridge")
-                ;;
-        "Thunderbolt "*)
-          networksetup -listallnetworkservices \\
-            | grep -q "EXO $name" \\
-              || networksetup -createnetworkservice "EXO $name" "$name" 2>/dev/null \\
-              || continue
-          networksetup -setdhcp "EXO $name"
-                ;;
-        *)
-          networksetup -listallnetworkservices \\
-            | grep -q "$name" \\
-              || networksetup -createnetworkservice "$name" "$name" 2>/dev/null \\
-              || continue
-                ;;
-      esac
-    done
+        networksetup -switchtolocation exo
+        networksetup -listallhardwareports \\
+          | awk -F': ' '/Hardware Port: / {print $2}' \\
+          | while IFS=":" read -r name; do
+              case "$name" in
+                "Ethernet Adapter"*)
+                        ;;
+                "Thunderbolt Bridge")
+                        ;;
+                "Thunderbolt "*)
+                  networksetup -listallnetworkservices \\
+                    | grep -q "EXO $name" \\
+                      || networksetup -createnetworkservice "EXO $name" "$name" 2>/dev/null \\
+                      || continue
+                  networksetup -setdhcp "EXO $name"
+                        ;;
+                *)
+                  networksetup -listallnetworkservices \\
+                    | grep -q "$name" \\
+                      || networksetup -createnetworkservice "$name" "$name" 2>/dev/null \\
+                      || continue
+                        ;;
+              esac
+            done

-networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
-  networksetup -setnetworkserviceenabled "Thunderbolt Bridge" off
-} || true
-"""
+        networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
+          networksetup -setnetworkserviceenabled "Thunderbolt Bridge" off
+        } || true
+        """

    static func ensureLaunchDaemonInstalled() {
-        Task.detached {
+        // Use .utility priority to match NSAppleScript's internal QoS and avoid priority inversion
+        Task.detached(priority: .utility) {
            do {
                if daemonAlreadyInstalled() {
                    return
@@ -70,11 +72,70 @@ networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
                try await installLaunchDaemon()
                logger.info("Network setup launch daemon installed and started")
            } catch {
-                logger.error("Network setup launch daemon failed: \(error.localizedDescription, privacy: .public)")
+                logger.error(
+                    "Network setup launch daemon failed: \(error.localizedDescription, privacy: .public)"
+                )
            }
        }
    }

+    /// Removes all EXO network setup components from the system.
+    /// This includes the LaunchDaemon, scripts, logs, and network location.
+    /// Requires admin privileges.
+    static func uninstall() throws {
+        let uninstallScript = makeUninstallScript()
+        try runShellAsAdmin(uninstallScript)
+        logger.info("EXO network setup components removed successfully")
+    }
+
+    /// Checks if there are any EXO network components installed that need cleanup.
+    static func hasInstalledComponents() -> Bool {
+        let manager = FileManager.default
+        let scriptExists = manager.fileExists(atPath: scriptDestination)
+        let plistExists = manager.fileExists(atPath: plistDestination)
+        return scriptExists || plistExists
+    }
+
+    private static func makeUninstallScript() -> String {
+        """
+        set -euo pipefail
+
+        LABEL="\(daemonLabel)"
+        SCRIPT_DEST="\(scriptDestination)"
+        PLIST_DEST="\(plistDestination)"
+        LOG_OUT="/var/log/\(daemonLabel).log"
+        LOG_ERR="/var/log/\(daemonLabel).err.log"
+
+        # Unload the LaunchDaemon if running
+        launchctl bootout system/"$LABEL" 2>/dev/null || true
+
+        # Remove LaunchDaemon plist
+        rm -f "$PLIST_DEST"
+
+        # Remove the script and parent directory if empty
+        rm -f "$SCRIPT_DEST"
+        rmdir "$(dirname "$SCRIPT_DEST")" 2>/dev/null || true
+
+        # Remove log files
+        rm -f "$LOG_OUT" "$LOG_ERR"
+
+        # Switch back to Automatic network location
+        networksetup -switchtolocation Automatic 2>/dev/null || true
+
+        # Delete the exo network location if it exists
+        networksetup -listlocations | grep -q '^exo$' && {
+          networksetup -deletelocation exo 2>/dev/null || true
+        } || true
+
+        # Re-enable Thunderbolt Bridge if it exists
+        networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
+          networksetup -setnetworkserviceenabled "Thunderbolt Bridge" on 2>/dev/null || true
+        } || true
+
+        echo "EXO network components removed successfully"
+        """
+    }
+
    private static func daemonAlreadyInstalled() -> Bool {
        let manager = FileManager.default
        let scriptExists = manager.fileExists(atPath: scriptDestination)
@@ -82,7 +143,8 @@ networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
        guard scriptExists, plistExists else { return false }
        guard
            let data = try? Data(contentsOf: URL(fileURLWithPath: plistDestination)),
-            let plist = try? PropertyListSerialization.propertyList(from: data, options: [], format: nil) as? [String: Any]
+            let plist = try? PropertyListSerialization.propertyList(
+                from: data, options: [], format: nil) as? [String: Any]
        else {
            return false
        }
@@ -92,7 +154,9 @@ networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
        else {
            return false
        }
-        if let programArgs = plist["ProgramArguments"] as? [String], programArgs.contains(scriptDestination) == false {
+        if let programArgs = plist["ProgramArguments"] as? [String],
+            programArgs.contains(scriptDestination) == false
+        {
            return false
        }
        return true
@@ -105,58 +169,59 @@ networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {

    private static func makeInstallerScript() -> String {
        """
-set -euo pipefail
+        set -euo pipefail

-LABEL="\(daemonLabel)"
-SCRIPT_DEST="\(scriptDestination)"
-PLIST_DEST="\(plistDestination)"
+        LABEL="\(daemonLabel)"
+        SCRIPT_DEST="\(scriptDestination)"
+        PLIST_DEST="\(plistDestination)"

-mkdir -p "$(dirname "$SCRIPT_DEST")"
+        mkdir -p "$(dirname "$SCRIPT_DEST")"

-cat > "$SCRIPT_DEST" <<'EOF_SCRIPT'
-\(setupScript)
-EOF_SCRIPT
-chmod 755 "$SCRIPT_DEST"
+        cat > "$SCRIPT_DEST" <<'EOF_SCRIPT'
+        \(setupScript)
+        EOF_SCRIPT
+        chmod 755 "$SCRIPT_DEST"

-cat > "$PLIST_DEST" <<'EOF_PLIST'
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
-<plist version="1.0">
-<dict>
-  <key>Label</key>
-  <string>\(daemonLabel)</string>
-  <key>ProgramArguments</key>
-  <array>
-    <string>/bin/bash</string>
-    <string>\(scriptDestination)</string>
-  </array>
-  <key>StartInterval</key>
-  <integer>\(requiredStartInterval)</integer>
-  <key>RunAtLoad</key>
-  <true/>
-  <key>StandardOutPath</key>
-  <string>/var/log/\(daemonLabel).log</string>
-  <key>StandardErrorPath</key>
-  <string>/var/log/\(daemonLabel).err.log</string>
-</dict>
-</plist>
-EOF_PLIST
+        cat > "$PLIST_DEST" <<'EOF_PLIST'
+        <?xml version="1.0" encoding="UTF-8"?>
+        <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+        <plist version="1.0">
+        <dict>
+          <key>Label</key>
+          <string>\(daemonLabel)</string>
+          <key>ProgramArguments</key>
+          <array>
+            <string>/bin/bash</string>
+            <string>\(scriptDestination)</string>
+          </array>
+          <key>StartInterval</key>
+          <integer>\(requiredStartInterval)</integer>
+          <key>RunAtLoad</key>
+          <true/>
+          <key>StandardOutPath</key>
+          <string>/var/log/\(daemonLabel).log</string>
+          <key>StandardErrorPath</key>
+          <string>/var/log/\(daemonLabel).err.log</string>
+        </dict>
+        </plist>
+        EOF_PLIST

-launchctl bootout system/"$LABEL" >/dev/null 2>&1 || true
-launchctl bootstrap system "$PLIST_DEST"
-launchctl enable system/"$LABEL"
-launchctl kickstart -k system/"$LABEL"
-"""
+        launchctl bootout system/"$LABEL" >/dev/null 2>&1 || true
+        launchctl bootstrap system "$PLIST_DEST"
+        launchctl enable system/"$LABEL"
+        launchctl kickstart -k system/"$LABEL"
+        """
    }

    private static func runShellAsAdmin(_ script: String) throws {
-        let escapedScript = script
+        let escapedScript =
+            script
            .replacingOccurrences(of: "\\", with: "\\\\")
            .replacingOccurrences(of: "\"", with: "\\\"")

        let appleScriptSource = """
-do shell script "\(escapedScript)" with administrator privileges
-"""
+            do shell script "\(escapedScript)" with administrator privileges
+            """

        guard let appleScript = NSAppleScript(source: appleScriptSource) else {
            throw NetworkSetupError.scriptCreationFailed
--- a/app/EXO/EXO/Services/NetworkStatusService.swift
+++ b/app/EXO/EXO/Services/NetworkStatusService.swift
@@ -35,14 +35,34 @@ struct NetworkStatus: Equatable {
    let thunderboltBridgeState: ThunderboltState?
    let bridgeInactive: Bool?
    let interfaceStatuses: [InterfaceIpStatus]
+    let rdmaStatus: RDMAStatus

    static let empty = NetworkStatus(
        thunderboltBridgeState: nil,
        bridgeInactive: nil,
-        interfaceStatuses: []
+        interfaceStatuses: [],
+        rdmaStatus: .empty
    )
 }

+struct RDMAStatus: Equatable {
+    let rdmaCtlEnabled: Bool?
+    let devices: [String]
+    let activePorts: [RDMAPort]
+
+    var isAvailable: Bool {
+        rdmaCtlEnabled == true || !devices.isEmpty
+    }
+
+    static let empty = RDMAStatus(rdmaCtlEnabled: nil, devices: [], activePorts: [])
+}
+
+struct RDMAPort: Equatable {
+    let device: String
+    let port: String
+    let state: String
+}
+
 struct InterfaceIpStatus: Equatable {
    let interfaceName: String
    let ipAddress: String?
@@ -59,10 +79,79 @@ private struct NetworkStatusFetcher {
        NetworkStatus(
            thunderboltBridgeState: readThunderboltBridgeState(),
            bridgeInactive: readBridgeInactive(),
-            interfaceStatuses: readInterfaceStatuses()
+            interfaceStatuses: readInterfaceStatuses(),
+            rdmaStatus: readRDMAStatus()
        )
    }

+    private func readRDMAStatus() -> RDMAStatus {
+        let rdmaCtlEnabled = readRDMACtlEnabled()
+        let devices = readRDMADevices()
+        let activePorts = readRDMAActivePorts()
+        return RDMAStatus(
+            rdmaCtlEnabled: rdmaCtlEnabled, devices: devices, activePorts: activePorts)
+    }
+
+    private func readRDMACtlEnabled() -> Bool? {
+        let result = runCommand(["rdma_ctl", "status"])
+        guard result.exitCode == 0 else { return nil }
+        let output = result.output.lowercased().trimmingCharacters(in: .whitespacesAndNewlines)
+        if output.contains("enabled") {
+            return true
+        }
+        if output.contains("disabled") {
+            return false
+        }
+        return nil
+    }
+
+    private func readRDMADevices() -> [String] {
+        let result = runCommand(["ibv_devices"])
+        guard result.exitCode == 0 else { return [] }
+        var devices: [String] = []
+        for line in result.output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            if trimmed.hasPrefix("---") || trimmed.lowercased().hasPrefix("device")
+                || trimmed.isEmpty
+            {
+                continue
+            }
+            let parts = trimmed.split(separator: " ", maxSplits: 1)
+            if let deviceName = parts.first {
+                devices.append(String(deviceName))
+            }
+        }
+        return devices
+    }
+
+    private func readRDMAActivePorts() -> [RDMAPort] {
+        let result = runCommand(["ibv_devinfo"])
+        guard result.exitCode == 0 else { return [] }
+        var ports: [RDMAPort] = []
+        var currentDevice: String?
+        var currentPort: String?
+
+        for line in result.output.split(separator: "\n") {
+            let trimmed = line.trimmingCharacters(in: .whitespaces)
+            if trimmed.hasPrefix("hca_id:") {
+                currentDevice = trimmed.replacingOccurrences(of: "hca_id:", with: "")
+                    .trimmingCharacters(in: .whitespaces)
+            } else if trimmed.hasPrefix("port:") {
+                currentPort = trimmed.replacingOccurrences(of: "port:", with: "")
+                    .trimmingCharacters(in: .whitespaces)
+            } else if trimmed.hasPrefix("state:") {
+                let state = trimmed.replacingOccurrences(of: "state:", with: "").trimmingCharacters(
+                    in: .whitespaces)
+                if let device = currentDevice, let port = currentPort {
+                    if state.lowercased().contains("active") {
+                        ports.append(RDMAPort(device: device, port: port, state: state))
+                    }
+                }
+            }
+        }
+        return ports
+    }
+
    private func readThunderboltBridgeState() -> ThunderboltState? {
        let result = runCommand(["networksetup", "-getnetworkserviceenabled", "Thunderbolt Bridge"])
        guard result.exitCode == 0 else {
@@ -85,10 +174,11 @@ private struct NetworkStatusFetcher {
    private func readBridgeInactive() -> Bool? {
        let result = runCommand(["ifconfig", "bridge0"])
        guard result.exitCode == 0 else { return nil }
-        guard let statusLine = result.output
-            .components(separatedBy: .newlines)
-            .first(where: { $0.contains("status:") })?
-            .lowercased()
+        guard
+            let statusLine = result.output
+                .components(separatedBy: .newlines)
+                .first(where: { $0.contains("status:") })?
+                .lowercased()
        else {
            return nil
        }
@@ -171,4 +261,3 @@ private struct NetworkStatusFetcher {
        )
    }
 }
-
--- a/app/EXO/EXO/ViewModels/InstanceViewModel.swift
+++ b/app/EXO/EXO/ViewModels/InstanceViewModel.swift
@@ -57,7 +57,7 @@ struct InstanceViewModel: Identifiable, Equatable {
        case waiting
        case failed
        case idle
-        case unknown
+        case preparing

        var label: String {
            switch self {
@@ -68,7 +68,7 @@ struct InstanceViewModel: Identifiable, Equatable {
            case .waiting: return "Waiting"
            case .failed: return "Failed"
            case .idle: return "Idle"
-            case .unknown: return "Unknown"
+            case .preparing: return "Preparing"
            }
        }
    }
@@ -107,10 +107,13 @@ extension ClusterState {
            let nodeToRunner = instance.shardAssignments.nodeToRunner
            let nodeIds = Array(nodeToRunner.keys)
            let runnerIds = Array(nodeToRunner.values)
-            let nodeNames = nodeIds.compactMap { nodeProfiles[$0]?.friendlyName ?? nodeProfiles[$0]?.modelId ?? $0 }
+            let nodeNames = nodeIds.compactMap {
+                nodeProfiles[$0]?.friendlyName ?? nodeProfiles[$0]?.modelId ?? $0
+            }
            let statuses = runnerIds.compactMap { runners[$0]?.status.lowercased() }
            let downloadProgress = aggregateDownloadProgress(for: nodeIds)
-            let state = InstanceViewModel.State(statuses: statuses, hasActiveDownload: downloadProgress != nil)
+            let state = InstanceViewModel.State(
+                statuses: statuses, hasActiveDownload: downloadProgress != nil)
            let chatTasks = (chatTasksByInstance[entry.key] ?? [])
                .sorted(by: { $0.sortPriority < $1.sortPriority })
                .map { InstanceTaskViewModel(task: $0) }
@@ -165,8 +168,8 @@ extension ClusterState {
    }
 }

-private extension InstanceViewModel.State {
-    init(statuses: [String], hasActiveDownload: Bool = false) {
+extension InstanceViewModel.State {
+    fileprivate init(statuses: [String], hasActiveDownload: Bool = false) {
        if statuses.contains(where: { $0.contains("failed") }) {
            self = .failed
        } else if hasActiveDownload || statuses.contains(where: { $0.contains("downloading") }) {
@@ -182,7 +185,7 @@ private extension InstanceViewModel.State {
        } else if statuses.isEmpty {
            self = .idle
        } else {
-            self = .unknown
+            self = .preparing
        }
    }
 }
@@ -243,4 +246,3 @@ extension InstanceTaskViewModel {
        self.parameters = task.parameters
    }
 }
-
--- a/app/EXO/EXO/ViewModels/NodeViewModel.swift
+++ b/app/EXO/EXO/ViewModels/NodeViewModel.swift
@@ -87,7 +87,9 @@ struct TopologyViewModel {
 extension ClusterState {
    func topologyViewModel(localNodeId: String?) -> TopologyViewModel? {
        let topologyNodeIds = Set(topology?.nodes.map(\.nodeId) ?? [])
-        let allNodes = nodeViewModels().filter { topologyNodeIds.isEmpty || topologyNodeIds.contains($0.id) }
+        let allNodes = nodeViewModels().filter {
+            topologyNodeIds.isEmpty || topologyNodeIds.contains($0.id)
+        }
        guard !allNodes.isEmpty else { return nil }

        let nodesById = Dictionary(uniqueKeysWithValues: allNodes.map { ($0.id, $0) })
@@ -106,18 +108,24 @@ extension ClusterState {
        }

        // Rotate so the local node (from /node_id API) is first
-        if let localId = localNodeId, let index = orderedNodes.firstIndex(where: { $0.id == localId }) {
+        if let localId = localNodeId,
+            let index = orderedNodes.firstIndex(where: { $0.id == localId })
+        {
            orderedNodes = Array(orderedNodes[index...]) + Array(orderedNodes[..<index])
        }

        let nodeIds = Set(orderedNodes.map(\.id))
-        let edgesArray: [TopologyEdgeViewModel] = topology?.connections?.compactMap { connection in
-            guard nodeIds.contains(connection.localNodeId), nodeIds.contains(connection.sendBackNodeId) else { return nil }
-            return TopologyEdgeViewModel(sourceId: connection.localNodeId, targetId: connection.sendBackNodeId)
-        } ?? []
+        let edgesArray: [TopologyEdgeViewModel] =
+            topology?.connections?.compactMap { connection in
+                guard nodeIds.contains(connection.localNodeId),
+                    nodeIds.contains(connection.sendBackNodeId)
+                else { return nil }
+                return TopologyEdgeViewModel(
+                    sourceId: connection.localNodeId, targetId: connection.sendBackNodeId)
+            } ?? []
        let edges = Set(edgesArray)

-        return TopologyViewModel(nodes: orderedNodes, edges: Array(edges), currentNodeId: localNodeId)
+        return TopologyViewModel(
+            nodes: orderedNodes, edges: Array(edges), currentNodeId: localNodeId)
    }
 }
-
--- a/app/EXO/EXO/Views/InstanceRowView.swift
+++ b/app/EXO/EXO/Views/InstanceRowView.swift
@@ -20,8 +20,8 @@ struct InstanceRowView: View {
                if let progress = instance.downloadProgress {
                    downloadStatusView(progress: progress)
                } else {
-                statusChip(label: instance.state.label.uppercased(), color: statusColor)
-            }
+                    statusChip(label: instance.state.label.uppercased(), color: statusColor)
+                }
            }
            if let progress = instance.downloadProgress {
                GeometryReader { geometry in
@@ -83,7 +83,7 @@ struct InstanceRowView: View {
        case .ready: return .teal
        case .waiting, .idle: return .gray
        case .failed: return .red
-        case .unknown: return .secondary
+        case .preparing: return .secondary
        }
    }

@@ -97,7 +97,8 @@ struct InstanceRowView: View {
                        .font(.caption)
                        .fontWeight(.semibold)
                    if let subtitle = task.subtitle,
-                       subtitle.caseInsensitiveCompare(parentModelName) != .orderedSame {
+                        subtitle.caseInsensitiveCompare(parentModelName) != .orderedSame
+                    {
                        Text(subtitle)
                            .font(.caption2)
                            .foregroundColor(.secondary)
@@ -234,9 +235,12 @@ struct InstanceRowView: View {
        Button {
            isExpanded.wrappedValue.toggle()
        } label: {
-            Label(isExpanded.wrappedValue ? "Hide" : "Show", systemImage: isExpanded.wrappedValue ? "chevron.up" : "chevron.down")
-                .labelStyle(.titleAndIcon)
-                .contentTransition(.symbolEffect(.replace))
+            Label(
+                isExpanded.wrappedValue ? "Hide" : "Show",
+                systemImage: isExpanded.wrappedValue ? "chevron.up" : "chevron.down"
+            )
+            .labelStyle(.titleAndIcon)
+            .contentTransition(.symbolEffect(.replace))
        }
        .buttonStyle(.plain)
        .font(.caption2)
@@ -311,7 +315,9 @@ struct InstanceRowView: View {
        }

        @ViewBuilder
-        private func detailRow(icon: String? = nil, title: String, value: String, tint: Color = .secondary) -> some View {
+        private func detailRow(
+            icon: String? = nil, title: String, value: String, tint: Color = .secondary
+        ) -> some View {
            HStack(alignment: .firstTextBaseline, spacing: 6) {
                if let icon {
                    Image(systemName: icon)
@@ -329,4 +335,3 @@ struct InstanceRowView: View {
        }
    }
 }
-
--- a/app/EXO/EXO/Views/NodeDetailView.swift
+++ b/app/EXO/EXO/Views/NodeDetailView.swift
@@ -32,4 +32,3 @@ struct NodeDetailView: View {
        }
    }
 }
-
--- a/app/EXO/EXO/Views/NodeRowView.swift
+++ b/app/EXO/EXO/Views/NodeRowView.swift
@@ -28,4 +28,3 @@ struct NodeRowView: View {
        .padding(.vertical, 4)
    }
 }
-
--- a/app/EXO/EXO/Views/TopologyMiniView.swift
+++ b/app/EXO/EXO/Views/TopologyMiniView.swift
@@ -76,30 +76,33 @@ struct TopologyMiniView: View {

    private func connectionLines(in size: CGSize) -> some View {
        let positions = positionedNodes(in: size)
-        let positionById = Dictionary(uniqueKeysWithValues: positions.map { ($0.node.id, $0.point) })
+        let positionById = Dictionary(
+            uniqueKeysWithValues: positions.map { ($0.node.id, $0.point) })
        return Canvas { context, _ in
            guard !topology.edges.isEmpty else { return }
            let nodeRadius: CGFloat = 32
            let arrowLength: CGFloat = 10
            let arrowSpread: CGFloat = .pi / 7
            for edge in topology.edges {
-                guard let start = positionById[edge.sourceId], let end = positionById[edge.targetId] else { continue }
+                guard let start = positionById[edge.sourceId], let end = positionById[edge.targetId]
+                else { continue }
                let dx = end.x - start.x
                let dy = end.y - start.y
                let distance = max(CGFloat(hypot(dx, dy)), 1)
                let ux = dx / distance
                let uy = dy / distance
-                let adjustedStart = CGPoint(x: start.x + ux * nodeRadius, y: start.y + uy * nodeRadius)
+                let adjustedStart = CGPoint(
+                    x: start.x + ux * nodeRadius, y: start.y + uy * nodeRadius)
                let adjustedEnd = CGPoint(x: end.x - ux * nodeRadius, y: end.y - uy * nodeRadius)

                var linePath = Path()
                linePath.move(to: adjustedStart)
                linePath.addLine(to: adjustedEnd)
-            context.stroke(
+                context.stroke(
                    linePath,
                    with: .color(.secondary.opacity(0.3)),
-                style: StrokeStyle(lineWidth: 1, dash: [4, 4])
-            )
+                    style: StrokeStyle(lineWidth: 1, dash: [4, 4])
+                )

                let angle = atan2(uy, ux)
                let tip = adjustedEnd
@@ -168,5 +171,3 @@ private struct NodeGlyphView: View {
        .frame(width: 95)
    }
 }
-
-
--- a/app/EXO/EXOTests/EXOTests.swift
+++ b/app/EXO/EXOTests/EXOTests.swift
@@ -6,6 +6,7 @@
 //

 import Testing
+
 @testable import EXO

 struct EXOTests {
--- a/app/EXO/uninstall-exo.sh
+++ b/app/EXO/uninstall-exo.sh
@@ -0,0 +1,154 @@
+#!/usr/bin/env bash
+#
+# EXO Uninstaller Script
+#
+# This script removes all EXO system components that persist after deleting the app.
+# Run with: sudo ./uninstall-exo.sh
+#
+# Components removed:
+# - LaunchDaemon: /Library/LaunchDaemons/io.exo.networksetup.plist
+# - Network script: /Library/Application Support/EXO/
+# - Log files: /var/log/io.exo.networksetup.*
+# - Network location: "exo"
+# - Launch at login registration (NOT removed here; see the manual step printed at the end)
+#
+
+set -euo pipefail
+
+LABEL="io.exo.networksetup"
+SCRIPT_DEST="/Library/Application Support/EXO/disable_bridge_enable_dhcp.sh"
+PLIST_DEST="/Library/LaunchDaemons/io.exo.networksetup.plist"
+LOG_OUT="/var/log/${LABEL}.log"
+LOG_ERR="/var/log/${LABEL}.err.log"
+APP_BUNDLE_ID="io.exo.EXO"
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+NC='\033[0m' # No Color
+
+echo_info() {
+    echo -e "${GREEN}[INFO]${NC} $1"
+}
+
+echo_warn() {
+    echo -e "${YELLOW}[WARN]${NC} $1"
+}
+
+echo_error() {
+    echo -e "${RED}[ERROR]${NC} $1"
+}
+
+# Check if running as root
+if [[ $EUID -ne 0 ]]; then
+    echo_error "This script must be run as root (use sudo)"
+    exit 1
+fi
+
+echo ""
+echo "========================================"
+echo "        EXO Uninstaller"
+echo "========================================"
+echo ""
+
+# Unload the LaunchDaemon if running
+echo_info "Stopping network setup daemon..."
+if launchctl list | grep -q "$LABEL"; then
+    launchctl bootout system/"$LABEL" 2>/dev/null || true
+    echo_info "Daemon stopped"
+else
+    echo_warn "Daemon was not running"
+fi
+
+# Remove LaunchDaemon plist
+if [[ -f "$PLIST_DEST" ]]; then
+    rm -f "$PLIST_DEST"
+    echo_info "Removed LaunchDaemon plist"
+else
+    echo_warn "LaunchDaemon plist not found (already removed?)"
+fi
+
+# Remove the network setup script
+if [[ -f "$SCRIPT_DEST" ]]; then
+    rm -f "$SCRIPT_DEST"
+    echo_info "Removed network setup script"
+else
+    echo_warn "Network setup script not found (already removed?)"
+fi
+
+# Remove EXO directory if empty
+if [[ -d "/Library/Application Support/EXO" ]]; then
+    rmdir "/Library/Application Support/EXO" 2>/dev/null && \
+        echo_info "Removed EXO support directory" || \
+        echo_warn "EXO support directory not empty, leaving in place"
+fi
+
+# Remove log files
+if [[ -f "$LOG_OUT" ]] || [[ -f "$LOG_ERR" ]]; then
+    rm -f "$LOG_OUT" "$LOG_ERR"
+    echo_info "Removed log files"
+else
+    echo_warn "Log files not found (already removed?)"
+fi
+
+# Switch back to Automatic network location
+echo_info "Restoring network configuration..."
+if networksetup -listlocations | grep -q "^Automatic$"; then
+    networksetup -switchtolocation Automatic 2>/dev/null || true
+    echo_info "Switched to Automatic network location"
+else
+    echo_warn "Automatic network location not found"
+fi
+
+# Delete the exo network location if it exists
+if networksetup -listlocations | grep -q "^exo$"; then
+    networksetup -deletelocation exo 2>/dev/null || true
+    echo_info "Deleted 'exo' network location"
+else
+    echo_warn "'exo' network location not found (already removed?)"
+fi
+
+# Re-enable Thunderbolt Bridge if it exists
+if networksetup -listnetworkservices 2>/dev/null | grep -q "Thunderbolt Bridge"; then
+    networksetup -setnetworkserviceenabled "Thunderbolt Bridge" on 2>/dev/null || true
+    echo_info "Re-enabled Thunderbolt Bridge"
+fi
+
+# Note about launch at login registration
+# SMAppService-based login items cannot be removed from a shell script.
+# They can only be unregistered from within the app itself or manually via System Settings.
+echo_warn "Launch at login must be removed manually:"
+echo_warn "  System Settings → General → Login Items → Remove EXO"
+
+# Check if EXO.app exists in common locations
+APP_FOUND=false
+for app_path in "/Applications/EXO.app" "$HOME/Applications/EXO.app"; do
+    if [[ -d "$app_path" ]]; then
+        if [[ "$APP_FOUND" == false ]]; then
+            echo ""
+            APP_FOUND=true
+        fi
+        echo_warn "EXO.app found at: $app_path"
+        echo_warn "You may want to move it to Trash manually."
+    fi
+done
+
+echo ""
+echo "========================================"
+echo_info "EXO uninstall complete!"
+echo "========================================"
+echo ""
+echo "The following have been removed:"
+echo "  • Network setup LaunchDaemon"
+echo "  • Network configuration script"
+echo "  • Log files"
+echo "  • 'exo' network location"
+echo ""
+echo "Your network has been restored to use the 'Automatic' location."
+echo "Thunderbolt Bridge has been re-enabled (if present)."
+echo ""
+echo "Manual step required:"
+echo "  Remove EXO from Login Items in System Settings → General → Login Items"
+echo ""
+
--- a/bench/exo_bench.py
+++ b/bench/exo_bench.py
@@ -0,0 +1,526 @@
+#!/usr/bin/env python3
+# pyright: reportAny=false, reportUnknownMemberType=false, reportUnknownVariableType=false, reportUnknownArgumentType=false
+from __future__ import annotations
+
+import argparse
+import http.client
+import json
+import os
+import time
+from collections.abc import Callable
+from statistics import mean
+from typing import Any
+from urllib.parse import urlencode
+
+from loguru import logger
+from transformers import AutoTokenizer
+
+from exo.shared.models.model_cards import MODEL_CARDS
+from exo.shared.types.memory import Memory
+
+
+class ExoHttpError(RuntimeError):
+    def __init__(self, status: int, reason: str, body_preview: str):
+        super().__init__(f"HTTP {status} {reason}: {body_preview}")
+        self.status = status
+
+
+class ExoClient:
+    def __init__(self, host: str, port: int, timeout_s: float = 2400.0):
+        self.host = host
+        self.port = port
+        self.timeout_s = timeout_s
+
+    def request_json(
+        self,
+        method: str,
+        path: str,
+        params: dict[str, Any] | None = None,
+        body: dict[str, Any] | None = None,
+        headers: dict[str, str] | None = None,
+    ) -> Any:
+        if not path.startswith("/"):
+            path = "/" + path
+        if params:
+            path = path + "?" + urlencode(params)
+
+        conn = http.client.HTTPConnection(self.host, self.port, timeout=self.timeout_s)
+        try:
+            payload: bytes | None = None
+            hdrs: dict[str, str] = {"Accept": "application/json"}
+
+            if body is not None:
+                payload = json.dumps(body).encode("utf-8")
+                hdrs["Content-Type"] = "application/json"
+            if headers:
+                hdrs.update(headers)
+
+            conn.request(method.upper(), path, body=payload, headers=hdrs)
+            resp = conn.getresponse()
+            raw = resp.read()
+            text = raw.decode("utf-8", errors="replace") if raw else ""
+
+            if resp.status >= 400:
+                raise ExoHttpError(resp.status, resp.reason, text[:300])
+
+            if not text:
+                return None
+            return json.loads(text)
+        finally:
+            conn.close()
+
+    def post_bench_chat_completions(self, payload: dict[str, Any]) -> dict[str, Any]:
+        return self.request_json("POST", "/bench/chat/completions", body=payload)
+
+
+def unwrap_instance(instance: dict[str, Any]) -> dict[str, Any]:
+    if len(instance) != 1:
+        raise KeyError(f"Expected 1 key, got keys={list(instance.keys())}")
+
+    tag = next(iter(instance))
+    inner = instance[tag]
+    if not isinstance(inner, dict):
+        raise TypeError(f"payload for {tag} must be dict, got {type(inner)}")
+    return inner
+
+
+def instance_id_from_instance(instance: dict[str, Any]) -> str:
+    inner = unwrap_instance(instance)
+    return str(inner["instanceId"])
+
+
+def nodes_used_in_instance(instance: dict[str, Any]) -> int:
+    inner = unwrap_instance(instance)
+    return len(inner["shardAssignments"]["nodeToRunner"])
+
+
+def runner_ids_from_instance(instance: dict[str, Any]) -> list[str]:
+    inner = unwrap_instance(instance)
+    runner_to_shard = inner["shardAssignments"]["runnerToShard"]
+    return list(runner_to_shard.keys())
+
+
+def runner_ready(runner: dict[str, Any]) -> bool:
+    return "RunnerReady" in runner
+
+
+def wait_for_instance_ready(
+    client: ExoClient, instance_id: str, timeout: float = 24000.0
+) -> None:
+    start_time = time.time()
+    while time.time() - start_time < timeout:
+        state = client.request_json("GET", "/state")
+        instances = state.get("instances", {})
+
+        if instance_id not in instances:
+            time.sleep(0.1)
+            continue
+
+        instance = instances[instance_id]
+        runner_ids = runner_ids_from_instance(instance)
+        runners = state.get("runners", {})
+
+        if all(runner_ready(runners.get(rid, {})) for rid in runner_ids):
+            return
+
+        time.sleep(0.1)
+
+    raise TimeoutError(f"Instance {instance_id} did not become ready within {timeout=}")
+
+
+def wait_for_instance_gone(
+    client: ExoClient, instance_id: str, timeout: float = 3.0
+) -> None:
+    start_time = time.time()
+    while time.time() - start_time < timeout:
+        try:
+            client.request_json("GET", f"/instance/{instance_id}")
+        except ExoHttpError as e:
+            if e.status == 404:
+                return
+        time.sleep(0.4)
+
+    raise TimeoutError(f"Instance {instance_id} did not get deleted within {timeout=}")
+
+
+def format_peak_memory(b: float) -> str:
+    for unit in ["B", "KB", "MB", "GB", "TB"]:
+        if b < 1024.0:
+            return f"{b:.2f}{unit}"
+        b /= 1024.0
+    raise ValueError("You're using petabytes of memory. Something went wrong...")
+
+
+def parse_int_list(values: list[str]) -> list[int]:
+    items: list[int] = []
+    for v in values:
+        for part in v.split(","):
+            part = part.strip()
+            if part:
+                items.append(int(part))
+
+    seen: set[int] = set()
+    out: list[int] = []
+    for x in items:
+        if x not in seen:
+            out.append(x)
+            seen.add(x)
+    return out
+
+
+def resolve_model_short_id(client: ExoClient, model_arg: str) -> tuple[str, str]:
+    models = client.request_json("GET", "/models") or {}
+    data = models.get("data") or []
+
+    for m in data:
+        if m.get("id") == model_arg:
+            short_id = str(m["id"])
+            full_id = str(m.get("hugging_face_id") or m["id"])
+            return short_id, full_id
+
+    for m in data:
+        if m.get("hugging_face_id") == model_arg:
+            short_id = str(m["id"])
+            full_id = str(m["hugging_face_id"])
+            return short_id, full_id
+
+    raise ValueError(f"Model not found in /models: {model_arg}")
+
+
+def placement_filter(instance_meta: str, wanted: str) -> bool:
+    s = (instance_meta or "").lower()
+    if wanted == "both":
+        return ("ring" in s) or ("jaccl" in s)
+    return wanted in s
+
+
+def sharding_filter(sharding: str, wanted: str) -> bool:
+    s = (sharding or "").lower()
+    if wanted == "both":
+        return ("pipeline" in s) or ("tensor" in s)
+    return wanted in s
+
+
+def run_one_completion(
+    client: ExoClient, model_id: str, pp_hint: int, tg: int, prompt_sizer: PromptSizer
+) -> tuple[dict[str, Any], int]:
+    content, pp_tokens = prompt_sizer.build(pp_hint)
+    payload: dict[str, Any] = {
+        "model": model_id,
+        "messages": [{"role": "user", "content": content}],
+        "stream": False,
+        "max_tokens": tg,
+    }
+
+    t0 = time.perf_counter()
+    out = client.post_bench_chat_completions(payload)
+    elapsed = time.perf_counter() - t0
+
+    stats = out.get("generation_stats")
+
+    preview = (out.get("choices") or [{}])[0]["message"]["content"][:200]
+
+    return {
+        "elapsed_s": elapsed,
+        "output_text_preview": preview,
+        "stats": stats,
+    }, pp_tokens
+
+
+class PromptSizer:
+    def __init__(self, tokenizer: Any, atom: str = "a "):
+        self.tokenizer = tokenizer
+        self.atom = atom
+        self.count_fn = PromptSizer._make_counter(tokenizer)
+        self.base_tokens = self.count_fn("")
+
+    @staticmethod
+    def _make_counter(tokenizer: Any) -> Callable[[str], int]:
+        def count_fn(user_content: str) -> int:
+            messages = [{"role": "user", "content": user_content}]
+            ids = tokenizer.apply_chat_template(
+                messages, tokenize=True, add_generation_prompt=True
+            )
+            return int(len(ids))
+
+        return count_fn
+
+    def build(self, target_prompt_tokens: int) -> tuple[str, int]:
+        target = int(target_prompt_tokens)
+        if target < self.base_tokens:
+            raise RuntimeError(
+                f"Target ({target}) is smaller than template overhead ({self.base_tokens})."
+            )
+
+        content = ""
+        tok = self.count_fn(content)
+
+        while tok < target:
+            content += self.atom
+            tok = self.count_fn(content)
+
+        if tok != target:
+            raise RuntimeError(
+                f"Overshot: got {tok} tokens (target {target}). "
+                f"Pick a different atom (try ' a' or '\\n' or '0 ')."
+            )
+
+        return content, tok
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(
+        prog="exo-bench",
+        description="Benchmark exo model throughput across placement previews.",
+    )
+    ap.add_argument("--host", default=os.environ.get("EXO_HOST", "localhost"))
+    ap.add_argument(
+        "--port", type=int, default=int(os.environ.get("EXO_PORT", "52415"))
+    )
+    ap.add_argument("--model", required=True, help="Model short id or huggingface id")
+    ap.add_argument(
+        "--pp",
+        nargs="+",
+        required=True,
+        help="Prompt-size hints (ints). Accepts commas.",
+    )
+    ap.add_argument(
+        "--tg",
+        nargs="+",
+        required=True,
+        help="Generation lengths (ints). Accepts commas.",
+    )
+    ap.add_argument(
+        "--max-nodes",
+        type=int,
+        default=4,
+        help="Only consider placements using <= this many nodes.",
+    )
+    ap.add_argument(
+        "--instance-meta", choices=["ring", "jaccl", "both"], default="both"
+    )
+    ap.add_argument(
+        "--sharding", choices=["pipeline", "tensor", "both"], default="both"
+    )
+    ap.add_argument(
+        "--skip-pipeline-jaccl",
+        action="store_true",
+        help="Skip pipeline+jaccl placements (often pointless). Off by default.",
+    )
+    ap.add_argument(
+        "--repeat", type=int, default=1, help="Repetitions per (pp,tg) pair."
+    )
+    ap.add_argument(
+        "--warmup",
+        type=int,
+        default=0,
+        help="Warmup runs per placement (uses first pp/tg).",
+    )
+    ap.add_argument(
+        "--timeout", type=float, default=2400.0, help="HTTP timeout (seconds)."
+    )
+    ap.add_argument(
+        "--json-out",
+        default="bench/results.json",
+        help="Write raw per-run results JSON to this path.",
+    )
+    ap.add_argument(
+        "--dry-run", action="store_true", help="List selected placements and exit."
+    )
+    args = ap.parse_args()
+
+    pp_list = parse_int_list(args.pp)
+    tg_list = parse_int_list(args.tg)
+    if not pp_list or not tg_list:
+        logger.error("pp and tg lists must be non-empty")
+        return 2
+    if args.repeat <= 0:
+        logger.error("--repeat must be >= 1")
+        return 2
+
+    client = ExoClient(args.host, args.port, timeout_s=args.timeout)
+    short_id, full_model_id = resolve_model_short_id(client, args.model)
+
+    previews_resp = client.request_json(
+        "GET", "/instance/previews", params={"model_id": short_id}
+    )
+    previews = previews_resp.get("previews") or []
+
+    tokenizer = AutoTokenizer.from_pretrained(
+        full_model_id,
+        trust_remote_code=True,
+    )
+    if tokenizer is None:
+        raise RuntimeError("[exo-bench] tokenizer load failed")
+
+    try:
+        prompt_sizer = PromptSizer(tokenizer)
+        logger.debug(f"[exo-bench] loaded tokenizer: {full_model_id} for prompt sizer")
+    except Exception:
+        logger.error("[exo-bench] tokenizer usable but prompt sizing failed")
+        raise
+
+    selected: list[dict[str, Any]] = []
+    for p in previews:
+        if p.get("error") is not None:
+            continue
+        if not placement_filter(str(p.get("instance_meta", "")), args.instance_meta):
+            continue
+        if not sharding_filter(str(p.get("sharding", "")), args.sharding):
+            continue
+
+        instance = p.get("instance")
+        if not isinstance(instance, dict):
+            continue
+
+        n = nodes_used_in_instance(instance)
+        # Skip single-node tensor and jaccl previews: on one node they add nothing over pipeline ring
+        if n == 1 and (
+            (args.sharding == "both" and "tensor" in p.get("sharding", "").lower())
+            or (
+                args.instance_meta == "both"
+                and "jaccl" in p.get("instance_meta", "").lower()
+            )
+        ):
+            continue
+
+        if (
+            args.skip_pipeline_jaccl
+            and (
+                args.instance_meta == "both"
+                and "jaccl" in p.get("instance_meta", "").lower()
+            )
+            and (
+                args.sharding == "both" and "pipeline" in p.get("sharding", "").lower()
+            )
+        ):
+            continue
+
+        if 0 < n <= args.max_nodes:
+            selected.append(p)
+
+    if not selected:
+        logger.error("No valid placements matched your filters.")
+        return 1
+
+    selected.sort(
+        key=lambda p: (
+            str(p.get("instance_meta", "")),
+            str(p.get("sharding", "")),
+            -nodes_used_in_instance(p["instance"]),
+        ),
+        reverse=True,
+    )
+
+    logger.debug(f"exo-bench model: short_id={short_id} full_id={full_model_id}")
+    logger.info(f"placements: {len(selected)}")
+    for p in selected:
+        logger.info(
+            f"  - {p['sharding']} / {p['instance_meta']} / nodes={nodes_used_in_instance(p['instance'])}"
+        )
+
+    if args.dry_run:
+        return 0
+
+    all_rows: list[dict[str, Any]] = []
+
+    for preview in selected:
+        instance = preview["instance"]
+        instance_id = instance_id_from_instance(instance)
+
+        sharding = str(preview["sharding"])
+        instance_meta = str(preview["instance_meta"])
+        n_nodes = nodes_used_in_instance(instance)
+
+        logger.info("=" * 80)
+        logger.info(
+            f"PLACEMENT: {sharding} / {instance_meta} / nodes={n_nodes} / instance_id={instance_id}"
+        )
+
+        client.request_json("POST", "/instance", body={"instance": instance})
+        wait_for_instance_ready(client, instance_id)
+
+        time.sleep(1)
+
+        try:
+            for i in range(args.warmup):
+                run_one_completion(
+                    client, full_model_id, pp_list[0], tg_list[0], prompt_sizer
+                )
+                logger.debug(f"  warmup {i + 1}/{args.warmup} done")
+
+            for pp in pp_list:
+                if (
+                    pp * n_nodes > 2048
+                    and "ring" in instance_meta.lower()
+                    and "tensor" in sharding.lower()
+                ):
+                    model_card = MODEL_CARDS[short_id]
+                    if model_card.metadata.storage_size > Memory.from_gb(10):
+                        logger.info(
+                            f"Skipping tensor ring as this is too slow for model of size {model_card.metadata.storage_size} on {n_nodes=}"
+                        )
+                        continue
+                for tg in tg_list:
+                    runs: list[dict[str, Any]] = []
+                    for r in range(args.repeat):
+                        time.sleep(3)
+                        try:
+                            row, actual_pp_tokens = run_one_completion(
+                                client, full_model_id, pp, tg, prompt_sizer
+                            )
+                        except Exception as e:
+                            logger.error(e)
+                            continue
+                        row.update(
+                            {
+                                "model_short_id": short_id,
+                                "model_id": full_model_id,
+                                "placement_sharding": sharding,
+                                "placement_instance_meta": instance_meta,
+                                "placement_nodes": n_nodes,
+                                "instance_id": instance_id,
+                                "pp_tokens": actual_pp_tokens,
+                                "tg": tg,
+                                "repeat_index": r,
+                            }
+                        )
+                        runs.append(row)
+                        all_rows.append(row)
+
+                    if runs:
+                        prompt_tps = mean(x["stats"]["prompt_tps"] for x in runs)
+                        gen_tps = mean(x["stats"]["generation_tps"] for x in runs)
+                        ptok = mean(x["stats"]["prompt_tokens"] for x in runs)
+                        gtok = mean(x["stats"]["generation_tokens"] for x in runs)
+                        peak = mean(
+                            x["stats"]["peak_memory_usage"]["inBytes"] for x in runs
+                        )
+
+                        logger.info(
+                            f"prompt_tps={prompt_tps:.2f} gen_tps={gen_tps:.2f}    "
+                            f"prompt_tokens={ptok} gen_tokens={gtok}    "
+                            f"peak_memory={format_peak_memory(peak)}\n"
+                        )
+                    time.sleep(2)
+        finally:
+            try:
+                client.request_json("DELETE", f"/instance/{instance_id}")
+            except ExoHttpError as e:
+                if e.status != 404:
+                    raise
+            wait_for_instance_gone(client, instance_id)
+            logger.debug(f"Deleted instance {instance_id}")
+
+            time.sleep(5)
+
+    if args.json_out:
+        with open(args.json_out, "w", encoding="utf-8") as f:
+            json.dump(all_rows, f, indent=2, ensure_ascii=False)
+        logger.debug(f"\nWrote results JSON: {args.json_out}")
+
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
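
As a rough illustration of the prompt-sizing loop at the top of this hunk, the sketch below grows a string one atom at a time until a token counter reaches the target, and fails if the count steps past it. The `grow_to_target` name and the whitespace-based `count_words` counter are hypothetical stand-ins for the tokenizer-backed counter used by the benchmark; neither is part of exo.

```python
from typing import Callable


def grow_to_target(
    count_fn: Callable[[str], int], atom: str, target: int
) -> tuple[str, int]:
    """Append `atom` until count_fn(content) reaches `target` (hypothetical helper)."""
    content = ""
    tok = count_fn(content)
    while tok < target:
        content += atom
        tok = count_fn(content)
    if tok != target:
        # An atom worth more than one token can step over the target.
        raise RuntimeError(f"Overshot: got {tok} tokens (target {target}).")
    return content, tok


def count_words(text: str) -> int:
    """Toy counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


content, tok = grow_to_target(count_words, "a ", 5)
```

The loop only lands exactly on the target when each appended atom adds a single token, which is why the error message suggests trying a different atom. It also re-counts the whole string on every iteration, so large targets cost noticeably more than small ones.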
--- a/dashboard/src/app.d.ts
+++ b/dashboard/src/app.d.ts
@@ -11,4 +11,3 @@ declare global {
 }

 export {};
-
--- a/dashboard/src/lib/components/ChatForm.svelte
+++ b/dashboard/src/lib/components/ChatForm.svelte
@@ -139,6 +139,11 @@
 	}

 	function handleKeydown(event: KeyboardEvent) {
+		// Prevent form submission during IME composition (e.g., Chinese, Japanese, Korean input)
+		if (event.isComposing || event.keyCode === 229) {
+			return;
+		}
+		
 		if (event.key === 'Enter' && !event.shiftKey) {
 			event.preventDefault();
 			handleSubmit();
--- a/dashboard/src/lib/components/index.ts
+++ b/dashboard/src/lib/components/index.ts
@@ -1,8 +1,7 @@
-export { default as TopologyGraph } from './TopologyGraph.svelte';
-export { default as ChatForm } from './ChatForm.svelte';
-export { default as ChatMessages } from './ChatMessages.svelte';
-export { default as ChatAttachments } from './ChatAttachments.svelte';
-export { default as ChatSidebar } from './ChatSidebar.svelte';
-export { default as ModelCard } from './ModelCard.svelte';
-export { default as MarkdownContent } from './MarkdownContent.svelte';
-
+export { default as TopologyGraph } from "./TopologyGraph.svelte";
+export { default as ChatForm } from "./ChatForm.svelte";
+export { default as ChatMessages } from "./ChatMessages.svelte";
+export { default as ChatAttachments } from "./ChatAttachments.svelte";
+export { default as ChatSidebar } from "./ChatSidebar.svelte";
+export { default as ModelCard } from "./ModelCard.svelte";
+export { default as MarkdownContent } from "./MarkdownContent.svelte";
--- a/dashboard/src/lib/stores/app.svelte.ts
+++ b/dashboard/src/lib/stores/app.svelte.ts
--- a/dashboard/src/lib/types/files.ts
+++ b/dashboard/src/lib/types/files.ts
@@ -13,55 +13,124 @@ export interface ChatUploadedFile {
 }

 export interface ChatAttachment {
-	type: 'image' | 'text' | 'pdf' | 'audio';
+	type: "image" | "text" | "pdf" | "audio";
 	name: string;
 	content?: string;
 	base64Url?: string;
 	mimeType?: string;
 }

-export type FileCategory = 'image' | 'text' | 'pdf' | 'audio' | 'unknown';
+export type FileCategory = "image" | "text" | "pdf" | "audio" | "unknown";

-export const IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif', '.webp', '.svg'];
-export const IMAGE_MIME_TYPES = ['image/jpeg', 'image/png', 'image/gif', 'image/webp', 'image/svg+xml'];
+export const IMAGE_EXTENSIONS = [
+	".jpg",
+	".jpeg",
+	".png",
+	".gif",
+	".webp",
+	".svg",
+];
+export const IMAGE_MIME_TYPES = [
+	"image/jpeg",
+	"image/png",
+	"image/gif",
+	"image/webp",
+	"image/svg+xml",
+];

 export const TEXT_EXTENSIONS = [
-	'.txt', '.md', '.json', '.xml', '.yaml', '.yml', '.csv', '.log',
-	'.js', '.ts', '.jsx', '.tsx', '.py', '.java', '.cpp', '.c', '.h',
-	'.css', '.html', '.htm', '.sql', '.sh', '.bat', '.rs', '.go',
-	'.rb', '.php', '.swift', '.kt', '.scala', '.r', '.dart', '.vue', '.svelte'
+	".txt",
+	".md",
+	".json",
+	".xml",
+	".yaml",
+	".yml",
+	".csv",
+	".log",
+	".js",
+	".ts",
+	".jsx",
+	".tsx",
+	".py",
+	".java",
+	".cpp",
+	".c",
+	".h",
+	".css",
+	".html",
+	".htm",
+	".sql",
+	".sh",
+	".bat",
+	".rs",
+	".go",
+	".rb",
+	".php",
+	".swift",
+	".kt",
+	".scala",
+	".r",
+	".dart",
+	".vue",
+	".svelte",
 ];
 export const TEXT_MIME_TYPES = [
-	'text/plain', 'text/markdown', 'text/csv', 'text/html', 'text/css',
-	'application/json', 'application/xml', 'text/xml', 'application/javascript',
-	'text/javascript', 'application/typescript'
+	"text/plain",
+	"text/markdown",
+	"text/csv",
+	"text/html",
+	"text/css",
+	"application/json",
+	"application/xml",
+	"text/xml",
+	"application/javascript",
+	"text/javascript",
+	"application/typescript",
 ];

-export const PDF_EXTENSIONS = ['.pdf'];
-export const PDF_MIME_TYPES = ['application/pdf'];
+export const PDF_EXTENSIONS = [".pdf"];
+export const PDF_MIME_TYPES = ["application/pdf"];

-export const AUDIO_EXTENSIONS = ['.mp3', '.wav', '.ogg', '.m4a'];
-export const AUDIO_MIME_TYPES = ['audio/mpeg', 'audio/wav', 'audio/ogg', 'audio/mp4'];
+export const AUDIO_EXTENSIONS = [".mp3", ".wav", ".ogg", ".m4a"];
+export const AUDIO_MIME_TYPES = [
+	"audio/mpeg",
+	"audio/wav",
+	"audio/ogg",
+	"audio/mp4",
+];

 /**
 * Get file category based on MIME type and extension
 */
-export function getFileCategory(mimeType: string, fileName: string): FileCategory {
-	const extension = fileName.toLowerCase().slice(fileName.lastIndexOf('.'));
-	
-	if (IMAGE_MIME_TYPES.includes(mimeType) || IMAGE_EXTENSIONS.includes(extension)) {
-		return 'image';
+export function getFileCategory(
+	mimeType: string,
+	fileName: string,
+): FileCategory {
+	const extension = fileName.toLowerCase().slice(fileName.lastIndexOf("."));
+
+	if (
+		IMAGE_MIME_TYPES.includes(mimeType) ||
+		IMAGE_EXTENSIONS.includes(extension)
+	) {
+		return "image";
 	}
 	if (PDF_MIME_TYPES.includes(mimeType) || PDF_EXTENSIONS.includes(extension)) {
-		return 'pdf';
+		return "pdf";
 	}
-	if (AUDIO_MIME_TYPES.includes(mimeType) || AUDIO_EXTENSIONS.includes(extension)) {
-		return 'audio';
+	if (
+		AUDIO_MIME_TYPES.includes(mimeType) ||
+		AUDIO_EXTENSIONS.includes(extension)
+	) {
+		return "audio";
 	}
-	if (TEXT_MIME_TYPES.includes(mimeType) || TEXT_EXTENSIONS.includes(extension) || mimeType.startsWith('text/')) {
-		return 'text';
+	if (
+		TEXT_MIME_TYPES.includes(mimeType) ||
+		TEXT_EXTENSIONS.includes(extension) ||
+		mimeType.startsWith("text/")
+	) {
+		return "text";
 	}
-	return 'unknown';
+	return "unknown";
 }

 /**
@@ -69,36 +138,36 @@ export function getFileCategory(mimeType: string, fileName: string): FileCategor
 */
 export function getAcceptString(categories: FileCategory[]): string {
 	const accepts: string[] = [];
-	
+
 	for (const category of categories) {
 		switch (category) {
-			case 'image':
+			case "image":
 				accepts.push(...IMAGE_EXTENSIONS, ...IMAGE_MIME_TYPES);
 				break;
-			case 'text':
+			case "text":
 				accepts.push(...TEXT_EXTENSIONS, ...TEXT_MIME_TYPES);
 				break;
-			case 'pdf':
+			case "pdf":
 				accepts.push(...PDF_EXTENSIONS, ...PDF_MIME_TYPES);
 				break;
-			case 'audio':
+			case "audio":
 				accepts.push(...AUDIO_EXTENSIONS, ...AUDIO_MIME_TYPES);
 				break;
 		}
 	}
-	
-	return accepts.join(',');
+
+	return accepts.join(",");
 }

 /**
 * Format file size for display
 */
 export function formatFileSize(bytes: number): string {
-	if (bytes === 0) return '0 B';
+	if (bytes === 0) return "0 B";
 	const k = 1024;
-	const sizes = ['B', 'KB', 'MB', 'GB'];
+	const sizes = ["B", "KB", "MB", "GB"];
 	const i = Math.floor(Math.log(bytes) / Math.log(k));
-	return parseFloat((bytes / Math.pow(k, i)).toFixed(1)) + ' ' + sizes[i];
+	return parseFloat((bytes / Math.pow(k, i)).toFixed(1)) + " " + sizes[i];
 }

 /**
@@ -128,42 +197,44 @@ export function readFileAsText(file: File): Promise<string> {
 /**
 * Process uploaded files into ChatUploadedFile format
 */
-export async function processUploadedFiles(files: File[]): Promise<ChatUploadedFile[]> {
+export async function processUploadedFiles(
+	files: File[],
+): Promise<ChatUploadedFile[]> {
 	const results: ChatUploadedFile[] = [];
-	
+
 	for (const file of files) {
-		const id = Date.now().toString() + Math.random().toString(36).substring(2, 9);
+		const id =
+			Date.now().toString() + Math.random().toString(36).substring(2, 9);
 		const category = getFileCategory(file.type, file.name);
-		
+
 		const base: ChatUploadedFile = {
 			id,
 			name: file.name,
 			size: file.size,
 			type: file.type,
-			file
+			file,
 		};
-		
+
 		try {
-			if (category === 'image') {
+			if (category === "image") {
 				const preview = await readFileAsDataURL(file);
 				results.push({ ...base, preview });
-			} else if (category === 'text' || category === 'unknown') {
+			} else if (category === "text" || category === "unknown") {
 				const textContent = await readFileAsText(file);
 				results.push({ ...base, textContent });
-			} else if (category === 'pdf') {
+			} else if (category === "pdf") {
 				results.push(base);
-			} else if (category === 'audio') {
+			} else if (category === "audio") {
 				const preview = await readFileAsDataURL(file);
 				results.push({ ...base, preview });
 			} else {
 				results.push(base);
 			}
 		} catch (error) {
-			console.error('Error processing file:', file.name, error);
+			console.error("Error processing file:", file.name, error);
 			results.push(base);
 		}
 	}
-	
+
 	return results;
 }
-
--- a/dashboard/src/routes/+page.svelte
+++ b/dashboard/src/routes/+page.svelte
@@ -51,6 +51,59 @@ const sidebarVisible = $derived(chatSidebarVisible());
 	let selectedSharding = $state<'Pipeline' | 'Tensor'>('Pipeline');
 	type InstanceMeta = 'MlxRing' | 'MlxIbv' | 'MlxJaccl';
 	
+	// Launch defaults persistence
+	const LAUNCH_DEFAULTS_KEY = 'exo-launch-defaults';
+	interface LaunchDefaults {
+		modelId: string | null;
+		sharding: 'Pipeline' | 'Tensor';
+		instanceType: InstanceMeta;
+		minNodes: number;
+	}
+	
+	function saveLaunchDefaults(): void {
+		const defaults: LaunchDefaults = {
+			modelId: selectedPreviewModelId(),
+			sharding: selectedSharding,
+			instanceType: selectedInstanceType,
+			minNodes: selectedMinNodes,
+		};
+		try {
+			localStorage.setItem(LAUNCH_DEFAULTS_KEY, JSON.stringify(defaults));
+		} catch (e) {
+			console.warn('Failed to save launch defaults:', e);
+		}
+	}
+	
+	function loadLaunchDefaults(): LaunchDefaults | null {
+		try {
+			const stored = localStorage.getItem(LAUNCH_DEFAULTS_KEY);
+			if (!stored) return null;
+			return JSON.parse(stored) as LaunchDefaults;
+		} catch (e) {
+			console.warn('Failed to load launch defaults:', e);
+			return null;
+		}
+	}
+	
+	function applyLaunchDefaults(availableModels: Array<{id: string}>, maxNodes: number): void {
+		const defaults = loadLaunchDefaults();
+		if (!defaults) return;
+		
+		// Apply sharding and instance type unconditionally
+		selectedSharding = defaults.sharding;
+		selectedInstanceType = defaults.instanceType;
+		
+		// Apply minNodes if valid (between 1 and maxNodes)
+		if (defaults.minNodes && defaults.minNodes >= 1 && defaults.minNodes <= maxNodes) {
+			selectedMinNodes = defaults.minNodes;
+		}
+		
+		// Only apply model if it exists in the available models
+		if (defaults.modelId && availableModels.some(m => m.id === defaults.modelId)) {
+			selectPreviewModel(defaults.modelId);
+		}
+	}
+	
 	let selectedInstanceType = $state<InstanceMeta>('MlxRing');
 	let selectedMinNodes = $state<number>(1);
 	let minNodesInitialized = $state(false);
@@ -298,6 +351,9 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 				const data = await response.json();
 				// API returns { data: [{ id, name }] } format
 				models = data.data || [];
+				// Restore last launch defaults if available
+				const currentNodeCount = topologyData() ? Object.keys(topologyData()!.nodes).length : 1;
+				applyLaunchDefaults(models, currentNodeCount);
 			}
 		} catch (error) {
 			console.error('Failed to fetch models:', error);
@@ -537,7 +593,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 		// Unwrap the instance
 		const [instanceTag, instance] = getTagged(instanceWrapped);
 		if (!instance || typeof instance !== 'object') {
-			return { isDownloading: false, progress: null, statusText: 'UNKNOWN', perNode: [] };
+			return { isDownloading: false, progress: null, statusText: 'PREPARING', perNode: [] };
 		}

 		const inst = instance as { shardAssignments?: { nodeToRunner?: Record<string, string>; runnerToShard?: Record<string, unknown>; modelId?: string } };
@@ -650,7 +706,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 	function deriveInstanceStatus(instanceWrapped: unknown): { statusText: string; statusClass: string } {
 		const [, instance] = getTagged(instanceWrapped);
 		if (!instance || typeof instance !== 'object') {
-			return { statusText: 'UNKNOWN', statusClass: 'inactive' };
+			return { statusText: 'PREPARING', statusClass: 'inactive' };
 		}
 		
 		const inst = instance as { shardAssignments?: { runnerToShard?: Record<string, unknown> } };
@@ -679,7 +735,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {

 		const has = (s: string) => statuses.includes(s);

-		if (statuses.length === 0) return { statusText: 'UNKNOWN', statusClass: 'inactive' };
+		if (statuses.length === 0) return { statusText: 'PREPARING', statusClass: 'inactive' };
 		if (has('Failed')) return { statusText: 'FAILED', statusClass: 'failed' };
 		if (has('Shutdown')) return { statusText: 'SHUTDOWN', statusClass: 'inactive' };
 		if (has('Loading')) return { statusText: 'LOADING', statusClass: 'starting' };
@@ -988,6 +1044,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {

 	function handleSliderMouseUp() {
 		isDraggingSlider = false;
+		saveLaunchDefaults();
 	}

 	// Handle touch events for mobile
@@ -1007,6 +1064,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {

 	function handleSliderTouchEnd() {
 		isDraggingSlider = false;
+		saveLaunchDefaults();
 	}

 	const nodeCount = $derived(data ? Object.keys(data.nodes).length : 0);
@@ -1209,9 +1267,9 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 							<div class="flex-1 h-px bg-gradient-to-r from-exo-yellow/30 to-transparent"></div>
 						</div>
 						
-						<div 
+						<div
 							bind:this={instancesContainerRef}
-							class="max-h-72 space-y-3 overflow-y-auto"
+							class="max-h-72 xl:max-h-96 space-y-3 overflow-y-auto overflow-x-hidden py-px"
 						>
 								{#each Object.entries(instanceData) as [id, instance]}
 									{@const downloadInfo = getInstanceDownloadStatus(id, instance)}
@@ -1464,6 +1522,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 												onclick={() => {
 													if (modelCanFit) {
 														selectPreviewModel(model.id);
+														saveLaunchDefaults();
 														isModelDropdownOpen = false;
 														modelDropdownSearch = '';
 													}
@@ -1497,7 +1556,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 								<div class="text-xs text-white/70 font-mono mb-2">Sharding:</div>
 								<div class="flex gap-2">
 									<button 
-										onclick={() => selectedSharding = 'Pipeline'}
+										onclick={() => { selectedSharding = 'Pipeline'; saveLaunchDefaults(); }}
 										class="flex items-center gap-2 py-2 px-4 text-sm font-mono border rounded transition-all duration-200 cursor-pointer {selectedSharding === 'Pipeline' ? 'bg-transparent text-exo-yellow border-exo-yellow' : 'bg-transparent text-white/70 border-exo-medium-gray/50 hover:border-exo-yellow/50'}"
 									>
 										<span class="w-4 h-4 rounded-full border-2 flex items-center justify-center {selectedSharding === 'Pipeline' ? 'border-exo-yellow' : 'border-exo-medium-gray'}">
@@ -1508,7 +1567,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 										Pipeline
 									</button>
 									<button 
-										onclick={() => selectedSharding = 'Tensor'}
+										onclick={() => { selectedSharding = 'Tensor'; saveLaunchDefaults(); }}
 										class="flex items-center gap-2 py-2 px-4 text-sm font-mono border rounded transition-all duration-200 cursor-pointer {selectedSharding === 'Tensor' ? 'bg-transparent text-exo-yellow border-exo-yellow' : 'bg-transparent text-white/70 border-exo-medium-gray/50 hover:border-exo-yellow/50'}"
 									>
 										<span class="w-4 h-4 rounded-full border-2 flex items-center justify-center {selectedSharding === 'Tensor' ? 'border-exo-yellow' : 'border-exo-medium-gray'}">
@@ -1526,7 +1585,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 								<div class="text-xs text-white/70 font-mono mb-2">Instance Type:</div>
 								<div class="flex gap-2">
 									<button 
-										onclick={() => selectedInstanceType = 'MlxRing'}
+										onclick={() => { selectedInstanceType = 'MlxRing'; saveLaunchDefaults(); }}
 										class="flex items-center gap-2 py-2 px-4 text-sm font-mono border rounded transition-all duration-200 cursor-pointer {selectedInstanceType === 'MlxRing' ? 'bg-transparent text-exo-yellow border-exo-yellow' : 'bg-transparent text-white/70 border-exo-medium-gray/50 hover:border-exo-yellow/50'}"
 									>
 										<span class="w-4 h-4 rounded-full border-2 flex items-center justify-center {selectedInstanceType === 'MlxRing' ? 'border-exo-yellow' : 'border-exo-medium-gray'}">
@@ -1537,7 +1596,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 										MLX Ring
 									</button>
 									<button 
-										onclick={() => selectedInstanceType = 'MlxIbv'}
+										onclick={() => { selectedInstanceType = 'MlxIbv'; saveLaunchDefaults(); }}
 										class="flex items-center gap-2 py-2 px-4 text-sm font-mono border rounded transition-all duration-200 cursor-pointer {selectedInstanceType === 'MlxIbv' ? 'bg-transparent text-exo-yellow border-exo-yellow' : 'bg-transparent text-white/70 border-exo-medium-gray/50 hover:border-exo-yellow/50'}"
 									>
 										<span class="w-4 h-4 rounded-full border-2 flex items-center justify-center {selectedInstanceType === 'MlxIbv' ? 'border-exo-yellow' : 'border-exo-medium-gray'}">
@@ -1714,7 +1773,7 @@ function toggleInstanceDownloadDetails(nodeId: string): void {
 								<h3 class="text-xs text-exo-yellow font-mono tracking-[0.2em] uppercase">Instances</h3>
 								<div class="flex-1 h-px bg-gradient-to-r from-exo-yellow/30 to-transparent"></div>
 							</div>
-								<div class="space-y-3 max-h-72 overflow-y-auto pr-1">
+								<div class="space-y-3 max-h-72 xl:max-h-96 overflow-y-auto overflow-x-hidden py-px pr-1">
 									{#each Object.entries(instanceData) as [id, instance]}
 										{@const downloadInfo = getInstanceDownloadStatus(id, instance)}
 										{@const statusText = downloadInfo.statusText}
--- a/dashboard/src/routes/downloads/+page.svelte
+++ b/dashboard/src/routes/downloads/+page.svelte
@@ -199,7 +199,13 @@
 					const rawProgress = (downloadPayload as Record<string, unknown>).download_progress
 						?? (downloadPayload as Record<string, unknown>).downloadProgress
 						?? {};
-					const totalBytes = getBytes((rawProgress as Record<string, unknown>).total_bytes ?? (rawProgress as Record<string, unknown>).totalBytes);
+					// For DownloadCompleted, total_bytes is at top level; for DownloadOngoing, it's inside download_progress
+					const totalBytes = getBytes(
+						(downloadPayload as Record<string, unknown>).total_bytes
+						?? (downloadPayload as Record<string, unknown>).totalBytes
+						?? (rawProgress as Record<string, unknown>).total_bytes
+						?? (rawProgress as Record<string, unknown>).totalBytes
+					);
 					const downloadedBytes = getBytes((rawProgress as Record<string, unknown>).downloaded_bytes ?? (rawProgress as Record<string, unknown>).downloadedBytes);
 					const speed = (rawProgress as Record<string, unknown>).speed as number ?? 0;
 					const etaMs = (rawProgress as Record<string, unknown>).eta_ms as number ?? (rawProgress as Record<string, unknown>).etaMs as number ?? 0;
@@ -332,8 +338,13 @@
 								<div class="text-lg font-mono text-white truncate">{node.nodeName}</div>
 								<div class="text-xs text-exo-light-gray font-mono truncate">{node.nodeId}</div>
 							</div>
-							<div class="text-xs font-mono uppercase tracking-wider whitespace-nowrap shrink-0">
-								<span class="text-green-400">{node.models.filter(m => m.status === 'completed').length}</span><span class="text-exo-yellow"> /{node.models.length} models</span>
+							<div class="text-xs font-mono uppercase tracking-wider whitespace-nowrap shrink-0 text-right">
+								<div>
+									<span class="text-green-400">{node.models.filter(m => m.status === 'completed').length}</span><span class="text-exo-yellow"> / {node.models.length} models</span>
+								</div>
+								<div class="text-exo-light-gray normal-case tracking-normal">
+									{formatBytes(node.models.filter(m => m.status === 'completed').reduce((sum, m) => sum + m.totalBytes, 0))} on disk
+								</div>
 							</div>
 						</div>

@@ -385,7 +396,7 @@
 								</div>

 								<div class="flex items-center justify-between text-xs font-mono text-exo-light-gray">
-									<span>{model.status === 'completed' ? 'Completed' : `${formatSpeed(model.speed)} • ETA ${formatEta(model.etaMs)}`}</span>
+									<span>{model.status === 'completed' ? `Completed (${formatBytes(model.totalBytes)})` : `${formatSpeed(model.speed)} • ETA ${formatEta(model.etaMs)}`}</span>
 									{#if model.status !== 'completed'}
 										<span>{model.files.length} file{model.files.length === 1 ? '' : 's'}</span>
 									{/if}
--- a/dashboard/vite.config.ts
+++ b/dashboard/vite.config.ts
@@ -1,16 +1,15 @@
-import tailwindcss from '@tailwindcss/vite';
-import { sveltekit } from '@sveltejs/kit/vite';
-import { defineConfig } from 'vite';
+import tailwindcss from "@tailwindcss/vite";
+import { sveltekit } from "@sveltejs/kit/vite";
+import { defineConfig } from "vite";

 export default defineConfig({
 	plugins: [tailwindcss(), sveltekit()],
 	server: {
 		proxy: {
-			'/v1': 'http://localhost:52415',
-			'/state': 'http://localhost:52415',
-			'/models': 'http://localhost:52415',
-			'/instance': 'http://localhost:52415'
-		}
-	}
+			"/v1": "http://localhost:52415",
+			"/state": "http://localhost:52415",
+			"/models": "http://localhost:52415",
+			"/instance": "http://localhost:52415",
+		},
+	},
 });
-
--- a/docs/api.md
+++ b/docs/api.md
@@ -0,0 +1,212 @@
+# EXO API – Technical Reference
+
+This document describes the REST API exposed by the **EXO** service, as implemented in:
+
+`src/exo/master/api.py`
+
+The API is used to manage model instances in the cluster, inspect cluster state, and perform inference using an OpenAI-compatible interface.
+
+Base URL example:
+
+```
+http://localhost:52415
+```
+
+## 1. General / Meta Endpoints
+
+### Get Master Node ID
+
+**GET** `/node_id`
+
+Returns the identifier of the current master node.
+
+**Response (example):**
+
+```json
+{
+  "node_id": "node-1234"
+}
+```
+
+### Get Cluster State
+
+**GET** `/state`
+
+Returns the current state of the cluster, including nodes and active instances.
+
+**Response:**
+JSON object describing topology, nodes, and instances.
+
+### Get Events
+
+**GET** `/events`
+
+Returns the list of internal events recorded by the master (mainly for debugging and observability).
+
+**Response:**
+Array of event objects.
+
+## 2. Model Instance Management
+
+### Create Instance
+
+**POST** `/instance`
+
+Creates a new model instance in the cluster.
+
+**Request body (example):**
+
+```json
+{
+  "instance": {
+    "model_id": "llama-3.2-1b",
+    "placement": { }
+  }
+}
+```
+
+**Response:**
+JSON description of the created instance.
+
+### Delete Instance
+
+**DELETE** `/instance/{instance_id}`
+
+Deletes an existing instance by ID.
+
+**Path parameters:**
+
+* `instance_id`: string, ID of the instance to delete
+
+**Response:**
+Status / confirmation JSON.
+
+### Get Instance
+
+**GET** `/instance/{instance_id}`
+
+Returns details of a specific instance.
+
+**Path parameters:**
+
+* `instance_id`: string
+
+**Response:**
+JSON description of the instance.
+
+### Preview Placements
+
+**GET** `/instance/previews?model_id=...`
+
+Returns possible placement previews for a given model.
+
+**Query parameters:**
+
+* `model_id`: string, required
+
+**Response:**
+Array of placement preview objects.
+
+### Compute Placement
+
+**GET** `/instance/placement`
+
+Computes a placement for a potential instance without creating it.
+
+**Query parameters (typical):**
+
+* `model_id`: string
+* `sharding`: string or config
+* `instance_meta`: JSON-encoded metadata
+* `min_nodes`: integer
+
+**Response:**
+JSON object describing the proposed placement / instance configuration.
+
+### Place Instance (Dry Operation)
+
+**POST** `/place_instance`
+
+Performs a placement operation for an instance (planning step), without necessarily creating it.
+
+**Request body:**
+JSON describing the instance to be placed.
+
+**Response:**
+Placement result.
+
+## 3. Models
+
+### List Models
+
+**GET** `/models`
+**GET** `/v1/models` (alias)
+
+Returns the list of available models and their metadata.
+
+**Response:**
+Array of model descriptors.
+
+## 4. Inference / Chat Completions
+
+### OpenAI-Compatible Chat Completions
+
+**POST** `/v1/chat/completions`
+
+Executes a chat completion request using an OpenAI-compatible schema. Supports streaming and non-streaming modes.
+
+**Request body (example):**
+
+```json
+{
+  "model": "llama-3.2-1b",
+  "messages": [
+    { "role": "system", "content": "You are a helpful assistant." },
+    { "role": "user", "content": "Hello" }
+  ],
+  "stream": false
+}
+```
+
+**Response:**
+OpenAI-compatible chat completion response.
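
A minimal sketch of sending the request body above using only the Python standard library. The base URL assumes a node running locally on the default `api_port` (52415); `build_chat_request` and `send` are hypothetical helper names for this example, not part of exo.

```python
import json
import urllib.request

# Assumption: the API is reachable on localhost at the default api_port (52415).
BASE_URL = "http://localhost:52415"


def build_chat_request(
    model: str,
    user_message: str,
    base_url: str = BASE_URL,
    path: str = "/v1/chat/completions",
) -> urllib.request.Request:
    """Prepare a non-streaming chat completion request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a cluster running and an instance of the model placed, `send(build_chat_request("llama-3.2-1b", "Hello"))` should return the completion as a dictionary. Passing `path="/bench/chat/completions"` targets the benchmark endpoint with the same body.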
+
+### Benchmarked Chat Completions
+
+**POST** `/bench/chat/completions`
+
+Same as `/v1/chat/completions`, but the request always runs in non-streaming mode (`stream` is forced to `false`) and the response also carries generation statistics.
+
+**Request body:**
+Same schema as `/v1/chat/completions`.
+
+**Response:**
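+Chat completion plus a `generation_stats` object (`prompt_tps`, `generation_tps`, `prompt_tokens`, `generation_tokens`, `peak_memory_usage`), which may be `null`.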
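
The statistics can be read straight off the decoded response. The sketch below assumes the response is already parsed into a dictionary and that the statistics sit under a `generation_stats` key with the field names of the `GenerationStats` model in this change set. `summarize_stats` is a hypothetical helper, and `peak_memory_usage` is left out because its serialized shape is not shown here.

```python
from typing import Any


def summarize_stats(response: dict[str, Any]) -> str:
    """Format throughput figures from a decoded /bench/chat/completions response."""
    stats = response.get("generation_stats")
    # The field is optional in the response model, so handle its absence.
    if stats is None:
        return "no generation stats reported"
    return (
        f"prompt: {stats['prompt_tokens']} tokens at {stats['prompt_tps']:.1f} tok/s; "
        f"generation: {stats['generation_tokens']} tokens at {stats['generation_tps']:.1f} tok/s"
    )
```

Keeping the formatting separate from the HTTP call makes it easy to run the same summary over saved benchmark results.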
+
+## 5. Complete Endpoint Summary
+
+```
+GET     /node_id
+GET     /state
+GET     /events
+
+POST    /instance
+GET     /instance/{instance_id}
+DELETE  /instance/{instance_id}
+
+GET     /instance/previews
+GET     /instance/placement
+POST    /place_instance
+
+GET     /models
+GET     /v1/models
+
+POST    /v1/chat/completions
+POST    /bench/chat/completions
+```
+
+## 6. Notes
+
+* The `/v1/chat/completions` endpoint is compatible with the OpenAI API format, so existing OpenAI clients can be pointed to EXO by changing the base URL.
+* The instance placement endpoints allow you to plan and preview cluster allocations before actually creating instances.
+* The `/events` and `/state` endpoints are primarily intended for operational visibility and debugging.
--- a/flake.nix
+++ b/flake.nix
@@ -16,12 +16,11 @@
    };
  };

-  # TODO: figure out caching story
-  # nixConfig = {
-  #   # nix community cachix
-  #   extra-trusted-public-keys = "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs=";
-  #   extra-substituters = "https://nix-community.cachix.org";
-  # };
+  nixConfig = {
+    # exo cachix
+    extra-trusted-public-keys = "exo.cachix.org-1:okq7hl624TBeAR3kV+g39dUFSiaZgLRkLsFBCuJ2NZI=";
+    extra-substituters = "https://exo.cachix.org";
+  };

  outputs =
    inputs:
@@ -42,11 +41,22 @@
        };
        treefmtEval = inputs.treefmt-nix.lib.evalModule pkgs {
          projectRootFile = "flake.nix";
-          programs.ruff-format.enable = true;
-          programs.ruff-format.excludes = [ "rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi" ];
-          programs.rustfmt.enable = true;
-          programs.rustfmt.package = (fenixToolchain system).rustfmt;
-          programs.nixpkgs-fmt.enable = true;
+          programs = {
+            nixpkgs-fmt.enable = true;
+            ruff-format = {
+              enable = true;
+              excludes = [ "rust/exo_pyo3_bindings/exo_pyo3_bindings.pyi" ];
+            };
+            rustfmt = {
+              enable = true;
+              package = (fenixToolchain system).rustfmt;
+            };
+            prettier = {
+              enable = true;
+              includes = [ "*.ts" ];
+            };
+            swift-format.enable = true;
+          };
        };
      in
      {
@@ -62,6 +72,9 @@
          packages =
            with pkgs;
            [
+              # FORMATTING
+              treefmtEval.config.build.wrapper
+
              # PYTHON
              python313
              uv
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -8,30 +8,18 @@ dependencies = [
    "aiofiles>=24.1.0",
    "aiohttp>=3.12.14",
    "types-aiofiles>=24.1.0.20250708",
-    "typeguard>=4.4.4",
    "pydantic>=2.11.7",
-    "base58>=2.1.1",
-    "cryptography>=45.0.5",
    "fastapi>=0.116.1",
    "filelock>=3.18.0",
-    "aiosqlite>=0.21.0",
-    "networkx>=3.5",
-    "protobuf>=6.32.0",
-    "rich>=14.1.0",
    "rustworkx>=0.17.1",
-    "sqlmodel>=0.0.24",
-    "sqlalchemy[asyncio]>=2.0.43",
-    "greenlet>=3.2.4",
    "huggingface-hub>=0.33.4",
    "psutil>=7.0.0",
    "loguru>=0.7.3",
-    "textual>=5.3.0",
    "exo_pyo3_bindings", # rust bindings
    "anyio==4.11.0",
-    "bidict>=0.23.1",
-    "mlx>=0.30.1; sys_platform == 'darwin'",
-    "mlx[cpu]>=0.30.1; sys_platform == 'linux'",
-    "mlx-lm>=0.28.3",
+    "mlx==0.30.1; sys_platform == 'darwin'",
+    "mlx[cpu]==0.30.1; sys_platform == 'linux'",
+    "mlx-lm @ git+https://github.com/AlexCheema/mlx-lm.git@fix-transformers-5.0.0rc2",
    "tiktoken>=0.12.0", # required for kimi k2 tokenizer
    "hypercorn>=0.18.0",
    "openai-harmony>=0.0.8",
@@ -45,6 +33,7 @@ exo = "exo.main:main"
 # dependencies only required for development
 [dependency-groups]
 dev = [
+    "basedpyright>=1.29.0",
    "pyinstaller>=6.17.0",
    "pytest>=8.4.0",
    "pytest-asyncio>=1.0.0",
@@ -82,7 +71,7 @@ build-backend = "uv_build"
 ###

 [tool.basedpyright]
-include = [".venv/lib/mlx", ".venv/lib/mlx_lm", "src"]
+include = [".venv/lib/mlx", ".venv/lib/mlx_lm", "src", "bench"]
 typeCheckingMode = "strict"
 failOnWarnings = true

@@ -110,6 +99,7 @@ root = "src"

 # supported platforms for this project
 [tool.uv]
+prerelease = "allow"
 environments = [
    "sys_platform == 'darwin'",
    "sys_platform == 'linux'",
--- a/src/exo/main.py
+++ b/src/exo/main.py
@@ -28,7 +28,7 @@ from exo.worker.main import Worker
@dataclass
 class Node:
    router: Router
-    worker: Worker
+    worker: Worker | None
    election: Election  # Every node participates in election, as we do want a node to become master even if it isn't a master candidate if no master candidates are present.
    election_result_receiver: Receiver[ElectionResult]
    master: Master | None
@@ -62,15 +62,19 @@ class Node:
        else:
            api = None

-        worker = Worker(
-            node_id,
-            session_id,
-            exo_shard_downloader(),
-            connection_message_receiver=router.receiver(topics.CONNECTION_MESSAGES),
-            global_event_receiver=router.receiver(topics.GLOBAL_EVENTS),
-            local_event_sender=router.sender(topics.LOCAL_EVENTS),
-            command_sender=router.sender(topics.COMMANDS),
-        )
+        if not args.no_worker:
+            worker = Worker(
+                node_id,
+                session_id,
+                exo_shard_downloader(),
+                connection_message_receiver=router.receiver(topics.CONNECTION_MESSAGES),
+                global_event_receiver=router.receiver(topics.GLOBAL_EVENTS),
+                local_event_sender=router.sender(topics.LOCAL_EVENTS),
+                command_sender=router.sender(topics.COMMANDS),
+            )
+        else:
+            worker = None
+
        # We start every node with a master
        master = Master(
            node_id,
@@ -100,8 +104,9 @@ class Node:
        async with self._tg as tg:
            signal.signal(signal.SIGINT, lambda _, __: self.shutdown())
            tg.start_soon(self.router.run)
-            tg.start_soon(self.worker.run)
            tg.start_soon(self.election.run)
+            if self.worker:
+                tg.start_soon(self.worker.run)
            if self.master:
                tg.start_soon(self.master.run)
            if self.api:
@@ -209,6 +214,7 @@ class Args(CamelCaseModel):
    spawn_api: bool = False
    api_port: PositiveInt = 52415
    tb_only: bool = False
+    no_worker: bool = False

    @classmethod
    def parse(cls) -> Self:
@@ -246,6 +252,10 @@ class Args(CamelCaseModel):
            dest="api_port",
            default=52415,
        )
+        parser.add_argument(
+            "--no-worker",
+            action="store_true",
+        )

        args = parser.parse_args()
        return cls(**vars(args))  # pyright: ignore[reportAny] - We are intentionally validating here, we can't do it statically
--- a/src/exo/master/api.py
+++ b/src/exo/master/api.py
@@ -27,6 +27,8 @@ from exo.shared.logging import InterceptLogger
 from exo.shared.models.model_cards import MODEL_CARDS
 from exo.shared.models.model_meta import get_model_meta
 from exo.shared.types.api import (
+    BenchChatCompletionResponse,
+    BenchChatCompletionTaskParams,
    ChatCompletionChoice,
    ChatCompletionMessage,
    ChatCompletionResponse,
@@ -34,6 +36,7 @@ from exo.shared.types.api import (
    CreateInstanceResponse,
    DeleteInstanceResponse,
    FinishReason,
+    GenerationStats,
    ModelList,
    ModelListModel,
    PlaceInstanceParams,
@@ -172,6 +175,7 @@ class API:
        self.app.post("/v1/chat/completions", response_model=None)(
            self.chat_completions
        )
+        self.app.post("/bench/chat/completions")(self.bench_chat_completions)
        self.app.get("/state")(lambda: self.state)
        self.app.get("/events")(lambda: self._event_log)

@@ -490,6 +494,45 @@ class API:
            ],
        )

+    async def _collect_chat_completion_with_stats(
+        self, command_id: CommandId, parse_gpt_oss: bool
+    ) -> BenchChatCompletionResponse:
+        text_parts: list[str] = []
+        model: str | None = None
+        finish_reason: FinishReason | None = None
+
+        stats: GenerationStats | None = None
+
+        async for chunk in self._chat_chunk_stream(command_id, parse_gpt_oss):
+            if model is None:
+                model = chunk.model
+
+            text_parts.append(chunk.text)
+            stats = chunk.stats or stats
+
+            if chunk.finish_reason is not None:
+                finish_reason = chunk.finish_reason
+
+        combined_text = "".join(text_parts)
+        assert model is not None
+
+        resp = BenchChatCompletionResponse(
+            id=command_id,
+            created=int(time.time()),
+            model=model,
+            choices=[
+                ChatCompletionChoice(
+                    index=0,
+                    message=ChatCompletionMessage(
+                        role="assistant", content=combined_text
+                    ),
+                    finish_reason=finish_reason,
+                )
+            ],
+            generation_stats=stats,
+        )
+        return resp
+
    async def _trigger_notify_user_to_download_model(self, model_id: str) -> None:
        logger.warning(
            "TODO: we should send a notification to the user to download the model"
@@ -525,6 +568,33 @@ class API:

        return await self._collect_chat_completion(command.command_id, parse_gpt_oss)

+    async def bench_chat_completions(
+        self, payload: BenchChatCompletionTaskParams
+    ) -> BenchChatCompletionResponse:
+        model_meta = await resolve_model_meta(payload.model)
+        parse_gpt_oss = "gpt-oss" in model_meta.model_id.lower()
+        payload.model = model_meta.model_id
+
+        if not any(
+            instance.shard_assignments.model_id == payload.model
+            for instance in self.state.instances.values()
+        ):
+            await self._trigger_notify_user_to_download_model(payload.model)
+            raise HTTPException(
+                status_code=404, detail=f"No instance found for model {payload.model}"
+            )
+
+        payload.stream = False
+
+        command = ChatCompletion(request_params=payload)
+        await self._send(command)
+
+        response = await self._collect_chat_completion_with_stats(
+            command.command_id,
+            parse_gpt_oss,
+        )
+        return response
+
    def _calculate_total_available_memory(self) -> Memory:
        """Calculate total available memory across all nodes in bytes."""
        total_available = Memory()
--- a/src/exo/master/placement.py
+++ b/src/exo/master/placement.py
@@ -21,6 +21,7 @@ from exo.shared.types.commands import (
 )
 from exo.shared.types.events import Event, InstanceCreated, InstanceDeleted
 from exo.shared.types.memory import Memory
+from exo.shared.types.models import ModelId
 from exo.shared.types.topology import NodeInfo
 from exo.shared.types.worker.instances import (
    Instance,
@@ -29,6 +30,7 @@ from exo.shared.types.worker.instances import (
    MlxJacclInstance,
    MlxRingInstance,
 )
+from exo.shared.types.worker.shards import Sharding


 def random_ephemeral_port() -> int:
@@ -65,6 +67,28 @@ def place_instance(
    if not cycles_with_sufficient_memory:
        raise ValueError("No cycles found with sufficient memory")

+    if command.sharding == Sharding.Tensor:
+        if not command.model_meta.supports_tensor:
+            raise ValueError(
+                f"Requested Tensor sharding but this model does not support tensor parallelism: {command.model_meta.model_id}"
+            )
+        # TODO: the condition here for tensor parallel is not correct, but it works well enough for now.
+        cycles_with_sufficient_memory = [
+            cycle
+            for cycle in cycles_with_sufficient_memory
+            if command.model_meta.hidden_size % len(cycle) == 0
+        ]
+        if not cycles_with_sufficient_memory:
+            raise ValueError(
+                f"No candidate cycles support tensor sharding for model with hidden_size {command.model_meta.hidden_size}"
+            )
+    if command.sharding == Sharding.Pipeline and command.model_meta.model_id == ModelId(
+        "mlx-community/DeepSeek-V3.1-8bit"
+    ):
+        raise ValueError(
+            "Pipeline parallelism is not supported for DeepSeek V3.1 (8-bit)"
+        )
+
    smallest_cycles = get_smallest_cycles(cycles_with_sufficient_memory)

    smallest_tb_cycles = [
--- a/src/exo/master/placement_utils.py
+++ b/src/exo/master/placement_utils.py
@@ -385,13 +385,14 @@ def get_mlx_jaccl_coordinators(
    address in format "X.X.X.X:PORT" per node.
    """
    rank_0_node = selected_cycle[0]
-    logger.info(f"Selecting coordinator from rank 0 node: {rank_0_node.node_id}")
+    logger.debug(f"Selecting coordinator from rank 0 node: {rank_0_node.node_id}")

    def get_ip_for_node(n: NodeInfo) -> str:
        if n.node_id == rank_0_node.node_id:
            return "0.0.0.0"

-        for ip, _ in _find_connection_ip(n, rank_0_node, cycle_digraph):
+        ip = _find_ip_prioritised(n, rank_0_node, cycle_digraph)
+        if ip:
            return ip

        logger.warning(
--- a/src/exo/master/tests/test_placement.py
+++ b/src/exo/master/tests/test_placement.py
@@ -50,7 +50,7 @@ def model_meta() -> ModelMetadata:
        storage_size=Memory.from_kb(1000),
        pretty_name="Test Model",
        n_layers=10,
-        hidden_size=10,
+        hidden_size=30,
        supports_tensor=True,
    )

--- a/src/exo/shared/models/model_cards.py
+++ b/src/exo/shared/models/model_cards.py
@@ -70,34 +70,36 @@ MODEL_CARDS: dict[str, ModelCard] = {
            supports_tensor=True,
        ),
    ),
-    # "deepseek-v3.2": ModelCard(
-    #     short_id="deepseek-v3.2",
-    #     model_id=ModelId("mlx-community/DeepSeek-V3.2-8bit"),
-    #     name="DeepSeek V3.2 (8-bit)",
-    #     description="""DeepSeek V3.2 is a large language model trained on the DeepSeek V3.2 dataset.""",
-    #     tags=[],
-    #     metadata=ModelMetadata(
-    #         model_id=ModelId("mlx-community/DeepSeek-V3.2-8bit"),
-    #         pretty_name="DeepSeek V3.2 (8-bit)",
-    #         storage_size=Memory.from_kb(754706307),
-    #         n_layers=61,
-    #         hidden_size=7168,
-    #     ),
-    # ),
-    # "deepseek-v3.2-4bit": ModelCard(
-    #     short_id="deepseek-v3.2-4bit",
-    #     model_id=ModelId("mlx-community/DeepSeek-V3.2-4bit"),
-    #     name="DeepSeek V3.2 (4-bit)",
-    #     description="""DeepSeek V3.2 is a large language model trained on the DeepSeek V3.2 dataset.""",
-    #     tags=[],
-    #     metadata=ModelMetadata(
-    #         model_id=ModelId("mlx-community/DeepSeek-V3.2-4bit"),
-    #         pretty_name="DeepSeek V3.2 (4-bit)",
-    #         storage_size=Memory.from_kb(754706307 // 2),  # TODO !!!!!
-    #         n_layers=61,
-    #         hidden_size=7168,
-    #     ),
-    # ),
+    "deepseek-v3.2": ModelCard(
+        short_id="deepseek-v3.2",
+        model_id=ModelId("mlx-community/DeepSeek-V3.2-8bit"),
+        name="DeepSeek V3.2 (8-bit)",
+        description="""DeepSeek V3.2 is a large language model trained on the DeepSeek V3.2 dataset.""",
+        tags=[],
+        metadata=ModelMetadata(
+            model_id=ModelId("mlx-community/DeepSeek-V3.2-8bit"),
+            pretty_name="DeepSeek V3.2 (8-bit)",
+            storage_size=Memory.from_kb(754706307),
+            n_layers=61,
+            hidden_size=7168,
+            supports_tensor=True,
+        ),
+    ),
+    "deepseek-v3.2-4bit": ModelCard(
+        short_id="deepseek-v3.2-4bit",
+        model_id=ModelId("mlx-community/DeepSeek-V3.2-4bit"),
+        name="DeepSeek V3.2 (4-bit)",
+        description="""DeepSeek V3.2 is a large language model trained on the DeepSeek V3.2 dataset.""",
+        tags=[],
+        metadata=ModelMetadata(
+            model_id=ModelId("mlx-community/DeepSeek-V3.2-4bit"),
+            pretty_name="DeepSeek V3.2 (4-bit)",
+            storage_size=Memory.from_kb(754706307 // 2),  # TODO !!!!!
+            n_layers=61,
+            hidden_size=7168,
+            supports_tensor=True,
+        ),
+    ),
    # deepseek r1
    # "deepseek-r1-0528-4bit": ModelCard(
    #     short_id="deepseek-r1-0528-4bit",
@@ -554,6 +556,36 @@ MODEL_CARDS: dict[str, ModelCard] = {
            supports_tensor=True,
        ),
    ),
+    "glm-4.7-4bit": ModelCard(
+        short_id="glm-4.7-4bit",
+        model_id=ModelId("mlx-community/GLM-4.7-4bit"),
+        name="GLM 4.7 4bit",
+        description="GLM 4.7 4bit",
+        tags=[],
+        metadata=ModelMetadata(
+            model_id=ModelId("mlx-community/GLM-4.7-4bit"),
+            pretty_name="GLM 4.7 4bit",
+            storage_size=Memory.from_bytes(198556925568),
+            n_layers=91,
+            hidden_size=5120,
+            supports_tensor=True,
+        ),
+    ),
+    "glm-4.7-8bit-gs32": ModelCard(
+        short_id="glm-4.7-8bit-gs32",
+        model_id=ModelId("mlx-community/GLM-4.7-8bit-gs32"),
+        name="GLM 4.7 8bit (gs32)",
+        description="GLM 4.7 8bit (gs32)",
+        tags=[],
+        metadata=ModelMetadata(
+            model_id=ModelId("mlx-community/GLM-4.7-8bit-gs32"),
+            pretty_name="GLM 4.7 8bit (gs32)",
+            storage_size=Memory.from_bytes(396963397248),
+            n_layers=91,
+            hidden_size=5120,
+            supports_tensor=True,
+        ),
+    ),
    # "devstral-2-123b-instruct-2512-8bit": ModelCard(
    #     short_id="devstral-2-123b-instruct-2512-8bit",
    #     model_id=ModelId("mlx-community/Devstral-2-123B-Instruct-2512-8bit"),
--- a/src/exo/shared/tests/test_apply/test_apply_node_download.py
+++ b/src/exo/shared/tests/test_apply/test_apply_node_download.py
@@ -2,6 +2,7 @@ from exo.shared.apply import apply_node_download_progress
 from exo.shared.tests.conftest import get_pipeline_shard_metadata
 from exo.shared.types.common import NodeId
 from exo.shared.types.events import NodeDownloadProgress
+from exo.shared.types.memory import Memory
 from exo.shared.types.state import State
 from exo.shared.types.worker.downloads import DownloadCompleted
 from exo.worker.tests.constants import MODEL_A_ID, MODEL_B_ID
@@ -13,6 +14,7 @@ def test_apply_node_download_progress():
    event = DownloadCompleted(
        node_id=NodeId("node-1"),
        shard_metadata=shard1,
+        total_bytes=Memory(),
    )

    new_state = apply_node_download_progress(
@@ -28,10 +30,12 @@ def test_apply_two_node_download_progress():
    event1 = DownloadCompleted(
        node_id=NodeId("node-1"),
        shard_metadata=shard1,
+        total_bytes=Memory(),
    )
    event2 = DownloadCompleted(
        node_id=NodeId("node-1"),
        shard_metadata=shard2,
+        total_bytes=Memory(),
    )
    state = State(downloads={NodeId("node-1"): [event1]})

--- a/src/exo/shared/types/api.py
+++ b/src/exo/shared/types/api.py
@@ -5,6 +5,7 @@ from pydantic import BaseModel, Field, field_validator
 from pydantic_core import PydanticUseDefault

 from exo.shared.types.common import CommandId
+from exo.shared.types.memory import Memory
 from exo.shared.types.models import ModelId, ModelMetadata
 from exo.shared.types.worker.instances import Instance, InstanceId, InstanceMeta
 from exo.shared.types.worker.shards import Sharding
@@ -51,6 +52,10 @@ class ChatCompletionMessage(BaseModel):
    function_call: dict[str, Any] | None = None


+class BenchChatCompletionMessage(ChatCompletionMessage):
+    pass
+
+
 class TopLogprobItem(BaseModel):
    token: str
    logprob: float
@@ -113,6 +118,18 @@ class ChatCompletionResponse(BaseModel):
    service_tier: str | None = None


+class GenerationStats(BaseModel):
+    prompt_tps: float
+    generation_tps: float
+    prompt_tokens: int
+    generation_tokens: int
+    peak_memory_usage: Memory
+
+
+class BenchChatCompletionResponse(ChatCompletionResponse):
+    generation_stats: GenerationStats | None = None
+
+
 class ChatCompletionTaskParams(BaseModel):
    model: str
    frequency_penalty: float | None = None
@@ -135,6 +152,10 @@ class ChatCompletionTaskParams(BaseModel):
    user: str | None = None


+class BenchChatCompletionTaskParams(ChatCompletionTaskParams):
+    pass
+
+
 class PlaceInstanceParams(BaseModel):
    model_id: str
    sharding: Sharding = Sharding.Pipeline
--- a/src/exo/shared/types/chunks.py
+++ b/src/exo/shared/types/chunks.py
@@ -1,5 +1,6 @@
 from enum import Enum

+from exo.shared.types.api import GenerationStats
 from exo.utils.pydantic_ext import TaggedModel

 from .api import FinishReason
@@ -20,6 +21,7 @@ class TokenChunk(BaseChunk):
    text: str
    token_id: int
    finish_reason: FinishReason | None = None
+    stats: GenerationStats | None = None


 class ImageChunk(BaseChunk):
--- a/src/exo/shared/types/worker/downloads.py
+++ b/src/exo/shared/types/worker/downloads.py
@@ -28,7 +28,7 @@ class DownloadPending(BaseDownloadProgress):


 class DownloadCompleted(BaseDownloadProgress):
-    pass
+    total_bytes: Memory


 class DownloadFailed(BaseDownloadProgress):
--- a/src/exo/shared/types/worker/runner_response.py
+++ b/src/exo/shared/types/worker/runner_response.py
@@ -1,4 +1,4 @@
-from exo.shared.types.api import FinishReason
+from exo.shared.types.api import FinishReason, GenerationStats
 from exo.utils.pydantic_ext import TaggedModel


@@ -15,6 +15,7 @@ class GenerationResponse(BaseRunnerResponse):
    token: int
    # logprobs: list[float] | None = None # too big. we can change to be top-k
    finish_reason: FinishReason | None = None
+    stats: GenerationStats | None = None


 class FinishedResponse(BaseRunnerResponse):
--- a/src/exo/shared/types/worker/runners.py
+++ b/src/exo/shared/types/worker/runners.py
@@ -53,6 +53,10 @@ class RunnerRunning(BaseRunnerStatus):
    pass


+class RunnerShuttingDown(BaseRunnerStatus):
+    pass
+
+
 class RunnerShutdown(BaseRunnerStatus):
    pass

@@ -70,6 +74,7 @@ RunnerStatus = (
    | RunnerWarmingUp
    | RunnerReady
    | RunnerRunning
+    | RunnerShuttingDown
    | RunnerShutdown
    | RunnerFailed
 )
--- a/src/exo/worker/download/download_utils.py
+++ b/src/exo/worker/download/download_utils.py
@@ -450,6 +450,11 @@ async def get_weight_map(repo_id: str, revision: str = "main") -> dict[str, str]


 async def resolve_allow_patterns(shard: ShardMetadata) -> list[str]:
+    # TODO: 'Smart' downloads are disabled because:
+    #  (i) We don't handle all kinds of files;
+    # (ii) We don't have sticky sessions;
+    # (iii) Tensor parallel requires all files.
+    return ["*"]
    try:
        weight_map = await get_weight_map(str(shard.model_meta.model_id))
        return get_allow_patterns(weight_map, shard)
--- a/src/exo/worker/engines/mlx/auto_parallel.py
+++ b/src/exo/worker/engines/mlx/auto_parallel.py
@@ -289,8 +289,7 @@ class DeepSeekShardingStrategy(TensorParallelShardingStrategy):
        model = cast(DeepseekV3Model, model)
        for layer in model.layers:
            # Shard the self attention
-            if layer.self_attn.q_lora_rank is None:  # pyright: ignore[reportUnnecessaryComparison]
-                # Unfortunately, q_lora_rank can be None despite typing hints.
+            if layer.self_attn.q_lora_rank is None:
                layer.self_attn.q_proj = self.all_to_sharded_linear(
                    layer.self_attn.q_proj
                )
--- a/src/exo/worker/engines/mlx/constants.py
+++ b/src/exo/worker/engines/mlx/constants.py
@@ -9,7 +9,7 @@ MAX_KV_SIZE: int | None = 3200
 KEEP_KV_SIZE: int | None = 1600
 QUANTIZE_MODEL_MODE: str | None = "affine"
 CACHE_GROUP_SIZE: int = 64
-KV_CACHE_BITS: int | None = 8
+KV_CACHE_BITS: int | None = None

 # TODO: We should really make this opt-in, but Kimi requires trust_remote_code=True
 TRUST_REMOTE_CODE: bool = True
--- a/src/exo/worker/engines/mlx/generator/generate.py
+++ b/src/exo/worker/engines/mlx/generator/generate.py
@@ -3,10 +3,17 @@ from typing import Any, Callable, Generator, cast, get_args
 import mlx.core as mx
 from mlx_lm import stream_generate
 from mlx_lm.models.cache import KVCache
+from mlx_lm.sample_utils import make_sampler
 from mlx_lm.tokenizer_utils import TokenizerWrapper

 # from exo.engines.mlx.cache import KVPrefixCache
-from exo.shared.types.api import ChatCompletionMessage, FinishReason
+from exo.shared.types.api import (
+    BenchChatCompletionTaskParams,
+    ChatCompletionMessage,
+    FinishReason,
+    GenerationStats,
+)
+from exo.shared.types.memory import Memory
 from exo.shared.types.tasks import ChatCompletionTaskParams
 from exo.shared.types.worker.runner_response import (
    GenerationResponse,
@@ -41,7 +48,6 @@ def maybe_quantize_kv_cache(
 def warmup_inference(
    model: Model,
    tokenizer: TokenizerWrapper,
-    sampler: Callable[[mx.array], mx.array],
 ) -> int:
    content = "Prompt to warm up the inference engine. Repeat this."

@@ -64,6 +70,9 @@ def warmup_inference(
        model=model,
    )

+    # Use a default sampler for warmup
+    sampler = make_sampler(temp=0.7)
+
    logger.info("Generating warmup tokens")
    for _r in stream_generate(
        model=model,
@@ -72,7 +81,7 @@ def warmup_inference(
        max_tokens=50,
        sampler=sampler,
        prompt_cache=cache,
-        prefill_step_size=65536,
+        prefill_step_size=2048,
        kv_group_size=KV_GROUP_SIZE,
        kv_bits=KV_BITS,
    ):
@@ -80,20 +89,47 @@ def warmup_inference(
        tokens_generated += 1

    logger.info("Generated ALL warmup tokens")
+
+    # TODO: Do we want an mx_barrier?
+    #  At least this version is actively incorrect, as it should use mx_barrier(group)
    mx_barrier()

    return tokens_generated


+def ban_token_ids(token_ids: list[int]) -> Callable[[mx.array, mx.array], mx.array]:
+    token_ids = [int(t) for t in token_ids]
+
+    def proc(_history: mx.array, logits: mx.array) -> mx.array:
+        for tid in token_ids:
+            logits[..., tid] = -1e9
+        return logits
+
+    return proc
+
+
+def eos_ids_from_tokenizer(tokenizer: TokenizerWrapper) -> list[int]:
+    eos: list[int] | None = getattr(tokenizer, "eos_token_ids", None)
+    if eos is None:
+        return []
+    return eos
+
+
 def mlx_generate(
    model: Model,
    tokenizer: TokenizerWrapper,
-    sampler: Callable[[mx.array], mx.array],
    task: ChatCompletionTaskParams,
 ) -> Generator[GenerationResponse]:
+    # Reset peak memory so the generation stats report the peak for this generation only
+    mx.reset_peak_memory()
+    is_bench: bool = isinstance(task, BenchChatCompletionTaskParams)
+
    # Currently we support chat-completion tasks only.
    logger.info(f"task_params: {task}")

+    if task.seed is not None:
+        mx.random.seed(task.seed)
+
    prompt = apply_chat_template(
        tokenizer=tokenizer,
        chat_task_data=task,
@@ -101,6 +137,17 @@ def mlx_generate(

    caches = make_kv_cache(model=model)

+    logits_processors: list[Callable[[mx.array, mx.array], mx.array]] = []
+    if is_bench:
+        # Ban EOS tokens so a benchmark run only stops on length (max_tokens)
+        eos_ids = eos_ids_from_tokenizer(tokenizer)
+        logits_processors = [ban_token_ids(eos_ids)]
+
+    sampler = make_sampler(
+        temp=task.temperature if task.temperature is not None else 0.7,
+        top_p=task.top_p if task.top_p is not None else 1.0,
+    )
+
    max_tokens = task.max_tokens or MAX_TOKENS
    for out in stream_generate(
        model=model,
@@ -108,26 +155,40 @@ def mlx_generate(
        prompt=prompt,
        max_tokens=max_tokens,
        sampler=sampler,
+        logits_processors=logits_processors,
        prompt_cache=caches,
-        prefill_step_size=65536,
+        # TODO: Dynamically change prefill step size to be the maximum possible without timing out.
+        prefill_step_size=2048,
        kv_group_size=KV_GROUP_SIZE,
        kv_bits=KV_BITS,
    ):
        logger.info(out.text)
-        if out.finish_reason is not None and out.finish_reason not in get_args(
-            FinishReason
-        ):
-            # We don't throw here as this failure case is really not all that bad
-            # Just log the error and move on
-            logger.warning(
-                f"Model generated unexpected finish_reason: {out.finish_reason}"
+
+        stats: GenerationStats | None = None
+        if out.finish_reason is not None:
+            stats = GenerationStats(
+                prompt_tps=float(out.prompt_tps),
+                generation_tps=float(out.generation_tps),
+                prompt_tokens=int(out.prompt_tokens),
+                generation_tokens=int(out.generation_tokens),
+                peak_memory_usage=Memory.from_gb(out.peak_memory),
            )

+            if out.finish_reason not in get_args(FinishReason):
+                # We don't throw here as this failure case is really not all that bad
+                # Just log the error and move on
+                logger.warning(
+                    f"Model generated unexpected finish_reason: {out.finish_reason}"
+                )
+
        yield GenerationResponse(
            text=out.text,
            token=out.token,
            finish_reason=cast(FinishReason | None, out.finish_reason),
+            stats=stats,
        )

        if out.finish_reason is not None:
            break
+
+        # TODO: Do we want an mx_barrier?
--- a/src/exo/worker/engines/mlx/utils_mlx.py
+++ b/src/exo/worker/engines/mlx/utils_mlx.py
@@ -1,13 +1,25 @@
 import json
 import os
 import resource
+import sys
 import time
 from pathlib import Path
-from typing import Any, Callable, cast
+from typing import Any, cast
+
+# Monkey-patch for transformers 5.x compatibility
+# Kimi's tokenization_kimi.py imports bytes_to_unicode from the old location
+# which was moved in transformers 5.0.0rc2
+try:
+    import transformers.models.gpt2.tokenization_gpt2 as gpt2_tokenization
+    from transformers.convert_slow_tokenizer import bytes_to_unicode
+
+    if not hasattr(gpt2_tokenization, "bytes_to_unicode"):
+        gpt2_tokenization.bytes_to_unicode = bytes_to_unicode  # type: ignore[attr-defined]
+except ImportError:
+    pass  # transformers < 5.0 or bytes_to_unicode not available

 from mlx_lm.models.cache import KVCache, QuantizedKVCache, RotatingKVCache
 from mlx_lm.models.deepseek_v3 import DeepseekV3Model
-from mlx_lm.sample_utils import make_sampler
 from mlx_lm.tokenizer_utils import TokenizerWrapper

 from exo.worker.engines.mlx.constants import (
@@ -19,7 +31,7 @@ from exo.worker.engines.mlx.constants import (
 try:
    from mlx_lm.tokenizer_utils import load_tokenizer
 except ImportError:
-    from mlx_lm.tokenizer_utils import load as load_tokenizer  # type: ignore
+    from mlx_lm.tokenizer_utils import load as load_tokenizer
 import contextlib

 import mlx.core as mx
@@ -176,11 +188,7 @@ def initialize_mlx(

 def load_mlx_items(
    bound_instance: BoundInstance, group: Group | None
-) -> tuple[Model, TokenizerWrapper, Callable[[mx.array], mx.array]]:
-    # TODO: pass temperature
-    sampler: Callable[[mx.array], mx.array] = make_sampler(temp=0.7)
-    logger.info("Created a sampler")
-
+) -> tuple[Model, TokenizerWrapper]:
    if group is None:
        logger.info(f"Single device used for {bound_instance.instance}")
        model_path = build_model_path(bound_instance.bound_shard.model_meta.model_id)
@@ -201,7 +209,7 @@ def load_mlx_items(

    set_wired_limit_for_model(get_weights_size(bound_instance.bound_shard))

-    return cast(Model, model), tokenizer, sampler
+    return cast(Model, model), tokenizer


 def shard_and_load(
@@ -257,26 +265,70 @@ def shard_and_load(
    return model, tokenizer


-def get_tokenizer(model_path: Path, shard_metadata: ShardMetadata):
-    # TODO: Let's move away from this custom logic to mlx_lm.load()
-    if "kimi-k2" in shard_metadata.model_meta.model_id.lower():
-        eos_token_ids = [163586]
+def get_tokenizer(model_path: Path, shard_metadata: ShardMetadata) -> TokenizerWrapper:
+    """Load tokenizer for a model shard. Delegates to load_tokenizer_for_model_id."""
+    return load_tokenizer_for_model_id(shard_metadata.model_meta.model_id, model_path)

-    elif "glm" in shard_metadata.model_meta.model_id.lower():
-        eos_token_ids = [151336, 151329, 151338]

-    else:
-        eos_token_ids = None
+def get_eos_token_ids_for_model(model_id: str) -> list[int] | None:
+    """
+    Get the EOS token IDs for a model based on its ID.

-    tokenizer = cast(
-        TokenizerWrapper,
-        load_tokenizer(
-            model_path,
-            tokenizer_config_extra={"trust_remote_code": TRUST_REMOTE_CODE},
-            eos_token_ids=eos_token_ids,
-        ),
+    Some models require explicit EOS token configuration that isn't in their
+    tokenizer config. This function returns the known EOS token IDs for such models.
+
+    Args:
+        model_id: The HuggingFace model ID
+
+    Returns:
+        List of EOS token IDs, or None if the model uses standard tokenizer config
+    """
+    model_id_lower = model_id.lower()
+    if "kimi-k2" in model_id_lower:
+        return [163586]
+    elif "glm" in model_id_lower:
+        return [151336, 151329, 151338]
+    return None
+
+
+def load_tokenizer_for_model_id(model_id: str, model_path: Path) -> TokenizerWrapper:
+    """
+    Load tokenizer for a model given its ID and local path.
+
+    This is the core tokenizer loading logic, handling special cases for different
+    model families (Kimi, GLM, etc.) and transformers 5.x compatibility.
+
+    Args:
+        model_id: The HuggingFace model ID (e.g., "moonshotai/Kimi-K2-Instruct")
+        model_path: Local path where the model/tokenizer files are stored
+
+    Returns:
+        TokenizerWrapper instance configured for the model
+    """
+    model_id_lower = model_id.lower()
+    eos_token_ids = get_eos_token_ids_for_model(model_id)
+
+    # Kimi uses a custom TikTokenTokenizer that transformers 5.x can't load via AutoTokenizer
+    if "kimi-k2" in model_id_lower:
+        sys.path.insert(0, str(model_path))
+        from tokenization_kimi import TikTokenTokenizer  # type: ignore[import-not-found]  # noqa: I001
+
+        hf_tokenizer: Any = TikTokenTokenizer.from_pretrained(model_path)  # pyright: ignore[reportUnknownVariableType,reportUnknownMemberType]
+
+        # Patch encode to use internal tiktoken model directly
+        # transformers 5.x has a bug in the encode->pad path for slow tokenizers
+        def _patched_encode(text: str, **_kwargs: object) -> list[int]:
+            # Pass allowed_special="all" to handle special tokens like <|im_user|>
+            return list(hf_tokenizer.model.encode(text, allowed_special="all"))  # pyright: ignore[reportUnknownMemberType,reportUnknownArgumentType]
+
+        hf_tokenizer.encode = _patched_encode
+        return TokenizerWrapper(hf_tokenizer, eos_token_ids=eos_token_ids)
+
+    tokenizer = load_tokenizer(
+        model_path,
+        tokenizer_config_extra={"trust_remote_code": TRUST_REMOTE_CODE},
+        eos_token_ids=eos_token_ids,
    )
-    assert isinstance(tokenizer, TokenizerWrapper)

    return tokenizer

@@ -289,15 +341,15 @@ def apply_chat_template(
    messages = chat_task_data.messages

    formatted_messages: list[dict[str, Any]] = []
-    for _, message in enumerate(messages):
+    for message in messages:
        if isinstance(message.content, ChatCompletionMessageText):
            message.content = message.content.text
        if isinstance(message.content, list):
-            if len(message.content) != 1:
-                logger.warning("Received malformed prompt")
+            if len(message.content) == 0:
+                logger.warning("Received prompt with no content, skipping")
                continue

-            message.content = message.content[0].text
+            message.content = "\n".join(c.text for c in message.content).strip()
        if message.content is None and message.thinking is None:
            continue

@@ -306,13 +358,14 @@ def apply_chat_template(
            {k: v for k, v in message.model_dump().items() if v is not None}  # type: ignore
        )

-    prompt: str = tokenizer.apply_chat_template(  # type: ignore
+    prompt: str = tokenizer.apply_chat_template(
        formatted_messages,
        tokenize=False,
        add_generation_prompt=True,
+        tools=chat_task_data.tools,
    )

-    return prompt  # type: ignore
+    return prompt


 class NullKVCache(KVCache):
@@ -395,11 +448,15 @@ def set_wired_limit_for_model(model_size: Memory):
            "MB. This can be slow. See the documentation for possible work-arounds: "
            "https://github.com/ml-explore/mlx-lm/tree/main#large-models"
        )
-    kv_bytes = int(0.02 * model_bytes)
-    target_cache = int(1.10 * (model_bytes + kv_bytes))
-    target_cache = min(target_cache, max_rec_size)
-    mx.set_cache_limit(target_cache)
    mx.set_wired_limit(max_rec_size)
-    logger.info(
-        f"Wired limit set to {max_rec_size}. Cache limit set to {target_cache}."
-    )
+    logger.info(f"Wired limit set to {max_rec_size}.")
+
+
+def mlx_cleanup(
+    model: Model | None, tokenizer: TokenizerWrapper | None, group: Group | None
+) -> None:
+    del model, tokenizer, group
+    mx.clear_cache()
+    import gc
+
+    gc.collect()
--- a/src/exo/worker/main.py
+++ b/src/exo/worker/main.py
@@ -23,6 +23,7 @@ from exo.shared.types.events import (
    TopologyEdgeCreated,
    TopologyEdgeDeleted,
 )
+from exo.shared.types.models import ModelId
 from exo.shared.types.multiaddr import Multiaddr
 from exo.shared.types.profiling import MemoryPerformanceProfile, NodePerformanceProfile
 from exo.shared.types.state import State
@@ -83,7 +84,7 @@ class Worker:
        self.out_for_delivery: dict[EventId, ForwarderEvent] = {}

        self.state: State = State()
-        self.download_status: dict[ShardMetadata, DownloadProgress] = {}
+        self.download_status: dict[ModelId, DownloadProgress] = {}
        self.runners: dict[RunnerId, RunnerSupervisor] = {}
        self._tg: TaskGroup | None = None

@@ -128,6 +129,7 @@ class Worker:
            tg.start_soon(start_polling_node_metrics, resource_monitor_callback)

            tg.start_soon(start_polling_memory_metrics, memory_monitor_callback)
+            tg.start_soon(self._emit_existing_download_progress)
            tg.start_soon(self._connection_message_event_writer)
            tg.start_soon(self._resend_out_for_delivery)
            tg.start_soon(self._event_applier)
@@ -200,11 +202,11 @@ class Worker:
                        )
                    )
                case DownloadModel(shard_metadata=shard):
-                    if shard not in self.download_status:
+                    if shard.model_meta.model_id not in self.download_status:
                        progress = DownloadPending(
                            shard_metadata=shard, node_id=self.node_id
                        )
-                        self.download_status[shard] = progress
+                        self.download_status[shard.model_meta.model_id] = progress
                        await self.event_sender.send(
                            NodeDownloadProgress(download_progress=progress)
                        )
@@ -215,9 +217,11 @@ class Worker:
                    )
                    if initial_progress.status == "complete":
                        progress = DownloadCompleted(
-                            shard_metadata=shard, node_id=self.node_id
+                            shard_metadata=shard,
+                            node_id=self.node_id,
+                            total_bytes=initial_progress.total_bytes,
                        )
-                        self.download_status[shard] = progress
+                        self.download_status[shard.model_meta.model_id] = progress
                        await self.event_sender.send(
                            NodeDownloadProgress(download_progress=progress)
                        )
@@ -349,7 +353,7 @@ class Worker:
                initial_progress
            ),
        )
-        self.download_status[task.shard_metadata] = status
+        self.download_status[task.shard_metadata.model_meta.model_id] = status
        self.event_sender.send_nowait(NodeDownloadProgress(download_progress=status))

        last_progress_time = 0.0
@@ -362,8 +366,12 @@ class Worker:
            nonlocal self
            nonlocal last_progress_time
            if progress.status == "complete":
-                status = DownloadCompleted(shard_metadata=shard, node_id=self.node_id)
-                self.download_status[shard] = status
+                status = DownloadCompleted(
+                    shard_metadata=shard,
+                    node_id=self.node_id,
+                    total_bytes=progress.total_bytes,
+                )
+                self.download_status[shard.model_meta.model_id] = status
                # Footgun!
                self.event_sender.send_nowait(
                    NodeDownloadProgress(download_progress=status)
@@ -384,7 +392,7 @@ class Worker:
                        progress
                    ),
                )
-                self.download_status[shard] = status
+                self.download_status[shard.model_meta.model_id] = status
                self.event_sender.send_nowait(
                    NodeDownloadProgress(download_progress=status)
                )
@@ -444,3 +452,42 @@ class Worker:
                    await self.event_sender.send(TopologyEdgeDeleted(edge=conn))

            await anyio.sleep(10)
+
+    async def _emit_existing_download_progress(self) -> None:
+        try:
+            while True:
+                logger.info("Fetching and emitting existing download progress...")
+                async for (
+                    _,
+                    progress,
+                ) in self.shard_downloader.get_shard_download_status():
+                    if progress.status == "complete":
+                        status = DownloadCompleted(
+                            node_id=self.node_id,
+                            shard_metadata=progress.shard,
+                            total_bytes=progress.total_bytes,
+                        )
+                    elif progress.status in ["in_progress", "not_started"]:
+                        if progress.downloaded_bytes_this_session.in_bytes == 0:
+                            status = DownloadPending(
+                                node_id=self.node_id, shard_metadata=progress.shard
+                            )
+                        else:
+                            status = DownloadOngoing(
+                                node_id=self.node_id,
+                                shard_metadata=progress.shard,
+                                download_progress=map_repo_download_progress_to_download_progress_data(
+                                    progress
+                                ),
+                            )
+                    else:
+                        continue
+
+                    self.download_status[progress.shard.model_meta.model_id] = status
+                    await self.event_sender.send(
+                        NodeDownloadProgress(download_progress=status)
+                    )
+                logger.info("Done emitting existing download progress.")
+                await anyio.sleep(5 * 60)  # 5 minutes
+        except Exception as e:
+            logger.error(f"Error emitting existing download progress: {e}")
--- a/src/exo/worker/plan.py
+++ b/src/exo/worker/plan.py
@@ -3,6 +3,7 @@
 from collections.abc import Mapping, Sequence

 from exo.shared.types.common import NodeId
+from exo.shared.types.models import ModelId
 from exo.shared.types.tasks import (
    ChatCompletion,
    ConnectToGroup,
@@ -34,7 +35,6 @@ from exo.shared.types.worker.runners import (
    RunnerStatus,
    RunnerWarmingUp,
 )
-from exo.shared.types.worker.shards import ShardMetadata
 from exo.worker.runner.runner_supervisor import RunnerSupervisor


@@ -43,7 +43,7 @@ def plan(
    # Runners is expected to be FRESH and so should not come from state
    runners: Mapping[RunnerId, RunnerSupervisor],
    # DL_status is expected to be FRESH and so should not come from state
-    download_status: Mapping[ShardMetadata, DownloadProgress],
+    download_status: Mapping[ModelId, DownloadProgress],
    # gdls is not expected to be fresh
    global_download_status: Mapping[NodeId, Sequence[DownloadProgress]],
    instances: Mapping[InstanceId, Instance],
@@ -111,13 +111,14 @@ def _create_runner(

 def _model_needs_download(
    runners: Mapping[RunnerId, RunnerSupervisor],
-    download_status: Mapping[ShardMetadata, DownloadProgress],
+    download_status: Mapping[ModelId, DownloadProgress],
 ) -> DownloadModel | None:
    for runner in runners.values():
+        model_id = runner.bound_instance.bound_shard.model_meta.model_id
        if isinstance(runner.status, RunnerIdle) and (
-            not isinstance(
-                download_status.get(runner.bound_instance.bound_shard, None),
-                (DownloadOngoing, DownloadCompleted),
+            model_id not in download_status
+            or not isinstance(
+                download_status[model_id], (DownloadOngoing, DownloadCompleted)
            )
        ):
            # We don't invalidate download_status randomly in case a file gets deleted on disk
@@ -273,6 +274,12 @@ def _pending_tasks(
            if task.instance_id != runner.bound_instance.instance.instance_id:
                continue

+            # Design note: this guards against a state race, since the task status is not updated to completed fast enough.
+            # Realistically, the task status should be set to completed by the LAST runner, so this is a true race.
+            # TODO: the real fix goes deeper than this bypass.
+            if task.task_id in runner.completed:
+                continue
+
            # TODO: Check ordering aligns with MLX distributeds expectations.

            if isinstance(runner.status, RunnerReady) and all(
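
Review note on the `plan.py` changes above: `download_status` is now keyed by `ModelId` instead of `ShardMetadata`. The sketch below shows the resulting lookup in a simplified form. The three status classes are empty stand-ins for the real types, `needs_download` is a hypothetical helper, model IDs are plain strings, and the `RunnerIdle` check from `_model_needs_download` is left out.

```python
class DownloadPending:
    """Empty stand-in for the real DownloadPending type."""


class DownloadOngoing:
    """Empty stand-in for the real DownloadOngoing type."""


class DownloadCompleted:
    """Empty stand-in for the real DownloadCompleted type."""


def needs_download(model_id: str, download_status: dict[str, object]) -> bool:
    """A model needs a download unless one is already ongoing or completed."""
    status = download_status.get(model_id)
    return not isinstance(status, (DownloadOngoing, DownloadCompleted))


# Hypothetical model IDs, used only for illustration.
status_by_model: dict[str, object] = {
    "example-org/model-a": DownloadCompleted(),
    "example-org/model-b": DownloadPending(),
}
print(needs_download("example-org/model-a", status_by_model))
print(needs_download("example-org/model-b", status_by_model))
print(needs_download("example-org/model-c", status_by_model))
```

With this keying, every shard of the same model shares a single status entry.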
--- a/src/exo/worker/runner/bootstrap.py
+++ b/src/exo/worker/runner/bootstrap.py
@@ -6,7 +6,7 @@ from exo.shared.types.events import Event, RunnerStatusUpdated
 from exo.shared.types.tasks import Task
 from exo.shared.types.worker.instances import BoundInstance, MlxJacclInstance
 from exo.shared.types.worker.runners import RunnerFailed
-from exo.utils.channels import MpReceiver, MpSender
+from exo.utils.channels import ClosedResourceError, MpReceiver, MpSender

 logger: "loguru.Logger" = loguru.logger

@@ -31,6 +31,8 @@ def entrypoint(
        from exo.worker.runner.runner import main

        main(bound_instance, event_sender, task_receiver)
+    except ClosedResourceError:
+        logger.warning("Runner communication closed unexpectedly")
    except Exception as e:
        logger.opt(exception=e).warning(
            f"Runner {bound_instance.bound_runner_id} crashed with critical exception {e}"
@@ -42,8 +44,10 @@ def entrypoint(
            )
        )
    finally:
-        event_sender.close()
-        task_receiver.close()
-        event_sender.join()
-        task_receiver.join()
-        logger.info("bye from the runner")
+        try:
+            event_sender.close()
+            task_receiver.close()
+        finally:
+            event_sender.join()
+            task_receiver.join()
+            logger.info("bye from the runner")
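
Review note on the `bootstrap.py` changes above: the nested `try`/`finally` makes the `join()` calls run even when a `close()` call raises. A minimal sketch of that pattern follows. `FakeChannel` and `shutdown` are hypothetical stand-ins for the `MpSender`/`MpReceiver` objects and the `finally` block in `entrypoint`.

```python
class FakeChannel:
    """Hypothetical stand-in for MpSender/MpReceiver."""

    def __init__(self, name: str, fail_on_close: bool = False) -> None:
        self.name = name
        self.fail_on_close = fail_on_close
        self.joined = False

    def close(self) -> None:
        if self.fail_on_close:
            raise RuntimeError(f"{self.name} failed to close")

    def join(self) -> None:
        self.joined = True


def shutdown(event_sender: FakeChannel, task_receiver: FakeChannel) -> None:
    try:
        event_sender.close()
        task_receiver.close()
    finally:
        # Runs even if one of the close() calls above raised.
        event_sender.join()
        task_receiver.join()


sender = FakeChannel("event_sender", fail_on_close=True)
receiver = FakeChannel("task_receiver")
try:
    shutdown(sender, receiver)
except RuntimeError as exc:
    print(f"close failed: {exc}")
print(sender.joined, receiver.joined)
```

In this sketch the exception from `close()` still propagates to the caller, and a failure in the first `close()` skips the second one. Only the `join()` calls are guaranteed to run.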
--- a/src/exo/worker/runner/runner.py
+++ b/src/exo/worker/runner/runner.py
@@ -1,5 +1,7 @@
 import time

+import mlx.core as mx
+
 from exo.shared.types.api import ChatCompletionMessageText
 from exo.shared.types.chunks import TokenChunk
 from exo.shared.types.events import (
@@ -32,10 +34,11 @@ from exo.shared.types.worker.runners import (
    RunnerReady,
    RunnerRunning,
    RunnerShutdown,
+    RunnerShuttingDown,
    RunnerStatus,
    RunnerWarmingUp,
 )
-from exo.utils.channels import ClosedResourceError, MpReceiver, MpSender
+from exo.utils.channels import MpReceiver, MpSender
 from exo.worker.engines.mlx.generator.generate import mlx_generate, warmup_inference
 from exo.worker.engines.mlx.utils_mlx import (
    initialize_mlx,
@@ -55,180 +58,153 @@ def main(
        bound_instance.bound_runner_id,
        bound_instance.bound_shard,
    )
-    try:
-        logger.info("hello from the runner")
-        if getattr(shard_metadata, "immediate_exception", False):
-            raise Exception("Fake exception - runner failed to spin up.")
-        if timeout := getattr(shard_metadata, "should_timeout", 0):
-            time.sleep(timeout)
+    logger.info("hello from the runner")
+    if getattr(shard_metadata, "immediate_exception", False):
+        raise Exception("Fake exception - runner failed to spin up.")
+    if timeout := getattr(shard_metadata, "should_timeout", 0):
+        time.sleep(timeout)

-        setup_start_time = time.time()
+    setup_start_time = time.time()

-        model = None
-        tokenizer = None
-        sampler = None
-        group = None
+    model = None
+    tokenizer = None
+    group = None

-        current_status: RunnerStatus = RunnerIdle()
-        logger.info("runner created")
-        event_sender.send(
-            RunnerStatusUpdated(runner_id=runner_id, runner_status=current_status)
-        )
-        with task_receiver as tasks:
-            for task in tasks:
-                event_sender.send(
-                    TaskStatusUpdated(
-                        task_id=task.task_id, task_status=TaskStatus.Running
-                    )
-                )
-                event_sender.send(TaskAcknowledged(task_id=task.task_id))
-                match task:
-                    case ConnectToGroup() if isinstance(
-                        current_status, (RunnerIdle, RunnerFailed)
-                    ):
-                        logger.info("runner connecting")
-                        current_status = RunnerConnecting()
-                        event_sender.send(
-                            RunnerStatusUpdated(
-                                runner_id=runner_id, runner_status=current_status
-                            )
-                        )
-                        group = initialize_mlx(bound_instance)
-
-                        logger.info("runner connected")
-                        current_status = RunnerConnected()
-
-                    # we load the model if it's connected with a group, or idle without a group. we should never tell a model to connect if it doesn't need to
-                    case LoadModel() if (
-                        isinstance(current_status, RunnerConnected)
-                        and group is not None
-                    ) or (isinstance(current_status, RunnerIdle) and group is None):
-                        current_status = RunnerLoading()
-                        logger.info("runner loading")
-                        event_sender.send(
-                            RunnerStatusUpdated(
-                                runner_id=runner_id, runner_status=current_status
-                            )
-                        )
-
-                        model, tokenizer, sampler = load_mlx_items(
-                            bound_instance, group
-                        )
-
-                        current_status = RunnerLoaded()
-                        logger.info("runner loaded")
-                    case StartWarmup() if isinstance(current_status, RunnerLoaded):
-                        assert model
-                        assert tokenizer
-                        assert sampler
-                        current_status = RunnerWarmingUp()
-                        logger.info("runner warming up")
-                        event_sender.send(
-                            RunnerStatusUpdated(
-                                runner_id=runner_id, runner_status=current_status
-                            )
-                        )
-
-                        logger.info(f"warming up inference for instance: {instance}")
-                        toks = warmup_inference(
-                            model=model,
-                            tokenizer=tokenizer,
-                            sampler=sampler,
-                            # kv_prefix_cache=kv_prefix_cache,  # supply for warmup-time prefix caching
-                        )
-                        logger.info(f"warmed up by generating {toks} tokens")
-                        logger.info(
-                            f"runner initialized in {time.time() - setup_start_time} seconds"
-                        )
-                        current_status = RunnerReady()
-                        logger.info("runner ready")
-                    case ChatCompletion(
-                        task_params=task_params, command_id=command_id
-                    ) if isinstance(current_status, RunnerReady):
-                        assert model
-                        assert tokenizer
-                        assert sampler
-                        logger.info(f"received chat request: {str(task)[:500]}")
-                        current_status = RunnerRunning()
-                        logger.info("runner running")
-                        event_sender.send(
-                            RunnerStatusUpdated(
-                                runner_id=runner_id, runner_status=current_status
-                            )
-                        )
-                        assert task_params.messages[0].content is not None
-                        _check_for_debug_prompts(task_params.messages[0].content)
-
-                        # Generate responses using the actual MLX generation
-                        for response in mlx_generate(
-                            model=model,
-                            tokenizer=tokenizer,
-                            sampler=sampler,
-                            task=task_params,
-                        ):
-                            match response:
-                                case GenerationResponse():
-                                    if shard_metadata.device_rank == 0:
-                                        event_sender.send(
-                                            ChunkGenerated(
-                                                command_id=command_id,
-                                                chunk=TokenChunk(
-                                                    idx=response.token,
-                                                    model=shard_metadata.model_meta.model_id,
-                                                    text=response.text,
-                                                    token_id=response.token,
-                                                    finish_reason=response.finish_reason,
-                                                ),
-                                            )
-                                        )
-                                    # case TokenizedResponse():
-                                    # TODO: something here ig
-
-                        current_status = RunnerReady()
-                        logger.info("runner ready")
-                    case Shutdown():
-                        logger.info("runner shutting down")
-                        event_sender.send(
-                            TaskStatusUpdated(
-                                task_id=task.task_id, task_status=TaskStatus.Complete
-                            )
-                        )
-                        break
-                    case _:
-                        raise ValueError(
-                            f"Received {task.__class__.__name__} outside of state machine in {current_status=}"
-                        )
-                event_sender.send(
-                    TaskStatusUpdated(
-                        task_id=task.task_id, task_status=TaskStatus.Complete
-                    )
-                )
-                event_sender.send(
-                    RunnerStatusUpdated(
-                        runner_id=runner_id, runner_status=current_status
-                    )
-                )
-        event_sender.send(
-            RunnerStatusUpdated(runner_id=runner_id, runner_status=RunnerShutdown())
-        )
-    except ClosedResourceError:
-        logger.warning("runner communication closed unexpectedly")
-    except Exception as e:
-        logger.opt(exception=e).warning(
-            f"Runner {runner_id} crashed with critical exception {e}"
-        )
-        event_sender.send(
-            RunnerStatusUpdated(
-                runner_id=runner_id,
-                runner_status=RunnerFailed(error_message=str(e)),
+    current_status: RunnerStatus = RunnerIdle()
+    logger.info("runner created")
+    event_sender.send(
+        RunnerStatusUpdated(runner_id=runner_id, runner_status=current_status)
+    )
+    with task_receiver as tasks:
+        for task in tasks:
+            event_sender.send(
+                TaskStatusUpdated(task_id=task.task_id, task_status=TaskStatus.Running)
            )
-        )
-    finally:
-        event_sender.close()
-        task_receiver.close()
-        event_sender.join()
-        task_receiver.join()
-        logger.info("bye from the runner")
+            event_sender.send(TaskAcknowledged(task_id=task.task_id))
+            match task:
+                case ConnectToGroup() if isinstance(
+                    current_status, (RunnerIdle, RunnerFailed)
+                ):
+                    logger.info("runner connecting")
+                    current_status = RunnerConnecting()
+                    event_sender.send(
+                        RunnerStatusUpdated(
+                            runner_id=runner_id, runner_status=current_status
+                        )
+                    )
+                    group = initialize_mlx(bound_instance)
+
+                    logger.info("runner connected")
+                    current_status = RunnerConnected()
+
+                # we load the model if it's connected with a group, or idle without a group. we should never tell a model to connect if it doesn't need to
+                case LoadModel() if (
+                    isinstance(current_status, RunnerConnected) and group is not None
+                ) or (isinstance(current_status, RunnerIdle) and group is None):
+                    current_status = RunnerLoading()
+                    logger.info("runner loading")
+                    event_sender.send(
+                        RunnerStatusUpdated(
+                            runner_id=runner_id, runner_status=current_status
+                        )
+                    )
+
+                    model, tokenizer = load_mlx_items(bound_instance, group)
+
+                    current_status = RunnerLoaded()
+                    logger.info("runner loaded")
+                case StartWarmup() if isinstance(current_status, RunnerLoaded):
+                    assert model
+                    assert tokenizer
+                    current_status = RunnerWarmingUp()
+                    logger.info("runner warming up")
+                    event_sender.send(
+                        RunnerStatusUpdated(
+                            runner_id=runner_id, runner_status=current_status
+                        )
+                    )
+
+                    logger.info(f"warming up inference for instance: {instance}")
+                    toks = warmup_inference(
+                        model=model,
+                        tokenizer=tokenizer,
+                        # kv_prefix_cache=kv_prefix_cache,  # supply for warmup-time prefix caching
+                    )
+                    logger.info(f"warmed up by generating {toks} tokens")
+                    logger.info(
+                        f"runner initialized in {time.time() - setup_start_time} seconds"
+                    )
+                    current_status = RunnerReady()
+                    logger.info("runner ready")
+                case ChatCompletion(task_params=task_params, command_id=command_id) if (
+                    isinstance(current_status, RunnerReady)
+                ):
+                    assert model
+                    assert tokenizer
+                    logger.info(f"received chat request: {str(task)[:500]}")
+                    current_status = RunnerRunning()
+                    logger.info("runner running")
+                    event_sender.send(
+                        RunnerStatusUpdated(
+                            runner_id=runner_id, runner_status=current_status
+                        )
+                    )
+                    assert task_params.messages[0].content is not None
+                    _check_for_debug_prompts(task_params.messages[0].content)
+
+                    # Generate responses using the actual MLX generation
+                    for response in mlx_generate(
+                        model=model,
+                        tokenizer=tokenizer,
+                        task=task_params,
+                    ):
+                        match response:
+                            case GenerationResponse():
+                                if shard_metadata.device_rank == 0:
+                                    event_sender.send(
+                                        ChunkGenerated(
+                                            command_id=command_id,
+                                            chunk=TokenChunk(
+                                                idx=response.token,
+                                                model=shard_metadata.model_meta.model_id,
+                                                text=response.text,
+                                                token_id=response.token,
+                                                finish_reason=response.finish_reason,
+                                                stats=response.stats,
+                                            ),
+                                        )
+                                    )
+                                # case TokenizedResponse():
+                                # TODO: handle TokenizedResponse here
+
+                    current_status = RunnerReady()
+                    logger.info("runner ready")
+                case Shutdown():
+                    current_status = RunnerShuttingDown()
+                    logger.info("runner shutting down")
+                    event_sender.send(
+                        RunnerStatusUpdated(
+                            runner_id=runner_id, runner_status=current_status
+                        )
+                    )
+                    current_status = RunnerShutdown()
+                case _:
+                    raise ValueError(
+                        f"Received {task.__class__.__name__} outside of state machine in {current_status=}"
+                    )
+            event_sender.send(
+                TaskStatusUpdated(task_id=task.task_id, task_status=TaskStatus.Complete)
+            )
+            event_sender.send(
+                RunnerStatusUpdated(runner_id=runner_id, runner_status=current_status)
+            )
+            if isinstance(current_status, RunnerShutdown):
+                del model, tokenizer, group
+                mx.clear_cache()
+                import gc
+
+                gc.collect()
+                break


 EXO_RUNNER_MUST_FAIL = "EXO RUNNER MUST FAIL"
--- a/src/exo/worker/runner/runner_supervisor.py
+++ b/src/exo/worker/runner/runner_supervisor.py
@@ -14,13 +14,23 @@ from anyio import (
 from anyio.abc import TaskGroup
 from loguru import logger

-from exo.shared.types.events import Event, RunnerStatusUpdated, TaskAcknowledged
-from exo.shared.types.tasks import Task, TaskId
+from exo.shared.types.events import (
+    Event,
+    RunnerStatusUpdated,
+    TaskAcknowledged,
+    TaskStatusUpdated,
+)
+from exo.shared.types.tasks import Task, TaskId, TaskStatus
 from exo.shared.types.worker.instances import BoundInstance
 from exo.shared.types.worker.runners import (
+    RunnerConnecting,
    RunnerFailed,
    RunnerIdle,
+    RunnerLoading,
+    RunnerRunning,
+    RunnerShuttingDown,
    RunnerStatus,
+    RunnerWarmingUp,
 )
 from exo.shared.types.worker.shards import ShardMetadata
 from exo.utils.channels import MpReceiver, MpSender, Sender, mp_channel
@@ -39,10 +49,10 @@ class RunnerSupervisor:
    _ev_recv: MpReceiver[Event]
    _task_sender: MpSender[Task]
    _event_sender: Sender[Event]
-    # err_path: str
    _tg: TaskGroup | None = field(default=None, init=False)
    status: RunnerStatus = field(default_factory=RunnerIdle, init=False)
    pending: dict[TaskId, anyio.Event] = field(default_factory=dict, init=False)
+    completed: set[TaskId] = field(default_factory=set, init=False)

    @classmethod
    def create(
@@ -77,7 +87,6 @@ class RunnerSupervisor:
            _ev_recv=ev_recv,
            _task_sender=task_sender,
            _event_sender=event_sender,
-            # err_path=err_path,
        )

        return self
@@ -118,6 +127,10 @@ class RunnerSupervisor:
        self._tg.cancel_scope.cancel()

    async def start_task(self, task: Task):
+        # A task that has already completed must not be sent to the runner again.
+        if task.task_id in self.completed:
+            logger.info(f"Skipping task {task}: it has already been completed")
+            return
        logger.info(f"Starting task {task}")
        event = anyio.Event()
        self.pending[task.task_id] = event
@@ -138,6 +151,22 @@ class RunnerSupervisor:
                    if isinstance(event, TaskAcknowledged):
                        self.pending.pop(event.task_id).set()
                        continue
+                    if (
+                        isinstance(event, TaskStatusUpdated)
+                        and event.task_status == TaskStatus.Complete
+                    ):
+                        # If a task has just been completed, we should be working on it.
+                        assert isinstance(
+                            self.status,
+                            (
+                                RunnerRunning,
+                                RunnerWarmingUp,
+                                RunnerLoading,
+                                RunnerConnecting,
+                                RunnerShuttingDown,
+                            ),
+                        )
+                        self.completed.add(event.task_id)
                    await self._event_sender.send(event)
            except (ClosedResourceError, BrokenResourceError) as e:
                await self._check_runner(e)
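A minimal sketch of the bookkeeping behind the new `completed` set, assuming the intent is that a task whose completion has been observed should never be started a second time. `CompletedTaskGuard`, `mark_complete`, and the string task IDs are hypothetical stand-ins for illustration, not part of `RunnerSupervisor`.

```python
from dataclasses import dataclass, field


@dataclass
class CompletedTaskGuard:
    """Hypothetical stand-in for the supervisor's `completed` bookkeeping."""

    completed: set[str] = field(default_factory=set)
    started: list[str] = field(default_factory=list)

    def mark_complete(self, task_id: str) -> None:
        # Mirrors recording a TaskStatusUpdated(Complete) event.
        self.completed.add(task_id)

    def start_task(self, task_id: str) -> bool:
        # Refuse to start a task that has already completed.
        if task_id in self.completed:
            return False
        self.started.append(task_id)
        return True


guard = CompletedTaskGuard()
first_attempt = guard.start_task("task-a")   # accepted
guard.mark_complete("task-a")
second_attempt = guard.start_task("task-a")  # rejected as a duplicate
```

Keeping the IDs in a set makes the membership check constant-time, which may matter if the planner re-issues tasks on every tick.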
--- a/src/exo/worker/tests/constants.py
+++ b/src/exo/worker/tests/constants.py
@@ -9,9 +9,11 @@ MASTER_NODE_ID = NodeId("ffffffff-aaaa-4aaa-8aaa-aaaaaaaaaaaa")

 NODE_A: Final[NodeId] = NodeId("aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa")
 NODE_B: Final[NodeId] = NodeId("bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb")
+NODE_C: Final[NodeId] = NodeId("cccccccc-cccc-4ccc-8ccc-cccccccccccc")

 RUNNER_1_ID: Final[RunnerId] = RunnerId("11111111-1111-4111-8111-111111111111")
 RUNNER_2_ID: Final[RunnerId] = RunnerId("33333333-3333-4333-8333-333333333333")
+RUNNER_3_ID: Final[RunnerId] = RunnerId("Runner3")

 INSTANCE_1_ID: Final[InstanceId] = InstanceId("22222222-2222-4222-8222-222222222222")
 INSTANCE_2_ID: Final[InstanceId] = InstanceId("44444444-4444-4444-8444-444444444444")
--- a/src/exo/worker/tests/unittests/conftest.py
+++ b/src/exo/worker/tests/unittests/conftest.py
@@ -1,11 +1,9 @@
-from __future__ import annotations
-
-from dataclasses import dataclass
+from dataclasses import dataclass, field

 from exo.shared.types.common import NodeId
 from exo.shared.types.memory import Memory
 from exo.shared.types.models import ModelId, ModelMetadata
-from exo.shared.types.tasks import BaseTask
+from exo.shared.types.tasks import BaseTask, TaskId
 from exo.shared.types.worker.instances import (
    BoundInstance,
    Instance,
@@ -21,6 +19,7 @@ from exo.shared.types.worker.shards import PipelineShardMetadata, ShardMetadata
 class FakeRunnerSupervisor:
    bound_instance: BoundInstance
    status: RunnerStatus
+    completed: set[TaskId] = field(default_factory=set)


 class OtherTask(BaseTask):
--- a/src/exo/worker/tests/unittests/test_mlx/test_tokenizers.py
+++ b/src/exo/worker/tests/unittests/test_mlx/test_tokenizers.py
@@ -0,0 +1,386 @@
+"""
+Unit tests for tokenizer loading and functionality across all supported models.
+
+This test downloads only tokenizer-related files (not full model weights) to verify
+that tokenizers can be loaded and used correctly for encoding/decoding.
+"""
+
+import asyncio
+import contextlib
+from pathlib import Path
+
+import pytest
+
+from exo.shared.models.model_cards import MODEL_CARDS, ModelCard
+from exo.worker.download.download_utils import (
+    download_file_with_retry,
+    ensure_models_dir,
+    fetch_file_list_with_cache,
+)
+from exo.worker.engines.mlx.utils_mlx import (
+    get_eos_token_ids_for_model,
+    load_tokenizer_for_model_id,
+)
+
+# Files needed for tokenizer functionality
+TOKENIZER_FILE_PATTERNS = [
+    "tokenizer.json",
+    "tokenizer_config.json",
+    "special_tokens_map.json",
+    "vocab.json",
+    "vocab.txt",
+    "merges.txt",
+    "tiktoken.model",
+    "added_tokens.json",
+    "tokenizer.model",
+    "tokenization_*.py",  # Custom tokenizer implementations
+]
+
+
+def is_tokenizer_file(filename: str) -> bool:
+    """Check if a file is needed for tokenizer functionality."""
+    for pattern in TOKENIZER_FILE_PATTERNS:
+        if "*" in pattern:
+            prefix = pattern.split("*")[0]
+            suffix = pattern.split("*")[1]
+            if filename.startswith(prefix) and filename.endswith(suffix):
+                return True
+        elif filename == pattern:
+            return True
+    return False
+
+
+async def download_tokenizer_files(model_id: str) -> Path:
+    """Download only the tokenizer-related files for a model."""
+    target_dir = await ensure_models_dir() / model_id.replace("/", "--")
+    target_dir.mkdir(parents=True, exist_ok=True)
+
+    file_list = await fetch_file_list_with_cache(model_id, "main", recursive=True)
+
+    tokenizer_files = [f for f in file_list if is_tokenizer_file(f.path)]
+
+    if not tokenizer_files:
+        pytest.skip(f"No tokenizer files found for {model_id}")
+
+    for file_entry in tokenizer_files:
+        with contextlib.suppress(FileNotFoundError):
+            await download_file_with_retry(
+                model_id, "main", file_entry.path, target_dir
+            )
+
+    return target_dir
+
+
+# Get a sample of models to test (one per family to keep tests fast)
+def get_test_models() -> list[tuple[str, ModelCard]]:
+    """Get a representative sample of models to test."""
+    # Pick one model from each family to test
+    families: dict[str, tuple[str, ModelCard]] = {}
+    for short_id, card in MODEL_CARDS.items():
+        # Extract family name (e.g., "llama-3.1" from "llama-3.1-8b")
+        parts = short_id.split("-")
+        family = "-".join(parts[:2]) if len(parts) >= 2 else parts[0]
+
+        if family not in families:
+            families[family] = (short_id, card)
+
+    return list(families.values())
+
+
+TEST_MODELS: list[tuple[str, ModelCard]] = get_test_models()
+
+
+@pytest.fixture(scope="module")
+def event_loop():
+    """Create event loop for async tests."""
+    loop = asyncio.new_event_loop()
+    yield loop
+    loop.close()
+
+
+@pytest.mark.parametrize(
+    "short_id,model_card",
+    TEST_MODELS,
+    ids=[m[0] for m in TEST_MODELS],
+)
+@pytest.mark.asyncio
+async def test_tokenizer_encode_decode(short_id: str, model_card: ModelCard) -> None:
+    """Test that tokenizer can encode and decode text correctly."""
+    model_id = str(model_card.model_id)
+
+    # Download tokenizer files
+    model_path = await download_tokenizer_files(model_id)
+
+    # Verify required files exist
+    has_tokenizer = (
+        (model_path / "tokenizer.json").exists()
+        or (model_path / "tokenizer_config.json").exists()
+        or (model_path / "tiktoken.model").exists()
+        or (model_path / "tokenizer.model").exists()
+    )
+    if not has_tokenizer:
+        pytest.skip(f"Required tokenizer files not found for {model_id}")
+
+    # Load tokenizer
+    tokenizer = load_tokenizer_for_model_id(model_id, model_path)
+
+    # Test basic encoding
+    test_text = "Hello, world!"
+    encoded = tokenizer.encode(test_text)
+    assert isinstance(encoded, list), f"encode() should return a list for {model_id}"
+    assert len(encoded) > 0, f"encode() should return non-empty list for {model_id}"
+    assert all(isinstance(t, int) for t in encoded), (
+        f"All tokens should be integers for {model_id}"
+    )
+
+    # Test decoding
+    decoded = tokenizer.decode(encoded)
+    assert isinstance(decoded, str), f"decode() should return a string for {model_id}"
+    assert test_text in decoded or decoded.strip() == test_text.strip(), (
+        f"decode(encode(x)) should preserve text for {model_id}: got {decoded!r}"
+    )
+
+    # Test with longer text
+    long_text = "The quick brown fox jumps over the lazy dog. " * 10
+    long_encoded = tokenizer.encode(long_text)
+    assert len(long_encoded) > len(encoded), (
+        f"Longer text should produce more tokens for {model_id}"
+    )
+
+    # Test empty string
+    empty_encoded = tokenizer.encode("")
+    assert isinstance(empty_encoded, list), (
+        f"encode('') should return a list for {model_id}"
+    )
+
+    # Test special characters
+    special_text = 'Hello!\n\tWorld? <test> & "quotes"'
+    special_encoded = tokenizer.encode(special_text)
+    assert len(special_encoded) > 0, f"Special chars should encode for {model_id}"
+
+    # Test unicode
+    unicode_text = "Hello 世界 🌍"
+    unicode_encoded = tokenizer.encode(unicode_text)
+    assert len(unicode_encoded) > 0, f"Unicode should encode for {model_id}"
+
+
+@pytest.mark.parametrize(
+    "short_id,model_card",
+    TEST_MODELS,
+    ids=[m[0] for m in TEST_MODELS],
+)
+@pytest.mark.asyncio
+async def test_tokenizer_has_required_attributes(
+    short_id: str, model_card: ModelCard
+) -> None:
+    """Test that tokenizer has required attributes for inference."""
+    model_id = str(model_card.model_id)
+
+    model_path = await download_tokenizer_files(model_id)
+
+    has_tokenizer = (
+        (model_path / "tokenizer.json").exists()
+        or (model_path / "tokenizer_config.json").exists()
+        or (model_path / "tiktoken.model").exists()
+        or (model_path / "tokenizer.model").exists()
+    )
+    if not has_tokenizer:
+        pytest.skip(f"Required tokenizer files not found for {model_id}")
+
+    tokenizer = load_tokenizer_for_model_id(model_id, model_path)
+    eos_token_ids = get_eos_token_ids_for_model(model_id)
+
+    # Check for vocabulary size
+    empty_vocab: dict[str, int] = {}
+    vocab_size: int = getattr(tokenizer, "vocab_size", None) or len(
+        getattr(tokenizer, "get_vocab", lambda: empty_vocab)()
+    )
+    assert vocab_size > 0, f"Tokenizer should have vocab_size > 0 for {model_id}"
+
+    # Check for EOS token (either from tokenizer or explicitly provided)
+    has_eos = (
+        eos_token_ids is not None
+        or getattr(tokenizer, "eos_token_id", None) is not None
+        or getattr(tokenizer, "eos_token", None) is not None
+    )
+    assert has_eos, f"Tokenizer should have EOS token for {model_id}"
+
+
+@pytest.mark.parametrize(
+    "short_id,model_card",
+    TEST_MODELS,
+    ids=[m[0] for m in TEST_MODELS],
+)
+@pytest.mark.asyncio
+async def test_tokenizer_special_tokens(short_id: str, model_card: ModelCard) -> None:
+    """Test that tokenizer can encode text containing special tokens.
+
+    This is critical because the actual inference path uses prompts with
+    special tokens from chat templates. If special tokens aren't handled
+    correctly, encoding will fail.
+    """
+    model_id = str(model_card.model_id)
+
+    model_path = await download_tokenizer_files(model_id)
+
+    has_tokenizer = (
+        (model_path / "tokenizer.json").exists()
+        or (model_path / "tokenizer_config.json").exists()
+        or (model_path / "tiktoken.model").exists()
+        or (model_path / "tokenizer.model").exists()
+    )
+    assert has_tokenizer, f"Required tokenizer files not found for {model_id}"
+
+    tokenizer = load_tokenizer_for_model_id(model_id, model_path)
+
+    # Get special tokens from the tokenizer
+    special_tokens: list[str] = []
+
+    # Try to get special tokens from various sources
+    if hasattr(tokenizer, "all_special_tokens"):
+        special_tokens.extend(tokenizer.all_special_tokens)
+    elif hasattr(tokenizer, "_tokenizer") and hasattr(
+        tokenizer._tokenizer,
+        "all_special_tokens",
+    ):
+        special_tokens.extend(tokenizer._tokenizer.all_special_tokens)
+
+    # Also check for common special token attributes
+    for attr in [
+        "bos_token",
+        "eos_token",
+        "pad_token",
+        "unk_token",
+        "sep_token",
+        "cls_token",
+    ]:
+        token = getattr(tokenizer, attr, None)
+        if token is None and hasattr(tokenizer, "_tokenizer"):
+            token = getattr(tokenizer._tokenizer, attr, None)
+        if token and isinstance(token, str) and token not in special_tokens:
+            special_tokens.append(token)
+
+    # If we found special tokens, test encoding text that contains them
+    if special_tokens:
+        # Create text with special tokens interspersed
+        test_with_special = f"{special_tokens[0]}Hello world"
+        if len(special_tokens) > 1:
+            test_with_special += f"{special_tokens[1]}"
+
+        encoded = tokenizer.encode(test_with_special)
+        assert isinstance(encoded, list), (
+            f"encode() with special tokens should return list for {model_id}"
+        )
+        assert len(encoded) > 0, (
+            f"encode() with special tokens should return non-empty list for {model_id}"
+        )
+        assert all(isinstance(t, int) for t in encoded), (
+            f"All tokens should be integers for {model_id}"
+        )
+
+        # Verify we can decode
+        decoded = tokenizer.decode(encoded)
+        assert isinstance(decoded, str), f"decode() should return string for {model_id}"
+
+    # Test with angle-bracket tokens (common format for special tokens)
+    # These should not raise errors even if they're not actual special tokens
+    angle_bracket_text = "<|test|>Hello<|end|>"
+    encoded = tokenizer.encode(angle_bracket_text)
+    assert isinstance(encoded, list), (
+        f"encode() with angle brackets should return list for {model_id}"
+    )
+    assert len(encoded) > 0, (
+        f"encode() with angle brackets should be non-empty for {model_id}"
+    )
+
+
+# Specifically test Kimi tokenizer since it has special handling
+@pytest.mark.asyncio
+async def test_kimi_tokenizer_specifically():
+    """Test Kimi tokenizer with its specific patches and quirks."""
+    kimi_models = [
+        (short_id, card)
+        for short_id, card in MODEL_CARDS.items()
+        if "kimi" in short_id.lower()
+    ]
+
+    if not kimi_models:
+        pytest.skip("No Kimi models found in MODEL_CARDS")
+
+    _, model_card = kimi_models[0]
+    model_id = str(model_card.model_id)
+
+    model_path = await download_tokenizer_files(model_id)
+
+    # Ensure the custom tokenizer file exists
+    if not (model_path / "tokenization_kimi.py").exists():
+        pytest.skip("tokenization_kimi.py not found")
+
+    tokenizer = load_tokenizer_for_model_id(model_id, model_path)
+    eos_token_ids = get_eos_token_ids_for_model(model_id)
+
+    # Test encode/decode cycle
+    test_text = "Hello, world!"
+    encoded = tokenizer.encode(test_text)
+    decoded = tokenizer.decode(encoded)
+
+    assert len(encoded) > 0, "Kimi tokenizer should encode text"
+    assert isinstance(decoded, str), "Kimi tokenizer should decode to string"
+
+    # Test that the patched encode works (returns list of ints)
+    assert all(isinstance(t, int) for t in encoded), "Tokens should be integers"
+
+    # Test encoding text with special tokens (like from chat templates)
+    # This is critical - the warmup inference uses prompts with special tokens
+    special_token_text = "<|im_user|>user<|im_middle|>Hello<|im_end|><|im_assistant|>"
+    special_encoded = tokenizer.encode(special_token_text)
+    assert len(special_encoded) > 0, "Kimi tokenizer should handle special tokens"
+    assert all(isinstance(t, int) for t in special_encoded), (
+        "Special token encoding should return integers"
+    )
+
+    # Verify EOS token is set
+    assert eos_token_ids == [163586], "Kimi EOS token should be [163586]"
+
+
+# Test GLM tokenizer since it also has special handling
+@pytest.mark.asyncio
+async def test_glm_tokenizer_specifically():
+    """Test GLM tokenizer with its specific EOS tokens."""
+    glm_models = [
+        (short_id, card)
+        for short_id, card in MODEL_CARDS.items()
+        if "glm" in short_id.lower()
+    ]
+
+    if not glm_models:
+        pytest.skip("No GLM models found in MODEL_CARDS")
+
+    _, model_card = glm_models[0]
+    model_id = str(model_card.model_id)
+
+    model_path = await download_tokenizer_files(model_id)
+
+    has_tokenizer = (model_path / "tokenizer.json").exists() or (
+        model_path / "tokenizer_config.json"
+    ).exists()
+    if not has_tokenizer:
+        pytest.skip("GLM tokenizer files not found")
+
+    tokenizer = load_tokenizer_for_model_id(model_id, model_path)
+    eos_token_ids = get_eos_token_ids_for_model(model_id)
+
+    # Test encode/decode
+    test_text = "Hello, world!"
+    encoded = tokenizer.encode(test_text)
+    decoded = tokenizer.decode(encoded)
+
+    assert len(encoded) > 0, "GLM tokenizer should encode text"
+    assert isinstance(decoded, str), "GLM tokenizer should decode to string"
+
+    # Verify EOS tokens
+    assert eos_token_ids == [
+        151336,
+        151329,
+        151338,
+    ], "GLM EOS tokens should be correct"
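For comparison, the single-wildcard matching done by `is_tokenizer_file` could probably also be expressed with the standard library's `fnmatch` module. This is only a sketch under that assumption: `PATTERNS` is a shortened, illustrative copy of `TOKENIZER_FILE_PATTERNS`, and `matches_tokenizer_pattern` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Shortened, illustrative subset of the pattern list used in the test module.
PATTERNS = ["tokenizer.json", "tokenizer_config.json", "tokenization_*.py"]


def matches_tokenizer_pattern(filename: str) -> bool:
    """Return True if filename matches any pattern (case-sensitive)."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)
```

Note that `fnmatch` also gives `?` and `[...]` special meaning, so this is only equivalent to the prefix/suffix check while the patterns contain nothing but `*`.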
--- a/src/exo/worker/tests/unittests/test_plan/test_download_and_loading.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_download_and_loading.py
@@ -1,5 +1,7 @@
 import exo.worker.plan as plan_mod
 from exo.shared.types.common import NodeId
+from exo.shared.types.memory import Memory
+from exo.shared.types.models import ModelId
 from exo.shared.types.tasks import LoadModel
 from exo.shared.types.worker.downloads import DownloadCompleted, DownloadProgress
 from exo.shared.types.worker.instances import BoundInstance
@@ -7,7 +9,6 @@ from exo.shared.types.worker.runners import (
    RunnerConnected,
    RunnerIdle,
 )
-from exo.shared.types.worker.shards import ShardMetadata
 from exo.worker.tests.constants import (
    INSTANCE_1_ID,
    MODEL_A_ID,
@@ -46,7 +47,7 @@ def test_plan_requests_download_when_waiting_and_shard_not_downloaded():
    all_runners = {RUNNER_1_ID: RunnerIdle()}

    # No entry for this shard -> should trigger DownloadModel
-    download_status: dict[ShardMetadata, DownloadProgress] = {}
+    download_status: dict[ModelId, DownloadProgress] = {}

    result = plan_mod.plan(
        node_id=NODE_A,
@@ -94,13 +95,23 @@ def test_plan_loads_model_when_all_shards_downloaded_and_waiting():

    # Local node has already marked its shard as downloaded (not actually used by _load_model)
    local_download_status = {
-        shard1: DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)  # type: ignore[reportUnhashable]
+        MODEL_A_ID: DownloadCompleted(
+            shard_metadata=shard1, node_id=NODE_A, total_bytes=Memory()
+        )
    }

    # Global view has completed downloads for both nodes
    global_download_status = {
-        NODE_A: [DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)],
-        NODE_B: [DownloadCompleted(shard_metadata=shard2, node_id=NODE_B)],
+        NODE_A: [
+            DownloadCompleted(
+                shard_metadata=shard1, node_id=NODE_A, total_bytes=Memory()
+            )
+        ],
+        NODE_B: [
+            DownloadCompleted(
+                shard_metadata=shard2, node_id=NODE_B, total_bytes=Memory()
+            )
+        ],
    }

    result = plan_mod.plan(
@@ -140,7 +151,9 @@ def test_plan_does_not_request_download_when_shard_already_downloaded():

    # Local status claims the shard is downloaded already
    local_download_status = {
-        shard: DownloadCompleted(shard_metadata=shard, node_id=NODE_A)  # type: ignore[reportUnhashable]
+        MODEL_A_ID: DownloadCompleted(
+            shard_metadata=shard, node_id=NODE_A, total_bytes=Memory()
+        )
    }

    # Global view hasn't caught up yet (no completed shards recorded for NODE_A)
@@ -192,10 +205,16 @@ def test_plan_does_not_load_model_until_all_shards_downloaded_globally():

    # Only NODE_A's shard is recorded as downloaded globally
    local_download_status = {
-        shard1: DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)  # type: ignore[reportUnhashable]
+        MODEL_A_ID: DownloadCompleted(
+            shard_metadata=shard1, node_id=NODE_A, total_bytes=Memory()
+        )
    }
    global_download_status = {
-        NODE_A: [DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)],
+        NODE_A: [
+            DownloadCompleted(
+                shard_metadata=shard1, node_id=NODE_A, total_bytes=Memory()
+            )
+        ],
        NODE_B: [],  # NODE_B has no downloads completed yet
    }

@@ -212,9 +231,15 @@ def test_plan_does_not_load_model_until_all_shards_downloaded_globally():
    assert result is None

    global_download_status = {
-        NODE_A: [DownloadCompleted(shard_metadata=shard1, node_id=NODE_A)],
+        NODE_A: [
+            DownloadCompleted(
+                shard_metadata=shard1, node_id=NODE_A, total_bytes=Memory()
+            )
+        ],
        NODE_B: [
-            DownloadCompleted(shard_metadata=shard2, node_id=NODE_B)
+            DownloadCompleted(
+                shard_metadata=shard2, node_id=NODE_B, total_bytes=Memory()
+            )
        ],  # NODE_B has no downloads completed yet
    }

--- a/src/exo/worker/tests/unittests/test_plan/test_warmup.py
+++ b/src/exo/worker/tests/unittests/test_plan/test_warmup.py
@@ -12,8 +12,10 @@ from exo.worker.tests.constants import (
    MODEL_A_ID,
    NODE_A,
    NODE_B,
+    NODE_C,
    RUNNER_1_ID,
    RUNNER_2_ID,
+    RUNNER_3_ID,
 )
 from exo.worker.tests.unittests.conftest import (
    FakeRunnerSupervisor,
@@ -24,37 +26,39 @@ from exo.worker.tests.unittests.conftest import (

 def test_plan_starts_warmup_for_accepting_rank_when_all_loaded_or_warming():
    """
-    For non-final device_rank shards, StartWarmup should be emitted when all
+    For non-zero device_rank shards, StartWarmup should be emitted when all
    shards in the instance are Loaded/WarmingUp.
    """
-    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=2)
-    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=2)
+    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=3)
+    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=3)
+    shard2 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=2, world_size=3)
    instance = get_mlx_ring_instance(
        instance_id=INSTANCE_1_ID,
        model_id=MODEL_A_ID,
-        node_to_runner={NODE_A: RUNNER_1_ID, NODE_B: RUNNER_2_ID},
-        runner_to_shard={RUNNER_1_ID: shard0, RUNNER_2_ID: shard1},
+        node_to_runner={NODE_A: RUNNER_1_ID, NODE_B: RUNNER_2_ID, NODE_C: RUNNER_3_ID},
+        runner_to_shard={RUNNER_1_ID: shard0, RUNNER_2_ID: shard1, RUNNER_3_ID: shard2},
    )

    bound_instance = BoundInstance(
-        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
+        instance=instance, bound_runner_id=RUNNER_2_ID, bound_node_id=NODE_B
    )
    local_runner = FakeRunnerSupervisor(
        bound_instance=bound_instance, status=RunnerLoaded()
    )

-    runners = {RUNNER_1_ID: local_runner}
+    runners = {RUNNER_2_ID: local_runner}
    instances = {INSTANCE_1_ID: instance}
    all_runners = {
        RUNNER_1_ID: RunnerLoaded(),
        RUNNER_2_ID: RunnerLoaded(),
+        RUNNER_3_ID: RunnerWarmingUp(),
    }

    result = plan_mod.plan(
-        node_id=NODE_A,
+        node_id=NODE_B,
        runners=runners,  # type: ignore
        download_status={},
-        global_download_status={NODE_B: []},
+        global_download_status={NODE_A: []},
        instances=instances,
        all_runners=all_runners,
        tasks={},
@@ -150,9 +154,9 @@ def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
    """
    Rank-zero shard should not start warmup until all non-zero ranks are
    already WarmingUp.
-    For accepting ranks (device_rank != world_size - 1), StartWarmup should be
+    For accepting ranks (device_rank != 0), StartWarmup should be
    emitted when all shards in the instance are Loaded/WarmingUp.
-    In a 2-node setup, rank 0 is the accepting rank.
+    In a 2-node setup, rank 1 is the accepting rank.
    """
    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=2)
    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=2)
@@ -163,7 +167,7 @@ def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
        runner_to_shard={RUNNER_1_ID: shard0, RUNNER_2_ID: shard1},
    )

-    # Rank 0 is the accepting rank
+    # Rank 1 is the accepting rank
    bound_instance = BoundInstance(
        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
    )
@@ -188,6 +192,23 @@ def test_plan_does_not_start_warmup_for_rank_zero_until_others_warming():
        tasks={},
    )

+    assert result is None
+
+    all_runners = {
+        RUNNER_1_ID: RunnerLoaded(),
+        RUNNER_2_ID: RunnerWarmingUp(),
+    }
+
+    result = plan_mod.plan(
+        node_id=NODE_A,
+        runners=runners,  # type: ignore
+        download_status={},
+        global_download_status={NODE_A: []},
+        instances=instances,
+        all_runners=all_runners,
+        tasks={},
+    )
+
    assert isinstance(result, StartWarmup)
    assert result.instance_id == INSTANCE_1_ID

@@ -280,9 +301,8 @@ def test_plan_does_not_start_warmup_for_accepting_rank_until_all_loaded_or_warmi

 def test_plan_does_not_start_warmup_for_connecting_rank_until_others_warming():
    """
-    Connecting rank (device_rank == world_size - 1) should not start warmup
+    Connecting rank (device_rank == 0) should not start warmup
    until all other ranks are already WarmingUp.
-    In a 2-node setup, rank 1 is the connecting rank.
    """
    shard0 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=0, world_size=2)
    shard1 = get_pipeline_shard_metadata(MODEL_A_ID, device_rank=1, world_size=2)
@@ -295,13 +315,13 @@ def test_plan_does_not_start_warmup_for_connecting_rank_until_others_warming():

    # Rank 1 is the connecting rank
    bound_instance = BoundInstance(
-        instance=instance, bound_runner_id=RUNNER_2_ID, bound_node_id=NODE_B
+        instance=instance, bound_runner_id=RUNNER_1_ID, bound_node_id=NODE_A
    )
    local_runner = FakeRunnerSupervisor(
        bound_instance=bound_instance, status=RunnerLoaded()
    )

-    runners = {RUNNER_2_ID: local_runner}
+    runners = {RUNNER_1_ID: local_runner}
    instances = {INSTANCE_1_ID: instance}
    all_runners = {
        RUNNER_1_ID: RunnerLoaded(),
@@ -309,7 +329,7 @@ def test_plan_does_not_start_warmup_for_connecting_rank_until_others_warming():
    }

    result = plan_mod.plan(
-        node_id=NODE_B,
+        node_id=NODE_A,
        runners=runners,  # type: ignore
        download_status={},
        global_download_status={NODE_A: [], NODE_B: []},
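The ordering rule these tests describe can be sketched as a small predicate. This is an illustrative reading of the docstrings, not the logic in `exo.worker.plan`: `Status`, `should_start_warmup`, and the rank-to-status mapping are hypothetical, and the single-shard case (rank 0 with no peers) is assumed to start immediately.

```python
from enum import Enum


class Status(Enum):
    LOADED = "loaded"
    WARMING_UP = "warming_up"
    OTHER = "other"


def should_start_warmup(rank: int, statuses: dict[int, Status]) -> bool:
    """statuses maps device_rank -> status for every shard in the instance."""
    # Only a shard that is loaded and not yet warming up can start.
    if statuses[rank] is not Status.LOADED:
        return False
    others = [s for r, s in statuses.items() if r != rank]
    if rank == 0:
        # Rank zero waits until every other rank is already warming up.
        return all(s is Status.WARMING_UP for s in others)
    # Non-zero ranks start once everyone is at least loaded.
    return all(s in (Status.LOADED, Status.WARMING_UP) for s in others)


# Three-shard case from the first test: rank 1 may start.
three_shard = should_start_warmup(
    1, {0: Status.LOADED, 1: Status.LOADED, 2: Status.WARMING_UP}
)
# Two-shard case: rank 0 must wait for rank 1 to begin warming up.
rank_zero_early = should_start_warmup(0, {0: Status.LOADED, 1: Status.LOADED})
rank_zero_ready = should_start_warmup(0, {0: Status.LOADED, 1: Status.WARMING_UP})
```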
--- a/src/exo/worker/tests/unittests/test_runner/test_event_ordering.py
+++ b/src/exo/worker/tests/unittests/test_runner/test_event_ordering.py
@@ -34,6 +34,7 @@ from exo.shared.types.worker.runners import (
    RunnerReady,
    RunnerRunning,
    RunnerShutdown,
+    RunnerShuttingDown,
    RunnerWarmingUp,
 )
 from exo.utils.channels import mp_channel
@@ -110,7 +111,7 @@ def assert_events_equal(test_events: Iterable[Event], true_events: Iterable[Even
 def patch_out_mlx(monkeypatch: pytest.MonkeyPatch):
    # initialize_mlx returns a "group" equal to 1
    monkeypatch.setattr(mlx_runner, "initialize_mlx", make_nothin(1))
-    monkeypatch.setattr(mlx_runner, "load_mlx_items", make_nothin((1, 1, 1)))
+    monkeypatch.setattr(mlx_runner, "load_mlx_items", make_nothin((1, 1)))
    monkeypatch.setattr(mlx_runner, "warmup_inference", make_nothin(1))
    monkeypatch.setattr(mlx_runner, "_check_for_debug_prompts", nothin)

@@ -199,6 +200,9 @@ def test_events_processed_in_correct_order(patch_out_mlx: pytest.MonkeyPatch):
            RunnerStatusUpdated(runner_id=RUNNER_1_ID, runner_status=RunnerReady()),
            TaskStatusUpdated(task_id=SHUTDOWN_TASK_ID, task_status=TaskStatus.Running),
            TaskAcknowledged(task_id=SHUTDOWN_TASK_ID),
+            RunnerStatusUpdated(
+                runner_id=RUNNER_1_ID, runner_status=RunnerShuttingDown()
+            ),
            TaskStatusUpdated(
                task_id=SHUTDOWN_TASK_ID, task_status=TaskStatus.Complete
            ),
--- a/src/exo/worker/utils/net_profile.py
+++ b/src/exo/worker/utils/net_profile.py
@@ -32,6 +32,8 @@ async def check_reachability(
            return NodeId(body) or None
        except OSError:
            return None
+        except http.client.HTTPException:
+            return None
        finally:
            connection.close()

--- a/tests/headless_runner.py
+++ b/tests/headless_runner.py
@@ -0,0 +1,268 @@
+import multiprocessing as mp
+import socket
+import time
+import typing
+
+import anyio
+from fastapi import FastAPI
+from fastapi.responses import StreamingResponse
+from hypercorn import Config
+from hypercorn.asyncio import serve  # pyright: ignore[reportUnknownVariableType]
+from loguru import logger
+from pydantic import BaseModel
+
+from exo.shared.logging import InterceptLogger, logger_setup
+from exo.shared.models.model_cards import MODEL_CARDS, ModelId
+from exo.shared.types.api import ChatCompletionMessage, ChatCompletionTaskParams
+from exo.shared.types.commands import CommandId
+from exo.shared.types.common import Host, NodeId
+from exo.shared.types.events import Event
+from exo.shared.types.tasks import (
+    ChatCompletion,
+    ConnectToGroup,
+    LoadModel,
+    Shutdown,
+    StartWarmup,
+    Task,
+)
+from exo.shared.types.worker.instances import (
+    BoundInstance,
+    Instance,
+    InstanceId,
+    MlxJacclInstance,
+    MlxRingInstance,
+)
+from exo.shared.types.worker.runners import RunnerId, ShardAssignments
+from exo.shared.types.worker.shards import PipelineShardMetadata, TensorShardMetadata
+from exo.utils.channels import MpReceiver, MpSender, mp_channel
+from exo.worker.download.impl_shard_downloader import (
+    build_full_shard,
+    exo_shard_downloader,
+)
+from exo.worker.runner.bootstrap import entrypoint
+
+
+class Tests(BaseModel):
+    # each entry is [hostname, IP address]
+    devs: list[list[str]]
+    model_id: str
+    kind: typing.Literal["init", "warmup", "inference"]
+
+
+mp.set_start_method("spawn", force=True)
+logger_setup(None)
+
+
+async def main():
+    logger.info("starting cool server majig")
+    await assert_downloads()
+    cfg = Config()
+    cfg.bind = "0.0.0.0:52415"
+    # nb: shared.logging needs updating if any of this changes
+    cfg.accesslog = "-"
+    cfg.errorlog = "-"
+    cfg.logger_class = InterceptLogger
+    app = FastAPI()
+    app.post("/ring")(ring_backend)
+    app.post("/jaccl")(jaccl_backend)
+    shutdown = anyio.Event()
+    await serve(
+        app,  # type: ignore
+        cfg,
+        shutdown_trigger=lambda: shutdown.wait(),
+    )
+    await anyio.sleep_forever()
+    # gracefully shutdown the api
+    shutdown.set()
+
+
+async def assert_downloads():
+    sd = exo_shard_downloader()
+    # await sd.ensure_shard(await build_full_shard(MODEL_CARDS["qwen3-0.6b"].model_id))
+    await sd.ensure_shard(
+        await build_full_shard(MODEL_CARDS["llama-3.1-8b-bf16"].model_id)
+    )
+    await sd.ensure_shard(await build_full_shard(MODEL_CARDS["qwen3-30b"].model_id))
+    await sd.ensure_shard(
+        await build_full_shard(MODEL_CARDS["gpt-oss-120b-MXFP4-Q8"].model_id)
+    )
+    await sd.ensure_shard(
+        await build_full_shard(MODEL_CARDS["gpt-oss-20b-4bit"].model_id)
+    )
+    await sd.ensure_shard(await build_full_shard(MODEL_CARDS["deepseek-v3.2"].model_id))
+    await sd.ensure_shard(
+        await build_full_shard(MODEL_CARDS["glm-4.7-8bit-gs32"].model_id)
+    )
+
+
+async def ring_backend(test: Tests):
+    iid = InstanceId(str(hash(str(test.devs))))
+    weird_hn = socket.gethostname()
+    for dev in test.devs:
+        if weird_hn.startswith(dev[0]) or dev[0].startswith(weird_hn):
+            hn = dev[0]
+            break
+    else:
+        raise ValueError(f"{weird_hn} not in {test.devs}")
+    return await execute_test(test, ring_instance(test, iid, hn), hn)
+
+
+def ring_instance(test: Tests, iid: InstanceId, hn: str) -> Instance:
+    hbn = [Host(ip="i dont care", port=52416) for _ in test.devs]
+    world_size = len(test.devs)
+    for i in range(world_size):
+        if test.devs[i][0] == hn:
+            hn = test.devs[i][0]
+            if i - 1 >= 0:
+                hbn[i - 1] = Host(ip=test.devs[i - 1][1], port=52416)
+            if i + 1 < len(test.devs):
+                hbn[i + 1] = Host(ip=test.devs[i + 1][1], port=52416)
+            hbn[i] = Host(ip="0.0.0.0", port=52416)
+            break
+    else:
+        raise ValueError(f"{hn} not in {test.devs}")
+
+    meta = MODEL_CARDS[test.model_id].metadata
+    instance = MlxRingInstance(
+        instance_id=iid,
+        ephemeral_port=52416,
+        hosts_by_node={NodeId(hn): hbn},
+        shard_assignments=ShardAssignments(
+            model_id=ModelId(test.model_id),
+            node_to_runner={NodeId(host[0]): RunnerId(host[0]) for host in test.devs},
+            runner_to_shard={
+                RunnerId(test.devs[i][0]): PipelineShardMetadata(
+                    model_meta=meta,
+                    device_rank=i,
+                    world_size=world_size,
+                    start_layer=(meta.n_layers // world_size) * i,
+                    end_layer=min(
+                        meta.n_layers, (meta.n_layers // world_size) * (i + 1)
+                    ),
+                    n_layers=min(meta.n_layers, (meta.n_layers // world_size) * (i + 1))
+                    - (meta.n_layers // world_size) * i,
+                )
+                for i in range(world_size)
+            },
+        ),
+    )
+
+    return instance
+
+
+async def execute_test(test: Tests, instance: Instance, hn: str):
+    world_size = len(test.devs)
+    iid = InstanceId(str(hash(str(test.devs))))
+    _handle, recv, send = new_runner(instance, hn)
+    if world_size > 1:
+        send.send(ConnectToGroup(instance_id=iid))
+    send.send(LoadModel(instance_id=iid))
+
+    match test.kind:
+        case "init":
+            pass
+        case "warmup":
+            send.send(StartWarmup(instance_id=iid))
+        case "inference":
+            send.send(StartWarmup(instance_id=iid))
+            send.send(
+                ChatCompletion(
+                    task_params=ChatCompletionTaskParams(
+                        model=test.model_id,
+                        messages=[
+                            ChatCompletionMessage(
+                                role="system", content="You are a helpful assistant"
+                            ),
+                            ChatCompletionMessage(
+                                role="user", content="What is the capital of France?"
+                            ),
+                        ],
+                    ),
+                    command_id=CommandId("yo"),
+                    instance_id=iid,
+                )
+            )
+
+    send.send(Shutdown(runner_id=RunnerId(hn), instance_id=iid))
+
+    async def map_recv():
+        with recv:
+            try:
+                async for item in recv:
+                    yield item.model_dump_json() + "\n"
+            except anyio.ClosedResourceError:
+                pass
+
+    ret = StreamingResponse(map_recv())
+    ret._pls_dont_gc = _handle  # type: ignore
+    return ret
+
+
+async def jaccl_backend(test: Tests):
+    iid = InstanceId(str(hash(str(test.devs))))
+    weird_hn = socket.gethostname()
+    for dev in test.devs:
+        if weird_hn.startswith(dev[0]) or dev[0].startswith(weird_hn):
+            hn = dev[0]
+            break
+    else:
+        raise ValueError(f"{weird_hn} not in {test.devs}")
+    return await execute_test(test, jaccl_instance(test, iid, hn), hn)
+
+
+def jaccl_instance(test: Tests, iid: InstanceId, hn: str):
+    meta = MODEL_CARDS[test.model_id].metadata
+    world_size = len(test.devs)
+
+    return MlxJacclInstance(
+        instance_id=iid,
+        ibv_devices=[[None, "rdma_en3"], ["rdma_en3", None]],
+        # rank 0 is always coordinator
+        jaccl_coordinators={
+            NodeId(host[0]): test.devs[0][1] + ":52416" for host in test.devs
+        },
+        shard_assignments=ShardAssignments(
+            model_id=ModelId(test.model_id),
+            node_to_runner={NodeId(host[0]): RunnerId(host[0]) for host in test.devs},
+            runner_to_shard={
+                RunnerId(test.devs[i][0]): TensorShardMetadata(
+                    model_meta=meta,
+                    device_rank=i,
+                    world_size=world_size,
+                    start_layer=0,
+                    end_layer=meta.n_layers,
+                    n_layers=meta.n_layers,
+                )
+                for i in range(world_size)
+            },
+        ),
+    )
+
+
+def new_runner(
+    instance: Instance,
+    hn: str,
+) -> tuple[mp.Process, MpReceiver[Event], MpSender[Task]]:
+    bound_instance = BoundInstance(
+        instance=instance, bound_runner_id=RunnerId(hn), bound_node_id=NodeId(hn)
+    )
+    ev_send, ev_recv = mp_channel[Event]()
+    task_send, task_recv = mp_channel[Task]()
+
+    runner_process = mp.Process(
+        target=entrypoint,
+        args=(
+            bound_instance,
+            ev_send,
+            task_recv,
+            logger,
+        ),
+    )
+    runner_process._pls_dont_gc = (ev_send, task_recv)  # type: ignore
+    runner_process.start()
+    time.sleep(0.1)
+    return (runner_process, ev_recv, task_send)
+
+
+if __name__ == "__main__":
+    anyio.run(main)
--- a/tests/start_distributed_test.sh
+++ b/tests/start_distributed_test.sh
@@ -0,0 +1,56 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+query() {
+  tailscale status | awk -v find="$1" '$2 == find { print $1 }'
+}
+
+if [[ $# -lt 2 ]]; then
+  echo "USAGE: $0 <test kind> <host1> [host2] ..."
+  exit 1
+fi
+
+
+kind=$1
+shift
+
+test_kinds="ring jaccl"
+
+if [[ " $test_kinds " != *" $kind "* ]]; then
+  printf "%s is not a known test kind.\nCurrent test kinds are %s\n" "$kind" "$test_kinds"
+  exit 1
+fi
+
+hostnames=("$@")
+weaved=()
+ips=()
+for name in "${hostnames[@]}"; do
+  ip=$(query "$name")
+  ips+=("$ip")
+  weaved+=("$name" "$ip")
+done
+
+devs_raw=$(printf "[\"%s\", \"%s\"], " "${weaved[@]}")
+devs="[${devs_raw%, }]"
+
+model_ids=("qwen3-30b" "gpt-oss-120b-MXFP4-Q8" "kimi-k2-thinking")
+
+for model_id in "${model_ids[@]}"; do
+  for i in "${!ips[@]}"; do
+    {
+      req="{
+        \"model_id\": \"${model_id}\",
+        \"devs\": ${devs},
+        \"kind\": \"inference\"
+       }"
+      echo "req $req"
+      curl -sN \
+        -X POST "http://${ips[$i]}:52415/${kind}" \
+        -H "Content-Type: application/json" -d "$req" \
+      2>&1 | sed "s/^/\n${hostnames[$i]}@${ips[$i]}: /" || { echo "curl to ${hostnames[$i]} failed"; exit 1; }
+    } &
+  done
+  wait
+done
+
--- a/tmp/disable_bridge_enable_dhcp.sh
+++ b/tmp/disable_bridge_enable_dhcp.sh
@@ -1,24 +0,0 @@
-#!/usr/bin/env bash
-set -euo pipefail
-
-networksetup -listallnetworkservices | grep -q '^Thunderbolt Bridge$' \
-  && echo "Disabling bridge in networksetup" \
-  && networksetup -setnetworkserviceenabled "Thunderbolt Bridge" off
-
-networksetup -listallnetworkservices | grep -q '^\*Thunderbolt Bridge$' \
-  && echo "Bridge disabled in networksetup"
-
-ifconfig bridge0 &>/dev/null && {
-  ifconfig bridge0 | grep -q 'member' && echo "Removing bridge members in ifconfig" && {
-    ifconfig bridge0 | \
-      awk '/member/ {print $2}' | \
-      xargs -n1 sudo ifconfig bridge0 deletem
-  }
-  ifconfig bridge0 | grep -q 'status: active' && sudo ifconfig bridge0 down
-  ifconfig bridge0 | grep -q 'status: inactive' && echo "Bridge disabled in ifconfig"
-}
-
-for iface in $(seq 2 7); do
-  sudo ipconfig set "en$iface" dhcp && echo "enabled dhcp on en$iface" || echo "failed to enable dhcp on en$iface"
-done
-
--- a/uv.lock
+++ b/uv.lock