switch to mlx-lm 30.5 instead of git main

we were waiting for mlx-lm 30.5 features - we can switch to the pypi version now that it is out this also removes some dead code in our pyproject.toml
improve log message in shard downloader
2026-01-30 16:51:02 -05:00 · 2026-01-30 11:19:01 +00:00 · 2026-01-30 10:35:01 +00:00 · 2026-01-30 10:23:08 +00:00 · 2026-01-29 19:14:40 +00:00 · 2026-01-29 17:24:32 +00:00
37 changed files with 2209 additions and 1488 deletions
--- a/.github/actions/typecheck/action.yml
+++ b/.github/actions/typecheck/action.yml
@@ -1,12 +0,0 @@
-name: Type Check
-
-description: "Run type checker"
-
-runs:
-  using: "composite"
-  steps:
-    - name: Run type checker
-      run: |
-        nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just sync
-        nix --extra-experimental-features nix-command --extra-experimental-features flakes develop -c just check
-      shell: bash
--- a/.github/workflows/pipeline.yml
+++ b/.github/workflows/pipeline.yml
@@ -26,73 +26,14 @@ jobs:
          name: exo
          authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"

-      - name: Configure git user
-        run: |
-          git config --local user.email "github-actions@users.noreply.github.com"
-          git config --local user.name  "github-actions bot"
-        shell: bash
+      - name: Load nix develop environment
+        run: nix run github:nicknovitski/nix-develop/v1

-      - name: Pull LFS files
-        run: |
-          echo "Pulling Git LFS files..."
-          git lfs pull
-        shell: bash
+      - name: Sync dependencies
+        run: uv sync --all-packages

-      - name: Setup Nix Environment
-        run: |
-          echo "Checking for nix installation..."
-          
-          # Check if nix binary exists directly
-          if [ -f /nix/var/nix/profiles/default/bin/nix ]; then
-            echo "Found nix binary at /nix/var/nix/profiles/default/bin/nix"
-            export PATH="/nix/var/nix/profiles/default/bin:$PATH"
-            echo "PATH=$PATH" >> $GITHUB_ENV
-            nix --version
-          elif [ -f /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh ]; then
-            echo "Found nix profile script, sourcing..."
-            source /nix/var/nix/profiles/default/etc/profile.d/nix-daemon.sh
-            nix --version
-          elif command -v nix >/dev/null 2>&1; then
-            echo "Nix already in PATH"
-            nix --version
-          else
-            echo "Nix not found. Debugging info:"
-            echo "Contents of /nix/var/nix/profiles/default/:"
-            ls -la /nix/var/nix/profiles/default/ 2>/dev/null || echo "Directory not found"
-            echo "Contents of /nix/var/nix/profiles/default/bin/:"
-            ls -la /nix/var/nix/profiles/default/bin/ 2>/dev/null || echo "Directory not found"
-            exit 1
-          fi
-        shell: bash
-
-      - name: Configure basedpyright include for local MLX
-        run: |
-          RUNNER_LABELS='${{ toJSON(runner.labels) }}'
-          if echo "$RUNNER_LABELS" | grep -q "local_mlx"; then
-            if [ -d "/Users/Shared/mlx" ]; then
-              echo "Updating [tool.basedpyright].include to use /Users/Shared/mlx"
-              awk '
-                BEGIN { in=0 }
-                /^\[tool\.basedpyright\]/ { in=1; print; next }
-                in && /^\[/ { in=0 }  # next section
-                in && /^[ \t]*include[ \t]*=/ {
-                  print "include = [\"/Users/Shared/mlx\"]"
-                  next
-                }
-                { print }
-              ' pyproject.toml > pyproject.toml.tmp && mv pyproject.toml.tmp pyproject.toml
-
-              echo "New [tool.basedpyright] section:"
-              sed -n '/^\[tool\.basedpyright\]/,/^\[/p' pyproject.toml | sed '$d' || true
-            else
-              echo "local_mlx tag present but /Users/Shared/mlx not found; leaving pyproject unchanged."
-            fi
-          else
-            echo "Runner does not have 'local_mlx' tag; leaving pyproject unchanged."
-          fi
-        shell: bash
-
-      - uses: ./.github/actions/typecheck
+      - name: Run type checker
+        run: uv run basedpyright --project pyproject.toml

  nix:
    name: Build and check (${{ matrix.system }})
@@ -123,6 +64,63 @@ jobs:
          name: exo
          authToken: "${{ secrets.CACHIX_AUTH_TOKEN }}"

+      - name: Build Metal packages (macOS only)
+        if: runner.os == 'macOS'
+        run: |
+          # Try to build metal-toolchain first (may succeed via cachix cache hit)
+          if nix build .#metal-toolchain 2>/dev/null; then
+            echo "metal-toolchain built successfully (likely cache hit)"
+          else
+            echo "metal-toolchain build failed, extracting from Xcode..."
+
+            NAR_HASH="sha256-ayR5mXN4sZAddwKEG2OszGRF93k9ZFc7H0yi2xbylQw="
+            NAR_NAME="metal-toolchain-17C48.nar"
+
+            # Use RUNNER_TEMP to avoid /tmp symlink issues on macOS
+            WORK_DIR="${RUNNER_TEMP}/metal-work"
+            mkdir -p "$WORK_DIR"
+
+            # Download the Metal toolchain component
+            xcodebuild -downloadComponent MetalToolchain
+
+            # Find and mount the DMG
+            DMG_PATH=$(find /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain -name '*.dmg' 2>/dev/null | head -1)
+            if [ -z "$DMG_PATH" ]; then
+              echo "Error: Could not find Metal toolchain DMG"
+              exit 1
+            fi
+
+            echo "Found DMG at: $DMG_PATH"
+            hdiutil attach "$DMG_PATH" -mountpoint "${WORK_DIR}/metal-dmg"
+
+            # Copy the toolchain
+            cp -R "${WORK_DIR}/metal-dmg/Metal.xctoolchain" "${WORK_DIR}/metal-export"
+            hdiutil detach "${WORK_DIR}/metal-dmg"
+
+            # Create NAR and add to store
+            nix nar pack "${WORK_DIR}/metal-export" > "${WORK_DIR}/${NAR_NAME}"
+            STORE_PATH=$(nix store add --mode flat "${WORK_DIR}/${NAR_NAME}")
+            echo "Added NAR to store: $STORE_PATH"
+
+            # Verify the hash matches
+            ACTUAL_HASH=$(nix hash file "${WORK_DIR}/${NAR_NAME}")
+            if [ "$ACTUAL_HASH" != "$NAR_HASH" ]; then
+              echo "Warning: NAR hash mismatch!"
+              echo "Expected: $NAR_HASH"
+              echo "Actual:   $ACTUAL_HASH"
+              echo "The metal-toolchain.nix may need updating"
+            fi
+
+            # Clean up
+            rm -rf "$WORK_DIR"
+
+            # Retry the build now that NAR is in store
+            nix build .#metal-toolchain
+          fi
+
+          # Build mlx (depends on metal-toolchain)
+          nix build .#mlx
+
      - name: Build all Nix outputs
        run: |
          nix flake show --json | jq -r '
@@ -134,3 +132,14 @@ jobs:

      - name: Run nix flake check
        run: nix flake check
+
+      - name: Run pytest (macOS only)
+        if: runner.os == 'macOS'
+        run: |
+          # Build the test environment (requires relaxed sandbox for uv2nix on macOS)
+          TEST_ENV=$(nix build '.#exo-test-env' --option sandbox relaxed --print-out-paths)
+
+          # Run pytest outside sandbox (needs GPU access for MLX)
+          export HOME="$RUNNER_TEMP"
+          export EXO_TESTS=1
+          $TEST_ENV/bin/python -m pytest src -m "not slow" --import-mode=importlib
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
  <img alt="exo logo" src="/docs/imgs/exo-logo-transparent.png" width="50%" height="50%">
 </picture>

-exo: Run your own AI cluster at home with everyday devices. Maintained by [exo labs](https://x.com/exolabs).
+exo: Run frontier AI locally. Maintained by [exo labs](https://x.com/exolabs).

 <p align="center">
  <a href="https://discord.gg/TJ4P57arEm" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Discord-Join%20Server-5865F2?logo=discord&logoColor=white" alt="Discord"></a>
@@ -107,6 +107,10 @@ uv run exo

 This starts the exo dashboard and API at http://localhost:52415/

+
+*Please view the section on RDMA to enable this feature on MacOS >=26.2!*
+
+
 ### Run from Source (Linux)

 **Prerequisites:**
@@ -230,7 +234,7 @@ This removes:

 RDMA is a new capability added to macOS 26.2. It works on any Mac with Thunderbolt 5 (M4 Pro Mac Mini, M4 Max Mac Studio, M4 Max MacBook Pro, M3 Ultra Mac Studio).

-Note that on Mac Studio, you cannot use the Thunderbolt 5 port next to the Ethernet port.
+Please refer to the caveats for immediate troubleshooting.

 To enable RDMA on macOS, follow these steps:

@@ -247,6 +251,14 @@ To enable RDMA on macOS, follow these steps:

 After that, RDMA will be enabled in macOS and exo will take care of the rest.

+**Important Caveats**
+
+1. Devices that wish to be part of an RDMA cluster must be connected to all other devices in the cluster.
+2. The cables must support TB5.
+3. On a Mac Studio, you cannot use the Thunderbolt 5 port next to the Ethernet port.
+4. If running from source, please use the script found at `tmp/set_rdma_network_config.sh`, which will disable Thunderbolt Bridge and set dhcp on each RDMA port.
+5. RDMA ports may be unable to discover each other on different versions of MacOS. Please ensure that OS versions match exactly (even beta version numbers) on all devices.
+
 ---

 ### Using the API
--- a/app/EXO/EXO.xcodeproj/project.pbxproj
+++ b/app/EXO/EXO.xcodeproj/project.pbxproj
@@ -342,6 +342,8 @@
 				SDKROOT = macosx;
 				SWIFT_ACTIVE_COMPILATION_CONDITIONS = "DEBUG $(inherited)";
 				SWIFT_OPTIMIZATION_LEVEL = "-Onone";
+				SWIFT_TREAT_WARNINGS_AS_ERRORS = YES;
+				GCC_TREAT_WARNINGS_AS_ERRORS = YES;
 			};
 			name = Debug;
 		};
@@ -397,6 +399,8 @@
 				MTL_FAST_MATH = YES;
 				SDKROOT = macosx;
 				SWIFT_COMPILATION_MODE = wholemodule;
+				SWIFT_TREAT_WARNINGS_AS_ERRORS = YES;
+				GCC_TREAT_WARNINGS_AS_ERRORS = YES;
 			};
 			name = Release;
 		};
--- a/app/EXO/EXO/EXOApp.swift
+++ b/app/EXO/EXO/EXOApp.swift
@@ -225,7 +225,7 @@ private final class ExoUpdaterDelegate: NSObject, SPUUpdaterDelegate {
        }
    }

-    private func showNotification(title: String, body: String) {
+    nonisolated private func showNotification(title: String, body: String) {
        let center = UNUserNotificationCenter.current()
        let content = UNMutableNotificationContent()
        content.title = title
--- a/app/EXO/EXO/Services/NetworkSetupHelper.swift
+++ b/app/EXO/EXO/Services/NetworkSetupHelper.swift
@@ -18,6 +18,9 @@ enum NetworkSetupHelper {

        set -euo pipefail

+        # Wait for macOS to finish network setup after boot
+        sleep 20
+
        PREFS="/Library/Preferences/SystemConfiguration/preferences.plist"

        # Remove bridge0 interface
@@ -80,7 +83,7 @@ enum NetworkSetupHelper {
                let alert = NSAlert()
                alert.messageText = "EXO Network Configuration"
                alert.informativeText =
-                    "EXO needs to install a system service to automatically disable Thunderbolt Bridge on startup. This prevents network loops when connecting multiple Macs via Thunderbolt.\n\nYou will be prompted for your administrator password."
+                    "EXO needs to install a system service to configure local networking. This will disable Thunderbolt Bridge (preventing packet storms) and install a Network Location.\n\nYou will be prompted for your password."
                alert.alertStyle = .informational
                alert.addButton(withTitle: "Install")
                alert.addButton(withTitle: "Not Now")
@@ -241,11 +244,11 @@ enum NetworkSetupHelper {
        rm -f "$LOG_OUT" "$LOG_ERR"

        # Switch back to Automatic network location
-        networksetup -switchtolocation Automatic 2>/dev/null || true
+        networksetup -switchtolocation Automatic >/dev/null 2>&1 || true

        # Delete the exo network location if it exists
-        networksetup -listlocations | grep -q '^exo$' && {
-          networksetup -deletelocation exo 2>/dev/null || true
+        networksetup -listlocations 2>/dev/null | grep -q '^exo$' && {
+          networksetup -deletelocation exo >/dev/null 2>&1 || true
        } || true

        # Re-enable any Thunderbolt Bridge service if it exists
@@ -255,12 +258,12 @@ enum NetworkSetupHelper {
          tb_devices=$(networksetup -listallhardwareports 2>/dev/null | awk '
            /^Hardware Port:/ { port = tolower(substr($0, 16)) }
            /^Device:/ { if (port ~ /thunderbolt/) print substr($0, 9) }
-          ')
+          ') || true
          [ -z "$tb_devices" ] && return 0

          # For each bridge device, check if it contains Thunderbolt interfaces
          for bridge in bridge0 bridge1 bridge2; do
-            members=$(ifconfig "$bridge" 2>/dev/null | awk '/member:/ {print $2}')
+            members=$(ifconfig "$bridge" 2>/dev/null | awk '/member:/ {print $2}') || true
            [ -z "$members" ] && continue

            for tb_dev in $tb_devices; do
@@ -269,7 +272,7 @@ enum NetworkSetupHelper {
                service_name=$(networksetup -listnetworkserviceorder 2>/dev/null | awk -v dev="$bridge" '
                  /^\\([0-9*]/ { gsub(/^\\([0-9*]+\\) /, ""); svc = $0 }
                  /Device:/ && $0 ~ dev { print svc; exit }
-                ')
+                ') || true
                if [ -n "$service_name" ]; then
                  networksetup -setnetworkserviceenabled "$service_name" on 2>/dev/null || true
                  return 0
@@ -277,8 +280,9 @@ enum NetworkSetupHelper {
              fi
            done
          done
+          return 0
        }
-        find_and_enable_thunderbolt_bridge
+        find_and_enable_thunderbolt_bridge || true

        echo "EXO network components removed successfully"
        """
--- a/app/EXO/EXO/Services/ThunderboltBridgeService.swift
+++ b/app/EXO/EXO/Services/ThunderboltBridgeService.swift
@@ -127,21 +127,24 @@ final class ThunderboltBridgeService: ObservableObject {

        // 2. Request specific network configuration rights
        let rightName = "system.services.systemconfiguration.network"
-        var item = AuthorizationItem(
-            name: rightName,
-            valueLength: 0,
-            value: nil,
-            flags: 0
-        )
-        var rights = AuthorizationRights(count: 1, items: &item)
-
-        status = AuthorizationCopyRights(
-            authRef,
-            &rights,
-            nil,
-            [.extendRights, .interactionAllowed],
-            nil
-        )
+        status = rightName.withCString { nameCString in
+            var item = AuthorizationItem(
+                name: nameCString,
+                valueLength: 0,
+                value: nil,
+                flags: 0
+            )
+            return withUnsafeMutablePointer(to: &item) { itemPointer in
+                var rights = AuthorizationRights(count: 1, items: itemPointer)
+                return AuthorizationCopyRights(
+                    authRef,
+                    &rights,
+                    nil,
+                    [.extendRights, .interactionAllowed],
+                    nil
+                )
+            }
+        }
        guard status == errAuthorizationSuccess else {
            if status == errAuthorizationCanceled {
                throw ThunderboltBridgeError.authorizationCanceled
--- a/app/EXO/uninstall-exo.sh
+++ b/app/EXO/uninstall-exo.sh
@@ -29,21 +29,21 @@ YELLOW='\033[1;33m'
 NC='\033[0m' # No Color

 echo_info() {
-    echo -e "${GREEN}[INFO]${NC} $1"
+  echo -e "${GREEN}[INFO]${NC} $1"
 }

 echo_warn() {
-    echo -e "${YELLOW}[WARN]${NC} $1"
+  echo -e "${YELLOW}[WARN]${NC} $1"
 }

 echo_error() {
-    echo -e "${RED}[ERROR]${NC} $1"
+  echo -e "${RED}[ERROR]${NC} $1"
 }

 # Check if running as root
 if [[ $EUID -ne 0 ]]; then
-    echo_error "This script must be run as root (use sudo)"
-    exit 1
+  echo_error "This script must be run as root (use sudo)"
+  exit 1
 fi

 echo ""
@@ -55,64 +55,64 @@ echo ""
 # Unload the LaunchDaemon if running
 echo_info "Stopping network setup daemon..."
 if launchctl list | grep -q "$LABEL"; then
-    launchctl bootout system/"$LABEL" 2>/dev/null || true
-    echo_info "Daemon stopped"
+  launchctl bootout system/"$LABEL" 2>/dev/null || true
+  echo_info "Daemon stopped"
 else
-    echo_warn "Daemon was not running"
+  echo_warn "Daemon was not running"
 fi

 # Remove LaunchDaemon plist
-if [[ -f "$PLIST_DEST" ]]; then
-    rm -f "$PLIST_DEST"
-    echo_info "Removed LaunchDaemon plist"
+if [[ -f $PLIST_DEST ]]; then
+  rm -f "$PLIST_DEST"
+  echo_info "Removed LaunchDaemon plist"
 else
-    echo_warn "LaunchDaemon plist not found (already removed?)"
+  echo_warn "LaunchDaemon plist not found (already removed?)"
 fi

 # Remove the script and parent directory
-if [[ -f "$SCRIPT_DEST" ]]; then
-    rm -f "$SCRIPT_DEST"
-    echo_info "Removed network setup script"
+if [[ -f $SCRIPT_DEST ]]; then
+  rm -f "$SCRIPT_DEST"
+  echo_info "Removed network setup script"
 else
-    echo_warn "Network setup script not found (already removed?)"
+  echo_warn "Network setup script not found (already removed?)"
 fi

 # Remove EXO directory if empty
 if [[ -d "/Library/Application Support/EXO" ]]; then
-    rmdir "/Library/Application Support/EXO" 2>/dev/null && \
-        echo_info "Removed EXO support directory" || \
-        echo_warn "EXO support directory not empty, leaving in place"
+  rmdir "/Library/Application Support/EXO" 2>/dev/null &&
+    echo_info "Removed EXO support directory" ||
+    echo_warn "EXO support directory not empty, leaving in place"
 fi

 # Remove log files
-if [[ -f "$LOG_OUT" ]] || [[ -f "$LOG_ERR" ]]; then
-    rm -f "$LOG_OUT" "$LOG_ERR"
-    echo_info "Removed log files"
+if [[ -f $LOG_OUT ]] || [[ -f $LOG_ERR ]]; then
+  rm -f "$LOG_OUT" "$LOG_ERR"
+  echo_info "Removed log files"
 else
-    echo_warn "Log files not found (already removed?)"
+  echo_warn "Log files not found (already removed?)"
 fi

 # Switch back to Automatic network location
 echo_info "Restoring network configuration..."
 if networksetup -listlocations | grep -q "^Automatic$"; then
-    networksetup -switchtolocation Automatic 2>/dev/null || true
-    echo_info "Switched to Automatic network location"
+  networksetup -switchtolocation Automatic 2>/dev/null || true
+  echo_info "Switched to Automatic network location"
 else
-    echo_warn "Automatic network location not found"
+  echo_warn "Automatic network location not found"
 fi

 # Delete the exo network location if it exists
 if networksetup -listlocations | grep -q "^exo$"; then
-    networksetup -deletelocation exo 2>/dev/null || true
-    echo_info "Deleted 'exo' network location"
+  networksetup -deletelocation exo 2>/dev/null || true
+  echo_info "Deleted 'exo' network location"
 else
-    echo_warn "'exo' network location not found (already removed?)"
+  echo_warn "'exo' network location not found (already removed?)"
 fi

 # Re-enable Thunderbolt Bridge if it exists
 if networksetup -listnetworkservices 2>/dev/null | grep -q "Thunderbolt Bridge"; then
-    networksetup -setnetworkserviceenabled "Thunderbolt Bridge" on 2>/dev/null || true
-    echo_info "Re-enabled Thunderbolt Bridge"
+  networksetup -setnetworkserviceenabled "Thunderbolt Bridge" on 2>/dev/null || true
+  echo_info "Re-enabled Thunderbolt Bridge"
 fi

 # Note about launch at login registration
@@ -124,14 +124,14 @@ echo_warn "  System Settings → General → Login Items → Remove EXO"
 # Check if EXO.app exists in common locations
 APP_FOUND=false
 for app_path in "/Applications/EXO.app" "$HOME/Applications/EXO.app"; do
-    if [[ -d "$app_path" ]]; then
-        if [[ "$APP_FOUND" == false ]]; then
-            echo ""
-            APP_FOUND=true
-        fi
-        echo_warn "EXO.app found at: $app_path"
-        echo_warn "You may want to move it to Trash manually."
+  if [[ -d $app_path ]]; then
+    if [[ $APP_FOUND == false ]]; then
+      echo ""
+      APP_FOUND=true
    fi
+    echo_warn "EXO.app found at: $app_path"
+    echo_warn "You may want to move it to Trash manually."
+  fi
 done

 echo ""
@@ -151,4 +151,3 @@ echo ""
 echo "Manual step required:"
 echo "  Remove EXO from Login Items in System Settings → General → Login Items"
 echo ""
-
--- a/dashboard/package-lock.json
+++ b/dashboard/package-lock.json
@@ -865,7 +865,6 @@
 			"integrity": "sha512-oH8tXw7EZnie8FdOWYrF7Yn4IKrqTFHhXvl8YxXxbKwTMcD/5NNCryUSEXRk2ZR4ojnub0P8rNrsVGHXWqIDtA==",
 			"dev": true,
 			"license": "MIT",
-			"peer": true,
 			"dependencies": {
 				"@standard-schema/spec": "^1.0.0",
 				"@sveltejs/acorn-typescript": "^1.0.5",
@@ -905,7 +904,6 @@
 			"integrity": "sha512-Y1Cs7hhTc+a5E9Va/xwKlAJoariQyHY+5zBgCZg4PFWNYQ1nMN9sjK1zhw1gK69DuqVP++sht/1GZg1aRwmAXQ==",
 			"dev": true,
 			"license": "MIT",
-			"peer": true,
 			"dependencies": {
 				"@sveltejs/vite-plugin-svelte-inspector": "^4.0.1",
 				"debug": "^4.4.1",
@@ -1522,7 +1520,6 @@
 			"integrity": "sha512-LCCV0HdSZZZb34qifBsyWlUmok6W7ouER+oQIGBScS8EsZsQbrtFTUrDX4hOl+CS6p7cnNC4td+qrSVGSCTUfQ==",
 			"dev": true,
 			"license": "MIT",
-			"peer": true,
 			"dependencies": {
 				"undici-types": "~6.21.0"
 			}
@@ -1532,7 +1529,6 @@
 			"resolved": "https://registry.npmjs.org/acorn/-/acorn-8.15.0.tgz",
 			"integrity": "sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==",
 			"license": "MIT",
-			"peer": true,
 			"bin": {
 				"acorn": "bin/acorn"
 			},
@@ -1945,7 +1941,6 @@
 			"integrity": "sha512-fmTRWbNMmsmWq6xJV8D19U/gw/bwrHfNXxrIN+HfZgnzqTHp9jOmKMhsTUjXOJnZOdZY9Q28y4yebKzqDKlxlQ==",
 			"dev": true,
 			"license": "ISC",
-			"peer": true,
 			"engines": {
 				"node": ">=12"
 			}
@@ -2653,7 +2648,6 @@
 			"integrity": "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q==",
 			"dev": true,
 			"license": "MIT",
-			"peer": true,
 			"engines": {
 				"node": ">=12"
 			},
@@ -2696,7 +2690,6 @@
 			"integrity": "sha512-UOnG6LftzbdaHZcKoPFtOcCKztrQ57WkHDeRD9t/PTQtmT0NHSeWWepj6pS0z/N7+08BHFDQVUrfmfMRcZwbMg==",
 			"dev": true,
 			"license": "MIT",
-			"peer": true,
 			"bin": {
 				"prettier": "bin/prettier.cjs"
 			},
@@ -2869,7 +2862,6 @@
 			"resolved": "https://registry.npmjs.org/svelte/-/svelte-5.45.3.tgz",
 			"integrity": "sha512-ngKXNhNvwPzF43QqEhDOue7TQTrG09em1sd4HBxVF0Wr2gopAmdEWan+rgbdgK4fhBtSOTJO8bYU4chUG7VXZQ==",
 			"license": "MIT",
-			"peer": true,
 			"dependencies": {
 				"@jridgewell/remapping": "^2.3.4",
 				"@jridgewell/sourcemap-codec": "^1.5.0",
@@ -3014,7 +3006,6 @@
 			"integrity": "sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==",
 			"dev": true,
 			"license": "Apache-2.0",
-			"peer": true,
 			"bin": {
 				"tsc": "bin/tsc",
 				"tsserver": "bin/tsserver"
@@ -3036,7 +3027,6 @@
 			"integrity": "sha512-+Oxm7q9hDoLMyJOYfUYBuHQo+dkAloi33apOPP56pzj+vsdJDzr+j1NISE5pyaAuKL4A3UD34qd0lx5+kfKp2g==",
 			"dev": true,
 			"license": "MIT",
-			"peer": true,
 			"dependencies": {
 				"esbuild": "^0.25.0",
 				"fdir": "^6.4.4",
--- a/dashboard/src/lib/stores/app.svelte.ts
+++ b/dashboard/src/lib/stores/app.svelte.ts
@@ -173,6 +173,11 @@ export interface PlacementPreviewResponse {
  previews: PlacementPreview[];
 }

+interface ImageApiResponse {
+  created: number;
+  data: Array<{ b64_json?: string; url?: string }>;
+}
+
 interface RawStateResponse {
  topology?: RawTopology;
  instances?: Record<
@@ -2095,107 +2100,137 @@ class AppStore {
        throw new Error(`API error: ${response.status} - ${errorText}`);
      }

-      const reader = response.body?.getReader();
-      if (!reader) {
-        throw new Error("No response body");
-      }
+      // Streaming requires both stream=true AND partialImages > 0
+      const isStreaming = params.stream && params.partialImages > 0;

-      interface ImageGenerationChunk {
-        data?: { b64_json?: string };
-        format?: string;
-        type?: "partial" | "final";
-        image_index?: number;
-        partial_index?: number;
-        total_partials?: number;
-      }
+      if (!isStreaming) {
+        // Non-streaming: parse JSON response directly
+        const jsonResponse = (await response.json()) as ImageApiResponse;
+        const format = params.outputFormat || "png";
+        const mimeType = `image/${format}`;

-      const numImages = params.numImages;
+        const attachments: MessageAttachment[] = jsonResponse.data
+          .filter((img) => img.b64_json)
+          .map((img, index) => ({
+            type: "generated-image" as const,
+            name: `generated-image-${index + 1}.${format}`,
+            preview: `data:${mimeType};base64,${img.b64_json}`,
+            mimeType,
+          }));

-      await this.parseSSEStream<ImageGenerationChunk>(
-        reader,
-        targetConversationId,
-        (parsed) => {
-          const imageData = parsed.data?.b64_json;
+        this.updateConversationMessage(
+          targetConversationId,
+          assistantMessage.id,
+          (msg) => {
+            msg.content = "";
+            msg.attachments = attachments;
+          },
+        );
+        this.syncActiveMessagesIfNeeded(targetConversationId);
+      } else {
+        // Streaming mode: use SSE parser
+        const reader = response.body?.getReader();
+        if (!reader) {
+          throw new Error("No response body");
+        }

-          if (imageData) {
-            const format = parsed.format || "png";
-            const mimeType = `image/${format}`;
-            const imageIndex = parsed.image_index ?? 0;
+        interface ImageGenerationChunk {
+          data?: { b64_json?: string };
+          format?: string;
+          type?: "partial" | "final";
+          image_index?: number;
+          partial_index?: number;
+          total_partials?: number;
+        }

-            if (parsed.type === "partial") {
-              // Update with partial image and progress
-              const partialNum = (parsed.partial_index ?? 0) + 1;
-              const totalPartials = parsed.total_partials ?? 3;
-              const progressText =
-                numImages > 1
-                  ? `Generating image ${imageIndex + 1}/${numImages}... ${partialNum}/${totalPartials}`
-                  : `Generating... ${partialNum}/${totalPartials}`;
+        const numImages = params.numImages;

-              const partialAttachment: MessageAttachment = {
-                type: "generated-image",
-                name: `generated-image.${format}`,
-                preview: `data:${mimeType};base64,${imageData}`,
-                mimeType,
-              };
+        await this.parseSSEStream<ImageGenerationChunk>(
+          reader,
+          targetConversationId,
+          (parsed) => {
+            const imageData = parsed.data?.b64_json;

-              this.updateConversationMessage(
-                targetConversationId,
-                assistantMessage.id,
-                (msg) => {
-                  msg.content = progressText;
-                  if (imageIndex === 0) {
-                    // First image - safe to replace attachments with partial preview
-                    msg.attachments = [partialAttachment];
-                  } else {
-                    // Subsequent images - keep existing finals, show partial at current position
-                    const existingAttachments = msg.attachments || [];
-                    // Keep only the completed final images (up to current imageIndex)
-                    const finals = existingAttachments.slice(0, imageIndex);
-                    msg.attachments = [...finals, partialAttachment];
-                  }
-                },
-              );
-            } else if (parsed.type === "final") {
-              // Final image - replace partial at this position
-              const newAttachment: MessageAttachment = {
-                type: "generated-image",
-                name: `generated-image-${imageIndex + 1}.${format}`,
-                preview: `data:${mimeType};base64,${imageData}`,
-                mimeType,
-              };
+            if (imageData) {
+              const format = parsed.format || "png";
+              const mimeType = `image/${format}`;
+              const imageIndex = parsed.image_index ?? 0;

-              this.updateConversationMessage(
-                targetConversationId,
-                assistantMessage.id,
-                (msg) => {
-                  if (imageIndex === 0) {
-                    // First final image - replace any partial preview
-                    msg.attachments = [newAttachment];
-                  } else {
-                    // Subsequent images - keep previous finals, replace partial at current position
-                    const existingAttachments = msg.attachments || [];
-                    // Slice keeps indices 0 to imageIndex-1 (the previous final images)
-                    const previousFinals = existingAttachments.slice(
-                      0,
-                      imageIndex,
-                    );
-                    msg.attachments = [...previousFinals, newAttachment];
-                  }
+              if (parsed.type === "partial") {
+                // Update with partial image and progress
+                const partialNum = (parsed.partial_index ?? 0) + 1;
+                const totalPartials = parsed.total_partials ?? 3;
+                const progressText =
+                  numImages > 1
+                    ? `Generating image ${imageIndex + 1}/${numImages}... ${partialNum}/${totalPartials}`
+                    : `Generating... ${partialNum}/${totalPartials}`;

-                  // Update progress message for multiple images
-                  if (numImages > 1 && imageIndex < numImages - 1) {
-                    msg.content = `Generating image ${imageIndex + 2}/${numImages}...`;
-                  } else {
-                    msg.content = "";
-                  }
-                },
-              );
+                const partialAttachment: MessageAttachment = {
+                  type: "generated-image",
+                  name: `generated-image.${format}`,
+                  preview: `data:${mimeType};base64,${imageData}`,
+                  mimeType,
+                };
+
+                this.updateConversationMessage(
+                  targetConversationId,
+                  assistantMessage.id,
+                  (msg) => {
+                    msg.content = progressText;
+                    if (imageIndex === 0) {
+                      // First image - safe to replace attachments with partial preview
+                      msg.attachments = [partialAttachment];
+                    } else {
+                      // Subsequent images - keep existing finals, show partial at current position
+                      const existingAttachments = msg.attachments || [];
+                      // Keep only the completed final images (up to current imageIndex)
+                      const finals = existingAttachments.slice(0, imageIndex);
+                      msg.attachments = [...finals, partialAttachment];
+                    }
+                  },
+                );
+              } else if (parsed.type === "final") {
+                // Final image - replace partial at this position
+                const newAttachment: MessageAttachment = {
+                  type: "generated-image",
+                  name: `generated-image-${imageIndex + 1}.${format}`,
+                  preview: `data:${mimeType};base64,${imageData}`,
+                  mimeType,
+                };
+
+                this.updateConversationMessage(
+                  targetConversationId,
+                  assistantMessage.id,
+                  (msg) => {
+                    if (imageIndex === 0) {
+                      // First final image - replace any partial preview
+                      msg.attachments = [newAttachment];
+                    } else {
+                      // Subsequent images - keep previous finals, replace partial at current position
+                      const existingAttachments = msg.attachments || [];
+                      // Slice keeps indices 0 to imageIndex-1 (the previous final images)
+                      const previousFinals = existingAttachments.slice(
+                        0,
+                        imageIndex,
+                      );
+                      msg.attachments = [...previousFinals, newAttachment];
+                    }
+
+                    // Update progress message for multiple images
+                    if (numImages > 1 && imageIndex < numImages - 1) {
+                      msg.content = `Generating image ${imageIndex + 2}/${numImages}...`;
+                    } else {
+                      msg.content = "";
+                    }
+                  },
+                );
+              }
+
+              this.syncActiveMessagesIfNeeded(targetConversationId);
            }
-
-            this.syncActiveMessagesIfNeeded(targetConversationId);
-          }
-        },
-      );
+          },
+        );
+      }
    } catch (error) {
      console.error("Error generating image:", error);
      this.handleStreamingError(
@@ -2343,69 +2378,98 @@ class AppStore {
        throw new Error(`API error: ${apiResponse.status} - ${errorText}`);
      }

-      const reader = apiResponse.body?.getReader();
-      if (!reader) {
-        throw new Error("No response body");
-      }
+      // Streaming requires both stream=true AND partialImages > 0
+      const isStreaming = params.stream && params.partialImages > 0;

-      interface ImageEditChunk {
-        data?: { b64_json?: string };
-        format?: string;
-        type?: "partial" | "final";
-        partial_index?: number;
-        total_partials?: number;
-      }
+      if (!isStreaming) {
+        // Non-streaming: parse JSON response directly
+        const jsonResponse = (await apiResponse.json()) as ImageApiResponse;
+        const format = params.outputFormat || "png";
+        const mimeType = `image/${format}`;
+        const attachments: MessageAttachment[] = jsonResponse.data
+          .filter((img) => img.b64_json)
+          .map((img) => ({
+            type: "generated-image" as const,
+            name: `edited-image.${format}`,
+            preview: `data:${mimeType};base64,${img.b64_json}`,
+            mimeType,
+          }));

-      await this.parseSSEStream<ImageEditChunk>(
-        reader,
-        targetConversationId,
-        (parsed) => {
-          const imageData = parsed.data?.b64_json;
+        this.updateConversationMessage(
+          targetConversationId,
+          assistantMessage.id,
+          (msg) => {
+            msg.content = "";
+            msg.attachments = attachments;
+          },
+        );
+        this.syncActiveMessagesIfNeeded(targetConversationId);
+      } else {
+        // Streaming mode: use SSE parser
+        const reader = apiResponse.body?.getReader();
+        if (!reader) {
+          throw new Error("No response body");
+        }

-          if (imageData) {
-            const format = parsed.format || "png";
-            const mimeType = `image/${format}`;
-            if (parsed.type === "partial") {
-              // Update with partial image and progress
-              const partialNum = (parsed.partial_index ?? 0) + 1;
-              const totalPartials = parsed.total_partials ?? 3;
-              this.updateConversationMessage(
-                targetConversationId,
-                assistantMessage.id,
-                (msg) => {
-                  msg.content = `Editing... ${partialNum}/${totalPartials}`;
-                  msg.attachments = [
-                    {
-                      type: "generated-image",
-                      name: `edited-image.${format}`,
-                      preview: `data:${mimeType};base64,${imageData}`,
-                      mimeType,
-                    },
-                  ];
-                },
-              );
-            } else if (parsed.type === "final") {
-              // Final image
-              this.updateConversationMessage(
-                targetConversationId,
-                assistantMessage.id,
-                (msg) => {
-                  msg.content = "";
-                  msg.attachments = [
-                    {
-                      type: "generated-image",
-                      name: `edited-image.${format}`,
-                      preview: `data:${mimeType};base64,${imageData}`,
-                      mimeType,
-                    },
-                  ];
-                },
-              );
+        interface ImageEditChunk {
+          data?: { b64_json?: string };
+          format?: string;
+          type?: "partial" | "final";
+          partial_index?: number;
+          total_partials?: number;
+        }
+
+        await this.parseSSEStream<ImageEditChunk>(
+          reader,
+          targetConversationId,
+          (parsed) => {
+            const imageData = parsed.data?.b64_json;
+
+            if (imageData) {
+              const format = parsed.format || "png";
+              const mimeType = `image/${format}`;
+              if (parsed.type === "partial") {
+                // Update with partial image and progress
+                const partialNum = (parsed.partial_index ?? 0) + 1;
+                const totalPartials = parsed.total_partials ?? 3;
+                this.updateConversationMessage(
+                  targetConversationId,
+                  assistantMessage.id,
+                  (msg) => {
+                    msg.content = `Editing... ${partialNum}/${totalPartials}`;
+                    msg.attachments = [
+                      {
+                        type: "generated-image",
+                        name: `edited-image.${format}`,
+                        preview: `data:${mimeType};base64,${imageData}`,
+                        mimeType,
+                      },
+                    ];
+                  },
+                );
+              } else if (parsed.type === "final") {
+                // Final image
+                this.updateConversationMessage(
+                  targetConversationId,
+                  assistantMessage.id,
+                  (msg) => {
+                    msg.content = "";
+                    msg.attachments = [
+                      {
+                        type: "generated-image",
+                        name: `edited-image.${format}`,
+                        preview: `data:${mimeType};base64,${imageData}`,
+                        mimeType,
+                      },
+                    ];
+                  },
+                );
+              }
+              this.syncActiveMessagesIfNeeded(targetConversationId);
            }
-            this.syncActiveMessagesIfNeeded(targetConversationId);
-          }
-        },
-      );
+          },
+        );
+      }
    } catch (error) {
      console.error("Error editing image:", error);
      this.handleStreamingError(
--- a/flake.lock
+++ b/flake.lock
@@ -21,7 +21,9 @@
          "nixpkgs"
        ],
        "purescript-overlay": "purescript-overlay",
-        "pyproject-nix": "pyproject-nix"
+        "pyproject-nix": [
+          "pyproject-nix"
+        ]
      },
      "locked": {
        "lastModified": 1765953015,
@@ -149,19 +151,44 @@
        "type": "github"
      }
    },
+    "pyproject-build-systems": {
+      "inputs": {
+        "nixpkgs": [
+          "nixpkgs"
+        ],
+        "pyproject-nix": [
+          "pyproject-nix"
+        ],
+        "uv2nix": [
+          "uv2nix"
+        ]
+      },
+      "locked": {
+        "lastModified": 1763662255,
+        "narHash": "sha256-4bocaOyLa3AfiS8KrWjZQYu+IAta05u3gYZzZ6zXbT0=",
+        "owner": "pyproject-nix",
+        "repo": "build-system-pkgs",
+        "rev": "042904167604c681a090c07eb6967b4dd4dae88c",
+        "type": "github"
+      },
+      "original": {
+        "owner": "pyproject-nix",
+        "repo": "build-system-pkgs",
+        "type": "github"
+      }
+    },
    "pyproject-nix": {
      "inputs": {
        "nixpkgs": [
-          "dream2nix",
          "nixpkgs"
        ]
      },
      "locked": {
-        "lastModified": 1763017646,
-        "narHash": "sha256-Z+R2lveIp6Skn1VPH3taQIuMhABg1IizJd8oVdmdHsQ=",
+        "lastModified": 1764134915,
+        "narHash": "sha256-xaKvtPx6YAnA3HQVp5LwyYG1MaN4LLehpQI8xEdBvBY=",
        "owner": "pyproject-nix",
        "repo": "pyproject.nix",
-        "rev": "47bd6f296502842643078d66128f7b5e5370790c",
+        "rev": "2c8df1383b32e5443c921f61224b198a2282a657",
        "type": "github"
      },
      "original": {
@@ -178,7 +205,10 @@
        "flake-parts": "flake-parts",
        "nixpkgs": "nixpkgs",
        "nixpkgs-swift": "nixpkgs-swift",
-        "treefmt-nix": "treefmt-nix"
+        "pyproject-build-systems": "pyproject-build-systems",
+        "pyproject-nix": "pyproject-nix",
+        "treefmt-nix": "treefmt-nix",
+        "uv2nix": "uv2nix"
      }
    },
    "rust-analyzer-src": {
@@ -239,6 +269,29 @@
        "repo": "treefmt-nix",
        "type": "github"
      }
+    },
+    "uv2nix": {
+      "inputs": {
+        "nixpkgs": [
+          "nixpkgs"
+        ],
+        "pyproject-nix": [
+          "pyproject-nix"
+        ]
+      },
+      "locked": {
+        "lastModified": 1767701098,
+        "narHash": "sha256-CJhKZnWb3gumR9oTRjFvCg/6lYTGbZRU7xtvcyWIRwU=",
+        "owner": "pyproject-nix",
+        "repo": "uv2nix",
+        "rev": "9d357f0d2ce6f5f35ec7959d7e704452352eb4da",
+        "type": "github"
+      },
+      "original": {
+        "owner": "pyproject-nix",
+        "repo": "uv2nix",
+        "type": "github"
+      }
    }
  },
  "root": "root",
--- a/flake.nix
+++ b/flake.nix
@@ -24,6 +24,26 @@
    dream2nix = {
      url = "github:nix-community/dream2nix";
      inputs.nixpkgs.follows = "nixpkgs";
+      inputs.pyproject-nix.follows = "pyproject-nix";
+    };
+
+    # Python packaging with uv2nix
+    pyproject-nix = {
+      url = "github:pyproject-nix/pyproject.nix";
+      inputs.nixpkgs.follows = "nixpkgs";
+    };
+
+    uv2nix = {
+      url = "github:pyproject-nix/uv2nix";
+      inputs.pyproject-nix.follows = "pyproject-nix";
+      inputs.nixpkgs.follows = "nixpkgs";
+    };
+
+    pyproject-build-systems = {
+      url = "github:pyproject-nix/build-system-pkgs";
+      inputs.pyproject-nix.follows = "pyproject-nix";
+      inputs.uv2nix.follows = "uv2nix";
+      inputs.nixpkgs.follows = "nixpkgs";
    };

    # Pinned nixpkgs for swift-format (swift is broken on x86_64-linux in newer nixpkgs)
@@ -48,6 +68,7 @@
        inputs.treefmt-nix.flakeModule
        ./dashboard/parts.nix
        ./rust/parts.nix
+        ./python/parts.nix
      ];

      perSystem =
@@ -58,6 +79,11 @@
          pkgsSwift = import inputs.nixpkgs-swift { inherit system; };
        in
        {
+          # Allow unfree for metal-toolchain (needed for Darwin Metal packages)
+          _module.args.pkgs = import inputs.nixpkgs {
+            inherit system;
+            config.allowUnfreePredicate = pkg: (pkg.pname or "") == "metal-toolchain";
+          };
          treefmt = {
            projectRootFile = "flake.nix";
            programs = {
@@ -79,14 +105,24 @@
                enable = true;
                package = pkgsSwift.swiftPackages.swift-format;
              };
+              shfmt.enable = true;
            };
          };

-          checks.lint = pkgs.runCommand "lint-check" { } ''
-            export RUFF_CACHE_DIR="$TMPDIR/ruff-cache"
-            ${pkgs.ruff}/bin/ruff check ${inputs.self}/
-            touch $out
-          '';
+          packages = lib.optionalAttrs pkgs.stdenv.hostPlatform.isDarwin (
+            let
+              uvLock = builtins.fromTOML (builtins.readFile ./uv.lock);
+              mlxPackage = builtins.head (builtins.filter (p: p.name == "mlx") uvLock.package);
+              uvLockMlxVersion = mlxPackage.version;
+            in
+            {
+              metal-toolchain = pkgs.callPackage ./nix/metal-toolchain.nix { };
+              mlx = pkgs.callPackage ./nix/mlx.nix {
+                metal-toolchain = self'.packages.metal-toolchain;
+                inherit uvLockMlxVersion;
+              };
+            }
+          );

          devShells.default = with pkgs; pkgs.mkShell {
            inputsFrom = [ self'.checks.cargo-build ];
--- a/2
+++ b/2
@@ -1,7 +1,7 @@
 export NIX_CONFIG := "extra-experimental-features = nix-command flakes"

 fmt:
-    nix fmt
+    treefmt || nix fmt

 lint:
    uv run ruff check --fix
--- a/nix/darwin-build-fixes.patch
+++ b/nix/darwin-build-fixes.patch
@@ -0,0 +1,79 @@
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index 0ed30932..d8528132 100644
+--- a/CMakeLists.txt
+++ b/CMakeLists.txt
+@@ -177,11 +177,7 @@ if(MLX_BUILD_METAL)
+     add_compile_definitions(MLX_METAL_DEBUG)
+   endif()
+
+-  # Throw an error if xcrun not found
+-  execute_process(
+-    COMMAND zsh "-c" "/usr/bin/xcrun -sdk macosx --show-sdk-version"
+-    OUTPUT_VARIABLE MACOS_SDK_VERSION
+-    OUTPUT_STRIP_TRAILING_WHITESPACE COMMAND_ERROR_IS_FATAL ANY)
+  set(MACOS_SDK_VERSION @sdkVersion@)
+
+   if(${MACOS_SDK_VERSION} LESS 14.0)
+     message(
+@@ -199,11 +195,8 @@ if(MLX_BUILD_METAL)
+     endif()
+     set(XCRUN_FLAGS "-mmacosx-version-min=${CMAKE_OSX_DEPLOYMENT_TARGET}")
+   endif()
+-  execute_process(
+-    COMMAND
+-      zsh "-c"
+-      "echo \"__METAL_VERSION__\" | xcrun -sdk macosx metal ${XCRUN_FLAGS} -E -x metal -P - | tail -1 | tr -d '\n'"
+-    OUTPUT_VARIABLE MLX_METAL_VERSION COMMAND_ERROR_IS_FATAL ANY)
+  set(
+    MLX_METAL_VERSION @metalVersion@)
+   FetchContent_Declare(metal_cpp URL ${METAL_CPP_URL})
+   FetchContent_MakeAvailable(metal_cpp)
+   target_include_directories(
+diff --git a/cmake/extension.cmake b/cmake/extension.cmake
+index 13db804a..5b385132 100644
+--- a/cmake/extension.cmake
+++ b/cmake/extension.cmake
+@@ -36,7 +36,7 @@ macro(mlx_build_metallib)
+   add_custom_command(
+     OUTPUT ${MTLLIB_BUILD_TARGET}
+     COMMAND
+-      xcrun -sdk macosx metal
+      metal -fmodules-cache-path=${CMAKE_BINARY_DIR}/metal-cache
+       "$<LIST:TRANSFORM,${MTLLIB_INCLUDE_DIRS},PREPEND,-I>"
+       ${MTLLIB_COMPILE_OPTIONS} ${MTLLIB_SOURCES} -o ${MTLLIB_BUILD_TARGET}
+     DEPENDS ${MTLLIB_DEPS} ${MTLLIB_SOURCES}
+diff --git a/mlx/backend/metal/kernels/CMakeLists.txt b/mlx/backend/metal/kernels/CMakeLists.txt
+index 262b0495..5c7446ad 100644
+--- a/mlx/backend/metal/kernels/CMakeLists.txt
+++ b/mlx/backend/metal/kernels/CMakeLists.txt
+@@ -29,7 +29,7 @@ function(build_kernel_base TARGET SRCFILE DEPS)
+                     "-mmacosx-version-min=${CMAKE_OSX_DEPLOYMENT_TARGET}")
+   endif()
+   add_custom_command(
+-    COMMAND xcrun -sdk macosx metal ${METAL_FLAGS} -c ${SRCFILE}
+    COMMAND metal -fmodules-cache-path=${CMAKE_BINARY_DIR}/metal-cache ${METAL_FLAGS} -c ${SRCFILE}
+             -I${PROJECT_SOURCE_DIR} -o ${TARGET}.air
+     DEPENDS ${SRCFILE} ${DEPS} ${BASE_HEADERS}
+     OUTPUT ${TARGET}.air
+@@ -170,7 +170,7 @@ endif()
+
+ add_custom_command(
+   OUTPUT ${MLX_METAL_PATH}/mlx.metallib
+-  COMMAND xcrun -sdk macosx metallib ${KERNEL_AIR} -o
+  COMMAND metallib ${KERNEL_AIR} -o
+           ${MLX_METAL_PATH}/mlx.metallib
+   DEPENDS ${KERNEL_AIR}
+   COMMENT "Building mlx.metallib"
+diff --git a/mlx/backend/metal/make_compiled_preamble.sh b/mlx/backend/metal/make_compiled_preamble.sh
+index bb55ed3a..94ea7dd7 100644
+--- a/mlx/backend/metal/make_compiled_preamble.sh
+++ b/mlx/backend/metal/make_compiled_preamble.sh
+@@ -31,7 +31,7 @@ OUTPUT_FILE=${OUTPUT_DIR}/${SRC_NAME}.cpp
+ mkdir -p "$OUTPUT_DIR"
+
+ # Use the metal compiler to get a list of headers (with depth)
+-CCC="xcrun -sdk macosx metal -x metal"
+CCC="metal -x metal -fmodules-cache-path=${OUTPUT_DIR}/metal-cache"
+ HDRS=$( $CCC -I"$SRC_DIR" -I"$JIT_INCLUDES" -DMLX_METAL_JIT -E -P -CC -C -H "$INPUT_FILE" $CFLAGS -w 2>&1 1>/dev/null )
+
+ # Remove any included system frameworks (for MetalPerformancePrimitive headers)
--- a/nix/metal-toolchain.nix
+++ b/nix/metal-toolchain.nix
@@ -0,0 +1,56 @@
+{ lib, stdenvNoCC, requireFile, nix }:
+
+let
+  narFile = requireFile {
+    name = "metal-toolchain-17C48.nar";
+    message = ''
+      The Metal Toolchain NAR must be available.
+
+      If you have cachix configured for exo.cachix.org, this should be automatic.
+
+      Otherwise:
+        1. Install Xcode 26+ from the App Store
+        2. Run: xcodebuild -downloadComponent MetalToolchain
+        3. Export the toolchain:
+           hdiutil attach "$(find /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain -name '*.dmg' | head -1)" -mountpoint /tmp/metal-dmg
+           cp -R /tmp/metal-dmg/Metal.xctoolchain /tmp/metal-export
+           hdiutil detach /tmp/metal-dmg
+        4. Create NAR and add to store:
+           nix nar pack /tmp/metal-export > /tmp/metal-toolchain-17C48.nar
+           nix store add --mode flat /tmp/metal-toolchain-17C48.nar
+    '';
+    hash = "sha256-ayR5mXN4sZAddwKEG2OszGRF93k9ZFc7H0yi2xbylQw=";
+  };
+in
+stdenvNoCC.mkDerivation {
+  pname = "metal-toolchain";
+  version = "17C48";
+
+  dontUnpack = true;
+  dontBuild = true;
+  dontFixup = true;
+
+  nativeBuildInputs = [ nix ];
+
+  installPhase = ''
+    runHook preInstall
+
+    nix-store --restore $out < ${narFile}
+
+    # Create bin directory with symlinks for PATH
+    mkdir -p $out/bin
+    ln -s $out/usr/bin/metal $out/bin/metal
+    ln -s $out/usr/bin/metallib $out/bin/metallib
+
+    runHook postInstall
+  '';
+
+  # Metal language version for CMake (from: echo __METAL_VERSION__ | metal -E -x metal -P -)
+  passthru.metalVersion = "400";
+
+  meta = {
+    description = "Apple Metal compiler toolchain";
+    platforms = [ "aarch64-darwin" ];
+    license = lib.licenses.unfree;
+  };
+}
--- a/nix/mlx.nix
+++ b/nix/mlx.nix
@@ -0,0 +1,158 @@
+{ stdenv
+, lib
+, fetchFromGitHub
+, replaceVars
+, fetchzip
+, cmake
+, nlohmann_json
+, apple-sdk_26
+, metal-toolchain
+, runCommand
+, fmt
+, python313Packages
+, uvLockMlxVersion
+}:
+
+assert stdenv.isDarwin;
+
+let
+  python = python313Packages.python;
+
+  # Static dependencies included directly during compilation
+  gguf-tools = fetchFromGitHub {
+    owner = "antirez";
+    repo = "gguf-tools";
+    rev = "8fa6eb65236618e28fd7710a0fba565f7faa1848";
+    hash = "sha256-15FvyPOFqTOr5vdWQoPnZz+mYH919++EtghjozDlnSA=";
+  };
+
+  metal_cpp = fetchzip {
+    url = "https://developer.apple.com/metal/cpp/files/metal-cpp_26.zip";
+    hash = "sha256-7n2eI2lw/S+Us6l7YPAATKwcIbRRpaQ8VmES7S8ZjY8=";
+  };
+
+  nanobind = fetchFromGitHub {
+    owner = "wjakob";
+    repo = "nanobind";
+    rev = "v2.10.2";
+    hash = "sha256-io44YhN+VpfHFWyvvLWSanRgbzA0whK8WlDNRi3hahU=";
+    fetchSubmodules = true;
+  };
+
+  mlx = stdenv.mkDerivation rec {
+    pname = "mlx";
+    version = let v = "0.30.4"; in
+      assert v == uvLockMlxVersion || throw "MLX version mismatch: nix/mlx.nix has ${v} but uv.lock has ${uvLockMlxVersion}. Update both the version and hash in nix/mlx.nix.";
+      v;
+    pyproject = true;
+
+    src = fetchFromGitHub {
+      owner = "ml-explore";
+      repo = "mlx";
+      tag = "v${version}";
+      hash = "sha256-OJk6jPlbaSlsUdk3ADz3tWcRzTWXRof3/q8Soe1AO6w=";
+    };
+
+    patches = [
+      (replaceVars ./darwin-build-fixes.patch {
+        sdkVersion = apple-sdk_26.version;
+        metalVersion = metal-toolchain.metalVersion;
+      })
+    ];
+
+    postPatch = ''
+      substituteInPlace mlx/backend/cpu/jit_compiler.cpp \
+        --replace-fail "g++" "$CXX"
+    '';
+
+    dontUseCmakeConfigure = true;
+
+    enableParallelBuilding = true;
+
+    # Allows multiple cores to be used in Python builds.
+    postUnpack = ''
+      export MAKEFLAGS+="''${enableParallelBuilding:+-j$NIX_BUILD_CORES}"
+    '';
+
+    # Updates the wrong fetcher rev attribute
+    passthru.skipBulkUpdate = true;
+
+    env = {
+      DEV_RELEASE = 1;
+      CMAKE_ARGS = toString [
+        (lib.cmakeBool "USE_SYSTEM_FMT" true)
+        (lib.cmakeOptionType "filepath" "FETCHCONTENT_SOURCE_DIR_GGUFLIB" "${gguf-tools}")
+        (lib.cmakeOptionType "filepath" "FETCHCONTENT_SOURCE_DIR_JSON" "${nlohmann_json.src}")
+        (lib.cmakeOptionType "filepath" "FETCHCONTENT_SOURCE_DIR_NANOBIND" "${nanobind}")
+        (lib.cmakeBool "FETCHCONTENT_FULLY_DISCONNECTED" true)
+        (lib.cmakeBool "MLX_BUILD_METAL" true)
+        (lib.cmakeOptionType "filepath" "FETCHCONTENT_SOURCE_DIR_METAL_CPP" "${metal_cpp}")
+        (lib.cmakeOptionType "string" "CMAKE_OSX_DEPLOYMENT_TARGET" "${apple-sdk_26.version}")
+        (lib.cmakeOptionType "filepath" "CMAKE_OSX_SYSROOT" "${apple-sdk_26.passthru.sdkroot}")
+      ];
+      SDKROOT = apple-sdk_26.passthru.sdkroot;
+      MACOSX_DEPLOYMENT_TARGET = apple-sdk_26.version;
+    };
+
+    build-system = [
+      python313Packages.setuptools
+    ];
+
+    nativeBuildInputs = [
+      cmake
+      metal-toolchain
+      python313Packages.pypaBuildHook
+      python313Packages.pypaInstallHook
+      python313Packages.setuptools
+      python313Packages.typing-extensions
+      python313Packages.wheel
+      python313Packages.cmake
+      python313Packages.ninja
+    ];
+
+    buildInputs = [
+      fmt
+      gguf-tools
+      python313Packages.nanobind
+      python313Packages.pybind11
+      apple-sdk_26
+    ];
+
+    # Tests require Metal GPU access which isn't available in the Nix sandbox.
+    # To run tests, build with: nix build --option sandbox false .#mlx.passthru.tests.mlxTest
+    doCheck = false;
+
+    pythonImportsCheck = [ "mlx" ];
+
+    passthru.tests = {
+      # Runs example scripts to verify MLX works. Requires --option sandbox false
+      # since Metal GPU access is needed.
+      mlxTest =
+        runCommand "run-mlx-examples"
+          {
+            buildInputs = [ mlx ];
+            nativeBuildInputs = [ python ];
+          }
+          ''
+            cp ${src}/examples/python/logistic_regression.py .
+            ${python.interpreter} logistic_regression.py
+            rm logistic_regression.py
+
+            cp ${src}/examples/python/linear_regression.py .
+            ${python.interpreter} linear_regression.py
+            rm linear_regression.py
+
+            touch $out
+          '';
+    };
+
+    meta = {
+      homepage = "https://github.com/ml-explore/mlx";
+      description = "Array framework for Apple silicon";
+      changelog = "https://github.com/ml-explore/mlx/releases/tag/${src.tag}";
+      license = lib.licenses.mit;
+      platforms = [ "aarch64-darwin" ];
+    };
+  };
+in
+mlx
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -17,8 +17,8 @@ dependencies = [
    "loguru>=0.7.3",
    "exo_pyo3_bindings", # rust bindings
    "anyio==4.11.0",
-    "mlx==0.30.3; sys_platform == 'darwin'",
-    "mlx[cpu]==0.30.3; sys_platform == 'linux'",
+    "mlx==0.30.4; sys_platform == 'darwin'",
+    "mlx[cpu]==0.30.4; sys_platform == 'linux'",
    "mlx-lm==0.30.5",
    "tiktoken>=0.12.0", # required for kimi k2 tokenizer
    "hypercorn>=0.18.0",
@@ -31,8 +31,6 @@ dependencies = [
 ]

 [project.scripts]
-exo-master = "exo.master.main:main"
-exo-worker = "exo.worker.main:main"
 exo = "exo.main:main"

 # dependencies only required for development
@@ -46,12 +44,6 @@ dev = [
    "ruff>=0.11.13",
 ]

-# mlx[cuda] requires a newer version of mlx. the ideal on linux is: default to mlx[cpu] unless[cuda] specified.
-[project.optional-dependencies]
-# cuda = [
-#     "mlx[cuda]==0.26.3",
-# ]
-
 ###
 # workspace configuration
 ###
@@ -64,6 +56,7 @@ members = [
 [tool.uv.sources]
 exo_pyo3_bindings = { workspace = true }
 # Uncomment to use local mlx/mlx-lm development versions:
+# mlx-lm = { git = "https://github.com/ml-explore/mlx-lm", branch = "main" }
 # mlx = { path = "/Users/Shared/mlx", editable=true }
 # mlx-lm = { path = "/Users/Shared/mlx-lm", editable=true }

@@ -115,7 +108,7 @@ environments = [
 ###

 [tool.ruff]
-extend-exclude = ["shared/protobufs/**", "*mlx_typings/**", "rust/exo_pyo3_bindings/**"]
+extend-exclude = ["*mlx_typings/**", "rust/exo_pyo3_bindings/**"]

 [tool.ruff.lint]
 extend-select = ["I", "N", "B", "A", "PIE", "SIM"]
--- a/python/parts.nix
+++ b/python/parts.nix
@@ -0,0 +1,93 @@
+{ inputs, ... }:
+{
+  perSystem =
+    { config, self', pkgs, lib, system, ... }:
+    let
+      # Load workspace from uv.lock
+      workspace = inputs.uv2nix.lib.workspace.loadWorkspace {
+        workspaceRoot = inputs.self;
+      };
+
+      # Create overlay from workspace
+      # Use wheels from PyPI for most packages; we override mlx with our pure Nix Metal build
+      overlay = workspace.mkPyprojectOverlay { sourcePreference = "wheel"; };
+
+      # Override overlay to inject Nix-built components
+      exoOverlay = final: prev: {
+        # Replace workspace exo_pyo3_bindings with Nix-built wheel
+        exo-pyo3-bindings = pkgs.stdenv.mkDerivation {
+          pname = "exo-pyo3-bindings";
+          version = "0.1.0";
+          src = self'.packages.exo_pyo3_bindings;
+          # Install from pre-built wheel
+          nativeBuildInputs = [ final.pyprojectWheelHook ];
+          dontStrip = true;
+        };
+      };
+
+      python = pkgs.python313;
+
+      # Overlay to provide build systems and custom packages
+      buildSystemsOverlay = final: prev: {
+        # Use our pure Nix-built MLX with Metal support
+        mlx = self'.packages.mlx;
+
+        # mlx-lm is a git dependency that needs setuptools
+        mlx-lm = prev.mlx-lm.overrideAttrs (old: {
+          nativeBuildInputs = (old.nativeBuildInputs or [ ]) ++ [
+            final.setuptools
+          ];
+        });
+      };
+
+      pythonSet = (pkgs.callPackage inputs.pyproject-nix.build.packages {
+        inherit python;
+      }).overrideScope (
+        lib.composeManyExtensions [
+          inputs.pyproject-build-systems.overlays.default
+          overlay
+          exoOverlay
+          buildSystemsOverlay
+        ]
+      );
+      exoVenv = pythonSet.mkVirtualEnv "exo-env" workspace.deps.default;
+
+      # Virtual environment with dev dependencies for testing
+      testVenv = pythonSet.mkVirtualEnv "exo-test-env" (
+        workspace.deps.default // {
+          exo = [ "dev" ]; # Include pytest, pytest-asyncio, pytest-env
+        }
+      );
+
+      exoPackage = pkgs.runCommand "exo"
+        {
+          nativeBuildInputs = [ pkgs.makeWrapper ];
+        }
+        ''
+          mkdir -p $out/bin
+
+          # Create wrapper scripts
+          for script in exo exo-master exo-worker; do
+            makeWrapper ${exoVenv}/bin/$script $out/bin/$script \
+              --set DASHBOARD_DIR ${self'.packages.dashboard}
+          done
+        '';
+    in
+    {
+      # Python package only available on macOS (requires MLX/Metal)
+      packages = lib.optionalAttrs pkgs.stdenv.hostPlatform.isDarwin {
+        exo = exoPackage;
+        # Test environment for running pytest outside of Nix sandbox (needs GPU access)
+        exo-test-env = testVenv;
+      };
+
+      checks = {
+        # Ruff linting (works on all platforms)
+        lint = pkgs.runCommand "ruff-lint" { } ''
+          export RUFF_CACHE_DIR="$TMPDIR/ruff-cache"
+          ${pkgs.ruff}/bin/ruff check ${inputs.self}/
+          touch $out
+        '';
+      };
+    };
+}
--- a/src/exo/download/impl_shard_downloader.py
+++ b/src/exo/download/impl_shard_downloader.py
@@ -166,9 +166,8 @@ class ResumableShardDownloader(ShardDownloader):
        for task in asyncio.as_completed(tasks):
            try:
                yield await task
-            # TODO: except Exception
            except Exception as e:
-                logger.error("Error downloading shard:", e)
+                logger.warning(f"Error downloading shard: {type(e).__name__}")

    async def get_shard_download_status_for_shard(
        self, shard: ShardMetadata
--- a/src/exo/master/api.py
+++ b/src/exo/master/api.py
@@ -65,7 +65,9 @@ from exo.shared.types.api import (
    StartDownloadParams,
    StartDownloadResponse,
    StreamingChoiceResponse,
+    StreamOptions,
    ToolCall,
+    Usage,
 )
 from exo.shared.types.chunks import (
    ErrorChunk,
@@ -113,7 +115,9 @@ def _format_to_content_type(image_format: Literal["png", "jpeg", "webp"] | None)


 def chunk_to_response(
-    chunk: TokenChunk | ToolCallChunk, command_id: CommandId
+    chunk: TokenChunk | ToolCallChunk,
+    command_id: CommandId,
+    usage: Usage | None,
 ) -> ChatCompletionResponse:
    return ChatCompletionResponse(
        id=command_id,
@@ -138,6 +142,7 @@ def chunk_to_response(
                finish_reason=chunk.finish_reason,
            )
        ],
+        usage=usage,
    )


@@ -522,9 +527,10 @@ class API:
                del self._chat_completion_queues[command_id]

    async def _generate_chat_stream(
-        self, command_id: CommandId
+        self, command_id: CommandId, stream_options: StreamOptions | None = None
    ) -> AsyncGenerator[str, None]:
        """Generate chat completion stream as JSON strings."""
+        include_usage = stream_options.include_usage if stream_options else False

        async for chunk in self._chat_chunk_stream(command_id):
            assert not isinstance(chunk, ImageChunk)
@@ -540,8 +546,10 @@ class API:
                yield "data: [DONE]\n\n"
                return

+            usage = chunk.usage if include_usage else None
+
            chunk_response: ChatCompletionResponse = chunk_to_response(
-                chunk, command_id
+                chunk, command_id, usage=usage
            )
            logger.debug(f"chunk_response: {chunk_response}")

@@ -559,6 +567,7 @@ class API:
        tool_calls: list[ToolCall] = []
        model: str | None = None
        finish_reason: FinishReason | None = None
+        usage: Usage | None = None

        async for chunk in self._chat_chunk_stream(command_id):
            if isinstance(chunk, ErrorChunk):
@@ -583,6 +592,9 @@ class API:
                    for i, tool in enumerate(chunk.tool_calls)
                )

+            if chunk.usage is not None:
+                usage = chunk.usage
+
            if chunk.finish_reason is not None:
                finish_reason = chunk.finish_reason

@@ -604,6 +616,7 @@ class API:
                    finish_reason=finish_reason,
                )
            ],
+            usage=usage,
        )

    async def _collect_chat_completion_with_stats(
@@ -691,7 +704,7 @@ class API:
        await self._send(command)
        if payload.stream:
            return StreamingResponse(
-                self._generate_chat_stream(command.command_id),
+                self._generate_chat_stream(command.command_id, payload.stream_options),
                media_type="text/event-stream",
            )

--- a/src/exo/shared/models/model_cards.py
+++ b/src/exo/shared/models/model_cards.py
@@ -1,5 +1,5 @@
 from enum import Enum
-from typing import Annotated
+from typing import Annotated, Any

 import aiofiles
 import aiofiles.os as aios
@@ -7,7 +7,14 @@ import tomlkit
 from anyio import Path, open_file
 from huggingface_hub import model_info
 from loguru import logger
-from pydantic import BaseModel, Field, PositiveInt, field_validator
+from pydantic import (
+    AliasChoices,
+    BaseModel,
+    Field,
+    PositiveInt,
+    field_validator,
+    model_validator,
+)

 from exo.shared.constants import EXO_ENABLE_IMAGE_MODELS
 from exo.shared.types.common import ModelId
@@ -121,6 +128,14 @@ MODEL_CARDS: dict[str, ModelCard] = {
        supports_tensor=True,
        tasks=[ModelTask.TextGeneration],
    ),
+    "kimi-k2.5": ModelCard(
+        model_id=ModelId("mlx-community/Kimi-K2.5"),
+        storage_size=Memory.from_gb(617),
+        n_layers=61,
+        hidden_size=7168,
+        supports_tensor=True,
+        tasks=[ModelTask.TextGeneration],
+    ),
    # llama-3.1
    "llama-3.1-8b": ModelCard(
        model_id=ModelId("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"),
@@ -703,15 +718,18 @@ if EXO_ENABLE_IMAGE_MODELS:
 class ConfigData(BaseModel):
    model_config = {"extra": "ignore"}  # Allow unknown fields

-    # Common field names for number of layers across different architectures
-    num_hidden_layers: Annotated[int, Field(ge=0)] | None = None
-    num_layers: Annotated[int, Field(ge=0)] | None = None
-    n_layer: Annotated[int, Field(ge=0)] | None = None
-    n_layers: Annotated[int, Field(ge=0)] | None = None  # Sometimes used
-    num_decoder_layers: Annotated[int, Field(ge=0)] | None = None  # Transformer models
-    decoder_layers: Annotated[int, Field(ge=0)] | None = None  # Some architectures
-    hidden_size: Annotated[int, Field(ge=0)] | None = None
    architectures: list[str] | None = None
+    hidden_size: Annotated[int, Field(ge=0)] | None = None
+    layer_count: int = Field(
+        validation_alias=AliasChoices(
+            "num_hidden_layers",
+            "num_layers",
+            "n_layer",
+            "n_layers",
+            "num_decoder_layers",
+            "decoder_layers",
+        )
+    )

    @property
    def supports_tensor(self) -> bool:
@@ -726,25 +744,27 @@ class ConfigData(BaseModel):
            ["GptOssForCausalLM"],
        ]

-    @property
-    def layer_count(self) -> int:
-        # Check common field names for layer count
-        layer_fields = [
-            self.num_hidden_layers,
-            self.num_layers,
-            self.n_layer,
-            self.n_layers,
-            self.num_decoder_layers,
-            self.decoder_layers,
-        ]
+    @model_validator(mode="before")
+    @classmethod
+    def defer_to_text_config(cls, data: dict[str, Any]):
+        text_config = data.get("text_config")
+        if text_config is None:
+            return data

-        for layer_count in layer_fields:
-            if layer_count is not None:
-                return layer_count
+        for field in [
+            "architectures",
+            "hidden_size",
+            "num_hidden_layers",
+            "num_layers",
+            "n_layer",
+            "n_layers",
+            "num_decoder_layers",
+            "decoder_layers",
+        ]:
+            if (val := text_config.get(field)) is not None:  # pyright: ignore[reportAny]
+                data[field] = val

-        raise ValueError(
-            f"No layer count found in config.json: {self.model_dump_json()}"
-        )
+        return data


 async def get_config_data(model_id: ModelId) -> ConfigData:
--- a/src/exo/shared/types/api.py
+++ b/src/exo/shared/types/api.py
@@ -11,7 +11,7 @@ from exo.shared.types.common import CommandId, NodeId
 from exo.shared.types.memory import Memory
 from exo.shared.types.worker.instances import Instance, InstanceId, InstanceMeta
 from exo.shared.types.worker.shards import Sharding, ShardMetadata
-from exo.utils.pydantic_ext import CamelCaseModel
+from exo.utils.pydantic_ext import CamelCaseModel, ConfigDict, TaggedModel

 FinishReason = Literal[
    "stop", "length", "tool_calls", "content_filter", "function_call", "error"
@@ -116,8 +116,8 @@ class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
-    prompt_tokens_details: PromptTokensDetails | None = None
-    completion_tokens_details: CompletionTokensDetails | None = None
+    prompt_tokens_details: PromptTokensDetails
+    completion_tokens_details: CompletionTokensDetails


 class StreamingChoiceResponse(BaseModel):
@@ -170,7 +170,13 @@ class BenchChatCompletionResponse(ChatCompletionResponse):
    generation_stats: GenerationStats | None = None


-class ChatCompletionTaskParams(BaseModel):
+class StreamOptions(BaseModel):
+    include_usage: bool = False
+
+
+class ChatCompletionTaskParams(TaggedModel):
+    model_config = ConfigDict(extra="ignore")
+
    model: str
    frequency_penalty: float | None = None
    messages: list[ChatCompletionMessage]
@@ -184,6 +190,7 @@ class ChatCompletionTaskParams(BaseModel):
    seed: int | None = None
    stop: str | list[str] | None = None
    stream: bool = False
+    stream_options: StreamOptions | None = None
    temperature: float | None = None
    top_p: float | None = None
    tools: list[dict[str, Any]] | None = None
--- a/src/exo/shared/types/chunks.py
+++ b/src/exo/shared/types/chunks.py
@@ -2,7 +2,7 @@ from collections.abc import Generator
 from typing import Any, Literal

 from exo.shared.models.model_cards import ModelId
-from exo.shared.types.api import GenerationStats, ImageGenerationStats
+from exo.shared.types.api import GenerationStats, ImageGenerationStats, Usage
 from exo.utils.pydantic_ext import TaggedModel

 from .api import FinishReason
@@ -17,6 +17,7 @@ class BaseChunk(TaggedModel):
 class TokenChunk(BaseChunk):
    text: str
    token_id: int
+    usage: Usage | None
    finish_reason: Literal["stop", "length", "content_filter"] | None = None
    stats: GenerationStats | None = None

@@ -28,6 +29,7 @@ class ErrorChunk(BaseChunk):

 class ToolCallChunk(BaseChunk):
    tool_calls: list[ToolCallItem]
+    usage: Usage | None
    finish_reason: Literal["tool_calls"] = "tool_calls"
    stats: GenerationStats | None = None

--- a/src/exo/shared/types/commands.py
+++ b/src/exo/shared/types/commands.py
@@ -2,6 +2,7 @@ from pydantic import Field

 from exo.shared.models.model_cards import ModelCard, ModelId
 from exo.shared.types.api import (
+    BenchChatCompletionTaskParams,
    ChatCompletionTaskParams,
    ImageEditsInternalParams,
    ImageGenerationTaskParams,
@@ -22,7 +23,7 @@ class TestCommand(BaseCommand):


 class ChatCompletion(BaseCommand):
-    request_params: ChatCompletionTaskParams
+    request_params: ChatCompletionTaskParams | BenchChatCompletionTaskParams


 class ImageGeneration(BaseCommand):
--- a/src/exo/shared/types/tasks.py
+++ b/src/exo/shared/types/tasks.py
@@ -3,6 +3,7 @@ from enum import Enum
 from pydantic import Field

 from exo.shared.types.api import (
+    BenchChatCompletionTaskParams,
    ChatCompletionTaskParams,
    ImageEditsInternalParams,
    ImageGenerationTaskParams,
@@ -54,7 +55,7 @@ class StartWarmup(BaseTask):  # emitted by Worker

 class ChatCompletion(BaseTask):  # emitted by Master
    command_id: CommandId
-    task_params: ChatCompletionTaskParams
+    task_params: ChatCompletionTaskParams | BenchChatCompletionTaskParams

    error_type: str | None = Field(default=None)
    error_message: str | None = Field(default=None)
--- a/src/exo/shared/types/worker/runner_response.py
+++ b/src/exo/shared/types/worker/runner_response.py
@@ -6,6 +6,7 @@ from exo.shared.types.api import (
    GenerationStats,
    ImageGenerationStats,
    ToolCallItem,
+    Usage,
 )
 from exo.utils.pydantic_ext import TaggedModel

@@ -24,6 +25,7 @@ class GenerationResponse(BaseRunnerResponse):
    # logprobs: list[float] | None = None # too big. we can change to be top-k
    finish_reason: FinishReason | None = None
    stats: GenerationStats | None = None
+    usage: Usage | None


 class ImageGenerationResponse(BaseRunnerResponse):
@@ -57,6 +59,7 @@ class PartialImageResponse(BaseRunnerResponse):

 class ToolCallResponse(BaseRunnerResponse):
    tool_calls: list[ToolCallItem]
+    usage: Usage | None


 class FinishedResponse(BaseRunnerResponse):
--- a/src/exo/worker/engines/image/generate.py
+++ b/src/exo/worker/engines/image/generate.py
@@ -98,8 +98,8 @@ def generate_image(

    partial_images = (
        task.partial_images
-        if task.partial_images is not None
-        else (3 if task.stream else 0)
+        if task.partial_images is not None and task.stream is not None and task.stream
+        else 0
    )

    image_path: Path | None = None
--- a/src/exo/worker/engines/image/pipeline/runner.py
+++ b/src/exo/worker/engines/image/pipeline/runner.py
@@ -348,6 +348,7 @@ class DiffusionRunner:
                ctx.in_loop(  # pyright: ignore[reportAny]
                    t=t,
                    latents=latents,
+                    time_steps=time_steps,
                )

                mx.eval(latents)
--- a/src/exo/worker/engines/mlx/auto_parallel.py
+++ b/src/exo/worker/engines/mlx/auto_parallel.py
@@ -23,6 +23,7 @@ from mlx_lm.models.glm4_moe_lite import Glm4MoeLiteDecoderLayer, Glm4MoeLiteMLP
 from mlx_lm.models.glm4_moe_lite import Model as GLM4MoeLiteModel
 from mlx_lm.models.gpt_oss import GptOssMoeModel
 from mlx_lm.models.gpt_oss import Model as GptOssModel
+from mlx_lm.models.kimi_k25 import Model as KimiK25Model
 from mlx_lm.models.llama import Model as LlamaModel
 from mlx_lm.models.minimax import Model as MiniMaxModel
 from mlx_lm.models.ministral3 import Model as Ministral3Model
@@ -200,6 +201,9 @@ def pipeline_auto_parallel(
    device_rank, world_size = model_shard_meta.device_rank, model_shard_meta.world_size

    layers = layers[start_layer:end_layer]
+    for layer in layers:
+        mx.eval(layer)  # type: ignore
+
    layers[0] = PipelineFirstLayer(layers[0], device_rank, group=group)
    layers[-1] = PipelineLastLayer(
        layers[-1],
@@ -344,7 +348,7 @@ def tensor_auto_parallel(
            all_to_sharded_linear_in_place,
            sharded_to_all_linear_in_place,
        )
-    elif isinstance(model, (DeepseekV3Model, DeepseekV32Model)):
+    elif isinstance(model, (DeepseekV3Model, DeepseekV32Model, KimiK25Model)):
        tensor_parallel_sharding_strategy = DeepSeekShardingStrategy(
            group,
            all_to_sharded_linear,
@@ -453,7 +457,7 @@ def _set_layers(model: nn.Module, layers: list[_LayerCallable]) -> None:

        # Update DeepSeek V3 specific parameters when layers are shrunk
        if isinstance(
-            model, (DeepseekV3Model, DeepseekV32Model, Glm4MoeModel)
+            model, (DeepseekV3Model, DeepseekV32Model, Glm4MoeModel, KimiK25Model)
        ) and hasattr(inner_model_instance, "num_layers"):
            logger.info(
                f"Setting num_layers to {len(layers)} for model {model.model.__class__.__name__}"
@@ -622,6 +626,7 @@ class MiniMaxShardingStrategy(TensorParallelShardingStrategy):
        on_timeout: TimeoutCallback | None,
    ) -> nn.Module:
        model = cast(MiniMaxModel, model)
+        rank = self.group.rank()
        for layer in model.layers:
            eval_with_timeout(
                layer.parameters(), timeout_seconds / len(model.layers), on_timeout
@@ -631,6 +636,16 @@ class MiniMaxShardingStrategy(TensorParallelShardingStrategy):
            layer.self_attn.k_proj = self.all_to_sharded_linear(layer.self_attn.k_proj)
            layer.self_attn.v_proj = self.all_to_sharded_linear(layer.self_attn.v_proj)
            layer.self_attn.o_proj = self.sharded_to_all_linear(layer.self_attn.o_proj)
+
+            # Shard qk_norm weights if present (must match sharded head count)
+            if getattr(layer.self_attn, "use_qk_norm", False):
+                layer.self_attn.q_norm.weight = layer.self_attn.q_norm.weight.split(  # type: ignore
+                    self.N, axis=-1
+                )[rank]
+                layer.self_attn.k_norm.weight = layer.self_attn.k_norm.weight.split(  # type: ignore
+                    self.N, axis=-1
+                )[rank]
+
            layer.self_attn.num_attention_heads //= self.N
            layer.self_attn.num_key_value_heads //= self.N

--- a/src/exo/worker/engines/mlx/generator/generate.py
+++ b/src/exo/worker/engines/mlx/generator/generate.py
@@ -10,8 +10,11 @@ from mlx_lm.tokenizer_utils import TokenizerWrapper
 from exo.shared.types.api import (
    BenchChatCompletionTaskParams,
    ChatCompletionMessage,
+    CompletionTokensDetails,
    FinishReason,
    GenerationStats,
+    PromptTokensDetails,
+    Usage,
 )
 from exo.shared.types.memory import Memory
 from exo.shared.types.mlx import KVCacheType
@@ -39,7 +42,7 @@ def prefill(
    sampler: Callable[[mx.array], mx.array],
    prompt_tokens: mx.array,
    cache: KVCacheType,
-) -> float:
+) -> tuple[float, int]:
    """Prefill the KV cache with prompt tokens.

    This runs the model over the prompt tokens to populate the cache,
@@ -50,7 +53,7 @@ def prefill(
    """
    num_tokens = len(prompt_tokens)
    if num_tokens == 0:
-        return 0.0
+        return 0.0, 0

    logger.debug(f"Prefilling {num_tokens} tokens...")
    start_time = time.perf_counter()
@@ -85,7 +88,7 @@ def prefill(
        f"Prefill complete: {num_tokens} tokens in {elapsed:.2f}s "
        f"({tokens_per_sec:.1f} tok/s)"
    )
-    return tokens_per_sec
+    return tokens_per_sec, num_tokens


 def warmup_inference(
@@ -169,6 +172,8 @@ def mlx_generate(
    mx.reset_peak_memory()
    is_bench: bool = isinstance(task, BenchChatCompletionTaskParams)

+    logger.info(f"{is_bench=}")
+
    # Currently we support chat-completion tasks only.
    logger.debug(f"task_params: {task}")

@@ -204,7 +209,9 @@ def mlx_generate(
    )

    # Prefill cache with all tokens except the last one
-    prefill_tps = prefill(model, tokenizer, sampler, prompt_tokens[:-1], caches)
+    prefill_tps, prefill_tokens = prefill(
+        model, tokenizer, sampler, prompt_tokens[:-1], caches
+    )

    # stream_generate starts from the last token
    last_token = prompt_tokens[-1:]
@@ -212,28 +219,43 @@ def mlx_generate(
    max_tokens = task.max_tokens or MAX_TOKENS
    generated_text_parts: list[str] = []
    generation_start_time = time.perf_counter()
-    for out in stream_generate(
-        model=model,
-        tokenizer=tokenizer,
-        prompt=last_token,
-        max_tokens=max_tokens,
-        sampler=sampler,
-        logits_processors=logits_processors,
-        prompt_cache=caches,
-        # TODO: Dynamically change prefill step size to be the maximum possible without timing out.
-        prefill_step_size=2048,
-        kv_group_size=KV_GROUP_SIZE,
-        kv_bits=KV_BITS,
+    usage: Usage | None = None
+    in_thinking = False
+    reasoning_tokens = 0
+    think_start = tokenizer.think_start
+    think_end = tokenizer.think_end
+    for completion_tokens, out in enumerate(
+        stream_generate(
+            model=model,
+            tokenizer=tokenizer,
+            prompt=last_token,
+            max_tokens=max_tokens,
+            sampler=sampler,
+            logits_processors=logits_processors,
+            prompt_cache=caches,
+            # TODO: Dynamically change prefill step size to be the maximum possible without timing out.
+            prefill_step_size=2048,
+            kv_group_size=KV_GROUP_SIZE,
+            kv_bits=KV_BITS,
+        ),
+        start=1,
    ):
        generated_text_parts.append(out.text)
        logger.info(out.text)

+        if think_start is not None and out.text == think_start:
+            in_thinking = True
+        elif think_end is not None and out.text == think_end:
+            in_thinking = False
+        if in_thinking:
+            reasoning_tokens += 1
+
        stats: GenerationStats | None = None
        if out.finish_reason is not None:
            stats = GenerationStats(
                prompt_tps=float(prefill_tps or out.prompt_tps),
                generation_tps=float(out.generation_tps),
-                prompt_tokens=int(out.prompt_tokens),
+                prompt_tokens=int(prefill_tokens + out.prompt_tokens),
                generation_tokens=int(out.generation_tokens),
                peak_memory_usage=Memory.from_gb(out.peak_memory),
            )
@@ -245,11 +267,24 @@ def mlx_generate(
                    f"Model generated unexpected finish_reason: {out.finish_reason}"
                )

+            usage = Usage(
+                prompt_tokens=int(out.prompt_tokens),
+                completion_tokens=completion_tokens,
+                total_tokens=int(out.prompt_tokens) + completion_tokens,
+                prompt_tokens_details=PromptTokensDetails(
+                    cached_tokens=prefix_hit_length
+                ),
+                completion_tokens_details=CompletionTokensDetails(
+                    reasoning_tokens=reasoning_tokens
+                ),
+            )
+
        yield GenerationResponse(
            text=out.text,
            token=out.token,
            finish_reason=cast(FinishReason | None, out.finish_reason),
            stats=stats,
+            usage=usage,
        )

        if out.finish_reason is not None:
--- a/src/exo/worker/engines/mlx/utils_mlx.py
+++ b/src/exo/worker/engines/mlx/utils_mlx.py
@@ -165,12 +165,11 @@ def mlx_distributed_init(

                jaccl_coordinator = jaccl_coordinators[bound_instance.bound_node_id]

-                # TODO: update once upstream fixes
                logger.info(
-                    f"rank {rank} MLX_JACCL_DEVICES: {coordination_file} with devices: {jaccl_devices_json}"
+                    f"rank {rank} MLX_IBV_DEVICES: {coordination_file} with devices: {jaccl_devices_json}"
                )
                logger.info(f"rank {rank} MLX_JACCL_COORDINATOR: {jaccl_coordinator}")
-                os.environ["MLX_JACCL_DEVICES"] = coordination_file
+                os.environ["MLX_IBV_DEVICES"] = coordination_file
                os.environ["MLX_RANK"] = str(rank)
                os.environ["MLX_JACCL_COORDINATOR"] = jaccl_coordinator
                group = mx.distributed.init(backend="jaccl", strict=True)
@@ -259,10 +258,10 @@ def shard_and_load(

    logger.info(f"Group size: {group.size()}, group rank: {group.rank()}")

-    # Estimate timeout based on model size
-    base_timeout = float(os.environ.get("EXO_MODEL_LOAD_TIMEOUT", "60"))
+    # Estimate timeout based on model size (5x default for large queued workloads)
+    base_timeout = float(os.environ.get("EXO_MODEL_LOAD_TIMEOUT", "300"))
    model_size_gb = get_weights_size(shard_metadata).in_bytes / (1024**3)
-    timeout_seconds = base_timeout + model_size_gb / 5
+    timeout_seconds = base_timeout + model_size_gb
    logger.info(
        f"Evaluating model parameters with timeout of {timeout_seconds:.0f}s "
        f"(model size: {model_size_gb:.1f}GB)"
@@ -339,8 +338,35 @@ def load_tokenizer_for_model_id(

    # Kimi uses a custom TikTokenTokenizer that transformers 5.x can't load via AutoTokenizer
    if "kimi-k2" in model_id_lower:
+        import importlib.util
+        import types
+
        sys.path.insert(0, str(model_path))
-        from tokenization_kimi import TikTokenTokenizer  # type: ignore[import-not-found]  # noqa: I001
+
+        # Load tool_declaration_ts first (tokenization_kimi imports it with relative import)
+        tool_decl_path = model_path / "tool_declaration_ts.py"
+        if tool_decl_path.exists():
+            spec = importlib.util.spec_from_file_location(
+                "tool_declaration_ts", tool_decl_path
+            )
+            if spec and spec.loader:
+                tool_decl_module = importlib.util.module_from_spec(spec)
+                sys.modules["tool_declaration_ts"] = tool_decl_module
+                spec.loader.exec_module(tool_decl_module)
+
+        # Load tokenization_kimi with patched source (convert relative to absolute import)
+        tok_path = model_path / "tokenization_kimi.py"
+        source = tok_path.read_text()
+        source = source.replace("from .tool_declaration_ts", "from tool_declaration_ts")
+        spec = importlib.util.spec_from_file_location("tokenization_kimi", tok_path)
+        if spec:
+            tok_module = types.ModuleType("tokenization_kimi")
+            tok_module.__file__ = str(tok_path)
+            sys.modules["tokenization_kimi"] = tok_module
+            exec(compile(source, tok_path, "exec"), tok_module.__dict__)  # noqa: S102
+            TikTokenTokenizer = tok_module.TikTokenTokenizer  # type: ignore[attr-defined]  # noqa: N806
+        else:
+            from tokenization_kimi import TikTokenTokenizer  # type: ignore[import-not-found]  # noqa: I001

        hf_tokenizer: Any = TikTokenTokenizer.from_pretrained(model_path)  # pyright: ignore[reportUnknownVariableType,reportUnknownMemberType]

--- a/src/exo/worker/runner/runner.py
+++ b/src/exo/worker/runner/runner.py
@@ -277,9 +277,11 @@ def main(
                                tokenizer.tool_parser,  # pyright: ignore[reportAny]
                            )

+                        completion_tokens = 0
                        for response in mlx_generator:
                            match response:
                                case GenerationResponse():
+                                    completion_tokens += 1
                                    if (
                                        device_rank == 0
                                        and response.finish_reason == "error"
@@ -307,6 +309,7 @@ def main(
                                                    model=shard_metadata.model_card.model_id,
                                                    text=response.text,
                                                    token_id=response.token,
+                                                    usage=response.usage,
                                                    finish_reason=response.finish_reason,
                                                    stats=response.stats,
                                                ),
@@ -320,6 +323,7 @@ def main(
                                                chunk=ToolCallChunk(
                                                    tool_calls=response.tool_calls,
                                                    model=shard_metadata.model_card.model_id,
+                                                    usage=response.usage,
                                                ),
                                            )
                                        )
@@ -535,10 +539,10 @@ def parse_gpt_oss(
                            name=current_tool_name,
                            arguments="".join(tool_arg_parts).strip(),
                        )
-                    ]
+                    ],
+                    usage=response.usage,
                )
                tool_arg_parts = []
-                break
            current_tool_name = recipient

        # If inside a tool call, accumulate arguments
@@ -684,7 +688,7 @@ def parse_tool_calls(
                    tools = [_validate_single_tool(tool) for tool in parsed]
                else:
                    tools = [_validate_single_tool(parsed)]
-                yield ToolCallResponse(tool_calls=tools)
+                yield ToolCallResponse(tool_calls=tools, usage=response.usage)

            except (
                json.JSONDecodeError,
--- a/src/exo/worker/tests/unittests/test_runner/test_event_ordering.py
+++ b/src/exo/worker/tests/unittests/test_runner/test_event_ordering.py
@@ -120,7 +120,7 @@ def patch_out_mlx(monkeypatch: pytest.MonkeyPatch):
    monkeypatch.setattr(mlx_runner, "detect_thinking_prompt_suffix", make_nothin(False))

    def fake_generate(*_1: object, **_2: object):
-        yield GenerationResponse(token=0, text="hi", finish_reason="stop")
+        yield GenerationResponse(token=0, text="hi", finish_reason="stop", usage=None)

    monkeypatch.setattr(mlx_runner, "mlx_generate", fake_generate)

@@ -182,6 +182,8 @@ def test_events_processed_in_correct_order(patch_out_mlx: pytest.MonkeyPatch):
            text="hi",
            token_id=0,
            finish_reason="stop",
+            usage=None,
+            stats=None,
        ),
    )

--- a/tests/start_distributed_test.sh
+++ b/tests/start_distributed_test.sh
@@ -11,7 +11,6 @@ if [[ $# -lt 2 ]]; then
  exit 1
 fi

-
 kind=$1
 shift

@@ -31,14 +30,14 @@ for name in "${hostnames[@]}"; do
  weaved+=("$name" "$ip")
 done

-devs_raw=$(printf "[\"%s\", \"%s\"], " "${weaved[@]}")
+devs_raw=$(printf '["%s", "%s"], ' "${weaved[@]}")
 devs="[${devs_raw%, }]"

 model_ids=("qwen3-30b" "gpt-oss-120b-MXFP4-Q8" "kimi-k2-thinking")

 for model_id in "${model_ids[@]}"; do
-  for i in "${!ips[@]}"; do  
-    { 
+  for i in "${!ips[@]}"; do
+    {
      req="{
        \"model_id\": \"${model_id}\",
        \"devs\": ${devs},
@@ -48,9 +47,8 @@ for model_id in "${model_ids[@]}"; do
      curl -sN \
        -X POST "http://${ips[$i]}:52415/${kind}" \
        -H "Content-Type: application/json" -d "$req" \
-      2>&1 | sed "s/^/\n${hostnames[$i]}@${ips[$i]}: /" || echo "curl to ${hostnames[$i]} failed" && exit 1
+        2>&1 | sed "s/^/\n${hostnames[$i]}@${ips[$i]}: /" || echo "curl to ${hostnames[$i]} failed" && exit 1
    } &
  done
  wait
 done
-
--- a/tmp/config_examples/opencode.json
+++ b/tmp/config_examples/opencode.json
@@ -0,0 +1,18 @@
+{
+  "$schema": "https://opencode.ai/config.json",
+  "model": "exo/mlx-community/gpt-oss-120b-MXFP4-Q8",
+  "provider": {
+    "exo": {
+      "api": "http://localhost:52415/v1",
+      "models": {
+        "mlx-community/gpt-oss-120b-MXFP4-Q8": {
+          "name": "GPT OSS 120B",
+          "limit": {
+            "context": 32768,
+            "output": 8192
+          }
+        }
+      }
+    }
+  }
+}
--- a/tmp/set_rdma_network_config.sh
+++ b/tmp/set_rdma_network_config.sh
@@ -0,0 +1,47 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+PREFS="/Library/Preferences/SystemConfiguration/preferences.plist"
+
+# Remove bridge0 interface
+ifconfig bridge0 &>/dev/null && {
+  ifconfig bridge0 | grep -q 'member' && {
+    ifconfig bridge0 | awk '/member/ {print $2}' | xargs -n1 ifconfig bridge0 deletem 2>/dev/null || true
+  }
+  ifconfig bridge0 destroy 2>/dev/null || true
+}
+
+# Remove Thunderbolt Bridge from VirtualNetworkInterfaces in preferences.plist
+/usr/libexec/PlistBuddy -c "Delete :VirtualNetworkInterfaces:Bridge:bridge0" "$PREFS" 2>/dev/null || true
+
+networksetup -listlocations | grep -q exo || {
+  networksetup -createlocation exo
+}
+
+networksetup -switchtolocation exo
+networksetup -listallhardwareports |
+  awk -F': ' '/Hardware Port: / {print $2}' |
+  while IFS=":" read -r name; do
+    case "$name" in
+    "Ethernet Adapter"*) ;;
+    "Thunderbolt Bridge") ;;
+    "Thunderbolt "*)
+      networksetup -listallnetworkservices |
+        grep -q "EXO $name" ||
+        networksetup -createnetworkservice "EXO $name" "$name" 2>/dev/null ||
+        continue
+      networksetup -setdhcp "EXO $name"
+      ;;
+    *)
+      networksetup -listallnetworkservices |
+        grep -q "$name" ||
+        networksetup -createnetworkservice "$name" "$name" 2>/dev/null ||
+        continue
+      ;;
+    esac
+  done
+
+networksetup -listnetworkservices | grep -q "Thunderbolt Bridge" && {
+  networksetup -setnetworkserviceenabled "Thunderbolt Bridge" off
+} || true
--- a/uv.lock
+++ b/uv.lock
Author	SHA1	Message	Date
Evan	c890bd0beb	switch to mlx-lm 30.5 instead of git main we were waiting for mlx-lm 30.5 features - we can switch to the pypi version now that it is out this also removes some dead code in our pyproject.toml	2026-01-30 11:19:01 +00:00
Evan	9ba61f3733	improve log message in shard downloader closes #1336	2026-01-30 10:35:01 +00:00
rltakashige	d9eca75895	Add usage stats (#1333 ) ## Motivation (Probably) the final missing piece of the Chat Completions API ## Changes Add UsageStats ## Why It Works OpenCode reviewed my PR and gave me stats: <img width="1150" height="802" alt="image" src="https://github.com/user-attachments/assets/ebc06bae-797f-4087-87d5-2f26cf60fc48" /> ## Test Plan ### Automated Testing No tests were broken.	2026-01-30 10:23:08 +00:00
rltakashige	9dabde7e57	Fix bench after recent updates (#1331 ) ## Motivation A lot of changes happened without much attention to the state of exo bench. ## Changes Use TaggedModel for BenchChatCompletion so it serialises properly. Don't break after gpt oss tool call to preserve parity with the rest of the codebase. ## Why It Works <!-- Explain why your approach solves the problem --> ## Test Plan ### Manual Testing <img width="2856" height="678" alt="image" src="https://github.com/user-attachments/assets/2e18cf0d-c0f8-467c-9763-1a6a59c8a327" /> Also tested GPT OSS tool calling in OpenCode	2026-01-29 19:14:40 +00:00
ciaranbor	a31942ce12	Ciaran/image non streaming (#1328 ) ## Motivation The dashboard UI attempted to parse all image generation responses as SSE streams, even when streaming was disabled. This broke non-streaming image generation. ## Changes - Parse JSON responses directly when not streaming, use SSE parser only when stream=true AND partialImages > 0 - explicitly disable partial images when not streaming ## Why It Works Both API and dashboard now use the same condition (stream && partialImages > 0) to determine response format, ensuring correct parsing. ## Test Plan ### Manual Testing Non-streamed image generation results appear in the UI. Streamed image generation still works	2026-01-29 17:24:32 +00:00
Alex Cheema	7cc313b22a	Treat Swift/Xcode build warnings as errors (#1322 ) ## Motivation Warnings that go unchecked tend to accumulate and hide real issues. Treating them as errors ensures they are addressed immediately, both locally during development and in CI. ## Changes Added `SWIFT_TREAT_WARNINGS_AS_ERRORS = YES` and `GCC_TREAT_WARNINGS_AS_ERRORS = YES` to the project-level Debug and Release build configurations in `project.pbxproj`. This applies to all targets (EXO, EXOTests, EXOUITests). ## Why It Works Xcode's `SWIFT_TREAT_WARNINGS_AS_ERRORS` and `GCC_TREAT_WARNINGS_AS_ERRORS` build settings promote Swift and C/ObjC warnings to errors at compile time. Setting them at the project level means all targets inherit the policy without needing per-target or CI-level overrides. ## Test Plan ### Manual Testing - Built the EXO scheme in Release configuration with `xcodebuild` — no warning-as-error failures from Swift or C/ObjC sources. ### Automated Testing - CI already builds with `-configuration Release`, so it will automatically enforce warnings-as-errors via the inherited project settings — no CI changes needed.	2026-01-29 17:15:49 +00:00
rltakashige	2837225dc7	Load pipeline layers sequentially (#1329 ) ## Motivation Slightly annoyed by needing this change, but same story as for tensor loading...	2026-01-29 17:08:38 +00:00
Jake Hillion	e4c6a7dbb4	nix: add Python packaging with uv2nix Add uv2nix to build Python packages from uv.lock. This creates a fully Nix-managed Python environment with the Rust bindings injected via overlay. Changes: - Add pyproject-nix, uv2nix, and pyproject-build-systems flake inputs - Create python/parts.nix with overlays to inject Nix-built Rust wheel - Export packages.exo on macOS (wraps exo/exo-master/exo-worker with dashboard) - Add checks.lint (ruff, all platforms) and checks.pytest (macOS only) - Simplify CI typecheck job using nicknovitski/nix-develop action - Delete .github/actions/typecheck composite action (no longer needed) - Add no-build-package for MLX packages in pyproject.toml (use wheels) The Python build is currently macOS-only since MLX requires Metal. Linux support will be added once the pyproject dependencies are simplified. Test plan: - Run `nix flake check` on macOS to verify pytest and lint pass - Build exo package on macOS: `nix build .#exo` - Verify CI pipeline passes with simplified typecheck job	2026-01-29 16:35:58 +00:00
Evan	b1e88a3d06	shfmt adds shfmt, a shell formatter, and formats the bash files	2026-01-29 15:24:36 +00:00
Jake Hillion	ebeddfb308	mlx: build with Nix (#1285 ) In order to make testing and deployment simpler and more reproducible, we want to provide a Nix derivation for our macOS .app build. We already build the Rust and dashboard with Nix, but so far the Python has been blocked because we haven't had an MLX build. This change adds a Metal compiler derivation that uses `requireFile` to be provided a NAR of the unfree macOS Metal compiler. It is documented how to get this file, but effectively you have to trigger the download, mount the DMG, and NAR the result. Once this is added to the store by hash we can build MLX using it. The MLX build itself is quite self explanatory. Test plan: - CI. We follow the instructions to grab the Metal compiler. Once this is in Cachix we should really never do this again, and I can pin the path too to ensure it doesn't leave. - MLX tests run as part of the MLX derivation's build. They pass. - `NIXPKGS_ALLOW_UNFREE=1 nix build .#mlx.passthru.tests.mlxTest --impure --option sandbox false` --------- Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>	2026-01-29 14:07:00 +00:00
Alex Cheema	9111575997	Add startup delay and update network setup message (#1309 ) ## Summary - Add 20-second startup delay to wait for macOS to finish network setup after boot - Update user-facing message to clarify the service configures local networking, disables Thunderbolt Bridge (preventing packet storms), and installs a Network Location ## Test plan - [ ] Manual verification of Swift syntax - [ ] Test network setup on macOS device after reboot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: rltakashige <rl.takashige@gmail.com>	2026-01-29 13:05:50 +00:00
Sami Khan	ffacabe7e4	Fix uninstall button error (#1306 ) ## Motivation Fix "Network setup script failed" error when clicking uninstall button and resolve Xcode compiler warnings. ## Changes - NetworkSetupHelper.swift: Add \|\| true guards and explicit return 0 in find_and_enable_thunderbolt_bridge to prevent script failures with set -euo pipefail - ThunderboltBridgeService.swift: Use withCString and withUnsafeMutablePointer for Authorization API calls to fix pointer lifetime warnings - EXOApp.swift: Mark showNotification as nonisolated to fix main actor isolation warning ## Why It Works - The uninstall script's Thunderbolt re-enable function could exit non-zero in edge cases (no bridges, no matches). Since this is a cleanup step, failures should not abort uninstall. - Swift requires explicit pointer lifetime management when passing strings/structs to C APIs. - showNotification is called from a nonisolated delegate method and uses thread-safe APIs. ## Test Plan ### Manual Testing Hardware: MacBook Pro - Clicked Uninstall button, verified it completes without error - Built in Xcode, verified no warnings ### Automated Testing N/A	2026-01-29 12:57:48 +00:00
rltakashige	9e58a57599	Add RDMA caveats to README.md (#1316 ) ## Motivation Running RDMA from source is not well documented as is. Several surprising things that took time to debug internally too. App should be updated to detect MacOS versions in future.	2026-01-28 18:44:00 +00:00
Evan Quiney	748a026071	fix configdata validation for kimi-k2 (#1314 ) ## motivation our shard downloader could not correctly fetch data for kimi-k2, as it deferred some values to a text_config field. ## changes config_data now prioritizes this field if it exists in information like layer_count	2026-01-28 14:29:36 +00:00
Alex Cheema	f1a2d054ec	Update tagline to "Run frontier AI locally" (#1313 ) - Update README tagline from "Run your own AI cluster at home with everyday devices" to "Run frontier AI locally"	2026-01-28 12:38:14 +00:00
Alex Cheema	b3c8f85fc8	Update MLX to 0.30.4 (#1311 ) ## Summary - Bump mlx from 0.30.3 to 0.30.4 ## Test plan - [x] `uv lock` succeeds - [x] Type checking passes (`uv run basedpyright`) - [x] Run inference tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 04:30:21 -08:00
rltakashige	a562114ba5	Add Kimi K2.5 support (#1302 ) ## Motivation <!-- Why is this change needed? What problem does it solve? --> <!-- If it fixes an open issue, please link to the issue here --> ## Changes <!-- Describe what you changed in detail --> ## Why It Works <!-- Explain why your approach solves the problem --> ## Test Plan ### Manual Testing <!-- Hardware: (e.g., MacBook Pro M1 Max 32GB, Mac Mini M2 16GB, connected via Thunderbolt 4) --> <!-- What you did: --> <!-- - --> ### Automated Testing <!-- Describe changes to automated tests, or how existing tests cover this change --> <!-- - --> --------- Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com>	2026-01-28 05:44:19 +00:00
Evan Quiney	991d278119	replace nix fmt with treefmt in just lint (#1301 ) man evaluating the nix flake is so slow. treefmt speeeedy	2026-01-27 17:03:01 +00:00
rltakashige	c55cbf6739	Add mlx lm style tensor sharding for Minimax (#1299 ) ## Motivation Broken right now. We'll potentially add a better one later ## Changes <!-- Describe what you changed in detail --> ## Why It Works <!-- Explain why your approach solves the problem --> ## Test Plan ### Manual Testing Used for evals without any issue. ### Automated Testing <!-- Describe changes to automated tests, or how existing tests cover this change --> <!-- - -->	2026-01-27 15:29:06 +00:00