Commit Graph

1986 Commits

Author SHA1 Message Date
Sami Khan
ffacabe7e4 Fix uninstall button error (#1306)
## Motivation

Fix "Network setup script failed" error when clicking uninstall button
and resolve Xcode compiler warnings.

## Changes

- NetworkSetupHelper.swift: Add || true guards and explicit return 0 in
find_and_enable_thunderbolt_bridge to prevent script failures with set
-euo pipefail
- ThunderboltBridgeService.swift: Use withCString and
withUnsafeMutablePointer for Authorization API calls to fix pointer
lifetime warnings
- EXOApp.swift: Mark showNotification as nonisolated to fix main actor
isolation warning

## Why It Works

- The uninstall script's Thunderbolt re-enable function could exit
non-zero in edge cases (no bridges, no matches). Since this is a cleanup
step, failures should not abort uninstall.
- Swift requires explicit pointer lifetime management when passing
strings/structs to C APIs.
- showNotification is called from a nonisolated delegate method and uses
thread-safe APIs.

## Test Plan

### Manual Testing
Hardware: MacBook Pro

- Clicked Uninstall button, verified it completes without error
- Built in Xcode, verified no warnings   

### Automated Testing
N/A
2026-01-29 12:57:48 +00:00
rltakashige
9e58a57599 Add RDMA caveats to README.md (#1316)
## Motivation

Running RDMA from source is not well documented as is. Several
surprising behaviors took time to debug internally too.

The app should be updated to detect macOS versions in the future.
2026-01-28 18:44:00 +00:00
Evan Quiney
748a026071 fix configdata validation for kimi-k2 (#1314)
## motivation
our shard downloader could not correctly fetch data for kimi-k2, as it
deferred some values to a text_config field.
## changes
config_data now prioritizes this field, if it exists, when reading values
such as layer_count
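
a minimal sketch of the lookup order described above, assuming a Hugging Face style config dict. the helper name `get_config_value` and the key `num_hidden_layers` are illustrative assumptions, not the names used in exo's config_data, and the layer counts are made-up values.

```python
from typing import Any


def get_config_value(config: dict[str, Any], key: str) -> Any:
    """Return `key` from the nested text_config if present, else from the top level."""
    text_config = config.get("text_config")
    if isinstance(text_config, dict) and key in text_config:
        # The nested field wins when the model defers values to text_config.
        return text_config[key]
    return config[key]


# Illustrative configs only; the numbers are not taken from any real model.
flat = {"num_hidden_layers": 32}
nested = {"num_hidden_layers": 1, "text_config": {"num_hidden_layers": 61}}
```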
2026-01-28 14:29:36 +00:00
Alex Cheema
f1a2d054ec Update tagline to "Run frontier AI locally" (#1313)
- Update README tagline from "Run your own AI cluster at home with
everyday devices" to "Run frontier AI locally"
2026-01-28 12:38:14 +00:00
Alex Cheema
b3c8f85fc8 Update MLX to 0.30.4 (#1311)
## Summary
- Bump mlx from 0.30.3 to 0.30.4

## Test plan
- [x] `uv lock` succeeds
- [x] Type checking passes (`uv run basedpyright`)
- [x] Run inference tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-28 04:30:21 -08:00
rltakashige
a562114ba5 Add Kimi K2.5 support (#1302)
---------

Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com>
2026-01-28 05:44:19 +00:00
Evan Quiney
991d278119 replace nix fmt with treefmt in just lint (#1301)
man evaluating the nix flake is so slow. treefmt speeeedy
2026-01-27 17:03:01 +00:00
rltakashige
c55cbf6739 Add mlx lm style tensor sharding for Minimax (#1299)
## Motivation

Tensor sharding for Minimax is broken right now. We'll potentially add a better implementation later.

## Test Plan

### Manual Testing
Used for evals without any issue.

2026-01-27 15:29:06 +00:00
Alex Cheema
bd4f0bf048 Fix download speed/ETA display for re-downloads (#1294)
## Motivation

After the download verification fix, when files are re-downloaded due to
upstream changes (size mismatch), the download progress displays
correctly (completion %, bytes, file counts), but speed shows 0 B/s and
ETA shows "--" for both overall and per-file progress.

## Changes

- Modified `on_progress_wrapper` in `src/exo/download/download_utils.py`
to detect re-download scenarios
- Added re-download detection: when `curr_bytes < previous_downloaded`,
the file was deleted and download restarted
- On re-download: reset `start_time` to current time and set
`downloaded_this_session = curr_bytes`
- Added two tests to `test_download_verification.py` covering
re-download and continuing download scenarios

## Why It Works

The bug occurred because:
1. `file_progress` is initialized with the OLD local file size (e.g.,
1.5GB)
2. When `_download_file` detects size mismatch, it deletes the file and
starts fresh
3. Progress callback receives small `curr_bytes` (e.g., 8KB) but
compares against old size
4. `downloaded_this_session = 0 + (8KB - 1.5GB) = -1.5GB` (negative!)
5. Negative session bytes → 0 or negative speed → ETA shows "--"

The fix detects when `curr_bytes < previous_downloaded` (indicating
re-download started) and resets tracking to treat it as a fresh
download.
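
The reset logic can be sketched roughly as follows. This is a simplified illustration, not the actual `on_progress_wrapper`: the `FileProgress` dataclass, its field names, and the `on_progress` signature are assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileProgress:
    downloaded: int  # bytes seen at the previous callback (initially the old local size)
    downloaded_this_session: int = 0
    start_time: float = field(default_factory=time.time)


def on_progress(progress: FileProgress, curr_bytes: int, now: Optional[float] = None) -> float:
    """Update progress in place and return the session speed in bytes per second."""
    now = time.time() if now is None else now
    if curr_bytes < progress.downloaded:
        # Fewer bytes than before: the file was deleted and the download
        # restarted, so treat this as a fresh download.
        progress.start_time = now
        progress.downloaded_this_session = curr_bytes
    else:
        progress.downloaded_this_session += curr_bytes - progress.downloaded
    progress.downloaded = curr_bytes
    elapsed = now - progress.start_time
    return progress.downloaded_this_session / elapsed if elapsed > 0 else 0.0
```

Without the `curr_bytes < progress.downloaded` branch, the first callback after a restart would add a large negative delta to the session counter, which is the failure described in steps 3 to 5.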

## Test Plan

### Manual Testing
- Download a model, modify a file to change its size, restart exo,
verify speed/ETA display correctly during re-download

### Automated Testing
- Added `TestProgressResetOnRedownload` class with two tests:
  - `test_progress_resets_correctly_on_redownload`: Verifies progress
    resets correctly when re-download starts
  - `test_progress_accumulates_on_continuing_download`: Verifies
    continuing downloads still accumulate correctly
- All 11 download tests pass
- Type checking (basedpyright): 0 errors
- Linting (ruff): All checks passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 21:56:58 +00:00
rltakashige
cd8c01b7c8 Fix kv prefix cache (#1262)
## Motivation

OpenCode sends very large prompts, most of which are repeated on the
next call.

## Changes

Add prefix caching, reducing average time in prefill (in testing) from
40 seconds to 4. This massively improves user experience.

Also evicts KV caches from this prefix cache in an LRU-style manner.

## Why It Works

We no longer prefill repeatedly but rather use kv cache stored in
memory. A future update may want to use storage to make the prefix cache
larger.
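
A rough sketch of the idea, not the implementation in this PR: an in-memory cache keyed by token sequence, with longest-shared-prefix lookup and LRU eviction. The `PrefixCache` class, its method names, and the use of opaque objects in place of real KV caches are all assumptions for illustration.

```python
from collections import OrderedDict
from typing import Any, Optional


class PrefixCache:
    """Maps token sequences to KV caches; least recently used entries are evicted."""

    def __init__(self, max_entries: int = 4) -> None:
        self.max_entries = max_entries
        self._entries: "OrderedDict[tuple[int, ...], Any]" = OrderedDict()

    def put(self, tokens: list[int], kv_cache: Any) -> None:
        key = tuple(tokens)
        self._entries[key] = kv_cache
        self._entries.move_to_end(key)  # mark as most recently used
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the least recently used entry

    def longest_prefix(self, tokens: list[int]) -> tuple[int, Optional[Any]]:
        """Return (shared_token_count, kv_cache) for the best matching entry."""
        best_key, best_len = None, 0
        for key in self._entries:
            shared = 0
            for cached_token, new_token in zip(key, tokens):
                if cached_token != new_token:
                    break
                shared += 1
            if shared > best_len:
                best_key, best_len = key, shared
        if best_key is None:
            return 0, None
        self._entries.move_to_end(best_key)  # a hit refreshes recency
        return best_len, self._entries[best_key]
```

In a real engine the returned KV cache would still need to be trimmed to the shared token count before prefill resumes from that position; the sketch leaves that step out.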

## Test Plan

### Manual Testing
Tested speedup on OpenCode

### Automated Testing
Added a lot of tests

---------

Co-authored-by: David Hind <davehind@yahoo.co.uk>
2026-01-26 20:13:58 +00:00
rltakashige
59e991ce15 Only ignore message if actually empty (#1292)
2026-01-26 19:33:23 +00:00
ciaranbor
ffba340e70 Ciaran/image quantization (#1272)
## Motivation

Enable users to select and use quantized variants (8-bit, 4-bit) of
image models

## Changes

Use exolabs HF org for image models

## Why It Works

Quantized versions have been uploaded to exolabs HF org

## Test Plan

Loaded and ran different quantized variants. Confirmed lower memory
usage and different outputs for the same seed. Verified chat completion
still works.
2026-01-26 19:25:05 +00:00
rltakashige
9968abe816 Leo/fix basic model shard (#1291)
## Motivation

Some models, on some configurations, would have several issues that
caused the model to be stuck on loading.

## Changes

Several loading issues came from upstream mlx lm shard loading for
tensor parallelism.
GLM 4.7 Flash now uses GLM 4.7 Lite.
The remaining issues came from mlx memory not being properly
released before calling mx.eval(model), causing the system to run out of
memory.

## Test Plan

### Manual Testing
Done a bunch (thanks @AlexCheema), hopefully exhaustive. 

### Automated Testing
A bunch of automated testing is imminent but not landed yet.

---------

Co-authored-by: Alex Cheema <alexcheema123@gmail.com>
2026-01-26 17:49:09 +00:00
Alex Cheema
0e30b0830f Fix download system for upstream file changes (#1290)
## Motivation

When upstream files change on Hugging Face, exo's download system
doesn't detect the change and downloads get stuck. The only workaround
is deleting `~/.exo/models/` and the cache.

Root causes:
1. Existing files are never re-verified against remote metadata
2. File list cache is never invalidated, causing stale sizes to be used

## Changes

1. **Verify existing files against remote size** (`_download_file`):
Before returning early for existing files, verify the local file size
matches remote. If mismatched, delete and re-download. If network fails
(offline), fall back to trusting local file.

2. **Always try fresh file list first** (`fetch_file_list_with_cache`):
Always attempt to fetch fresh data from Hugging Face. On success, update
the cache. On failure, fall back to cached data if available.

3. **Clear cache on model delete** (`delete_model`): When a model is
deleted, also delete its cache entry to prevent stale metadata.

## Why It Works

- **Online**: Stale local files are detected via size mismatch and
re-downloaded. Fresh file list is always fetched and cache is updated.
- **Offline with cache**: Existing files are trusted. Cached file list
is used as fallback.
- **Offline without cache**: Fails gracefully (can't download without
knowing what files to get).

The size check is O(1) so there's no performance impact. Hash
verification still happens after download completes (existing behavior).
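
The size check and offline fallback can be sketched as below. This is an illustration under assumptions, not the body of `_download_file`: the `needs_download` helper and the injected `fetch_remote_size` callable are hypothetical, and network failure is modeled as any `OSError`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def needs_download(local_path: Path, fetch_remote_size: Callable[[], int]) -> bool:
    """Return True if the file must be downloaded or re-downloaded."""
    if not local_path.exists():
        return True
    try:
        remote_size = fetch_remote_size()
    except OSError:
        # Offline or network failure: fall back to trusting the local file.
        return False
    if local_path.stat().st_size != remote_size:
        # Stale local copy: delete it so the download starts fresh.
        local_path.unlink()
        return True
    return False


def offline() -> int:
    """Stand-in for a metadata request that fails without a network."""
    raise ConnectionError("network unreachable")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "weights.bin"
    path.write_bytes(b"x" * 10)
    unchanged = needs_download(path, lambda: 10)    # sizes match, keep the file
    offline_result = needs_download(path, offline)  # offline, trust the file
    changed = needs_download(path, lambda: 20)      # mismatch, file is deleted
    still_exists = path.exists()
```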

## Test Plan

### Manual Testing
- Download a model, manually modify a local file's content, restart exo,
verify it re-downloads

### Automated Testing
Added 9 new tests in
`src/exo/download/tests/test_download_verification.py`:
- Re-download when file size changes upstream
- Skip download when file size matches
- Offline fallback uses local file
- Fetch fresh file list and update cache
- Fall back to cache when fetch fails
- Error propagates when no cache exists
- Model delete clears cache
- Delete when only cache exists
- Delete nonexistent model

All tests pass: `uv run pytest src/exo/download/tests/ -v`

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 09:14:58 -08:00
Alex Cheema
44453c4c8b Remove change-detection checks from info gatherer monitors (#1283)
## Summary
- When a node times out, its info gets cleared from state. The monitor
functions only sent data when something changed, leaving no mechanism to
re-populate this info after a timeout.
- Removes change-detection checks from `_monitor_misc`,
`_monitor_system_profiler_thunderbolt_data`, `_watch_system_info`, and
`_monitor_thunderbolt_bridge_status` so data is sent periodically
regardless of whether it changed.

## Test plan
- [ ] Verify type checker passes: `uv run basedpyright`
- [ ] Verify linter passes: `uv run ruff check`
- [ ] Verify tests pass: `uv run pytest`
- [ ] Manually test that node info is re-populated after a timeout by
observing cluster behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 12:23:22 +00:00
Jake Hillion
1290e8ed9f dashboard: fix prettier-svelte rebuilding on every file change
The prettier-svelte package was rebuilding whenever any file in the
repository changed because dashboardStubSrc referenced inputs.self
directly. Since inputs.self's store path hash is computed from the
entire repository contents, any file modification invalidated the
derivation.

Added dashboardLockfileSrc using lib.cleanSourceWith to filter
inputs.self to only include package.json and package-lock.json from
the dashboard directory. Updated dashboardStubSrc to reference this
filtered source instead of inputs.self directly.

This ensures prettier-svelte only rebuilds when the lockfiles actually
change, significantly improving build caching for unrelated changes.

Test plan:
- Built prettier-svelte with nix build .#prettier-svelte
- Modified src/exo/main.py and rebuilt - same store path (no rebuild)
- Modified dashboard/package.json and rebuilt - different store path (rebuild triggered)
- Ran nix flake check successfully
2026-01-26 12:02:05 +00:00
Evan Quiney
d93db3d6bf re enable the evil network script (#1277)
seems like we still need the interfaces to be routable for mdns. at
least we're not dependent on this behaviour anymore.
2026-01-24 13:36:06 +00:00
Alex Cheema
ff4a2022f7 Revert state compaction (#1259) (#1275)
## Summary

Reverts the state compaction feature (#1259) to investigate issues with
nodes staying as "unknown" after joining a cluster.

## Test plan

- [ ] Verify nodes properly show up after joining cluster
- [ ] Verify state catchup works correctly without compaction

🤖 Generated with [Claude Code](https://claude.com/claude-code)
2026-01-23 16:29:48 -08:00
rltakashige
cee48f6f34 Parse GPT OSS tool calling (#1271)
## Motivation

<img width="3162" height="858" alt="image"
src="https://github.com/user-attachments/assets/e552f373-620a-4522-894b-6f93fd7f1e50"
/>

## Changes

OpenAI Harmony StreamableParser does parsing for us.

## Why It Works

<img width="3230" height="588" alt="image"
src="https://github.com/user-attachments/assets/81f8a43e-c04b-4bd0-9fd0-65e9b5f6ea1d"
/>
2026-01-23 20:43:53 +00:00
Evan Quiney
2b67e84a03 state compaction (#1259)
## motivation

a node joining a long-running cluster would bring down networking. this
attempts to mitigate that issue by compacting the state for catching up
new devices

## changes

introduces a new topic ("state_catchup") over which a full state can be
sent. currently the master sends the worker + api this new state, and
they update only if they have no other events applied - otherwise usual
NACK systems function

## testing

manually tested on two and eight nodes - it's an improvement, not a fix

Co-authored-by: rltakashige <rl.takashige@gmail.com>
2026-01-23 20:32:49 +00:00
Alex Cheema
7204fdeb4a Restore Thunderbolt Bridge LaunchDaemon (#1270)
## Motivation

The LaunchDaemon approach for disabling Thunderbolt Bridge was removed
in commit 43f12f5d and replaced with dynamic cycle detection. However,
the LaunchDaemon runs automatically on reboot, ensuring the bridge is
always disabled before it can cause packet storms.

## Changes

- Restore `NetworkSetupHelper.promptAndInstallIfNeeded()` to install a
LaunchDaemon that disables Thunderbolt Bridge on startup
- Show user prompt explaining what will be installed before requesting
admin password
- Remove old cleanup-only logic from `EXOApp.swift`
- Installer removes any existing installation before installing fresh
(handles upgrades)

## Why It Works

The LaunchDaemon runs at boot with `RunAtLoad=true` and periodically
(every ~30 min), destroying bridge0 and disabling Thunderbolt Bridge
before it can cause packet storms. The daemon is only installed
once—`daemonAlreadyInstalled()` checks script content and plist config
match before prompting.

## Test Plan

### Manual Testing
- Run app first time → should see prompt → click Install → enter admin
password → daemon installed
- Run app again → no prompt (already installed)
- Reboot → bridge0 should be destroyed/disabled automatically
- Check daemon: `launchctl list | grep io.exo.networksetup`
- Check files: `/Library/LaunchDaemons/io.exo.networksetup.plist`,
`/Library/Application Support/EXO/disable_bridge.sh`

### Automated Testing
N/A - requires admin privileges and system-level changes

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 20:25:37 +00:00
Evan Quiney
ec345a4315 fix: deprioritise uncertain ethernet devices (#1267)
we were placing coordinators on uncertain devices (enX+) that are listed
as "USB LAN" - these could be thunderbolt ports breaking RDMA instances
2026-01-23 20:13:28 +00:00
ciaranbor
9967dfa734 Prevent conversation collision (#1266)
## Motivation

When a user switched conversations while a response was still streaming,
the streaming content would be written to the currently selected
conversation instead of the original one. For streamed image generation,
each partial image would be written to the open conversation.

## Changes

Added helper methods to track and update the correct conversation during
streaming:
- updateConversationMessage() - Update a message in a specific
conversation by ID
- syncActiveMessagesIfNeeded() - Sync this.messages from target
conversation only if it's active
- conversationExists() - Check if a conversation still exists (handles
mid-stream deletion)
- persistConversation() - Persist a specific conversation to storage
- addMessageToConversation() - Add a message directly to a specific
conversation


## Why It Works

Capturing the conversation ID at the start of the request ensures we
know which conversation to update

## Test Plan

### Manual Testing

Tested switching conversation during generation across each model type
2026-01-23 19:59:08 +00:00
ciaranbor
23fd37fe4d Add FLUX.1-Krea-dev model (#1269)
## Why It Works

Same implementation as FLUX.1-dev, just different weights
2026-01-23 19:48:24 +00:00
Alex Cheema
d229df38f9 Fix placement filter to use subset matching instead of exact match (#1265)
## Motivation

When using the dashboard's instance placement filter (clicking nodes in
the topology), it was filtering to placements that use exactly the
selected nodes. This isn't the expected behavior - users want to see
placements that include all selected nodes, but may also include
additional nodes.

For example, selecting nodes [A, B] should show placements using [A, B],
[A, B, C], [A, B, C, D], etc. - not just [A, B].

## Changes

- Added `required_nodes` parameter to `place_instance()` in
`placement.py`
- Filter cycles early in placement to only those containing all required
nodes (subset matching)
- Simplified `api.py` by removing the subgraph topology filtering and
passing `required_nodes` directly to placement
- Renamed internal `node_ids` variable to `placement_node_ids` to avoid
shadowing the parameter

## Why It Works

By filtering cycles at the placement level using
`required_nodes.issubset(cycle.node_ids)`, we ensure that only cycles
containing all the user-selected nodes are considered. This happens
early in the placement algorithm, so we don't waste time computing
placements that would be filtered out later.
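
A minimal sketch of the subset filter, assuming a simplified `Cycle` type. Only `required_nodes.issubset(cycle.node_ids)` comes from the description above; the `Cycle` dataclass and the `filter_cycles` helper are stand-ins for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    node_ids: frozenset[str]


def filter_cycles(cycles: list[Cycle], required_nodes: set[str]) -> list[Cycle]:
    """Keep only cycles that contain every required node (extra nodes are allowed)."""
    return [cycle for cycle in cycles if required_nodes.issubset(cycle.node_ids)]


cycles = [
    Cycle(frozenset({"A", "B"})),
    Cycle(frozenset({"A", "B", "C"})),
    Cycle(frozenset({"A", "C"})),
    Cycle(frozenset({"B", "C", "D"})),
]
selected = filter_cycles(cycles, {"A", "B"})
```

Here `selected` keeps the [A, B] and [A, B, C] cycles and drops the other two. An empty `required_nodes` set is a subset of everything, so no filter selection leaves all cycles in place.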

## Test Plan

### Manual Testing
- Select nodes in the dashboard topology view
- Verify that placements shown include all selected nodes (but may
include additional nodes)
- Verify that placements not containing the selected nodes are filtered
out

### Automated Testing
- Existing placement tests pass
- `uv run pytest src/exo/master/tests/ -v` - 37 tests pass

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 19:40:31 +00:00
Alex Cheema
8a595fee2f Fix Thunderbolt bridge cycle detection to include 2-node cycles (#1261)
## Motivation

Packet storms occur with Thunderbolt bridge enabled on 2 machines
connected by Thunderbolt, not just 3+ node cycles as previously assumed.
The cycle detection was too conservative and missed this case.

## Changes

- Changed the minimum cycle length from >2 (3+ nodes) to >=2 (2+ nodes)
- Updated the early return threshold from `< 3` to `< 2` enabled nodes
- Updated docstring to reflect the new behavior

## Why It Works

A Thunderbolt bridge loop between just 2 machines can still create
broadcast storms when both have the bridge enabled. The previous
threshold of 3+ was based on an incorrect assumption that 2-node
connections wouldn't cause this problem.

## Test Plan

### Manual Testing
- Tested with 2 machines connected via Thunderbolt with bridge enabled
- Confirmed packet storms occur in this configuration
- Verified the fix correctly detects and handles 2-node cycles

### Automated Testing
- Existing topology tests cover cycle detection logic

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 19:34:48 +00:00
ciaranbor
c8571a17a3 Fix guidance (#1264)
## Motivation

Previously, we only handled the user-provided guidance parameter for CFG
models.

## Changes

Just pass the parameter to model setup
2026-01-23 19:13:45 +00:00
Evan Quiney
771a86331b fix instance port assignment (#1268)
we were overassigning the port 52414 to instances because of an error in placement
2026-01-23 18:37:40 +00:00
Jake Hillion
6dbbe7797b downloads: add download and delete buttons to downloads UI
The downloads page showed model download progress but provided no way
for users to trigger downloads or remove completed models from disk.

Added API endpoints (POST /download/start, DELETE /download/{node_id}/{model_id})
that send StartDownload and DeleteDownload commands via the download_command_sender.
Updated the dashboard downloads page with per-model buttons: a download button
for incomplete downloads and a delete button for completed ones.

This allows users to manage downloads directly from the UI without needing
to trigger downloads through other means.

Test plan:
- Deployed on a 3 machine cluster. Did several downloads/deletions - all
  work and the dashboard updates relatively fluently. It takes roughly 5
  seconds to render a 131GB model deletion which isn't too bad.
2026-01-23 18:11:17 +00:00
Jake Hillion
9357503c6f downloads: refactor to run at node level
The Worker previously owned the ShardDownloader directly via dependency
injection, which prevented --no-worker nodes from downloading and made
it impossible for multiple Workers to share a single downloader instance.

Moved download functionality to a new DownloadCoordinator component at
the Node level that communicates via the DOWNLOAD_COMMANDS pub/sub topic.
Workers now send StartDownload commands instead of calling the downloader
directly, and receive progress updates through the event-sourced state.

This decouples downloads from the Worker lifecycle and enables future
features like UI-triggered downloads to specific nodes and multi-worker
download sharing.

Test plan:
- Mostly tested in the next PR that adds explicit downloads/deletions to
  the dashboard.
- Started a model that isn't downloaded - it works.
2026-01-23 18:04:09 +00:00
ciaranbor
ba19940828 Fix regenerate for image models (#1263)
## Motivation

The 'regenerate' button was hardcoded to chat completion. Clicking
'regenerate' for an image request would result in an error after the model
is loaded.

## Changes

Store request type and dispatch to appropriate request upon regeneration

## Why It Works

We make sure to repeat the same request type as was performed originally

## Test Plan

### Manual Testing

Checked 'regenerate' works for chat completion, image generation, image
editing
2026-01-23 16:33:01 +00:00
Jake Hillion
f255345a1a dashboard: decouple prettier-svelte from dashboard source
The prettier-svelte formatter depended on the full dashboard build
(dashboardFull), causing the devshell to rebuild whenever any dashboard
source file changed.

Created a deps-only dream2nix derivation (deps.nix) that uses a stub
source containing only package.json, package-lock.json, and minimal
files for vite to succeed. Updated prettier-svelte to use this
derivation instead of dashboardFull.

The stub source is constant unless lockfiles change, so prettier-svelte
and the devshell no longer rebuild when dashboard source files are
modified.

Test plan:
- nix flake check passed
- nix fmt successfully formatted svelte files
2026-01-23 15:16:48 +00:00
ciaranbor
a1939c89f2 Enable UI settings for image editing (#1258)
## Motivation

Image editing was missing UI controls for quality, output format, and
advanced parameters that text-to-image generation already supported.

## Changes

- Added quality, output_format, and advanced_params to image edit API
endpoints
- Extended isImageModel check to include image editing models

## Why It Works

The API now accepts and forwards these settings for image edits, and the
UI displays the appropriate controls for image editing models.

## Test Plan

### Manual Testing

Verified parameters can be set in UI and that they propagate through to
model inference
2026-01-23 13:37:25 +00:00
ciaranbor
cb9c9ee55c Enable generating multiple images. Optionally stream partial images (#1251)
## Motivation

Support OpenAI API `n` setting

## Changes

- Users can select `n` to generate more than one image with the same
prompt
- each image uses a different seed -> different results
- `stream` and `partial_images` settings can be overwritten in UI
2026-01-23 11:19:58 +00:00
Alex Cheema
df240f834d Fix GLM and Kimi tool calling crashes (#1255)
## Motivation

Fixes tool calling crashes with GLM-4.7-Flash and Kimi-K2 models.

Related: #1254

Two distinct issues were causing crashes:
1. **Tool parser crashes** - The upstream GLM47 and Kimi tool parsers
call `.group()` on regex matches without checking for `None`, causing
`AttributeError` when the model outputs malformed tool calls
2. **Chat template crashes** - GLM's chat template expects
`tool_calls[].function.arguments` to be a dict, but OpenAI format
provides it as a JSON string, causing `'str object' has no attribute
'items'`

## Changes

**`src/exo/worker/runner/runner.py`:**
- Add `patch_glm_tokenizer()` - fixed version of mlx_lm's glm47 parser
with None checks
- Fix `patch_kimi_tokenizer()` - add None checks before calling
`.group()` on regex matches
- Add `ValueError` and `AttributeError` to exception handling in
`parse_tool_calls()`

**`src/exo/worker/engines/mlx/utils_mlx.py`:**
- Add `_normalize_tool_calls()` - parses
`tool_calls[].function.arguments` from JSON string to dict for templates
that expect dicts (like GLM-4.7-Flash)

## Why It Works

1. **Parser fixes**: By checking if regex matches are `None` before
calling `.group()`, we can raise a proper `ValueError` instead of
crashing with `AttributeError`

2. **Template fix**: The GLM-4.7-Flash chat template iterates over
arguments with `.items()`:
   ```jinja2
   {% set _args = tc.arguments %}{% for k, v in _args.items() %}
   ```
OpenAI format has `arguments` as a JSON string.
`_normalize_tool_calls()` parses this to a dict before passing to the
template.
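
Both fixes can be sketched in a few lines. This is an illustration, not the code in `runner.py` or `utils_mlx.py`: the function names, the message shape, and especially the `<tool_call>` text format in the regex are assumptions made up for the example.

```python
import json
import re
from typing import Any

# Hypothetical tool call format, used only to demonstrate the None check.
TOOL_CALL_RE = re.compile(r"<tool_call>(\w+)\n(\{.*\})</tool_call>", re.DOTALL)


def parse_tool_call(text: str) -> dict[str, Any]:
    """Parse one tool call, raising ValueError on malformed output."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        # Without this check, match.group() would raise AttributeError.
        raise ValueError(f"malformed tool call: {text!r}")
    return {"name": match.group(1), "arguments": json.loads(match.group(2))}


def normalize_tool_calls(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert JSON-string tool call arguments to dicts, leaving dicts untouched."""
    for message in messages:
        for tool_call in message.get("tool_calls") or []:
            function = tool_call.get("function", {})
            arguments = function.get("arguments")
            if isinstance(arguments, str):
                function["arguments"] = json.loads(arguments)
    return messages
```

After normalization, a template that calls `.items()` on the arguments sees a dict regardless of whether the client sent a JSON string or a dict.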

## Test Plan

### Manual Testing
- Hardware: Mac with GLM-4.7-Flash-4bit model
- Tested tool calling with GLM model - no longer crashes

### Automated Testing
- Existing tests pass (`uv run pytest`)
- Type checking passes (`uv run basedpyright`)
- Linting passes (`uv run ruff check`)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-01-23 01:39:59 +00:00
ciaranbor
cd125b3b8c Use icon for image editing models (#1252)
## Motivation

Visual indicator for image editing models

## Changes

Add pencil icon to edit models in model list
2026-01-22 22:37:34 +00:00
Alex Cheema
b783a21399 dashboard: add placement filter by clicking topology nodes (#1248)
## Motivation

When selecting a model for placement, users often want to see placements
that utilize specific nodes in their cluster. Currently there's no way
to filter the placement previews to focus on configurations that include
particular machines.

## Changes

- **Backend**: Added `node_ids` query parameter to the
`/placement-previews` API endpoint. When provided, the endpoint filters
the topology to only include the specified nodes before generating
placements using the new `Topology.filter_to_nodes()` method.

- **Topology class**: Added `filter_to_nodes(node_ids)` method that
creates a new topology containing only the specified nodes and edges
between them.

- **App store**: Added `previewNodeFilter` state to track selected
nodes, with methods to toggle/clear the filter. Automatically cleans up
filter when nodes are removed from the cluster and re-fetches previews
when topology changes.

- **TopologyGraph component**: Added click handlers to toggle node
filter selection, hover effects to indicate clickable nodes, and visual
styling (yellow highlight for selected, dimmed for filtered-out nodes).

- **Main page**: Added filter indicator in top-right corner of topology
showing active filter count with a clear button.

## Why It Works

The filtering happens at the backend/placement generation level rather
than just filtering the results. This ensures we see all valid placement
combinations for the selected nodes, not just a subset that happened to
be generated for the full topology.

The visual feedback uses the same rendering approach as the existing
highlight system - state is tracked in Svelte and applied during render,
so it persists across data updates without flickering.
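
A simplified sketch of what a method like `filter_to_nodes` might do, assuming a topology stored as a set of node IDs and a set of directed edges. The real `Topology` class is not shown in this description, so the fields below are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def filter_to_nodes(self, node_ids: set[str]) -> "Topology":
        """Return a new topology with only the given nodes and the edges between them."""
        kept = self.nodes & node_ids
        return Topology(
            nodes=kept,
            edges={(a, b) for a, b in self.edges if a in kept and b in kept},
        )


full = Topology(
    nodes={"A", "B", "C"},
    edges={("A", "B"), ("B", "C"), ("C", "A")},
)
sub = full.filter_to_nodes({"A", "B"})
```

Returning a new object keeps the full topology intact, so clearing the filter only requires generating placements from the original again.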

## Test Plan

### Manual Testing
- Click a node in topology → should show yellow highlight and filter
indicator
- Click another node → indicator shows "2 nodes", previews update to
show only placements using both
- Hover over nodes → subtle yellow highlight indicates they're clickable
- Click X on filter indicator → clears filter, shows all placements
again
- Disconnect a node while it's in filter → filter auto-removes that node

### Automated Testing
- Existing tests cover the Topology class; the new `filter_to_nodes`
method follows the same patterns

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 22:12:57 +00:00
Alex Cheema
43f12f5d08 Replace LaunchDaemon with dynamic Thunderbolt Bridge loop detection (#1222)
## Motivation

The previous approach installed a LaunchDaemon plist that ran
periodically to disable Thunderbolt Bridge. This required full admin
privileges upfront and ran regardless of whether a problematic loop
existed.

This change replaces that with dynamic detection - only prompting the
user when an actual TB bridge loop with 3+ machines is detected, and
using fine-grained SCPreferences authorization instead of full admin.

## Changes

**Backend (Python):**
- Added `ThunderboltBridgeStatus` model to track bridge enabled/exists
state per node
- Added `node_thunderbolt_bridge` and `thunderbolt_bridge_cycles` fields
to State
- Added `get_thunderbolt_bridge_cycles()` method to Topology class
- **Robust TB bridge detection:**
  - Finds bridge network services from `-listnetworkserviceorder` (not
    `-listallhardwareports` which can miss bridges)
  - Checks each bridge's member interfaces via `ifconfig` to verify it
    contains Thunderbolt interfaces
  - Handles varying service names (e.g., "TB Bridge", "Thunderbolt
    Bridge", "Bridge (bridge0)")
  - Includes `service_name` in status for correct disable commands
  - Added warning logs for all error cases in detection
- Updated `apply.py` to handle the new event type and recompute cycles
on node timeout

**Swift App:**
- New `ThunderboltBridgeService` that monitors for cycles from cluster
state
- Shows NSAlert when a cycle with >2 machines is detected
- Uses `SCPreferencesCreateWithAuthorization` with
`system.services.systemconfiguration.network` right for targeted
permissions
- **Auto-cleanup of legacy LaunchDaemon:** On app startup, checks for
and removes old plist/scripts (non-fatal if user cancels)
- **Periodic local network checking:** Re-checks every 10s so the
warning disappears when user grants permission
- **Fixed ClusterState model:** Updated to decode new granular state
fields (`nodeIdentities`, `nodeMemory`, `nodeSystem`,
`nodeThunderboltBridge`) with computed `nodeProfiles` property for
backwards compatibility
- **Fixed Topology model:** Updated to match actual JSON structure where
`nodes` is an array of strings (not objects) and `connections` is a
nested map (not flat array)
- Cleaned up `NetworkSetupHelper` by removing daemon installation code
(now only handles uninstall)

**Dashboard:**
- Added yellow warning badge on topology when TB bridge cycle detected
- On hover: highlights affected nodes in yellow on the topology graph
- Shows which machines are in the cycle with friendly names
- Provides copy-paste terminal command with the correct service name:
  ```
  sudo networksetup -setnetworkserviceenabled "<service-name>" off
  ```
- Warning appears in all topology views (full, welcome, and minimized
chat sidebar)
- **Debug mode:** Shows "TB:ON" or "TB:OFF" status next to each node in
the topology

## Why It Works

- Cycle detection happens on the backend where we have full topology
information
- Only cycles with 3+ machines are flagged (2-node connections are fine)
- TB bridge detection is robust:
  - Uses `-listnetworkserviceorder` to find bridges (works on all machines
    tested)
  - Verifies bridge membership via `ifconfig` to confirm Thunderbolt
    interfaces
  - Handles different service names across machines
- The Swift app reacts to detected cycles and prompts the user once per
cycle
- The dashboard provides visual feedback and actionable instructions
- `SCPreferencesCreateWithAuthorization` provides the minimal
permissions needed to modify network service state
- Legacy LaunchDaemon is automatically cleaned up on first launch with
this version
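
To make the "3+ machines" rule concrete, here is a minimal sketch of one way to flag groups of nodes involved in a loop. It is not the `get_thunderbolt_bridge_cycles()` implementation from this change: it swaps in a simple dead-end pruning approach (repeatedly dropping nodes with fewer than two neighbours), and the function name `find_cycle_groups` and the edge-list input are assumptions.

```python
from collections import defaultdict


def find_cycle_groups(links):
    """Return sorted groups of nodes left after pruning dead ends.

    `links` is an iterable of (node_a, node_b) pairs for undirected
    connections. A plain two-node link never survives pruning, so every
    returned group has at least three nodes.
    """
    adj = defaultdict(set)
    for a, b in links:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    adj = dict(adj)

    # Repeatedly remove nodes with fewer than two neighbours; such nodes
    # cannot be part of any loop.
    pending = [n for n, nbrs in adj.items() if len(nbrs) < 2]
    while pending:
        node = pending.pop()
        for nbr in adj.pop(node, set()):
            adj[nbr].discard(node)
            if len(adj[nbr]) < 2:
                pending.append(nbr)

    # Group the surviving nodes into connected components.
    groups, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        if len(comp) >= 3:
            groups.append(sorted(comp))
    return sorted(groups)
```

Note that this groups all nodes that survive pruning within a component, so nodes on a path joining two separate loops would be reported together with them; for a simple ring of machines the result is exactly the ring.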

## Test Plan

### Manual Testing
Here EXO detected a TB bridge cycle:

#### Dashboard:
<img width="1363" height="884" alt="Screenshot 2026-01-21 at 10 07
30 PM"
src="https://github.com/user-attachments/assets/7da9c621-0c91-42c4-898e-4952188a1f61"
/>

#### Hovering the warning:
<img width="359" height="279" alt="Screenshot 2026-01-21 at 16 30 57"
src="https://github.com/user-attachments/assets/05501dcf-3d4a-4704-9f38-257748c05a53"
/>

#### macOS app warning popup:
<img width="270" height="410" alt="Screenshot 2026-01-21 at 16 29 08"
src="https://github.com/user-attachments/assets/45714427-08c3-4fb4-9e61-144925c51adf"
/>

#### Which then asks for the user's password:
<img width="263" height="372" alt="Screenshot 2026-01-21 at 16 29 28"
src="https://github.com/user-attachments/assets/7502e591-596d-4128-8cf5-6a12674e27bc"
/>

Once the password is entered, the bridge is disabled and the warning no
longer appears on the dashboard.

#### When it fails it shows the error message:
<img width="263" height="234" alt="Screenshot 2026-01-21 at 14 45 38"
src="https://github.com/user-attachments/assets/2d10b3d5-69d7-46ea-b631-d52d8651ab41"
/>

### Automated Testing
- Type checker: 0 errors (`uv run basedpyright`)
- Linter: All checks passed (`uv run ruff check`)
- Tests: 118 passed (`uv run pytest`)
- Dashboard: Builds successfully (`npm run build`)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 21:53:05 +00:00
ciaranbor
8027d7933f Ciaran/hf token (#1250)
## Motivation

black-forest-labs models require hf auth and signup to download. We
don't handle this gracefully.
https://github.com/exo-explore/exo/issues/1242

## Changes

- Handle auth errors
- Surface error to UI and suggest resolution
- Support using the `HF_TOKEN` env variable for auth
- Hide image functionality behind `EXO_ENABLE_IMAGE_MODELS=true` for now
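
The changes above can be sketched roughly as follows. This is an illustrative outline rather than the code in this commit: the helper names `hf_auth_headers`, `image_models_enabled`, and `describe_download_error` are hypothetical, treating 401/403 as the auth-failure statuses is an assumption, and so is accepting the flag value case-insensitively.

```python
import os


def hf_auth_headers(env=None):
    """Build request headers, adding a bearer token when HF_TOKEN is set."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return {"Authorization": f"Bearer {token}"} if token else {}


def image_models_enabled(env=None):
    """Image functionality stays hidden unless the flag is set to true."""
    env = os.environ if env is None else env
    return env.get("EXO_ENABLE_IMAGE_MODELS", "").strip().lower() == "true"


def describe_download_error(status_code):
    """Turn an auth-related HTTP status into actionable feedback (assumed mapping)."""
    if status_code in (401, 403):
        return (
            "This model requires Hugging Face authentication. "
            "Accept the model's terms on its page, then run `hf auth login` "
            "or set the HF_TOKEN environment variable."
        )
    return None
```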

## Why It Works

Users are presented with actionable feedback when the issue occurs.

## Test Plan

### Manual Testing

Confirmed that loading a black-forest-labs model in the UI surfaces the
issue there.
Confirmed that both `hf auth login` and setting `HF_TOKEN` resolve the issue.
2026-01-22 20:39:53 +00:00
Evan
ac6efa747b add kimi tool parsing
this patches the kimi tokenizer to add tool calling - it can be reverted
once upstream support is added for kimi-k2
2026-01-22 11:49:25 +00:00
Evan
2e3c33db6d implement mlx-lm tool calling
splits up the runners generation chunks into tool calls, tokens and
errors, and writes tool call chunks when the upstream parser detects
them.
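
A minimal sketch of what such a split might look like, assuming three chunk types. The class names `TokenChunk`, `ToolCallChunk`, and `ErrorChunk`, their fields, and the `summarize` helper are hypothetical and are not taken from the runner code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TokenChunk:
    text: str


@dataclass
class ToolCallChunk:
    name: str
    arguments: str  # JSON-encoded arguments, as emitted by a parser


@dataclass
class ErrorChunk:
    message: str


GenerationChunk = Union[TokenChunk, ToolCallChunk, ErrorChunk]


def summarize(chunks):
    """Separate a chunk stream into plain text, tool calls, and errors."""
    text, tool_calls, errors = [], [], []
    for chunk in chunks:
        if isinstance(chunk, TokenChunk):
            text.append(chunk.text)
        elif isinstance(chunk, ToolCallChunk):
            tool_calls.append((chunk.name, chunk.arguments))
        else:
            errors.append(chunk.message)
    return "".join(text), tool_calls, errors
```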
2026-01-22 11:49:25 +00:00
rltakashige
fc8e6ad06b Reduce download log spam (#1249)
2026-01-22 11:28:36 +00:00
Alex Cheema
023108a19d Disable image model cards temporarily (#1247)
## Motivation

Image generation feature is not stable and causing issues for users.

Fixes #1242

## Changes

- Commented out image model cards (flux1-schnell, flux1-dev, qwen-image,
qwen-image-edit-2509) in `src/exo/shared/models/model_cards.py`
- Added reference to issue #1242 in the comment explaining why they are
disabled

## Why It Works

By commenting out the model cards, these image models will no longer
appear in the model list, preventing users from attempting to use the
unstable feature until it is stabilized.

## Test Plan

### Manual Testing
- Run exo and verify image models no longer appear in the model list

### Automated Testing
- No changes to automated tests needed - this simply removes models from
the available list

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 22:39:59 +00:00
Jake Hillion
c9818c30b4 dashboard: show model total size on downloads page for pending downloads
The downloads page showed "0B / 0B" for models that haven't started
downloading yet because the download progress data only gets populated
after the file list is fetched from HuggingFace.

Added a fetch to the /models API endpoint on page mount and created a
helper function that falls back to storage_size_megabytes when the
download's totalBytes is 0.

This allows users to see the actual model size (e.g., "0 / 25GB")
before a download begins, which is helpful for a future feature that
lets you download models explicitly.
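
The fallback described above amounts to something like the following sketch. The dashboard itself is not written in Python; this only illustrates the logic, and the function name `display_total_bytes` and the 1024 × 1024 bytes-per-megabyte conversion are assumptions.

```python
BYTES_PER_MEGABYTE = 1024 * 1024  # assumption: binary megabytes


def display_total_bytes(total_bytes, storage_size_megabytes):
    """Prefer the download's own total; fall back to the model's listed size."""
    if total_bytes > 0:
        return total_bytes
    if storage_size_megabytes is None:
        return 0  # nothing known yet, so the page still shows 0B
    return storage_size_megabytes * BYTES_PER_MEGABYTE
```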

Test plan:
- Deployed to a cluster; entries that previously showed 0B now show sensible values.
2026-01-21 21:53:54 +00:00
Alex Cheema
8f6726d6be Fix config.json download errors for image models (#1245)
## Motivation

When `get_shard_download_status()` runs, it iterates over all models in
`MODEL_CARDS` and calls `build_full_shard()` → `build_base_shard()` →
`ModelCard.from_hf()`. This unconditionally tried to download
`config.json` from HuggingFace, but image models (FLUX, Qwen-Image)
don't have a root-level config.json file, causing errors:

```
Error downloading shard: File not found: https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/config.json
Error downloading shard: File not found: https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/config.json
Error downloading shard: File not found: https://huggingface.co/Qwen/Qwen-Image/resolve/main/config.json
Error downloading shard: File not found: https://huggingface.co/Qwen/Qwen-Image-Edit-2509/resolve/main/config.json
```

## Changes

### ModelCard.load() fix
- `build_base_shard()` now uses `ModelCard.load()` instead of
`ModelCard.from_hf()`
- `ModelCard.load()` iterates through `MODEL_CARDS.values()` to find a
match by `model_id`

### exo-bench fixes
- Use `name` field instead of `id` for model resolution
- Pass `full_model_id` to `/instance/previews` endpoint
- Make model name matching case-insensitive
- Update README example model name

## Why It Works

`MODEL_CARDS` uses short names as keys (e.g., `"flux1-schnell"`) but the
`model_id` values are HuggingFace paths (e.g.,
`"black-forest-labs/FLUX.1-schnell"`). When `ModelCard.load()` was
called with the HF path, it didn't match any key and fell back to
`from_hf()` which tried to download config.json.

The fix iterates through `MODEL_CARDS.values()` to find a match by
`model_id`, ensuring predefined models (including image models) use
their registry entries directly without network calls. A key lookup is
unnecessary since `load()` is always called with HF paths which don't
match the short-name keys.
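
A stripped-down sketch of the lookup described above. The `ModelCard` class here is reduced to a single field, the registry holds one entry, and `load_card` and its `fetch_remote` callback are stand-ins for `ModelCard.load()` and `ModelCard.from_hf()`; the real signatures may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    model_id: str


# Short names are the keys; HuggingFace paths live in `model_id`.
MODEL_CARDS = {
    "flux1-schnell": ModelCard(model_id="black-forest-labs/FLUX.1-schnell"),
}


def load_card(model_id, fetch_remote):
    """Return the registry entry matching `model_id`, else fall back to a fetch."""
    for card in MODEL_CARDS.values():
        if card.model_id == model_id:
            return card  # predefined model: no network call needed
    return fetch_remote(model_id)
```

With this shape, a lookup for `"black-forest-labs/FLUX.1-schnell"` returns the predefined card without ever invoking the fallback, which is why the config.json requests no longer happen for registered image models.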

## Test Plan

### Manual Testing
- Run exo and verify no more "Error downloading shard: File not found:
.../config.json" errors for image models
- Run exo-bench and verify model resolution works correctly

### Automated Testing
- `uv run basedpyright` - passes with 0 errors
- `uv run pytest` - all tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:30:48 +00:00
rltakashige
ede779219c Reduce log spam (#1241)
## Motivation

Way too much spam. Some logs were also obsolete, leading to users
thinking there was something wrong during expected behaviour.

2026-01-21 19:08:30 +00:00
Jake Hillion
a7e205e489 treefmt: add Svelte file formatting
Package prettier with Svelte support and add it to treefmt-nix to format
the dashboard. This change is brutal, I spent a long time trying to get
it nicer but it doesn't seem there's a good way to make this minimal.
Sorry for the noise!

This will make it easier for new contributors to get the formatting
right first time.

Also removes the `.prettierrc` because it turns out treefmt-nix was
ignoring it.

Test plan:
- CI
2026-01-21 18:51:55 +00:00
rltakashige
a354aaa3e5 Fix tests broken in recent commits (#1239)
We'll have good CI soon...

## Test Plan

### Automated Testing
Works
2026-01-21 18:32:49 +00:00
ciaranbor
307f454b96 feat: initial image generation support (#1095)
## Motivation

Enable distributed image generation across exo clusters

## Changes

- Added OpenAI-compatible /v1/images/generations and /v1/images/edits
API endpoints
- Added /bench/images/generations and /bench/images/edits
endpoints that return generation statistics (timing, throughput metrics)
- Implemented PipeFusion distributed inference for diffusion models,
enabling patch-based parallelism across nodes
- Added model adapters for Flux (schnell, dev) and Qwen image models
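
For orientation, a request to the generations endpoint might be built as in the sketch below. The base URL is a placeholder to adjust for your own deployment, the helper name `build_generation_request` is hypothetical, and the `model` and `prompt` fields follow the OpenAI images API shape; which fields exo actually accepts is an assumption. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:52415"  # placeholder; adjust host and port


def build_generation_request(model, prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/images/generations."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generation_request("flux1-schnell", "a red bicycle")
# Send with urllib.request.urlopen(request) against a running cluster.
```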

## Why It Works

https://arxiv.org/abs/2405.14430

## Test Plan

### Manual Testing
- Generate images using /v1/images/generations endpoint with single and
multi-node clusters
- Test image editing via /v1/images/edits with source images
- Verify streaming partial images appear progressively in the dashboard
- Use /bench/images/generations to measure generation performance
- Test both Flux and Qwen model families

---------

Co-authored-by: Sami Khan <smsak99@gmail.com>
2026-01-21 18:21:58 +00:00
rltakashige
a31b6ee045 Import download utils once all modules are loaded (#1238)
## Motivation

A test failed due to a circular import.

## Test Plan

### Manual Testing
Tried importing and calling the functions, worked fine.

### Automated Testing
Tests pass again
2026-01-21 17:58:06 +00:00