Compare commits

..

8 Commits

Author SHA1 Message Date
Ryuichi Leo Takashige
aa788228bc fix tool parsing to load json if json 2026-02-24 19:45:20 +00:00
Ryuichi Leo Takashige
9d3b1334da update mlx again 2026-02-24 18:04:56 +00:00
Ryuichi Leo Takashige
811a4d80bd update mlx to resolve event leak 2026-02-24 13:03:55 +00:00
rltakashige
2fe689315b download .model files in exo bench (#1607)
## Motivation

failed again for kimi on a machine that had never downloaded it.

## Test Plan

### Manual Testing
it worked this time
2026-02-24 11:13:04 +00:00
Alex Cheema
644c5573ce fix: improve text contrast on downloads page (#1601)
## Summary
- Bumps opacity on dark-grey text/icons on the downloads page that were
nearly invisible against the dark background
- Informational text (GB sizes, speeds, disk free, model IDs) → full
`text-exo-light-gray`
- Interactive icons (delete, resume, retry) → `/70` at rest
- Hover-only elements (download button) → `/60`

## Test plan
- [ ] Open http://localhost:52415/downloads with models downloaded
- [ ] Verify GB downloaded amounts are clearly visible
- [ ] Verify delete trash icons are visible (not just on hover)
- [ ] Verify download speed text is readable
- [ ] Verify "paused" labels and resume buttons are visible
- [ ] Verify "X GB free" disk labels in column headers are readable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 10:42:49 +00:00
rltakashige
12c3015f52 fix qwen moe tensor sharding (#1604)
## Motivation

<!-- Why is this change needed? What problem does it solve? -->
<!-- If it fixes an open issue, please link to the issue here -->

## Changes

<!-- Describe what you changed in detail -->

## Why It Works

<!-- Explain why your approach solves the problem -->

## Test Plan

### Manual Testing
<!-- Hardware: (e.g., MacBook Pro M1 Max 32GB, Mac Mini M2 16GB,
connected via Thunderbolt 4) -->
<!-- What you did: -->
<!-- - -->

### Automated Testing
<!-- Describe changes to automated tests, or how existing tests cover
this change -->
<!-- - -->
2026-02-23 21:23:27 +00:00
rltakashige
365dd68d9a Final fixes for release (#1603)
## Motivation

<!-- Why is this change needed? What problem does it solve? -->
<!-- If it fixes an open issue, please link to the issue here -->

## Changes

<!-- Describe what you changed in detail -->

## Why It Works

<!-- Explain why your approach solves the problem -->

## Test Plan

### Manual Testing
<!-- Hardware: (e.g., MacBook Pro M1 Max 32GB, Mac Mini M2 16GB,
connected via Thunderbolt 4) -->
<!-- What you did: -->
<!-- - -->

### Automated Testing
<!-- Describe changes to automated tests, or how existing tests cover
this change -->
<!-- - -->
2026-02-23 21:10:15 +00:00
Alex Cheema
d3d129581e test: verify instance deletion cancels ongoing tasks (#1508)
## Summary
- The cancellation logic for issue #1215 already exists in
`get_transition_events()` (`src/exo/master/placement.py:208-227`) — when
an instance is deleted, `TaskStatusUpdated(Cancelled)` events are
emitted for all Pending/Running tasks on that instance
- Combined with PR #1276's token-boundary cancellation in runners, the
full pipeline works end-to-end
- However, the existing test
`test_get_transition_events_delete_instance` passed `{}` for tasks, so
this path was never exercised
- This PR adds 4 tests covering the cancellation behavior:
  - Running tasks are cancelled on instance deletion
  - Pending tasks are cancelled on instance deletion
  - Completed/Failed/TimedOut/Cancelled tasks are left alone
  - Only tasks matching the deleted instance are cancelled

Closes #1215

## Test plan
- [x] `uv run pytest src/exo/master/tests/test_placement.py -v` — all 15
tests pass
- [x] `uv run basedpyright` — 0 errors
- [x] `uv run ruff check` — all checks passed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 20:12:23 +00:00
9 changed files with 169 additions and 31 deletions

View File

@@ -75,7 +75,7 @@ def load_tokenizer_for_bench(model_id: str) -> Any:
model_path = Path(
snapshot_download(
model_id,
allow_patterns=["*.json", "*.py", "*.tiktoken"],
allow_patterns=["*.json", "*.py", "*.tiktoken", "*.model"],
)
)

View File

@@ -412,7 +412,7 @@
<div>{col.label}</div>
{#if col.diskAvailable != null}
<div
class="text-[9px] text-exo-light-gray/60 normal-case tracking-normal mt-0.5"
class="text-[9px] text-white/70 normal-case tracking-normal mt-0.5"
>
{formatBytes(col.diskAvailable)} free
</div>
@@ -436,7 +436,7 @@
</div>
{#if row.prettyName}
<div
class="text-[10px] text-exo-light-gray/60"
class="text-[10px] text-white/60"
title={row.modelId}
>
{row.modelId}
@@ -450,7 +450,7 @@
title="View model details"
>
<svg
class="w-4 h-4 text-white/30 hover:text-white/60"
class="w-4 h-4 text-white/60 hover:text-white/80"
viewBox="0 0 24 24"
fill="currentColor"
>
@@ -483,12 +483,12 @@
clip-rule="evenodd"
></path>
</svg>
<span class="text-xs text-exo-light-gray/70"
<span class="text-xs text-white/70"
>{formatBytes(cell.totalBytes)}</span
>
<button
type="button"
class="text-exo-light-gray/40 hover:text-red-400 transition-colors mt-0.5 cursor-pointer"
class="text-white/50 hover:text-red-400 transition-colors mt-0.5 cursor-pointer"
onclick={() =>
deleteDownload(col.nodeId, row.modelId)}
title="Delete from this node"
@@ -530,7 +530,7 @@
).toFixed(1)}%"
></div>
</div>
<span class="text-[10px] text-exo-light-gray/60"
<span class="text-[10px] text-white/70"
>{formatSpeed(cell.speed)}</span
>
</div>
@@ -542,7 +542,7 @@
: "Download pending"}
>
{#if cell.downloaded > 0 && cell.total > 0}
<span class="text-exo-light-gray/70 text-xs"
<span class="text-white/70 text-xs"
>{formatBytes(cell.downloaded)} / {formatBytes(
cell.total,
)}</span
@@ -561,7 +561,7 @@
{#if row.shardMetadata}
<button
type="button"
class="text-exo-light-gray/50 hover:text-exo-yellow transition-colors cursor-pointer"
class="text-white/50 hover:text-exo-yellow transition-colors cursor-pointer"
onclick={() =>
startDownload(col.nodeId, row.shardMetadata!)}
title="Resume download on this node"
@@ -581,14 +581,13 @@
</svg>
</button>
{:else}
<span class="text-exo-light-gray/40 text-[10px]"
>paused</span
<span class="text-white/50 text-[10px]">paused</span
>
{/if}
{:else if row.shardMetadata}
<button
type="button"
class="text-exo-light-gray/30 hover:text-exo-yellow transition-colors cursor-pointer"
class="text-white/50 hover:text-exo-yellow transition-colors cursor-pointer"
onclick={() =>
startDownload(col.nodeId, row.shardMetadata!)}
title="Start download on this node"
@@ -608,8 +607,7 @@
</svg>
</button>
{:else}
<span class="text-exo-light-gray/50 text-sm">...</span
>
<span class="text-white/40 text-sm">...</span>
{/if}
</div>
{:else if cell.kind === "failed"}
@@ -631,7 +629,7 @@
{#if row.shardMetadata}
<button
type="button"
class="text-exo-light-gray/40 hover:text-exo-yellow transition-colors cursor-pointer"
class="text-white/50 hover:text-exo-yellow transition-colors cursor-pointer"
onclick={() =>
startDownload(col.nodeId, row.shardMetadata!)}
title="Retry download on this node"
@@ -663,7 +661,7 @@
{#if row.shardMetadata}
<button
type="button"
class="text-exo-light-gray/30 hover:text-exo-yellow transition-colors mt-0.5 opacity-0 group-hover:opacity-100 cursor-pointer"
class="text-white/50 hover:text-exo-yellow transition-colors mt-0.5 opacity-0 group-hover:opacity-100 cursor-pointer"
onclick={() =>
startDownload(col.nodeId, row.shardMetadata!)}
title="Download to this node"

View File

@@ -41,7 +41,7 @@ let
mlx = stdenv.mkDerivation rec {
pname = "mlx";
version = let v = "0.30.7.dev20260220+13998a05"; in
version = let v = "0.30.7.dev20260224+e862b122"; in
assert v == uvLockMlxVersion || throw "MLX version mismatch: nix/mlx.nix has ${v} but uv.lock has ${uvLockMlxVersion}. Update both the version and hash in nix/mlx.nix.";
v;
pyproject = true;
@@ -49,8 +49,8 @@ let
src = fetchFromGitHub {
owner = "rltakashige";
repo = "mlx-jaccl-fix-small-recv";
rev = "13998a054715edcdc93618fb1496c79c7c25ff7c";
hash = "sha256-fAqA3hFwNBx7FcoGnhQsIFpAIRbC2EerACm4Fvne0Cc=";
rev = "e862b1223a2310d4cc8df1135aed42f5246bc50a";
hash = "sha256-GosFIWxIB48Egb1MqJrR3xhsUsQeWdRk5rV93USY6wQ=";
};
patches = [

View File

@@ -14,10 +14,12 @@ from exo.shared.models.model_cards import ModelCard, ModelId, ModelTask
from exo.shared.topology import Topology
from exo.shared.types.commands import PlaceInstance
from exo.shared.types.common import CommandId, NodeId
from exo.shared.types.events import InstanceCreated, InstanceDeleted
from exo.shared.types.events import InstanceCreated, InstanceDeleted, TaskStatusUpdated
from exo.shared.types.memory import Memory
from exo.shared.types.multiaddr import Multiaddr
from exo.shared.types.profiling import NetworkInterfaceInfo, NodeNetworkInfo
from exo.shared.types.tasks import TaskId, TaskStatus, TextGeneration
from exo.shared.types.text_generation import InputMessage, TextGenerationTaskParams
from exo.shared.types.topology import Connection, SocketConnection
from exo.shared.types.worker.instances import (
Instance,
@@ -456,3 +458,117 @@ def test_tensor_rdma_backend_connectivity_matrix(
else:
ip_part = coordinator.split(":")[0]
assert len(ip_part.split(".")) == 4
def _make_task(
instance_id: InstanceId,
status: TaskStatus = TaskStatus.Running,
) -> TextGeneration:
return TextGeneration(
task_id=TaskId(),
task_status=status,
instance_id=instance_id,
command_id=CommandId(),
task_params=TextGenerationTaskParams(
model=ModelId("test-model"),
input=[InputMessage(role="user", content="hello")],
),
)
def test_get_transition_events_delete_instance_cancels_running_tasks(
instance: Instance,
):
# arrange
instance_id = InstanceId()
current_instances: dict[InstanceId, Instance] = {instance_id: instance}
target_instances: dict[InstanceId, Instance] = {}
task = _make_task(instance_id, TaskStatus.Running)
tasks = {task.task_id: task}
# act
events = get_transition_events(current_instances, target_instances, tasks)
# assert cancellation event should come before the deletion event
assert len(events) == 2
assert isinstance(events[0], TaskStatusUpdated)
assert events[0].task_id == task.task_id
assert events[0].task_status == TaskStatus.Cancelled
assert isinstance(events[1], InstanceDeleted)
assert events[1].instance_id == instance_id
def test_get_transition_events_delete_instance_cancels_pending_tasks(
instance: Instance,
):
# arrange
instance_id = InstanceId()
current_instances: dict[InstanceId, Instance] = {instance_id: instance}
target_instances: dict[InstanceId, Instance] = {}
task = _make_task(instance_id, TaskStatus.Pending)
tasks = {task.task_id: task}
# act
events = get_transition_events(current_instances, target_instances, tasks)
# assert
assert len(events) == 2
assert isinstance(events[0], TaskStatusUpdated)
assert events[0].task_id == task.task_id
assert events[0].task_status == TaskStatus.Cancelled
assert isinstance(events[1], InstanceDeleted)
def test_get_transition_events_delete_instance_ignores_completed_tasks(
instance: Instance,
):
# arrange
instance_id = InstanceId()
current_instances: dict[InstanceId, Instance] = {instance_id: instance}
target_instances: dict[InstanceId, Instance] = {}
tasks = {
t.task_id: t
for t in [
_make_task(instance_id, TaskStatus.Complete),
_make_task(instance_id, TaskStatus.Failed),
_make_task(instance_id, TaskStatus.TimedOut),
_make_task(instance_id, TaskStatus.Cancelled),
]
}
# act
events = get_transition_events(current_instances, target_instances, tasks)
# assert only the InstanceDeleted event, no cancellations
assert len(events) == 1
assert isinstance(events[0], InstanceDeleted)
def test_get_transition_events_delete_instance_cancels_only_matching_tasks(
instance: Instance,
):
# arrange
instance_id_a = InstanceId()
instance_id_b = InstanceId()
current_instances: dict[InstanceId, Instance] = {
instance_id_a: instance,
instance_id_b: instance,
}
# only delete instance A, keep instance B
target_instances: dict[InstanceId, Instance] = {instance_id_b: instance}
task_a = _make_task(instance_id_a, TaskStatus.Running)
task_b = _make_task(instance_id_b, TaskStatus.Running)
tasks = {task_a.task_id: task_a, task_b.task_id: task_b}
# act
events = get_transition_events(current_instances, target_instances, tasks)
# assert only task_a should be cancelled
cancel_events = [e for e in events if isinstance(e, TaskStatusUpdated)]
delete_events = [e for e in events if isinstance(e, InstanceDeleted)]
assert len(cancel_events) == 1
assert cancel_events[0].task_id == task_a.task_id
assert cancel_events[0].task_status == TaskStatus.Cancelled
assert len(delete_events) == 1
assert delete_events[0].instance_id == instance_id_a

View File

@@ -852,6 +852,8 @@ class QwenShardingStrategy(TensorParallelShardingStrategy):
layer.self_attn.o_proj = self.sharded_to_all_linear(
layer.self_attn.o_proj
)
layer.self_attn.n_heads //= self.N
layer.self_attn.n_kv_heads //= self.N
else:
assert isinstance(layer, Qwen3NextDecoderLayer)
if hasattr(layer, "linear_attn"):

View File

@@ -51,11 +51,33 @@ def _parse_json_calls(text: str) -> list[ToolCallItem] | None:
return None
def _try_parse_json(v: str) -> str | dict[str, Any] | list[Any]:
stripped = v.strip()
if (stripped.startswith("[") and stripped.endswith("]")) or (
stripped.startswith("{") and stripped.endswith("}")
):
try:
parsed: dict[str, Any] | list[Any] = json.loads(stripped) # pyright: ignore[reportAny]
return parsed
except (json.JSONDecodeError, ValueError):
pass
return v
def _flatten(p: dict[str, Any]) -> dict[str, str]:
return {
k: json.dumps(v) if isinstance(v, (dict, list)) else str(v) # pyright: ignore[reportAny]
for k, v in p.items() # pyright: ignore[reportAny]
}
result: dict[str, str] = {}
for k, v in p.items(): # pyright: ignore[reportAny]
if isinstance(v, dict):
resolved: dict[str, Any] = {
str(ik): _try_parse_json(str(iv)) if isinstance(iv, str) else iv # pyright: ignore[reportUnknownArgumentType]
for ik, iv in v.items() # pyright: ignore[reportUnknownVariableType]
}
result[k] = json.dumps(resolved)
elif isinstance(v, list):
result[k] = json.dumps(v)
else:
result[k] = str(v) # pyright: ignore[reportAny]
return result
json_tool_parser = ToolParser(

View File

@@ -8,7 +8,7 @@ from urllib.request import urlopen
h = sys.argv[1] if len(sys.argv) > 1 else sys.exit(f"USAGE: {sys.argv[0]} host")
ts = subprocess.run(
["/Applications/Tailscale.app/Contents/MacOS/Tailscale", "status"], check=True, text=True, capture_output=True
["tailscale", "status"], check=True, text=True, capture_output=True
).stdout.splitlines()
ip = next(
(sl[0] for line in ts if len(sl := line.split()) >= 2 if sl[1] == h), None

View File

@@ -15,7 +15,7 @@ if not (args := sys.argv[1:]):
kind = args[0] if args[0] in ("jaccl", "ring") else "both"
hosts = args[1:] if kind != "both" else args
ts = subprocess.run(
["/Applications/Tailscale.app/Contents/MacOS/Tailscale", "status"], check=True, text=True, capture_output=True
["tailscale", "status"], check=True, text=True, capture_output=True
).stdout.splitlines()
ip = {sl[1]: sl[0] for line in ts if len(sl := line.split()) >= 2}
ips = [ip[h] for h in hosts]

10
uv.lock generated
View File

@@ -378,7 +378,7 @@ dependencies = [
{ name = "loguru", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "mflux", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "mlx", version = "0.30.6", source = { registry = "https://pypi.org/simple" }, extra = ["cpu"], marker = "sys_platform == 'linux'" },
{ name = "mlx", version = "0.30.7.dev20260220+13998a05", source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#13998a054715edcdc93618fb1496c79c7c25ff7c" }, marker = "sys_platform == 'darwin'" },
{ name = "mlx", version = "0.30.7.dev20260224+e862b122", source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#e862b1223a2310d4cc8df1135aed42f5246bc50a" }, marker = "sys_platform == 'darwin'" },
{ name = "mlx-lm", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "msgspec", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "openai-harmony", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -1025,7 +1025,7 @@ dependencies = [
{ name = "huggingface-hub", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "matplotlib", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "mlx", version = "0.30.6", source = { registry = "https://pypi.org/simple" }, extra = ["cuda13"], marker = "sys_platform == 'linux'" },
{ name = "mlx", version = "0.30.7.dev20260220+13998a05", source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#13998a054715edcdc93618fb1496c79c7c25ff7c" }, marker = "sys_platform == 'darwin'" },
{ name = "mlx", version = "0.30.7.dev20260224+e862b122", source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#e862b1223a2310d4cc8df1135aed42f5246bc50a" }, marker = "sys_platform == 'darwin'" },
{ name = "numpy", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "opencv-python", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "piexif", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
@@ -1072,8 +1072,8 @@ cuda13 = [
[[package]]
name = "mlx"
version = "0.30.7.dev20260220+13998a05"
source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#13998a054715edcdc93618fb1496c79c7c25ff7c" }
version = "0.30.7.dev20260224+e862b122"
source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#e862b1223a2310d4cc8df1135aed42f5246bc50a" }
resolution-markers = [
"sys_platform == 'darwin'",
]
@@ -1108,7 +1108,7 @@ version = "0.30.7"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "jinja2", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "mlx", version = "0.30.7.dev20260220+13998a05", source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#13998a054715edcdc93618fb1496c79c7c25ff7c" }, marker = "sys_platform == 'darwin'" },
{ name = "mlx", version = "0.30.7.dev20260224+e862b122", source = { git = "https://github.com/rltakashige/mlx-jaccl-fix-small-recv.git?branch=address-rdma-gpu-locks#e862b1223a2310d4cc8df1135aed42f5246bc50a" }, marker = "sys_platform == 'darwin'" },
{ name = "numpy", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "protobuf", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },
{ name = "pyyaml", marker = "sys_platform == 'darwin' or sys_platform == 'linux'" },