mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-05-24 22:56:08 -04:00

Author	SHA1	Message	Date
Andrei Cravtov	74e9fe15e6	fix(bug): EventRouter lifetime-handling fixed, no more process crashes (#2102 ) ## Motivation Trying to (partially) fix [this](https://github.com/exo-explore/exo/issues/2101) issue. ## Changes Changed channels (in channels.py) to support exception overriding. Made EventRouter channels throw a subclass of the resource closed/broken errors. The current lifetime logic of EventRouter in the event loop no longer blows up, because components that use channels from EventRouter now catch the subclass exceptions in the run method: Worker, Master, DownloadCoordinator, RunnerSupervisor. Added logic to throw when the API server exits without being asked to shut down; this kills the sleep-forever in the task group.	2026-05-22 14:20:04 +01:00
Evan Quiney	90f24bef30	fix model cards not validating properly after #2071 (#2096 )	2026-05-15 15:17:35 +00:00
Andrei Cravtov	5097b2665d	Tweaked workspace settings (#2095 ) workspace settings	2026-05-15 13:04:50 +00:00
rltakashige	bc6661e6aa	Add node backends to model cards (#2071 ) Co-authored-by: Evan <evanev7@gmail.com>	2026-05-15 12:52:12 +00:00
Andrei Cravtov	14aab35688	Runner error handling (#2093 ) # Runner error handling ## Motivation Runner failures were mostly surfaced as plain shutdown messages, which made the root cause hard to spot from API errors or runner status. This adds an MVP path for preserving runner crash context and attaching known stderr diagnostics to failure reports. ## Changes - Added `RunnerTerminationError` for Python exceptions raised inside runner bootstrap - Changed runner bootstrap to send `Event | RunnerTerminationError` over the private runner channel - Moved public `RunnerFailed` emission back into supervisor - Added stderr-only `RunnerDiagnosticCollector` - Added known diagnostics for Metal GPU timeout, ring socket receive errno, and ring transport abort - Added diagnostics to `RunnerFailed` and `ErrorChunk` - Tweaked async process termination to join briefly before terminate/kill - Updated tests/fixtures for new failure payload shape - Added Ruff VS Code formatter settings ## Why It Works Runner child now reports raw-ish failure context to supervisor instead of publishing failed status directly. Supervisor still owns process lifecycle, exit code/signal handling, in-flight task error chunks, and final runner status. Stderr diagnostics stay best effort and only known root-cause variants are surfaced. ## Test Plan ### Manual Testing Hardware: remote runner logs from e16/e11/e4/e2 What you did: - inspected live runner stderr logs - used observed Metal GPU timeout and ring socket errors as initial diagnostic targets ### Automated Testing - `nix flake check` - supervisor test covers error chunk + failed status emission - plan lifecycle test updated for failed runner diagnostics - type/lint checks cover new runner channel union --------- Co-authored-by: Evan Quiney <evanev7@gmail.com>	2026-05-15 12:40:59 +00:00
Heidar	88d46d46fd	fix: omit null delta fields in streaming chat completions (issue #2082 ) (#2092 ) ## Motivation Streaming /v1/chat/completions responses emitted null for tool_calls, function_call, name, and tool_call_id in every delta chunk. The OpenAI streaming spec marks these fields as non-nullable: they must either carry a real value or be absent entirely. Spec-correct clients doing delta.get("tool_calls", []) receive None and crash with 'NoneType' object is not iterable. Root cause: the streaming serialisation path called model_dump_json() without exclude_none=True, while the request-parsing path already used it correctly. Three call sites in chat_completions.py and two in responses.py were affected. ## Testing Before (every delta carries explicit nulls): $ curl -sN -X POST http://localhost:52415/v1/chat/completions \ -H 'Content-Type: application/json' \ -d '{"model":"mlx-community/Qwen3.5-2B-MLX-8bit","messages":[{"role":"user","content":"hi"}],"max_tokens":3,"stream":true}' \ | grep "^data: " data: {"id":"7c4dae10-...","choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning_content":"Okay","name":null,"tool_calls":null,"tool_call_id":null,"function_call":null},"logprobs":null,"finish_reason":null,"usage":null}],"usage":null,"service_tier":null} data: {"id":"7c4dae10-...","choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning_content":",","name":null,"tool_calls":null,"tool_call_id":null,"function_call":null},"logprobs":null,"finish_reason":null,"usage":null}],"usage":null,"service_tier":null} data: {"id":"7c4dae10-...","choices":[{"index":0,"delta":{"role":"assistant","content":" the","reasoning_content":null,"name":null,"tool_calls":null,"tool_call_id":null,"function_call":null},"logprobs":null,"finish_reason":"length","usage":{"prompt_tokens":11,...}}],"usage":null,"service_tier":null} data: [DONE] After (only populated fields are emitted): data: {"id":"demo","object":"chat.completion","created":...,"model":"mlx-community/Qwen3.5-2B-MLX-8bit","choices":[{"index":0,"delta":{"role":"assistant","reasoning_content":"Okay"}}]} data: {"id":"demo","object":"chat.completion","created":...,"model":"mlx-community/Qwen3.5-2B-MLX-8bit","choices":[{"index":0,"delta":{"role":"assistant","reasoning_content":","}}]} data: {"id":"demo","object":"chat.completion","created":...,"model":"mlx-community/Qwen3.5-2B-MLX-8bit","choices":[{"index":0,"delta":{"role":"assistant","content":" the"},"finish_reason":"length"}],"usage":{"prompt_tokens":11,"completion_tokens":3,"total_tokens":14,...}} data: [DONE]	2026-05-14 16:32:54 +00:00
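The fix above comes down to dropping unset fields before each delta is serialised. The commit does this with Pydantic's `model_dump_json(exclude_none=True)`; the sketch below shows the same idea using only the standard library. The `Delta` dataclass and `dump_delta` helper are hypothetical stand-ins, not exo's actual types.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Delta:
    # Hypothetical stand-in for a streaming delta; not exo's actual model.
    role: Optional[str] = None
    content: Optional[str] = None
    reasoning_content: Optional[str] = None
    tool_calls: Optional[list] = None


def dump_delta(delta: Delta) -> str:
    """Serialise a delta, omitting every field whose value is None."""
    populated = {key: value for key, value in asdict(delta).items() if value is not None}
    return json.dumps(populated)


# A client that reads the chunk can now rely on the default value,
# because the key is absent rather than present with an explicit null.
chunk = json.loads(dump_delta(Delta(role="assistant", reasoning_content="Okay")))
tool_calls = chunk.get("tool_calls", [])
```

With the key absent, `chunk.get("tool_calls", [])` falls back to the empty list instead of returning `None`, which is the client-side crash the commit describes.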
Heidar	e8ec8d5010	fix ollama API compatibility for VS Code Copilot (#2091 ) Ollama adapter fixes for VS Code Copilot (#2042): - /api/version: bare semver "1.0.0" - Copilot parseInts each segment. - /api/show: populate model_info + capabilities - Copilot crashes on null model_info and filters by `tools`. - Add POST /ollama/v1/chat/completions - ollama serves the OpenAI-compat route here, BYOK clients 405 without it. Before: <img width="1380" height="144" alt="image" src="https://github.com/user-attachments/assets/99d5464f-187d-4432-9a31-8229c55aa209" /> After: <img width="1362" height="181" alt="image" src="https://github.com/user-attachments/assets/361dc006-d8df-435f-8d8b-4fa4f44a8c23" /> <img width="279" height="909" alt="image" src="https://github.com/user-attachments/assets/4621aba7-bd57-4762-8568-34a3383a6025" /> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-14 16:12:58 +00:00
Heidar	1fd15d59fc	create directory on startup (#2089 ) ## Motivation When you first run `uv run exo` you get an error like: `FileNotFoundError: [Errno 2] No such file or directory: '/Users/heidar/.exo/models'` Manually tested on MacBook Pro M1 32GB. Fixes issue https://github.com/exo-explore/exo/issues/2090	2026-05-14 16:03:51 +00:00
Evan Quiney	4466cd5323	use custom mlx sources for linux (#2087 ) switch to hosting mlx sources on github & cachix instead of using a broken version of mlx. closes #2043.	2026-05-13 10:45:11 +01:00
Andrei Cravtov	ed2d10bdc6	Redirect runner stdout/stderr to file logs (#2084 ) ## Motivation We want to use log mining tools like [Drain3](https://github.com/logpai/Drain3) to get standardized error formats, but for that we should record runner stdout/stderr in a massive append-only log to gather training data for such tools. Also useful for future opt-in telemetry. ## Changes The stdout/stderr from the runner now splits into 3 tasks: 1) raw write to dedicated runner logs 2) sanitized line-by-line logging with loguru 3) stub for further error-processing (i.e. turning lines into errors) ### Manual Testing Works on a 4x Mac mini cluster connected as a TB4 ring.	2026-05-12 11:48:08 +01:00
Andrei Cravtov	87c72fc1fd	Fixes issue #2068 (#2083 ) ## Motivation To fix https://github.com/exo-explore/exo/issues/2068 ## Changes Adds queue shutdown logic & hard-timeouts for closing server. ## Why It Works Prevents API from hanging more than 5 seconds.	2026-05-11 12:15:22 +00:00
Evan Quiney	b76bc30107	bump rust versions (#2081 )	2026-05-10 17:11:46 +00:00
team-wcv	08ffa5f637	Map GLM 4.7 stop tokens to GLM 4 IDs (#2061 ) ## Motivation GLM 4.7 reuses the GLM 4 chat-template tokenizer, but the model card and EOS-detection path didn't have an explicit mapping for it, so OpenAI-compatible clients didn't see a clean stop and the runner emitted follow-on role turns (e.g. `<|user|>` continuations after `<|assistant|>`'s output). ## Changes `src/exo/worker/engines/mlx/utils_mlx.py`: add the GLM 4 stop-token IDs as the EOS set when the loaded model's tokenizer matches GLM 4 / 4.7 chat templates. ## Why It Works The GLM 4 tokenizer's `<|user|>`, `<|observation|>`, and `<|endoftext|>` IDs are stable across the GLM 4 / 4.7 line; treating any of them as EOS lets the runner stop at the assistant turn boundary the same way it stops at `</s>` for Llama-style models. No prompt-template changes; only the stop set widens. ## Test Plan ### Automated Testing New unit test `src/exo/worker/tests/unittests/test_mlx/test_eos_token_ids.py` covering: GLM 4 / 4.7 path returns the expected stop ID set; non-GLM path returns the standard EOS only. ``` src/exo/worker/tests/unittests/test_mlx/test_eos_token_ids.py .. === 2 passed in 0.01s === ``` `uv run basedpyright` and `uv run ruff check` both clean. ### Manual Testing Hardware: 4-node Apple Silicon cluster, M5 Max master. - Loaded `mlx-community/GLM-4.7-Air-mlx-4bit`, ran chat completion via `/v1/chat/completions`. Before this fix the assistant turn ran on into a synthetic `<|user|>` continuation; after the fix the response stops cleanly at the assistant boundary. --------- Co-authored-by: jw-wcv <101585096+jw-wcv@users.noreply.github.com> Co-authored-by: Evan Quiney <evanev7@gmail.com>	2026-05-10 17:02:22 +00:00
Andrei Cravtov	45df74ba98	Andrei/mp capture stdio (#2056 ) ## Motivation Process-isolated runner crashes and C-extension failures can write directly to fd-level stdout/stderr, bypassing Python/loguru. We need to capture that output per runner process without polluting the main process or other workers, and without breaking operation when the parent stdio is detached. ## Changes - Added `AsyncProcess`, a spawn-only multiprocessing wrapper that redirects child stdout/stderr to pipes and exposes them as in-memory `Receiver[bytes]`s - Replaced runner-supervisor's raw `multiprocessing.Process` usage with `AsyncProcess` - Added `--no-stdio`, redirecting stdin/stdout/stderr to `/dev/null` after logging is configured - Disabled verbose MLX - Added tests covering stdio capture, child crashes, repeated bad children, SIGTERM/SIGKILL shutdown escalation, stdio detachment, and spawning captured children from a stdio-detached parent ## Why It Works The parent can redirect its own stdio fds to `/dev/null`, while `AsyncProcess` installs fresh pipe fds over fd 1 and 2 inside each spawned child. That keeps stdio-detached parents quiet while preserving per-runner stdout/stderr capture. Runner shutdown is still bounded: SIGTERM grace first, then SIGKILL escalation if needed. Next direction: the runner supervisor currently drains captured output and logs it as stdout/debug and stderr/warning. This should be split into more useful process-isolated error reporting instead of just log forwarding (regex match on errors to obtain "reason" string, best effort). ## Test Plan ### Manual Testing Ran on 4 Mac Minis in a Thunderbolt 4 ring, can see that runner's stdout/stderr contents are being captured. 
### Automated Testing - Added async-process tests for fd-level stdout/stderr capture, Python traceback capture, bounded-buffer output, child `exit`/abort, parent stdio preservation, fd leak checks, spawn-context mp channels, and SIGTERM/SIGKILL shutdown behavior - Added stdio-detach tests proving stdio detaches to `/dev/null`, a stdio-detached parent can still spawn and capture a child, and the same stdio-detached parent can spawn/capture multiple children sequentially - Updated runner-supervisor tests for the new `AsyncProcess.exitcode` path	2026-05-09 22:45:14 +01:00
Kerollos Magdy	ce37bdceb6	fix: Create directory for PID file if it doesn't exist (#2075 ) Ensure the directory for the PID file exists before creating it. ## Motivation Fixes https://github.com/exo-explore/exo/issues/2074	2026-05-09 12:10:22 +00:00
Andrei Cravtov	e5a1e5dadb	Create PID file locking for EXO (#2072 ) ## Motivation EXO should be PID-file locked to prevent duplicate processes from clobbering the log; right now this isn't the case. ## Changes I added a wrapper around a Rust PID file lock library and used it to implement PID locking for EXO, with the PID file in the exo cache directory. ## Test Plan ### Manual Testing Tested on e11: attempts to spawn duplicate EXO processes were prevented.	2026-05-08 18:50:18 +01:00
ciaranbor	fa57131374	Integration tests infra (#1995 ) ## Motivation No automated integration tests exist for exo. Manual testing against real hardware clusters is slow and error-prone. We need a pytest framework that deploys clusters via `eco`, runs inference scenarios, and tears down cleanly. ## Changes - `tools/src/exo_tools/` — New workspace member shared by bench, eval, and tests: - `client.py` — `ExoClient` HTTP client (extracted from `bench/harness.py`) - `harness.py` — instance lifecycle helpers (placement, wait-for-ready, etc.) - `cluster.py` — `EcoSession` for eco cluster lifecycle (deploy/stop/start/release/logs/exec) with unique `USER=<prefix>-<uuid>` per session and atexit/signal cleanup - `tests/integration/` — 17 pytest tests across 5 files: - `test_1node.py` — place, chat, multi-turn, delete, state/models endpoints, cluster snapshot, download-from-scratch - `test_2node.py` — parametrized tensor/jaccl + pipeline/ring inference and multi-turn - `test_4node.py` — parametrized 4-node pipeline/ring inference, cluster state - `test_resilience.py` — full disconnect/reconnect cycle (2-node → disconnect → 1-node → reconnect → 2-node) - `test_dashboard.py` — Playwright: dashboard loads, shows node info, chat flow - `helpers.py` — placement/inference helpers, re-exports from `exo_tools` - `conftest.py` — session-scoped cluster fixtures with constraint-based eco reservations; `--hosts` override; `EXO_REF` env var for CI deployments from a GitHub branch - `bench/` — Updated imports from `exo_tools.client` / `exo_tools.harness` - `pyproject.toml` — Added `tools` workspace member, `playwright` dev dep, `--ignore=tests/integration` ## Why It Works Tests use `eco` for cluster lifecycle and `ExoClient` for API interactions — same tools humans use. Session-scoped fixtures deploy once per file. Unique eco users prevent test runs from interfering with each other or manual usage. 
## Test Plan ### Automated Testing - `uv run pytest tests/integration/ -v -s` — full suite (~4-5 min, 17/17 passing) - `uv run pytest tests/integration/ -v -s --hosts s4,s9,s10,s22` — pin specific hosts - `EXO_REF=main uv run pytest tests/integration/ -v` — deploy from a GitHub branch (CI) - `uv run pytest` — confirms integration tests are excluded from default runs	2026-05-08 17:15:08 +01:00
Alex Cheema	414132ae9c	Use time-weighted power sampling (#2038 ) ## Why The power sampler currently averages sampled wattage values arithmetically. That can be materially wrong when sample intervals are uneven: a short high-power spike gets the same weight as a long steady interval. Energy should be computed by integrating power over time, and average power should be derived from energy / elapsed time. ## How - Store each power sample with its relative timestamp. - Anchor the first sample at `t=0` and take a final sample at `elapsed` when producing results. - Integrate per-node power using the trapezoidal rule. - Sum node energy for total cluster energy, then derive total average system power from total energy / elapsed. - Add focused unit tests for uneven sample intervals and the single-sample fallback. ## Tests - `uv run pytest src/exo/utils/tests/test_power_sampler.py` - `uv run basedpyright` - `uv run ruff check src/exo/utils/power_sampler.py src/exo/utils/tests/test_power_sampler.py` - `nix fmt`	2026-05-07 10:42:14 +00:00
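The time-weighted approach described above can be sketched as follows: integrate power over time with the trapezoidal rule, then divide the energy by the elapsed time. This is a minimal illustration, not the code in `power_sampler.py`; the function names are hypothetical, and returning the lone reading in the single-sample case is an assumption about what the commit's "single-sample fallback" does.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (seconds since first sample, watts)


def energy_joules(samples: Sequence[Sample]) -> float:
    """Integrate power over time using the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def average_power_watts(samples: Sequence[Sample], elapsed: float) -> float:
    """Average power as energy / elapsed time."""
    if not samples or elapsed <= 0:
        return 0.0
    if len(samples) == 1:
        # Assumed fallback: with one sample there is nothing to integrate.
        return samples[0][1]
    return energy_joules(samples) / elapsed


# Nine steady seconds at 10 W followed by a one-second ramp to 100 W.
samples: List[Sample] = [(0.0, 10.0), (9.0, 10.0), (10.0, 100.0)]
naive_mean = sum(power for _, power in samples) / len(samples)
weighted_mean = average_power_watts(samples, elapsed=10.0)
```

Here the arithmetic mean of the three readings is 40 W, while the time-weighted average is 14.5 W (145 J over 10 s), which shows how a short spike skews the unweighted figure.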
Alex Cheema	edef8004f8	Store custom model cards in State (#2024 ) ## Why Workers currently update their custom model-card cache by reacting to `CustomModelCardAdded` / `CustomModelCardDeleted` events directly. That is another snapshot footgun: a worker restored from State may never see the historical add/delete event, so the durable State must include the desired custom-card set. ## How - Add `State.custom_model_cards`, keyed by `ModelId`. - Reduce `CustomModelCardAdded` into State. - Reduce `CustomModelCardDeleted` into State. - Add focused reducer tests for add and delete. This PR only makes custom cards durable in State. A follow-up PR will make workers reconcile their on-disk custom-card cache from this state instead of relying on those events directly. ## Tests - `uv run pytest src/exo/shared/tests/test_apply/test_apply_custom_model_cards.py src/exo/shared/tests/test_state_serialization.py` - `uv run pytest` - `uv run ruff check src/exo/shared/types/state.py src/exo/shared/apply.py src/exo/shared/tests/test_apply/test_apply_custom_model_cards.py` - `uv run basedpyright` - `nix fmt`	2026-05-07 09:06:39 +01:00
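Reducing the add/delete events into State, as described above, might look roughly like this. It is a sketch under assumptions: the event and state classes here are simplified hypothetical versions (plain string IDs and dict cards), and only the names `CustomModelCardAdded`, `CustomModelCardDeleted`, and `State.custom_model_cards` come from the commit message.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Union


@dataclass(frozen=True)
class CustomModelCardAdded:
    model_id: str
    card: dict  # hypothetical payload shape


@dataclass(frozen=True)
class CustomModelCardDeleted:
    model_id: str


@dataclass(frozen=True)
class State:
    custom_model_cards: Dict[str, dict] = field(default_factory=dict)


Event = Union[CustomModelCardAdded, CustomModelCardDeleted]


def apply(state: State, event: Event) -> State:
    """Return a new State with the event folded in; the input is not mutated."""
    cards = dict(state.custom_model_cards)
    if isinstance(event, CustomModelCardAdded):
        cards[event.model_id] = event.card
    elif isinstance(event, CustomModelCardDeleted):
        cards.pop(event.model_id, None)  # deleting an unknown card is a no-op
    return replace(state, custom_model_cards=cards)


state = State()
state = apply(state, CustomModelCardAdded("org/model-a", {"n_layers": 32}))
state = apply(state, CustomModelCardAdded("org/model-b", {"n_layers": 61}))
state = apply(state, CustomModelCardDeleted("org/model-a"))
```

Because the desired card set lives in State, a worker restored from a snapshot can read it directly without replaying the historical events.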
Alex Cheema	a0c00f9dfd	fix(placement): gate RDMA on nodeRdmaCtl.enabled at both endpoints (#2014 ) ## Summary - Fixes a bug where `POST /place_instance` (and the dashboard UI) would accept an MlxJaccl/RDMA instance spanning nodes whose `nodeRdmaCtl.enabled` was `false`, because topology + placement consulted Thunderbolt-derived RDMA edges without checking the per-node `rdma_ctl` status. - Three-layer fix: topology only emits `RDMAConnection` edges when both endpoints have `nodeRdmaCtl.enabled = true`; flipping a node to disabled immediately purges every RDMA edge touching it; `place_instance` additionally rejects RDMA cycles containing any disabled or unobserved node as a defense-in-depth check on the API/master path. ## Details - `src/exo/shared/apply.py` - `MacThunderboltConnections` case now filters out RDMA connections whose source or sink lacks observed-and-enabled `rdma_ctl` status (missing entry → treated as disabled). - `RdmaCtlStatus` case now calls `topology.remove_all_rdma_connections_touching(node_id)` when the node reports disabled, so consumers don't have to wait for the next TB poll. - `src/exo/shared/topology.py` - New `Topology.remove_all_rdma_connections_touching(node_id)` removes every RDMA edge incident to the node (incoming and outgoing) while leaving socket edges intact. - `src/exo/master/placement.py` - `place_instance` accepts `node_rdma_ctl: Mapping[NodeId, NodeRdmaCtlStatus] \| None`. The `is_rdma_cycle` filter now also requires `nodeRdmaCtl.enabled` for every node in the cycle. MlxJaccl placement raises the existing "no RDMA-connected cycles available" error if no qualifying cycle remains. - `src/exo/api/main.py`, `src/exo/master/main.py` - Both placement entrypoints now pass `state.node_rdma_ctl` through. 
## Tests - `src/exo/shared/tests/test_apply/test_apply_rdma_gating.py` (new): six unit tests covering enabled/disabled/missing combinations on apply, the immediate-purge transition, and that purging RDMA edges leaves socket edges untouched. - `src/exo/master/tests/test_placement.py`: existing `test_tensor_rdma_backend_connectivity_matrix` updated to pass `node_rdma_ctl`. Two new tests assert MlxJaccl placement is rejected when any cycle node is `enabled=false` or has no `rdma_ctl` entry. ## Test plan - [x] `uv run basedpyright` — 0 errors - [x] `uv run ruff check` — clean - [x] `nix fmt` - [x] `uv run pytest` — 429 passed, 1 skipped - [ ] On a real mixed cluster (s15/s16 disabled, s17/s18 enabled), confirm: - [ ] `POST /place_instance` for an RDMA instance including s15 or s16 returns an error - [ ] An RDMA instance can still be placed across {s17, s18} - [ ] `GET /state` shows no `sourceRdmaIface`/`sinkRdmaIface` on s15↔s16 connections - [ ] Dashboard previews don't surface RDMA-spanning options that include s15/s16 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 07:00:15 +00:00
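The gating rule above (an RDMA edge exists only when both endpoints report enabled, and a missing entry counts as disabled) can be illustrated with a small sketch. The `RdmaEdge` type and helper names are hypothetical and much simpler than exo's `Topology`; a plain `bool` mapping stands in for `NodeRdmaCtlStatus`.

```python
from dataclasses import dataclass
from typing import List, Mapping, Sequence


@dataclass(frozen=True)
class RdmaEdge:
    source: str
    sink: str


def rdma_enabled(node_id: str, rdma_ctl: Mapping[str, bool]) -> bool:
    """A node with no observed status is treated as disabled."""
    return rdma_ctl.get(node_id, False)


def filter_rdma_edges(edges: Sequence[RdmaEdge], rdma_ctl: Mapping[str, bool]) -> List[RdmaEdge]:
    """Keep only edges whose source and sink are both observed and enabled."""
    return [
        edge
        for edge in edges
        if rdma_enabled(edge.source, rdma_ctl) and rdma_enabled(edge.sink, rdma_ctl)
    ]


def is_placeable_rdma_cycle(cycle: Sequence[str], rdma_ctl: Mapping[str, bool]) -> bool:
    """Defense-in-depth check at placement time: every node must be enabled."""
    return all(rdma_enabled(node_id, rdma_ctl) for node_id in cycle)


# s15 reports disabled, s16 has no entry at all, s17 and s18 are enabled.
rdma_ctl = {"s15": False, "s17": True, "s18": True}
edges = [RdmaEdge("s15", "s16"), RdmaEdge("s16", "s17"), RdmaEdge("s17", "s18")]
usable = filter_rdma_edges(edges, rdma_ctl)
```

Only the s17 to s18 edge survives the filter, and a cycle containing s15 or s16 is rejected even if a stale edge were still present in the topology.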
Drifter4242	89d20c1888	fix(inference): prevent TP collective deadlock via agree_on_tasks order (#2048 ) If you have two machines and make two requests at the same time, it can crash. This is because the tasks can sometimes end up in different orders on different machines. We need to sort the tasks and mx_all_gather_tasks already sorts the tasks but the code ignores that ordering. The fix is to make sure the sort order is preserved. The rest is written by Sonnet (reviewed by me): Tensor-parallel inference requires that every rank enqueues tasks in the same order before running agree_on_tasks collectives. The old implementation filtered from _maybe_queue: self._queue.extend(task for task in self._maybe_queue if task in agreed) self._maybe_queue = [task for task in self._maybe_queue if task in different] Because _maybe_queue is independently ordered per-rank (tasks arrive via gRPC in whatever order the API server sends them), two concurrent requests could produce different _maybe_queue orderings on rank 0 vs rank 1. The filter then preserved those different orders into _queue, so each rank started processing tasks in a different sequence. The next mlx collective (all_reduce, all_gather, etc.) on rank 0 corresponded to a different task than on rank 1 → permanent deadlock. Fix: extend from agreed directly. mx_all_gather_tasks returns agreed as a list sorted by task_id on all ranks, so every rank appends the same sequence regardless of local arrival order. Applies to both SequentialGenerator and BatchGenerator. ## Motivation `agree_on_tasks` is called on every rank after accumulating new requests in `_maybe_queue`. Its job is to run an `all_gather` collective so all ranks agree on which tasks to promote to `_queue` before the next inference step. 
The old implementation re-imposed local arrival order when extending `_queue`: ```python self._queue.extend(task for task in self._maybe_queue if task in agreed) ``` `mx_all_gather_tasks` already returns `agreed` sorted by `task_id` — the same deterministic order on every rank. But iterating `self._maybe_queue` instead of `agreed` discarded that sort and substituted the local gRPC arrival order, which differs per rank under concurrent load. Two concurrent requests arriving in `[A, B]` order on rank 0 and `[B, A]` on rank 1 caused the first MLX collective in the next step to hang permanently: each rank was executing a different task's collective and would never match. ## Changes `SequentialGenerator.agree_on_tasks` and `BatchGenerator.agree_on_tasks`: ```python # Before self._queue.extend(task for task in self._maybe_queue if task in agreed) self._maybe_queue = [task for task in self._maybe_queue if task in different] # After self._queue.extend(agreed) # preserves mx_all_gather_tasks sort order self._maybe_queue = list(different) # already in local order; filter was redundant ``` ## Why It Works `mx_all_gather_tasks` (in `utils_mlx.py`) computes the agreed set then sorts by `task_id`: ```python agreed = [local_tasks[tid] for tid in sorted(agreed_ids)] ``` Because `task_id` is a UUID and the sort is lexicographic, every rank produces the same `agreed` list regardless of local arrival order. Using `agreed` directly preserves this guarantee. The `different` list (tasks not yet seen on all ranks) is built by iterating `tasks` in local order, which is already correct. ## Test Plan ### Manual Testing Hardware: 2× Mac Studio M3 Ultra 512 GB, Thunderbolt 5 direct bridge, `MlxJaccl` RDMA tensor-parallel (`moonshotai/Kimi-K2.6`, 595 GB INT4, 61 layers). - Sent concurrent streaming requests; confirmed all complete without deadlock. 
- This hardware configuration (sub-millisecond inter-node latency) is the most likely to trigger the race, as requests from separate HTTP connections can reach rank 0 and rank 1 in opposite order before `agree_on_tasks` runs. ### Automated Testing All existing tests pass: `pytest src -m "not slow" --import-mode=importlib` — 422/422 passed. The existing `test_event_ordering.py` covers the `agree_on_tasks` call path with a mock that returns tasks in consistent order; the race requires real distributed hardware to reproduce deterministically.	2026-05-06 12:24:58 +00:00
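The ordering problem described above can be reproduced without any distributed hardware by simulating two ranks that received the same tasks in opposite order. This is an illustrative sketch only: `gather_agreed` is a hypothetical single-process stand-in for `mx_all_gather_tasks`, and tasks are reduced to their ID strings.

```python
from typing import List, Sequence, Set, Tuple


def gather_agreed(local: Sequence[str], ranks: Sequence[Sequence[str]]) -> Tuple[List[str], List[str]]:
    """Stand-in for the collective: IDs seen on every rank, sorted by ID."""
    agreed_ids: Set[str] = set(ranks[0]).intersection(*ranks[1:])
    agreed = sorted(agreed_ids)  # same order on every rank
    different = [task_id for task_id in local if task_id not in agreed_ids]
    return agreed, different


def promote_before(maybe_queue: Sequence[str], agreed: Sequence[str]) -> List[str]:
    """Old behaviour: filter the local queue, which keeps local arrival order."""
    return [task_id for task_id in maybe_queue if task_id in agreed]


def promote_after(agreed: Sequence[str]) -> List[str]:
    """Fixed behaviour: take the agreed list as-is, preserving the sorted order."""
    return list(agreed)


# The same two tasks arrive in opposite order on the two ranks.
rank0 = ["task-a", "task-b"]
rank1 = ["task-b", "task-a"]
agreed0, different0 = gather_agreed(rank0, [rank0, rank1])
agreed1, different1 = gather_agreed(rank1, [rank0, rank1])

old_orders = (promote_before(rank0, agreed0), promote_before(rank1, agreed1))
new_orders = (promote_after(agreed0), promote_after(agreed1))
```

With the old filter the two ranks end up with different queue orders, so their next collectives would belong to different tasks; extending from the agreed list gives both ranks the identical sequence.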
Evan Quiney	dbcceaa50c	Initialise _cancelled_tasks in ImageEngine (#2051 ) we yielded nonsense chunks from engines; we didn't initialize the image engine correctly. mostly rewrite of #2049 --------- Co-authored-by: ciaranbor <ciaranborourke-dev@proton.me>	2026-05-05 17:27:57 +01:00
Sam Bradbury	9c6ff4ce95	feat: update rdma_ctl instructions (#1977 ) ## Motivation The RDMA setup instructions were missing a step: after booting to Recovery mode, users need to open Terminal from the Utilities menu before they can run the `rdma_ctl` command. Without this step, users following the instructions wouldn't know how to access a terminal in Recovery mode. This step was already in the README just not in the UI notifications. ## Changes Added a missing instruction step — "Open Terminal from the Utilities menu" — to three instances of the RDMA setup flow in `dashboard/src/routes/+page.svelte`. ## Why It Works N/A copy change only. ## Test Plan ### Manual Testing Hardware: MacBook Pro M4 Max 48GB ### Automated Testing No automated tests affected; this is a UI copy change only. Co-authored-by: Sam Bradbury <sam@consultbradbury.com>	2026-05-01 11:18:57 +00:00
ecohash-co	b26268dfaf	fix(macos-app): disable URL response caching for cluster-state polling (#2005 ) Fixes #2004. `ClusterStateService` polls `/state` at 2 Hz via `URLSession.shared`, which keeps an on-disk `URLCache` attached by default. Every polled response body gets persisted under `~/Library/Caches/exolabs.EXO/`, sustaining ~500–620 KB/sec of file-backed memory dirtied — far above macOS's ~25 KB/sec per-process daily-average baseline. Six microstackshot reports observed on a single Mac Studio M3 Ultra over eight days, with one 15-hour run accumulating 34.36 GB of cache writes. Heaviest stack on every diagnostic report (96–98% of samples): ``` _dispatch_workloop_worker_thread → _dispatch_block_async_invoke2 → __CFURLCache::CreateAndStoreCacheNode → write ``` Full diagnostic data and analysis in #2004. ## What changed `ClusterStateService` now defaults to an ephemeral, non-caching `URLSession` instead of `URLSession.shared`. Cluster-state responses are time-sensitive and small; nothing benefits from being cached on disk. ```swift private static func makeNonCachingSession() -> URLSession { let config = URLSessionConfiguration.ephemeral config.urlCache = nil config.requestCachePolicy = .reloadIgnoringLocalCacheData return URLSession(configuration: config) } ``` The existing per-request `request.cachePolicy = .reloadIgnoringLocalCacheData` calls are kept as defense in depth — they only affect read behavior, but harmless to leave alongside the session-level config. ## Scope - Behavioral: none. Polled requests still go out at the same cadence; responses still parse the same; no semantic change to any API surface. - Test injection: the `session:` parameter remains in `init`, so tests can still inject a custom mock session unchanged. - `BugReportService` and other `URLSession.shared` callers: untouched. If maintainers prefer an app-wide URLCache disable instead, happy to switch the approach (issue body has the alternative spelled out). 
## Verification Verified locally that compiling EXO with this change produces a working menubar app and `ClusterStateService` continues to fetch state correctly. After ~30 min of idle polling, no new entries in `/Library/Logs/DiagnosticReports/EXO_.diag` and no growth in `~/Library/Caches/exolabs.EXO/`. ## Test plan - [ ] Build EXO from this branch on macOS 26.4 - [ ] Launch, let cluster state polling run for 30+ min - [ ] Confirm no new microstackshot diagnostic reports - [ ] Confirm `~/Library/Caches/exolabs.EXO/Cache.db` does not grow 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Jordan Miller <jordan.d.miller@gmail.com>	2026-05-01 10:41:10 +00:00
ciaranbor	8dae3ecb9a	A few targeted tweaks to address HF rate limits (#2009 ) ## Motivation - exo bursts ~200 HF Hub-API requests on every cold start, blowing past the anonymous 500-req/5-min budget. - The existing retry loop catches 429 generically and gives up in ~3s — well before HF's reset window. - `file_meta` and `_download_file` had no 429 handling at all (became `AssertionError`). - Disk file-list cache was bypassed on every process restart. ## Changes All in `src/exo/download/download_utils.py` + tests. - Parse `t=` from HF's `RateLimit` header on 429; sleep `min(t, 300s) + jitter`. - Handle 429 at all three call sites (`_fetch_file_list`, `file_meta`, `_download_file`). - `n_attempts`: 3 → 5. - Disk cache now primary across restarts (24h mtime TTL). - `?recursive=true` instead of N+1 subdir walks. ## Why It Works `t=<seconds>` is HF's "wait this long and you'll be unblocked" — sleeping that long lets the window reset. Disk-cache-as-primary plus recursive listing cuts cold-start Hub-API traffic by ~10×. ## Test Plan ### Manual Testing MacBook Pro M1 Max. Tripped the real HF 429. Pre-fix: failed in 3.4s. Post-fix: slept (HF returned `t=158`) and recovered. ### Automated Testing - New `test_rate_limit_handling.py` (19 tests) — header parsing, retry-loop behaviour, plus HTTP-level coverage that mocks aiohttp to return a 429 and asserts each call site raises `HuggingFaceRateLimitError(retry_after=52.0)`. - New `TestFileListCacheTTL` in `test_offline_mode.py` — fresh cache hits, stale cache refetches. - 421 tests pass; basedpyright / ruff / nix fmt clean.	2026-04-30 18:06:15 +00:00
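The back-off rule above, `min(t, 300s) + jitter`, can be sketched as below. The regular expression, the example header string, the one-second jitter ceiling, and the function names are all assumptions for illustration; the commit only states that `t=` is parsed from HF's `RateLimit` header and that the wait is capped at 300 seconds.

```python
import random
import re
from typing import Callable, Optional

MAX_WAIT_SECONDS = 300.0

# Match a standalone "t=<number>" parameter, not the tail of e.g. "limit=5".
_T_PARAM = re.compile(r"(?:^|[;,\s])t=(\d+(?:\.\d+)?)")


def parse_retry_after(header: Optional[str]) -> Optional[float]:
    """Extract the seconds-until-reset value from a RateLimit header, if present."""
    if not header:
        return None
    match = _T_PARAM.search(header)
    return float(match.group(1)) if match else None


def backoff_seconds(
    header: Optional[str],
    max_jitter: float = 1.0,  # assumed jitter ceiling
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Return how long to sleep after a 429, or None if the header has no t=."""
    retry_after = parse_retry_after(header)
    if retry_after is None:
        return None
    return min(retry_after, MAX_WAIT_SECONDS) + rng() * max_jitter


# Assumed header shape; the real header may carry other parameters.
wait = backoff_seconds('"api";r=0;t=158', rng=lambda: 0.5)
```

Injecting `rng` keeps the jitter deterministic in tests, and the cap stops a very large or malformed `t` value from stalling a download indefinitely.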
Alex Cheema	fb12b403ea	fix(app): tighten Share Bug Report prompt layout (#2008 ) ## Summary Follow-ups to #2003 based on feedback that the Share Bug Report window felt visually weighty: too much padding above and below, and a description editor that invited an essay rather than a one-liner. ## Changes (one file) `app/EXO/EXO/Views/BugReportWindowController.swift`: - Auto-size the window to its content. Switched from `NSHostingView` + fixed `contentRect: 480x380` + SwiftUI `frame(minHeight: 320)` to `NSHostingController` with `sizingOptions = [.preferredContentSize, .minSize]`. The fixed-min combo was centering the form in dead vertical space. - Smaller, lower-pressure editor. Field is now labeled `Description (optional)` with a placeholder hint (`What were you doing when it broke?`) inside the editor. Editor height fixed at 72pt (was 120pt min). Replaced the long lead-in paragraph and headline with a single one-line caption between field and buttons: `Diagnostic logs will be uploaded with your report.` - Tighter spacing. Outer padding 20 -> 16, root spacing 16 -> 12, prompting-section spacing 12 -> 8. - Remove em dash from copy. `BugReportService` and the menu wiring are unchanged. ## Test plan - [ ] Click `Share Bug Report...` from the menu bar. - [ ] The window opens centered and sized to its content (no big empty bands top/bottom). - [ ] Description editor is visibly compact, with the placeholder hint showing when empty. - [ ] The optional-ness is conveyed by the field label (no separate help paragraph). - [ ] Caption `Diagnostic logs will be uploaded with your report.` appears in `.caption` style under the editor, above the buttons. - [ ] Resize the window: persists across re-opens (frame autosave still works). - [ ] Send/Cancel/Try Again/Done flows behave the same as before. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 15:10:26 +01:00
Alex Cheema	1606e63816	feat(app): open Share Bug Report in a dedicated window (#2003 ) ## Summary - Adds a top-level Share Bug Report… menu item to the macOS popover (between Check for Updates and Quit) with SF Symbol `ladybug`. - Clicking it opens a dedicated resizable `NSWindow` ("Send a Bug Report") that hosts the prompting / sending / success / failure flow. - Removes the description-less duplicate from Settings → Debug Info, and the dead `debugSection` it nominally lived behind. ## Why PR #1959 added a user-description prompt to the bug-report flow, but its trigger lived inside `ContentView.debugSection` — a view that's defined but never rendered in the body. The path users actually hit was `SettingsView.sendBugReportButton`, which called `BugReportService.sendReport(isManual: true)` without ever passing `userDescription`. So the description prompt was unreachable in the built app. ## Approach Per Apple HIG, an action that requires further input before completing should open a dialog, not transform the menu inline. So: - Add a top-level menu entry that ends in `…` (HIG: ellipsis indicates "further input required"). - Move the prompting/sending/success/failure state machine into a standalone `BugReportWindowController` modeled after the existing `SettingsWindowController`. - Single-instance window with frame-autosave name, sensible `contentMinSize`, resizable, native button layout (`.cancelAction` / `.defaultAction` keyboard shortcuts), light/dark-mode-correct `.textBackgroundColor` and `.separatorColor`. - Auto-focus the description field on open. `Try Again` from failure, `Open GitHub Issue` + `Done` from success. ## Files - `app/EXO/EXO/Views/BugReportWindowController.swift` (new) — controller + view. - `app/EXO/EXO/EXOApp.swift` — wire `BugReportWindowController` as a `@StateObject` and inject as environment object. - `app/EXO/EXO/ContentView.swift` — replace inline state machine with menu item that calls `bugReportWindowController.open()`. 
Remove now-unused state, helpers, and dead `debugSection`. - `app/EXO/EXO/Views/SettingsView.swift` — remove duplicate `sendBugReportButton`, `sendBugReport()`, and related `@State`. Section "Debug Info" keeps Thunderbolt / interface / RDMA info. `BugReportService` is unchanged. ## Test plan - [ ] Open the menu-bar popover → confirm Share Bug Report… appears between Check for Updates and Quit, with a ladybug icon. - [ ] Click it → a window titled "Send a Bug Report" appears, centered, with the description editor focused. - [ ] Resize the window → size persists across re-opens (frame autosave). - [ ] Type a description, press Return → upload succeeds, success card with Open GitHub Issue + Done appears. - [ ] Click Open GitHub Issue → browser opens with the description pre-filled into the issue template. - [ ] Send with empty description → upload still succeeds. - [ ] Press Esc from the prompting state → window closes. - [ ] On failure (e.g., offline) → error card with Try Again + Close appears; Try Again returns to the editor with the description preserved. - [ ] Open the Settings window → Debug Info section is unchanged except the Send Bug Report button is gone. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 13:05:07 +01:00
Alex Cheema	667a3bb0e5	feat: keep-models option when uninstalling EXO (#1997 ) ## Summary - Adds a Keep downloaded models (~/.exo/models) checkbox to the macOS uninstall confirmation dialog (Settings → Advanced → Danger Zone). The full `~/.exo` directory is now removed on uninstall by default; if the checkbox is checked, `~/.exo/models` is preserved. - The standalone `app/EXO/uninstall-exo.sh` gains a matching `--keep-models` flag and the same `~/.exo` cleanup so GUI and CLI flows stay in sync. Resolves the user home via `$SUDO_USER` since the script runs under `sudo`. Previously, "Uninstall EXO" only cleaned up system-level components (LaunchDaemon, network location, logs, app bundle) and left the entire `~/.exo` directory behind. Now uninstalling actually removes EXO's user data, with a one-click opt-out for the (potentially many GB) of downloaded models. ![Uninstall dialog with new checkbox](https://raw.githubusercontent.com/exo-explore/exo/703b7fbbf13441217ad2903bb199f07e92af4490/uninstall-dialog.png) > Note: the rendered icon in the screenshot above is the generic system folder icon because it was captured from a small standalone Swift binary (no app bundle / icon resource). When triggered from the actual EXO.app, the EXO app icon is shown. ## Test plan - [ ] Build EXO.app locally; open Settings → Advanced → Danger Zone → Uninstall EXO; confirm the new "Keep downloaded models (~/.exo/models)" checkbox is present and unchecked by default. - [ ] Uninstall with the checkbox checked → `~/.exo/models/` survives, all other entries under `~/.exo` are gone, system components removed, app moved to Trash. - [ ] Uninstall with the checkbox unchecked → `~/.exo` is fully removed. - [ ] `sudo app/EXO/uninstall-exo.sh --keep-models` → `~/.exo/models/` is preserved, the rest of `~/.exo` is removed. - [ ] `sudo app/EXO/uninstall-exo.sh` (no flag) → `~/.exo` is fully removed. 
- [ ] `app/EXO/uninstall-exo.sh --help` prints usage and exits 0; unknown args exit 2 with a usage hint. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Evan <evanev7@gmail.com>	2026-04-28 01:06:02 +00:00
Evan Quiney	c80b10c013	implement engine abstraction for mlx and mflux (#2000 ) refactor for future versions.	2026-04-28 00:58:17 +00:00
Alex Cheema	18ffe1df23	fix: uninstall-exo.sh removes both current and legacy bridge scripts (#1998 ) ## Summary The standalone `app/EXO/uninstall-exo.sh` only knew about the legacy filename `disable_bridge_enable_dhcp.sh`. On machines installed with newer EXO versions, the current `/Library/Application Support/EXO/disable_bridge.sh` was left behind, and the script then reported `EXO support directory not empty, leaving in place`. This PR makes the script try both filenames, removing whichever ones exist. Tolerates either, both, or neither being present without erroring. The Swift `NetworkSetupHelper.makeUninstallScript()` already handles both paths correctly, so the GUI uninstall flow is unaffected — this is a script-only fix. Caught while running an end-to-end uninstall on a real machine for #1997. ## Test plan Verified the new block in isolation against all four states: - [x] both `disable_bridge.sh` and `disable_bridge_enable_dhcp.sh` present → both removed - [x] only `disable_bridge.sh` present → removed cleanly - [x] only `disable_bridge_enable_dhcp.sh` present → removed cleanly (legacy install) - [x] neither present → prints the existing "already removed?" warning, exits 0 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 00:28:12 +00:00
rltakashige	f0d1371d89	MLX P/D (#1993 ) ## Motivation MLX-only prefill server for Apple Silicon	2026-04-28 00:12:42 +00:00
Adam Durham	5d10188d3a	fix: route by in-flight tasks only — completed tasks were skewing load balance (#1989 ) The load balancer counted ALL tasks (Complete, Cancelled, TimedOut, Failed) instead of only Pending/Running ones. With 138 accumulated tasks and only 7 active, routing decisions were based on historical distribution, causing one node to appear permanently 'busier' and starving the other of work. Co-authored-by: Adam Durham <adam@example.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 16:03:12 +00:00
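The routing fix in #1989 comes down to filtering by status before counting. A minimal sketch, assuming hypothetical `Task`, `TaskStatus`, and `pick_least_loaded` names (the status names are taken from the commit message, the rest is illustrative):

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Sequence


class TaskStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETE = "Complete"
    CANCELLED = "Cancelled"
    TIMED_OUT = "TimedOut"
    FAILED = "Failed"


# Only these statuses represent work a node is still carrying.
IN_FLIGHT = frozenset({TaskStatus.PENDING, TaskStatus.RUNNING})


@dataclass(frozen=True)
class Task:
    node_id: str
    status: TaskStatus


def pick_least_loaded(nodes: Sequence[str], tasks: Iterable[Task]) -> str:
    """Pick the node with the fewest in-flight tasks, ignoring finished ones."""
    load = Counter(t.node_id for t in tasks if t.status in IN_FLIGHT)
    # Counter returns 0 for nodes with no in-flight tasks; ties go to the first node.
    return min(nodes, key=lambda node: load[node])
```

With ten completed tasks on one node and a single running task on the other, the first node is chosen, whereas counting every task would have kept sending work to the second.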
ciaranbor	f2a0db4e23	Extend bench/eval tooling (#1905 ) ## Motivation Extend bench/eval tooling with robustness features, streaming support, and align model configs with vllm eval for reproducible comparisons. ## Changes - exo_eval: Checkpoint/resume (JSONL), instance health monitoring + early abort, `top_k`/`min_p`/`enable_thinking` params, LCB `--release-version`/`--offset` - exo_bench: Streaming SSE (`--stream`), Kimi tokenizer fix for transformers 5.x - Both tools: Auto-detect running instances instead of requiring `--skip-instance-setup`; `--fresh-instance` to override - harness: SSE streaming client, `find_existing_instance()` shared helper, removed download timeout, settle-timeout default 0→7200s - models.toml: Added `enable_thinking`, aligned `max_tokens`/temps with vllm, added new models - API: Streaming SSE for `/bench/chat/completions` ## Why It Works - Checkpoint/resume uses append-only JSONL + skip-on-load so interrupted evals resume without re-running completed questions - Health monitoring races an `asyncio.Event` against API calls for fast abort when the instance dies - Auto-detection queries `/state` for existing instances matching the model ID before attempting placement - Streaming reuses the existing `generate_chat_stream` infrastructure from the regular chat endpoint	2026-04-27 16:53:43 +01:00
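The checkpoint/resume scheme in #1905 (append-only JSONL plus skip-on-load) might look roughly like this. It is a sketch under assumptions: the `question_id` field, the function names, and the tolerance for a truncated final line are illustrative and not taken from `exo_eval`.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Set, Tuple


def load_completed(path: Path) -> Set[str]:
    """Collect the IDs of questions already recorded in the checkpoint file."""
    done: Set[str] = set()
    if not path.exists():
        return done
    with path.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["question_id"])
            except (json.JSONDecodeError, KeyError):
                # Assumed policy: a half-written last line is skipped and re-run.
                continue
    return done


def run_with_checkpoint(
    questions: Iterable[Tuple[str, str]],
    path: Path,
    evaluate: Callable[[str], str],
) -> List[str]:
    """Evaluate only questions missing from the checkpoint; return the IDs run."""
    done = load_completed(path)
    ran: List[str] = []
    with path.open("a") as f:  # append-only: earlier results are never rewritten
        for question_id, prompt in questions:
            if question_id in done:
                continue
            record = {"question_id": question_id, "result": evaluate(prompt)}
            f.write(json.dumps(record) + "\n")
            f.flush()  # write each result out before starting the next question
            ran.append(question_id)
    return ran
```

Because each result is appended as soon as it is produced, an interrupted run loses at most the question that was in progress.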
rltakashige	37f6f4f6c2	Add DeepSeek V4 Flash/Pro (#1978 ) Wait for upstream merge. --------- Co-authored-by: Evan <evanev7@gmail.com>	2026-04-27 15:20:50 +01:00
Adam Durham	48a922fd5c	fix: map presence_penalty and frequency_penalty from ChatCompletionRequest (#1991 ) Upstream PR #1947 added `presence_penalty` and `frequency_penalty` to `TextGenerationTaskParams` and the mlx-lm generator call sites, but missed wiring them up in the API adapter so they were silently dropped from incoming requests. This fixes the API mapping. Co-authored-by: Adam Durham <adam@example.com>	2026-04-27 08:58:59 +00:00
rltakashige	fd707de30b	Add more model cards (#1970 ) v1.0.71	2026-04-23 15:28:40 +01:00
Alex Cheema	45248c5c85	chore(app): hardcode bug report presigned-URL endpoint (#1971 ) ## Motivation The bug-report presigned-URL endpoint (`https://reports.exolabs.net/presigned-urls`) was injected at build time from the `EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT` GitHub Actions secret into `Info.plist`, then read at runtime by `BugReportService`. It isn't actually a secret — the POST body is just `{"keys":[...]}` with no credential (see `app/EXO/EXO/Services/BugReportService.swift:136-142`), abuse prevention lives server-side on the lambda, and the URL is already visible in every publicly-distributed DMG's `Info.plist`. Treating it as a repo secret added plumbing with no security benefit and broke local dev builds — hitting Send Bug Report on an uncustomised `just build-app` raised "Bug report endpoint is invalid". ## Changes - `app/EXO/EXO/Info.plist`: replace `$(EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT)` with the literal URL. - `.github/workflows/build-app.yml`: drop the `EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT` job-level env var and the xcodebuild build-setting passthrough. No other workflow changes. Swift code is unchanged — `BugReportService` still reads from `Info.plist`, which leaves an escape hatch if anyone ever needs to override via `xcodebuild EXOBugReportPresignedUrlEndpoint=...` without recompiling. Follow-up: the `EXO_BUG_REPORT_PRESIGNED_URL_ENDPOINT` repo secret can now be deleted in the GitHub Actions settings UI. ## Why It Works `Info.plist` variable substitution turns `$(FOO)` into whatever build setting `FOO` resolves to. CI was setting `FOO` via xcodebuild; local dev wasn't, so the key resolved to an empty string, which `BugReportService.fetchPresignedUploadUrls` rejects via the `!trimmedEndpointString.isEmpty` guard at `BugReportService.swift:131`. Hardcoding the literal string removes the substitution entirely, so every build — local or CI — gets the right value. 
## Test Plan ### Manual Testing <!-- Hardware: MacBook Pro (macOS app build via Xcode) --> - `just build-app` with no extra env vars (reproduces the failure path on `main`). - `/usr/libexec/PlistBuddy -c "Print :EXOBugReportPresignedUrlEndpoint" app/EXO/build/Build/Products/Debug/EXO.app/Contents/Info.plist` → returns `https://reports.exolabs.net/presigned-urls` (was empty before this change). - `open app/EXO/build/Build/Products/Debug/EXO.app` → menubar → Debug Info → Send Bug Report → type a description → Send → upload succeeds and the Create GitHub Issue button appears (was failing with "Bug report endpoint is invalid" before). - Cross-check on the Slack side that the uploaded `report.json` lands under `reports/YYYY/MM/DD/<ts>/` as before. ### Automated Testing <!-- Describe changes to automated tests, or how existing tests cover this change --> - No new tests. This is a single-string change to `Info.plist` plus a workflow cleanup. `nix flake check` in CI verifies formatting/lint for the rest of the tree. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:08:16 +00:00
rltakashige	290e3fd927	Keep image cache fresh (temporary fix) (#1961 ) ## Motivation When a new node joins, it might not have the cache. Caveat: This is potentially fallible if a new node joins and updates real topology, but the API topology hasn't caught up with this fact and the user queues up a new text generation. In practice, there is only a split second where this is the case, and this is only for users of the dashboard interface. We should fix this properly after the release.	2026-04-23 11:39:36 +01:00
rltakashige	3894cf134e	Fix Gemma 4 E2B TP + DeepSeek V32 thinking parsing (#1967 )	2026-04-23 01:50:39 +00:00
Alex Cheema	8993ccaf09	feat(app): add friendly context message to bug report prompt (#1959 ) ## Motivation When a user clicks Send Bug Report in the macOS app, we already give them the option to add more context via an optional text field. But the current prompt is just a terse label — `"What's the issue? (optional)"` — which doesn't tell the user why bothering to fill it in matters. A friendly one-line explanation increases the chance they'll describe what went wrong, which is the single most useful signal when we triage the resulting diagnostic bundle. ## Changes - `app/EXO/EXO/ContentView.swift`: In the `.prompting` phase of `sendBugReportButton`, replace the single label with a two-line hierarchy: - Primary: `Tell us what went wrong (optional)` - Helper: `A quick description of what you were doing and what happened helps us track down the bug for you.` - The helper uses `.caption2` + `.secondary` + `.opacity(0.8)` + `.fixedSize(horizontal: false, vertical: true)` so it stays visually subordinate and wraps cleanly inside the 340pt popover. No changes to `BugReportService`, the `user_description` payload, or any other flow. ## Why It Works The optional description is already plumbed end-to-end (text editor → `bugReportUserDescription` state → `BugReportService.sendReport(..., userDescription:)` → `report.json`'s `user_description` field → GitHub issue pre-fill). The only gap was user-facing motivation, so this is purely a copy/layout tweak inside the existing `.prompting` case — no new state, bindings, or service changes. ## Test Plan ### Manual Testing <!-- Hardware: MacBook Pro (macOS app build via Xcode) --> - Build the macOS app in Xcode (`app/EXO/EXO.xcodeproj`) and launch it. - Open the menubar popover → expand Debug Info → click Send Bug Report. - Verify the new primary label and helper sentence both appear above the text editor and wrap cleanly within the popover width. 
- Leave the field empty → click Send → upload should succeed (no `user_description` in payload, same as before). - Fill in a description → click Send → upload succeeds and the success card with Create GitHub Issue appears; clicking it opens GitHub with the description pre-filled. - Click Cancel from the prompting state → returns to idle. ### Automated Testing <!-- Describe changes to automated tests, or how existing tests cover this change --> - No new automated tests. This is a SwiftUI copy/layout change; existing `EXOTests` are smoke-level and don't cover `ContentView` view bodies, and UI snapshot tests aren't worth adding for a two-line copy tweak. - `nix fmt` reports 0 files changed after the edit; `nix flake check` in CI will verify formatting/lint for the rest of the tree. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 17:39:36 +00:00
Nadeem Hilal Wani	4939fbe995	feat(dashboard): add Pi integration tab (#1925 ) ## Summary Adds a new Pi tab to the Integrations page (`/#/integrations`) alongside the existing Claude Code, OpenCode, Codex, OpenClaw, Open WebUI, n8n, and Firefox tabs. [pi](https://pi.dev) (`@mariozechner/pi-coding-agent`) is a terminal coding agent that supports custom OpenAI-compatible providers via `~/.pi/agent/models.json`. This tab gives users a copy-pasteable config to wire pi up to their exo cluster. ## What's in the tab - Model selector (shown when multiple models are running) — picks the default model for the generated shell command. - Models Config card — generates `~/.pi/agent/models.json` registering `exo` as a custom provider: - `baseUrl` → `<apiUrl>/v1` - `api` → `openai-completions` - `apiKey` → `"exo"` (placeholder; exo ignores it) - `compat.supportsDeveloperRole: false` and `compat.supportsReasoningEffort: false`, per pi docs recommendation for local OpenAI-compatible servers - Auto-populates every running model with `id`, `contextWindow` (from `/v1/models`), and `input: ["text", "image"]` for vision-capable models - Shell Command card — `pi --provider exo --model <model>` for quick launch. The tab gracefully falls back to `your-model-id` when no models are running, matching the behavior of the other tabs. ## Usage 1. `npm install -g @mariozechner/pi-coding-agent` 2. Paste the generated config into `~/.pi/agent/models.json` 3. Run `pi` and pick an exo model via `/model` — or run the shell command directly ## Changes - `dashboard/src/routes/integrations/+page.svelte` — adds `"Pi"` to the `tabs` tuple, `piModel` state, `piModelsJson` + `piShellCommand` derivations, and the tab content block. Single-file, scoped change — no backend or type changes. 
## Testing - `cd dashboard && npm run build` — ✅ builds cleanly - `svelte-check` on the edited file — no new errors - Manually verified the tab renders, the model selector updates the generated JSON, and the config reflects `/v1/models` capabilities (vision → `input: ["text","image"]`, `context_length` → `contextWindow`). ## Screenshots <img width="1545" height="1236" alt="pi-tab" src="https://github.com/user-attachments/assets/38aa179f-4ed9-4a1e-9783-d3baa7738263" />	2026-04-22 17:29:47 +00:00
rltakashige	73782ecc65	Fix event mutation causing indexed vs event mismatch (#1964 ) Fixes small issue with #1957	2026-04-22 16:12:24 +00:00
rltakashige	f6e418ed23	Cleanup on #1952 (#1960 )	2026-04-22 17:05:49 +01:00
rltakashige	7a312a177b	Misc fixes: upstream JACCL all_sum, API, etc. + Add Kimi K2.6 (#1952 ) ## Motivation This fixes a bunch of observed model quality issues introduced upstream in JACCL, as well as API issues and prefix cache calculation. ## Test Plan ### Manual Testing Tested a bunch ### Automated Testing Added a test, automated eval tool calls on Kimi K2.6, Minimax M2.7, GPT OSS and Qwen3.6 models. --------- Co-authored-by: Evan <evanev7@gmail.com>	2026-04-22 15:43:27 +00:00
Evan Quiney	0a549f8846	remove layer loading callback (#1890 ) first part of modularising the backend is simplifying some of the control flow. more tbd.	2026-04-22 14:03:31 +01:00
Evan Quiney	df332035ef	swap camelcasemodels for frozenmodels globally (#1957 )	2026-04-22 11:49:25 +00:00
ciaranbor	af673845d3	Ignore HF remote repo changes (temporary fix) (#1958 ) ## Motivation Fixes #1918. Downloaded model status reverts from "completed" to "pending" during each download scan. Reproduced with `zai-org/GLM-5.1`. ## Changes - `coordinator.py`: In the periodic rescan, don't downgrade already-completed models; fall back to `resolve_existing_model()` (safetensors weight check) when per-file size check reports incomplete - New `test_download_status_not_lost.py`: 3 regression tests ## Why It Works The rescan compares local file sizes against HF's `main` revision. When HF updates text files (README, jinja, etc.), remote sizes change but local files still match the old revision — causing a false "incomplete". The fix uses the safetensors weight check as ground truth instead. Long-term: pin the downloaded revision SHA rather than always checking against `main`. ## Test Plan ### Manual Testing - Mac Studio M3 Ultra with GLM-5.1 downloaded (natural reproduction of the issue) - Confirmed GLM-5.1 stays `DownloadCompleted` through multiple rescan cycles ### Automated Testing - 3 new tests: completed-not-downgraded, fallback-to-resolve, genuinely-incomplete-stays-pending	2026-04-22 11:11:01 +01:00
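One plausible reading of the rescan logic in #1958, written as a sketch: the per-file size check is tried first, and the safetensors weight check is used as ground truth when it reports incomplete. `DownloadStatus`, `rescan_status`, and the callable parameter are hypothetical names; the real logic lives in `coordinator.py` and calls `resolve_existing_model()`.

```python
from enum import Enum
from typing import Callable


class DownloadStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"


def rescan_status(
    sizes_match_remote: bool,
    weights_present: Callable[[], bool],
) -> DownloadStatus:
    """Decide a model's status during a periodic rescan.

    sizes_match_remote: result of comparing local file sizes with the remote repo.
    weights_present: stand-in for the safetensors weight check; it is only
    called when the size comparison reports incomplete.
    """
    if sizes_match_remote:
        return DownloadStatus.COMPLETED
    # Remote text files (README, jinja, ...) may have changed size upstream,
    # so fall back to checking that the weights themselves are all on disk.
    if weights_present():
        return DownloadStatus.COMPLETED
    return DownloadStatus.PENDING
```

Under this reading, a model whose weights are intact stays completed even after upstream edits to non-weight files, while a model with missing weights still reports pending.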
ciaranbor	49670c8624	Handle missing total_size in safetensors index files (#1956 ) ## Motivation Image models fail to load after a mid-download instance deletion and recreation. The system skips the download and crashes with `FileNotFoundError: No safetensors files found in .../vae`. ## Changes - Make `ModelSafetensorsIndexMetadata.total_size` optional (`PositiveInt \| None = None`) - Add null guard in `fetch_safetensors_size` - Add regression test ## Why It Works Exolabs quantized image models have safetensors index files with mflux metadata (`quantization_level`, `mflux_version`) but no `total_size`. The required `PositiveInt` field caused Pydantic validation to fail, which was silently swallowed by `except Exception: continue` in `_scan_model_directory`. This skipped all weight map checks, making incomplete models appear complete. ## Test Plan ### Manual Testing - Hardware: Mac Studio - Before: `CreateRunner → LoadModel` (crash). After: `CreateRunner → DownloadModel` (correct). ### Automated Testing - `test_safetensors_index.py`: 3 cases covering missing, valid, and null metadata	2026-04-21 16:39:14 +01:00
rltakashige	fcc3718efb	Add sampling defaults (#1947 ) ## Motivation Model quality issues ### Manual Testing TODO	2026-04-21 06:45:33 +00:00
rltakashige	8ccfd7fcb6	Fix some misc build issues (#1948 )	2026-04-21 07:41:07 +01:00


2344 Commits