
mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-02-05 11:43:17 -05:00

Author	SHA1	Message	Date
Ryuichi Leo Takashige	fe05608260	Fix prompt for GLM	2026-02-04 19:53:53 +00:00
Ryuichi Leo Takashige	bce8ee3a6e	Force synchronization points	2026-02-04 19:18:25 +00:00
Ryuichi Leo Takashige	741e2790dd	fix for non ssm	2026-02-04 18:54:19 +00:00
Ryuichi Leo Takashige	1f29b9c85d	shard all qwen3 next attention	2026-02-04 18:45:33 +00:00
Ryuichi Leo Takashige	bfa3160339	handle upstream mamba -> arrays cache update	2026-02-04 17:59:04 +00:00
Ryuichi Leo Takashige	d745157342	Disable prefix cache to test	2026-02-04 12:27:04 +00:00
Ryuichi Leo Takashige	e3751e03c2	Add mamba cache.	2026-02-04 12:17:28 +00:00
Alex Cheema	cd9f3182d9	Fix NameError for Cache in WrappedMiniMaxAttention Use string annotation for the Cache type since it only exists in type stubs, not in the actual mlx_lm package at runtime. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 19:15:50 -08:00
Alex Cheema	a54ba12dee	Merge branch 'main' into leo/add-more-tensor-strategies	2026-02-03 14:54:22 -08:00
rltakashige	a0f4f36355	Reduce reliance on internet (#1363 ) ## Motivation Offline users currently have to wait for every retry to fail before being able to launch a model. For users that restart clusters often or share API keys between devices, we also spam HuggingFace with downloads every 5 minutes. These issues are caused by _emit_existing_download_progress being inefficient. ## Changes - Only query HuggingFace once while EXO is running (assumption being that a change should only be reflected on a new EXO session) - Only query HuggingFace when there is an internet connection (polling connectivity every 10 seconds) - Request download progress if we switch from no connectivity -> connected to reduce the wait. - Reduce download progress sleep as it's no longer expensive (queries cache most of the time). - Reduce retries as 30 is way too many. ## Test Plan ### Manual Testing Manually tested the behaviour. ### Automated Testing None, should I add any? We do have some tests for this folder, but they are probably not too helpful.	2026-02-03 20:03:29 +00:00
Alex Cheema	acb97127bf	Normalize TextGenerationTaskParams.input to list[InputMessage] (#1360 ) ## Motivation With the addition of the Responses API, we introduced `str \| list[InputMessage]` as the type for `TextGenerationTaskParams.input` since the Responses API supports sending input as a plain string. But there was no reason to leak that flexibility past the API adapter boundary — it just meant every downstream consumer had to do `if isinstance(messages, str):` checks, adding complexity for no benefit. ## Changes - Changed `TextGenerationTaskParams.input` from `str \| list[InputMessage]` to `list[InputMessage]` - Each API adapter (Chat Completions, Claude Messages, Responses) now normalizes to `list[InputMessage]` at the boundary - Removed `isinstance(task_params.input, str)` branches in `utils_mlx.py` and `runner.py` - Wrapped string inputs in `[InputMessage(role="user", content=...)]` in the warmup path and all test files ## Why It Works The API adapters are the only place where we deal with raw user input formats. By normalizing there, all downstream code (worker, runner, MLX engine) can just assume `list[InputMessage]` and skip the type-checking branches. The type system (`basedpyright`) catches any missed call sites at compile time. ## Test Plan ### Automated Testing - `uv run basedpyright` — 0 errors - `uv run ruff check` — passes - `nix fmt` — applied - `uv run pytest` — 174 passed, 1 skipped Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 06:01:56 -08:00
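The boundary normalization described in #1360 can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the `InputMessage` dataclass here is a hypothetical stand-in for exo's real type, and `normalize_input` is an assumed helper name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InputMessage:
    """Hypothetical stand-in for exo's InputMessage type."""
    role: str
    content: str


def normalize_input(raw: str | list[InputMessage]) -> list[InputMessage]:
    """Convert either accepted API input shape into list[InputMessage].

    Assumed helper: an adapter would call this once at the API boundary so
    downstream code never needs an isinstance(..., str) branch.
    """
    if isinstance(raw, str):
        # A plain string is treated as a single user message.
        return [InputMessage(role="user", content=raw)]
    return list(raw)
```

With this shape, only the adapter layer sees the `str` case; everything after it can assume a list.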
Evan Quiney	d90605f198	migrate model cards to .toml files (#1354 )	2026-02-03 12:32:06 +00:00
Evan Quiney	f400b4d7c5	fix InstanceViewModel.swift (#1359 ) wasn't caught when we merged the API changes	2026-02-02 18:43:27 +00:00
Evan Quiney	d97bca88e6	improve distributed testing (#1300 ) Our distributed test now does a full query cycle for every model loaded onto the relevant machine. This will help find bugs early, as it already has found one with Qwen3 Next! I didn't write down what the error was though. Gooooooood luck with that! Co-authored-by: rltakashige <rl.takashige@gmail.com>	2026-02-02 18:25:39 +00:00
Alex Cheema	dfce188d99	fix: handle unclosed tool calls and GLM arg parsing edge cases (#1344 ) ## Motivation Tool-call requests can hang indefinitely when `max_tokens` truncates generation mid-tool-call. ## Reproduction 1. Send a chat completion with `tools` and a low `max_tokens` (e.g. 65) to Qwen3-0.6B 2. Model generates `<think>...</think>` then starts `<tool_call>` but `max_tokens` cuts it off before `</tool_call>` 3. Before this fix: `parse_tool_calls` buffers tokens after `<tool_call>`, generator exhausts, buffered tokens (including `finish_reason`) are silently dropped → stream hangs forever 4. After this fix: buffered tokens are flushed as regular text with `finish_reason` propagated → response returns normally with `finish_reason: "length"` Confirmed with fresh local testing: 4 unclosed tool call flushes triggered in a single session. Also confirmed via production logs from Jan 29 (2 occurrences). ## Changes 1. `parse_tool_calls` unclosed tool call flush — when the generator exhausts inside an open `<tool_call>` block, flush buffered tokens as regular text and propagate `finish_reason` 2. GLM regex fix — match literal `\n` (not escaped `\\n`) between arg tags; handle missing `</arg_value>` via lookahead 3. 
7 new unit tests for `parse_tool_calls` covering unclosed, closed, passthrough, and failed-parse scenarios ## Why It Works - `parse_tool_calls` now has a post-loop check: if `in_tool_call` is still true, it yields the buffered text with the tracked `finish_reason` instead of silently dropping it - The GLM regex now matches real-world output where newlines appear between tags and `</arg_value>` may be absent ## Test Plan ### Manual Testing - Qwen3-0.6B-4bit with `tools` + various `max_tokens` values (61-75) - Confirmed responses return with `finish_reason: "length"` instead of hanging - Log output shows `"generator exhausted inside unclosed tool call, flushing buffered text"` ### Automated Testing - 7 new tests in `test_parse_tool_calls.py` - Full test suite passes (`uv run pytest`) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net> Co-authored-by: Evan <evanev7@gmail.com> Co-authored-by: Jake Hillion <jake@hillion.co.uk> Co-authored-by: rltakashige <rl.takashige@gmail.com>	2026-02-02 17:45:51 +00:00
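The post-loop flush described in #1344 can be sketched as follows. This is a simplified illustration under stated assumptions: chunks are modelled as `(text, finish_reason)` tuples, the tags arrive as whole chunks, and a closed tool call is just re-emitted as joined text rather than parsed. The function name `parse_tool_calls_sketch` is hypothetical and is not exo's `parse_tool_calls`.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Chunk = Tuple[str, Optional[str]]  # (text, finish_reason), an assumed shape


def parse_tool_calls_sketch(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    in_tool_call = False
    buffered: List[str] = []
    last_finish: Optional[str] = None

    for text, finish_reason in chunks:
        if finish_reason is not None:
            # Track the finish_reason so it survives buffering.
            last_finish = finish_reason
        if not in_tool_call:
            if text == "<tool_call>":
                in_tool_call = True
                buffered = [text]
            else:
                yield text, finish_reason
            continue
        buffered.append(text)
        if text == "</tool_call>":
            # Closed block: a real parser would build a tool call here.
            yield "".join(buffered), finish_reason
            in_tool_call = False
            buffered = []

    if in_tool_call:
        # Generator exhausted inside an open block: flush the buffer as
        # regular text and propagate the tracked finish_reason.
        yield "".join(buffered), last_finish
```

Without the final `if in_tool_call` check, the buffered chunks (including the one carrying `finish_reason`) would never be yielded, which matches the hang described in the reproduction steps.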
Evan Quiney	54b19879a0	create config home when checking for config file (#1353 ) we didn't check before, raising a critical exception. now we create ~/.config/exo on linux systems before touching config.toml. this wasn't caught before since everything lives in ~/.exo on macos, and we no longer write the keypair to CONFIG_HOME, so config.toml has to do init work it avoided before.	2026-02-02 17:36:51 +00:00
ciaranbor	19965c7ba5	Ciaran/profiling (#1345 ) ## Motivation Know what the hell is going on ## Changes - Tracing library (src/exo/shared/tracing.py): trace() context manager, Chrome Trace Format export, statistics computation - Runner instrumentation (src/exo/worker/engines/image/pipeline/runner.py): Wrapped sync/async steps, compute blocks, and send/recv operations - Trace collection: Workers send traces to master after task completion; merged into ~/.exo/traces/trace_{task_id}.json - API endpoints: List, fetch, stats, and raw download at /v1/traces/* - Dashboard: Trace list and detail pages with Perfetto integration ## Why It Works <img width="1236" height="767" alt="Screenshot 2026-01-30 at 19 00 09" src="https://github.com/user-attachments/assets/73e6e46d-ba10-4e83-ba99-ff1c3f62ab05" /> <img width="1659" height="89" alt="Screenshot 2026-01-30 at 19 00 58" src="https://github.com/user-attachments/assets/c0fd0e65-e4fc-4fd5-920d-b43b2887d591" />	2026-02-02 17:19:45 +00:00
Evan Quiney	3e27ead705	remove mdns discovered peers from appearing in state (#1312 ) ## motivation eagerly discovered peers through gossipsub were added to state. this left things looking broken from one-sided connections ## changes the worker no longer writes topology edges from these gossipsub messages we now strictly rely on http-discovered topology, which tends to be more reflective of the actual state of the systems connectivity	2026-02-02 16:58:53 +00:00
Alex Cheema	d826d309b3	chore: gitignore hosts_*.json files (#1343 ) ## Motivation `hosts_*.json` files are local host configuration snapshots that shouldn't be tracked in version control. ## Changes Added `hosts_*.json` pattern to `.gitignore`. ## Why It Works The glob pattern `hosts_*.json` matches any file starting with `hosts_` and ending with `.json` in the repo root. ## Test Plan ### Manual Testing - Verified that `hosts_*.json` files are ignored by git after this change. ### Automated Testing - No automated tests needed for a `.gitignore` change. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-02 16:14:11 +00:00
Alex Cheema	c3537980bd	feat: add Claude Messages API and OpenAI Responses API support (#1167 ) ## Motivation Add support for Claude Messages API and OpenAI Responses API to allow users to interact with exo using these popular API formats. This enables broader compatibility with existing tooling and SDKs that expect these API formats. ## Architecture Adapter logic lives exclusively in the API layer (`src/exo/master/adapters/`). On the way in, each adapter converts its API-specific request type (`ChatCompletionRequest`, `ClaudeMessagesRequest`, `ResponsesRequest`) into `TextGenerationTaskParams`. On the way out, each adapter converts the `TokenChunk` stream back into its API-specific response format. Everything inside the application — commands, worker, runner, event sourcing — only sees `TextGenerationTaskParams` and `TokenChunk`. No API-specific types cross the boundary. ``` API layer │ Application internals │ Chat Completions → [adapter] → TextGenerationTaskParams ──→ │ ──→ TextGeneration command → Runner → TokenChunk ──→ │ ──→ [adapter] → ChatCompletionResponse Claude Messages → [adapter] → TextGenerationTaskParams ──→ │ ──→ TextGeneration command → Runner → TokenChunk ──→ │ ──→ [adapter] → ClaudeMessagesResponse Responses API → [adapter] → TextGenerationTaskParams ──→ │ ──→ TextGeneration command → Runner → TokenChunk ──→ │ ──→ [adapter] → ResponsesResponse ``` ## Changes ### New Files - `src/exo/shared/types/claude_api.py` - Pydantic types for Claude Messages API - `src/exo/shared/types/openai_responses.py` - Pydantic types for OpenAI Responses API - `src/exo/shared/types/text_generation.py` - Shared `TextGenerationTaskParams` internal type - `src/exo/master/adapters/chat_completions.py` - Chat Completions adapter (streaming/non-streaming) - `src/exo/master/adapters/claude.py` - Claude Messages adapter (streaming/non-streaming) - `src/exo/master/adapters/responses.py` - OpenAI Responses adapter (streaming/non-streaming) ### Modified Files - 
`src/exo/master/api.py` - Refactored to use adapters uniformly for all endpoints; extracted `_resolve_and_validate_text_model` helper to deduplicate model validation across all text endpoints; removed ad-hoc `try/except ValueError` blocks from non-streaming paths ### New Endpoints - `POST /v1/messages` - Claude Messages API (streaming and non-streaming) - `POST /v1/responses` - OpenAI Responses API (streaming and non-streaming) ## Why It Works All APIs are implemented as pure conversion adapters at the edge of the application: 1. Adapter functions in `src/exo/master/adapters/` convert incoming requests to `TextGenerationTaskParams` 2. `api.py` wraps the params in a `TextGeneration` command and sends it through the existing command/event flow 3. The worker, runner, and event sourcing layers only handle `TextGenerationTaskParams` and `TokenChunk` — they have no awareness of Chat Completions, Claude, or Responses API formats 4. On response, adapter functions convert the `TokenChunk` stream back to the caller's expected format 5. Model validation is handled by a single shared helper (`_resolve_and_validate_text_model`), mirroring the existing `_validate_image_model` pattern for image endpoints No changes to core inference logic were needed. 
### Streaming Formats - Chat Completions: Uses `data: {...}\n\n` with `[DONE]` terminator - Claude: Uses event types `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop` - OpenAI Responses: Uses event types `response.created`, `response.in_progress`, `response.output_item.added`, `response.content_part.added`, `response.output_text.delta`, `response.output_text.done`, `response.content_part.done`, `response.output_item.done`, `response.completed` ## Test Plan ### Manual Testing Hardware: MacBook Pro M3 Max Non-streaming tests: ```bash # Chat Completions API curl -X POST http://localhost:52415/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": "llama-3.2-1b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 20}' # Claude Messages API curl -X POST http://localhost:52415/v1/messages \ -H "Content-Type: application/json" \ -d '{"model": "llama-3.2-1b", "max_tokens": 50, "messages": [{"role": "user", "content": "Hello"}]}' # OpenAI Responses API curl -X POST http://localhost:52415/v1/responses \ -H "Content-Type: application/json" \ -d '{"model": "llama-3.2-1b", "input": "Hello", "max_output_tokens": 20}' ``` Streaming tests: ```bash # Chat Completions API (streaming) curl -N -X POST http://localhost:52415/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": "llama-3.2-1b", "messages": [{"role": "user", "content": "Hello"}], "stream": true, "max_tokens": 20}' # Claude Messages API (streaming) curl -N -X POST http://localhost:52415/v1/messages \ -H "Content-Type: application/json" \ -d '{"model": "llama-3.2-1b", "max_tokens": 50, "messages": [{"role": "user", "content": "Hello"}], "stream": true}' # OpenAI Responses API (streaming) curl -N -X POST http://localhost:52415/v1/responses \ -H "Content-Type: application/json" \ -d '{"model": "llama-3.2-1b", "input": "Hello", "stream": true, "max_output_tokens": 20}' ``` All endpoints tested successfully 
with proper response formats and streaming events. ### Automated Testing - Tests in `src/exo/master/tests/` all pass (85 tests) - Type checker (basedpyright) passes with 0 errors - Linter (ruff) passes --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Evan <evanev7@gmail.com>	2026-02-02 15:58:37 +00:00
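As an illustration of the Chat Completions streaming format named in #1167 (`data: {...}\n\n` frames with a `[DONE]` terminator), a response-side adapter might look roughly like this. It is a sketch only: the token stream is modelled as plain strings rather than exo's `TokenChunk`, the function name is hypothetical, and the payload is trimmed to a single `choices[0].delta.content` field.

```python
import json
from typing import Iterable, Iterator


def to_chat_completions_sse(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a server-sent-event frame, then terminate."""
    for text in tokens:
        # Minimal, assumed payload shape; a real chunk carries more fields.
        payload = {"choices": [{"delta": {"content": text}}]}
        yield f"data: {json.dumps(payload)}\n\n"
    # The Chat Completions stream ends with a literal [DONE] sentinel.
    yield "data: [DONE]\n\n"
```

The Claude and Responses adapters would follow the same pattern but emit their own named event types instead of a bare `data:` frame.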
rltakashige	21d477f1cb	Update exo bench (#1357 ) ## Motivation Make exo bench faster for longer prompts, lengthen default timeouts and use pairs for pp and tg. ## Changes - Uses binary search to find correct prompt - Flag to force all combinations if that is desired	2026-02-02 15:46:15 +00:00
Jake Hillion	b2579c78fe	nix: add macmon to PATH in wrapper scripts on Darwin `nix run .#exo` couldn't find `macmon` because the Nix wrapper scripts didn't include it in PATH, causing `shutil.which("macmon")` to fail. Added `--prefix PATH : ${pkgs.macmon}/bin` to the `makeWrapper` call, conditional on Darwin via `lib.optionalString`, so macmon's binary is available at runtime without modifying the user's system PATH. Test plan: - Verified `nix build .#exo` succeeds - Checked wrapper script contains macmon store path in PATH prefix	2026-02-02 13:42:36 +00:00
rltakashige	d98f2c9b68	Merge branch 'main' into leo/add-more-tensor-strategies	2026-01-30 14:31:55 +00:00
Evan Quiney	cd946742f7	fix skipping logic in worker plan (#1342 ) the worker plan function had some skipping logic missing, leading to double-submitting tasks.	2026-01-30 14:31:40 +00:00
rltakashige	a5bc38ad1f	Check all nodes to evict (#1341 ) ## Motivation If nodes have uneven memory, one node may evict cache that remains on another node. This will break prefill on some setups.	2026-01-30 13:42:09 +00:00
rltakashige	33f22ca78a	Merge branch 'main' into leo/add-more-tensor-strategies	2026-01-30 13:37:38 +00:00
Evan Quiney	2a4e0d4629	make node-ids unique per-session (#1338 ) we currently have no strict requirements that node ids persist across sessions, so we can generate fresh node ids each time. this avoids issues like #1332, but prevents further features such as caching downloads or node-id dialling Co-authored-by: rltakashige <rl.takashige@gmail.com>	2026-01-30 13:33:31 +00:00
Evan Quiney	46a14153dd	switch to ModelCard.load outside of download log (#1339 ) some attempts to load model cards (i.e. build_base_shard) always went through networking rather than using downloaded model cards. we should always default to ModelCard.load in these scenarios	2026-01-30 11:20:20 +00:00
Evan	9ba61f3733	improve log message in shard downloader closes #1336	2026-01-30 10:35:01 +00:00
rltakashige	d9eca75895	Add usage stats (#1333 ) ## Motivation (Probably) the final missing piece of the Chat Completions API ## Changes Add UsageStats ## Why It Works OpenCode reviewed my PR and gave me stats: <img width="1150" height="802" alt="image" src="https://github.com/user-attachments/assets/ebc06bae-797f-4087-87d5-2f26cf60fc48" /> ## Test Plan ### Automated Testing No tests were broken.	2026-01-30 10:23:08 +00:00
rltakashige	9dabde7e57	Fix bench after recent updates (#1331 ) ## Motivation A lot of changes happened without much attention to the state of exo bench. ## Changes Use TaggedModel for BenchChatCompletion so it serialises properly. Don't break after gpt oss tool call to preserve parity with the rest of the codebase. ## Why It Works <!-- Explain why your approach solves the problem --> ## Test Plan ### Manual Testing <img width="2856" height="678" alt="image" src="https://github.com/user-attachments/assets/2e18cf0d-c0f8-467c-9763-1a6a59c8a327" /> Also tested GPT OSS tool calling in OpenCode	2026-01-29 19:14:40 +00:00
ciaranbor	a31942ce12	Ciaran/image non streaming (#1328 ) ## Motivation The dashboard UI attempted to parse all image generation responses as SSE streams, even when streaming was disabled. This broke non-streaming image generation. ## Changes - Parse JSON responses directly when not streaming, use SSE parser only when stream=true AND partialImages > 0 - explicitly disable partial images when not streaming ## Why It Works Both API and dashboard now use the same condition (stream && partialImages > 0) to determine response format, ensuring correct parsing. ## Test Plan ### Manual Testing Non-streamed image generation results appear in the UI. Streamed image generation still works	2026-01-29 17:24:32 +00:00
Alex Cheema	7cc313b22a	Treat Swift/Xcode build warnings as errors (#1322 ) ## Motivation Warnings that go unchecked tend to accumulate and hide real issues. Treating them as errors ensures they are addressed immediately, both locally during development and in CI. ## Changes Added `SWIFT_TREAT_WARNINGS_AS_ERRORS = YES` and `GCC_TREAT_WARNINGS_AS_ERRORS = YES` to the project-level Debug and Release build configurations in `project.pbxproj`. This applies to all targets (EXO, EXOTests, EXOUITests). ## Why It Works Xcode's `SWIFT_TREAT_WARNINGS_AS_ERRORS` and `GCC_TREAT_WARNINGS_AS_ERRORS` build settings promote Swift and C/ObjC warnings to errors at compile time. Setting them at the project level means all targets inherit the policy without needing per-target or CI-level overrides. ## Test Plan ### Manual Testing - Built the EXO scheme in Release configuration with `xcodebuild` — no warning-as-error failures from Swift or C/ObjC sources. ### Automated Testing - CI already builds with `-configuration Release`, so it will automatically enforce warnings-as-errors via the inherited project settings — no CI changes needed.	2026-01-29 17:15:49 +00:00
rltakashige	2837225dc7	Load pipeline layers sequentially (#1329 ) ## Motivation Slightly annoyed by needing this change, but same story as for tensor loading...	2026-01-29 17:08:38 +00:00
Jake Hillion	e4c6a7dbb4	nix: add Python packaging with uv2nix Add uv2nix to build Python packages from uv.lock. This creates a fully Nix-managed Python environment with the Rust bindings injected via overlay. Changes: - Add pyproject-nix, uv2nix, and pyproject-build-systems flake inputs - Create python/parts.nix with overlays to inject Nix-built Rust wheel - Export packages.exo on macOS (wraps exo/exo-master/exo-worker with dashboard) - Add checks.lint (ruff, all platforms) and checks.pytest (macOS only) - Simplify CI typecheck job using nicknovitski/nix-develop action - Delete .github/actions/typecheck composite action (no longer needed) - Add no-build-package for MLX packages in pyproject.toml (use wheels) The Python build is currently macOS-only since MLX requires Metal. Linux support will be added once the pyproject dependencies are simplified. Test plan: - Run `nix flake check` on macOS to verify pytest and lint pass - Build exo package on macOS: `nix build .#exo` - Verify CI pipeline passes with simplified typecheck job	2026-01-29 16:35:58 +00:00
Evan	b1e88a3d06	shfmt adds shfmt, a shell formatter, and formats the bash files	2026-01-29 15:24:36 +00:00
Jake Hillion	ebeddfb308	mlx: build with Nix (#1285 ) In order to make testing and deployment simpler and more reproducible, we want to provide a Nix derivation for our macOS .app build. We already build the Rust and dashboard with Nix, but so far the Python has been blocked because we haven't had an MLX build. This change adds a Metal compiler derivation that uses `requireFile` to be provided a NAR of the unfree macOS Metal compiler. It is documented how to get this file, but effectively you have to trigger the download, mount the DMG, and NAR the result. Once this is added to the store by hash we can build MLX using it. The MLX build itself is quite self explanatory. Test plan: - CI. We follow the instructions to grab the Metal compiler. Once this is in Cachix we should really never do this again, and I can pin the path too to ensure it doesn't leave. - MLX tests run as part of the MLX derivation's build. They pass. - `NIXPKGS_ALLOW_UNFREE=1 nix build .#mlx.passthru.tests.mlxTest --impure --option sandbox false` --------- Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>	2026-01-29 14:07:00 +00:00
Alex Cheema	9111575997	Add startup delay and update network setup message (#1309 ) ## Summary - Add 20-second startup delay to wait for macOS to finish network setup after boot - Update user-facing message to clarify the service configures local networking, disables Thunderbolt Bridge (preventing packet storms), and installs a Network Location ## Test plan - [ ] Manual verification of Swift syntax - [ ] Test network setup on macOS device after reboot 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: rltakashige <rl.takashige@gmail.com>	2026-01-29 13:05:50 +00:00
Sami Khan	ffacabe7e4	Fix uninstall button error (#1306 ) ## Motivation Fix "Network setup script failed" error when clicking uninstall button and resolve Xcode compiler warnings. ## Changes - NetworkSetupHelper.swift: Add \|\| true guards and explicit return 0 in find_and_enable_thunderbolt_bridge to prevent script failures with set -euo pipefail - ThunderboltBridgeService.swift: Use withCString and withUnsafeMutablePointer for Authorization API calls to fix pointer lifetime warnings - EXOApp.swift: Mark showNotification as nonisolated to fix main actor isolation warning ## Why It Works - The uninstall script's Thunderbolt re-enable function could exit non-zero in edge cases (no bridges, no matches). Since this is a cleanup step, failures should not abort uninstall. - Swift requires explicit pointer lifetime management when passing strings/structs to C APIs. - showNotification is called from a nonisolated delegate method and uses thread-safe APIs. ## Test Plan ### Manual Testing Hardware: MacBook Pro - Clicked Uninstall button, verified it completes without error - Built in Xcode, verified no warnings ### Automated Testing N/A	2026-01-29 12:57:48 +00:00
Ryuichi Leo Takashige	b60a59bbf6	Add minimax and fix qwen sharding strategies	2026-01-28 19:27:56 +00:00
rltakashige	9e58a57599	Add RDMA caveats to README.md (#1316 ) ## Motivation Running RDMA from source is not well documented as is. Several surprising things that took time to debug internally too. App should be updated to detect MacOS versions in future.	2026-01-28 18:44:00 +00:00
Evan Quiney	748a026071	fix configdata validation for kimi-k2 (#1314 ) ## motivation our shard downloader could not correctly fetch data for kimi-k2, as it deferred some values to a text_config field. ## changes config_data now prioritizes this field if it exists in information like layer_count	2026-01-28 14:29:36 +00:00
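The lookup order described in #1314 could be sketched as below. This is an assumption-heavy illustration: the key name `num_hidden_layers` is the common Hugging Face config field and is assumed here, and `layer_count` is a hypothetical helper, not exo's `config_data` code.

```python
def layer_count(config: dict) -> int:
    """Prefer the nested text_config value when present, else the top level."""
    text_config = config.get("text_config")
    if isinstance(text_config, dict) and "num_hidden_layers" in text_config:
        # Some models defer these values to a nested text_config block.
        return text_config["num_hidden_layers"]
    return config["num_hidden_layers"]
```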
Alex Cheema	f1a2d054ec	Update tagline to "Run frontier AI locally" (#1313 ) - Update README tagline from "Run your own AI cluster at home with everyday devices" to "Run frontier AI locally"	2026-01-28 12:38:14 +00:00
Alex Cheema	b3c8f85fc8	Update MLX to 0.30.4 (#1311 ) ## Summary - Bump mlx from 0.30.3 to 0.30.4 ## Test plan - [x] `uv lock` succeeds - [x] Type checking passes (`uv run basedpyright`) - [x] Run inference tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 04:30:21 -08:00
rltakashige	a562114ba5	Add Kimi K2.5 support (#1302 ) Co-authored-by: Alex Cheema <41707476+AlexCheema@users.noreply.github.com>	2026-01-28 05:44:19 +00:00
Evan Quiney	991d278119	replace nix fmt with treefmt in just lint (#1301 ) man evaluating the nix flake is so slow. treefmt speeeedy	2026-01-27 17:03:01 +00:00
rltakashige	c55cbf6739	Add mlx lm style tensor sharding for Minimax (#1299 ) ## Motivation Broken right now. We'll potentially add a better one later ## Test Plan ### Manual Testing Used for evals without any issue.	2026-01-27 15:29:06 +00:00
Alex Cheema	bd4f0bf048	Fix download speed/ETA display for re-downloads (#1294 ) ## Motivation After the download verification fix, when files are re-downloaded due to upstream changes (size mismatch), the download progress displays correctly (completion %, bytes, file counts), but speed shows 0 B/s and ETA shows "--" for both overall and per-file progress. ## Changes - Modified `on_progress_wrapper` in `src/exo/download/download_utils.py` to detect re-download scenarios - Added re-download detection: when `curr_bytes < previous_downloaded`, the file was deleted and download restarted - On re-download: reset `start_time` to current time and set `downloaded_this_session = curr_bytes` - Added two tests to `test_download_verification.py` covering re-download and continuing download scenarios ## Why It Works The bug occurred because: 1. `file_progress` is initialized with the OLD local file size (e.g., 1.5GB) 2. When `_download_file` detects size mismatch, it deletes the file and starts fresh 3. Progress callback receives small `curr_bytes` (e.g., 8KB) but compares against old size 4. `downloaded_this_session = 0 + (8KB - 1.5GB) = -1.5GB` (negative!) 5. Negative session bytes → 0 or negative speed → ETA shows "--" The fix detects when `curr_bytes < previous_downloaded` (indicating re-download started) and resets tracking to treat it as a fresh download. 
## Test Plan ### Manual Testing <!-- Hardware: (e.g., MacBook Pro M1 Max 32GB, Mac Mini M2 16GB, connected via Thunderbolt 4) --> <!-- What you did: --> - Download a model, modify a file to change its size, restart exo, verify speed/ETA display correctly during re-download ### Automated Testing - Added `TestProgressResetOnRedownload` class with two tests: - `test_progress_resets_correctly_on_redownload`: Verifies progress resets correctly when re-download starts - `test_progress_accumulates_on_continuing_download`: Verifies continuing downloads still accumulate correctly - All 11 download tests pass - Type checking (basedpyright): 0 errors - Linting (ruff): All checks passed 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 21:56:58 +00:00
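The re-download detection described in #1294 can be sketched as follows. The names `FileProgress`, `on_progress`, and `speed` are hypothetical and only mirror the variables mentioned in the PR text (`curr_bytes`, `previous_downloaded`, `downloaded_this_session`, `start_time`); the actual `on_progress_wrapper` may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class FileProgress:
    """Hypothetical per-file tracking state."""
    downloaded: int                  # bytes seen at the previous callback
    start_time: float                # when this session's transfer began
    downloaded_this_session: int = 0


def on_progress(p: FileProgress, curr_bytes: int, now: float) -> None:
    if curr_bytes < p.downloaded:
        # The byte count went backwards: the file was deleted and the
        # download restarted, so treat it as a fresh download.
        p.start_time = now
        p.downloaded_this_session = curr_bytes
    else:
        # Normal case: accumulate the delta since the last callback.
        p.downloaded_this_session += curr_bytes - p.downloaded
    p.downloaded = curr_bytes


def speed(p: FileProgress, now: float) -> float:
    """Bytes per second for this session; 0.0 if no time has elapsed."""
    elapsed = now - p.start_time
    return p.downloaded_this_session / elapsed if elapsed > 0 else 0.0
```

Without the first branch, a stale 1.5 GB starting size followed by an 8 KB callback would produce a negative session total, which is the zero-speed symptom the PR describes.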
rltakashige	cd8c01b7c8	Fix kv prefix cache (#1262 ) ## Motivation OpenCode sends very large prompts, most of which are repeated on the next call. ## Changes Add prefix caching, reducing average time in prefill (in testing) from 40 seconds to 4. This massively improves user experience. Also evicts KV caches from this prefix cache in a LRU-style manner. ## Why It Works We no longer prefill repeatedly but rather use kv cache stored in memory. A future update may want to use storage to make the prefix cache larger. ## Test Plan ### Manual Testing Tested speedup on OpenCode ### Automated Testing Added a lot of tests --------- Co-authored-by: David Hind <davehind@yahoo.co.uk>	2026-01-26 20:13:58 +00:00
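A prefix cache with LRU-style eviction, as described in #1262, might look roughly like this. It is a toy sketch: prompts are modelled as token-id sequences, the cached KV state is an opaque object, and eviction is by entry count rather than memory. The class name `PrefixCache` and its methods are hypothetical and do not reflect exo's implementation.

```python
from collections import OrderedDict


class PrefixCache:
    def __init__(self, max_entries: int):
        # Maps a token tuple to its cached KV state, oldest entry first.
        self._entries = OrderedDict()
        self.max_entries = max_entries

    def put(self, tokens, kv) -> None:
        key = tuple(tokens)
        self._entries[key] = kv
        self._entries.move_to_end(key)  # mark as most recently used
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used

    def longest_prefix(self, tokens):
        """Return (matched_length, kv) for the longest cached prefix."""
        tokens = tuple(tokens)
        best = None
        for key in self._entries:
            if len(key) <= len(tokens) and tokens[: len(key)] == key:
                if best is None or len(key) > len(best):
                    best = key
        if best is None:
            return 0, None
        self._entries.move_to_end(best)  # a hit refreshes recency
        return len(best), self._entries[best]
```

On a hit, only the tokens after `matched_length` would still need prefill, which is where the reported reduction in prefill time would come from.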
rltakashige	59e991ce15	Only ignore message if actually empty (#1292 )	2026-01-26 19:33:23 +00:00


2025 Commits