This change uses the more strongly typed ModelId and introduces some
convenience methods. It also cleans up some code left over from #1204.
## Changes
- `model_id: str -> model_id: ModelId`
- `repo_id: str -> model_id: ModelId`
- Introduces methods on ModelId, in particular `ModelId.normalize()` to
replace `/` with `--`.
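A minimal sketch of the idea, assuming `ModelId` is a thin wrapper over the HuggingFace-style `org/name` string; the real class may carry more helpers:

```python
# Hypothetical sketch: ModelId as a str subclass with a normalize() helper that
# swaps "/" for "--" (useful for filesystem-safe names).
class ModelId(str):
    def normalize(self) -> str:
        return self.replace("/", "--")

# e.g. ModelId("mlx-community/Llama-3.2-1B-Instruct").normalize()
# -> "mlx-community--Llama-3.2-1B-Instruct"
```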
This PR did introduce some circular imports, so some code has been moved
around to limit them.
## Test Plan
Tests still pass, types still check. As this is about metadata, I
haven't tested inference.
## Motivation
Add support for GLM-4.7-Flash, a lighter variant of GLM-4.7 with the
`glm4_moe_lite` architecture. These models are smaller and faster while
maintaining good performance.
## Changes
1. **Added 4 new model cards** for GLM-4.7-Flash variants:
- `glm-4.7-flash-4bit` (~18 GB)
- `glm-4.7-flash-5bit` (~21 GB)
- `glm-4.7-flash-6bit` (~25 GB)
- `glm-4.7-flash-8bit` (~32 GB)
All variants have:
- `n_layers`: 47 (vs 91 in GLM-4.7)
- `hidden_size`: 2048 (vs 5120 in GLM-4.7)
- `supports_tensor`: True (native `shard()` method)
2. **Bumped mlx from 0.30.1 to 0.30.3** - required by mlx-lm 0.30.4
3. **Updated mlx-lm from 0.30.2 to 0.30.4** - adds `glm4_moe_lite`
architecture support
4. **Added type ignores** in `auto_parallel.py` for stricter type
annotations in new mlx-lm
5. **Fixed EOS token IDs** for GLM-4.7-Flash - uses different tokenizer
with IDs `[154820, 154827, 154829]` vs other GLM models' `[151336,
151329, 151338]`
6. **Renamed `MLX_IBV_DEVICES` to `MLX_JACCL_DEVICES`** - env var name
changed in new mlx
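To make item 1 concrete, a hypothetical sketch of one card; the field names beyond those listed above are illustrative, and the values are taken from this list:

```python
from dataclasses import dataclass

# Hypothetical sketch of a model card entry; exo's real card structure may differ.
@dataclass
class ModelCardSketch:
    model_id: str
    approx_size_gb: int
    n_layers: int
    hidden_size: int
    supports_tensor: bool
    eos_token_ids: list[int]

glm_4_7_flash_4bit = ModelCardSketch(
    model_id="glm-4.7-flash-4bit",
    approx_size_gb=18,
    n_layers=47,
    hidden_size=2048,
    supports_tensor=True,  # GLM-4.7-Flash implements shard() natively
    eos_token_ids=[154820, 154827, 154829],  # new tokenizer, see item 5
)
```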
## Why It Works
The model cards follow the same pattern as existing GLM-4.7 models.
Tensor parallel support is enabled because GLM-4.7-Flash implements the
native `shard()` method in mlx-lm 0.30.4, which is automatically
detected in `auto_parallel.py`.
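A hedged sketch of what that detection might look like; the actual check in `auto_parallel.py` may differ:

```python
# Hypothetical sketch: treat a model as tensor-parallel capable when the mlx-lm
# model object exposes a callable shard() method.
def supports_native_shard(model: object) -> bool:
    return callable(getattr(model, "shard", None))
```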
GLM-4.7-Flash uses a new tokenizer with different special token IDs.
Without the correct EOS tokens, generation wouldn't stop properly.
## Test Plan
### Manual Testing
Tested generation with GLM-4.7-Flash-4bit - now correctly stops at EOS
tokens.
### Automated Testing
- `basedpyright`: 0 errors
- `ruff check`: All checks passed
- `pytest`: 162/162 tests pass (excluding pre-existing
`test_distributed_fix.py` timeout failures)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
still seeing churn in our networking - let's properly rate limit it
## changes
added a persistent httpx AsyncClient with a cap on max connections
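roughly what this looks like (numbers are illustrative, the real limits and wiring may differ):

```python
import httpx

# hypothetical sketch: one long-lived AsyncClient shared by discovery, with
# connection counts capped so probing can't flood the network
limits = httpx.Limits(max_connections=20, max_keepalive_connections=10)
client = httpx.AsyncClient(limits=limits, timeout=httpx.Timeout(5.0))

async def probe(url: str) -> int:
    resp = await client.get(url)
    return resp.status_code
```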
## testing
deployed on cluster, discovery VASTLY more stable (the only deleted
edges were those discovered by mdns)
## Motivation
Upgrade mlx-lm to version 0.30.2 which requires transformers 5.0.0rc2 as
a prerelease dependency. This enables support for newer models like Kimi
K2 Thinking while maintaining compatibility with existing models.
The transformers 5.x release includes breaking changes that affect
custom tokenizers like Kimi's TikTokenTokenizer, requiring compatibility
fixes.
## Changes
### Core Changes
- **mlx-lm upgrade**: Bump to 0.30.2 with locked exact versions for
mlx/mlx-lm to prevent breaking changes
- **transformers 5.x compatibility**: Enable prerelease transformers
dependency
### Kimi K2 Tokenizer Fixes
- Add `bytes_to_unicode` monkey-patch to restore function moved in
transformers 5.0.0rc2
- Load `TikTokenTokenizer` directly instead of via `AutoTokenizer` to
bypass transformers 5.x bug with `auto_map` fallback
- Patch `encode()` to use tiktoken directly with `allowed_special="all"`
to handle special tokens from chat templates
### Other Changes
- Dashboard: Show disk usage for completed model downloads
- CI: Add `workflow_dispatch` trigger to build-app workflow
- Docs: Add basic API documentation
### Testing
- Add comprehensive tokenizer unit tests for all supported models
- Tests verify encode/decode, special token handling, and chat template
encoding
## Why It Works
**bytes_to_unicode issue**: transformers 5.0.0rc2 moved
`bytes_to_unicode` from `transformers.models.gpt2.tokenization_gpt2` to
`transformers.convert_slow_tokenizer`. Kimi's `tokenization_kimi.py`
imports from the old location. The monkey-patch restores it at module
load time.
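A minimal sketch of the patch, assuming the old gpt2 tokenization module still exists in transformers 5.x; the real fix may register the symbol differently:

```python
# Hypothetical sketch: restore bytes_to_unicode at its pre-5.0 import path so
# Kimi's tokenization_kimi.py can keep importing it from there.
import transformers.models.gpt2.tokenization_gpt2 as gpt2_tokenization
from transformers.convert_slow_tokenizer import bytes_to_unicode

if not hasattr(gpt2_tokenization, "bytes_to_unicode"):
    gpt2_tokenization.bytes_to_unicode = bytes_to_unicode
```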
**AutoTokenizer issue**: transformers 5.x has a bug where
`tokenizer_class_from_name('TikTokenTokenizer')` returns `None` for
custom tokenizers with `auto_map`. Loading the tokenizer directly
bypasses this.
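A hypothetical sketch of loading the class directly from the model snapshot instead of via `AutoTokenizer`; the file name and path handling here are assumptions:

```python
import importlib.util
from pathlib import Path

# Hypothetical sketch: import tokenization_kimi.py from the downloaded repo and
# instantiate TikTokenTokenizer directly, skipping AutoTokenizer's auto_map lookup.
def load_kimi_tokenizer(model_path: Path):
    spec = importlib.util.spec_from_file_location(
        "tokenization_kimi", model_path / "tokenization_kimi.py"
    )
    assert spec is not None and spec.loader is not None
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.TikTokenTokenizer.from_pretrained(str(model_path))
```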
**encode() issue**: transformers 5.x's `pad()` method fails for slow
tokenizers. Using tiktoken's encode directly with
`allowed_special="all"` avoids this path and properly handles special
tokens like `<|im_user|>` from chat templates.
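A hedged sketch of the encode path, assuming the wrapper holds its tiktoken `Encoding` (the attribute name is an assumption):

```python
# Hypothetical sketch: bypass transformers' slow-tokenizer pad()/encode() path and
# call tiktoken directly, so special tokens like <|im_user|> are encoded as-is.
def patched_encode(tokenizer, text: str) -> list[int]:
    encoding = tokenizer.model  # assumed: the underlying tiktoken Encoding
    return encoding.encode(text, allowed_special="all")
```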
## Test Plan
### Manual Testing
- Hardware: 2x Mac Studios connected via Thunderbolt 5 (mike22 and
james21)
- Tested Kimi K2 Thinking, GPT-OSS-120B, GPT-OSS-20B, Llama-3.1-8B-bf16, and
Qwen3-30B-A3B-8bit models with pipeline parallelism across both nodes
- Verified warmup inference completes successfully
- Verified chat completions work with special tokens
### Automated Testing
- Added `test_tokenizers.py` with 31 tests covering:
- Basic encode/decode for all model families (deepseek, kimi, llama,
qwen, gpt-oss, glm)
- Special token encoding (critical for chat templates)
- Chat template application and encoding
- Kimi-specific and GLM-specific edge cases
- All tests pass: `uv run pytest
src/exo/worker/tests/unittests/test_mlx/test_tokenizers.py`
### Failing Tests
RDMA fails with all models.
---------
Co-authored-by: Evan <evanev7@gmail.com>
we have a lot of dependencies we have no intent of using. kill them with
fire!
## testing
exo still launches and does the worst inference known to man on my Qwen3
instance. tests pass too!!
Pipeline + MLX Ring worked with 2 nodes but failed to initialize with
3 or more nodes. The MLX ring backend requires each node to know its
specific left and right neighbors in the ring, but the previous
implementation provided a single flat host list shared by all nodes.
With 2 nodes, a flat list [host0, host1] accidentally worked because
each node could find its only neighbor. With 3+ nodes, each node needs
a customized view:
- Rank 0: [self, right_neighbor, placeholder]
- Rank 1: [left_neighbor, self, right_neighbor]
- Rank 2: [placeholder, left_neighbor, self]
Changed MlxRingInstance from `hosts: list[Host]` to
`hosts_by_node: dict[NodeId, list[Host]]` with `ephemeral_port: int`.
Added `get_mlx_ring_hosts_by_node()` which generates per-node host
lists where:
- Self position uses 0.0.0.0 for local binding
- Left/right neighbors use actual connection IPs
- Non-neighbors use 198.51.100.1 (RFC 5737 TEST-NET-2 placeholder)
Also added IP prioritization (en0 > en1 > non-Thunderbolt > any) to
prefer stable network interfaces.
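A simplified sketch of the per-node host generation, with `NodeId`/`Host` reduced to strings and nodes given in rank order; the real implementation also applies the interface prioritization above:

```python
PLACEHOLDER_IP = "198.51.100.1"  # RFC 5737 TEST-NET-2, never contacted

# Hypothetical sketch: each rank sees itself at 0.0.0.0, real IPs only for its
# immediate left/right ring neighbors, and placeholders everywhere else.
def mlx_ring_hosts_by_node(
    nodes: list[tuple[str, str]],  # (node_id, ip) in rank order
    port: int,
) -> dict[str, list[str]]:
    hosts_by_node: dict[str, list[str]] = {}
    for rank, (node_id, _) in enumerate(nodes):
        hosts: list[str] = []
        for other_rank, (_, other_ip) in enumerate(nodes):
            if other_rank == rank:
                hosts.append(f"0.0.0.0:{port}")           # bind locally
            elif other_rank in (rank - 1, rank + 1):
                hosts.append(f"{other_ip}:{port}")        # real neighbor IP
            else:
                hosts.append(f"{PLACEHOLDER_IP}:{port}")  # placeholder
        hosts_by_node[node_id] = hosts
    return hosts_by_node
```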
Fixed topology discovery recording loopback addresses (127.0.0.1) as
valid connections to remote nodes. The reachability check now verifies
node identity via HTTP GET /node_id rather than just checking if the
port is open.
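A minimal sketch of the identity check, assuming the endpoint returns the node id as plain text; the timeout and response handling are illustrative:

```python
import httpx

# Hypothetical sketch: an address only counts as reachable if the node on the
# other end reports the expected id, which rules out 127.0.0.1 false positives.
async def verify_node(addr: str, expected_node_id: str) -> bool:
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            resp = await client.get(f"http://{addr}/node_id")
            return resp.text.strip().strip('"') == expected_node_id
    except httpx.HTTPError:
        return False
```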
Test plan:
- Built a DMG [0]
- Installed on all Macs and started cluster.
- Requested a 3 node Pipeline + MLX Ring Llama 3.3 70B (FP16).
- It started and I was able to send a few chat messages.
Eventually my instance seemed to get into a broken state and chat
stopped working, but this commit is a clear step forward.
[0] https://github.com/exo-explore/exo/actions/runs/20473983471/job/58834969418
## Motivation
We should ensure all runners are connected before loading the model - this
gives the workers' planning mechanism finer-grained control over runner
state in the future.
## Changes
- Introduced task ConnectToGroup, preceding LoadModel
- Introduced runner statuses Idle, Connecting, Connected (sketched after this list)
- Separated out initialize_mlx from shard_and_load
- Single instances never go through the connecting phase
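A hypothetical sketch of the lifecycle; the names mirror the statuses above, but the actual task and state machinery may differ:

```python
from enum import Enum, auto

# Hypothetical sketch of the runner states introduced here.
class RunnerStatus(Enum):
    IDLE = auto()
    CONNECTING = auto()  # ConnectToGroup in flight
    CONNECTED = auto()   # group formed; LoadModel may proceed

# Multi-node instance:  Idle -> Connecting -> Connected -> LoadModel
# Single-node instance: Idle -> LoadModel (skips the connecting phase)
```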
## Test Plan
### Automated Testing
Added a test for checking event ordering in a standard workflow.
### Manual Testing
Tested that Llama 3.2 1B and Kimi K2 Thinking load and shut down repeatedly
on multiple configurations.
Not exhaustive, however.
---------
Co-authored-by: rltakashige <rl.takashige@gmail.com>
The Jaccl distributed backend requires MLX 0.30.1+, which includes the
RDMA over Thunderbolt support. The previous minimum version (0.29.3)
would fail at runtime with "The only valid values for backend are
'any', 'mpi' and 'ring' but 'jaccl' was provided."
Bump MLX dependency to >=0.30.1 and rename ibv_coordinators to
jaccl_coordinators to match MLX's naming conventions. This includes
the environment variable change from MLX_IBV_COORDINATOR to
MLX_JACCL_COORDINATOR.
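A minimal sketch of the rename at the call site; the coordinator address is illustrative:

```python
import os

# Hypothetical sketch: the jaccl coordinator is now passed via MLX_JACCL_COORDINATOR
# (formerly MLX_IBV_COORDINATOR).
os.environ["MLX_JACCL_COORDINATOR"] = "10.0.0.1:5000"  # illustrative address
```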
Test plan:
Hardware setup: 3x Mac Studio M3 Ultra connected all-to-all with TB5
- Built a DMG [0]
- Installed on all Macs and started cluster.
- Requested a 2 node Tensor + MLX RDMA instance of Llama 3.3 70B (FP16).
- It started successfully.
- Queried the chat a few times. All was good. This didn't work
previously.
- Killed the instance and spawned Pipeline + MLX Ring Llama 3.3 70B (FP16).
It also started successfully on two nodes and could be queried.
Still not working:
- Pipeline + MLX Ring on 3 nodes is failing. Haven't debugged that yet.
[0] https://github.com/exo-explore/exo/actions/runs/20467656904/job/58815275013