mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-04-17 20:40:35 -04:00

Author	SHA1	Message	Date
Alex Cheema	63b8e64715	Add model cards for Qwen3.6-35B-A3B variants (#1907) ## Motivation `mlx-community` has just published the new Qwen3.6-35B-A3B multimodal MoE family on HuggingFace. Without static model cards exo doesn't surface these models in the dashboard picker or match its placement / prefill logic, so users can't one-click launch them. This PR adds cards for the three quants whose safetensors indexes are already live on HF (4bit / 5bit / bf16). ## Changes Three new TOML files in `resources/inference_model_cards/`: - `mlx-community--Qwen3.6-35B-A3B-4bit.toml` (~19 GB) - `mlx-community--Qwen3.6-35B-A3B-5bit.toml` (~23 GB) - `mlx-community--Qwen3.6-35B-A3B-bf16.toml` (~65 GB) All three share the same architectural fields (`n_layers = 40`, `hidden_size = 2048`, `num_key_value_heads = 2`, `context_length = 262144`, capabilities `text, thinking, thinking_toggle, vision`, `base_model = "Qwen3.6 35B A3B"`); only `model_id`, `quantization`, and `storage_size.in_bytes` differ between variants. ## Why It Works - Qwen3.6-35B-A3B reuses the `qwen3_5_moe` architecture (`Qwen3_5MoeForConditionalGeneration`), the same one already wired into exo's MLX runner at `src/exo/worker/engines/mlx/auto_parallel.py:47` via `Qwen3_5MoeModel`. The architectural fields are taken verbatim from the HF `config.json.text_config` and match the existing `Qwen3.5-35B-A3B-` cards. - Storage sizes are the exact `metadata.total_size` read from each variant's `model.safetensors.index.json` on HF, so download progress and cluster-memory-fit checks are accurate. - Vision support is flagged in `capabilities`; the `[vision]` block is auto-detected by `ModelCard._autodetect_vision` from the upstream `config.json`, so no hand-written vision config is required. - The card loader (`_refresh_card_cache` in `src/exo/shared/models/model_cards.py`) globs every `.toml` in `resources/inference_model_cards/` on startup, so nothing else needs to change: the `/models` endpoint and the dashboard picker pick them up automatically. The `mxfp4` / `mxfp8` / `nvfp4` variants are still uploading upstream (index JSONs currently 404) and can be added in a follow-up PR once HF completes. ## Test Plan ### Manual Testing Hardware: MacBook Pro M4 Max, 48 GB unified memory. - Built the dashboard, ran `uv run exo`, waited for the API to come up on `http://localhost:52415`. - `curl -s http://localhost:52415/models` returns the three new model ids (`mlx-community/Qwen3.6-35B-A3B-{4bit,5bit,bf16}`) alongside existing models. - Opened the dashboard, clicked SELECT MODEL, typed "Qwen3.6" into the search box. A single "Qwen3.6 35B A3B" group appears showing `3 variants (19GB-65GB)`. Expanding it lists the `4bit` / `5bit` / `bf16` quants with sizes `19GB` / `23GB` / `65GB`, exactly as expected: ![Qwen3.6 35B A3B in model picker](127119f703/qwen36-picker.png) - Programmatically loaded each TOML via `ModelCard.load_from_path(...)` and confirmed the parsed fields (layers / hidden / KV heads / context / quant / base_model / caps / bytes) match what's written in the files. ### Automated Testing No code paths were touched; these are pure TOML data files that plug into the existing model-card loader. The existing pytest suite covers TOML parsing and card serving; adding new TOMLs doesn't require new test scaffolding. `uv run ruff check` and `nix fmt` are clean. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ryuichi Leo Takashige <rl.takashige@gmail.com>	2026-04-16 23:25:26 +01:00
Evan Quiney	9b381f7bfe	bump and simplify flake (#1866) seems like stablepkgs swiftfmt works now! also bump macmon to 0.7	2026-04-13 15:45:17 +00:00
Evan Quiney	7ee88c1f05	override macmon in flake (#1747) updates macmon to an upstream fork that fixes m5 max issues. might see if the upstream version gets merged before we release. --------- Co-authored-by: Alex Cheema <alexcheema123@gmail.com>	2026-03-24 17:30:19 +00:00
Alex Cheema	f370452d7e	Better onboarding UX (#1533) ## Summary - Complete onboarding wizard: 7-step flow guiding new users from Welcome → Your Devices (topology) → Add More Devices (animation) → Choose Model → Download → Load → Chat - Native macOS integration: NSPopover welcome callout anchored to menu bar icon on first launch, polished DMG installer with drag-to-Applications arrow - Dashboard UX polish: auto-download on model select, toast notifications, connection banner, skeleton loading, download progress in header, recommended model tags, sidebar hidden in home state for cleaner first impression - Settings & menu bar overhaul: native Settings window with Advanced tab, onboarding reset, chat sidebar toggle ## Test plan - [ ] Fresh install: verify onboarding wizard appears and flows Welcome → Topology → Animation → Model → Download → Load → Chat - [ ] Verify topology shows real device data in onboarding step 2 - [ ] Verify selecting a model in the main dashboard picker auto-triggers download - [ ] Verify chat sidebar is hidden on home view, appears when chat is active - [ ] Verify DMG installer has white background with curved arrow - [ ] Verify NSPopover appears anchored to menu bar icon on first launch 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>	2026-02-23 11:27:28 +00:00
Evan Quiney	c8997217cf	Revert "feat: better onboarding UX for new users (#1479)" This reverts commit `490d2e46ba`.	2026-02-17 18:02:32 +00:00
Alex Cheema	490d2e46ba	feat: better onboarding UX for new users (#1479) ## Summary - Auto-open dashboard in browser on first launch (uses `~/.exo/.dashboard_opened` marker) - Welcome overlay with "Choose a Model" CTA button when no model instance is running - Tutorial progress messages during model download → loading → ready lifecycle stages - Fix conversation sidebar text contrast: bumped to white text, added active state background - Simplify technical jargon: sharding/instance type/min nodes hidden behind collapsible "Advanced Options" toggle; strategy display hidden behind debug mode - Polished DMG installer with drag-to-Applications layout, custom branded background, and AppleScript-configured window positioning ## Test plan - [ ] Launch exo for the first time (delete `~/.exo/.dashboard_opened` to simulate); browser should auto-open - [ ] Verify welcome overlay appears on topology when no model is loaded - [ ] Launch a model and verify download/loading/ready messages appear in instance cards - [ ] Check conversation sidebar text is readable (white on dark, yellow when active) - [ ] Verify "Advanced Options" toggle hides/shows sharding controls - [ ] Build DMG with `packaging/dmg/create-dmg.sh` and verify drag-to-Applications layout 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 17:52:49 +00:00
Evan Quiney	d90605f198	migrate model cards to .toml files (#1354)	2026-02-03 12:32:06 +00:00
Jake Hillion	0a7fe5d943	ci: migrate build-app to github hosted runners	2025-12-22 19:51:48 +00:00

8 Commits