Mirror of https://github.com/exo-explore/exo.git (synced 2026-02-05 11:43:17 -05:00)
## Motivation

Qwen3-Coder-Next just dropped on mlx-community in several quantizations. It's an 80B MoE model (`Qwen3NextForCausalLM`), an architecture we already support for tensor parallelism via `QwenShardingStrategy`; it just needs model cards.

## Changes

Added model cards for all five available quantizations:

- `mlx-community/Qwen3-Coder-Next-4bit` (~46 GB)
- `mlx-community/Qwen3-Coder-Next-5bit` (~58 GB)
- `mlx-community/Qwen3-Coder-Next-6bit` (~69 GB)
- `mlx-community/Qwen3-Coder-Next-8bit` (~89 GB)
- `mlx-community/Qwen3-Coder-Next-bf16` (~158 GB)

All have `supports_tensor = true`, since the architecture is already supported.

## Why It Works

`Qwen3NextForCausalLM` is already handled by `QwenShardingStrategy` in `auto_parallel.py` and is in the `supports_tensor` allowlist in `model_cards.py`. No code changes were needed, only the TOML card files.

## Test Plan

### Manual Testing

<!-- n/a - model card addition only -->

### Automated Testing

- `basedpyright`: 0 errors
- `ruff check`: passes
- `nix fmt`: no changes
- `pytest`: 173 passed, 1 skipped

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
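For illustration, one of the added card files might look roughly like the sketch below. This is an assumption-laden mock-up, not the actual file from the PR: exo's real model-card schema is not shown here, so every field name except `supports_tensor` (which the PR text mentions explicitly) is hypothetical.

```toml
# Hypothetical sketch of a model card such as Qwen3-Coder-Next-4bit.toml.
# Field names other than supports_tensor are assumptions; check exo's
# existing card files in the repo for the real schema.
model_id = "mlx-community/Qwen3-Coder-Next-4bit"

# Architecture string that QwenShardingStrategy in auto_parallel.py
# already knows how to shard.
architecture = "Qwen3NextForCausalLM"

# Enables tensor parallelism for this card, per the PR description.
supports_tensor = true
```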