exo/rust at david/attn-moe-split - exo - Gitea: Git with a cup of tea

mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-04-18 04:52:40 -04:00

Files

vskiwi fc1ae90111 fix: DeepSeek V3.2 warmup crash and tool calling + add catalog cards (#1769 )

## Summary

DeepSeek V3.2 (`DeepseekV32ForCausalLM`) is already supported by exo's
inference engine (architecture whitelisted in `model_cards.py`, DSML
encoding added in #1548), but **doesn't work out of the box** due to two
bugs:

### Bug 1: `warmup_inference` passes empty model ID

`warmup_inference()` in `generate.py` accepts `model_id: ModelId` as a
parameter but creates `TextGenerationTaskParams(model=ModelId(""), ...)`
instead of using it. Since `_needs_dsml_encoding()` checks
`"deepseek-v3.2" in task_params.model.lower()`, the empty string never
matches → falls back to `tokenizer.apply_chat_template()` →
**ValueError** because V3.2 has no Jinja chat template.

**Fix:** `model=ModelId("")` → `model=model_id` (one line).

### Bug 2: `_needs_dsml_encoding` limited to tool calling

`_needs_dsml_encoding()` returns `True` only when `task_params.tools` is
present or tool messages exist in `chat_template_messages`. For warmup
and regular chat requests without tools → `return False` → Jinja
fallback → **ValueError**.

Unlike V3.1 (which has a `.jinja` chat template file that transformers
picks up automatically), V3.2 **has no Jinja template at all** — it uses
Python-based DSML encoding for all message types.

**Fix:** For V3.2, always return `True` — DSML encoding handles all
message types.

### Catalog cards

Added inference model cards for:
- `mlx-community/DeepSeek-V3.2-8bit`
- `mlx-community/DeepSeek-V3.2-4bit`

Parameters taken from model `config.json` on HuggingFace, storage sizes
from HF API. Capabilities include `thinking_toggle` (related: #1456).

## Notes

- The model ID string matching approach (`"deepseek-v3.2" in
model.lower()`) is acknowledged tech debt — see #1371 for the planned
architecture-based approach.

## Test plan

- [x] Start exo with DeepSeek V3.2 model → warmup should complete
without crash
- [x] Send a regular chat message (no tools) → should get a response
- [x] Send a chat message with tools → should work as before
- [x] V3.2 cards should appear in the dashboard model catalog

---------

Co-authored-by: user <user@m1.note>
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
Co-authored-by: Evan <evanev7@gmail.com>

2026-03-25 16:20:35 +00:00

exo_pyo3_bindings

fix: DeepSeek V3.2 warmup crash and tool calling + add catalog cards (#1769 )

2026-03-25 16:20:35 +00:00

networking

Feat/static peer discovery (#1690 )

2026-03-25 10:55:12 +00:00

util

util: remove VecExt trait, inline at call site (#1446 )

2026-02-10 20:45:57 +00:00

parts.nix

remove nightly (#1538 )

2026-02-19 12:55:31 +00:00