## Motivation
<!-- Why is this change needed? What problem does it solve? -->
<!-- If it fixes an open issue, please link to the issue here -->
## Changes
<!-- Describe what you changed in detail -->
## Why It Works
<!-- Explain why your approach solves the problem -->
## Test Plan
### Manual Testing
<!-- Hardware: (e.g., MacBook Pro M1 Max 32GB, Mac Mini M2 16GB,
connected via Thunderbolt 4) -->
<!-- What you did: -->
<!-- - -->
### Automated Testing
<!-- Describe changes to automated tests, or how existing tests cover
this change -->
<!-- - -->
## Summary
DeepSeek V3.2 (`DeepseekV32ForCausalLM`) is already supported by exo's
inference engine (architecture whitelisted in `model_cards.py`, DSML
encoding added in #1548), but **doesn't work out of the box** due to two
bugs:
### Bug 1: `warmup_inference` passes empty model ID
`warmup_inference()` in `generate.py` accepts `model_id: ModelId` as a
parameter but creates `TextGenerationTaskParams(model=ModelId(""), ...)`
instead of using it. Since `_needs_dsml_encoding()` checks
`"deepseek-v3.2" in task_params.model.lower()`, the empty string never
matches → falls back to `tokenizer.apply_chat_template()` →
**ValueError** because V3.2 has no Jinja chat template.
**Fix:** `model=ModelId("")` → `model=model_id` (one line).
### Bug 2: `_needs_dsml_encoding` limited to tool calling
`_needs_dsml_encoding()` returns `True` only when `task_params.tools` is
present or tool messages exist in `chat_template_messages`. For warmup
and regular chat requests without tools → `return False` → Jinja
fallback → **ValueError**.
Unlike V3.1 (which has a `.jinja` chat template file that transformers
picks up automatically), V3.2 **has no Jinja template at all** — it uses
Python-based DSML encoding for all message types.
**Fix:** For V3.2, always return `True` — DSML encoding handles all
message types.
### Catalog cards
Added inference model cards for:
- `mlx-community/DeepSeek-V3.2-8bit`
- `mlx-community/DeepSeek-V3.2-4bit`
Parameters taken from model `config.json` on HuggingFace, storage sizes
from HF API. Capabilities include `thinking_toggle` (related: #1456).
## Notes
- The model ID string matching approach (`"deepseek-v3.2" in
model.lower()`) is acknowledged tech debt — see #1371 for the planned
architecture-based approach.
## Test plan
- [x] Start exo with DeepSeek V3.2 model → warmup should complete
without crash
- [x] Send a regular chat message (no tools) → should get a response
- [x] Send a chat message with tools → should work as before
- [x] V3.2 cards should appear in the dashboard model catalog
---------
Co-authored-by: user <user@m1.note>
Co-authored-by: Ryuichi Leo Takashige <leo@exolabs.net>
Co-authored-by: Evan <evanev7@gmail.com>
## Motivation
Some users (primarily in China) are unable to access HuggingFace.co. HF
supports the HF_ENDPOINT env variable, and we also support it, but there
is no way to easily do that from the app currently.
## Test Plan
### Manual Testing
hf-mirror works
empty field works
google.com endpoint fails
## Motivation
Simplification of API + moving model specific code to the runner
<!-- Why is this change needed? What problem does it solve? -->
<!-- If it fixes an open issue, please link to the issue here -->
## Test Plan
### Manual Testing
Tested that GPT OSS outputs are parsed correctly on the dashboard.
### Automated Testing
<!-- Describe changes to automated tests, or how existing tests cover
this change -->
<!-- - -->