Mirror of https://github.com/exo-explore/exo.git, synced 2026-01-19 11:28:51 -05:00
## Motivation
Upgrade mlx-lm to version 0.30.2 which requires transformers 5.0.0rc2 as
a prerelease dependency. This enables support for newer models like Kimi
K2 Thinking while maintaining compatibility with existing models.
The transformers 5.x release includes breaking changes that affect
custom tokenizers like Kimi's TikTokenTokenizer, requiring compatibility
fixes.
## Changes
### Core Changes
- **mlx-lm upgrade**: Bump to 0.30.2 with locked exact versions for
mlx/mlx-lm to prevent breaking changes
- **transformers 5.x compatibility**: Enable prerelease transformers
dependency
### Kimi K2 Tokenizer Fixes
- Add `bytes_to_unicode` monkey-patch to restore function moved in
transformers 5.0.0rc2
- Load `TikTokenTokenizer` directly instead of via `AutoTokenizer` to
bypass transformers 5.x bug with `auto_map` fallback
- Patch `encode()` to use tiktoken directly with `allowed_special="all"`
to handle special tokens from chat templates
### Other Changes
- Dashboard: Show disk usage for completed model downloads
- CI: Add `workflow_dispatch` trigger to build-app workflow
- Docs: Add basic API documentation
### Testing
- Add comprehensive tokenizer unit tests for all supported models
- Tests verify encode/decode, special token handling, and chat template
encoding
## Why It Works
**bytes_to_unicode issue**: transformers 5.0.0rc2 moved
`bytes_to_unicode` from `transformers.models.gpt2.tokenization_gpt2` to
`transformers.convert_slow_tokenizer`. Kimi's `tokenization_kimi.py`
imports from the old location. The monkey-patch restores it at module
load time.
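The shim amounts to re-exporting the function at its old path before `tokenization_kimi.py` is imported. As a self-contained illustration (not exo's actual patch code), here is the well-known GPT-2 byte-to-unicode mapping that the patch restores:

```python
def bytes_to_unicode():
    """Map each byte value 0-255 to a printable unicode character.

    This is the GPT-2 helper that transformers 5.0.0rc2 relocated from
    transformers.models.gpt2.tokenization_gpt2 to
    transformers.convert_slow_tokenizer; the monkey-patch assigns it
    back onto the old module so Kimi's import keeps working.
    """
    # Printable byte values keep their own character...
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    cs = bs[:]
    n = 0
    # ...and the remaining bytes are shifted into the range above 255.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

mapping = bytes_to_unicode()
print(len(mapping))       # 256
print(mapping[ord("A")])  # A
```

The monkey-patch itself is then a one-line assignment of this function onto `transformers.models.gpt2.tokenization_gpt2` at module load time.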
**AutoTokenizer issue**: transformers 5.x has a bug where
`tokenizer_class_from_name('TikTokenTokenizer')` returns `None` for
custom tokenizers with `auto_map`. Loading the tokenizer directly
bypasses this.
**encode() issue**: transformers 5.x's `pad()` method fails for slow
tokenizers. Using tiktoken's encode directly with
`allowed_special="all"` avoids this path and properly handles special
tokens like `<|im_user|>` from chat templates.
## Test Plan
### Manual Testing
- Hardware: 2x Mac Studios connected via Thunderbolt 5 (mike22 and
james21)
- Tested Kimi K2 Thinking, GPT-OSS-120B, GPT-OSS-20B, Llama-3.1-8B-bf16,
  and Qwen3-30B-A3B-8bit with pipeline parallelism across both nodes
- Verified warmup inference completes successfully
- Verified chat completions work with special tokens
### Automated Testing
- Added `test_tokenizers.py` with 31 tests covering:
- Basic encode/decode for all model families (deepseek, kimi, llama,
qwen, gpt-oss, glm)
- Special token encoding (critical for chat templates)
- Chat template application and encoding
- Kimi-specific and GLM-specific edge cases
- All tests pass: `uv run pytest
src/exo/worker/tests/unittests/test_mlx/test_tokenizers.py`
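The round-trip checks follow a simple shape; a toy illustration (the real tests in `test_tokenizers.py` run against the actual model tokenizers, and these names are illustrative):

```python
class ToyTokenizer:
    """Stand-in tokenizer: one code point per token."""
    def encode(self, text):
        return [ord(c) for c in text]

    def decode(self, ids):
        return "".join(chr(i) for i in ids)

def check_roundtrip(tok, text):
    # encode/decode must be lossless, including special-token markers
    assert tok.decode(tok.encode(text)) == text
    return True

print(check_roundtrip(ToyTokenizer(), "hello <|im_user|> world"))
# True
```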
### Failing Tests
- RDMA tests fail with all models.
---------
Co-authored-by: Evan <evanev7@gmail.com>
57 lines · 1.1 KiB · Bash · Executable File
```bash
#!/usr/bin/env bash
set -euo pipefail

# Resolve a Tailscale hostname to its IP.
query() {
    tailscale status | awk -v find="$1" '$2 == find { print $1 }'
}

if [[ $# -lt 2 ]]; then
    echo "USAGE: $0 <test kind> <host1> [host2] ..."
    exit 1
fi

kind=$1
shift

test_kinds="ring jaccl"

if ! echo "$test_kinds" | grep -q "$kind"; then
    printf "%s is not a known test kind.\nCurrent test kinds are %s\n" "$kind" "$test_kinds"
    exit 1
fi

# Resolve each hostname and build a JSON array of ["name", "ip"] pairs.
hostnames=("$@")
weaved=()
ips=()
for name in "${hostnames[@]}"; do
    ip=$(query "$name")
    ips+=("$ip")
    weaved+=("$name" "$ip")
done

devs_raw=$(printf "[\"%s\", \"%s\"], " "${weaved[@]}")
devs="[${devs_raw%, }]"

model_ids=("qwen3-30b" "gpt-oss-120b-MXFP4-Q8" "kimi-k2-thinking")

# For each model, fire the test request at every node in parallel,
# then wait for all of them before moving on to the next model.
for model_id in "${model_ids[@]}"; do
    for i in "${!ips[@]}"; do
        {
            req="{
                \"model_id\": \"${model_id}\",
                \"devs\": ${devs},
                \"kind\": \"inference\"
            }"
            echo "req $req"
            curl -sN \
                -X POST "http://${ips[$i]}:52415/${kind}" \
                -H "Content-Type: application/json" -d "$req" \
                2>&1 | sed "s/^/\n${hostnames[$i]}@${ips[$i]}: /" \
                || { echo "curl to ${hostnames[$i]} failed"; exit 1; }
        } &
    done
    wait
done
```