Commit Graph

30 Commits

Author SHA1 Message Date
rltakashige
5757c27dd5 Add download utility script (#1855)
2026-04-08 00:58:39 +00:00
Mustafa Alp Yılmaz
2994b41089 fix: validate num_key_value_heads in tensor sharding placement (#1669)
## Problem

Models with fewer KV heads than nodes crash during tensor parallelism.
For example, Qwen3.5 MoE models have only 2 KV heads — trying to shard
across 4 nodes produces empty tensors and a reshape error at runtime.

The placement system already validates `hidden_size % num_nodes == 0`
but doesn't check KV heads, so it creates configurations that look valid
but blow up when the worker tries to split the attention heads.

Affected models include Qwen3.5-35B-A3B, Qwen3.5-122B-A10B,
Qwen3.5-397B-A17B, Qwen3-Next-80B-A3B, and Qwen3-Coder-Next (all have 2
KV heads).
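
To make the failure concrete, the sketch below splits KV heads
contiguously by rank. It is an illustration only, not exo's worker
code, and `split_kv_heads` is a hypothetical helper. With 2 heads and 4
nodes the integer share is 0, so every rank gets an empty slice. That
is the kind of empty tensor that later fails to reshape. A
non-divisible count such as 3 heads on 2 nodes silently drops a head
instead.

```python
def split_kv_heads(num_kv_heads: int, num_nodes: int, rank: int) -> list[int]:
    """KV-head indices owned by `rank` under a naive contiguous split.

    Hypothetical helper for illustration; not taken from exo.
    """
    per_node = num_kv_heads // num_nodes  # 2 // 4 == 0 on a 4-node cluster
    start = rank * per_node
    return list(range(start, start + per_node))

# 2 KV heads across 4 nodes: every rank ends up with nothing to compute.
print([split_kv_heads(2, 4, r) for r in range(4)])  # [[], [], [], []]
# 2 KV heads across 2 nodes: one head per rank, as expected.
print([split_kv_heads(2, 2, r) for r in range(2)])  # [[0], [1]]
# 3 KV heads across 2 nodes: head 2 is silently dropped.
print([split_kv_heads(3, 2, r) for r in range(2)])  # [[0], [1]]
```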

## Changes

**Placement validation** (`src/exo/master/placement.py`):
- Combined KV heads divisibility check with the existing hidden_size
filter in a single pass
- Cycles where `num_key_value_heads % len(cycle) != 0` are now excluded
for tensor sharding
- Error message includes both constraints when no valid cycle is found
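
The combined filter can be sketched in Python as follows. This is a
hedged illustration, not the code in `src/exo/master/placement.py`.
`ShardingShape` and `filter_tensor_cycles` are hypothetical names, a
cycle is modeled as a plain list of node IDs, `hidden_size=2048` is an
arbitrary example value, and the `ValueError` stands in for however
`place_instance()` actually reports the rejection.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class ShardingShape:
    """Hypothetical stand-in for the two ModelCard fields the check needs."""
    hidden_size: int
    num_key_value_heads: Optional[int] = None

def filter_tensor_cycles(
    cycles: Sequence[Sequence[str]], shape: ShardingShape
) -> list[Sequence[str]]:
    """Keep cycles whose size divides hidden_size and, if known, num_key_value_heads."""
    valid = []
    for cycle in cycles:  # a single pass applies both constraints
        n = len(cycle)
        if shape.hidden_size % n != 0:
            continue
        # Cards without the field skip the KV-head check entirely.
        kv = shape.num_key_value_heads
        if kv is not None and kv % n != 0:
            continue
        valid.append(cycle)
    if not valid:
        # Report every constraint that applied, mirroring the PR's error message.
        constraints = [f"hidden_size={shape.hidden_size}"]
        if shape.num_key_value_heads is not None:
            constraints.append(f"num_key_value_heads={shape.num_key_value_heads}")
        raise ValueError("No cycle size evenly divides " + " and ".join(constraints))
    return valid

# Hypothetical 2-KV-head model with a 2-node and a 4-node candidate cycle.
shape = ShardingShape(hidden_size=2048, num_key_value_heads=2)
print(filter_tensor_cycles([["a", "b"], ["a", "b", "c", "d"]], shape))
# [['a', 'b']]
```

Folding both checks into one loop keeps a single list of surviving
cycles, so the error path only has to look at one result.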

**Model card schema** (`src/exo/shared/models/model_cards.py`):
- Added optional `num_key_value_heads` field to `ModelCard` and
`ConfigData`
- Extracted from HuggingFace `config.json` (handles both top-level and
`text_config` nesting)
- Passed through in `fetch_from_hf()` for dynamically fetched cards
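
A minimal sketch of the `config.json` lookup follows. It assumes the
top-level key takes precedence over `text_config`, because the PR does
not say which value wins if both are present. `extract_num_kv_heads` is
a hypothetical name, not the function in `model_cards.py`.

```python
import json
from typing import Any, Optional

def extract_num_kv_heads(config: dict[str, Any]) -> Optional[int]:
    """Return num_key_value_heads from a HuggingFace-style config, or None."""
    # Assumption: a top-level value wins over one nested under text_config.
    value = config.get("num_key_value_heads")
    if value is None:
        text_config = config.get("text_config")
        if isinstance(text_config, dict):
            value = text_config.get("num_key_value_heads")
    return int(value) if value is not None else None

flat = json.loads('{"num_key_value_heads": 8}')
nested = json.loads('{"text_config": {"num_key_value_heads": 2}}')
print(extract_num_kv_heads(flat), extract_num_kv_heads(nested), extract_num_kv_heads({}))
# 8 2 None
```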

**All 68 inference model cards**
(`resources/inference_model_cards/*.toml`):
- Populated `num_key_value_heads` from each model's HuggingFace config

**Utility script** (`scripts/fetch_kv_heads.py`):
- Fetches `num_key_value_heads` from HuggingFace and updates TOML cards
- `--missing`: only fills in cards that don't have the field yet
- `--all`: re-fetches and overwrites everything
- Uses tomlkit for safe TOML editing and ThreadPoolExecutor for parallel
fetches
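
The script's flag handling and parallel fetch might look roughly like
the sketch below. It is hypothetical. Cards are plain dicts rather than
tomlkit documents, `model_id`, `parse_args`, and `update_cards` are
illustrative names, and `fetch` is an offline stub in place of a real
HuggingFace request.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Fill num_key_value_heads in model cards")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--missing", action="store_true", help="only cards without the field")
    mode.add_argument("--all", action="store_true", help="re-fetch and overwrite every card")
    return parser.parse_args(argv)

def update_cards(cards: dict[str, dict], fetch: Callable[[str], Optional[int]],
                 only_missing: bool, max_workers: int = 8) -> list[str]:
    # --missing skips cards that already have a value; --all takes every card.
    names = [name for name, card in cards.items()
             if not only_missing or card.get("num_key_value_heads") is None]
    # Fetch in parallel; pool.map yields results in the same order as `names`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(lambda name: fetch(cards[name]["model_id"]), names))
    for name, value in zip(names, values):
        if value is not None:  # leave the card untouched if nothing was found
            cards[name]["num_key_value_heads"] = value
    return names

# Usage with a stub lookup table instead of network access.
FAKE_HUB = {"org/model-a": 2, "org/model-b": 8}
cards = {
    "a": {"model_id": "org/model-a"},
    "b": {"model_id": "org/model-b", "num_key_value_heads": 8},
}
args = parse_args(["--missing"])
updated = update_cards(cards, FAKE_HUB.get, only_missing=args.missing)
print(updated, cards["a"]["num_key_value_heads"])  # ['a'] 2
```

Plain dicts drop TOML comments and formatting on write, which is why
the real script edits the files through tomlkit instead.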

## Behavior

- Instance previews no longer show tensor options for models that can't
split their KV heads across the cluster size
- `place_instance()` rejects with a clear error instead of crash-looping
- Pipeline parallelism is unaffected
- 2-node tensor still works for 2-KV-head models (2 ÷ 2 = 1)
- Field is optional — existing custom cards without it continue to work
(validation is skipped when `None`)
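
As a worked example of the behavior above, the KV-head rule on its own
admits a multi-node tensor cluster of size n only when n divides the KV
head count. The helper below is hypothetical and ignores the separate
`hidden_size` constraint.

```python
from typing import Optional

def kv_compatible_sizes(num_key_value_heads: Optional[int], max_nodes: int) -> list[int]:
    """Multi-node cluster sizes (2..max_nodes) that pass the KV-head check."""
    sizes = range(2, max_nodes + 1)
    if num_key_value_heads is None:  # field absent: the check is skipped
        return list(sizes)
    return [n for n in sizes if num_key_value_heads % n == 0]

print(kv_compatible_sizes(2, 8))     # [2]
print(kv_compatible_sizes(8, 8))     # [2, 4, 8]
print(kv_compatible_sizes(None, 4))  # [2, 3, 4]
```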
2026-03-11 13:46:33 +00:00
Jake Hillion
0fcee70833 prep repo for v1 2025-12-17 15:31:02 +00:00
Sami Khan
971f5240bf build fix 2025-02-28 15:45:57 +05:00
Sami Khan
a70943f8d2 base images for animation 2025-01-22 05:46:38 -05:00
Alex Cheema
ba5bb3e171 fix scripts/build_exo.py: com.exolabs.exo -> net.exolabs.exo 2025-01-21 05:36:02 +00:00
DeftDawg
cde912deef - Use #!/usr/bin/env bash instead of #!/bin/bash for better portability 2024-12-22 01:14:54 -05:00
Alex Cheema
e8ece1158f tweak sed, make compile_grpc.sh executable 2024-12-06 13:23:06 +00:00
josh
0996bcc3b6 Merge branch 'main' into package-exo-fixes 2024-11-22 10:39:56 -08:00
Nel Nibcord
e3ec9eaa44 Fixed GRPC issues 2024-11-21 17:28:44 -08:00
josh
f5afa4db4d compile error fix 2024-11-21 08:47:45 -08:00
josh
5269629d76 removed unused code 2024-11-21 05:22:17 -08:00
josh
90765922c8 added one file 2024-11-20 00:12:12 -08:00
josh
41697431dc error fix 2024-11-19 20:47:56 -08:00
josh
44118252e9 build error fix 2024-11-19 20:34:48 -08:00
josh
3a1871c84b typo fix 2024-11-19 08:00:41 -08:00
josh
97ed990a98 macos sign 2024-11-19 07:58:01 -08:00
josh
8bc823229a missing lib 2024-11-19 07:57:22 -08:00
josh
ce9231ad3d move model fix 2024-11-19 07:56:20 -08:00
josh
e1519246ee error fix 2024-11-19 05:54:05 -08:00
Alex Cheema
1fa42f3063 typo 2024-11-19 17:02:07 +04:00
josh
6fc0b04479 error fix 2024-11-19 04:55:50 -08:00
josh
520d9d1164 error fix 2024-11-19 04:49:02 -08:00
josh
bcd885dcc9 cleaned code 2024-11-19 01:02:46 -08:00
josh
8ce0fe2bb3 pr suggestion 2024-11-19 00:59:33 -08:00
josh
867f348e71 moving models 2024-11-19 00:49:10 -08:00
josh
00d4bda5bd fix build script 2024-11-18 23:29:53 -08:00
josh
e991438e72 pr suggestions fix 2024-11-18 23:02:03 -08:00
josh
fea1c0fc29 clean branch 2024-11-18 08:47:17 -08:00
Nel Nibcord
9712d696a9 Added a small script to compile grpc 2024-11-12 23:20:55 -08:00