Files
exo/resources
ciaranbor 6177550c34 Ciaran/parallel cfg (#1361)
## Motivation

Enable parallel classifier-free guidance (CFG) for Qwen image models.
CFG requires two forward passes (positive/negative prompts) - this
allows them to run on separate nodes simultaneously, reducing latency.

## Changes

  - Added uses_cfg flag to ModelCard to identify CFG-based models
- Extended PipelineShardMetadata with CFG topology fields (cfg_rank,
cfg_world_size, peer device info)
- Updated placement to create two CFG groups with reversed ordering
(places CFG peers as ring neighbors)
- Refactored DiffusionRunner to process CFG branches separately with
exchange at last pipeline stage
- Added get_cfg_branch_data() to PromptData for single-branch embeddings
  - Fixed seed handling in API for distributed consistency
  - Fixed image yield to only emit from CFG rank 0 at last stage
  - Increased num_sync_steps_factor from 0.125 to 0.25 for Qwen

## Why It Works

- 2 nodes + CFG: Both run all layers, process different CFG branches in
parallel
  - 4+ even nodes + CFG: Hybrid - 2 CFG groups × N/2 pipeline stages
  - Odd nodes or non-CFG: Falls back to pure pipeline parallelism
 
 Ring topology places CFG peers as neighbors to enable direct exchange.

## Test Plan

### Manual Testing

Verified performance gain for Qwen-Image for 2 node and 4 node cluster.
Non-CFG models still work

### Automated Testing
 
Added tests in test_placement_utils.py covering 2-node CFG parallel,
4-node hybrid, odd-node fallback, and non-CFG pipeline modes.
2026-02-04 21:16:35 +00:00
..