Jesse Gross 358af4af23 Revert "mlxrunner: add DFlash speculative decoding (#16134)"
This reverts commit 98e26b8c37.

The DFlash integration is too invasive to keep at this stage: it
threads DFlash-specific logic through the pipeline, base model
interfaces, and the cache layer. The recurrent cache also now
contains qwen3.5-specific code. Revert it now and reintroduce
the self-contained, generally-useful pieces (YaRN RoPE DRY-out, draft
architecture autodetection, gated-delta fp32 state) as separate
follow-up commits.
2026-05-22 09:32:09 -07:00