Jesse Gross 275f122cda mlxrunner: keep gated-delta recurrent state in float32
Split the gated-delta Metal/CUDA kernels' dtype template into separate
input (InT) and state (StT) types so activations can stay in bf16/fp16
while the accumulated delta state stays in float32. Allocate the delta
state and qwen3_5's no-cache zero state in float32 to match.
2026-05-22 09:32:09 -07:00
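
To illustrate the idea behind the InT/StT split, here is a minimal NumPy sketch of a single-head gated-delta recurrence in which activations are float16 and the accumulated state is float32. It is not the repository's Metal/CUDA kernel code: the function names (`gated_delta_step`, `run`), the shapes, the scalar gate `g`, and the scalar write strength `beta` are all assumptions chosen for illustration, and float16 stands in for bf16 because NumPy has no native bf16 type.

```python
import numpy as np


def gated_delta_step(state, q, k, v, g, beta):
    """One step of a simplified gated-delta update (hypothetical helper).

    state   : (d_v, d_k) array in the state dtype (float32 here, "StT")
    q, k, v : activations in the input dtype (float16 here, "InT")
    g, beta : scalar decay gate and write strength, both in (0, 1)
    """
    in_dtype = q.dtype
    st = state.dtype.type  # scalar constructor for the state dtype

    # Upcast activations once so all accumulation happens in the state dtype.
    q32 = q.astype(state.dtype)
    k32 = k.astype(state.dtype)
    v32 = v.astype(state.dtype)

    state = state * st(g)                      # gated decay of the old state
    pred = state @ k32                         # value currently stored for key k
    state = state + st(beta) * np.outer(v32 - pred, k32)  # delta-rule correction

    out = (state @ q32).astype(in_dtype)       # output goes back to the input dtype
    return state, out


def run(seq_len=64, d_k=8, d_v=8, seed=0):
    """Run the recurrence over random inputs; returns (final_state, last_out)."""
    rng = np.random.default_rng(seed)
    # The zero state is allocated in float32, independent of the activation dtype.
    state = np.zeros((d_v, d_k), dtype=np.float32)
    out = np.zeros(d_v, dtype=np.float16)
    for _ in range(seq_len):
        q = rng.standard_normal(d_k).astype(np.float16)
        k = rng.standard_normal(d_k).astype(np.float32)
        k = (k / np.linalg.norm(k)).astype(np.float16)  # unit-norm key
        v = rng.standard_normal(d_v).astype(np.float16)
        state, out = gated_delta_step(state, q, k, v, g=0.95, beta=0.5)
    return state, out


if __name__ == "__main__":
    final_state, last_out = run()
    print(final_state.dtype, last_out.dtype)
```

The state is read and rewritten at every step, so rounding error introduced there can carry forward through the whole sequence, whereas the per-step inputs and outputs are rounded only once. That is the motivation for keeping the two dtypes separate in this sketch.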