Mirror of https://github.com/ollama/ollama.git, synced 2026-06-03 13:59:06 -04:00
Split the gated-delta Metal/CUDA kernels' dtype template into separate input (InT) and state (StT) types so activations can stay in bf16/fp16 while the accumulated delta state stays in float32. Allocate the delta state and qwen3_5's no-cache zero state in float32 to match.
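
The split can be sketched in plain C++ as follows. This is a minimal illustration, not the kernel from this commit. Only the `InT` and `StT` parameter names come from the message above. The `bf16` struct, `gated_delta_row`, `state_after_small_update`, and the simplified decay-then-delta update for a single state row are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 16-bit brain-float type. Conversion from
// float truncates to the upper 16 bits (no rounding), for simplicity.
struct bf16 {
    uint16_t bits;
    bf16() : bits(0) {}
    explicit bf16(float f) {
        uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        bits = static_cast<uint16_t>(u >> 16);
    }
    explicit operator float() const {
        uint32_t u = static_cast<uint32_t>(bits) << 16;
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    }
};

// One simplified gated-delta step for a single state row (assumed form).
// InT is the activation type (q, k, v, gate, beta); StT is the type the
// accumulated state is stored in. Arithmetic is done in float either way.
template <typename InT, typename StT, std::size_t Dk>
float gated_delta_row(const InT (&q)[Dk], const InT (&k)[Dk],
                      InT v, InT g, InT beta, StT (&state)[Dk]) {
    float decayed[Dk];
    float kv = 0.0f;
    for (std::size_t j = 0; j < Dk; ++j) {
        // Decay the stored state by the gate, then project it onto k.
        decayed[j] = static_cast<float>(state[j]) * static_cast<float>(g);
        kv += decayed[j] * static_cast<float>(k[j]);
    }
    // Delta rule: move the state toward v along k, scaled by beta.
    const float delta = (static_cast<float>(v) - kv) * static_cast<float>(beta);
    float out = 0.0f;
    for (std::size_t j = 0; j < Dk; ++j) {
        const float s = decayed[j] + static_cast<float>(k[j]) * delta;
        state[j] = static_cast<StT>(s);  // precision is lost here if StT is narrow
        out += s * static_cast<float>(q[j]);
    }
    return out;
}

// Applies one small update (beta = 2^-10) to a state of 1.0 using bf16
// inputs, and returns the stored state converted back to float.
template <typename StT>
float state_after_small_update() {
    const bf16 q[1] = {bf16(1.0f)};
    const bf16 k[1] = {bf16(1.0f)};
    StT state[1] = {static_cast<StT>(1.0f)};
    gated_delta_row(q, k, bf16(2.0f), bf16(1.0f), bf16(0.0009765625f), state);
    return static_cast<float>(state[0]);
}
```

With bf16 inputs in both cases, `state_after_small_update<float>()` keeps the 2^-10 increment, while `state_after_small_update<bf16>()` truncates back to 1.0. Small updates like this would be lost at every step if the state shared the activation type, which is the motivation for keeping `StT` in float32.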