ollama

mirror of https://github.com/ollama/ollama.git synced 2026-06-03 13:59:06 -04:00

Jesse Gross 275f122cda mlxrunner: keep gated-delta recurrent state in float32

Split the gated-delta Metal/CUDA kernels' dtype template into separate
input (InT) and state (StT) types so activations can stay in bf16/fp16
while the accumulated delta state stays in float32. Allocate the delta
state and qwen3_5's no-cache zero state in float32 to match.
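The reason for keeping the state in float32 can be illustrated with a toy accumulation. The sketch below is not the gated-delta update and not code from this repository. It only mimics the input-type/state-type split: `in_fmt` stands in for InT and `state_fmt` for StT. The helper names `round_to` and `accumulate` are hypothetical, and rounding is simulated with the standard `struct` module's half-precision (`e`) and single-precision (`f`) format codes.

```python
import struct


def round_to(fmt, value):
    """Round a Python float to the IEEE format named by a struct code.

    "<e" is half precision (fp16), "<f" is single precision (fp32).
    """
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(inputs, in_fmt, state_fmt):
    """Sum inputs, storing each input in in_fmt and the running state in state_fmt.

    Hypothetical stand-in for a recurrent kernel templated on separate
    input (InT) and state (StT) types; the real kernel's update rule differs.
    """
    state = round_to(state_fmt, 0.0)
    for x in inputs:
        x = round_to(in_fmt, x)                 # activation kept in low precision
        state = round_to(state_fmt, state + x)  # state rounded to its own type
    return state


inputs = [0.01] * 10000  # exact sum would be 100

# Single dtype for everything: the state is also fp16.
fp16_state = accumulate(inputs, in_fmt="<e", state_fmt="<e")

# Split dtypes: fp16 inputs, fp32 state.
fp32_state = accumulate(inputs, in_fmt="<e", state_fmt="<f")

print("fp16 state:", fp16_state)
print("fp32 state:", fp32_state)
```

With an fp16 state the sum stalls at 32.0, because above that value each 0.01 increment is less than half the spacing between adjacent fp16 values and rounds away. With an fp32 state the same fp16 inputs sum to roughly 100. The gap grows with the number of steps, which is why a long-lived recurrent state is the part that benefits most from the wider type.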

2026-05-22 09:32:09 -07:00

cache_test.go

mlxrunner: decouple models from attention cache storage layout

2026-04-27 20:04:46 -07:00

cache.go

Revert "mlxrunner: add DFlash speculative decoding (#16134)"

2026-05-22 09:32:09 -07:00

recurrent_test.go

mlxrunner: keep gated-delta recurrent state in float32

2026-05-22 09:32:09 -07:00

recurrent.go

mlxrunner: keep gated-delta recurrent state in float32