ollama/x/mlxrunner at brucemacd/omp-docs

mirror/ollama

mirror of https://github.com/ollama/ollama.git synced 2026-06-03 05:53:55 -04:00

Jesse Gross 275f122cda mlxrunner: keep gated-delta recurrent state in float32

Split the gated-delta Metal/CUDA kernels' dtype template into separate
input (InT) and state (StT) types so activations can stay in bf16/fp16
while the accumulated delta state stays in float32. Allocate the delta
state and qwen3_5's no-cache zero state in float32 to match.

2026-05-22 09:32:09 -07:00
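The commit above splits one dtype template parameter into an input type (InT) and a state type (StT). The Go sketch below shows that pattern on the CPU. It is not the Metal/CUDA kernel. It assumes the standard gated delta rule, S ← αS + β(v − αSk)kᵀ with output o = Sq, where S has d_v rows and d_k columns. The real kernels may differ in state layout, gating parameterization, and normalization. Go has no bf16/fp16 types, so here float32 stands in for the low-precision activations and float64 for the wider accumulated state. The names `Float` and `gatedDeltaStep` are hypothetical.

```go
package main

import "fmt"

// Float is a hypothetical constraint over Go's native float types.
// Go lacks bf16/fp16, so in this sketch float32 plays the role of the
// low-precision activation type and float64 the wider state type.
type Float interface {
	~float32 | ~float64
}

// gatedDeltaStep applies one step of the (assumed) gated delta rule
//
//	S <- alpha*S + beta*(v - alpha*S*k) * k^T
//
// and returns o = S*q. Inputs q, k, v use InT; the recurrent state S
// (dv rows x dk columns) is stored and accumulated in StT.
func gatedDeltaStep[InT, StT Float](S [][]StT, q, k, v []InT, alpha, beta StT) []InT {
	dv, dk := len(S), len(k)
	for i := 0; i < dv; i++ {
		// (S*k)_i, computed in state precision before row i is updated.
		var sk StT
		for j := 0; j < dk; j++ {
			sk += S[i][j] * StT(k[j])
		}
		// Delta-rule error term for row i, scaled by the write strength beta.
		delta := beta * (StT(v[i]) - alpha*sk)
		for j := 0; j < dk; j++ {
			S[i][j] = alpha*S[i][j] + delta*StT(k[j])
		}
	}
	// Read out with the updated state, casting back to the input type.
	out := make([]InT, dv)
	for i := 0; i < dv; i++ {
		var acc StT
		for j := 0; j < dk; j++ {
			acc += S[i][j] * StT(q[j])
		}
		out[i] = InT(acc)
	}
	return out
}

func main() {
	// Zero-initialized state allocated in the wider type, analogous to
	// allocating the delta state (and a no-cache zero state) in float32.
	S := [][]float64{{0, 0}, {0, 0}}
	q := []float32{1, 0}
	k := []float32{1, 0}
	v := []float32{0.5, -0.25}
	for step := 0; step < 3; step++ {
		o := gatedDeltaStep[float32, float64](S, q, k, v, 0.9, 0.5)
		fmt.Println(step, o)
	}
}
```

Keeping the two type parameters separate lets the activations stay narrow while the state, which is decayed and rewritten at every token, accumulates rounding error in the wider type.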

mlxrunner: decouple models from attention cache storage layout

2026-04-27 20:04:46 -07:00

mlxrunner: keep gated-delta recurrent state in float32

2026-05-22 09:32:09 -07:00

mlxrunner: keep gated-delta recurrent state in float32

2026-05-22 09:32:09 -07:00

Revert "mlxrunner: add DFlash speculative decoding (#16134)"

2026-05-22 09:32:09 -07:00

mlx: rework the MLX sampler (#16122)

2026-05-13 17:18:27 -07:00

cache_test.go

Revert "mlxrunner: add DFlash speculative decoding (#16134)"

2026-05-22 09:32:09 -07:00

cache_trie_test.go

mlxrunner: share KV cache across conversations with common prefixes

2026-03-18 16:06:33 -07:00

cache_trie.go

mlxrunner: share KV cache across conversations with common prefixes

2026-03-18 16:06:33 -07:00

cache.go

Revert "mlxrunner: add DFlash speculative decoding (#16134)"

2026-05-22 09:32:09 -07:00

client.go

Update MLX and MLX-C with threading fixes (#15845)

2026-05-03 10:03:14 -07:00

imports.go

Revert "mlxrunner: add DFlash speculative decoding (#16134)"

2026-05-22 09:32:09 -07:00

mtp.go

mlx: rework the MLX sampler (#16122)

2026-05-13 17:18:27 -07:00

pipeline.go

Revert "mlxrunner: add DFlash speculative decoding (#16134)"

2026-05-22 09:32:09 -07:00

runner.go

Revert "mlxrunner: add DFlash speculative decoding (#16134)"

2026-05-22 09:32:09 -07:00

server.go

mlx: rework the MLX sampler (#16122)

2026-05-13 17:18:27 -07:00

status_memory_test.go

mlx: avoid status timeout during inference (#16086)

2026-05-11 16:03:38 -07:00

status_memory.go

mlx: avoid status timeout during inference (#16086)

2026-05-11 16:03:38 -07:00

utf8_buffer_test.go

consolidate the tokenizer (#14327)

2026-02-19 15:55:45 -08:00

utf8_buffer.go

consolidate the tokenizer (#14327)

2026-02-19 15:55:45 -08:00