ollama

mirror of https://github.com/ollama/ollama.git synced 2026-06-03 22:13:30 -04:00

Files

Patrick Devine 4860130f83 mlx: rework the MLX sampler (#16122)

Replace the MLX sampler transform chain with an explicit distribution pipeline that applies:
  1. penalties
  2. top-k
  3. temperature/softmax
  4. top-p
  5. min-p
  6. normalize
  7. categorical

The common top_k path now keeps sparse [B,K] token IDs and probabilities on the GPU instead of carrying
full-vocabulary scores. Sampled MTP reuses those draft and target distributions for acceptance, bonus, and
residual sampling.

This change also fixes the seed parameter so that temperature sampling and sampled MTP are reproducible.
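
Reproducibility here means that the same seed yields the same token sequence. How the MLX sampler threads the seed through its random draws is not shown on this page; the following sketch only illustrates the property with Go's `math/rand`, and `SampleN` is a hypothetical helper.

```go
package main

import "math/rand"

// SampleN draws n token ids from probs using a generator seeded with seed.
// probs is assumed to be non-empty and to sum to 1.
func SampleN(probs []float64, seed int64, n int) []int {
	rng := rand.New(rand.NewSource(seed)) // private generator, no global state
	out := make([]int, n)
	for i := range out {
		u, cum := rng.Float64(), 0.0
		out[i] = len(probs) - 1 // fallback if rounding leaves u >= cum
		for id, p := range probs {
			cum += p
			if u < cum {
				out[i] = id
				break
			}
		}
	}
	return out
}
```

Two calls with the same `probs`, `seed`, and `n` return identical slices, because each call builds its own generator instead of sharing global random state.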

2026-05-13 17:18:27 -07:00

logprob_test.go

mlxrunner: batch the sampler across multiple sequences

2026-04-25 09:53:53 -07:00

sample_test.go

mlx: rework the MLX sampler (#16122)

2026-05-13 17:18:27 -07:00

sample.go

mlx: rework the MLX sampler (#16122)

2026-05-13 17:18:27 -07:00