Mirror of https://github.com/ollama/ollama.git (synced 2026-06-03 22:13:30 -04:00)

* mlx: rework the MLX sampler

Replace the MLX sampler transform chain with an explicit distribution
pipeline that applies:

1. penalties
2. top-k
3. temperature/softmax
4. top-p
5. min-p
6. normalize
7. categorical

The common top_k path now keeps sparse [B,K] token ids/probabilities on
GPU instead of carrying full-vocab scores, and sampled MTP reuses those
draft/target distributions for acceptance, bonus, and residual sampling.

This change also fixes the seed parameter so that temperature sampling
and sampled MTP are reproducible.
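
The seven-stage order described above can be illustrated with a minimal, CPU-only sketch in plain Python. This is not the MLX implementation: the function name `sample_token`, its parameters and defaults, and the specific repetition-penalty rule are all assumptions made for illustration, and it handles a single sequence rather than a [B,K] batch on GPU. It does show the two ideas from the message: after top-k only K (token id, probability) pairs are carried forward, and a seeded generator makes the final categorical draw reproducible.

```python
import math
import random


def sample_token(logits, prev_tokens, *, repeat_penalty=1.1, top_k=40,
                 temperature=0.8, top_p=0.9, min_p=0.05, rng=None):
    """Hypothetical single-sequence sampler; assumes temperature > 0."""
    rng = rng or random.Random()

    # 1. penalties: an assumed repetition penalty on tokens already generated
    scores = list(logits)
    for t in set(prev_tokens):
        if scores[t] > 0:
            scores[t] /= repeat_penalty
        else:
            scores[t] *= repeat_penalty

    # 2. top-k: keep only K (token id, score) pairs, highest score first
    k = min(top_k, len(scores)) if top_k > 0 else len(scores)
    kept = sorted(enumerate(scores), key=lambda p: p[1], reverse=True)[:k]
    ids = [i for i, _ in kept]

    # 3. temperature/softmax over the sparse candidates only
    scaled = [s / temperature for _, s in kept]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 4. top-p: keep the shortest prefix whose cumulative mass reaches top_p
    cumulative, cut = 0.0, len(probs)
    for n, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            cut = n
            break
    ids, probs = ids[:cut], probs[:cut]

    # 5. min-p: drop candidates below min_p times the largest probability
    floor = min_p * probs[0]
    survivors = [(i, p) for i, p in zip(ids, probs) if p >= floor]
    ids = [i for i, _ in survivors]
    probs = [p for _, p in survivors]

    # 6. normalize the surviving probabilities so they sum to 1
    total = sum(probs)
    probs = [p / total for p in probs]

    # 7. categorical: draw one token id from the sparse distribution
    token = rng.choices(ids, weights=probs, k=1)[0]
    return token, ids, probs


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0, 3.0]
    first = sample_token(logits, [4], top_k=3, rng=random.Random(42))
    second = sample_token(logits, [4], top_k=3, rng=random.Random(42))
    print(first[0] == second[0])  # same seed, same draw
```

Returning the sparse ids and probabilities alongside the sampled token mirrors the point made in the message: a later step (such as acceptance or residual sampling in sampled MTP) can reuse the distribution instead of recomputing it from full-vocabulary scores.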