Patrick Devine 4860130f83 mlx: rework the MLX sampler (#16122)

Replace the MLX sampler transform chain with an explicit distribution pipeline that applies:
  1. penalties
  2. top-k
  3. temperature/softmax
  4. top-p
  5. min-p
  6. normalize
  7. categorical
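
The seven stages above can be sketched in plain Python. This is only an illustration of the ordering, not the MLX code from this change: the function name `sample_token`, its parameters and defaults, and the particular repetition-penalty rule are assumptions, and it works on a single list of logits rather than batched GPU arrays. It also assumes `temperature > 0`.

```python
import math
import random

def sample_token(logits, history, rng, repeat_penalty=1.1, top_k=40,
                 temperature=0.8, top_p=0.9, min_p=0.05):
    """Sample one token id from a list of logits (illustrative only)."""
    # 1. penalties: an assumed repetition penalty on previously seen tokens
    scores = list(logits)
    for t in set(history):
        if scores[t] > 0:
            scores[t] /= repeat_penalty
        else:
            scores[t] *= repeat_penalty

    # 2. top-k: keep the k highest-scoring (id, score) pairs, best first
    k = min(top_k, len(scores)) if top_k > 0 else len(scores)
    kept = sorted(enumerate(scores), key=lambda p: p[1], reverse=True)[:k]

    # 3. temperature/softmax over the survivors only
    peak = kept[0][1]
    weights = [math.exp((s - peak) / temperature) for _, s in kept]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 4. top-p: keep the shortest prefix whose cumulative mass reaches top_p
    cum, cut = 0.0, len(probs)
    for i, p in enumerate(probs):
        cum += p
        if cum >= top_p:
            cut = i + 1
            break
    kept, probs = kept[:cut], probs[:cut]

    # 5. min-p: drop tokens below min_p times the largest probability
    floor = min_p * probs[0]
    pairs = [(tid, p) for (tid, _), p in zip(kept, probs) if p >= floor]

    # 6. normalize the remaining mass so it sums to 1
    total = sum(p for _, p in pairs)
    ids = [tid for tid, _ in pairs]
    probs = [p / total for _, p in pairs]

    # 7. categorical: draw one id from the final distribution
    return rng.choices(ids, weights=probs, k=1)[0]
```

Because top-k runs before softmax, every later stage in this sketch touches at most k entries rather than the full vocabulary.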

The common top_k path now keeps sparse [B,K] token ids/probabilities on GPU instead of carrying full-vocab
scores, and sampled MTP reuses those draft/target distributions for acceptance, bonus, and residual sampling.
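
For illustration, here is how acceptance, residual, and bonus sampling could look over sparse distributions. The sketch assumes the commonly used speculative-sampling rule (accept a drafted token with probability min(1, p/q), otherwise resample from the renormalized positive part of p - q); the commit message does not spell out its exact rule. The names `accept_or_resample` and `bonus_token` are hypothetical, and sparse `{token_id: probability}` dicts stand in for the [B,K] GPU arrays.

```python
import random

def accept_or_resample(draft_token, draft_probs, target_probs, rng):
    """Return (accepted, token) for one drafted position.

    draft_probs and target_probs are sparse {token_id: probability} dicts;
    ids missing from a dict are treated as probability 0.
    """
    q = draft_probs.get(draft_token, 0.0)
    p = target_probs.get(draft_token, 0.0)
    # Acceptance: keep the drafted token with probability min(1, p / q).
    if q > 0.0 and rng.random() < min(1.0, p / q):
        return True, draft_token
    # Residual: resample from max(0, p - q); choices() renormalizes weights.
    residual = {t: pt - draft_probs.get(t, 0.0) for t, pt in target_probs.items()}
    residual = {t: r for t, r in residual.items() if r > 0.0}
    if not residual:
        residual = target_probs  # assumed fallback when no positive mass remains
    ids = list(residual)
    return False, rng.choices(ids, weights=[residual[t] for t in ids], k=1)[0]

def bonus_token(target_probs, rng):
    """Bonus: when every draft is accepted, draw one more token from the target."""
    ids = list(target_probs)
    return rng.choices(ids, weights=[target_probs[t] for t in ids], k=1)[0]
```

Both functions only read the probabilities that the sampler already produced, which is the point of reusing the draft and target distributions rather than recomputing them from full-vocab scores.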

This change also fixes the seed parameter so that temperature sampling and sampled MTP are reproducible.
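
A minimal sketch of what seed reproducibility means here, using Python's `random.Random` as a stand-in for the MLX random key (the helper `draw_tokens` is hypothetical): when every categorical draw comes from a generator built from the seed, two runs with the same seed yield the same token sequence.

```python
import random

def draw_tokens(probs, seed, n):
    """Draw n token ids from a fixed distribution using a seeded generator."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    ids = list(range(len(probs)))
    return [rng.choices(ids, weights=probs, k=1)[0] for _ in range(n)]

probs = [0.1, 0.6, 0.3]
first = draw_tokens(probs, seed=42, n=16)
second = draw_tokens(probs, seed=42, n=16)
print(first == second)  # same seed, same sequence
```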
2026-05-13 17:18:27 -07:00