ollama

mirror of https://github.com/ollama/ollama.git synced 2026-06-02 21:34:51 -04:00

Files

Daniel Hiltgen d3da29cbfc mlx: mixed-precision quant and capability detection improvements (#15409)

Improve the MLX model creation pipeline with several model-agnostic changes:

- Rewrite supportsVision to use vision_config instead of architecture name
- Add supportsAudio for audio encoder detection
- Add alignment checking (isAligned) for quantization group sizes
- Support per-projection mixed quantization in MoE expert packing
- Record per-tensor quant metadata in safetensors blobs
- Parse per-tensor quant metadata at model load time
- Validate quantize output is non-empty before storing
- Fix pin/unpin cleanup in expert group quantization
- Promote v_proj/k_proj/down_proj to INT8 for INT4 base quant
- Add MetalIsAvailable() utility
- Skip audio encoder tensors from quantization

2026-04-13 11:43:07 -07:00
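
Two items from the commit message above, the group-size alignment check and the promotion of v_proj/k_proj/down_proj to INT8 under an INT4 base, can be sketched roughly as follows. This is an illustration only, not the code from the commit: the `isAligned` signature, the `quantFor` helper, the string labels, and the choice to leave misaligned tensors unquantized are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// isAligned reports whether a tensor's last dimension divides evenly into
// quantization groups. Hypothetical signature; the real helper may differ.
func isAligned(lastDim, groupSize int) bool {
	return lastDim > 0 && groupSize > 0 && lastDim%groupSize == 0
}

// promoted lists the projection names the commit message says are raised
// to INT8 when the base quantization is INT4.
var promoted = []string{"v_proj", "k_proj", "down_proj"}

// quantFor picks a quantization label for one tensor (hypothetical helper).
// An empty result means "leave unquantized", which is an assumption about
// how misaligned tensors are handled.
func quantFor(name, base string, lastDim, groupSize int) string {
	if !isAligned(lastDim, groupSize) {
		return ""
	}
	if base == "int4" {
		for _, p := range promoted {
			if strings.Contains(name, p) {
				return "int8"
			}
		}
	}
	return base
}

func main() {
	// Tensor names below are made up for illustration.
	names := []string{
		"layers.0.self_attn.q_proj.weight",
		"layers.0.self_attn.v_proj.weight",
		"layers.0.mlp.down_proj.weight",
	}
	for _, n := range names {
		fmt.Printf("%s -> %q\n", n, quantFor(n, "int4", 4096, 64))
	}
}
```

Matching on name substrings keeps the sketch short; a real pipeline would more likely decide per tensor from model metadata.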

base

MLX: add header vendoring and remove go build tag (#14642)

2026-03-09 17:24:45 -07:00

embedding_test.go

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

embedding.go

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

linear.go

MLX: add header vendoring and remove go build tag (#14642)