exo/nix at add-assertion

mirror/exo

mirror of https://github.com/exo-explore/exo.git synced 2026-04-17 12:30:29 -04:00

Files

rltakashige 3f0df404a5 Reduce memory consumption by adding Flash Attention to Qwen3.5 and Gemma 4, and fix RotatingKVCache prefix cache memory leak (#1886)

## Motivation

Part 1 of many memory improvements.

## Changes
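As described in the title: add Flash Attention to Qwen3.5 and Gemma 4, and fix the RotatingKVCache prefix cache memory leak.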

## Test Plan

### Manual Testing
Gemma 4 26B cache reduced from 54 GB to 10 GB per 100k tokens; Qwen3.5 35B
A3B cache reduced from 21 GB to 7 GB per 100k tokens.

2026-04-13 18:32:17 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| apple-sdk/metadata | nix: override apple-sdk to 26.2 and enable MLX_BUILD_CPU (#1443) | 2026-02-10 19:53:53 +00:00 |
| apple-sdk-overlay.nix | nix: override apple-sdk to 26.2 and enable MLX_BUILD_CPU (#1443) | 2026-02-10 19:53:53 +00:00 |
| darwin-build-fixes.patch | Fix BatchGenerator in line with upstream refactor (and prevent Qwen3.5 memory leak) (#1835) | 2026-04-07 11:50:12 +00:00 |
| metal-toolchain.nix | mlx: build with Nix (#1285) | 2026-01-29 14:07:00 +00:00 |
| mlx.nix | Reduce memory consumption by adding Flash Attention to Qwen3.5 and Gemma 4, and fix RotatingKVCache prefix cache memory leak (#1886) | 2026-04-13 18:32:17 +01:00 |