mirror of
https://github.com/exo-explore/exo.git
synced 2026-04-17 12:30:29 -04:00
## Motivation

Part 1 of many memory improvements.

## Changes

As written in the title.

## Test Plan

### Manual Testing

- Gemma 4 26B: cache reduced from 54 GB to 10 GB per 100k tokens.
- Qwen3.5 35B A3B: cache reduced from 21 GB to 7 GB per 100k tokens.
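
To put the manual-testing figures in perspective, here is a small sketch that derives the reduction factor, the percentage saved, and the approximate per-token cache cost from the numbers reported above. The `summarize` helper and the `MEASUREMENTS` table are illustrative names, not part of exo, and the per-token figure assumes decimal units (1 GB = 10^6 KB), since the description does not say whether GB or GiB was measured.

```python
# Cache sizes in GB per 100k tokens, copied from the manual testing notes.
MEASUREMENTS = {
    "Gemma 4 26B": (54.0, 10.0),
    "Qwen3.5 35B A3B": (21.0, 7.0),
}
TOKENS = 100_000


def summarize(before_gb, after_gb, tokens=TOKENS):
    """Return (reduction factor, percent saved, KB per token after the change)."""
    factor = before_gb / after_gb
    percent_saved = 100.0 * (1.0 - after_gb / before_gb)
    # Assumption: decimal units, 1 GB = 1e6 KB.
    kb_per_token_after = after_gb * 1e6 / tokens
    return factor, percent_saved, kb_per_token_after


for model, (before, after) in MEASUREMENTS.items():
    factor, saved, kb = summarize(before, after)
    print(f"{model}: {factor:.1f}x smaller, {saved:.0f}% saved, ~{kb:.0f} KB/token")
```

Under those assumptions the Gemma cache shrinks by roughly 5.4x (about 81% saved) and the Qwen cache by 3x (about 67% saved).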