rltakashige 3f0df404a5 Reduce memory consumption by adding Flash Attention to Qwen3.5 and Gemma 4, and fix RotatingKVCache prefix cache memory leak (#1886)
## Motivation

Part 1 of many memory improvements.

## Changes
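- Add Flash Attention to Qwen3.5 and Gemma 4 to reduce memory consumption.
- Fix the RotatingKVCache prefix cache memory leak.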

## Test Plan

### Manual Testing
- Gemma 4 26B: cache reduced from 54 GB to 10 GB per 100k tokens.
- Qwen3.5 35B A3B: cache reduced from 21 GB to 7 GB per 100k tokens.
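
To put the figures above in relative terms, here is a minimal sketch that computes the fraction saved and the shrink factor for each model. The `measurements` dict and the `reduction` helper are illustrative names, not part of this change; the only inputs are the numbers reported above.

```python
# Figures from the manual test, in GB per 100k tokens: (before, after).
measurements = {
    "Gemma 4 26B": (54, 10),
    "Qwen3.5 35B A3B": (21, 7),
}


def reduction(before_gb, after_gb):
    """Return (fraction saved, shrink factor) for a before/after pair."""
    return 1 - after_gb / before_gb, before_gb / after_gb


for model, (before, after) in measurements.items():
    saved, factor = reduction(before, after)
    print(f"{model}: {saved:.0%} smaller ({factor:.1f}x)")
```

By this arithmetic, the Gemma 4 26B cache is roughly 81% smaller (5.4x) and the Qwen3.5 35B A3B cache roughly 67% smaller (3.0x).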
2026-04-13 18:32:17 +01:00