Mirror of https://github.com/ollama/ollama.git (synced 2025-12-23 15:48:33 -05:00)
On the llama engine, when we compute the memory layout, we reserve a buffer to leave some headroom for incorrect estimates. This buffer is subtracted from the GPU's free memory, and on GPUs with limited memory the subtraction may underflow.

Fixes #13494