mirror of
https://github.com/ollama/ollama.git
synced 2026-02-19 15:57:07 -05:00
When numPredict is set, the user receives one fewer token than the requested limit. In addition, the stats incorrectly report the number of tokens returned as equal to the limit. When numPredict is not set, the number of tokens is reported correctly. This occurs because numPredict is checked when setting up the next batch, but hitting the limit terminates the current batch as well. Instead, it is better to check the limit as tokens are actually predicted.