* fix(downloader): stall timeout, resume-safe cancel, and stale-partial reaping
Large model installs could hang indefinitely or never complete. This fixes
three defects in the HTTP download path, all triggered by large GGUF pulls
over a slow or flaky link:
1. No stall timeout. The shared download client sets no body deadline
(correct for streaming) but also no read-idle timeout, and the
transport's IdleConnTimeout does not cover an in-flight body read. A
silently-dropped TCP connection (no FIN/RST) blocked the body Read
forever, freezing an install at N bytes until an external reaper killed
it. Add an idle-timeout reader that closes the body after a window of
zero progress (DownloadStallTimeout, default 60s), turning an indefinite
hang into a fast, retryable error. A read that returns data resets the
clock, so a slow-but-steady transfer is unaffected.
2. Cancellation deleted the partial. On context.Canceled the code removed
the .partial file, so any frontend restart (deploy, OOM) mid-download
wiped all progress and the retry restarted from zero. On a slow link, any
file that took longer to download than the interval between restarts never
completed. Keep the
.partial on cancel so the next attempt resumes via Range.
3. Partials leaked. Cleanup only ran on the context-cancel path, never on a
stall or a SIGKILL/OOM, so abandoned .partial files accumulated and could
fill the models volume. Add CleanupStalePartialFiles and reap partials
older than 24h on startup.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(downloader): discard the .partial on a deliberate user cancel
Review follow-up. The previous commit kept the .partial on every cancellation
so restarts could resume, but that also left a dangling partial when a user
*intentionally* cancelled an install: the file lingered until the 24h reaper
removed it.
Distinguish the two: cancel the gallery operation's context with a cause
(downloader.ErrUserCancelled) so the download layer can tell a deliberate
abort (discard the partial) from an incidental one such as a shutdown/restart
(keep it for resume). Detect cancellation via the context rather than the
returned error, because an HTTP request cancelled with a cause surfaces the
cause error, not context.Canceled.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
* fix(downloader): resolve gosec G122 in CleanupStalePartialFiles
CI's code-scanning (gosec) flagged G122 (symlink TOCTOU) for the os.Remove
call inside the filepath.WalkDir callback. Collect the stale paths during the
walk and delete them afterwards instead of mutating the tree from inside the
callback. Behavior is unchanged; the existing specs still pass.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>