LocalAI/pkg/downloader/partial.go
LocalAI [bot] 2e734bf560 fix(downloader): stall timeout, resume-safe cancel, and stale-partial reaping (#10406)
* fix(downloader): stall timeout, resume-safe cancel, and stale-partial reaping

Large model installs would hang forever or never finish. Three defects in
the HTTP download path, all hit by big GGUF pulls over a slow or flaky link:

1. No stall timeout. The shared download client sets no body deadline
   (correct for streaming) but also no read-idle timeout, and the
   transport's IdleConnTimeout does not cover an in-flight body read. A
   silently-dropped TCP connection (no FIN/RST) blocked the body Read
   forever, freezing an install at N bytes until an external reaper killed
   it. Add an idle-timeout reader that closes the body after a window of
   zero progress (DownloadStallTimeout, default 60s), turning an indefinite
   hang into a fast, retryable error. A read that returns data resets the
   clock, so a slow-but-steady transfer is unaffected.

2. Cancellation deleted the partial. On context.Canceled the code removed
   the .partial file, so any frontend restart (deploy, OOM) mid-download
   wiped all progress and the retry restarted from zero. At slow egress,
   files larger than the restart interval never completed. Keep the
   .partial on cancel so the next attempt resumes via Range.

3. Partials leaked. Cleanup only ran on the context-cancel path, never on a
   stall or a SIGKILL/OOM, so abandoned .partial files accumulated and could
   fill the models volume. Add CleanupStalePartialFiles and reap partials
   older than 24h on startup.
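
The idle-timeout reader from item 1 can be sketched roughly as below. This is
an illustration under stated assumptions, not the code in the commit: the names
idleTimeoutReader, newIdleTimeoutReader, errStalled and demoStall are
hypothetical, and an io.Pipe whose writer goes silent stands in for a dropped
TCP connection. The idea is that a timer closes the body after a window with no
data, and every Read that returns data resets that timer.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync/atomic"
	"time"
)

// errStalled is a hypothetical sentinel; the real downloader may report
// stalls differently.
var errStalled = errors.New("download stalled: no data within idle timeout")

// idleTimeoutReader wraps a body and closes it when no Read has returned data
// for timeout. Closing the body is what unblocks a Read stuck on a dead peer.
type idleTimeoutReader struct {
	body    io.ReadCloser
	timeout time.Duration
	timer   *time.Timer
	stalled atomic.Bool
}

func newIdleTimeoutReader(body io.ReadCloser, timeout time.Duration) *idleTimeoutReader {
	r := &idleTimeoutReader{body: body, timeout: timeout}
	r.timer = time.AfterFunc(timeout, func() {
		r.stalled.Store(true)
		_ = body.Close()
	})
	return r
}

func (r *idleTimeoutReader) Read(p []byte) (int, error) {
	n, err := r.body.Read(p)
	if n > 0 {
		r.timer.Reset(r.timeout) // progress was made: restart the idle clock
	}
	// Report the stall instead of the low-level "closed" error it produced.
	if err != nil && !errors.Is(err, io.EOF) && r.stalled.Load() {
		return n, errStalled
	}
	return n, err
}

func (r *idleTimeoutReader) Close() error {
	r.timer.Stop()
	return r.body.Close()
}

// demoStall simulates a peer that sends three bytes and then goes silent
// without closing the connection.
func demoStall() (int64, error) {
	pr, pw := io.Pipe()
	go func() { _, _ = pw.Write([]byte("abc")) }() // never writes again, never closes
	r := newIdleTimeoutReader(pr, 50*time.Millisecond)
	defer r.Close()
	return io.Copy(io.Discard, r)
}

func main() {
	n, err := demoStall()
	fmt.Printf("read %d bytes, err: %v\n", n, err)
}
```

Because the clock restarts only when a Read returns data, a slow transfer that
keeps delivering bytes is never interrupted, while a silent connection fails
after one timeout window and can be retried.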

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(downloader): discard the .partial on a deliberate user cancel

Review follow-up. The previous commit kept the .partial on every cancellation
so restarts could resume, but that also left a dangling partial when a user
*intentionally* cancelled an install; the file then lingered until the 24h
reaper ran.

Distinguish the two: cancel the gallery operation's context with a cause
(downloader.ErrUserCancelled) so the download layer can tell a deliberate
abort (discard the partial) from an incidental one such as a shutdown/restart
(keep it for resume). Detect cancellation via the context rather than the
returned error, because an HTTP request cancelled with a cause surfaces the
cause error, not context.Canceled.
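
A minimal sketch of that distinction, using the standard library's
context.WithCancelCause and context.Cause. The names errUserCancelled (a
stand-in for downloader.ErrUserCancelled), keepPartial and demoCancelCauses are
hypothetical and only illustrate the decision; the real download layer may
structure it differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errUserCancelled stands in for downloader.ErrUserCancelled.
var errUserCancelled = errors.New("cancelled by user")

// keepPartial reports whether the .partial file should survive a failed
// download. It inspects the context, not the error the HTTP call returned.
func keepPartial(ctx context.Context) bool {
	if ctx.Err() == nil {
		return true // not cancelled at all (e.g. a stall): keep for resume
	}
	// A deliberate abort carries the user-cancel cause; anything else
	// (shutdown, restart) is incidental and should stay resumable.
	return !errors.Is(context.Cause(ctx), errUserCancelled)
}

// demoCancelCauses cancels one context with the user cause and one without.
func demoCancelCauses() (userKeeps, shutdownKeeps bool) {
	userCtx, cancelUser := context.WithCancelCause(context.Background())
	cancelUser(errUserCancelled)

	shutdownCtx, cancelShutdown := context.WithCancel(context.Background())
	cancelShutdown()

	return keepPartial(userCtx), keepPartial(shutdownCtx)
}

func main() {
	user, shutdown := demoCancelCauses()
	fmt.Println("keep partial after user cancel:", user)
	fmt.Println("keep partial after shutdown:", shutdown)
}
```

For a context cancelled without an explicit cause, context.Cause returns
context.Canceled, so the plain shutdown path falls through to "keep".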

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(downloader): resolve gosec G122 in CleanupStalePartialFiles

CI's code-scanning (gosec) flagged G122 (symlink TOCTOU) for the os.Remove
call inside the filepath.WalkDir callback. Collect the stale paths during the
walk and delete them afterwards instead of mutating the tree from inside the
callback. Behavior is unchanged; the existing specs still pass.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-19 21:35:21 +02:00

package downloader

import (
	"io/fs"
	"os"
	"path/filepath"
	"strings"
	"time"

	"github.com/mudler/xlog"
)

// PartialFileSuffix marks an in-progress download. The success path renames the
// partial to its final name, so any leftover with this suffix is an unfinished
// transfer.
const PartialFileSuffix = ".partial"

// CleanupStalePartialFiles removes *.partial files under root whose last
// modification is older than olderThan, returning the number removed. These are
// abandoned downloads left by a process killed mid-transfer (OOM, restart) or
// by a stall whose cleanup never ran; without reaping they accumulate and can
// fill the models volume. A still-in-progress download touches its .partial on
// every write, so a generous olderThan never trims an active transfer.
//
// A missing root is not an error (nothing to clean). Unreadable entries are
// skipped so one bad file does not abort the whole sweep.
func CleanupStalePartialFiles(root string, olderThan time.Duration) (int, error) {
	if _, err := os.Stat(root); err != nil {
		if os.IsNotExist(err) {
			return 0, nil
		}
		return 0, err
	}

	cutoff := time.Now().Add(-olderThan)

	// Collect candidates during the walk and delete them afterwards rather than
	// mutating the tree from inside the WalkDir callback (avoids the symlink
	// TOCTOU class flagged by gosec G122, and never removes an entry mid-walk).
	var stale []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return nil // skip unreadable subtree, keep going
		}
		if d.IsDir() || !strings.HasSuffix(d.Name(), PartialFileSuffix) {
			return nil
		}
		info, err := d.Info()
		if err != nil || info.ModTime().After(cutoff) {
			return nil
		}
		stale = append(stale, path)
		return nil
	})
	if err != nil {
		return 0, err
	}

	removed := 0
	for _, path := range stale {
		if err := os.Remove(path); err != nil {
			xlog.Warn("failed to remove stale partial download", "file", path, "error", err)
			continue
		}
		removed++
		xlog.Info("removed stale partial download", "file", path)
	}
	return removed, nil
}
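
The reaping rule above (only *.partial files, only when older than the cutoff)
can be exercised with a small harness like the one below. It is a sketch under
assumptions, not one of the repository's specs: reaper, flatReaper and exercise
are hypothetical names, and flatReaper is a simplified single-directory stand-in
so the file compiles on its own. Inside the repository,
downloader.CleanupStalePartialFiles has the same signature and could be passed
to exercise instead.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// reaper matches the signature of CleanupStalePartialFiles.
type reaper func(root string, olderThan time.Duration) (int, error)

// flatReaper is a simplified stand-in: one directory, no recursion, no logging.
func flatReaper(root string, olderThan time.Duration) (int, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return 0, err
	}
	cutoff := time.Now().Add(-olderThan)
	removed := 0
	for _, e := range entries {
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".partial") {
			continue
		}
		info, err := e.Info()
		if err != nil || info.ModTime().After(cutoff) {
			continue // unreadable or still fresh: leave it alone
		}
		if os.Remove(filepath.Join(root, e.Name())) == nil {
			removed++
		}
	}
	return removed, nil
}

// exercise creates an aged partial, a fresh partial and an aged finished file,
// runs reap with a 24h threshold, and returns the removed count together with
// the names that survived (sorted, as os.ReadDir returns them).
func exercise(reap reaper) (int, []string, error) {
	dir, err := os.MkdirTemp("", "partials")
	if err != nil {
		return 0, nil, err
	}
	defer os.RemoveAll(dir)

	aged := time.Now().Add(-48 * time.Hour)
	files := map[string]bool{ // name -> backdate the mtime?
		"old.gguf.partial":   true,
		"fresh.gguf.partial": false,
		"done.gguf":          true,
	}
	for name, backdate := range files {
		p := filepath.Join(dir, name)
		if err := os.WriteFile(p, []byte("x"), 0o600); err != nil {
			return 0, nil, err
		}
		if backdate {
			if err := os.Chtimes(p, aged, aged); err != nil {
				return 0, nil, err
			}
		}
	}

	removed, err := reap(dir, 24*time.Hour)
	if err != nil {
		return 0, nil, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, nil, err
	}
	var left []string
	for _, e := range entries {
		left = append(left, e.Name())
	}
	return removed, left, nil
}

func main() {
	removed, left, err := exercise(flatReaper)
	fmt.Println(removed, left, err)
}
```

Backdating the modification time with os.Chtimes avoids sleeping in the check,
and the aged done.gguf confirms that age alone never removes a finished file.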