Installing large backend images (e.g. vLLM/vLLM-omni, several GiB)
through the Web UI could fail with "failed to download layer 0:
unexpected EOF" when a single connection to the registry dropped
mid-stream. The whole install then failed with no recovery. Because the
download is not resumable, retrying from the UI restarted from zero and
usually hit the same drop again, so users saw a consistent,
size-correlated failure (issue #10577).

The registry transport already retries manifest/digest fetches via
defaultRetryPredicate (GetImage/GetImageDigest), but the per-layer data
stream in DownloadOCIImageTar bypassed it entirely: each layer was
fetched with a single layer.Compressed() + xio.Copy pass, with no retry.

Extract the per-layer copy into downloadLayerToFile, which retries on the
same transient errors (unexpected EOF, EOF, EPIPE, ECONNRESET, connection
refused) with exponential backoff, truncating any partial data before
each retry. Non-retryable errors and context cancellation still fail
fast.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>