LocalAI [bot] 923c47020d fix(launcher): robust binary download/upgrade (resume, rate-limit, UX) (#10575)
* fix(launcher): resume flaky downloads, drop redundant percent, fit dialogs

The binary upgrade/download flow had three rough edges:

- The status label printed "Downloading... N%" right next to a progress
  bar already showing the percent. Replace it with a human-readable byte
  readout ("Downloading... 12.3 MB / 45.6 MB").
- A failed download (GitHub releases are flaky) had no recourse and always
  restarted from byte 0. Stream to "<dest>.part" and resume via a
  "Range: bytes=N-" request (handling 206/200/416), renaming to the final
  path only after checksum verification; on checksum failure the file is
  discarded so the next attempt starts clean. Add a Retry button that
  appears on failure and resumes from the partial file.
- Progress/install dialogs were hardcoded to oversized dimensions, leaving
  a blank gap below "View Release Notes". Size each window to its content
  with a sane minimum width.

Also unify the three near-identical download-progress popups into one
Launcher.showDownloadProgressWindow helper (and delete a dead unused copy
in ui.go) so the behaviour stays consistent across every entry point.

The progress callback now reports (downloaded, total) byte counts instead
of a single fraction. Resume/retry behaviour is covered by httptest-backed
unit tests in release_manager_test.go.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(launcher): resolve latest version via redirect to dodge GitHub API 403

On a fresh Linux start with no LocalAI installed, the download failed with
"failed to fetch latest release: status 403". The cause is the unauthenticated
api.github.com rate limit (60 requests/hour, per IP): on shared/NAT/CGNAT/cloud
addresses it is exhausted almost immediately and every request 403s.

Resolve the latest version by following the github.com "releases/latest"
redirect instead, reading the tag from the final ".../releases/tag/<tag>" URL.
That endpoint is not subject to the API rate limit. Only the version is ever
consumed by callers, so the tag is sufficient. The JSON API is kept as a
fallback, now honoring GITHUB_TOKEN and reporting rate-limit 403/429 clearly
instead of an opaque status code.

Covered by an httptest-backed unit test that asserts the redirect path is used.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 12:57:32 +02:00
2026-04-08 19:23:16 +02:00
2025-02-15 18:17:15 +01:00
2023-05-04 15:01:29 +02:00




LocalAI stars LocalAI License

Follow LocalAI_API Join LocalAI Discord Community

mudler%2FLocalAI | Trendshift

Deutsch | Español | français | 日本語 | 한국어 | Português | Русский | 中文

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

A small core, not a bundle. Each backend wraps a best-in-class engine (llama.cpp, vLLM, whisper.cpp, stable-diffusion, MLX...) in its own image, pulled only when a model needs it. You install nothing you don't use.

  • Composable by design: backends are separate and pulled on demand, so you install only what your model needs
  • Open and extensible: load any model, or build your own backend in any language against an open interface
  • Drop-in API compatibility: OpenAI, Anthropic, and ElevenLabs APIs across every backend
  • Any model, any modality: LLMs, vision, voice, image, and video behind one API
  • Any hardware: NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
  • Multi-user ready: API key auth, user quotas, role-based access
  • Built-in AI agents: autonomous agents with tool use, RAG, MCP, and skills
  • Privacy-first: your data never leaves your infrastructure

A small LocalAI core with backends (llama.cpp, vLLM, MLX, whisper.cpp, stable-diffusion, kokoro, parakeet.cpp...) plugged in as separate on-demand images

Created by Ettore Di Giacinto and maintained by the LocalAI team.

📖 Documentation | 💬 Discord | 💻 Quickstart | 🖼️ Models | FAQ

Guided tour

https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18

Click to see more!

User and auth

https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c

Agents

https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a

Usage metrics per user

https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f

Fine-tuning and Quantization

https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee

WebRTC

https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b

Quickstart

macOS

Download LocalAI for macOS

Note: The DMG is not signed by Apple. After installing, run: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.

Containers (Docker, podman, ...)

Already ran LocalAI before? Use docker start -i local-ai to restart an existing container.

CPU only:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU:

# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

Loading models

# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

To test a running LocalAI server from the terminal, open an interactive chat session from another shell. Inside the prompt, /models lists installed models and /model <name> switches between them.

# Terminal 1
local-ai run llama-3.2-1b-instruct:q4_k_m

# Terminal 2
local-ai chat --model llama-3.2-1b-instruct:q4_k_m

Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.

For more details, see the Getting Started guide.

Latest News

For older news and full release notes, see GitHub Releases and the News page.

Features

Supported Backends & Acceleration

LocalAI supports 60+ backends including llama.cpp, vLLM, SGLang, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for NVIDIA (CUDA 12/13), AMD (ROCm), Intel (oneAPI/SYCL), Apple Silicon (Metal), Vulkan, and NVIDIA Jetson (L4T). All backends can be installed on-the-fly from the Backend Gallery.

See the full Backend & Model Compatibility Table and GPU Acceleration guide.

Backends built by us

Most backends wrap a best-in-class upstream engine. A handful of them are native C/C++/GGML engines (no Python at inference) developed and maintained by the LocalAI project itself:

Backend What it does
parakeet.cpp C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription
ced.cpp C++/GGML port of the CED audio-tagging models: sound-event classification (527-class AudioSet) over REST and the realtime API for live recognition
voxtral.c Voxtral Realtime 4B speech-to-text in pure C
vibevoice.cpp Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization
rf-detr.cpp Native RF-DETR object detection and instance segmentation
locate-anything.cpp Open-vocabulary object detection and visual grounding (LocateAnything-3B)
depth-anything.cpp Depth Anything 3 monocular metric depth + camera pose estimation
privacy-filter.cpp Standalone GGML PII/NER token-classification engine powering LocalAI's PII redaction tier
LocalVQE Joint acoustic echo cancellation, noise suppression, and dereverberation
local-store Local-first vector database for embeddings (shipped in-tree)

We also maintain apex-quant, a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.

Resources

Team

LocalAI is maintained by a small team of humans, together with the wider community of contributors.

A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in Discord — LocalAI is a community-driven project and wouldn't exist without you. See the full contributors list.

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project covering CI expenses, and our Sponsor list:

Past sponsors


Individual sponsors

A special thanks to individual sponsors, a full list is on GitHub and buymeacoffee. Special shout out to drikster80 for being generous. Thank you everyone!

Star history

LocalAI Star history Chart

License

LocalAI is a community-driven project created by Ettore Di Giacinto and maintained by the LocalAI team.

MIT - Author Ettore Di Giacinto mudler@localai.io

Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

Contributors

This is a community project, a special thanks to our contributors!

Description
No description provided
Readme MIT 137 MiB
Languages
Go 69.2%
JavaScript 11.9%
Python 5.8%
HTML 4.7%
C++ 3%
Other 5.4%