commit 8452068f43
Author: Ettore Di Giacinto <mudler@localai.io>
Date:   2026-05-01 12:03:07 +02:00

feat(importers): whisper.cpp HF repos pick a quant + nest under whisper/models (#9630)
The WhisperImporter's Import() switch ordered LooksLikeURL ahead of
the HuggingFace branch, so any https://huggingface.co/<owner>/<repo>
URI (e.g. LocalAI-io/whisper-large-v3-it-yodas-only-ggml) was routed
down the URL code path. FilenameFromUrl returned the repo slug, the
gallery entry pointed at the HTML repo page, the SHA256 was empty,
and the HF file listing was effectively dead code for HTTPS imports.
The HF branch fired only for huggingface://owner/repo and
hf://owner/repo references.
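
As a minimal sketch of the pre-fix control flow (the predicate and
helper names here are illustrative stand-ins, not the actual LocalAI
identifiers):

    package importers

    import (
        "fmt"
        "strings"
    )

    type WhisperImporter struct{}

    func (i *WhisperImporter) importFromURL(uri string) error { return nil } // stub
    func (i *WhisperImporter) importFromHF(uri string) error  { return nil } // stub

    func looksLikeURL(s string) bool {
        return strings.HasPrefix(s, "http://") || strings.HasPrefix(s, "https://")
    }

    func looksLikeHF(s string) bool {
        return strings.HasPrefix(s, "huggingface://") || strings.HasPrefix(s, "hf://")
    }

    func (i *WhisperImporter) Import(uri string) error {
        switch {
        case looksLikeURL(uri):
            // https://huggingface.co/<owner>/<repo> matched here first,
            // so FilenameFromUrl got the repo slug and the HF arm below
            // was dead code for HTTPS imports.
            return i.importFromURL(uri)
        case looksLikeHF(uri):
            // Only huggingface:// and hf:// references reached this arm.
            return i.importFromHF(uri)
        }
        return fmt.Errorf("unsupported model reference: %q", uri)
    }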

Gate the URL case on a "ggml-*.bin" basename signal, mirroring how
the llama-cpp importer gates on ".gguf", so direct file URLs still
take the URL path while HF repo URLs fall through to the HF branch.
There the file listing is actually consulted: every ggml-*.bin entry
is collected and one is picked by the new preferences.quantizations
setting (default q5_0; comma-separated values give the fallback
order).
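
A sketch of the gate and the picker under those assumptions;
isDirectGGMLFileURL and pickQuantization are hypothetical names, and
the merged code may differ in shape:

    package importers

    import (
        "path"
        "strings"
    )

    // isDirectGGMLFileURL reports whether a URL's basename names a
    // concrete whisper.cpp weight file (ggml-*.bin), mirroring the
    // ".gguf" gate in the llama-cpp importer.
    func isDirectGGMLFileURL(u string) bool {
        base := path.Base(u)
        return strings.HasPrefix(base, "ggml-") && strings.HasSuffix(base, ".bin")
    }

    // pickQuantization chooses one ggml-*.bin entry from an HF repo
    // file listing. prefs is the comma-separated
    // preferences.quantizations value (default q5_0); the first
    // preference that matches wins, and the last available ggml file
    // is the fallback, mirroring llama-cpp's pickPreferredGroup.
    func pickQuantization(files []string, prefs string) string {
        if prefs == "" {
            prefs = "q5_0"
        }
        var ggml []string
        for _, f := range files {
            if isDirectGGMLFileURL(f) {
                ggml = append(ggml, f)
            }
        }
        if len(ggml) == 0 {
            return ""
        }
        for _, q := range strings.Split(prefs, ",") {
            if q = strings.TrimSpace(q); q == "" {
                continue
            }
            for _, f := range ggml {
                if strings.Contains(path.Base(f), q) {
                    return f
                }
            }
        }
        // No preference matched: fall back to the last ggml entry.
        return ggml[len(ggml)-1]
    }

Matching by substring on the basename keeps q5_0 working for both
ggml-model-q5_0.bin and ggml-large-v3-q5_0.bin style names.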

Pin the chosen file under whisper/models/<name>/<file> so a single
repo can ship q4_0/q5_0/q8_0 side-by-side without colliding on disk,
matching the llama-cpp/models/<name>/ layout. The fallback when no
preference matches is the last available ggml file, mirroring
llama-cpp's pickPreferredGroup behaviour.
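
In sketch form (installPath is a hypothetical helper; the real code
may build the path inline):

    package importers

    import "path/filepath"

    // installPath pins the chosen quantization under
    // <base>/whisper/models/<name>/<file>, so a repo's q4_0/q5_0/q8_0
    // files can sit side-by-side without overwriting each other.
    func installPath(base, name, file string) string {
        // installPath("models", "whisper-large-v3-it", "ggml-model-q5_0.bin")
        //   => models/whisper/models/whisper-large-v3-it/ggml-model-q5_0.bin
        return filepath.Join(base, "whisper", "models", name, file)
    }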

Tests: replace the previous probe spec with positive assertions
against LocalAI-io/whisper-large-v3-it-yodas-only-ggml (default →
ggml-model-q5_0.bin, quantizations=q4_0 → ggml-model-q4_0.bin) plus
two offline specs that build a fake hfapi.ModelDetails to cover the
fallback rule and non-ggml filtering without touching the network.
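
A Ginkgo-style sketch of the two offline specs, reusing the
pickQuantization sketch above; fakeDetails and its Siblings field
only approximate hfapi.ModelDetails, whose real field names may
differ:

    package importers

    import (
        . "github.com/onsi/ginkgo/v2"
        . "github.com/onsi/gomega"
    )

    // fakeDetails stands in for hfapi.ModelDetails.
    type fakeDetails struct {
        Siblings []string
    }

    var _ = Describe("whisper quant selection (offline)", func() {
        It("falls back to the last ggml file when no preference matches", func() {
            d := fakeDetails{Siblings: []string{
                "ggml-model-q4_0.bin",
                "ggml-model-q8_0.bin",
            }}
            Expect(pickQuantization(d.Siblings, "q5_0")).
                To(Equal("ggml-model-q8_0.bin"))
        })

        It("filters out non-ggml repo files", func() {
            d := fakeDetails{Siblings: []string{
                "README.md",
                "tokenizer.json",
                "ggml-model-q5_0.bin",
            }}
            Expect(pickQuantization(d.Siblings, "q5_0")).
                To(Equal("ggml-model-q5_0.bin"))
        })
    })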


Assisted-by: Claude:claude-opus-4-7 [Bash Read Edit WebFetch]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>