LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-04-30 12:08:13 -04:00

Files

Ettore Di Giacinto c1f923b2bc fix(importer): emit all shards for multi-part GGUF models (#9513)

The llama-cpp HuggingFace importer iterated files one at a time and
kept overwriting `lastGGUFFile`, so sharded repos such as
`unsloth/Kimi-K2.6-GGUF` (14 `Q8_K_XL` parts) produced a gallery entry
pointing only at the final shard — useless to llama.cpp's split loader,
which needs shard 1 to discover the set.

Group shards up front via new helpers in `pkg/huggingface-api`
(`SplitShardSuffix`, `ShardGroup`, `GroupShards`). The llama-cpp
importer now picks a group (preferred quant, then last-group fallback)
and emits every shard, with `Model:` pointing at shard 1.
`FindPreferredModelFile` returns shard 1 of the first matching group so
the gallery agent's preview stays coherent for sharded repos.
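
The grouping and selection steps described above might look roughly like the
following sketch. It is illustrative only: the lower-case names
(`splitShardSuffix`, `shardGroup`, `groupShards`, `pickGroup`) are
hypothetical stand-ins, not the actual signatures in `pkg/huggingface-api`.
It assumes shards follow the llama.cpp split naming
`<base>-NNNNN-of-NNNNN.gguf` and that a preferred quant is matched by a
case-insensitive substring of the base name.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// shardSuffix matches the "-00001-of-00014.gguf" tail of a split GGUF part.
var shardSuffix = regexp.MustCompile(`^(.*)-(\d{5})-of-(\d{5})\.gguf$`)

// shardGroup is a hypothetical stand-in for the commit's ShardGroup type.
type shardGroup struct {
	Base  string   // name before the shard suffix, or the full name if unsharded
	Total int      // shard count declared in the file names (1 if unsharded)
	Files []string // file names ordered by shard index
}

// splitShardSuffix returns the base name, shard index and shard total.
// ok is false when the name carries no shard suffix.
func splitShardSuffix(name string) (base string, index, total int, ok bool) {
	m := shardSuffix.FindStringSubmatch(name)
	if m == nil {
		return name, 0, 0, false
	}
	index, _ = strconv.Atoi(m[2]) // the regexp guarantees five digits
	total, _ = strconv.Atoi(m[3])
	return m[1], index, total, true
}

// groupShards buckets file names by base name and shard total, keeping
// first-seen group order. Unsharded files become single-file groups.
func groupShards(names []string) []shardGroup {
	var groups []shardGroup
	pos := map[string]int{} // group key -> position in groups
	for _, name := range names {
		base, _, total, ok := splitShardSuffix(name)
		if !ok {
			groups = append(groups, shardGroup{Base: name, Total: 1, Files: []string{name}})
			continue
		}
		key := base + "/" + strconv.Itoa(total)
		i, seen := pos[key]
		if !seen {
			i = len(groups)
			pos[key] = i
			groups = append(groups, shardGroup{Base: base, Total: total})
		}
		groups[i].Files = append(groups[i].Files, name)
	}
	for i := range groups {
		sort.Strings(groups[i].Files) // zero-padded indices sort numerically
	}
	return groups
}

// pickGroup returns the first group matching a preferred quant tag
// (assumed rule: substring match), else the last group.
func pickGroup(groups []shardGroup, preferred []string) (shardGroup, bool) {
	if len(groups) == 0 {
		return shardGroup{}, false
	}
	for _, q := range preferred {
		for _, g := range groups {
			if strings.Contains(strings.ToLower(g.Base), strings.ToLower(q)) {
				return g, true
			}
		}
	}
	return groups[len(groups)-1], true
}

func main() {
	groups := groupShards([]string{
		"model-Q8_K_XL-00002-of-00003.gguf",
		"model-Q4_K_M.gguf",
		"model-Q8_K_XL-00003-of-00003.gguf",
		"model-Q8_K_XL-00001-of-00003.gguf",
	})
	if g, ok := pickGroup(groups, []string{"q8_k_xl"}); ok {
		fmt.Println("model:", g.Files[0]) // shard 1 is the entry point
		fmt.Println("files:", g.Files)    // every shard is emitted
	}
}
```

Grouping before selecting means the chosen entry always carries the complete
shard set, and taking `Files[0]` as the model path gives the split loader the
first shard it needs to discover the rest.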

Adds unit coverage for the HuggingFace branch of the importer (which
had none), plus shard-detection tests in the hfapi package.

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash]

2026-04-23 15:00:02 +02:00

client_test.go

fix(importer): emit all shards for multi-part GGUF models (#9513)

2026-04-23 15:00:02 +02:00

client.go

fix(importer): emit all shards for multi-part GGUF models (#9513)

2026-04-23 15:00:02 +02:00

hfapi_suite_test.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00