LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-31 10:28:43 -04:00

Files

Adira 2c804bef5a fix(config): skip vocab arrays and mmap GGUF headers to speed up startup (#10213 )

When the models directory holds many GGUF files, startup parsed every
model's full GGUF — including the tokenizer vocab arrays
(tokenizer.ggml.tokens/scores/merges, often >100k entries) — once per
model while guessing defaults. On slow storage (e.g. a models directory
on a Docker volume) those hundreds of thousands of tiny reads dominate
boot time before the HTTP server comes up.

The default-guessing path and the VRAM metadata reader only consume
scalar metadata and array lengths, never the array contents. Parse with
SkipLargeMetadata (seek past large arrays) and UseMMap (fault in a few
header pages instead of issuing per-element read() syscalls). For a
256k-token vocab this cuts the parse from ~524k read() syscalls to 8.
The mapping is released when ParseGGUFFile returns.

Fixes #9790

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Adira Denis Muhando <dennisadira@gmail.com>

2026-06-07 23:33:52 +02:00

cache.go

feat(gallery): Speed up load times and clean gallery entries (#9211 )

2026-05-06 14:51:38 +02:00

estimate_test.go

feat(gallery): Speed up load times and clean gallery entries (#9211 )

2026-05-06 14:51:38 +02:00

estimate.go

feat(gallery): Speed up load times and clean gallery entries (#9211 )

2026-05-06 14:51:38 +02:00

gguf_reader.go

fix(config): skip vocab arrays and mmap GGUF headers to speed up startup (#10213 )