Commit Graph

3 Commits

Author SHA1 Message Date
Ettore Di Giacinto
cd6079b2f3 feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)
spiritbuun/buun-llama-cpp is a fork of TheTom/llama-cpp-turboquant that adds
two independent features on top: DFlash block-diffusion speculative decoding
(via a dedicated DFlashDraftModel GGUF arch) and two extra TCQ KV-cache
variants (turbo2_tcq, turbo3_tcq) on top of TurboQuant's turbo2/turbo3/turbo4.

Follows the turboquant thin-wrapper pattern — reuses backend/cpp/llama-cpp
grpc-server sources verbatim, patches only the build copy to extend the KV
allow-list and wire up buun-exclusive tree_budget / draft_topk options.
DraftModel is already wired end-to-end (proto field 39 → params.speculative),
so DFlash activation only needs the existing options passthrough
(spec_type:dflash) plus the drafter path in draft_model.

CacheTypeOptions now surfaces the five turbo* values so the React UI dropdown
shows them — benefits turboquant too (previously users had to type them in
YAML manually).

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-24 12:52:53 +00:00
Richard Palethorpe
9ac1bdc587 feat(ui): Interactive model config editor with autocomplete (#9149)
* feat(ui): Add dynamic model editor with autocomplete

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(docs): Add link to longformat installation video

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-07 14:42:23 +02:00
Richard Palethorpe
557d0f0f04 feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-04 15:14:35 +02:00