LocalAI

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-12 02:38:19 -04:00

Commit Graph

Author

SHA1

Message

Date

Ettore Di Giacinto

d6bf3a4969

fix(buun-llama-cpp): drop logit_bias_eog arg from params_from_json_cmpl

Previous substitution kept the call as 5 args, but buun predates the
upstream refactor that also *added* the logit_bias_eog parameter to
params_from_json_cmpl — buun's signature is still the 4-arg form
  (const llama_vocab*, const common_params&, int, const json&)
and it still derives logit_bias_eog internally from the common_params.

Replace the substitution with a line-delete. Guard matches both the
original call (ctx_server.get_meta().logit_bias_eog) and the previously
substituted form (params_base.sampling.logit_bias_eog) so the script
stays safe across re-runs and whatever state the tree was left in.
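
The re-run-safe line-delete described above can be sketched roughly as follows. This is an illustration only, not the actual build script: the function name `drop_arg_lines`, the sample source text, and the assumption that the argument sits on a line of its own are all hypothetical.

```python
# Both spellings the guard has to recognise: the original upstream call
# and the form left behind by the earlier substitution.
FORMS = (
    "ctx_server.get_meta().logit_bias_eog",
    "params_base.sampling.logit_bias_eog",
)


def drop_arg_lines(text: str) -> str:
    """Delete every line that carries the extra argument in either form.

    Assumes (hypothetically) that the argument occupies its own line, so
    removing the line leaves a well-formed 4-arg call.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not any(form in line for form in FORMS)
    ]
    return "".join(kept)


# Illustrative call sites in the two states the tree might be in.
original = (
    "params_from_json_cmpl(\n"
    "    vocab, params_base, n_ctx,\n"
    "    ctx_server.get_meta().logit_bias_eog,\n"
    "    data);\n"
)
substituted = original.replace(FORMS[0], FORMS[1])

from_original = drop_arg_lines(original)
from_substituted = drop_arg_lines(substituted)
rerun = drop_arg_lines(from_original)
```

Both starting states converge on the same text, and a second run changes nothing, which is the property the guard is meant to provide.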

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-04-24 12:52:53 +00:00

Ettore Di Giacinto

b27d38a53d

fix(buun-llama-cpp): backport logit_bias_eog field to grpc-server copy

LocalAI's shared grpc-server.cpp reaches
ctx_server.get_meta().logit_bias_eog twice (the twin params_from_json_cmpl
callsites). That accessor was added to server_context_meta upstream after
buun's 2026-04-05 fork-point, so compiling against buun errors with
  'struct server_context_meta' has no member named 'logit_bias_eog'.

Rewrite the call sites — only in the buun grpc-server.cpp copy — to source
the vector from params_base.sampling.logit_bias_eog instead. That vector is
the underlying data the upstream meta accessor eventually returns (buun
still carries common_params_sampling::logit_bias_eog at common.h:280), so
the substitution yields identical behavior on both trees.

The sed is guarded by a grep for the call site, so this patch is
self-disabling once buun rebases past the upstream refactor.
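
A minimal sketch of the guarded substitution idea, written in Python rather than the grep and sed the commit uses. The function name `patch_source` and the sample source line are hypothetical; only the two accessor expressions come from the commit message.

```python
OLD = "ctx_server.get_meta().logit_bias_eog"
NEW = "params_base.sampling.logit_bias_eog"


def patch_source(text: str) -> tuple[str, bool]:
    """Return (patched_text, changed).

    The membership check plays the role of the grep guard: when the call
    site is absent, the text is returned untouched and nothing is rewritten.
    """
    if OLD not in text:
        return text, False
    # Plays the role of the sed substitution.
    return text.replace(OLD, NEW), True


# Illustrative line only; the real call site may be laid out differently.
sample = "    ctx_server.get_meta().logit_bias_eog,\n"

patched, changed = patch_source(sample)
patched_again, changed_again = patch_source(patched)
```

Once the old accessor no longer appears in the file, whether because the patch already ran or for any other reason, the guard fails and the function becomes a no-op.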

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-04-24 12:52:53 +00:00

Ettore Di Giacinto

cd6079b2f3

feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)

spiritbuun/buun-llama-cpp is a fork of TheTom/llama-cpp-turboquant that adds
two independent features on top: DFlash block-diffusion speculative decoding
(via a dedicated DFlashDraftModel GGUF arch) and two extra TCQ KV-cache
variants (turbo2_tcq, turbo3_tcq) on top of TurboQuant's turbo2/turbo3/turbo4.

Follows the turboquant thin-wrapper pattern — reuses backend/cpp/llama-cpp
grpc-server sources verbatim, patches only the build copy to extend the KV
allow-list and wire up buun-exclusive tree_budget / draft_topk options.
DraftModel is already wired end-to-end (proto field 39 → params.speculative),
so DFlash activation only needs the existing options passthrough
(spec_type:dflash) plus the drafter path in draft_model.
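
As a rough illustration of the options passthrough mentioned above, the sketch below splits `key:value` strings into a dictionary. It is not LocalAI's parser: the function name `parse_options` is hypothetical, and the numeric values for `tree_budget` and `draft_topk` are made up for the example.

```python
def parse_options(options: list[str]) -> dict[str, str]:
    """Split 'key:value' option strings; entries without a colon are skipped."""
    parsed: dict[str, str] = {}
    for opt in options:
        key, sep, value = opt.partition(":")
        if not sep:
            continue  # not a key:value pair
        parsed[key.strip()] = value.strip()
    return parsed


# Option names come from the commit message; the values 64 and 8 are invented.
opts = parse_options(["spec_type:dflash", "tree_budget:64", "draft_topk:8"])
use_dflash = opts.get("spec_type") == "dflash"
```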

CacheTypeOptions now surfaces the five turbo* values so the React UI dropdown
shows them — benefits turboquant too (previously users had to type them in
YAML manually).
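
The extended allow-list could look something like the sketch below. The five turbo* names are taken from the commit message; the base list is an assumed subset of cache types, and the names `CACHE_TYPE_OPTIONS` and `is_allowed_cache_type` are hypothetical rather than LocalAI's actual identifiers.

```python
# Assumed subset of pre-existing cache types, for illustration only.
BASE_CACHE_TYPES = ["f16", "q8_0", "q4_0"]

# Named in the commit message.
TURBOQUANT_TYPES = ["turbo2", "turbo3", "turbo4"]
BUUN_TCQ_TYPES = ["turbo2_tcq", "turbo3_tcq"]

CACHE_TYPE_OPTIONS = BASE_CACHE_TYPES + TURBOQUANT_TYPES + BUUN_TCQ_TYPES


def is_allowed_cache_type(name: str) -> bool:
    """True when the requested KV-cache type is on the allow-list."""
    return name in CACHE_TYPE_OPTIONS


turbo_values = [t for t in CACHE_TYPE_OPTIONS if t.startswith("turbo")]
```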

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-04-24 12:52:53 +00:00

3 Commits