LocalAI/backend/cpp/llama-cpp-localai-paged/docs
Ettore Di Giacinto ccf75d1dcd docs(paged): record P1 bf16-stream landing (GO)
P1 of the EXECUTION_REARCH_SCOPE additive program landed: LLAMA_BF16_STREAM
(default-off) bf16-resident residual-segment executor for the q36 MoE model's
projection boundaries.

- EXECUTION_REARCH_SCOPE.md: dated "P1 RESULT" subsection (P0 kill-gate GO,
  full build-out deltas, KL, correctness gates, honest magnitude, provenance).
- PARITY_HANDOFF.md: chronology note (verdict, engagement, prefill/KL numbers,
  fork commits, deferred-not-failed measurements).

Key reframe recorded: q36 GDN/attention projections are BF16 weights (not
NVFP4), so bf16-stream is a MoE-model prefill lever; the dense model quantizes
those projections to NVFP4 and engages nothing (stays bit-identical). Prefill
MoE @512 +1.99% (reproducible, at noise floor), KL delta -0.00052 (KL-improving),
all md5 + test-backend-ops gates green. Fork HEAD 653bb2f3d, tree 6cf1523047.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-07-02 14:34:26 +00:00