Mirror of https://github.com/mudler/LocalAI.git (synced 2026-07-03 04:46:54 -04:00)
P1 of the EXECUTION_REARCH_SCOPE additive program landed:
LLAMA_BF16_STREAM (default-off) bf16-resident residual-segment executor
for the q36 MoE model's projection boundaries.

- EXECUTION_REARCH_SCOPE.md: dated "P1 RESULT" subsection (P0 kill-gate
  GO, full build-out deltas, KL, correctness gates, honest magnitude,
  provenance).
- PARITY_HANDOFF.md: chronology note (verdict, engagement, prefill/KL
  numbers, fork commits, deferred-not-failed measurements).

Key reframe recorded: q36 GDN/attention projections are BF16 weights
(not NVFP4), so bf16-stream is a MoE-model prefill lever; the dense
model quantizes those projections to NVFP4 and engages nothing (stays
bit-identical).

Prefill MoE @512 +1.99% (reproducible, at noise floor), KL delta
-0.00052 (KL-improving), all md5 + test-backend-ops gates green.
Fork HEAD 653bb2f3d, tree 6cf1523047.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>