mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-03 04:46:54 -04:00
Adversarial verification against the canonical fork mudler/llama.cpp:localai-paged HEAD 1edddc8fe found that the scope doc's section-3 seam references were anchored to the abandoned pre-trim tree 237ad9b96, which the immediately-preceding commit b529cc5420 reset away.

Two classes of defect, both corrected:

- Phantom scaffolding (honesty): the doc claimed "the team has already started scaffolding P1 and P3", citing four commits (237ad9b96 bf16 GDN state cache, afc2c7030 act-quant trace, ea0875d14 LLAMA_BF16_CUBLAS_F32_OUT, 7967ad47f W4A16 direct-A stub) that b529cc5420 TRIMMED - none exist at 1edddc8fe (git cat-file: not a valid object). w4a16-policy.h, test-cuda-w4a16-policy.cpp and ggml_cuda_mul_mat_id_w4a16_grouped_direct_a are absent from the tree. Reworded P1 plank-1 and the P3 mechanism/files/effort to say these must be re-introduced on top of the surviving grouped W4A16 path (patch 0035), not "finished".

- Stale line numbers (additivity): every file:line was off (computed against the larger 237ad9b96 tree). Re-anchored to 1edddc8fe:
  - ggml_cuda_try_fuse 4232 (was 4661)
  - capture loop 4908 (was 5444)
  - moe whole-pattern matcher 4157 (was 4678)
  - routed_ffn_poc moe-ffn.cu:275 (was 456)
  - grouped W4A16 hook ggml-cuda.cu:2797 (was 3093/3188; the direct-A hooks 3085/3171 never existed)
  - concurrent_event machinery 4769 (was 5305-5318)
  - continuous-batch budget server-context.cpp 3083-3135, with LLAMA_MAX_BATCH_TOKENS at 3105 / prefill_budget_step at 3113 (was 3122-3200)

Numbers (attribution table, recovery arithmetic), the six P0 kill-gates, and the unreachable-floor honesty were verified sound and left unchanged.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
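
The re-anchoring arithmetic above can be checked with a short sketch. It uses only the old/new line pairs quoted in this message; the dictionary labels and the `line_shifts` helper are hypothetical names chosen for illustration, not identifiers from the tree. It suggests why each reference had to be re-anchored individually rather than shifted by one global offset.

```python
# Old (237ad9b96) and new (1edddc8fe) line anchors quoted in this commit message.
# The keys are informal labels for illustration, not identifiers from the tree.
ANCHORS = {
    "ggml_cuda_try_fuse": (4661, 4232),
    "capture loop": (5444, 4908),
    "moe whole-pattern matcher": (4678, 4157),
    "routed_ffn_poc (moe-ffn.cu)": (456, 275),
    "concurrent_event machinery": (5305, 4769),
}


def line_shifts(anchors):
    """Return {label: new_line - old_line} for each (old, new) pair."""
    return {label: new - old for label, (old, new) in anchors.items()}


shifts = line_shifts(ANCHORS)
for label, delta in shifts.items():
    print(f"{label}: {delta:+d}")

# More than one distinct shift means no single global offset repairs the references.
distinct_shifts = set(shifts.values())
print("distinct shifts:", len(distinct_shifts))
```

The shifts differ from anchor to anchor, which is consistent with the trim having removed different amounts of code above each site.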