chore(dllm): review fixes - file modes and build-matrix doc accuracy

Drop the stray executable bit from the Go sources and Makefile (the sibling Go backends commit them 644; only run.sh/package.sh are executable), and correct two documentation claims found in the final branch review: cuda13-dllm is built for amd64 only (arm64 CUDA ships as the l4t flavor), and package.sh is the parakeet-cpp-style stub layout with no ldd walk.

Assisted-by: Claude Code (Fable 5)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-14 03:37:47 -04:00 · 2026-06-11 17:17:54 +00:00
parent aba9c4794a
commit eb61e1d770
11 changed files with 7 additions and 6 deletions
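
Nine of the eleven files below appear with headers but no hunks, which is consistent with the mode-only change the message describes (755 to 644). As a minimal sketch of how such a change is made and checked, the script below reproduces it in a throwaway repository. The file name `main.go`, the identity values, and the temporary directory are placeholders for illustration, not taken from this commit.

```shell
#!/bin/sh
# Sketch: reproduce a mode-only change in a throwaway repository.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "example@example.invalid"
git config user.name "Example"

# Hypothetical file standing in for a Go source committed as executable.
printf 'package main\n' > main.go
git add main.go
git update-index --chmod=+x main.go
git commit -q -m "add main.go with a stray executable bit"

# Drop the executable bit in the index, then commit the mode-only change.
git update-index --chmod=-x main.go
git commit -q -m "drop executable bit"

# The index now records the file with a regular (non-executable) mode.
mode=$(git ls-files -s main.go | cut -d' ' -f1)
echo "$mode"

# --summary reports mode changes, which carry no content hunk.
summary=$(git diff --summary HEAD~1 HEAD)
echo "$summary"
```

Running `git diff --summary aba9c4794a eb61e1d770` against the real repository should list the same kind of `mode change` lines for the files under `backend/go/dllm/`, assuming both commits are available locally.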
--- a/.agents/dllm-backend.md
+++ b/.agents/dllm-backend.md
@@ -112,13 +112,14 @@ carry that coverage.

 ## Build matrix

-`cpu-dllm` (amd64 + arm64), `cuda13-dllm` (amd64 + arm64), and
-`cuda13-nvidia-l4t-arm64-dllm` (Jetson / DGX Spark GB10), via
+`cpu-dllm` (amd64 + arm64), `cuda13-dllm` (amd64), and
+`cuda13-nvidia-l4t-arm64-dllm` (arm64 CUDA: Jetson / DGX Spark GB10), via
 `.github/backend-matrix.yml`. No darwin/Metal. CUDA builds forward
 `-DDLLM_CUDA=ON` (dllm.cpp gates ggml's CUDA behind its own flag - a bare
 `-DGGML_CUDA=ON` is overridden by the cache FORCE). `libdllm.so` is
-self-contained (ggml statically absorbed, PIC), so packaging only ships the
-one .so plus the usual ldd walk.
+self-contained (ggml statically absorbed, PIC), so `package.sh` only ships
+the binary, `run.sh` and that one .so (the parakeet-cpp-style stub layout;
+no ldd walk yet).

 ## Known limitations

--- a/backend/go/dllm/Makefile
+++ b/backend/go/dllm/Makefile
--- a/backend/go/dllm/capi.go
+++ b/backend/go/dllm/capi.go
--- a/backend/go/dllm/dllm.go
+++ b/backend/go/dllm/dllm.go
--- a/backend/go/dllm/dllm_test.go
+++ b/backend/go/dllm/dllm_test.go
--- a/backend/go/dllm/gemma4_parser.go
+++ b/backend/go/dllm/gemma4_parser.go
--- a/backend/go/dllm/gemma4_parser_test.go
+++ b/backend/go/dllm/gemma4_parser_test.go
--- a/backend/go/dllm/gemma4_renderer.go
+++ b/backend/go/dllm/gemma4_renderer.go
--- a/backend/go/dllm/gemma4_renderer_test.go
+++ b/backend/go/dllm/gemma4_renderer_test.go
--- a/backend/go/dllm/main.go
+++ b/backend/go/dllm/main.go
--- a/docs/content/features/text-generation.md
+++ b/docs/content/features/text-generation.md
@@ -676,8 +676,8 @@ This backend is **experimental**, and the engine does not yet have a prompt-KV p
 | Flavor | Hardware |
 |---|---|
 | `cpu-dllm` | CPU (amd64 + arm64) - functional but very slow on the 26B model; mainly useful for wiring tests |
-| `cuda13-dllm` | NVIDIA CUDA 13 (amd64 + arm64) |
-| `cuda13-nvidia-l4t-arm64-dllm` | NVIDIA L4T (Jetson / DGX Spark GB10) |
+| `cuda13-dllm` | NVIDIA CUDA 13 (amd64) |
+| `cuda13-nvidia-l4t-arm64-dllm` | NVIDIA L4T arm64 (Jetson / DGX Spark GB10) |

 macOS/Metal is not available yet.