mtmd.h declares `using json = nlohmann::ordered_json` at global scope (and its
mtmd.cpp depends on it), while ik_llama's whole server/common stack also uses
ordered_json. Our grpc-server.cpp/utils.hpp kept a plain `nlohmann::json` alias,
which now collides with mtmd.h once it is included for the multimodal port:
"conflicting declaration 'using json = ...'". Switch our two aliases to
ordered_json to match; it is API-compatible (utils.hpp already used ordered_json
for its log helper) and our json never crosses into an unordered-json API.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
The IK_LLAMA_VERSION bump to f96eaddba8bed6a9a5e628bbf6a566775c70b49c pulls in
upstream commit "Prune examples/llava", which deletes examples/llava (clip.* /
llava.*). The ik-llama backend's grpc-server.cpp built a local `myclip` library
from those files and called the removed clip/llava C API, so the bump no longer
builds.
ik_llama keeps its multimodal stack in the surviving `mtmd` library
(examples/mtmd/, public headers mtmd.h + mtmd-helper.h). This ports the backend's
multimodal path onto the high-level mtmd_* / mtmd_helper_* API in place, leaving
the text path (which still uses ik_llama's retained old common API) untouched:
- Makefile: bump IK_LLAMA_VERSION to f96eaddb.
- prepare.sh: drop the clip/llava source copy + sed block; mtmd is a library
target, no source copy needed.
- CMakeLists.txt: remove the `myclip` target; link `mtmd` and add its include
dir; build grpc-server as C++17 (mtmd headers require it).
- patches: drop 0002 (targeted the deleted examples/llava/clip.cpp; the mtmd
clip.cpp never calls ggml_quantize_chunk, so the fix is unneeded). Keep 0001
(verified still applies).
- grpc-server.cpp / utils.hpp: replace clip_model_load + clip_image_load_from_bytes
+ llava_image_embed_make_with_clip_img + the manual [img-N] prefix splitting and
per-image llava_embd_batch decode loop with mtmd_init_from_file (moved after the
model load, which it requires), mtmd_helper_bitmap_init_from_buf, mtmd_tokenize
and mtmd_helper_eval_chunks. Legacy [img-N] tags are translated, in order, into
mtmd media markers (mtmd_default_marker()); the post-image suffix text stays on
the normal token path so the sampling loop is unchanged.
Supersedes #10534.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Upstream ik_llama.cpp commit e0596bf6 ("Autoparser") changed
common_params_sampling::grammar from std::string to a common_grammar
struct (type + grammar), which broke our two direct accesses:
- JSON ingest fed the field through json_value<common_grammar>(...),
for which nlohmann has no from_json adapter.
- JSON export emitted the struct directly, for which nlohmann has no
to_json adapter.
Wrap the incoming JSON string in common_grammar{COMMON_GRAMMAR_TYPE_USER, ...}
and serialize via the inner .grammar member, mirroring upstream's
examples/server/server-context.cpp.
Also bump IK_LLAMA_VERSION to 286ce324baed17c95faec77792eaa6bdb1c7a5f5
so the local-ai side lines up with the dependency bump in #9496.
Assisted-by: Claude-Code:claude-opus-4-7