LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-04-17 05:18:53 -04:00

Author	SHA1	Message	Date
LocalAI [bot]	df2d25cee5	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `1163af96cf6bb4a4b819f998f84c153a49768b99` (#9368 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-16 01:13:08 +02:00
LocalAI [bot]	96cd561d9d	chore: ⬆️ Update ggml-org/llama.cpp to `b3d758750a268bf93f084ccfa3060fb9a203192a` (#9370 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-16 01:12:39 +02:00
LocalAI [bot]	62862ca06b	chore: ⬆️ Update ggml-org/llama.cpp to `fae3a28070fe4026f87bd6a544aba1b2d1896566` (#9357 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-15 01:25:41 +02:00
Ettore Di Giacinto	95efb8a562	feat(backend): add turboquant llama.cpp-fork backend (#9355 ) * feat(backend): add turboquant llama.cpp-fork backend turboquant is a llama.cpp fork (TheTom/llama-cpp-turboquant, branch feature/turboquant-kv-cache) that adds a TurboQuant KV-cache scheme. It ships as a first-class backend reusing backend/cpp/llama-cpp sources via a thin wrapper Makefile: each variant target copies ../llama-cpp into a sibling build dir and invokes llama-cpp's build-llama-cpp-grpc-server with LLAMA_REPO/LLAMA_VERSION overridden to point at the fork. No duplication of grpc-server.cpp — upstream fixes flow through automatically. Wires up the full matrix (CPU, CUDA 12/13, L4T, L4T-CUDA13, ROCm, SYCL f32/f16, Vulkan) in backend.yml and the gallery entries in index.yaml, adds a tests-turboquant-grpc e2e job driven by BACKEND_TEST_CACHE_TYPE_K/V=q8_0 to exercise the KV-cache config path (backend_test.go gains dedicated env vars wired into ModelOptions.CacheTypeKey/Value — a generic improvement usable by any llama.cpp-family backend), and registers a nightly auto-bump PR in bump_deps.yaml tracking feature/turboquant-kv-cache. scripts/changed-backends.js gets a special-case so edits to backend/cpp/llama-cpp/ also retrigger the turboquant CI pipeline, since the wrapper reuses those sources. * feat(turboquant): carry upstream patches against fork API drift turboquant branched from llama.cpp before upstream commit 66060008 ("server: respect the ignore eos flag", #21203) which added the `logit_bias_eog` field to `server_context_meta` and a matching parameter to `server_task::params_from_json_cmpl`. The shared backend/cpp/llama-cpp/grpc-server.cpp depends on that field, so building it against the fork unmodified fails. Cherry-pick that commit as a patch file under backend/cpp/turboquant/patches/ and apply it to the cloned fork sources via a new apply-patches.sh hook called from the wrapper Makefile. Simplifies the build flow too: instead of hopping through llama-cpp's build-llama-cpp-grpc-server indirection, the wrapper now drives the copied Makefile directly (clone -> patch -> build). Drop the corresponding patch whenever the fork catches up with upstream — the build fails fast if a patch stops applying, which is the signal to retire it. * docs: add turboquant backend section + clarify cache_type_k/v Document the new turboquant (llama.cpp fork with TurboQuant KV-cache) backend alongside the existing llama-cpp / ik-llama-cpp sections in features/text-generation.md: when to pick it, how to install it from the gallery, and a YAML example showing backend: turboquant together with cache_type_k / cache_type_v. Also expand the cache_type_k / cache_type_v table rows in advanced/model-configuration.md to spell out the accepted llama.cpp quantization values and note that these fields apply to all llama.cpp-family backends, not just vLLM. * feat(turboquant): patch ggml-rpc GGML_OP_COUNT assertion The fork adds new GGML ops bringing GGML_OP_COUNT to 97, but ggml/include/ggml-rpc.h static-asserts it equals 96, breaking the GGML_RPC=ON build paths (turboquant-grpc / turboquant-rpc-server). Carry a one-line patch that updates the expected count so the assertion holds. Drop this patch whenever the fork fixes it upstream. * feat(turboquant): allow turbo* KV-cache types and exercise them in e2e The shared backend/cpp/llama-cpp/grpc-server.cpp carries its own allow-list of accepted KV-cache types (kv_cache_types[]) and rejects anything outside it before the value reaches llama.cpp's parser. That list only contains the standard llama.cpp types — turbo2/turbo3/turbo4 would throw "Unsupported cache type" at LoadModel time, meaning nothing the LocalAI gRPC layer accepted was actually fork-specific. Add a build-time augmentation step (patch-grpc-server.sh, called from the turboquant wrapper Makefile) that inserts GGML_TYPE_TURBO2_0/3_0/4_0 into the allow-list of the copied grpc-server.cpp under turboquant-<flavor>-build/. The original file under backend/cpp/llama-cpp/ is never touched, so the stock llama-cpp build keeps compiling against vanilla upstream which has no notion of those enum values. Switch test-extra-backend-turboquant to set BACKEND_TEST_CACHE_TYPE_K=turbo3 / _V=turbo3 so the e2e gRPC suite actually runs the fork's TurboQuant KV-cache code paths (turbo3 also auto-enables flash_attention in the fork). Picking q8_0 here would only re-test the standard llama.cpp path that the upstream llama-cpp backend already covers. Refresh the docs (text-generation.md + model-configuration.md) to list turbo2/turbo3/turbo4 explicitly and call out that you only get the TurboQuant code path with this backend + a turbo* cache type. * fix(turboquant): rewrite patch-grpc-server.sh in awk, not python3 The builder image (ubuntu:24.04 stage-2 in Dockerfile.turboquant) does not install python3, so the python-based augmentation step errored with `python3: command not found` at make time. Switch to awk, which ships in coreutils and is already available everywhere the rest of the wrapper Makefile runs. * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-15 01:25:04 +02:00
Ettore Di Giacinto	87e6de1989	feat: wire transcription for llama.cpp, add streaming support (#9353 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-14 16:13:40 +02:00
LocalAI [bot]	c0648b8836	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `55d3c05bf7b377deaa5dc84d255d9740a345a206` (#9348 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-14 08:56:25 +02:00
LocalAI [bot]	906acba8db	chore: ⬆️ Update ggml-org/llama.cpp to `e97492369888f5311e4d1f3beb325a36bbed70e9` (#9347 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-14 08:54:25 +02:00
LocalAI [bot]	ea32b8953f	chore: ⬆️ Update ggml-org/llama.cpp to `1e9d771e2c2f1113a5ebdd0dc15bafe57dce64be` (#9330 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-13 09:42:18 +02:00
Ettore Di Giacinto	9ca03cf9cc	feat(backends): add ik-llama-cpp (#9326 ) * feat(backends): add ik-llama-cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add grpc e2e suite, hook to CI, update README Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> * Apply suggestion from @mudler Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-12 13:51:28 +02:00
Ettore Di Giacinto	151ad271f2	feat(rocm): bump to 7.x (#9323 ) feat(rocm): bump to 7.2.1 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-12 08:51:30 +02:00
LocalAI [bot]	6fbda277c5	chore: ⬆️ Update ggml-org/llama.cpp to `ff5ef8278615a2462b79b50abdf3cc95cfb31c6f` (#9319 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-11 23:15:23 +02:00
LocalAI [bot]	62a674ce12	chore: ⬆️ Update ggml-org/llama.cpp to `e62fa13c2497b2cd1958cb496e9489e86bbd5182` (#9312 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-11 08:39:10 +02:00
LocalAI [bot]	d4cd6c284f	chore: ⬆️ Update ggml-org/llama.cpp to `d132f22fc92f36848f7ccf2fc9987cd0b0120825` (#9302 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-10 08:46:45 +02:00
Ettore Di Giacinto	9748a1cbc6	fix(streaming): skip chat deltas for role-init elements to prevent first token duplication (#9299 ) When TASK_RESPONSE_TYPE_OAI_CHAT is used, the first streaming token produces a JSON array with two elements: a role-init chunk and the actual content chunk. The grpc-server loop called attach_chat_deltas for both elements with the same raw_result pointer, stamping the first token's ChatDelta.Content on both replies. The Go side accumulated both, emitting the first content token twice to SSE clients. Fix: in the array iteration loops in PredictStream, detect role-init elements (delta has "role" key) and skip attach_chat_deltas for them. Only content/reasoning elements get chat deltas attached. Reasoning models are unaffected because their first token goes into reasoning_content, not content.	2026-04-10 08:45:47 +02:00
Ettore Di Giacinto	2b05420f95	chore(llama.cpp): bump to 'd12cc3d1ca6bba741cd77887ac9c9ee18c8415c7' (#9282 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-09 08:12:05 +02:00
LocalAI [bot]	0526e60f8d	chore: ⬆️ Update ggml-org/llama.cpp to `66c4f9ded01b29d9120255be1ed8d5835bcbb51d` (#9269 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-08 08:27:38 +02:00
LocalAI [bot]	bccaba1f66	chore: ⬆️ Update ggml-org/llama.cpp to `d0a6dfeb28a09831d904fc4d910ddb740da82834` (#9259 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-07 00:38:36 +02:00
LocalAI [bot]	0dda4fe6f0	chore: ⬆️ Update ggml-org/llama.cpp to `761797ffdf2ce3f118e82c663b1ad7d935fbd656` (#9243 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-06 10:52:38 +02:00
Ettore Di Giacinto	773489eeb1	fix(chat): do not retry if we had chatdeltas or tooldeltas from backend (#9244 ) * fix(chat): do not retry if we had chatdeltas or tooldeltas from backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: use oai compat for llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: apply to non-streaming path too Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * map also other fields Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-06 10:52:23 +02:00
Ettore Di Giacinto	06fbe48b3f	feat(llama.cpp): wire speculative decoding settings (#9238 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-05 14:56:30 +02:00
Ettore Di Giacinto	53deeb1107	fix(reasoning): suppress partial tag tokens during autoparser warm-up The C++ PEG parser needs a few tokens to identify the reasoning format (e.g. "<\|channel>thought\n" for Gemma 4). During this warm-up, the gRPC layer was sending raw partial tag tokens to Go, which leaked into the reasoning field. - Clear reply.message in gRPC when autoparser is active but has no diffs yet, matching llama.cpp server behavior of only emitting classified output - Prefer C++ autoparser chat deltas for reasoning/content in all streaming paths, falling back to Go-side extraction for backends without autoparser (e.g. vLLM) - Override non-streaming no-tools result with chat delta content when available - Guard PrependThinkingTokenIfNeeded against partial tag prefixes during streaming accumulation - Reorder default thinking tokens so <\|channel>thought is checked before <\|think\|> (Gemma 4 templates contain both)	2026-04-04 20:45:57 +00:00
Ettore Di Giacinto	c5a840f6af	fix(reasoning): warm-up Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 20:25:24 +00:00
LocalAI [bot]	7962dd16f7	chore: ⬆️ Update ggml-org/llama.cpp to `d006858316d4650bb4da0c6923294ccd741caefd` (#9215 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-04 09:44:39 +02:00
Ettore Di Giacinto	1ed6b9e5ed	fix(llama.cpp): correctly parse grpc header for bearer token auth Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-03 21:38:41 +00:00
LocalAI [bot]	c0a023d13d	chore: ⬆️ Update ggml-org/llama.cpp to `a1cfb645307edc61a89e41557f290f441043d3c2` (#9203 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-03 08:30:15 +02:00
LocalAI [bot]	26f1b94f4d	chore: ⬆️ Update ggml-org/llama.cpp to `95a6ebabb277c4cc18247e7bc2a5502133caca63` (#9199 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-02 08:53:16 +02:00
LocalAI [bot]	cc5f33ce95	chore: ⬆️ Update ggml-org/llama.cpp to `0fcb3760b2b9a3a496ef14621a7e4dad7a8df90f` (#9196 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-01 00:48:40 +02:00
LocalAI [bot]	b0b37a472f	chore: ⬆️ Update ggml-org/llama.cpp to `08f21453aec846867b39878500d725a05bd32683` (#9190 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-31 09:27:08 +02:00
LocalAI [bot]	3d738164b7	chore: ⬆️ Update ggml-org/llama.cpp to `7c203670f8d746382247ed369fea7fbf10df8ae0` (#9160 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-30 08:27:26 +02:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124 ) * feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00
LocalAI [bot]	4c870288d9	chore: ⬆️ Update ggml-org/llama.cpp to `59d840209a5195c2f6e2e81b5f8339a0637b59d9` (#9144 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-28 18:18:06 +01:00
LocalAI [bot]	b86fa63f70	chore: ⬆️ Update ggml-org/llama.cpp to `a970515bdb0b1d09519106847660b0d0c84d2472` (#9137 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-26 07:56:41 +01:00
LocalAI [bot]	9bc68b2721	chore: ⬆️ Update ggml-org/llama.cpp to `9f102a1407ed5d73b8c954f32edab50f8dfa3f58` (#9127 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-25 07:52:14 +01:00
LocalAI [bot]	2ad8c149e0	chore: ⬆️ Update ggml-org/llama.cpp to `1772701f99dd3fc13f5783b282c2361eda8ca47c` (#9123 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-24 00:35:40 +01:00
LocalAI [bot]	31fcb1425d	chore: ⬆️ Update ggml-org/llama.cpp to `49bfddeca18e62fa3d39114a23e9fcbdf8a22388` (#9102 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-23 01:11:18 +01:00
Ettore Di Giacinto	f891d60d26	fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099 ) chore(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-22 00:58:14 +01:00
LocalAI [bot]	b74111feed	chore: ⬆️ Update ggml-org/llama.cpp to `990e4d96980d0b016a2b07049cc9031642fb9903` (#9095 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-22 00:57:39 +01:00
Ettore Di Giacinto	031a36c995	feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092 ) * feat: wire min_p Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: inferencing defaults Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(refactor): re-use iterative parser Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: generate automatically inference defaults from unsloth Instead of trying to re-invent the wheel and maintain here the inference defaults, prefer to consume unsloth ones, and contribute there as necessary. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: apply defaults also to models installed via gallery Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: be consistent and apply fallback to all endpoint Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-22 00:57:15 +01:00
LocalAI [bot]	aa3e82976e	chore: ⬆️ Update ggml-org/llama.cpp to `4cb7e0bd61e7e1101e8ab10db5dee70c5717a386` (#9087 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-21 09:41:11 +01:00
Ettore Di Giacinto	c3174f9543	chore(deps): bump llama-cpp to 'a0bbcdd9b6b83eeeda6f1216088f42c33d464e38' (#9079 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-20 08:12:21 +01:00
LocalAI [bot]	9a9da062e1	chore: ⬆️ Update ggml-org/llama.cpp to `5744d7ec430e2f875a393770195fda530560773f` (#9063 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-19 07:58:30 +01:00
LocalAI [bot]	a58475dbef	chore: ⬆️ Update ggml-org/llama.cpp to `ee4801e5a6ee7ee4063144ab44ab4e127f76fba8` (#9044 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-18 08:46:12 +01:00
LocalAI [bot]	118bcee196	chore: ⬆️ Update ggml-org/llama.cpp to `9b342d0a9f2f4892daec065491583ec2be129685` (#9039 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-17 10:22:42 +01:00
LocalAI [bot]	b2030255ca	chore: ⬆️ Update ggml-org/llama.cpp to `88915cb55c14769738fcab7f1c6eaa6dcc9c2b0c` (#9020 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-16 00:10:11 +01:00
LocalAI [bot]	87525109f1	chore: ⬆️ Update ggml-org/llama.cpp to `3a6f059909ed5dab8587df5df4120315053d57a4` (#9009 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-15 09:46:45 +01:00
LocalAI [bot]	977063c4ba	chore: ⬆️ Update ggml-org/llama.cpp to `e30f1fdf74ea9238ff562901aa974c75aab6619b` (#8997 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-14 01:16:42 +01:00
LocalAI [bot]	46a8941a2c	chore: ⬆️ Update ggml-org/llama.cpp to `57819b8d4b39d893408e51520dff3d47d1ebb757` (#8983 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-13 07:59:15 +01:00
Richard Palethorpe	b24ca51287	fix(llama-cpp): Set enable_thinking in the correct place (#8973 ) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-03-12 13:32:29 +01:00
LocalAI [bot]	270eb956c7	chore: ⬆️ Update ggml-org/llama.cpp to `10e5b148b061569aaee8ae0cf72a703129df0eab` (#8946 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-11 08:04:09 +01:00
LocalAI [bot]	b48920ecf6	chore: ⬆️ Update ggml-org/llama.cpp to `23fbfcb1ad6c6f76b230e8895254de785000be46` (#8921 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-03-10 07:30:43 +01:00

1 2 3 4 5 ...

401 Commits