# llama.cpp Backend

The llama.cpp backend (`backend/cpp/llama-cpp/grpc-server.cpp`) is a gRPC adaptation of the upstream HTTP server (`llama.cpp/tools/server/server.cpp`). It uses the same underlying server infrastructure from `llama.cpp/tools/server/server-context.cpp`.
## Building and Testing

- Test llama.cpp backend compilation with `make backends/llama-cpp` (the backend is built as part of the main build process)
- Check `backend/cpp/llama-cpp/Makefile` for build configuration
## Architecture

- `grpc-server.cpp`: gRPC server implementation; adapts HTTP server patterns to gRPC
- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
- The gRPC server mirrors the HTTP server's functionality but serves gRPC instead of HTTP
## Common Issues When Updating llama.cpp

When fixing compilation errors after upstream changes:

- Check how `server.cpp` (the HTTP server) handles the same change
- Look for new public APIs or getter methods
- Store copies of needed data instead of accessing private members
- Update function calls to match new signatures
- Test with `make backends/llama-cpp`
## Key Differences from HTTP Server

- The gRPC server uses a `BackendServiceImpl` class with gRPC service methods
- The HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
## Tool Call Parsing Maintenance

When working on JSON/XML tool call parsing functionality, always check llama.cpp for the reference implementation and updates:
### Checking for XML Parsing Changes

- **Review XML Format Definitions**: check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
- **Review Parsing Logic**: check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
- **Review Format Presets**: check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format`)
- **Review Model Lists**: check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
  - `COMMON_CHAT_FORMAT_GLM_4_5`
  - `COMMON_CHAT_FORMAT_MINIMAX_M2`
  - `COMMON_CHAT_FORMAT_KIMI_K2`
  - `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
  - `COMMON_CHAT_FORMAT_APRIEL_1_5`
  - `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
  - Any new formats added
## Model Configuration Options

Always check llama.cpp for new model configuration options that should be supported in LocalAI:
- **Check Server Context**: review `llama.cpp/tools/server/server-context.cpp` for new parameters
- **Check Chat Params**: review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
- **Check Server Options**: review `llama.cpp/tools/server/server.cpp` for command-line argument changes
- Examples of options to check:
  - `ctx_shift`: context shifting support
  - `parallel_tool_calls`: parallel tool calling
  - `reasoning_format`: reasoning format options
  - Any new flags or parameters
## Implementation Guidelines

- **Feature Parity**: always aim for feature parity with llama.cpp's implementation
- **Test Coverage**: add tests for new features matching llama.cpp's behavior
- **Documentation**: update relevant documentation when adding new formats or options
- **Backward Compatibility**: ensure changes don't break existing functionality
## Files to Monitor

- `llama.cpp/common/chat-parser-xml-toolcall.h`: format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp`: parsing logic
- `llama.cpp/common/chat-parser.cpp`: format presets and model-specific handlers
- `llama.cpp/common/chat.h`: format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp`: server configuration options