# llama.cpp Backend

The llama.cpp backend (`backend/cpp/llama-cpp/grpc-server.cpp`) is a gRPC adaptation of the upstream HTTP server (`llama.cpp/tools/server/server.cpp`). It uses the same underlying server infrastructure from `llama.cpp/tools/server/server-context.cpp`.

## Building and Testing

- Test llama.cpp backend compilation: `make backends/llama-cpp`
- The backend is built as part of the main build process
- Check `backend/cpp/llama-cpp/Makefile` for build configuration

## Architecture

- **grpc-server.cpp**: gRPC server implementation, adapts HTTP server patterns to gRPC
- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
- The gRPC server mirrors the HTTP server's functionality but uses gRPC instead of HTTP

## Common Issues When Updating llama.cpp

When fixing compilation errors after upstream changes:

1. Check how `server.cpp` (HTTP server) handles the same change
2. Look for new public APIs or getter methods
3. Store copies of needed data instead of accessing private members
4. Update function calls to match new signatures
5. Test with `make backends/llama-cpp`

## Key Differences from HTTP Server

- gRPC uses `BackendServiceImpl` class with gRPC service methods
- HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`

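For orientation, the service surface could be sketched as follows. Only the method names are taken from the list above; the message type names are placeholders that should be checked against the actual proto definitions:

```proto
// Hypothetical sketch; message names are placeholders, not the real schema.
service Backend {
  rpc Health(HealthRequest) returns (Reply) {}
  rpc LoadModel(ModelOptions) returns (Result) {}
  rpc Predict(PredictOptions) returns (Reply) {}
  rpc PredictStream(PredictOptions) returns (stream Reply) {}
  rpc Embedding(PredictOptions) returns (EmbeddingResult) {}
  rpc Rerank(RerankRequest) returns (RerankResult) {}
  rpc TokenizeString(PredictOptions) returns (TokenizationResponse) {}
  rpc GetMetrics(MetricsRequest) returns (MetricsResponse) {}
}
```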
## Tool Call Parsing Maintenance

When working on JSON/XML tool call parsing, always check llama.cpp for the reference implementation and recent updates:

### Checking for XML Parsing Changes

1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
   - `COMMON_CHAT_FORMAT_GLM_4_5`
   - `COMMON_CHAT_FORMAT_MINIMAX_M2`
   - `COMMON_CHAT_FORMAT_KIMI_K2`
   - `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
   - `COMMON_CHAT_FORMAT_APRIEL_1_5`
   - `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
   - Any new formats added

### Model Configuration Options

Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:

1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
4. **Examples of options to check**:
   - `ctx_shift` - Context shifting support
   - `parallel_tool_calls` - Parallel tool calling
   - `reasoning_format` - Reasoning format options
   - Any new flags or parameters

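When such an option is wired through, it typically surfaces in a model's YAML configuration. The fragment below is a hypothetical sketch only; verify every key name against LocalAI's actual model config schema before relying on it:

```yaml
# Hypothetical sketch - key names must be checked against the real schema.
name: my-model
backend: llama-cpp
context_size: 8192
parameters:
  model: my-model.gguf
# Options surfaced from llama.cpp that may need wiring through:
# ctx_shift, parallel_tool_calls, reasoning_format
```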
### Implementation Guidelines

1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
3. **Documentation**: Update relevant documentation when adding new formats or options
4. **Backward Compatibility**: Ensure changes don't break existing functionality

### Files to Monitor

- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options