mirror of
https://github.com/mudler/LocalAI.git
synced 2026-07-05 13:57:28 -04:00
Additive superset of /v1/models that enriches each model entry with the capabilities it supports plus its input/output modalities (text / image / audio / video). Clients that only understand /v1/models are unaffected -- they simply never call the new route.

Audio and video *input* are derived from the model's multimodal limits (vLLM limit_mm_per_prompt), which no single usecase FLAG expresses. That gap is exactly why a plain capability list is insufficient and this enriched endpoint exists: an attachment router can now decide whether an image/audio/video file can go to the active model directly, or must be converted/transcribed first.

Capability derivation lives in core/config as the single source of truth (ModelConfig.Capabilities / InputModalities / OutputModalities / VisionSupported / ...); the Ollama capability surface now delegates to it instead of keeping a parallel copy. Vision is gated on chat/completion capability so a MediaMarker hydrated onto a non-chat model (e.g. a pure ASR/TTS backend) no longer reports a false vision capability.

Read-only listing: no new FLAG_* flag, reuses the existing `models` swagger tag, and intentionally exposes no MCP admin tool (there is nothing to manage conversationally).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
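
The routing decision and the vision gating described above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this commit: `ModelEntry`, its fields, `routeAttachment`, and `visionSupported` are hypothetical names, and the modality strings simply reuse the text / image / audio / video labels from the message.

```go
package main

import "fmt"

// ModelEntry is a hypothetical, simplified view of one enriched model entry.
// The field names are illustrative and are not taken from the LocalAI API.
type ModelEntry struct {
	ID              string
	Capabilities    []string
	InputModalities []string
}

// accepts reports whether the model lists the given input modality.
func (m ModelEntry) accepts(modality string) bool {
	for _, in := range m.InputModalities {
		if in == modality {
			return true
		}
	}
	return false
}

// visionSupported sketches the gating rule from the message: a media marker
// alone is not enough, the model must also be chat/completion capable.
func visionSupported(hasMediaMarker, chatCapable bool) bool {
	return hasMediaMarker && chatCapable
}

// routeAttachment returns "direct" when the model accepts the attachment's
// modality as input, and "convert" when the file must be converted or
// transcribed first.
func routeAttachment(m ModelEntry, modality string) string {
	if m.accepts(modality) {
		return "direct"
	}
	return "convert"
}

func main() {
	chat := ModelEntry{
		ID:              "example-chat",
		Capabilities:    []string{"chat", "vision"},
		InputModalities: []string{"text", "image"},
	}
	fmt.Println(routeAttachment(chat, "image")) // direct
	fmt.Println(routeAttachment(chat, "audio")) // convert
	fmt.Println(visionSupported(true, false))   // false
}
```

A client only needs the input modality list to make this choice, which is why a capability list alone would not be enough for audio and video attachments.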