Files
LocalAI/docs/content/features/gpt-vision.md
Ettore Di Giacinto 5affb747a9 chore: drop AIO images (#9004)
AIO images are behind, and takes effort to maintain these. Wizard and
installation of models have been semplified massively, so AIO images
lost their purpose.

This allows us to be more laser focused on main images and reliefes
stress from CI.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-14 17:49:36 +01:00

1.8 KiB

+++ disableToc = false title = "🥽 GPT Vision" weight = 14 url = "/features/gpt-vision/" +++

LocalAI supports understanding images by using LLaVA, and implements the GPT Vision API from OpenAI.

llava

Usage

OpenAI docs: https://platform.openai.com/docs/guides/vision

To let LocalAI understand and reply with what sees in the image, use the /v1/chat/completions endpoint, for example with curl:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llava",
     "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'

Grammars and function tools can be used as well in conjunction with vision APIs:

 curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "llava", "grammar": "root ::= (\"yes\" | \"no\")",
     "messages": [{"role": "user", "content": [{"type":"text", "text": "Is there some grass in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'

Setup

To setup the LLaVa models, follow the full example in the configuration examples.