mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-01 13:42:20 -04:00
AIO images are behind, and takes effort to maintain these. Wizard and installation of models have been semplified massively, so AIO images lost their purpose. This allows us to be more laser focused on main images and reliefes stress from CI. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
1.8 KiB
1.8 KiB
+++ disableToc = false title = "🥽 GPT Vision" weight = 14 url = "/features/gpt-vision/" +++
LocalAI supports understanding images by using LLaVA, and implements the GPT Vision API from OpenAI.
Usage
OpenAI docs: https://platform.openai.com/docs/guides/vision
To let LocalAI understand and reply with what sees in the image, use the /v1/chat/completions endpoint, for example with curl:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava",
"messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
Grammars and function tools can be used as well in conjunction with vision APIs:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llava", "grammar": "root ::= (\"yes\" | \"no\")",
"messages": [{"role": "user", "content": [{"type":"text", "text": "Is there some grass in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}], "temperature": 0.9}]}'
Setup
To setup the LLaVa models, follow the full example in the configuration examples.