mirror of https://github.com/ollama/ollama.git synced 2026-02-05 05:03:21 -05:00


Jeffrey Morgan f7102ba826 runner: discard compute results if sequence replaced mid-batch (#14072)

If a sequence is replaced in s.seqs while a batch is computing, the old logits can be decoded into the new sequence. This change rechecks the sequence pointer after compute and skips decoding for replaced entries, preventing stale results from being applied.

2026-02-04 13:19:48 -08:00
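The recheck described in this commit message can be illustrated with a minimal, single-threaded Go sketch. It is not the runner's actual code: `Sequence`, `decodeBatch`, and the `compute` callback are hypothetical names, and the locking a real runner would need around `seqs` is omitted.

```go
package main

import "fmt"

// Sequence is a hypothetical stand-in for the runner's per-request state.
type Sequence struct {
	id      int
	decoded []float32
}

// decodeBatch snapshots the sequence pointers before compute, then applies
// logits only to slots that still hold the same pointer afterwards.
// It returns the number of slots whose results were discarded.
func decodeBatch(seqs []*Sequence, compute func() [][]float32) (skipped int) {
	snapshot := make([]*Sequence, len(seqs))
	copy(snapshot, seqs)

	logits := compute() // slots in seqs may be replaced while this runs

	for i, seq := range snapshot {
		if seq == nil {
			continue // empty slot, nothing to decode
		}
		if seqs[i] != seq {
			skipped++ // slot was replaced mid-batch: drop the stale logits
			continue
		}
		seq.decoded = append(seq.decoded, logits[i]...)
	}
	return skipped
}

func main() {
	seqs := []*Sequence{{id: 1}, {id: 2}}
	replacement := &Sequence{id: 3}

	skipped := decodeBatch(seqs, func() [][]float32 {
		seqs[1] = replacement // simulate a replacement during compute
		return [][]float32{{0.1}, {0.2}}
	})

	fmt.Println("skipped:", skipped)
	fmt.Println("decoded into replacement:", len(replacement.decoded))
}
```

The pointer comparison is the whole point of the sketch: comparing slot indices alone would not detect that slot 1 now belongs to a different sequence.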

common

server: add logprobs and top_logprobs support to Ollama's API (#12899)

2025-11-11 08:49:50 -08:00

llamarunner

flash attn: add auto mode for llama engine (#13052)

2025-12-12 13:27:19 -08:00

ollamarunner

runner: discard compute results if sequence replaced mid-batch (#14072)

2026-02-04 13:19:48 -08:00

README.md

Runner for Ollama engine

2025-02-13 17:09:26 -08:00

runner.go

glm 4.7 flash support on experimental engine (#13838)

2026-02-02 15:22:11 -08:00

README.md

`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
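The two curl calls above can also be issued from Go. The sketch below assumes only what the curl examples show: a POST with a JSON body of the form `{"prompt": ...}` to `/completion` or `/embedding` on `localhost:8080`. The helper name `buildRequest` is hypothetical, and because the response format is not documented here, the body is printed as raw bytes rather than parsed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// buildRequest is a hypothetical helper that mirrors the curl examples:
// a POST to baseURL+endpoint carrying {"prompt": prompt} as JSON.
func buildRequest(baseURL, endpoint, prompt string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"prompt": prompt})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	const base = "http://localhost:8080" // address used in the curl examples

	calls := []struct{ endpoint, prompt string }{
		{"/completion", "hi"},
		{"/embedding", "turn me into an embedding"},
	}

	for _, c := range calls {
		req, err := buildRequest(base, c.endpoint, c.prompt)
		if err != nil {
			fmt.Fprintln(os.Stderr, "build request:", err)
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			// Most likely the runner is not listening on localhost:8080.
			fmt.Fprintln(os.Stderr, "send request:", err)
			return
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Fprintln(os.Stderr, "read response:", err)
			return
		}
		// The response schema is not specified here, so print it unparsed.
		fmt.Printf("%s (%s):\n%s\n", c.endpoint, resp.Status, body)
	}
}
```

Start the runner with `./runner -model <model binary>` first, otherwise the program reports a connection error and stops.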