Commit Graph

319 Commits

Author SHA1 Message Date
Parth Sareen
2f4de1acf7 cmd: ollama launch always show model picker (#14299) 2026-02-17 12:02:14 -08:00
Patrick Devine
d18dcd7775 mlxrunner fixes (#14247)
* load glm4_moe_lite from the mlxrunner

* fix loading diffusion models

* remove log lines

* fix --imagegen flag
2026-02-13 22:30:42 -08:00
Parth Sareen
f0a07a353b cmd/tui: fix powershell search (#14242) 2026-02-13 15:53:11 -08:00
Devon Rifkin
948de6bbd2 add ability to disable cloud (#14221)
* add ability to disable cloud

Users can now easily opt-out of cloud inference and web search by
setting

```
"disable_ollama_cloud": true
```

in their `~/.ollama/server.json` settings file. After a setting update,
the server must be restarted.

Alternatively, setting the environment variable `OLLAMA_NO_CLOUD=1` will
also disable cloud features. While users previously were able to avoid
cloud models by not pulling or `ollama run`ing them, this gives them an
easy way to enforce that decision. Any attempt to run a cloud model when
cloud is disabled will fail.

The app's old "airplane mode" setting, which did a similar thing for
hiding cloud models within the app is now unified with this new cloud
disabled mode. That setting has been replaced with a "Cloud" toggle,
which behind the scenes edits `server.json` and then restarts the
server.

* gate cloud models across TUI and launch flows when cloud is disabled

Block cloud models from being selected, launched, or written to
integration configs when cloud mode is turned off:

- TUI main menu: open model picker instead of launching with a
  disabled cloud model
- cmd.go: add IsCloudModelDisabled checks for all Selection* paths
- LaunchCmd: filter cloud models from saved Editor configs before
  launch, fall through to picker if none remain
- Editor Run() methods (droid, opencode, openclaw): filter cloud
  models before calling Edit() and persist the cleaned list
- Export SaveIntegration, remove SaveIntegrationModel wrapper that
  was accumulating models instead of replacing them

* rename saveIntegration to SaveIntegration in config.go and tests

* cmd/config: add --model guarding and empty model list fixes

* Update docs/faq.mdx

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update internal/cloud/policy.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update internal/cloud/policy.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Update server/routes.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Revert "Update internal/cloud/policy.go"

This reverts commit 8bff8615f9.

Since this error shows up in other integrations, we want it to be
prefixed with Ollama

* rename cloud status

* more status renaming

* fix tests that weren't updated after rename

---------

Co-authored-by: ParthSareen <parth.sareen@ollama.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2026-02-12 15:47:00 -08:00
Parth Sareen
77ba9404ac cmd/tui: improve model picker UX (#14209) 2026-02-11 14:36:54 -08:00
Patrick Devine
0aaf6119ec feature: add ctrl-g to allow users to use an editor to edit their prompt (#14197) 2026-02-11 13:04:41 -08:00
Parth Sareen
f08427c138 cmd: TUI UX improvements (#14198) 2026-02-11 10:18:41 -08:00
Jeffrey Morgan
9ec733e527 cmd: make 'ollama login' and 'ollama logout' aliases for 'ollama signin' and 'ollama signout' respectively (#14144) 2026-02-09 19:12:42 -08:00
Patrick Devine
235ba3df5c cmd: ollama menu and launch improvements (#14038) 2026-02-09 11:30:16 -08:00
Parth Sareen
5f53fe7884 cmd: ollama launch improvements (#14099) 2026-02-05 15:08:17 -08:00
Bruce MacDonald
c323161f24 cmd: helpful error message for remote models (#14057)
When trying to use cloud model with OLLAMA_HOST="ollama.com" while not signed in a helpful error message is displayed when the user is not signed in telling them they must sign in to use cloud models. This should be the same experience for models which specify a remote instance.
2026-02-04 14:55:11 -08:00
Bruce MacDonald
a6355329bf cmd: open browser on ollama signin when available (#14055)
When a browser is available open it to the connect URL automatically when running the `ollama signin` command. Browser is not opened in any other unauthorized scenario.
2026-02-03 16:42:09 -08:00
Seokrin Taron Sung
f92e362b2e cmd: capitalize Ollama in serve command help text (#13965) 2026-01-29 09:47:53 -08:00
Parth Sareen
aae6ecbaff cmd: rename ollama config to ollama launch (#13871) 2026-01-23 18:40:40 -08:00
Jeffrey Morgan
66831dcf70 x/imagegen: fix image editing support (#13866)
- Fix panic in ollama show for image gen models (safe type assertion)
- Add vision capability for Flux2KleinPipeline models at create time
- Flatten transparent PNG images onto white background for better results
2026-01-23 15:37:17 -08:00
Parth Sareen
199c41e16e cmd: ollama config command to help configure integrations to use Ollama (#13712) 2026-01-22 20:17:11 -08:00
Jeffrey Morgan
68e00c7c36 fix: prevent image generation models from loading during deletion (#13781)
Move the unload check (empty prompt + KeepAlive=0) before the image
generation model dispatch in GenerateHandler. This prevents models like
flux from being loaded into memory just to be immediately unloaded when
running `ollama rm`.

Also fix a bug in DeleteHandler where `args[0]` was used instead of
`arg` in the delete loop, causing only the first model to be unloaded
when deleting multiple models.
2026-01-19 12:48:34 -08:00
Jeffrey Morgan
634c416645 Add experimental image generation fields to /api/generate (#13753)
Request fields (experimental):
- width: image width (max 4096)
- height: image height (max 4096)
- steps: denoising steps
- seed: random seed

Response fields (experimental):
- images: base64-encoded generated images
- completed: current step progress
- total: total steps

Other changes:
- Fix lifecycle bug where image models wouldn't unload (refCount issue)
- Fix "headers already written" error on Ctrl+C during streaming
- Add gin middleware for OpenAI /v1/images/generations compatibility
- Update CLI to use /api/generate with progress bar
- Add preload support in interactive mode
2026-01-17 18:27:41 -08:00
Patrick Devine
a077d996e3 Fix create and show commands for experimental models (#13741)
* x: make `ollama create --experimental` import from safetensors

This change allows pulling in safetensors models into the new experimental model format, and also
fixes the `ollama show` command to be able to correctly display the model information.

* gofumpt the linter

* gofumpt the linter again

* validate the model name
2026-01-16 14:31:55 -08:00
Parth Sareen
d06acbcb19 x/cmd: enable web search and web fetch with flag (#13690) 2026-01-12 13:59:40 -08:00
Jeffrey Morgan
9667c2282f x/imagegen: add naive TeaCache and FP8 quantization support (#13683)
TeaCache:
- Timestep embedding similarity caching for diffusion models
- Polynomial rescaling with configurable thresholds
- Reduces transformer forward passes by ~30-50%

FP8 quantization:
- Support for FP8 quantized models (8-bit weights with scales)
- QuantizedMatmul on Metal, Dequantize on CUDA
- Client-side quantization via ollama create --quantize fp8

Other bug fixes:
- Fix `/api/show` API for image generation models
- Server properly returns model info (architecture, parameters, quantization)
- Memory allocation optimizations
- CLI improvements for image generation
2026-01-12 13:45:22 -08:00
Jeffrey Morgan
2584940016 Add z-image image generation prototype (#13659) 2026-01-09 21:09:46 -08:00
Parth Sareen
1ef4241727 x: request access for all commands, add welcome message (#13662) 2026-01-09 18:20:39 -08:00
Parth Sareen
12e2b3514a x: agent loop ux improvements (#13635) 2026-01-07 01:27:15 -08:00
Parth Sareen
76912c062a x: add experimental agent loop (#13628) 2026-01-05 23:38:40 -08:00
Jeffrey Morgan
8852220f59 add REQUIRES command to Modelfile (#13361) 2025-12-18 13:21:29 -08:00
Patrick Devine
d3e0a0dee4 model: ministral w/ llama4 scaling (#13292)
This change:

* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling
* includes a new ministral parser for parsing reasoning and tool calling

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2025-12-01 23:20:14 -08:00
nicole pardal
1ca608bcd1 embeddings: added embedding command for cl (#12795)
Co-authored-by: A-Akhil <akhilrahul70@gmail.com>

This PR introduces a new ollama embed command that allows users to generate embeddings directly from the command line.

Added ollama embed MODEL [TEXT...] command for generating text embeddings
Supports both direct text arguments and stdin piping for scripted workflows

Outputs embeddings as JSON arrays (one per line)
2025-11-05 11:58:03 -08:00
Patrick Devine
f89fc1cadd bugfix: show connection string for interactive cli usage (#12930) 2025-11-05 11:55:04 -08:00
Michael Yang
ed78e127d0 fix(cmd): unload model before removal (#12832)
this change fixes two bugs with `ollama rm`:

1. before a model is removed, it will first be stopped. this only
   happens for the first argument and skipped for all other models
2. models are unloaded indiscriminately. this errors for cloud models
   and should be omitted
2025-10-30 10:41:49 -07:00
Patrick Devine
b04e46da3e bugfix: restore the current runOptions if loading fails in the CLI (#12402)
There are two bugs when using `/load <model>` for a model that doesn't exist, namely:
  1. it will not restore the current model settings if the current model is a thinking model; and
  2. it will crash is the current model is a non-thinking model

This bug fix saves the current runOptions and then restores them if the model load
doesn't happen. It also fixes the crash happening for non-thinking models.
2025-09-25 18:30:45 -07:00
Patrick Devine
5a56ff3cf0 cli: add device signin flow when doing ollama push (#12405) 2025-09-25 15:04:43 -07:00
Patrick Devine
64883e3c4c auth: fix problems with the ollama keypairs (#12373)
* auth: fix problems with the ollama keypairs

This change adds several fixes including:
  - reading in the pubkey files correctly
  - fixing the push unit test to create a keypair file in a temp directory
  - not return 500 errors for normal status error
2025-09-22 23:20:20 -07:00
Patrick Devine
8b894933a7 engine: add remote proxy (#12307) 2025-09-17 14:40:53 -07:00
fengyuchuanshen
8a7e2055d2 cmd: use slices.Contains to simplify code (#12249) 2025-09-11 09:57:31 -07:00
Patrick Devine
026bc29237 cli: show the default context length env setting in online help (#11928) 2025-08-15 14:59:52 -07:00
Michael Yang
fa7776fd24 gpt-oss (#11672)
* bf16

* tests

* gpt-oss

* enable gptoss for engine

* rough estimate

* convert to mxfp4

* handle safetensors U8

* clamp glu/linear

* update tokenizer

* MXFP4 support

This implements the Open Compute Microscaling (MX) FP4 format
as a tensor type with backend implementations focusing
on mulmat and mulmatid on CPU, CUDA, and Metal.

* Unit tests for MXFP4 support

This exercises various operations and shapes on both CPU and GPU (if detected
on the system)

* cuda graph

* unit test adjustments

* cuda: optimize memory access

Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4

* mac: fix crash on old macos versions

cblas_sgemm is only supported on v13.3 and up, however bf16 is
only supported on v14+ so we were falling back to ggml-blas and
crashing on bf16 tensors.  Checking for the function being null
seems to be the simplest way to condittionally avoid registering the
backend.

* server: Minimum context length for gptoss

This model requires a minimum context length of 8192 to function
effectively. Users can set higher values through all normal mechanisms
but lower values will be silently reset.

* ggml: Multiply by numParallel for gptoss sliding window

When computing the graph size estimate, the context size is already
multiplied by numParallel so estimates reflect that. However, since
sliding window models use a smaller, fixed context size, they need
to manually take numParallel into account.

* gpt-oss integration

includes harmony parser and thinking levels, etc.

* fix sync

* fix tests

* fix lint

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
Co-authored-by: Jesse Gross <jesse@ollama.com>
Co-authored-by: Devon Rifkin <drifkin@drifkin.net>
2025-08-05 12:21:16 -07:00
Patrick Devine
80b538e312 cli: catch upstream errors gracefully (#11512) 2025-07-23 22:16:55 -07:00
frob
802ad16ce4 docs: add the no-Modelfile function of ollama create (#9077) 2025-07-16 22:16:10 -07:00
Parth Sareen
d73f8aa8c3 cmd: add default assistant role to message construction (#11431) 2025-07-16 11:18:16 -07:00
Daniel Hiltgen
34088dbcfb API/CLI context enhancements (#11331)
* API: expose context size of loaded models

* CLI: add context UX

This adds a column in the ps output to show the models context size.
2025-07-08 11:59:06 -07:00
Devon Rifkin
5f57b0ef42 add thinking support to the api and cli (#10584)
- Both `/api/generate` and `/api/chat` now accept a `"think"`
  option that allows specifying whether thinking mode should be on or
  not
- Templates get passed this new option so, e.g., qwen3's template can
  put `/think` or `/no_think` in the system prompt depending on the
  value of the setting
- Models' thinking support is inferred by inspecting model templates.
  The prefix and suffix the parser uses to identify thinking support is
  also automatically inferred from templates
- Thinking control & parsing is opt-in via the API to prevent breaking
  existing API consumers. If the `"think"` option is not specified, the
  behavior is unchanged from previous versions of ollama
- Add parsing for thinking blocks in both streaming/non-streaming mode
  in both `/generate` and `/chat`
- Update the CLI to make use of these changes. Users can pass `--think`
  or `--think=false` to control thinking, or during an interactive
  session they can use the commands `/set think` or `/set nothink`
- A `--hidethinking` option has also been added to the CLI. This makes
  it easy to use thinking in scripting scenarios like
  `ollama run qwen3 --think --hidethinking "my question here"` where you
  just want to see the answer but still want the benefits of thinking
  models
2025-05-28 19:38:52 -07:00
Daniel Hiltgen
7359b02707 win: detect background upgrade in progress (#10785)
Give the user a helpful error instead of showing
connection refused errors.
2025-05-21 10:46:56 -07:00
Daniel Hiltgen
27da2cddc5 Fix lingering Q4_0 help reference (#10720) 2025-05-15 16:33:23 -07:00
Bruce MacDonald
feb8923ada cmd: add ellipses to truncated show metadata (#10717)
When a piece of information has been truncated in the show output an ellipses to indicate that more data has not been displayed
2025-05-15 15:45:52 -07:00
Daniel Hiltgen
424810450f Move quantization to new backend (#10363)
* Move quantization logic to GGML via new backend

This moves the model aware logic to Go code and calls GGMLs quantization code for model creation.

* Remove "add model quantizations"

This is no longer needed now that quantization is implemented in Go+GGML code directly.
2025-05-06 11:20:48 -07:00
Michael Yang
d931ee8f22 create blobs in parallel (#10135)
* default max term height
* error on out of tree files
2025-05-05 11:59:26 -07:00
Devon Rifkin
dd93e1af85 Revert "increase default context length to 4096 (#10364)"
This reverts commit 424f648632.
2025-04-28 16:54:11 -07:00
Devon Rifkin
424f648632 increase default context length to 4096 (#10364)
* increase default context length to 4096

We lower the default numParallel from 4 to 2 and use these "savings" to
double the default context length from 2048 to 4096.

We're memory neutral in cases when we previously would've used
numParallel == 4, but we add the following mitigation to handle some
cases where we would have previously fallen back to 1x2048 due to low
VRAM: we decide between 2048 and 4096 using a runtime check, choosing
2048 if we're on a one GPU system with total VRAM of <= 4 GB. We
purposefully don't check the available VRAM because we don't want the
context window size to change unexpectedly based on the available VRAM.

We plan on making the default even larger, but this is a relatively
low-risk change we can make to quickly double it.

* fix tests

add an explicit context length so they don't get truncated. The code
that converts -1 from being a signal for doing a runtime check isn't
running as part of these tests.

* tweak small gpu message

* clarify context length default

also make it actually show up in `ollama serve --help`
2025-04-22 16:33:24 -07:00
Blake Mizerany
1e7f62cb42 cmd: add retry/backoff (#10069)
This commit adds retry/backoff to the registry client for pull requests.

Also, revert progress indication to match original client's until we can
"get it right."

Also, make WithTrace wrap existing traces instead of clobbering them.
This allows clients to compose traces.
2025-04-15 23:24:44 -07:00