* add ability to disable cloud
Users can now easily opt out of cloud inference and web search by setting
```
"disable_ollama_cloud": true
```
in their `~/.ollama/server.json` settings file. After updating the setting,
the server must be restarted.
Alternatively, setting the environment variable `OLLAMA_NO_CLOUD=1` will
also disable cloud features. While users could previously avoid cloud
models by never pulling them or passing them to `ollama run`, this gives
them an easy way to enforce that decision. Any attempt to run a cloud
model while cloud is disabled will fail.
The app's old "airplane mode" setting, which did a similar thing by
hiding cloud models within the app, has been unified with this new
cloud-disabled mode. That setting has been replaced with a "Cloud"
toggle, which behind the scenes edits `server.json` and then restarts
the server.
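A minimal sketch of how such a check might look, assuming both the
environment variable and the `server.json` key are honored (names are
illustrative; the actual helper in internal/cloud/policy.go may differ):
```go
package cloud

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// Disabled reports whether cloud features should be turned off, honoring
// both OLLAMA_NO_CLOUD and the "disable_ollama_cloud" key in
// ~/.ollama/server.json.
func Disabled() bool {
	if os.Getenv("OLLAMA_NO_CLOUD") == "1" {
		return true
	}

	home, err := os.UserHomeDir()
	if err != nil {
		return false
	}

	data, err := os.ReadFile(filepath.Join(home, ".ollama", "server.json"))
	if err != nil {
		return false // no settings file: cloud stays enabled
	}

	var settings struct {
		DisableOllamaCloud bool `json:"disable_ollama_cloud"`
	}
	if err := json.Unmarshal(data, &settings); err != nil {
		return false
	}
	return settings.DisableOllamaCloud
}
```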
* gate cloud models across TUI and launch flows when cloud is disabled
Block cloud models from being selected, launched, or written to
integration configs when cloud mode is turned off (see the sketch
after this list):
- TUI main menu: open model picker instead of launching with a
disabled cloud model
- cmd.go: add IsCloudModelDisabled checks for all Selection* paths
- LaunchCmd: filter cloud models from saved Editor configs before
launch, fall through to picker if none remain
- Editor Run() methods (droid, opencode, openclaw): filter cloud
models before calling Edit() and persist the cleaned list
- Export SaveIntegration, remove SaveIntegrationModel wrapper that
was accumulating models instead of replacing them
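A minimal sketch of that filtering step, with a hypothetical helper and a
caller-supplied cloud-model check (the real code in cmd.go and the Editor
Run() methods may differ):
```go
package cmd

// filterCloudModels removes cloud models from a saved integration config
// when cloud is disabled, returning the cleaned list. If nothing remains,
// callers fall through to the interactive model picker.
func filterCloudModels(models []string, cloudDisabled bool, isCloudModel func(string) bool) []string {
	if !cloudDisabled {
		return models
	}
	kept := models[:0]
	for _, m := range models {
		if !isCloudModel(m) {
			kept = append(kept, m)
		}
	}
	return kept
}
```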
* rename saveIntegration to SaveIntegration in config.go and tests
* cmd/config: add --model guarding and fix handling of empty model lists
* Update docs/faq.mdx
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update internal/cloud/policy.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update internal/cloud/policy.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Revert "Update internal/cloud/policy.go"
This reverts commit 8bff8615f9.
Since this error shows up in other integrations, we want it to be
prefixed with "Ollama".
* rename cloud status
* more status renaming
* fix tests that weren't updated after rename
---------
Co-authored-by: ParthSareen <parth.sareen@ollama.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
This change fixes an issue where GGML-based models (for either the Ollama runner or
the legacy llama.cpp runner) would try to load the mlx library. That would panic
and the model would fail to start.
This change adds a new MLX based runner which includes:
* Method-based MLX bindings
* Subprocess-based MLX runner (x/mlxrunner)
* KV cache with tree management
* A basic sampler
The GLM4-MoE-Lite model has been ported to use the new bindings.
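As a rough illustration of the "basic sampler" piece, a sketch of greedy
and temperature sampling over a logits slice (this is not the actual
x/mlxrunner implementation; the signature and structure are assumptions):
```go
package sampler

import (
	"math"
	"math/rand"
)

// Sample picks the next token index from a slice of logits. With
// temperature <= 0 it is greedy; otherwise it samples from the softmax
// of logits/temperature.
func Sample(logits []float32, temperature float64, rng *rand.Rand) int {
	if temperature <= 0 {
		best := 0
		for i, l := range logits {
			if l > logits[best] {
				best = i
			}
		}
		return best
	}

	// Softmax with the max subtracted for numerical stability.
	maxLogit := logits[0]
	for _, l := range logits {
		if l > maxLogit {
			maxLogit = l
		}
	}
	weights := make([]float64, len(logits))
	var sum float64
	for i, l := range logits {
		weights[i] = math.Exp(float64(l-maxLogit) / temperature)
		sum += weights[i]
	}

	// Sample proportionally to the unnormalized weights.
	r := rng.Float64() * sum
	for i, w := range weights {
		r -= w
		if r <= 0 {
			return i
		}
	}
	return len(logits) - 1
}
```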
---------
Co-authored-by: Michael Yang <git@mxy.ng>
This change includes:
- changes to the safetensors metadata format
- changes to the create command to properly create the blobs with the new format
- changes to load the new format
- fixes to `ollama show` so it properly displays each tensor
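For context, a safetensors file starts with an 8-byte little-endian header
length followed by a JSON header describing each tensor. A minimal sketch
of reading that metadata (the Ollama-specific format changes above are not
shown; this only illustrates the standard header layout):
```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// readSafetensorsHeader returns the JSON metadata from a .safetensors
// file: an 8-byte little-endian header length followed by the JSON itself.
func readSafetensorsHeader(path string) (map[string]json.RawMessage, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var size uint64
	if err := binary.Read(f, binary.LittleEndian, &size); err != nil {
		return nil, err
	}

	raw := make([]byte, size)
	if _, err := io.ReadFull(f, raw); err != nil {
		return nil, err
	}

	var header map[string]json.RawMessage
	if err := json.Unmarshal(raw, &header); err != nil {
		return nil, err
	}
	return header, nil
}

func main() {
	header, err := readSafetensorsHeader("model.safetensors")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for name := range header {
		fmt.Println(name) // tensor names plus the optional __metadata__ key
	}
}
```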
This adds a new PowerShell install script suitable for running via
```
irm https://ollama.com/install.ps1 | iex
```
If you download the script and run it with '-?', it reports basic usage
information, as well as usage examples for common customization
options. The script is signed as part of the release process
to ensure it can run on a typically configured Windows system.
This does not include doc updates - we can merge those after a release
ships to avoid user confusion.
Set ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL,
ANTHROPIC_DEFAULT_HAIKU_MODEL, and CLAUDE_CODE_SUBAGENT_MODEL when
launching Claude Code so all model tiers route through Ollama.
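A minimal sketch of setting these variables at launch time (the `claude`
command name, model value, and surrounding helper are illustrative
assumptions, not the actual integration code):
```go
package launcher

import (
	"os"
	"os/exec"
)

// launchClaudeCode starts the Claude Code CLI with all Anthropic model
// tiers pointed at the selected Ollama model.
func launchClaudeCode(model string) error {
	cmd := exec.Command("claude") // assumed CLI entry point
	cmd.Env = append(os.Environ(),
		"ANTHROPIC_DEFAULT_OPUS_MODEL="+model,
		"ANTHROPIC_DEFAULT_SONNET_MODEL="+model,
		"ANTHROPIC_DEFAULT_HAIKU_MODEL="+model,
		"CLAUDE_CODE_SUBAGENT_MODEL="+model,
	)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}
```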
Allow installing Ollama on macOS directly from the command line. This is in line with other CLI tools and results in a more streamlined experience when the user wants to use the CLI specifically.
When numPredict is set, the user will receive one fewer token
than the requested limit. In addition, the stats will incorrectly
report the number of tokens returned as the limit. When
numPredict is not set, the number of tokens is reported correctly.
This occurs because numPredict is checked when setting up the next
batch, but hitting the limit also terminates the current batch early.
Instead, it is better to check the limit as tokens are actually predicted.
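An illustrative sketch of the corrected check (not the actual runner code):
the limit is evaluated after each predicted token is appended, so exactly
numPredict tokens are returned and counted in the stats.
```go
package runner

// generate predicts tokens until either EOS or the numPredict limit,
// checking the limit as each token is produced rather than when the
// next batch is scheduled.
func generate(numPredict, eos int, predictNext func() int) []int {
	var tokens []int
	for {
		tok := predictNext()
		if tok == eos {
			break
		}
		tokens = append(tokens, tok)
		if numPredict > 0 && len(tokens) >= numPredict {
			break
		}
	}
	return tokens
}
```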