Compare commits

..

119 Commits

Author SHA1 Message Date
Michael Yang
adeb40eaf2 Merge pull request #4231 from ollama/mxyng/parser
types/model: fix parser for empty values
2024-05-07 10:48:32 -07:00
Michael Yang
d7d33e5255 Merge pull request #951 from ollama/mxyng/example-fly
fly example
2024-05-07 10:46:24 -07:00
Michael Yang
63bc884e25 types/model: fix parser for empty values 2024-05-07 10:44:43 -07:00
Michael Yang
ef4e095d24 Merge pull request #4232 from ollama/revert-4190-fix/golang-ci
Revert "fix golangci workflow not enable gofmt and goimports"
2024-05-07 10:39:37 -07:00
Michael Yang
4d4f75a8a8 Revert "fix golangci workflow missing gofmt and goimports (#4190)"
This reverts commit 04f971c84b.
2024-05-07 10:35:44 -07:00
Mélony QIN
3f71ba406a Correct the kubernetes terminology (#3843)
* add details on kubernetes deployment and separate the testing process

* Update examples/kubernetes/README.md

thanks for suggesting this change, I agree with you and let's make this project better together !

Co-authored-by: JonZeolla <Zeolla@gmail.com>

---------

Co-authored-by: QIN Mélony <MQN1@dsone.3ds.com>
Co-authored-by: JonZeolla <Zeolla@gmail.com>
2024-05-07 09:53:08 -07:00
Hause Lin
88a67127d8 Update README.md to include ollama-r library (#4012)
* Update README.md

Add Ollama for R - ollama-r library

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:52:30 -07:00
Jeffrey Morgan
f7dc7dcc64 Update .gitattributes 2024-05-07 09:50:19 -07:00
alwqx
04f971c84b fix golangci workflow missing gofmt and goimports (#4190) 2024-05-07 09:49:40 -07:00
Michael Yang
70edb9bc4d Merge pull request #4215 from ollama/mxyng/mem
llm: add minimum based on layer size
2024-05-07 09:26:33 -07:00
Michael Yang
3f0ed03856 Update examples/flyio/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-07 09:25:01 -07:00
Michael Yang
4736391bfb llm: add minimum based on layer size 2024-05-06 17:04:19 -07:00
CrispStrobe
7c5330413b note on naming restrictions (#2625)
* note on naming restrictions

else push would fail with cryptic
retrieving manifest 
Error: file does not exist
==> maybe change that in code too

* Update docs/import.md

---------

Co-authored-by: C-4-5-3 <154636388+C-4-5-3@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 16:03:21 -07:00
Jeffrey Morgan
39d9d22ca3 close server on receiving signal (#4213) 2024-05-06 16:01:37 -07:00
Jackie Li
af47413dba Add MarshalJSON to Duration (#3284)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-06 15:59:18 -07:00
Jeffrey Chen
d091fe3c21 Windows automatically recognizes username (#3214) 2024-05-06 15:03:14 -07:00
Mohamed A. Fouad
ee02f548c8 Update linux.md (#3847)
Add -e to viewing logs in order to show end of ollama logs
2024-05-06 15:02:25 -07:00
Daniel Hiltgen
b08870aff3 Merge pull request #4188 from dhiltgen/use_our_lib
User our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00
Darinka
3ecae420ac Update api.md (#3945)
* Update api.md

Changed the calculation of tps (token/s) in the documentation

* Update docs/api.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-06 14:39:58 -07:00
Daniel Hiltgen
4cbbf0e13b Merge pull request #4090 from dhiltgen/rocm_paths
Support Fedoras standard ROCm location
2024-05-06 14:33:41 -07:00
Daniel Hiltgen
380378cc80 Use our libraries first
Trying to live off the land for cuda libraries was not the right strategy.  We need to use the version we compiled against to ensure things work properly
2024-05-06 14:23:29 -07:00
Daniel Hiltgen
0963c65027 Merge pull request #4208 from dhiltgen/fix_sched_test
Fix stale test logic
2024-05-06 14:23:12 -07:00
Jeffrey Morgan
ed740a2504 Fix no slots available error with concurrent requests (#4160) 2024-05-06 14:22:53 -07:00
Jeffrey Morgan
c9f98622b1 Skip scheduling cancelled requests, always reload unloaded runners (#4189) 2024-05-06 14:22:24 -07:00
Daniel Hiltgen
0a954e5066 Fix stale test logic
The model processing was recently changed to be deferred but
this test scenario hadn't been adjusted for that change in behavior.
2024-05-06 14:15:37 -07:00
Adrien Brault
aa93423fbf docs: pbcopy on mac (#3129) 2024-05-06 13:47:00 -07:00
Nurgo
01c9386267 Add BrainSoup to compatible clients list (#3473) 2024-05-06 13:42:16 -07:00
Daniel Hiltgen
af9eb36f9f Merge pull request #4135 from dhiltgen/no_physx
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen
06093fd396 Merge pull request #4067 from dhiltgen/cudart
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Tony Loehr
86b7fcac32 Update README.md with StreamDeploy (#3621)
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-05-06 11:14:41 -07:00
Hyden Liu
fb8ddc564e chore: delete HEAD (#4194) 2024-05-06 10:32:30 -07:00
Saif
242efe6611 👌 IMPROVE: add portkey library for production tools (#4119) 2024-05-06 10:25:23 -07:00
Jeffrey Morgan
1b0e6c9c0e Fix llava models not working after first request (#4164)
* fix llava models not working after first request

* individual requests only for llava models
2024-05-05 20:50:31 -07:00
Jeffrey Morgan
dfa2f32ca0 unload in critical section (#4187) 2024-05-05 17:18:27 -07:00
Daniel Hiltgen
840424a2c4 Merge pull request #4154 from dhiltgen/central_config
Centralize server config handling
2024-05-05 17:08:26 -07:00
Daniel Hiltgen
f56aa20014 Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
alwqx
6707768ebd chore: format go code (#4149) 2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING
c78bb76a12 update libraries for langchain_community + llama3 changed from llama2 (#4174) 2024-05-05 16:07:04 -07:00
Jeffrey Morgan
942c979232 allocate a large enough kv cache for all parallel requests (#4162) 2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning
06164911dd Update README.md (#4111)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-05 14:45:32 -07:00
Patrick Devine
2a21363bb7 validate the format of the digest when getting the model path (#4175) 2024-05-05 11:46:12 -07:00
Daniel Hiltgen
026869915f Merge pull request #4144 from dhiltgen/max_queue
Make maximum pending request configurable
2024-05-05 10:53:44 -07:00
Daniel Hiltgen
45d61aaaa3 Add integration test to push max queue limits 2024-05-05 10:46:25 -07:00
Daniel Hiltgen
20f6c06569 Make maximum pending request configurable
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Daniel Hiltgen
371f5e52aa Merge pull request #4141 from dhiltgen/win_docs
Explain the 2 different windows download options
2024-05-04 12:50:16 -07:00
Daniel Hiltgen
e006480e49 Explain the 2 different windows download options 2024-05-04 12:50:05 -07:00
Michael Yang
aed545872d Merge pull request #4143 from ollama/mxyng/final-response
omit prompt and generate settings from final response
2024-05-03 17:39:49 -07:00
Michael Yang
44869c59d6 omit prompt and generate settings from final response 2024-05-03 17:00:02 -07:00
Daniel Hiltgen
52663284cf Merge pull request #4145 from dhiltgen/fix_lint
Fix lint warnings
2024-05-03 16:53:17 -07:00
Daniel Hiltgen
42fa9d7f0a Fix lint warnings 2024-05-03 16:44:19 -07:00
Michael Yang
b7a87a22b6 Merge pull request #4059 from ollama/mxyng/parser-2
rename parser to model/file
2024-05-03 13:01:22 -07:00
Dr Nic Williams
e8aaea030e Update 'llama2' -> 'llama3' in most places (#4116)
* Update 'llama2' -> 'llama3' in most places

---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-03 15:25:04 -04:00
Daniel Hiltgen
b1ad3a43cb Skip PhysX cudart library
For some reason this library gives incorrect GPU information, so skip it
2024-05-03 11:55:32 -07:00
Daniel Hiltgen
267e25a750 Merge pull request #4129 from dhiltgen/unit_tests
Soften timeouts on sched unit tests
2024-05-03 11:10:26 -07:00
Daniel Hiltgen
9a32c514cb Soften timeouts on sched unit tests
This gives us more headroom on the scheduler tests to tamp
down some flakes.
2024-05-03 09:08:33 -07:00
Michael Yang
e9ae607ece Merge pull request #3892 from ollama/mxyng/parser
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang
93707fa3f2 Merge pull request #4108 from ollama/mxyng/lf
fix line ending
2024-05-02 14:55:15 -07:00
Michael Yang
94c369095f fix line ending
replace CRLF with LF
2024-05-02 14:53:13 -07:00
Jeffrey Morgan
9164b0161b Update .gitattributes 2024-05-02 14:06:31 -04:00
Daniel Hiltgen
e592e8fccb Support Fedoras standard ROCm location 2024-05-01 15:47:12 -07:00
Bryce Reitano
bf4fc25f7b Add a /clear command (#3947)
* Add a /clear command

* change help messages

---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2024-05-01 17:44:36 -04:00
Michael Yang
5b806d8d24 Merge pull request #4089 from ollama/mxyng/target-invalid
server: destination invalid
2024-05-01 12:46:35 -07:00
Michael Yang
cb1e072643 Merge pull request #4087 from ollama/mxyng/fix-host-port
types/model: fix name for hostport
2024-05-01 12:42:07 -07:00
Michael Yang
45b6a12e45 server: target invalid 2024-05-01 12:40:45 -07:00
alwqx
68755f1f5e chore: fix typo in docs/development.md (#4073) 2024-05-01 15:39:11 -04:00
Michael Yang
997a455039 want filepath 2024-05-01 12:33:41 -07:00
Michael Yang
88775e1ff9 strip scheme from name 2024-05-01 12:26:19 -07:00
Michael Yang
8867e744ff types/model: fix name for hostport 2024-05-01 12:14:53 -07:00
Daniel Hiltgen
4fd064bea6 Merge pull request #4031 from MarkWard0110/fix/issue-3736
Fix/issue 3736: When runners are closing or expiring. Scheduler is getting dirty VRAM size readings.
2024-05-01 12:13:26 -07:00
Jeffrey Morgan
59fbceedcc use lf for line endings (#4085) 2024-05-01 15:02:45 -04:00
Mark Ward
321d57e1a0 Removing go routine calling .wait from load. 2024-05-01 18:51:10 +00:00
Mark Ward
ba26c7aa00 it will always return an error due to Kill() discarding Wait() errors 2024-05-01 18:51:10 +00:00
Mark Ward
63c763685f log when the waiting for the process to stop to help debug when other tasks execute during this wait.
expire timer clear the timer reference because it will not be reused.
close will clean up expireTimer if calling code has not already done this.
2024-05-01 18:51:10 +00:00
Mark Ward
34a4a94f13 ignore debug bin files 2024-05-01 18:51:10 +00:00
Mark Ward
f4a73d57a4 fix runner expire during active use. Clearing the expire timer as it is used. Allowing the finish to assign an expire timer so that the runner will expire after no use. 2024-05-01 18:51:10 +00:00
Mark Ward
948114e3e3 fix sched to wait for the runner to terminate to ensure following vram check will be more accurate 2024-05-01 18:51:10 +00:00
Arpit Jain
a3e60d9058 README.md: fix typos (#4007)
Co-authored-by: Blake Mizerany <blake.mizerany@gmail.com>
2024-05-01 10:39:38 -07:00
Michael Yang
8acb233668 use strings.Builder 2024-05-01 10:01:09 -07:00
Michael Yang
119589fcb3 rename parser to model/file 2024-05-01 09:53:50 -07:00
Michael Yang
5ea844964e cmd: import regexp 2024-05-01 09:53:45 -07:00
Michael Yang
bd8eed57fc fix parser name 2024-05-01 09:52:54 -07:00
Michael Yang
9cf0f2e973 use parser.Format instead of templating modelfile 2024-05-01 09:52:54 -07:00
Michael Yang
176ad3aa6e parser: add commands format 2024-05-01 09:52:54 -07:00
Michael Yang
4d08363580 comments 2024-05-01 09:52:54 -07:00
Michael Yang
8907bf51d2 fix multiline 2024-05-01 09:52:54 -07:00
Michael Yang
abe614c705 tests 2024-05-01 09:52:54 -07:00
Michael Yang
238715037d linting 2024-05-01 09:52:54 -07:00
Michael Yang
c0a00f68ae refactor modelfile parser 2024-05-01 09:52:54 -07:00
Jeffrey Morgan
f0c454ab57 gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068) 2024-05-01 11:46:03 -04:00
Daniel Hiltgen
089daaeabc Add CUDA Driver API for GPU discovery
We're seeing some corner cases with cudart which might be resolved by
switching to the driver API which comes bundled with the driver package
2024-04-30 18:00:45 -07:00
Blake Mizerany
b9f74ff3d6 types/model: reintroduce Digest (#4065) 2024-04-30 16:38:03 -07:00
jmorganca
fcf4d60eee llm: add back check for empty token cache 2024-04-30 17:38:44 -04:00
jmorganca
e33d5c2dbc update llama.cpp commit to 952d03d 2024-04-30 17:31:20 -04:00
Jeffrey Morgan
18d9a7e1f1 update llama.cpp submodule to f364eb6 (#4060) 2024-04-30 17:25:39 -04:00
Michael
8488388cbd Update README.md 2024-04-30 15:45:56 -04:00
Blake Mizerany
588901f449 types/model: reduce Name.Filepath allocs from 5 to 2 (#4039) 2024-04-30 11:09:19 -07:00
Bruce MacDonald
0a7fdbe533 prompt to display and add local ollama keys to account (#3717)
- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied
2024-04-30 11:02:08 -07:00
Christian Frantzen
5950c176ca Update langchainpy.md (#4037)
Updated the code a bit
2024-04-29 23:19:06 -04:00
Daniel Hiltgen
23d23409a0 Update llama.cpp (#4036)
* Bump llama.cpp to b2761

* Adjust types for bump
2024-04-29 23:18:48 -04:00
Patrick Devine
9009bedf13 better checking for OLLAMA_HOST variable (#3661) 2024-04-29 19:14:07 -04:00
Daniel Hiltgen
d4ac57e240 Merge pull request #4035 from dhiltgen/fix_relative_paths
Fix relative path lookup
2024-04-29 16:08:06 -07:00
Daniel Hiltgen
7b59d1770f Fix relative path lookup 2024-04-29 16:00:08 -07:00
Jeffrey Morgan
95ead8ffba Restart server on failure when running Windows app (#3985)
* app: restart server on failure

* fix linter

* address comments

* refactor log directory creation to be where logs are written

* check all log dir creation errors
2024-04-29 10:07:52 -04:00
Jeffrey Morgan
7aa08a77ca llm: dont cap context window limit to training context window (#3988) 2024-04-29 10:07:30 -04:00
Blake Mizerany
7e432cdfac types/model: remove old comment (#4020) 2024-04-28 20:52:26 -07:00
Jeffrey Morgan
586672f490 fix copying model to itself (#4019) 2024-04-28 23:47:49 -04:00
Daniel Hiltgen
b03408de74 Merge pull request #3972 from hmartinez82/win_arm64
Add support for building on Windows ARM64
2024-04-28 14:52:58 -07:00
Daniel Hiltgen
1e6a28bf5b Merge pull request #4009 from dhiltgen/cpu_concurrency
Fix concurrency for CPU mode
2024-04-28 14:20:27 -07:00
Daniel Hiltgen
d6e3b64582 Fix concurrency for CPU mode
Prior refactoring passes accidentally removed the logic to bypass VRAM
checks for CPU loads.  This adds that back, along with test coverage.

This also fixes loaded map access in the unit test to be behind the mutex which was
likely the cause of various flakes in the tests.
2024-04-28 13:42:39 -07:00
Blake Mizerany
114c932a8e types/model: allow _ as starter character in Name parts (#3991) 2024-04-27 21:24:52 -07:00
Jeffrey Morgan
7f7103de06 mac: update setup command to llama3 (#3986) 2024-04-27 22:52:10 -04:00
Blake Mizerany
c631a9c726 types/model: relax name length constraint from 2 to 1 (#3984) 2024-04-27 17:58:41 -07:00
Blake Mizerany
8fd9e56804 types/structs: drop unused structs package (#3981) 2024-04-27 14:06:11 -07:00
Hernan Martinez
8a65717f55 Do not build AVX runners on ARM64 2024-04-26 23:55:32 -06:00
Hernan Martinez
6d3152a98a Use architecture specific folders in installer script 2024-04-26 23:35:16 -06:00
Hernan Martinez
b438d485f1 Use architecture specific folders in the generate script 2024-04-26 23:34:12 -06:00
Hernan Martinez
204349b17b Use architecture specific folders in the build script 2024-04-26 23:26:03 -06:00
Hernan Martinez
86e67fc4a9 Add import declaration for windows,arm64 to llm.go 2024-04-26 23:23:53 -06:00
Michael Yang
b99c291f47 fly example 2023-11-01 14:58:20 -07:00
142 changed files with 20811 additions and 1752 deletions

1
.gitignore vendored
View File

@@ -12,3 +12,4 @@ ggml-metal.metal
test_data
*.crt
llm/build
__debug_bin*

View File

@@ -1,5 +1,5 @@
<div align="center">
<img alt="ollama" height="200px" src="https://github.com/ollama/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
 <img alt="ollama" height="200px" src="https://github.com/ollama/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
</div>
# Ollama
@@ -51,7 +51,7 @@ Here are some example models that can be downloaded:
| ------------------ | ---------- | ----- | ------------------------------ |
| Llama 3 | 8B | 4.7GB | `ollama run llama3` |
| Llama 3 | 70B | 40GB | `ollama run llama3:70b` |
| Phi-3 | 3,8B | 2.3GB | `ollama run phi3` |
| Phi-3 | 3.8B | 2.3GB | `ollama run phi3` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
@@ -173,7 +173,7 @@ I'm a basic program that prints the famous "Hello, world!" message to the consol
The image features a yellow smiley face, which is likely the central focus of the picture.
```
### Pass in prompt as arguments
### Pass the prompt as an argument
```
$ ollama run llama3 "Summarize this file: $(cat README.md)"
@@ -284,17 +284,19 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [OllamaGUI](https://github.com/enoch1118/ollamaGUI)
- [OpenAOE](https://github.com/InternLM/OpenAOE)
- [Odin Runes](https://github.com/leonid20000/OdinRunes)
- [LLM-X: Progressive Web App](https://github.com/mrdjohnson/llm-x)
- [LLM-X](https://github.com/mrdjohnson/llm-x) (Progressive Web App)
- [AnythingLLM (Docker + MacOs/Windows/Linux native app)](https://github.com/Mintplex-Labs/anything-llm)
- [Ollama Basic Chat: Uses HyperDiv Reactive UI](https://github.com/rapidarchitect/ollama_basic_chat)
- [Ollama-chats RPG](https://github.com/drazdra/ollama-chats)
- [QA-Pilot: Chat with Code Repository](https://github.com/reid41/QA-Pilot)
- [ChatOllama: Open Source Chatbot based on Ollama with Knowledge Bases](https://github.com/sugarforever/chat-ollama)
- [CRAG Ollama Chat: Simple Web Search with Corrective RAG](https://github.com/Nagi-ovo/CRAG-Ollama-Chat)
- [RAGFlow: Open-source Retrieval-Augmented Generation engine based on deep document understanding](https://github.com/infiniflow/ragflow)
- [chat: chat web app for teams](https://github.com/swuecho/chat)
- [QA-Pilot](https://github.com/reid41/QA-Pilot) (Chat with Code Repository)
- [ChatOllama](https://github.com/sugarforever/chat-ollama) (Open Source Chatbot based on Ollama with Knowledge Bases)
- [CRAG Ollama Chat](https://github.com/Nagi-ovo/CRAG-Ollama-Chat) (Simple Web Search with Corrective RAG)
- [RAGFlow](https://github.com/infiniflow/ragflow) (Open-source Retrieval-Augmented Generation engine based on deep document understanding)
- [StreamDeploy](https://github.com/StreamDeploy-DevRel/streamdeploy-llm-app-scaffold) (LLM Application Scaffold)
- [chat](https://github.com/swuecho/chat) (chat web app for teams)
- [Lobe Chat](https://github.com/lobehub/lobe-chat) with [Integrating Doc](https://lobehub.com/docs/self-hosting/examples/ollama)
- [Ollama RAG Chatbot: Local Chat with multiples PDFs using Ollama and RAG.](https://github.com/datvodinh/rag-chatbot.git)
- [Ollama RAG Chatbot](https://github.com/datvodinh/rag-chatbot.git) (Local Chat with multiple PDFs using Ollama and RAG)
- [BrainSoup](https://www.nurgo-software.com/products/brainsoup) (Flexible native client with RAG & multi-agent automation)
### Terminal
@@ -348,9 +350,11 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/ollama.md)
- [Elixir LangChain](https://github.com/brainlid/langchain)
- [Ollama for R - rollama](https://github.com/JBGruber/rollama)
- [Ollama for R - ollama-r](https://github.com/hauselin/ollama-r)
- [Ollama-ex for Elixir](https://github.com/lebrunel/ollama-ex)
- [Ollama Connector for SAP ABAP](https://github.com/b-tocs/abap_btocs_ollama)
- [Testcontainers](https://testcontainers.com/modules/ollama/)
- [Portkey](https://portkey.ai/docs/welcome/integration-guides/ollama)
### Mobile
@@ -370,12 +374,13 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [Ollama Telegram Bot](https://github.com/ruecat/ollama-telegram)
- [Hass Ollama Conversation](https://github.com/ej52/hass-ollama-conversation)
- [Rivet plugin](https://github.com/abrenneke/rivet-plugin-ollama)
- [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama)
- [Obsidian BMO Chatbot plugin](https://github.com/longy2k/obsidian-bmo-chatbot)
- [Cliobot](https://github.com/herval/cliobot) (Telegram bot with Ollama support)
- [Copilot for Obsidian plugin](https://github.com/logancyang/obsidian-copilot)
- [Obsidian Local GPT plugin](https://github.com/pfrankov/obsidian-local-gpt)
- [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama)
- [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama)
- [Ollama Copilot](https://github.com/bernardo-bruning/ollama-copilot) (Proxy that allows you to use ollama as a copilot like Github copilot)
- [twinny](https://github.com/rjmacarthy/twinny) (Copilot and Copilot chat alternative using Ollama)
- [Wingman-AI](https://github.com/RussellCanfield/wingman-ai) (Copilot code and chat alternative using Ollama and HuggingFace)
- [Page Assist](https://github.com/n4ze3m/page-assist) (Chrome Extension)
@@ -384,4 +389,5 @@ See the [API documentation](./docs/api.md) for all endpoints.
- [Discord-Ollama Chat Bot](https://github.com/kevinthedang/discord-ollama) (Generalized TypeScript Discord Bot w/ Tuning Documentation)
### Supported backends
- [llama.cpp](https://github.com/ggerganov/llama.cpp) project founded by Georgi Gerganov.
- [llama.cpp](https://github.com/ggerganov/llama.cpp) project founded by Georgi Gerganov.

View File

@@ -18,6 +18,7 @@ import (
"net/url"
"os"
"runtime"
"strconv"
"strings"
"github.com/ollama/ollama/format"
@@ -57,12 +58,36 @@ func checkError(resp *http.Response, body []byte) error {
// If the variable is not specified, a default ollama host and port will be
// used.
func ClientFromEnvironment() (*Client, error) {
ollamaHost, err := GetOllamaHost()
if err != nil {
return nil, err
}
return &Client{
base: &url.URL{
Scheme: ollamaHost.Scheme,
Host: net.JoinHostPort(ollamaHost.Host, ollamaHost.Port),
},
http: http.DefaultClient,
}, nil
}
type OllamaHost struct {
Scheme string
Host string
Port string
}
func GetOllamaHost() (OllamaHost, error) {
defaultPort := "11434"
scheme, hostport, ok := strings.Cut(os.Getenv("OLLAMA_HOST"), "://")
hostVar := os.Getenv("OLLAMA_HOST")
hostVar = strings.TrimSpace(strings.Trim(strings.TrimSpace(hostVar), "\"'"))
scheme, hostport, ok := strings.Cut(hostVar, "://")
switch {
case !ok:
scheme, hostport = "http", os.Getenv("OLLAMA_HOST")
scheme, hostport = "http", hostVar
case scheme == "http":
defaultPort = "80"
case scheme == "https":
@@ -82,12 +107,14 @@ func ClientFromEnvironment() (*Client, error) {
}
}
return &Client{
base: &url.URL{
Scheme: scheme,
Host: net.JoinHostPort(host, port),
},
http: http.DefaultClient,
if portNum, err := strconv.ParseInt(port, 10, 32); err != nil || portNum > 65535 || portNum < 0 {
return OllamaHost{}, ErrInvalidHostPort
}
return OllamaHost{
Scheme: scheme,
Host: host,
Port: port,
}, nil
}

View File

@@ -1,6 +1,12 @@
package api
import "testing"
import (
"fmt"
"net"
"testing"
"github.com/stretchr/testify/assert"
)
func TestClientFromEnvironment(t *testing.T) {
type testCase struct {
@@ -40,4 +46,40 @@ func TestClientFromEnvironment(t *testing.T) {
}
})
}
hostTestCases := map[string]*testCase{
"empty": {value: "", expect: "127.0.0.1:11434"},
"only address": {value: "1.2.3.4", expect: "1.2.3.4:11434"},
"only port": {value: ":1234", expect: ":1234"},
"address and port": {value: "1.2.3.4:1234", expect: "1.2.3.4:1234"},
"hostname": {value: "example.com", expect: "example.com:11434"},
"hostname and port": {value: "example.com:1234", expect: "example.com:1234"},
"zero port": {value: ":0", expect: ":0"},
"too large port": {value: ":66000", err: ErrInvalidHostPort},
"too small port": {value: ":-1", err: ErrInvalidHostPort},
"ipv6 localhost": {value: "[::1]", expect: "[::1]:11434"},
"ipv6 world open": {value: "[::]", expect: "[::]:11434"},
"ipv6 no brackets": {value: "::1", expect: "[::1]:11434"},
"ipv6 + port": {value: "[::1]:1337", expect: "[::1]:1337"},
"extra space": {value: " 1.2.3.4 ", expect: "1.2.3.4:11434"},
"extra quotes": {value: "\"1.2.3.4\"", expect: "1.2.3.4:11434"},
"extra space+quotes": {value: " \" 1.2.3.4 \" ", expect: "1.2.3.4:11434"},
"extra single quotes": {value: "'1.2.3.4'", expect: "1.2.3.4:11434"},
}
for k, v := range hostTestCases {
t.Run(k, func(t *testing.T) {
t.Setenv("OLLAMA_HOST", v.value)
oh, err := GetOllamaHost()
if err != v.err {
t.Fatalf("expected %s, got %s", v.err, err)
}
if err == nil {
host := net.JoinHostPort(oh.Host, oh.Port)
assert.Equal(t, v.expect, host, fmt.Sprintf("%s: expected %s, got %s", k, v.expect, host))
}
})
}
}

View File

@@ -309,6 +309,7 @@ func (m *Metrics) Summary() {
}
var ErrInvalidOpts = errors.New("invalid options")
var ErrInvalidHostPort = errors.New("invalid port specified in OLLAMA_HOST")
func (opts *Options) FromMap(m map[string]interface{}) error {
valueOpts := reflect.ValueOf(opts).Elem() // names of the fields in the options struct
@@ -435,6 +436,13 @@ type Duration struct {
time.Duration
}
func (d Duration) MarshalJSON() ([]byte, error) {
if d.Duration < 0 {
return []byte("-1"), nil
}
return []byte("\"" + d.Duration.String() + "\""), nil
}
func (d *Duration) UnmarshalJSON(b []byte) (err error) {
var v any
if err := json.Unmarshal(b, &v); err != nil {
@@ -448,7 +456,7 @@ func (d *Duration) UnmarshalJSON(b []byte) (err error) {
if t < 0 {
d.Duration = time.Duration(math.MaxInt64)
} else {
d.Duration = time.Duration(t * float64(time.Second))
d.Duration = time.Duration(int(t) * int(time.Second))
}
case string:
d.Duration, err = time.ParseDuration(t)
@@ -458,6 +466,8 @@ func (d *Duration) UnmarshalJSON(b []byte) (err error) {
if d.Duration < 0 {
d.Duration = time.Duration(math.MaxInt64)
}
default:
return fmt.Errorf("Unsupported type: '%s'", reflect.TypeOf(v))
}
return nil

View File

@@ -21,6 +21,11 @@ func TestKeepAliveParsingFromJSON(t *testing.T) {
req: `{ "keep_alive": 42 }`,
exp: &Duration{42 * time.Second},
},
{
name: "Positive Float",
req: `{ "keep_alive": 42.5 }`,
exp: &Duration{42 * time.Second},
},
{
name: "Positive Integer String",
req: `{ "keep_alive": "42m" }`,
@@ -31,6 +36,11 @@ func TestKeepAliveParsingFromJSON(t *testing.T) {
req: `{ "keep_alive": -1 }`,
exp: &Duration{math.MaxInt64},
},
{
name: "Negative Float",
req: `{ "keep_alive": -3.14 }`,
exp: &Duration{math.MaxInt64},
},
{
name: "Negative Integer String",
req: `{ "keep_alive": "-1m" }`,
@@ -48,3 +58,50 @@ func TestKeepAliveParsingFromJSON(t *testing.T) {
})
}
}
func TestDurationMarshalUnmarshal(t *testing.T) {
tests := []struct {
name string
input time.Duration
expected time.Duration
}{
{
"negative duration",
time.Duration(-1),
time.Duration(math.MaxInt64),
},
{
"positive duration",
time.Duration(42 * time.Second),
time.Duration(42 * time.Second),
},
{
"another positive duration",
time.Duration(42 * time.Minute),
time.Duration(42 * time.Minute),
},
{
"zero duration",
time.Duration(0),
time.Duration(0),
},
{
"max duration",
time.Duration(math.MaxInt64),
time.Duration(math.MaxInt64),
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
b, err := json.Marshal(Duration{test.input})
require.NoError(t, err)
var d Duration
err = json.Unmarshal(b, &d)
require.NoError(t, err)
assert.Equal(t, test.expected, d.Duration, "input %v, marshalled %v, got %v", test.input, string(b), d.Duration)
})
}
}

1
app/.gitignore vendored
View File

@@ -1,2 +1 @@
ollama.syso
app

View File

@@ -1,7 +0,0 @@
#import <Cocoa/Cocoa.h>
@interface AppDelegate : NSObject <NSApplicationDelegate>
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification;
@end

View File

@@ -1,6 +1,10 @@
# Ollama App
## macOS
## Linux
TODO
## MacOS
TODO

View File

@@ -1,76 +0,0 @@
package main
// #cgo CFLAGS: -x objective-c
// #cgo LDFLAGS: -framework Cocoa -framework LocalAuthentication -framework ServiceManagement
// #include "app_darwin.h"
import "C"
import (
"context"
"fmt"
"log/slog"
"os"
"path/filepath"
"syscall"
)
func init() {
home, err := os.UserHomeDir()
if err != nil {
panic(err)
}
ServerLogFile = filepath.Join(home, ".ollama", "logs", "server.log")
}
func run() {
initLogging()
slog.Info("ollama macOS app started")
// Ask to move to applications directory
moving := C.askToMoveToApplications()
if moving {
return
}
C.killOtherInstances()
code := C.installSymlink()
if code != 0 {
slog.Error("Failed to install symlink")
}
exe, err := os.Executable()
if err != nil {
panic(err)
}
var options ServerOptions
ctx, cancel := context.WithCancel(context.Background())
var done chan int
done, err = SpawnServer(ctx, filepath.Join(filepath.Dir(exe), "..", "Resources", "ollama"), options)
if err != nil {
slog.Error(fmt.Sprintf("Failed to spawn ollama server %s", err))
done = make(chan int, 1)
done <- 1
}
// Run the native macOS app
// Note: this will block until the app is closed
C.run()
slog.Info("ollama macOS app closed")
cancel()
slog.Info("Waiting for ollama server to shutdown...")
if done != nil {
<-done
}
slog.Info("Ollama app exiting")
}
//export Quit
func Quit() {
syscall.Kill(os.Getpid(), syscall.SIGTERM)
}

View File

@@ -1,13 +0,0 @@
#import <Cocoa/Cocoa.h>
@interface AppDelegate : NSObject <NSApplicationDelegate>
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification;
@end
void run();
void killOtherInstances();
bool askToMoveToApplications();
int createSymlinkWithAuthorization();
int installSymlink();
extern void Restart();
extern void Quit();

View File

@@ -1,282 +0,0 @@
#import <AppKit/AppKit.h>
#import <Cocoa/Cocoa.h>
#import <CoreServices/CoreServices.h>
#import <Security/Security.h>
#import <ServiceManagement/ServiceManagement.h>
#import "app_darwin.h"
@interface AppDelegate ()
@property (strong, nonatomic) NSStatusItem *statusItem;
@end
@implementation AppDelegate
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification {
// show status menu
NSMenu *menu = [[NSMenu alloc] init];
NSMenuItem *aboutMenuItem = [[NSMenuItem alloc] initWithTitle:@"About Ollama" action:@selector(aboutOllama) keyEquivalent:@""];
[aboutMenuItem setTarget:self];
[menu addItem:aboutMenuItem];
// Settings submenu
NSMenu *settingsMenu = [[NSMenu alloc] initWithTitle:@"Settings"];
// Submenu items
NSMenuItem *chooseModelDirectoryItem = [[NSMenuItem alloc] initWithTitle:@"Choose model directory..." action:@selector(chooseModelDirectory) keyEquivalent:@""];
[chooseModelDirectoryItem setTarget:self];
[chooseModelDirectoryItem setEnabled:YES];
[settingsMenu addItem:chooseModelDirectoryItem];
NSMenuItem *exposeExternallyItem = [[NSMenuItem alloc] initWithTitle:@"Allow external connections" action:@selector(toggleExposeExternally:) keyEquivalent:@""];
[exposeExternallyItem setTarget:self];
[exposeExternallyItem setState:NSOffState]; // Set initial state to off
[exposeExternallyItem setEnabled:YES];
[settingsMenu addItem:exposeExternallyItem];
NSMenuItem *allowCrossOriginItem = [[NSMenuItem alloc] initWithTitle:@"Allow browser requests" action:@selector(toggleCrossOrigin:) keyEquivalent:@""];
[allowCrossOriginItem setTarget:self];
[allowCrossOriginItem setState:NSOffState]; // Set initial state to off
[allowCrossOriginItem setEnabled:YES];
[settingsMenu addItem:allowCrossOriginItem];
NSMenuItem *settingsMenuItem = [[NSMenuItem alloc] initWithTitle:@"Settings" action:nil keyEquivalent:@""];
[settingsMenuItem setSubmenu:settingsMenu];
[menu addItem:settingsMenuItem];
[menu addItemWithTitle:@"Quit Ollama" action:@selector(quit) keyEquivalent:@"q"];
self.statusItem = [[NSStatusBar systemStatusBar] statusItemWithLength:NSVariableStatusItemLength];
[self.statusItem addObserver:self forKeyPath:@"button.effectiveAppearance" options:NSKeyValueObservingOptionNew|NSKeyValueObservingOptionInitial context:nil];
self.statusItem.menu = menu;
[self showIcon];
}
- (void)aboutOllama {
[[NSApplication sharedApplication] orderFrontStandardAboutPanel:nil];
}
- (void)toggleCrossOrigin:(id)sender {
NSMenuItem *item = (NSMenuItem *)sender;
if ([item state] == NSOffState) {
// Do something when cross-origin requests are allowed
[item setState:NSOnState];
} else {
// Do something when cross-origin requests are disallowed
[item setState:NSOffState];
}
}
- (void)toggleExposeExternally:(id)sender {
NSMenuItem *item = (NSMenuItem *)sender;
if ([item state] == NSOffState) {
// Do something when Ollama is exposed externally
[item setState:NSOnState];
} else {
// Do something when Ollama is not exposed externally
[item setState:NSOffState];
}
}
- (void)chooseModelDirectory {
NSOpenPanel *openPanel = [NSOpenPanel openPanel];
[openPanel setCanChooseFiles:NO];
[openPanel setCanChooseDirectories:YES];
[openPanel setAllowsMultipleSelection:NO];
NSInteger result = [openPanel runModal];
if (result == NSModalResponseOK) {
NSURL *selectedDirectoryURL = [openPanel URLs].firstObject;
// Do something with the selected directory URL
}
}
-(void) showIcon {
NSAppearance* appearance = self.statusItem.button.effectiveAppearance;
NSString* appearanceName = (NSString*)(appearance.name);
NSString* iconName = [[appearanceName lowercaseString] containsString:@"dark"] ? @"iconDark" : @"icon";
NSImage* statusImage = [NSImage imageNamed:iconName];
[statusImage setTemplate:YES];
self.statusItem.button.image = statusImage;
}
-(void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object change:(NSDictionary<NSKeyValueChangeKey,id> *)change context:(void *)context {
[self showIcon];
}
- (void)quit {
[NSApp stop:nil];
}
@end
void run() {
@autoreleasepool {
[NSApplication sharedApplication];
AppDelegate *appDelegate = [[AppDelegate alloc] init];
[NSApp setDelegate:appDelegate];
[NSApp run];
}
}
// killOtherInstances kills all other instances of the app currently
// running. This way we can ensure that only the most recently started
// instance of Ollama is running
void killOtherInstances() {
pid_t pid = getpid();
NSArray *all = [[NSWorkspace sharedWorkspace] runningApplications];
NSMutableArray *apps = [NSMutableArray array];
for (NSRunningApplication *app in all) {
if ([app.bundleIdentifier isEqualToString:[[NSBundle mainBundle] bundleIdentifier]] ||
[app.bundleIdentifier isEqualToString:@"ai.ollama.ollama"] ||
[app.bundleIdentifier isEqualToString:@"com.electron.ollama"]) {
if (app.processIdentifier != pid) {
[apps addObject:app];
}
}
}
for (NSRunningApplication *app in apps) {
kill(app.processIdentifier, SIGTERM);
}
NSDate *startTime = [NSDate date];
for (NSRunningApplication *app in apps) {
while (!app.terminated) {
if (-[startTime timeIntervalSinceNow] >= 5) {
kill(app.processIdentifier, SIGKILL);
break;
}
[[NSRunLoop currentRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:0.1]];
}
}
}
bool askToMoveToApplications() {
NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
if ([bundlePath hasPrefix:@"/Applications"]) {
return false;
}
NSAlert *alert = [[NSAlert alloc] init];
[alert setMessageText:@"Move to Applications?"];
[alert setInformativeText:@"Ollama works best when run from the Applications directory."];
[alert addButtonWithTitle:@"Move to Applications"];
[alert addButtonWithTitle:@"Don't move"];
[NSApp activateIgnoringOtherApps:YES];
if ([alert runModal] != NSAlertFirstButtonReturn) {
return false;
}
// move to applications
NSString *applicationsPath = @"/Applications";
NSString *newPath = [applicationsPath stringByAppendingPathComponent:@"Ollama.app"];
NSFileManager *fileManager = [NSFileManager defaultManager];
// Check if the newPath already exists
if ([fileManager fileExistsAtPath:newPath]) {
NSError *removeError = nil;
[fileManager removeItemAtPath:newPath error:&removeError];
if (removeError) {
NSLog(@"Error removing file at %@: %@", newPath, removeError);
return false; // or handle the error
}
}
NSError *moveError = nil;
[fileManager moveItemAtPath:bundlePath toPath:newPath error:&moveError];
if (moveError) {
NSLog(@"Error moving file from %@ to %@: %@", bundlePath, newPath, moveError);
return false;
}
NSLog(@"Opening %@", newPath);
NSError *error = nil;
NSWorkspace *workspace = [NSWorkspace sharedWorkspace];
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
[workspace launchApplicationAtURL:[NSURL fileURLWithPath:newPath]
options:NSWorkspaceLaunchNewInstance | NSWorkspaceLaunchDefault
configuration:@{}
error:&error];
return true;
}
int installSymlink() {
NSString *linkPath = @"/usr/local/bin/ollama";
NSError *error = nil;
NSFileManager *fileManager = [NSFileManager defaultManager];
NSString *symlinkPath = [fileManager destinationOfSymbolicLinkAtPath:linkPath error:&error];
NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
NSString *execPath = [[NSBundle mainBundle] executablePath];
NSString *resPath = [[NSBundle mainBundle] pathForResource:@"ollama" ofType:nil];
// if the symlink already exists and points to the right place, don't prompt
if ([symlinkPath isEqualToString:resPath]) {
NSLog(@"symbolic link already exists and points to the right place");
return 0;
}
NSString *authorizationPrompt = @"Ollama is trying to install its command line interface (CLI) tool.";
AuthorizationRef auth = NULL;
OSStatus createStatus = AuthorizationCreate(NULL, kAuthorizationEmptyEnvironment, kAuthorizationFlagDefaults, &auth);
if (createStatus != errAuthorizationSuccess) {
NSLog(@"Error creating authorization");
return -1;
}
NSString * bundleIdentifier = [[NSBundle mainBundle] bundleIdentifier];
NSString *rightNameString = [NSString stringWithFormat:@"%@.%@", bundleIdentifier, @"auth3"];
const char *rightName = rightNameString.UTF8String;
OSStatus getRightResult = AuthorizationRightGet(rightName, NULL);
if (getRightResult == errAuthorizationDenied) {
if (AuthorizationRightSet(auth, rightName, (__bridge CFTypeRef _Nonnull)(@(kAuthorizationRuleAuthenticateAsAdmin)), (__bridge CFStringRef _Nullable)(authorizationPrompt), NULL, NULL) != errAuthorizationSuccess) {
NSLog(@"Failed to set right");
return -1;
}
}
AuthorizationItem right = { .name = rightName, .valueLength = 0, .value = NULL, .flags = 0 };
AuthorizationRights rights = { .count = 1, .items = &right };
AuthorizationFlags flags = (AuthorizationFlags)(kAuthorizationFlagExtendRights | kAuthorizationFlagInteractionAllowed);
AuthorizationItem iconAuthorizationItem = {.name = kAuthorizationEnvironmentIcon, .valueLength = 0, .value = NULL, .flags = 0};
AuthorizationEnvironment authorizationEnvironment = {.count = 0, .items = NULL};
BOOL failedToUseSystemDomain = NO;
OSStatus copyStatus = AuthorizationCopyRights(auth, &rights, &authorizationEnvironment, flags, NULL);
if (copyStatus != errAuthorizationSuccess) {
failedToUseSystemDomain = YES;
if (copyStatus == errAuthorizationCanceled) {
NSLog(@"User cancelled authorization");
return -1;
} else {
NSLog(@"Failed copying system domain rights: %d", copyStatus);
return -1;
}
}
const char *toolPath = "/bin/ln";
const char *args[] = {"-s", "-F", [resPath UTF8String], "/usr/local/bin/ollama", NULL};
FILE *pipe = NULL;
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
OSStatus status = AuthorizationExecuteWithPrivileges(auth, toolPath, kAuthorizationFlagDefaults, (char *const *)args, &pipe);
if (status != errAuthorizationSuccess) {
NSLog(@"Failed to create symlink");
return -1;
}
AuthorizationFree(auth, kAuthorizationFlagDestroyRights);
return 0;
}

View File

@@ -1,166 +0,0 @@
package main
import (
"context"
"errors"
"fmt"
"log"
"log/slog"
"os"
"os/exec"
"os/signal"
"path/filepath"
"strings"
"syscall"
"github.com/ollama/ollama/app/lifecycle"
"github.com/ollama/ollama/app/store"
"github.com/ollama/ollama/app/tray"
"github.com/ollama/ollama/app/updater"
)
func init() {
AppName += ".exe"
CLIName += ".exe"
// Logs, configs, downloads go to LOCALAPPDATA
localAppData := os.Getenv("LOCALAPPDATA")
AppDataDir = filepath.Join(localAppData, "Ollama")
AppLogFile = filepath.Join(AppDataDir, "app.log")
ServerLogFile = filepath.Join(AppDataDir, "server.log")
// Executables are stored in APPDATA
AppDir = filepath.Join(localAppData, "Programs", "Ollama")
// Make sure we have PATH set correctly for any spawned children
paths := strings.Split(os.Getenv("PATH"), ";")
// Start with whatever we find in the PATH/LD_LIBRARY_PATH
found := false
for _, path := range paths {
d, err := filepath.Abs(path)
if err != nil {
continue
}
if strings.EqualFold(AppDir, d) {
found = true
}
}
if !found {
paths = append(paths, AppDir)
pathVal := strings.Join(paths, ";")
slog.Debug("setting PATH=" + pathVal)
err := os.Setenv("PATH", pathVal)
if err != nil {
slog.Error(fmt.Sprintf("failed to update PATH: %s", err))
}
}
// Make sure our logging dir exists
_, err := os.Stat(AppDataDir)
if errors.Is(err, os.ErrNotExist) {
if err := os.MkdirAll(AppDataDir, 0o755); err != nil {
slog.Error(fmt.Sprintf("create ollama dir %s: %v", AppDataDir, err))
}
}
}
func ShowLogs() {
cmd_path := "c:\\Windows\\system32\\cmd.exe"
slog.Debug(fmt.Sprintf("viewing logs with start %s", AppDataDir))
cmd := exec.Command(cmd_path, "/c", "start", AppDataDir)
cmd.SysProcAttr = &syscall.SysProcAttr{HideWindow: false, CreationFlags: 0x08000000}
err := cmd.Start()
if err != nil {
slog.Error(fmt.Sprintf("Failed to open log dir: %s", err))
}
}
func Start() {
cmd_path := "c:\\Windows\\system32\\cmd.exe"
slog.Debug(fmt.Sprintf("viewing logs with start %s", AppDataDir))
cmd := exec.Command(cmd_path, "/c", "start", AppDataDir)
cmd.SysProcAttr = &syscall.SysProcAttr{HideWindow: false, CreationFlags: 0x08000000}
err := cmd.Start()
if err != nil {
slog.Error(fmt.Sprintf("Failed to open log dir: %s", err))
}
}
func run() {
initLogging()
slog.Info("ollama windows app started")
ctx, cancel := context.WithCancel(context.Background())
var done chan int
t, err := tray.NewTray()
if err != nil {
log.Fatalf("Failed to start: %s", err)
}
callbacks := t.GetCallbacks()
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
go func() {
slog.Debug("starting callback loop")
for {
select {
case <-callbacks.Quit:
slog.Debug("quit called")
t.Quit()
case <-signals:
slog.Debug("shutting down due to signal")
t.Quit()
case <-callbacks.Update:
err := updater.DoUpgrade(cancel, done)
if err != nil {
slog.Warn(fmt.Sprintf("upgrade attempt failed: %s", err))
}
case <-callbacks.ShowLogs:
ShowLogs()
case <-callbacks.DoFirstUse:
err := lifecycle.GetStarted()
if err != nil {
slog.Warn(fmt.Sprintf("Failed to launch getting started shell: %s", err))
}
}
}
}()
if !store.GetFirstTimeRun() {
slog.Debug("First time run")
err = t.DisplayFirstUseNotification()
if err != nil {
slog.Debug(fmt.Sprintf("XXX failed to display first use notification %v", err))
}
store.SetFirstTimeRun(true)
} else {
slog.Debug("Not first time, skipping first run notification")
}
if isServerRunning(ctx) {
slog.Info("Detected another instance of ollama running, exiting")
os.Exit(1)
}
done, err = SpawnServer(ctx, CLIName)
if err != nil {
// TODO - should we retry in a backoff loop?
// TODO - should we pop up a warning and maybe add a menu item to view application logs?
slog.Error(fmt.Sprintf("Failed to spawn ollama server %s", err))
done = make(chan int, 1)
done <- 1
}
updater.StartBackgroundUpdaterChecker(ctx, t.UpdateAvailable)
t.Run()
cancel()
slog.Info("Waiting for ollama server to shutdown...")
if done != nil {
<-done
}
slog.Info("Ollama app exiting")
}

View File

@@ -1,40 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleDisplayName</key>
<string>Ollama</string>
<key>CFBundleExecutable</key>
<string>Ollama</string>
<key>CFBundleIconFile</key>
<string>icon.icns</string>
<key>CFBundleIdentifier</key>
<string>com.ollama.ollama</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleName</key>
<string>Ollama</string>
<key>CFBundlePackageType</key>
<string>APPL</string>
<key>CFBundleShortVersionString</key>
<string>0.0.0</string>
<key>CFBundleVersion</key>
<string>0.0.0</string>
<key>DTCompiler</key>
<string>com.apple.compilers.llvm.clang.1_0</string>
<key>DTSDKBuild</key>
<string>22E245</string>
<key>DTSDKName</key>
<string>macosx13.3</string>
<key>DTXcode</key>
<string>1431</string>
<key>DTXcodeBuild</key>
<string>14E300c</string>
<key>LSApplicationCategoryType</key>
<string>public.app-category.developer-tools</string>
<key>LSMinimumSystemVersion</key>
<string>11.0</string>
<key>LSUIElement</key>
<true/>
</dict>
</plist>

View File

Binary file not shown.

Before

Width:  |  Height:  |  Size: 382 B

View File

Binary file not shown.

Before

Width:  |  Height:  |  Size: 691 B

View File

Binary file not shown.

Before

Width:  |  Height:  |  Size: 382 B

View File

Binary file not shown.

Before

Width:  |  Height:  |  Size: 721 B

View File

@@ -1,3 +1,5 @@
//go:build !windows
package lifecycle
import "fmt"

View File

@@ -0,0 +1,92 @@
package lifecycle
import (
"context"
"fmt"
"log"
"log/slog"
"os"
"os/signal"
"syscall"
"github.com/ollama/ollama/app/store"
"github.com/ollama/ollama/app/tray"
)
func Run() {
InitLogging()
ctx, cancel := context.WithCancel(context.Background())
var done chan int
t, err := tray.NewTray()
if err != nil {
log.Fatalf("Failed to start: %s", err)
}
callbacks := t.GetCallbacks()
signals := make(chan os.Signal, 1)
signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
go func() {
slog.Debug("starting callback loop")
for {
select {
case <-callbacks.Quit:
slog.Debug("quit called")
t.Quit()
case <-signals:
slog.Debug("shutting down due to signal")
t.Quit()
case <-callbacks.Update:
err := DoUpgrade(cancel, done)
if err != nil {
slog.Warn(fmt.Sprintf("upgrade attempt failed: %s", err))
}
case <-callbacks.ShowLogs:
ShowLogs()
case <-callbacks.DoFirstUse:
err := GetStarted()
if err != nil {
slog.Warn(fmt.Sprintf("Failed to launch getting started shell: %s", err))
}
}
}
}()
// Are we first use?
if !store.GetFirstTimeRun() {
slog.Debug("First time run")
err = t.DisplayFirstUseNotification()
if err != nil {
slog.Debug(fmt.Sprintf("XXX failed to display first use notification %v", err))
}
store.SetFirstTimeRun(true)
} else {
slog.Debug("Not first time, skipping first run notification")
}
if IsServerRunning(ctx) {
slog.Info("Detected another instance of ollama running, exiting")
os.Exit(1)
} else {
done, err = SpawnServer(ctx, CLIName)
if err != nil {
// TODO - should we retry in a backoff loop?
// TODO - should we pop up a warning and maybe add a menu item to view application logs?
slog.Error(fmt.Sprintf("Failed to spawn ollama server %s", err))
done = make(chan int, 1)
done <- 1
}
}
StartBackgroundUpdaterChecker(ctx, t.UpdateAvailable)
t.Run()
cancel()
slog.Info("Waiting for ollama server to shutdown...")
if done != nil {
<-done
}
slog.Info("Ollama app exiting")
}

View File

@@ -1,16 +1,18 @@
package main
package lifecycle
import (
"fmt"
"log/slog"
"os"
"path/filepath"
"github.com/ollama/ollama/server/envconfig"
)
func initLogging() {
func InitLogging() {
level := slog.LevelInfo
if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
if envconfig.Debug {
level = slog.LevelDebug
}
@@ -41,4 +43,6 @@ func initLogging() {
})
slog.SetDefault(slog.New(handler))
slog.Info("ollama app started")
}

View File

@@ -0,0 +1,9 @@
//go:build !windows
package lifecycle
import "log/slog"
func ShowLogs() {
slog.Warn("ShowLogs not yet implemented")
}

View File

@@ -0,0 +1,19 @@
package lifecycle
import (
"fmt"
"log/slog"
"os/exec"
"syscall"
)
func ShowLogs() {
cmd_path := "c:\\Windows\\system32\\cmd.exe"
slog.Debug(fmt.Sprintf("viewing logs with start %s", AppDataDir))
cmd := exec.Command(cmd_path, "/c", "start", AppDataDir)
cmd.SysProcAttr = &syscall.SysProcAttr{HideWindow: false, CreationFlags: 0x08000000}
err := cmd.Start()
if err != nil {
slog.Error(fmt.Sprintf("Failed to open log dir: %s", err))
}
}

View File

@@ -70,5 +70,10 @@ func init() {
}
}
} else if runtime.GOOS == "darwin" {
// TODO
AppName += ".app"
// } else if runtime.GOOS == "linux" {
// TODO
}
}

View File

@@ -1,4 +1,4 @@
package main
package lifecycle
import (
"context"
@@ -14,28 +14,37 @@ import (
"github.com/ollama/ollama/api"
)
type ServerOptions struct {
Cors bool
Expose bool
ModelsPath string
func getCLIFullPath(command string) string {
cmdPath := ""
appExe, err := os.Executable()
if err == nil {
cmdPath = filepath.Join(filepath.Dir(appExe), command)
_, err := os.Stat(cmdPath)
if err == nil {
return cmdPath
}
}
cmdPath, err = exec.LookPath(command)
if err == nil {
_, err := os.Stat(cmdPath)
if err == nil {
return cmdPath
}
}
pwd, err := os.Getwd()
if err == nil {
cmdPath = filepath.Join(pwd, command)
_, err = os.Stat(cmdPath)
if err == nil {
return cmdPath
}
}
return command
}
func start(ctx context.Context, command string, options ServerOptions) (*exec.Cmd, error) {
cmd := getCmd(ctx, command)
// set environment variables
if options.ModelsPath != "" {
cmd.Env = append(cmd.Env, fmt.Sprintf("OLLAMA_MODELS=%s", options.ModelsPath))
}
if options.Cors {
cmd.Env = append(cmd.Env, "OLLAMA_ORIGINS=*")
}
if options.Expose {
cmd.Env = append(cmd.Env, "OLLAMA_HOST=0.0.0.0")
}
func start(ctx context.Context, command string) (*exec.Cmd, error) {
cmd := getCmd(ctx, getCLIFullPath(command))
stdout, err := cmd.StdoutPipe()
if err != nil {
return nil, fmt.Errorf("failed to spawn server stdout pipe: %w", err)
@@ -50,6 +59,20 @@ func start(ctx context.Context, command string, options ServerOptions) (*exec.Cm
if err != nil {
return nil, fmt.Errorf("failed to create server log: %w", err)
}
logDir := filepath.Dir(ServerLogFile)
_, err = os.Stat(logDir)
if err != nil {
if !errors.Is(err, os.ErrNotExist) {
return nil, fmt.Errorf("stat ollama server log dir %s: %v", logDir, err)
}
if err := os.MkdirAll(logDir, 0o755); err != nil {
return nil, fmt.Errorf("create ollama server log dir %s: %v", logDir, err)
}
}
go func() {
defer logFile.Close()
io.Copy(logFile, stdout) //nolint:errcheck
@@ -103,25 +126,20 @@ func start(ctx context.Context, command string, options ServerOptions) (*exec.Cm
return cmd, nil
}
func SpawnServer(ctx context.Context, command string, options ServerOptions) (chan int, error) {
logDir := filepath.Dir(ServerLogFile)
_, err := os.Stat(logDir)
if errors.Is(err, os.ErrNotExist) {
if err := os.MkdirAll(logDir, 0o755); err != nil {
return nil, fmt.Errorf("create ollama server log dir %s: %v", logDir, err)
}
}
func SpawnServer(ctx context.Context, command string) (chan int, error) {
done := make(chan int)
go func() {
// Keep the server running unless we're shuttind down the app
crashCount := 0
for {
slog.Info(fmt.Sprintf("starting server..."))
cmd, err := start(ctx, command, options)
slog.Info("starting server...")
cmd, err := start(ctx, command)
if err != nil {
crashCount++
slog.Error(fmt.Sprintf("failed to start server %s", err))
time.Sleep(500 * time.Millisecond * time.Duration(crashCount))
continue
}
cmd.Wait() //nolint:errcheck
@@ -147,7 +165,7 @@ func SpawnServer(ctx context.Context, command string, options ServerOptions) (ch
return done, nil
}
func isServerRunning(ctx context.Context) bool {
func IsServerRunning(ctx context.Context) bool {
client, err := api.ClientFromEnvironment()
if err != nil {
slog.Info("unable to connect to server")

View File

@@ -1,4 +1,6 @@
package main
//go:build !windows
package lifecycle
import (
"context"

View File

@@ -1,4 +1,4 @@
package main
package lifecycle
import (
"context"

View File

@@ -1,4 +1,4 @@
package updater
package lifecycle
import (
"context"
@@ -22,10 +22,6 @@ import (
"github.com/ollama/ollama/version"
)
var (
UpdateStageDir string
)
var (
UpdateCheckURLBase = "https://ollama.com/api/update"
UpdateDownloaded = false
@@ -127,7 +123,7 @@ func DownloadNewRelease(ctx context.Context, updateResp UpdateResponse) error {
slog.Debug("no etag detected, falling back to filename based dedup")
etag = "_"
}
filename := "OllamaSetup.exe"
filename := Installer
_, params, err := mime.ParseMediaType(resp.Header.Get("content-disposition"))
if err == nil {
filename = params["filename"]

View File

@@ -1,4 +1,6 @@
package updater
//go:build !windows
package lifecycle
import (
"context"

View File

@@ -1,4 +1,4 @@
package updater
package lifecycle
import (
"context"
@@ -9,13 +9,7 @@ import (
"path/filepath"
)
func init() {
UpdateStageDir = filepath.Join(os.Getenv("LOCALAPPDATA"), "Ollama", "updates")
}
func DoUpgrade(cancel context.CancelFunc, done chan int) error {
logFile := filepath.Join(os.Getenv("LOCALAPPDATA"), "Ollama", "upgrade.log")
files, err := filepath.Glob(filepath.Join(UpdateStageDir, "*", "*.exe")) // TODO generalize for multiplatform
if err != nil {
return fmt.Errorf("failed to lookup downloads: %s", err)
@@ -29,24 +23,21 @@ func DoUpgrade(cancel context.CancelFunc, done chan int) error {
installerExe := files[0]
slog.Info("starting upgrade with " + installerExe)
slog.Info("upgrade log file " + logFile)
slog.Info("upgrade log file " + UpgradeLogFile)
// When running in debug mode, we'll be "verbose" and let the installer pop up and prompt
installArgs := []string{
"/CLOSEAPPLICATIONS", // Quit the tray app if it's still running
"/LOG=" + filepath.Base(logFile), // Only relative seems reliable, so set pwd
"/FORCECLOSEAPPLICATIONS", // Force close the tray app - might be needed
"/CLOSEAPPLICATIONS", // Quit the tray app if it's still running
"/LOG=" + filepath.Base(UpgradeLogFile), // Only relative seems reliable, so set pwd
"/FORCECLOSEAPPLICATIONS", // Force close the tray app - might be needed
}
// When we're not in debug mode, make the upgrade as quiet as possible (no GUI, no prompts)
// TODO - temporarily disable since we're pinning in debug mode for the preview
// if debug := os.Getenv("OLLAMA_DEBUG"); debug == "" {
// make the upgrade as quiet as possible (no GUI, no prompts)
installArgs = append(installArgs,
"/SP", // Skip the "This will install... Do you wish to continue" prompt
"/SUPPRESSMSGBOXES",
"/SILENT",
"/VERYSILENT",
)
// }
// Safeguard in case we have requests in flight that need to drain...
slog.Info("Waiting for server to shutdown")
@@ -59,7 +50,7 @@ func DoUpgrade(cancel context.CancelFunc, done chan int) error {
}
slog.Debug(fmt.Sprintf("starting installer: %s %v", installerExe, installArgs))
os.Chdir(filepath.Dir(logFile)) //nolint:errcheck
os.Chdir(filepath.Dir(UpgradeLogFile)) //nolint:errcheck
cmd := exec.Command(installerExe, installArgs...)
if err := cmd.Start(); err != nil {

View File

@@ -2,15 +2,11 @@ package main
// Compile with the following to get rid of the cmd pop up on windows
// go build -ldflags="-H windowsgui" .
var (
AppName string
CLIName string
AppDir string
AppDataDir string
AppLogFile string
ServerLogFile string
import (
"github.com/ollama/ollama/app/lifecycle"
)
func main() {
run()
lifecycle.Run()
}

View File

@@ -88,8 +88,8 @@ DialogFontSize=12
[Files]
Source: ".\app.exe"; DestDir: "{app}"; DestName: "{#MyAppExeName}" ; Flags: ignoreversion 64bit
Source: "..\ollama.exe"; DestDir: "{app}"; Flags: ignoreversion 64bit
Source: "..\dist\windows-amd64\*.dll"; DestDir: "{app}"; Flags: ignoreversion 64bit
Source: "..\dist\windows-amd64\ollama_runners\*"; DestDir: "{app}\ollama_runners"; Flags: ignoreversion 64bit recursesubdirs
Source: "..\dist\windows-{#ARCH}\*.dll"; DestDir: "{app}"; Flags: ignoreversion 64bit
Source: "..\dist\windows-{#ARCH}\ollama_runners\*"; DestDir: "{app}\ollama_runners"; Flags: ignoreversion 64bit recursesubdirs
Source: "..\dist\ollama_welcome.ps1"; DestDir: "{app}"; Flags: ignoreversion
Source: ".\assets\app.ico"; DestDir: "{app}"; Flags: ignoreversion
#if DirExists("..\dist\windows-amd64\rocm")

View File

View File

@@ -1,3 +1,5 @@
//go:build !windows
package tray
import (

View File

@@ -1,71 +1,71 @@
//go:build windows
package wintray
import (
"fmt"
"log/slog"
"unsafe"
"golang.org/x/sys/windows"
)
const (
updatAvailableMenuID = 1
updateMenuID = updatAvailableMenuID + 1
separatorMenuID = updateMenuID + 1
diagLogsMenuID = separatorMenuID + 1
diagSeparatorMenuID = diagLogsMenuID + 1
quitMenuID = diagSeparatorMenuID + 1
)
func (t *winTray) initMenus() error {
if err := t.addOrUpdateMenuItem(diagLogsMenuID, 0, diagLogsMenuTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w\n", err)
}
if err := t.addSeparatorMenuItem(diagSeparatorMenuID, 0); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addOrUpdateMenuItem(quitMenuID, 0, quitMenuTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w\n", err)
}
return nil
}
func (t *winTray) UpdateAvailable(ver string) error {
if !t.updateNotified {
slog.Debug("updating menu and sending notification for new update")
if err := t.addOrUpdateMenuItem(updatAvailableMenuID, 0, updateAvailableMenuTitle, true); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addOrUpdateMenuItem(updateMenuID, 0, updateMenutTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addSeparatorMenuItem(separatorMenuID, 0); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
iconFilePath, err := iconBytesToFilePath(wt.updateIcon)
if err != nil {
return fmt.Errorf("unable to write icon data to temp file: %w", err)
}
if err := wt.setIcon(iconFilePath); err != nil {
return fmt.Errorf("unable to set icon: %w", err)
}
t.updateNotified = true
t.pendingUpdate = true
// Now pop up the notification
t.muNID.Lock()
defer t.muNID.Unlock()
copy(t.nid.InfoTitle[:], windows.StringToUTF16(updateTitle))
copy(t.nid.Info[:], windows.StringToUTF16(fmt.Sprintf(updateMessage, ver)))
t.nid.Flags |= NIF_INFO
t.nid.Timeout = 10
t.nid.Size = uint32(unsafe.Sizeof(*wt.nid))
err = t.nid.modify()
if err != nil {
return err
}
}
return nil
}
//go:build windows
package wintray
import (
"fmt"
"log/slog"
"unsafe"
"golang.org/x/sys/windows"
)
const (
updatAvailableMenuID = 1
updateMenuID = updatAvailableMenuID + 1
separatorMenuID = updateMenuID + 1
diagLogsMenuID = separatorMenuID + 1
diagSeparatorMenuID = diagLogsMenuID + 1
quitMenuID = diagSeparatorMenuID + 1
)
func (t *winTray) initMenus() error {
if err := t.addOrUpdateMenuItem(diagLogsMenuID, 0, diagLogsMenuTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w\n", err)
}
if err := t.addSeparatorMenuItem(diagSeparatorMenuID, 0); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addOrUpdateMenuItem(quitMenuID, 0, quitMenuTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w\n", err)
}
return nil
}
func (t *winTray) UpdateAvailable(ver string) error {
if !t.updateNotified {
slog.Debug("updating menu and sending notification for new update")
if err := t.addOrUpdateMenuItem(updatAvailableMenuID, 0, updateAvailableMenuTitle, true); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addOrUpdateMenuItem(updateMenuID, 0, updateMenutTitle, false); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
if err := t.addSeparatorMenuItem(separatorMenuID, 0); err != nil {
return fmt.Errorf("unable to create menu entries %w", err)
}
iconFilePath, err := iconBytesToFilePath(wt.updateIcon)
if err != nil {
return fmt.Errorf("unable to write icon data to temp file: %w", err)
}
if err := wt.setIcon(iconFilePath); err != nil {
return fmt.Errorf("unable to set icon: %w", err)
}
t.updateNotified = true
t.pendingUpdate = true
// Now pop up the notification
t.muNID.Lock()
defer t.muNID.Unlock()
copy(t.nid.InfoTitle[:], windows.StringToUTF16(updateTitle))
copy(t.nid.Info[:], windows.StringToUTF16(fmt.Sprintf(updateMessage, ver)))
t.nid.Flags |= NIF_INFO
t.nid.Timeout = 10
t.nid.Size = uint32(unsafe.Sizeof(*wt.nid))
err = t.nid.modify()
if err != nil {
return err
}
}
return nil
}

View File

@@ -10,12 +10,44 @@ import (
"log/slog"
"os"
"path/filepath"
"strings"
"golang.org/x/crypto/ssh"
)
const defaultPrivateKey = "id_ed25519"
func keyPath() (string, error) {
home, err := os.UserHomeDir()
if err != nil {
return "", err
}
return filepath.Join(home, ".ollama", defaultPrivateKey), nil
}
func GetPublicKey() (string, error) {
keyPath, err := keyPath()
if err != nil {
return "", err
}
privateKeyFile, err := os.ReadFile(keyPath)
if err != nil {
slog.Info(fmt.Sprintf("Failed to load private key: %v", err))
return "", err
}
privateKey, err := ssh.ParsePrivateKey(privateKeyFile)
if err != nil {
return "", err
}
publicKey := ssh.MarshalAuthorizedKey(privateKey.PublicKey())
return strings.TrimSpace(string(publicKey)), nil
}
func NewNonce(r io.Reader, length int) (string, error) {
nonce := make([]byte, length)
if _, err := io.ReadFull(r, nonce); err != nil {
@@ -26,13 +58,11 @@ func NewNonce(r io.Reader, length int) (string, error) {
}
func Sign(ctx context.Context, bts []byte) (string, error) {
home, err := os.UserHomeDir()
keyPath, err := keyPath()
if err != nil {
return "", err
}
keyPath := filepath.Join(home, ".ollama", defaultPrivateKey)
privateKeyFile, err := os.ReadFile(keyPath)
if err != nil {
slog.Info(fmt.Sprintf("Failed to load private key: %v", err))

View File

@@ -32,10 +32,12 @@ import (
"golang.org/x/term"
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/auth"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/parser"
"github.com/ollama/ollama/progress"
"github.com/ollama/ollama/server"
"github.com/ollama/ollama/types/errtypes"
"github.com/ollama/ollama/types/model"
"github.com/ollama/ollama/version"
)
@@ -54,12 +56,13 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
p := progress.NewProgress(os.Stderr)
defer p.Stop()
modelfile, err := os.ReadFile(filename)
f, err := os.Open(filename)
if err != nil {
return err
}
defer f.Close()
commands, err := parser.Parse(bytes.NewReader(modelfile))
modelfile, err := model.ParseFile(f)
if err != nil {
return err
}
@@ -73,10 +76,10 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
spinner := progress.NewSpinner(status)
p.Add(status, spinner)
for _, c := range commands {
switch c.Name {
for i := range modelfile.Commands {
switch modelfile.Commands[i].Name {
case "model", "adapter":
path := c.Args
path := modelfile.Commands[i].Args
if path == "~" {
path = home
} else if strings.HasPrefix(path, "~/") {
@@ -88,7 +91,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
}
fi, err := os.Stat(path)
if errors.Is(err, os.ErrNotExist) && c.Name == "model" {
if errors.Is(err, os.ErrNotExist) && modelfile.Commands[i].Name == "model" {
continue
} else if err != nil {
return err
@@ -111,13 +114,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
return err
}
name := c.Name
if c.Name == "model" {
name = "from"
}
re := regexp.MustCompile(fmt.Sprintf(`(?im)^(%s)\s+%s\s*$`, name, c.Args))
modelfile = re.ReplaceAll(modelfile, []byte("$1 @"+digest))
modelfile.Commands[i].Args = "@" + digest
}
}
@@ -147,7 +144,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
quantization, _ := cmd.Flags().GetString("quantization")
request := api.CreateRequest{Name: args[0], Modelfile: string(modelfile), Quantization: quantization}
request := api.CreateRequest{Name: args[0], Modelfile: modelfile.String(), Quantization: quantization}
if err := client.Create(cmd.Context(), &request, fn); err != nil {
return err
}
@@ -357,6 +354,47 @@ func RunHandler(cmd *cobra.Command, args []string) error {
return generateInteractive(cmd, opts)
}
func errFromUnknownKey(unknownKeyErr error) error {
// find SSH public key in the error message
sshKeyPattern := `ssh-\w+ [^\s"]+`
re := regexp.MustCompile(sshKeyPattern)
matches := re.FindStringSubmatch(unknownKeyErr.Error())
if len(matches) > 0 {
serverPubKey := matches[0]
localPubKey, err := auth.GetPublicKey()
if err != nil {
return unknownKeyErr
}
if runtime.GOOS == "linux" && serverPubKey != localPubKey {
// try the ollama service public key
svcPubKey, err := os.ReadFile("/usr/share/ollama/.ollama/id_ed25519.pub")
if err != nil {
return unknownKeyErr
}
localPubKey = strings.TrimSpace(string(svcPubKey))
}
// check if the returned public key matches the local public key, this prevents adding a remote key to the user's account
if serverPubKey != localPubKey {
return unknownKeyErr
}
var msg strings.Builder
msg.WriteString(unknownKeyErr.Error())
msg.WriteString("\n\nYour ollama key is:\n")
msg.WriteString(localPubKey)
msg.WriteString("\nAdd your key at:\n")
msg.WriteString("https://ollama.com/settings/keys")
return errors.New(msg.String())
}
return unknownKeyErr
}
func PushHandler(cmd *cobra.Command, args []string) error {
client, err := api.ClientFromEnvironment()
if err != nil {
@@ -404,6 +442,20 @@ func PushHandler(cmd *cobra.Command, args []string) error {
request := api.PushRequest{Name: args[0], Insecure: insecure}
if err := client.Push(cmd.Context(), &request, fn); err != nil {
if spinner != nil {
spinner.Stop()
}
if strings.Contains(err.Error(), "access denied") {
return errors.New("you are not authorized to push to this namespace, create the model under a namespace you own")
}
host := model.ParseName(args[0]).Host
isOllamaHost := strings.HasSuffix(host, ".ollama.ai") || strings.HasSuffix(host, ".ollama.com")
if strings.Contains(err.Error(), errtypes.UnknownOllamaKeyErrMsg) && isOllamaHost {
// the user has not added their ollama key to ollama.com
// re-throw an error with a more user-friendly message
return errFromUnknownKey(err)
}
return err
}
@@ -831,24 +883,27 @@ func generate(cmd *cobra.Command, opts runOptions) error {
}
func RunServer(cmd *cobra.Command, _ []string) error {
host, port, err := net.SplitHostPort(strings.Trim(os.Getenv("OLLAMA_HOST"), "\"'"))
// retrieve the OLLAMA_HOST environment variable
ollamaHost, err := api.GetOllamaHost()
if err != nil {
host, port = "127.0.0.1", "11434"
if ip := net.ParseIP(strings.Trim(os.Getenv("OLLAMA_HOST"), "[]")); ip != nil {
host = ip.String()
}
return err
}
if err := initializeKeypair(); err != nil {
return err
}
ln, err := net.Listen("tcp", net.JoinHostPort(host, port))
ln, err := net.Listen("tcp", net.JoinHostPort(ollamaHost.Host, ollamaHost.Port))
if err != nil {
return err
}
return server.Serve(ln)
err = server.Serve(ln)
if errors.Is(err, http.ErrServerClosed) {
return nil
}
return err
}
func initializeKeypair() error {
@@ -1069,7 +1124,7 @@ Environment Variables:
RunE: ListHandler,
}
copyCmd := &cobra.Command{
Use: "cp SOURCE TARGET",
Use: "cp SOURCE DESTINATION",
Short: "Copy a model",
Args: cobra.ExactArgs(2),
PreRunE: checkServerHeartbeat,

View File

@@ -94,6 +94,7 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
fmt.Fprintln(os.Stderr, " /show Show model information")
fmt.Fprintln(os.Stderr, " /load <model> Load a session or model")
fmt.Fprintln(os.Stderr, " /save <model> Save your current session")
fmt.Fprintln(os.Stderr, " /clear Clear session context")
fmt.Fprintln(os.Stderr, " /bye Exit")
fmt.Fprintln(os.Stderr, " /?, /help Help for a command")
fmt.Fprintln(os.Stderr, " /? shortcuts Help for keyboard shortcuts")
@@ -280,6 +281,10 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
}
fmt.Printf("Created new model '%s'\n", args[1])
continue
case strings.HasPrefix(line, "/clear"):
opts.Messages = []api.Message{}
fmt.Println("Cleared session context")
continue
case strings.HasPrefix(line, "/set"):
args := strings.Fields(line)
if len(args) > 1 {

View File

@@ -53,7 +53,7 @@ func (m *SafetensorFormat) GetTensors(dirpath string, params *Params) ([]llm.Ten
var err error
t, offset, err = m.readTensors(f, offset, params)
if err != nil {
slog.Error("%v", err)
slog.Error(err.Error())
return nil, err
}
tensors = append(tensors, t...)
@@ -122,7 +122,7 @@ func (m *SafetensorFormat) readTensors(fn string, offset uint64, params *Params)
ggufName, err := m.GetLayerName(k)
if err != nil {
slog.Error("%v", err)
slog.Error(err.Error())
return nil, 0, err
}

View File

@@ -74,7 +74,7 @@ func (tf *TorchFormat) GetTensors(dirpath string, params *Params) ([]llm.Tensor,
ggufName, err := tf.GetLayerName(k.(string))
if err != nil {
slog.Error("%v", err)
slog.Error(err.Error())
return nil, err
}
slog.Debug(fmt.Sprintf("finding name for '%s' -> '%s'", k.(string), ggufName))

View File

@@ -17,7 +17,7 @@
### Model names
Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama2:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama3:70b`. The tag is optional and, if not provided, will default to `latest`. The tag is used to identify a specific version.
### Durations
@@ -66,7 +66,7 @@ Enable JSON mode by setting the `format` parameter to `json`. This will structur
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"model": "llama3",
"prompt": "Why is the sky blue?"
}'
```
@@ -77,7 +77,7 @@ A stream of JSON objects is returned:
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
@@ -95,11 +95,11 @@ The final response in the stream also includes additional data about the generat
- `context`: an encoding of the conversation used in this response, this can be sent in the next request to keep a conversational memory
- `response`: empty if the response was streamed, if not streamed, this will contain the full response
To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` / `eval_duration`.
To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` / `eval_duration` * `10^9`.
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
@@ -121,7 +121,7 @@ A response can be received in one reply when streaming is off.
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"model": "llama3",
"prompt": "Why is the sky blue?",
"stream": false
}'
@@ -133,7 +133,7 @@ If `stream` is set to `false`, the response will be a single JSON object:
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
@@ -155,7 +155,7 @@ If `stream` is set to `false`, the response will be a single JSON object:
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"model": "llama3",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
@@ -166,7 +166,7 @@ curl http://localhost:11434/api/generate -d '{
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
@@ -289,7 +289,7 @@ If you want to set custom options for the model at runtime rather than in the Mo
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"model": "llama3",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
@@ -332,7 +332,7 @@ curl http://localhost:11434/api/generate -d '{
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
@@ -354,7 +354,7 @@ If an empty prompt is provided, the model will be loaded into memory.
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama2"
"model": "llama3"
}'
```
@@ -364,7 +364,7 @@ A single JSON object is returned:
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
@@ -407,7 +407,7 @@ Send a chat message with a streaming response.
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"model": "llama3",
"messages": [
{
"role": "user",
@@ -423,7 +423,7 @@ A stream of JSON objects is returned:
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
@@ -438,7 +438,7 @@ Final response:
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 4883583458,
@@ -456,7 +456,7 @@ Final response:
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"model": "llama3",
"messages": [
{
"role": "user",
@@ -471,7 +471,7 @@ curl http://localhost:11434/api/chat -d '{
```json
{
"model": "registry.ollama.ai/library/llama2:latest",
"model": "registry.ollama.ai/library/llama3:latest",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
@@ -495,7 +495,7 @@ Send a chat message with a conversation history. You can use this same approach
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"model": "llama3",
"messages": [
{
"role": "user",
@@ -519,7 +519,7 @@ A stream of JSON objects is returned:
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
@@ -533,7 +533,7 @@ Final response:
```json
{
"model": "llama2",
"model": "llama3",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 8113331500,
@@ -591,7 +591,7 @@ curl http://localhost:11434/api/chat -d '{
```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama2",
"model": "llama3",
"messages": [
{
"role": "user",
@@ -609,7 +609,7 @@ curl http://localhost:11434/api/chat -d '{
```json
{
"model": "registry.ollama.ai/library/llama2:latest",
"model": "registry.ollama.ai/library/llama3:latest",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
@@ -651,7 +651,7 @@ Create a new model from a `Modelfile`.
```shell
curl http://localhost:11434/api/create -d '{
"name": "mario",
"modelfile": "FROM llama2\nSYSTEM You are mario from Super Mario Bros."
"modelfile": "FROM llama3\nSYSTEM You are mario from Super Mario Bros."
}'
```
@@ -758,7 +758,7 @@ A single JSON object will be returned.
}
},
{
"name": "llama2:latest",
"name": "llama3:latest",
"modified_at": "2023-12-07T09:32:18.757212583-08:00",
"size": 3825819519,
"digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
@@ -792,7 +792,7 @@ Show information about a model including details, modelfile, template, parameter
```shell
curl http://localhost:11434/api/show -d '{
"name": "llama2"
"name": "llama3"
}'
```
@@ -827,8 +827,8 @@ Copy a model. Creates a model with another name from an existing model.
```shell
curl http://localhost:11434/api/copy -d '{
"source": "llama2",
"destination": "llama2-backup"
"source": "llama3",
"destination": "llama3-backup"
}'
```
@@ -854,7 +854,7 @@ Delete a model and its data.
```shell
curl -X DELETE http://localhost:11434/api/delete -d '{
"name": "llama2:13b"
"name": "llama3:13b"
}'
```
@@ -882,7 +882,7 @@ Download a model from the ollama library. Cancelled pulls are resumed from where
```shell
curl http://localhost:11434/api/pull -d '{
"name": "llama2"
"name": "llama3"
}'
```

View File

@@ -51,7 +51,7 @@ Typically the build scripts will auto-detect CUDA, however, if your Linux distro
or installation approach uses unusual paths, you can specify the location by
specifying an environment variable `CUDA_LIB_DIR` to the location of the shared
libraries, and `CUDACXX` to the location of the nvcc compiler. You can customize
set set of target CUDA architectues by setting `CMAKE_CUDA_ARCHITECTURES` (e.g. "50;60;70")
a set of target CUDA architectures by setting `CMAKE_CUDA_ARCHITECTURES` (e.g. "50;60;70")
Then generate dependencies:
@@ -142,4 +142,4 @@ In addition to the common Windows development tools described above, install AMD
- [AMD HIP](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
- [Strawberry Perl](https://strawberryperl.com/)
Lastly, add `ninja.exe` included with MSVC to the system path (e.g. `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja`).
Lastly, add `ninja.exe` included with MSVC to the system path (e.g. `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja`).

View File

@@ -32,7 +32,7 @@ When using the API, specify the `num_ctx` parameter:
```
curl http://localhost:11434/api/generate -d '{
"model": "llama2",
"model": "llama3",
"prompt": "Why is the sky blue?",
"options": {
"num_ctx": 4096
@@ -88,9 +88,9 @@ On windows, Ollama inherits your user and system environment variables.
3. Edit or create New variable(s) for your user account for `OLLAMA_HOST`, `OLLAMA_MODELS`, etc.
4. Click OK/Apply to save
4. Click OK/Apply to save
5. Run `ollama` from a new terminal window
5. Run `ollama` from a new terminal window
## How can I expose Ollama on my network?
@@ -140,7 +140,7 @@ Refer to the section [above](#how-do-i-configure-ollama-server) for how to set e
- macOS: `~/.ollama/models`
- Linux: `/usr/share/ollama/.ollama/models`
- Windows: `C:\Users\<username>\.ollama\models`
- Windows: `C:\Users\%username%\.ollama\models`
### How do I set them to a different location?
@@ -221,14 +221,20 @@ The `keep_alive` parameter can be set to:
For example, to preload a model and leave it in memory use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'
```
To unload the model and free up memory use:
```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": 0}'
```
Alternatively, you can change the amount of time all models are loaded into memory by setting the `OLLAMA_KEEP_ALIVE` environment variable when starting the Ollama server. The `OLLAMA_KEEP_ALIVE` variable uses the same parameter types as the `keep_alive` parameter types mentioned above. Refer to section explaining [how to configure the Ollama server](#how-do-i-configure-ollama-server) to correctly set the environment variable.
If you wish to override the `OLLAMA_KEEP_ALIVE` setting, use the `keep_alive` API parameter with the `/api/generate` or `/api/chat` API endpoints.
## How do I manage the maximum number of requests the server can queue
If too many requests are sent to the server, it will respond with a 503 error
indicating the server is overloaded. You can adjust how many requests may be
queue by setting `OLLAMA_MAX_QUEUE`

View File

@@ -125,7 +125,7 @@ Publishing models is in early alpha. If you'd like to publish your model to shar
1. Create [an account](https://ollama.com/signup)
2. Copy your Ollama public key:
- macOS: `cat ~/.ollama/id_ed25519.pub`
- macOS: `cat ~/.ollama/id_ed25519.pub | pbcopy`
- Windows: `type %USERPROFILE%\.ollama\id_ed25519.pub`
- Linux: `cat /usr/share/ollama/.ollama/id_ed25519.pub`
3. Add your public key to your [Ollama account](https://ollama.com/settings/keys)
@@ -136,6 +136,8 @@ Next, copy your model to your username's namespace:
ollama cp example <your username>/example
```
> Note: model names may only contain lowercase letters, digits, and the characters `.`, `-`, and `_`.
Then push the model:
```

View File

@@ -105,7 +105,7 @@ sudo chmod +x /usr/bin/ollama
To view logs of Ollama running as a startup service, run:
```bash
journalctl -u ollama
journalctl -e -u ollama
```
## Uninstall

View File

@@ -10,7 +10,7 @@ A model file is the blueprint to create and share models with Ollama.
- [Examples](#examples)
- [Instructions](#instructions)
- [FROM (Required)](#from-required)
- [Build from llama2](#build-from-llama2)
- [Build from llama3](#build-from-llama3)
- [Build from a bin file](#build-from-a-bin-file)
- [PARAMETER](#parameter)
- [Valid Parameters and Values](#valid-parameters-and-values)
@@ -48,7 +48,7 @@ INSTRUCTION arguments
An example of a `Modelfile` creating a mario blueprint:
```modelfile
FROM llama2
FROM llama3
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
@@ -67,33 +67,25 @@ To use this:
More examples are available in the [examples directory](../examples).
### `Modelfile`s in [ollama.com/library][1]
There are two ways to view `Modelfile`s underlying the models in [ollama.com/library][1]:
- Option 1: view a details page from a model's tags page:
1. Go to a particular model's tags (e.g. https://ollama.com/library/llama2/tags)
2. Click on a tag (e.g. https://ollama.com/library/llama2:13b)
3. Scroll down to "Layers"
- Note: if the [`FROM` instruction](#from-required) is not present,
it means the model was created from a local file
- Option 2: use `ollama show` to print the `Modelfile` for any local models like so:
To view the Modelfile of a given model, use the `ollama show --modelfile` command.
```bash
> ollama show --modelfile llama2:13b
> ollama show --modelfile llama3
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama2:13b
# FROM llama3:latest
FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
FROM /root/.ollama/models/blobs/sha256:123abc
TEMPLATE """[INST] {{ if .System }}<<SYS>>{{ .System }}<</SYS>>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ end }}{{ .Prompt }} [/INST] """
SYSTEM """"""
PARAMETER stop [INST]
PARAMETER stop [/INST]
PARAMETER stop <<SYS>>
PARAMETER stop <</SYS>>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
```
## Instructions
@@ -106,10 +98,10 @@ The `FROM` instruction defines the base model to use when creating a model.
FROM <model name>:<tag>
```
#### Build from llama2
#### Build from llama3
```modelfile
FROM llama2
FROM llama3
```
A list of available base models:

View File

@@ -25,7 +25,7 @@ chat_completion = client.chat.completions.create(
'content': 'Say this is a test',
}
],
model='llama2',
model='llama3',
)
```
@@ -43,7 +43,7 @@ const openai = new OpenAI({
const chatCompletion = await openai.chat.completions.create({
messages: [{ role: 'user', content: 'Say this is a test' }],
model: 'llama2',
model: 'llama3',
})
```
@@ -53,7 +53,7 @@ const chatCompletion = await openai.chat.completions.create({
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama2",
"model": "llama3",
"messages": [
{
"role": "system",
@@ -113,7 +113,7 @@ curl http://localhost:11434/v1/chat/completions \
Before using a model, pull it locally `ollama pull`:
```shell
ollama pull llama2
ollama pull llama3
```
### Default model names
@@ -121,7 +121,7 @@ ollama pull llama2
For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:
```
ollama cp llama2 gpt-3.5-turbo
ollama cp llama3 gpt-3.5-turbo
```
Afterwards, this new model name can be specified the `model` field:

View File

@@ -15,7 +15,7 @@ import { Ollama } from "langchain/llms/ollama";
const ollama = new Ollama({
baseUrl: "http://localhost:11434",
model: "llama2",
model: "llama3",
});
const answer = await ollama.invoke(`why is the sky blue?`);
@@ -23,10 +23,10 @@ const answer = await ollama.invoke(`why is the sky blue?`);
console.log(answer);
```
That will get us the same thing as if we ran `ollama run llama2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
That will get us the same thing as if we ran `ollama run llama3 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
```bash
npm install cheerio
npm install cheerio
```
```javascript

View File

@@ -17,10 +17,12 @@ Let's start by asking a simple question that we can get an answer to from the **
Then we can create a model and ask the question:
```python
from langchain.llms import Ollama
ollama = Ollama(base_url='http://localhost:11434',
model="llama2")
print(ollama("why is the sky blue"))
from langchain_community.llms import Ollama
ollama = Ollama(
base_url='http://localhost:11434',
model="llama3"
)
print(ollama.invoke("why is the sky blue"))
```
Notice that we are defining the model and the base URL for Ollama.

View File

@@ -1,47 +1,61 @@
# Ollama Windows Preview
Welcome to the Ollama Windows preview.
No more WSL required!
Ollama now runs as a native Windows application, including NVIDIA and AMD Radeon GPU support.
After installing Ollama Windows Preview, Ollama will run in the background and
the `ollama` command line is available in `cmd`, `powershell` or your favorite
terminal application. As usual the Ollama [api](./api.md) will be served on
`http://localhost:11434`.
As this is a preview release, you should expect a few bugs here and there. If
you run into a problem you can reach out on
[Discord](https://discord.gg/ollama), or file an
[issue](https://github.com/ollama/ollama/issues).
Logs will often be helpful in diagnosing the problem (see
[Troubleshooting](#troubleshooting) below)
## System Requirements
* Windows 10 or newer, Home or Pro
* NVIDIA 452.39 or newer Drivers if you have an NVIDIA card
* AMD Radeon Driver https://www.amd.com/en/support if you have a Radeon card
## API Access
Here's a quick example showing API access from `powershell`
```powershell
(Invoke-WebRequest -method POST -Body '{"model":"llama2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
```
## Troubleshooting
While we're in preview, `OLLAMA_DEBUG` is always enabled, which adds
a "view logs" menu item to the app, and increses logging for the GUI app and
server.
Ollama on Windows stores files in a few different locations. You can view them in
the explorer window by hitting `<cmd>+R` and type in:
- `explorer %LOCALAPPDATA%\Ollama` contains logs, and downloaded updates
- *app.log* contains logs from the GUI application
- *server.log* contains the server logs
- *upgrade.log* contains log output for upgrades
- `explorer %LOCALAPPDATA%\Programs\Ollama` contains the binaries (The installer adds this to your user PATH)
- `explorer %HOMEPATH%\.ollama` contains models and configuration
- `explorer %TEMP%` contains temporary executable files in one or more `ollama*` directories
# Ollama Windows Preview
Welcome to the Ollama Windows preview.
No more WSL required!
Ollama now runs as a native Windows application, including NVIDIA and AMD Radeon GPU support.
After installing Ollama Windows Preview, Ollama will run in the background and
the `ollama` command line is available in `cmd`, `powershell` or your favorite
terminal application. As usual the Ollama [api](./api.md) will be served on
`http://localhost:11434`.
As this is a preview release, you should expect a few bugs here and there. If
you run into a problem you can reach out on
[Discord](https://discord.gg/ollama), or file an
[issue](https://github.com/ollama/ollama/issues).
Logs will often be helpful in diagnosing the problem (see
[Troubleshooting](#troubleshooting) below)
## System Requirements
* Windows 10 or newer, Home or Pro
* NVIDIA 452.39 or newer Drivers if you have an NVIDIA card
* AMD Radeon Driver https://www.amd.com/en/support if you have a Radeon card
## API Access
Here's a quick example showing API access from `powershell`
```powershell
(Invoke-WebRequest -method POST -Body '{"model":"llama3", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json
```
## Troubleshooting
While we're in preview, `OLLAMA_DEBUG` is always enabled, which adds
a "view logs" menu item to the app, and increses logging for the GUI app and
server.
Ollama on Windows stores files in a few different locations. You can view them in
the explorer window by hitting `<cmd>+R` and type in:
- `explorer %LOCALAPPDATA%\Ollama` contains logs, and downloaded updates
- *app.log* contains logs from the GUI application
- *server.log* contains the server logs
- *upgrade.log* contains log output for upgrades
- `explorer %LOCALAPPDATA%\Programs\Ollama` contains the binaries (The installer adds this to your user PATH)
- `explorer %HOMEPATH%\.ollama` contains models and configuration
- `explorer %TEMP%` contains temporary executable files in one or more `ollama*` directories
## Standalone CLI
The easiest way to install Ollama on Windows is to use the `OllamaSetup.exe`
installer. It installs in your account without requiring Administrator rights.
We update Ollama regularly to support the latest models, and this installer will
help you keep up to date.
If you'd like to install or integrate Ollama as a service, a standalone
`ollama-windows-amd64.zip` zip file is available containing only the Ollama CLI
and GPU library dependencies for Nvidia and AMD. This allows for embedding
Ollama in existing applications, or running it as a system service via `ollama
serve` with tools such as [NSSM](https://nssm.cc/).

View File

@@ -2,7 +2,7 @@
When calling `ollama`, you can pass it a file to run all the prompts in the file, one after the other:
`ollama run llama2 < sourcequestions.txt`
`ollama run llama3 < sourcequestions.txt`
This concept is used in the following example.

1
examples/flyio/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
fly.toml

67
examples/flyio/README.md Normal file
View File

@@ -0,0 +1,67 @@
# Deploy Ollama to Fly.io
> Note: this example exposes a public endpoint and does not configure authentication. Use with care.
## Prerequisites
- Ollama: https://ollama.com/download
- Fly.io account. Sign up for a free account: https://fly.io/app/sign-up
## Steps
1. Login to Fly.io
```bash
fly auth login
```
1. Create a new Fly app
```bash
fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
```
1. Pull and run `orca-mini:3b`
```bash
OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
```
`shared-cpu-8x` is a free-tier eligible machine type. For better performance, switch to a `performance` or `dedicated` machine type or attach a GPU for hardware acceleration (see below).
## (Optional) Persistent Volume
By default Fly Machines use ephemeral storage which is problematic if you want to use the same model across restarts without pulling it again. Create and attach a persistent volume to store the downloaded models:
1. Create the Fly Volume
```bash
fly volume create ollama
```
1. Update `fly.toml` and add `[mounts]`
```toml
[mounts]
source = "ollama"
destination = "/mnt/ollama/models"
```
1. Update `fly.toml` and add `[env]`
```toml
[env]
OLLAMA_MODELS = "/mnt/ollama/models"
```
1. Deploy your app
```bash
fly deploy
```
## (Optional) Hardware Acceleration
Fly.io GPU is currently in waitlist. Sign up for the waitlist: https://fly.io/gpu
Once you've been accepted, create the app with the additional flags `--vm-gpu-kind a100-pcie-40gb` or `--vm-gpu-kind a100-pcie-80gb`.

View File

@@ -35,7 +35,7 @@ func main() {
ctx := context.Background()
req := &api.ChatRequest{
Model: "llama2",
Model: "llama3",
Messages: messages,
}

View File

@@ -19,7 +19,7 @@ func main() {
}
defer resp.Body.Close()
responseData, err := io.ReadAll(resp.Body)
if err != nil {
log.Fatal(err)

View File

@@ -7,12 +7,24 @@
## Steps
1. Create the Ollama namespace, daemon set, and service
1. Create the Ollama namespace, deployment, and service
```bash
kubectl apply -f cpu.yaml
```
## (Optional) Hardware Acceleration
Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin) which is deployed in Kubernetes in form of daemonset. Follow the link for more details.
Once configured, create a GPU enabled Ollama deployment.
```bash
kubectl apply -f gpu.yaml
```
## Test
1. Port forward the Ollama service to connect and use it locally
```bash
@@ -23,14 +35,4 @@
```bash
ollama run orca-mini:3b
```
## (Optional) Hardware Acceleration
Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin). Follow the link for more details.
Once configured, create a GPU enabled Ollama deployment.
```bash
kubectl apply -f gpu.yaml
```
```

View File

@@ -40,9 +40,9 @@ while True:
continue
# Prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
Helpful Answer:"""
@@ -51,11 +51,11 @@ while True:
template=template,
)
llm = Ollama(model="llama2:13b", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
llm = Ollama(model="llama3:8b", callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]))
qa_chain = RetrievalQA.from_chain_type(
llm,
retriever=vectorstore.as_retriever(),
chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
result = qa_chain({"query": query})
result = qa_chain({"query": query})

View File

@@ -1,12 +1,12 @@
from langchain.llms import Ollama
from langchain.document_loaders import WebBaseLoader
from langchain_community.llms import Ollama
from langchain_community.document_loaders import WebBaseLoader
from langchain.chains.summarize import load_summarize_chain
loader = WebBaseLoader("https://ollama.com/blog/run-llama2-uncensored-locally")
docs = loader.load()
llm = Ollama(model="llama2")
llm = Ollama(model="llama3")
chain = load_summarize_chain(llm, chain_type="stuff")
result = chain.run(docs)
result = chain.invoke(docs)
print(result)

View File

@@ -4,10 +4,10 @@ This example is a basic "hello world" of using LangChain with Ollama.
## Running the Example
1. Ensure you have the `llama2` model installed:
1. Ensure you have the `llama3` model installed:
```bash
ollama pull llama2
ollama pull llama3
```
2. Install the Python Requirements.
@@ -21,4 +21,3 @@ This example is a basic "hello world" of using LangChain with Ollama.
```bash
python main.py
```

View File

@@ -1,6 +1,6 @@
from langchain.llms import Ollama
input = input("What is your question?")
llm = Ollama(model="llama2")
llm = Ollama(model="llama3")
res = llm.predict(input)
print (res)

View File

@@ -1,4 +1,4 @@
FROM llama2
FROM llama3
PARAMETER temperature 1
SYSTEM """
You are Mario from super mario bros, acting as an assistant.

View File

@@ -2,12 +2,12 @@
# Example character: Mario
This example shows how to create a basic character using Llama2 as the base model.
This example shows how to create a basic character using Llama3 as the base model.
To run this example:
1. Download the Modelfile
2. `ollama pull llama2` to get the base model used in the model file.
2. `ollama pull llama3` to get the base model used in the model file.
3. `ollama create NAME -f ./Modelfile`
4. `ollama run NAME`
@@ -18,7 +18,7 @@ Ask it some questions like "Who are you?" or "Is Peach in trouble again?"
What the model file looks like:
```
FROM llama2
FROM llama3
PARAMETER temperature 1
SYSTEM """
You are Mario from Super Mario Bros, acting as an assistant.

View File

@@ -2,16 +2,16 @@ import requests
import json
import random
model = "llama2"
model = "llama3"
template = {
"firstName": "",
"lastName": "",
"firstName": "",
"lastName": "",
"address": {
"street": "",
"city": "",
"state": "",
"street": "",
"city": "",
"state": "",
"zipCode": ""
},
},
"phoneNumber": ""
}

View File

@@ -12,7 +12,7 @@ countries = [
"France",
]
country = random.choice(countries)
model = "llama2"
model = "llama3"
prompt = f"generate one realistically believable sample data set of a persons first name, last name, address in {country}, and phone number. Do not use common names. Respond using JSON. Key names should have no backslashes, values should use plain ascii with no special characters."

View File

@@ -6,10 +6,10 @@ There are two python scripts in this example. `randomaddresses.py` generates ran
## Running the Example
1. Ensure you have the `llama2` model installed:
1. Ensure you have the `llama3` model installed:
```bash
ollama pull llama2
ollama pull llama3
```
2. Install the Python Requirements.

View File

@@ -2,7 +2,7 @@ import json
import requests
# NOTE: ollama must be running for this to work, start the ollama app or run `ollama serve`
model = "llama2" # TODO: update this for whatever model you wish to use
model = "llama3" # TODO: update this for whatever model you wish to use
def chat(messages):

View File

@@ -4,10 +4,10 @@ The **chat** endpoint is one of two ways to generate text from an LLM with Ollam
## Running the Example
1. Ensure you have the `llama2` model installed:
1. Ensure you have the `llama3` model installed:
```bash
ollama pull llama2
ollama pull llama3
```
2. Install the Python Requirements.

View File

@@ -4,10 +4,10 @@ This example demonstrates how one would create a set of 'mentors' you can have a
## Usage
1. Add llama2 to have the mentors ask your questions:
1. Add llama3 to have the mentors ask your questions:
```bash
ollama pull llama2
ollama pull llama3
```
2. Install prerequisites:

View File

@@ -15,7 +15,7 @@ async function characterGenerator() {
ollama.setModel("stablebeluga2:70b-q4_K_M");
const bio = await ollama.generate(`create a bio of ${character} in a single long paragraph. Instead of saying '${character} is...' or '${character} was...' use language like 'You are...' or 'You were...'. Then create a paragraph describing the speaking mannerisms and style of ${character}. Don't include anything about how ${character} looked or what they sounded like, just focus on the words they said. Instead of saying '${character} would say...' use language like 'You should say...'. If you use quotes, always use single quotes instead of double quotes. If there are any specific words or phrases you used a lot, show how you used them. `);
const thecontents = `FROM llama2\nSYSTEM """\n${bio.response.replace(/(\r\n|\n|\r)/gm, " ").replace('would', 'should')} All answers to questions should be related back to what you are most known for.\n"""`;
const thecontents = `FROM llama3\nSYSTEM """\n${bio.response.replace(/(\r\n|\n|\r)/gm, " ").replace('would', 'should')} All answers to questions should be related back to what you are most known for.\n"""`;
fs.writeFile(path.join(directory, 'Modelfile'), thecontents, (err: any) => {
if (err) throw err;
@@ -23,4 +23,4 @@ async function characterGenerator() {
});
}
characterGenerator();
characterGenerator();

View File

@@ -1,6 +1,6 @@
import * as readline from "readline";
const model = "llama2";
const model = "llama3";
type Message = {
role: "assistant" | "user" | "system";
content: string;
@@ -74,4 +74,4 @@ async function main() {
}
main();
main();

View File

@@ -81,8 +81,10 @@ func commonAMDValidateLibDir() (string, error) {
}
// Well known location(s)
if rocmLibUsable(RocmStandardLocation) {
return RocmStandardLocation, nil
for _, path := range RocmStandardLocations {
if rocmLibUsable(path) {
return path, nil
}
}
// Installer payload location if we're running the installed binary

View File

@@ -25,12 +25,12 @@ const (
// Prefix with the node dir
GPUTotalMemoryFileGlob = "mem_banks/*/properties" // size_in_bytes line
GPUUsedMemoryFileGlob = "mem_banks/*/used_memory"
RocmStandardLocation = "/opt/rocm/lib"
)
var (
// Used to validate if the given ROCm lib is usable
ROCmLibGlobs = []string{"libhipblas.so.2*", "rocblas"} // TODO - probably include more coverage of files here...
ROCmLibGlobs = []string{"libhipblas.so.2*", "rocblas"} // TODO - probably include more coverage of files here...
RocmStandardLocations = []string{"/opt/rocm/lib", "/usr/lib64"}
)
// Gather GPU information from the amdgpu driver if any supported GPUs are detected

View File

@@ -14,7 +14,6 @@ import (
)
const (
RocmStandardLocation = "C:\\Program Files\\AMD\\ROCm\\5.7\\bin" // TODO glob?
// TODO We're lookinng for this exact name to detect iGPUs since hipGetDeviceProperties never reports integrated==true
iGPUName = "AMD Radeon(TM) Graphics"
@@ -22,7 +21,8 @@ const (
var (
// Used to validate if the given ROCm lib is usable
ROCmLibGlobs = []string{"hipblas.dll", "rocblas"} // TODO - probably include more coverage of files here...
ROCmLibGlobs = []string{"hipblas.dll", "rocblas"} // TODO - probably include more coverage of files here...
RocmStandardLocations = []string{"C:\\Program Files\\AMD\\ROCm\\5.7\\bin"} // TODO glob?
)
func AMDGetGPUInfo() []GpuInfo {

View File

@@ -12,6 +12,8 @@ import (
"sync"
"syscall"
"time"
"github.com/ollama/ollama/server/envconfig"
)
var (
@@ -24,45 +26,8 @@ func PayloadsDir() (string, error) {
defer lock.Unlock()
var err error
if payloadsDir == "" {
runnersDir := os.Getenv("OLLAMA_RUNNERS_DIR")
// On Windows we do not carry the payloads inside the main executable
if runtime.GOOS == "windows" && runnersDir == "" {
appExe, err := os.Executable()
if err != nil {
slog.Error("failed to lookup executable path", "error", err)
return "", err
}
runnersDir := envconfig.RunnersDir
cwd, err := os.Getwd()
if err != nil {
slog.Error("failed to lookup working directory", "error", err)
return "", err
}
var paths []string
for _, root := range []string{appExe, cwd} {
paths = append(paths,
filepath.Join(root),
filepath.Join(root, "windows-"+runtime.GOARCH),
filepath.Join(root, "dist", "windows-"+runtime.GOARCH),
)
}
// Try a few variations to improve developer experience when building from source in the local tree
for _, p := range paths {
candidate := filepath.Join(p, "ollama_runners")
_, err := os.Stat(candidate)
if err == nil {
runnersDir = candidate
break
}
}
if runnersDir == "" {
err = fmt.Errorf("unable to locate llm runner directory. Set OLLAMA_RUNNERS_DIR to the location of 'ollama_runners'")
slog.Error("incomplete distribution", "error", err)
return "", err
}
}
if runnersDir != "" {
payloadsDir = runnersDir
return payloadsDir, nil
@@ -70,7 +35,7 @@ func PayloadsDir() (string, error) {
// The remainder only applies on non-windows where we still carry payloads in the main executable
cleanupTmpDirs()
tmpDir := os.Getenv("OLLAMA_TMPDIR")
tmpDir := envconfig.TmpDir
if tmpDir == "" {
tmpDir, err = os.MkdirTemp("", "ollama")
if err != nil {
@@ -133,7 +98,7 @@ func cleanupTmpDirs() {
func Cleanup() {
lock.Lock()
defer lock.Unlock()
runnersDir := os.Getenv("OLLAMA_RUNNERS_DIR")
runnersDir := envconfig.RunnersDir
if payloadsDir != "" && runnersDir == "" && runtime.GOOS != "windows" {
// We want to fully clean up the tmpdir parent of the payloads dir
tmpDir := filepath.Clean(filepath.Join(payloadsDir, ".."))

View File

@@ -21,16 +21,18 @@ import (
"unsafe"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/server/envconfig"
)
type handles struct {
deviceCount int
cudart *C.cudart_handle_t
nvcuda *C.nvcuda_handle_t
}
const (
cudaMinimumMemory = 457 * format.MebiByte
rocmMinimumMemory = 457 * format.MebiByte
cudaMinimumMemory = 256 * format.MebiByte
rocmMinimumMemory = 256 * format.MebiByte
)
var gpuMutex sync.Mutex
@@ -62,6 +64,22 @@ var CudartWindowsGlobs = []string{
"c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll",
}
var NvcudaLinuxGlobs = []string{
"/usr/local/cuda*/targets/*/lib/libcuda.so*",
"/usr/lib/*-linux-gnu/nvidia/current/libcuda.so*",
"/usr/lib/*-linux-gnu/libcuda.so*",
"/usr/lib/wsl/lib/libcuda.so*",
"/usr/lib/wsl/drivers/*/libcuda.so*",
"/opt/cuda/lib*/libcuda.so*",
"/usr/local/cuda/lib*/libcuda.so*",
"/usr/lib*/libcuda.so*",
"/usr/local/lib*/libcuda.so*",
}
var NvcudaWindowsGlobs = []string{
"c:\\windows\\system*\\nvcuda.dll",
}
// Jetson devices have JETSON_JETPACK="x.y.z" factory set to the Jetpack version installed.
// Included to drive logic for reducing Ollama-allocated overhead on L4T/Jetson devices.
var CudaTegra string = os.Getenv("JETSON_JETPACK")
@@ -74,6 +92,8 @@ func initGPUHandles() *handles {
gpuHandles := &handles{}
var cudartMgmtName string
var cudartMgmtPatterns []string
var nvcudaMgmtName string
var nvcudaMgmtPatterns []string
tmpDir, _ := PayloadsDir()
switch runtime.GOOS {
@@ -82,6 +102,9 @@ func initGPUHandles() *handles {
localAppData := os.Getenv("LOCALAPPDATA")
cudartMgmtPatterns = []string{filepath.Join(localAppData, "Programs", "Ollama", cudartMgmtName)}
cudartMgmtPatterns = append(cudartMgmtPatterns, CudartWindowsGlobs...)
// Aligned with driver, we can't carry as payloads
nvcudaMgmtName = "nvcuda.dll"
nvcudaMgmtPatterns = NvcudaWindowsGlobs
case "linux":
cudartMgmtName = "libcudart.so*"
if tmpDir != "" {
@@ -89,11 +112,25 @@ func initGPUHandles() *handles {
cudartMgmtPatterns = []string{filepath.Join(tmpDir, "cuda*", cudartMgmtName)}
}
cudartMgmtPatterns = append(cudartMgmtPatterns, CudartLinuxGlobs...)
// Aligned with driver, we can't carry as payloads
nvcudaMgmtName = "libcuda.so*"
nvcudaMgmtPatterns = NvcudaLinuxGlobs
default:
return gpuHandles
}
slog.Info("Detecting GPUs")
nvcudaLibPaths := FindGPULibs(nvcudaMgmtName, nvcudaMgmtPatterns)
if len(nvcudaLibPaths) > 0 {
deviceCount, nvcuda, libPath := LoadNVCUDAMgmt(nvcudaLibPaths)
if nvcuda != nil {
slog.Info("detected GPUs", "count", deviceCount, "library", libPath)
gpuHandles.nvcuda = nvcuda
gpuHandles.deviceCount = deviceCount
return gpuHandles
}
}
cudartLibPaths := FindGPULibs(cudartMgmtName, cudartMgmtPatterns)
if len(cudartLibPaths) > 0 {
deviceCount, cudart, libPath := LoadCUDARTMgmt(cudartLibPaths)
@@ -118,6 +155,9 @@ func GetGPUInfo() GpuInfoList {
if gpuHandles.cudart != nil {
C.cudart_release(*gpuHandles.cudart)
}
if gpuHandles.nvcuda != nil {
C.nvcuda_release(*gpuHandles.nvcuda)
}
}()
// All our GPU builds on x86 have AVX enabled, so fallback to CPU if we don't detect at least AVX
@@ -126,6 +166,12 @@ func GetGPUInfo() GpuInfoList {
slog.Warn("CPU does not have AVX or AVX2, disabling GPU support.")
}
// On windows we bundle the nvidia library one level above the runner dir
depPath := ""
if runtime.GOOS == "windows" && envconfig.RunnersDir != "" {
depPath = filepath.Dir(envconfig.RunnersDir)
}
var memInfo C.mem_info_t
resp := []GpuInfo{}
@@ -138,7 +184,11 @@ func GetGPUInfo() GpuInfoList {
gpuInfo := GpuInfo{
Library: "cuda",
}
C.cudart_check_vram(*gpuHandles.cudart, C.int(i), &memInfo)
if gpuHandles.cudart != nil {
C.cudart_check_vram(*gpuHandles.cudart, C.int(i), &memInfo)
} else {
C.nvcuda_check_vram(*gpuHandles.nvcuda, C.int(i), &memInfo)
}
if memInfo.err != nil {
slog.Info("error looking up nvidia GPU memory", "error", C.GoString(memInfo.err))
C.free(unsafe.Pointer(memInfo.err))
@@ -154,6 +204,7 @@ func GetGPUInfo() GpuInfoList {
gpuInfo.Major = int(memInfo.major)
gpuInfo.Minor = int(memInfo.minor)
gpuInfo.MinimumMemory = cudaMinimumMemory
gpuInfo.DependencyPath = depPath
// TODO potentially sort on our own algorithm instead of what the underlying GPU library does...
resp = append(resp, gpuInfo)
@@ -196,9 +247,10 @@ func GetCPUMem() (memInfo, error) {
return ret, nil
}
func FindGPULibs(baseLibName string, patterns []string) []string {
func FindGPULibs(baseLibName string, defaultPatterns []string) []string {
// Multiple GPU libraries may exist, and some may not work, so keep trying until we exhaust them
var ldPaths []string
var patterns []string
gpuLibPaths := []string{}
slog.Debug("Searching for GPU library", "name", baseLibName)
@@ -218,8 +270,14 @@ func FindGPULibs(baseLibName string, patterns []string) []string {
}
patterns = append(patterns, filepath.Join(d, baseLibName+"*"))
}
patterns = append(patterns, defaultPatterns...)
slog.Debug("gpu library search", "globs", patterns)
for _, pattern := range patterns {
// Nvidia PhysX known to return bogus results
if strings.Contains(pattern, "PhysX") {
slog.Debug("skipping PhysX cuda library path", "path", pattern)
}
// Ignore glob discovery errors
matches, _ := filepath.Glob(pattern)
for _, match := range matches {
@@ -267,8 +325,25 @@ func LoadCUDARTMgmt(cudartLibPaths []string) (int, *C.cudart_handle_t, string) {
return 0, nil, ""
}
func LoadNVCUDAMgmt(nvcudaLibPaths []string) (int, *C.nvcuda_handle_t, string) {
var resp C.nvcuda_init_resp_t
resp.ch.verbose = getVerboseState()
for _, libPath := range nvcudaLibPaths {
lib := C.CString(libPath)
defer C.free(unsafe.Pointer(lib))
C.nvcuda_init(lib, &resp)
if resp.err != nil {
slog.Debug("Unable to load nvcuda", "library", libPath, "error", C.GoString(resp.err))
C.free(unsafe.Pointer(resp.err))
} else {
return int(resp.num_devices), &resp.ch, libPath
}
}
return 0, nil, ""
}
func getVerboseState() C.uint16_t {
if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
if envconfig.Debug {
return C.uint16_t(1)
}
return C.uint16_t(0)

View File

@@ -1,3 +1,5 @@
//go:build darwin
package gpu
/*
@@ -8,6 +10,12 @@ package gpu
import "C"
import (
"runtime"
"github.com/ollama/ollama/format"
)
const (
metalMinimumMemory = 384 * format.MebiByte
)
func GetGPUInfo() GpuInfoList {
@@ -30,7 +38,7 @@ func GetGPUInfo() GpuInfoList {
// TODO is there a way to gather actual allocated video memory? (currentAllocatedSize doesn't work)
info.FreeMemory = info.TotalMemory
info.MinimumMemory = 0
info.MinimumMemory = metalMinimumMemory
return []GpuInfo{info}
}

View File

@@ -58,6 +58,7 @@ void cpu_check_ram(mem_info_t *resp);
#endif
#include "gpu_info_cudart.h"
#include "gpu_info_nvcuda.h"
#endif // __GPU_INFO_H__
#endif // __APPLE__

View File

@@ -6,9 +6,9 @@
// Just enough typedef's to dlopen/dlsym for memory information
typedef enum cudartReturn_enum {
CUDART_SUCCESS = 0,
CUDA_ERROR_INVALID_VALUE = 1,
CUDA_ERROR_MEMORY_ALLOCATION = 2,
CUDA_ERROR_INSUFFICIENT_DRIVER = 35,
CUDART_ERROR_INVALID_VALUE = 1,
CUDART_ERROR_MEMORY_ALLOCATION = 2,
CUDART_ERROR_INSUFFICIENT_DRIVER = 35,
// Other values omitted for now...
} cudartReturn_t;

203
gpu/gpu_info_nvcuda.c Normal file
View File

@@ -0,0 +1,203 @@
#ifndef __APPLE__ // TODO - maybe consider nvidia support on intel macs?
#include <string.h>
#include "gpu_info_nvcuda.h"
void nvcuda_init(char *nvcuda_lib_path, nvcuda_init_resp_t *resp) {
CUresult ret;
resp->err = NULL;
resp->num_devices = 0;
const int buflen = 256;
char buf[buflen + 1];
int i;
struct lookup {
char *s;
void **p;
} l[] = {
{"cuInit", (void *)&resp->ch.cuInit},
{"cuDriverGetVersion", (void *)&resp->ch.cuDriverGetVersion},
{"cuDeviceGetCount", (void *)&resp->ch.cuDeviceGetCount},
{"cuDeviceGet", (void *)&resp->ch.cuDeviceGet},
{"cuDeviceGetAttribute", (void *)&resp->ch.cuDeviceGetAttribute},
{"cuDeviceGetUuid", (void *)&resp->ch.cuDeviceGetUuid},
{"cuCtxCreate_v3", (void *)&resp->ch.cuCtxCreate_v3},
{"cuMemGetInfo_v2", (void *)&resp->ch.cuMemGetInfo_v2},
{"cuCtxDestroy", (void *)&resp->ch.cuCtxDestroy},
{NULL, NULL},
};
resp->ch.handle = LOAD_LIBRARY(nvcuda_lib_path, RTLD_LAZY);
if (!resp->ch.handle) {
char *msg = LOAD_ERR();
LOG(resp->ch.verbose, "library %s load err: %s\n", nvcuda_lib_path, msg);
snprintf(buf, buflen,
"Unable to load %s library to query for Nvidia GPUs: %s",
nvcuda_lib_path, msg);
free(msg);
resp->err = strdup(buf);
return;
}
for (i = 0; l[i].s != NULL; i++) {
*l[i].p = LOAD_SYMBOL(resp->ch.handle, l[i].s);
if (!*l[i].p) {
char *msg = LOAD_ERR();
LOG(resp->ch.verbose, "dlerr: %s\n", msg);
UNLOAD_LIBRARY(resp->ch.handle);
resp->ch.handle = NULL;
snprintf(buf, buflen, "symbol lookup for %s failed: %s", l[i].s,
msg);
free(msg);
resp->err = strdup(buf);
return;
}
}
ret = (*resp->ch.cuInit)(0);
if (ret != CUDA_SUCCESS) {
LOG(resp->ch.verbose, "cuInit err: %d\n", ret);
UNLOAD_LIBRARY(resp->ch.handle);
resp->ch.handle = NULL;
if (ret == CUDA_ERROR_INSUFFICIENT_DRIVER) {
resp->err = strdup("your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama");
return;
}
snprintf(buf, buflen, "nvcuda init failure: %d", ret);
resp->err = strdup(buf);
return;
}
int version = 0;
nvcudaDriverVersion_t driverVersion;
driverVersion.major = 0;
driverVersion.minor = 0;
// Report driver version if we're in verbose mode, ignore errors
ret = (*resp->ch.cuDriverGetVersion)(&version);
if (ret != CUDA_SUCCESS) {
LOG(resp->ch.verbose, "cuDriverGetVersion failed: %d\n", ret);
} else {
driverVersion.major = version / 1000;
driverVersion.minor = (version - (driverVersion.major * 1000)) / 10;
LOG(resp->ch.verbose, "CUDA driver version: %d-%d\n", driverVersion.major, driverVersion.minor);
}
ret = (*resp->ch.cuDeviceGetCount)(&resp->num_devices);
if (ret != CUDA_SUCCESS) {
LOG(resp->ch.verbose, "cuDeviceGetCount err: %d\n", ret);
UNLOAD_LIBRARY(resp->ch.handle);
resp->ch.handle = NULL;
snprintf(buf, buflen, "unable to get device count: %d", ret);
resp->err = strdup(buf);
return;
}
}
const int buflen = 256;
void nvcuda_check_vram(nvcuda_handle_t h, int i, mem_info_t *resp) {
resp->err = NULL;
nvcudaMemory_t memInfo = {0,0};
CUresult ret;
CUdevice device = -1;
CUcontext ctx = NULL;
char buf[buflen + 1];
CUuuid uuid = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
if (h.handle == NULL) {
resp->err = strdup("nvcuda handle isn't initialized");
return;
}
ret = (*h.cuDeviceGet)(&device, i);
if (ret != CUDA_SUCCESS) {
snprintf(buf, buflen, "nvcuda device failed to initialize");
resp->err = strdup(buf);
return;
}
resp->major = 0;
resp->minor = 0;
int major = 0;
int minor = 0;
ret = (*h.cuDeviceGetAttribute)(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, device);
if (ret != CUDA_SUCCESS) {
LOG(h.verbose, "[%d] device major lookup failure: %d\n", i, ret);
} else {
ret = (*h.cuDeviceGetAttribute)(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, device);
if (ret != CUDA_SUCCESS) {
LOG(h.verbose, "[%d] device minor lookup failure: %d\n", i, ret);
} else {
resp->minor = minor;
resp->major = major;
}
}
ret = (*h.cuDeviceGetUuid)(&uuid, device);
if (ret != CUDA_SUCCESS) {
LOG(h.verbose, "[%d] device uuid lookup failure: %d\n", i, ret);
snprintf(&resp->gpu_id[0], GPU_ID_LEN, "%d", i);
} else {
// GPU-d110a105-ac29-1d54-7b49-9c90440f215b
snprintf(&resp->gpu_id[0], GPU_ID_LEN,
"GPU-%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
uuid.bytes[0],
uuid.bytes[1],
uuid.bytes[2],
uuid.bytes[3],
uuid.bytes[4],
uuid.bytes[5],
uuid.bytes[6],
uuid.bytes[7],
uuid.bytes[8],
uuid.bytes[9],
uuid.bytes[10],
uuid.bytes[11],
uuid.bytes[12],
uuid.bytes[13],
uuid.bytes[14],
uuid.bytes[15]
);
}
// To get memory we have to set (and release) a context
ret = (*h.cuCtxCreate_v3)(&ctx, NULL, 0, 0, device);
if (ret != CUDA_SUCCESS) {
snprintf(buf, buflen, "nvcuda failed to get primary device context %d", ret);
resp->err = strdup(buf);
return;
}
ret = (*h.cuMemGetInfo_v2)(&memInfo.free, &memInfo.total);
if (ret != CUDA_SUCCESS) {
snprintf(buf, buflen, "nvcuda device memory info lookup failure %d", ret);
resp->err = strdup(buf);
// Best effort on failure...
(*h.cuCtxDestroy)(ctx);
return;
}
resp->total = memInfo.total;
resp->free = memInfo.free;
LOG(h.verbose, "[%s] CUDA totalMem %lu mb\n", resp->gpu_id, resp->total / 1024 / 1024);
LOG(h.verbose, "[%s] CUDA freeMem %lu mb\n", resp->gpu_id, resp->free / 1024 / 1024);
LOG(h.verbose, "[%s] Compute Capability %d.%d\n", resp->gpu_id, resp->major, resp->minor);
ret = (*h.cuCtxDestroy)(ctx);
if (ret != CUDA_SUCCESS) {
LOG(1, "nvcuda failed to release primary device context %d", ret);
}
}
void nvcuda_release(nvcuda_handle_t h) {
LOG(h.verbose, "releasing nvcuda library\n");
UNLOAD_LIBRARY(h.handle);
// TODO and other context release logic?
h.handle = NULL;
}
#endif // __APPLE__

71
gpu/gpu_info_nvcuda.h Normal file
View File

@@ -0,0 +1,71 @@
#ifndef __APPLE__
#ifndef __GPU_INFO_NVCUDA_H__
#define __GPU_INFO_NVCUDA_H__
#include "gpu_info.h"
// Just enough typedef's to dlopen/dlsym for memory information
typedef enum cudaError_enum {
CUDA_SUCCESS = 0,
CUDA_ERROR_INVALID_VALUE = 1,
CUDA_ERROR_MEMORY_ALLOCATION = 2,
CUDA_ERROR_NOT_INITIALIZED = 3,
CUDA_ERROR_INSUFFICIENT_DRIVER = 35,
// Other values omitted for now...
} CUresult;
typedef enum CUdevice_attribute_enum {
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
// TODO - not yet wired up but may be useful for Jetson or other
// integrated GPU scenarios with shared memory
CU_DEVICE_ATTRIBUTE_INTEGRATED = 18
} CUdevice_attribute;
typedef void *nvcudaDevice_t; // Opaque is sufficient
typedef struct nvcudaMemory_st {
uint64_t total;
uint64_t free;
} nvcudaMemory_t;
typedef struct nvcudaDriverVersion {
int major;
int minor;
} nvcudaDriverVersion_t;
typedef struct CUuuid_st {
unsigned char bytes[16];
} CUuuid;
typedef int CUdevice;
typedef void* CUcontext;
typedef struct nvcuda_handle {
void *handle;
uint16_t verbose;
CUresult (*cuInit)(unsigned int Flags);
CUresult (*cuDriverGetVersion)(int *driverVersion);
CUresult (*cuDeviceGetCount)(int *);
CUresult (*cuDeviceGet)(CUdevice* device, int ordinal);
CUresult (*cuDeviceGetAttribute)(int* pi, CUdevice_attribute attrib, CUdevice dev);
CUresult (*cuDeviceGetUuid)(CUuuid* uuid, CUdevice dev); // signature compatible with cuDeviceGetUuid_v2
// Context specific aspects
CUresult (*cuCtxCreate_v3)(CUcontext* pctx, void *params, int len, unsigned int flags, CUdevice dev);
CUresult (*cuMemGetInfo_v2)(uint64_t* free, uint64_t* total);
CUresult (*cuCtxDestroy)(CUcontext ctx);
} nvcuda_handle_t;
typedef struct nvcuda_init_resp {
char *err; // If err is non-null handle is invalid
nvcuda_handle_t ch;
int num_devices;
} nvcuda_init_resp_t;
void nvcuda_init(char *nvcuda_lib_path, nvcuda_init_resp_t *resp);
void nvcuda_check_vram(nvcuda_handle_t ch, int device_id, mem_info_t *resp);
void nvcuda_release(nvcuda_handle_t ch);
#endif // __GPU_INFO_NVCUDA_H__
#endif // __APPLE__

View File

@@ -0,0 +1,117 @@
//go:build integration
package integration
import (
"context"
"errors"
"fmt"
"log/slog"
"os"
"strconv"
"strings"
"sync"
"testing"
"time"
"github.com/ollama/ollama/api"
"github.com/stretchr/testify/require"
)
func TestMaxQueue(t *testing.T) {
// Note: This test can be quite slow when running in CPU mode, so keep the threadCount low unless your on GPU
// Also note that by default Darwin can't sustain > ~128 connections without adjusting limits
threadCount := 32
mq := os.Getenv("OLLAMA_MAX_QUEUE")
if mq != "" {
var err error
threadCount, err = strconv.Atoi(mq)
require.NoError(t, err)
} else {
os.Setenv("OLLAMA_MAX_QUEUE", fmt.Sprintf("%d", threadCount))
}
req := api.GenerateRequest{
Model: "orca-mini",
Prompt: "write a long historical fiction story about christopher columbus. use at least 10 facts from his actual journey",
Options: map[string]interface{}{
"seed": 42,
"temperature": 0.0,
},
}
resp := []string{"explore", "discover", "ocean"}
// CPU mode takes much longer at the limit with a large queue setting
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
client, _, cleanup := InitServerConnection(ctx, t)
defer cleanup()
require.NoError(t, PullIfMissing(ctx, client, req.Model))
// Context for the worker threads so we can shut them down
// embedCtx, embedCancel := context.WithCancel(ctx)
embedCtx := ctx
var genwg sync.WaitGroup
go func() {
genwg.Add(1)
defer genwg.Done()
slog.Info("Starting generate request")
DoGenerate(ctx, t, client, req, resp, 45*time.Second, 5*time.Second)
slog.Info("generate completed")
}()
// Give the generate a chance to get started before we start hammering on embed requests
time.Sleep(5 * time.Millisecond)
threadCount += 10 // Add a few extra to ensure we push the queue past its limit
busyCount := 0
resetByPeerCount := 0
canceledCount := 0
succesCount := 0
counterMu := sync.Mutex{}
var embedwg sync.WaitGroup
for i := 0; i < threadCount; i++ {
go func(i int) {
embedwg.Add(1)
defer embedwg.Done()
slog.Info("embed started", "id", i)
embedReq := api.EmbeddingRequest{
Model: req.Model,
Prompt: req.Prompt,
Options: req.Options,
}
// Fresh client for every request
client, _ = GetTestEndpoint()
resp, genErr := client.Embeddings(embedCtx, &embedReq)
counterMu.Lock()
defer counterMu.Unlock()
switch {
case genErr == nil:
succesCount++
require.Greater(t, len(resp.Embedding), 5) // somewhat arbitrary, but sufficient to be reasonable
case errors.Is(genErr, context.Canceled):
canceledCount++
case strings.Contains(genErr.Error(), "busy"):
busyCount++
case strings.Contains(genErr.Error(), "connection reset by peer"):
resetByPeerCount++
default:
require.NoError(t, genErr, "%d request failed", i)
}
slog.Info("embed finished", "id", i)
}(i)
}
genwg.Wait()
slog.Info("generate done, waiting for embeds")
embedwg.Wait()
require.Equal(t, resetByPeerCount, 0, "Connections reset by peer, have you updated your fd and socket limits?")
require.True(t, busyCount > 0, "no requests hit busy error but some should have")
require.True(t, canceledCount == 0, "no requests should have been canceled due to timeout")
slog.Info("embeds completed", "success", succesCount, "busy", busyCount, "reset", resetByPeerCount, "canceled", canceledCount)
}

View File

@@ -1032,7 +1032,7 @@ struct llama_server_context
slot.has_next_token = false;
}
if (!slot.cache_tokens.empty() && result.tok == llama_token_eos(model))
if (!slot.cache_tokens.empty() && llama_token_is_eog(model, result.tok))
{
slot.stopped_eos = true;
slot.has_next_token = false;
@@ -1144,12 +1144,15 @@ struct llama_server_context
res.result_json = json
{
{"content", tkn.text_to_send},
{"stop", false},
{"slot_id", slot.id},
{"multimodal", multimodal}
};
if (!llama_token_is_eog(model, tkn.tok)) {
res.result_json["content"] = tkn.text_to_send;
}
if (slot.sparams.n_probs > 0)
{
std::vector<completion_token_output> probs_output = {};
@@ -1183,8 +1186,6 @@ struct llama_server_context
{"model", params.model_alias},
{"tokens_predicted", slot.n_decoded},
{"tokens_evaluated", slot.n_prompt_tokens},
{"generation_settings", get_formated_generation(slot)},
{"prompt", slot.prompt},
{"truncated", slot.truncated},
{"stopped_eos", slot.stopped_eos},
{"stopped_word", slot.stopped_word},
@@ -2644,18 +2645,18 @@ static void server_params_parse(int argc, char **argv, server_params &sparams,
if (strncmp(sep, "int:", 4) == 0) {
sep += 4;
kvo.tag = LLAMA_KV_OVERRIDE_TYPE_INT;
kvo.int_value = std::atol(sep);
kvo.val_i64 = std::atol(sep);
} else if (strncmp(sep, "float:", 6) == 0) {
sep += 6;
kvo.tag = LLAMA_KV_OVERRIDE_TYPE_FLOAT;
kvo.float_value = std::atof(sep);
kvo.val_f64 = std::atof(sep);
} else if (strncmp(sep, "bool:", 5) == 0) {
sep += 5;
kvo.tag = LLAMA_KV_OVERRIDE_TYPE_BOOL;
if (std::strcmp(sep, "true") == 0) {
kvo.bool_value = true;
kvo.val_bool = true;
} else if (std::strcmp(sep, "false") == 0) {
kvo.bool_value = false;
kvo.val_bool = false;
} else {
fprintf(stderr, "error: Invalid boolean value for KV override: %s\n", argv[i]);
invalid_param = true;

View File

@@ -42,7 +42,7 @@ function init_vars {
"-DLLAMA_NATIVE=off"
)
$script:commonCpuDefs = @("-DCMAKE_POSITION_INDEPENDENT_CODE=on")
$script:ARCH = "amd64" # arm not yet supported.
$script:ARCH = $Env:PROCESSOR_ARCHITECTURE.ToLower()
$script:DIST_BASE = "${script:SRC_DIR}\dist\windows-${script:ARCH}\ollama_runners"
md "$script:DIST_BASE" -ea 0 > $null
if ($env:CGO_CFLAGS -contains "-g") {
@@ -213,11 +213,11 @@ function build_static() {
}
}
function build_cpu() {
function build_cpu($gen_arch) {
if ((-not "${env:OLLAMA_SKIP_CPU_GENERATE}" ) -and ((-not "${env:OLLAMA_CPU_TARGET}") -or ("${env:OLLAMA_CPU_TARGET}" -eq "cpu"))) {
# remaining llama.cpp builds use MSVC
init_vars
$script:cmakeDefs = $script:commonCpuDefs + @("-A", "x64", "-DLLAMA_AVX=off", "-DLLAMA_AVX2=off", "-DLLAMA_AVX512=off", "-DLLAMA_FMA=off", "-DLLAMA_F16C=off") + $script:cmakeDefs
$script:cmakeDefs = $script:commonCpuDefs + @("-A", $gen_arch, "-DLLAMA_AVX=off", "-DLLAMA_AVX2=off", "-DLLAMA_AVX512=off", "-DLLAMA_FMA=off", "-DLLAMA_F16C=off") + $script:cmakeDefs
$script:buildDir="../build/windows/${script:ARCH}/cpu"
$script:distDir="$script:DIST_BASE\cpu"
write-host "Building LCD CPU"
@@ -349,11 +349,15 @@ if ($($args.count) -eq 0) {
git_module_setup
apply_patches
build_static
build_cpu
build_cpu_avx
build_cpu_avx2
build_cuda
build_rocm
if ($script:ARCH -eq "arm64") {
build_cpu("ARM64")
} else { # amd64
build_cpu("x64")
build_cpu_avx
build_cpu_avx2
build_cuda
build_rocm
}
cleanup
write-host "`ngo generate completed. LLM runners: $(get-childitem -path $script:DIST_BASE)"

View File

@@ -4,6 +4,7 @@ package llm
// #cgo darwin,arm64 LDFLAGS: ${SRCDIR}/build/darwin/arm64_static/libllama.a -lstdc++
// #cgo darwin,amd64 LDFLAGS: ${SRCDIR}/build/darwin/x86_64_static/libllama.a -lstdc++
// #cgo windows,amd64 LDFLAGS: ${SRCDIR}/build/windows/amd64_static/libllama.a -static -lstdc++
// #cgo windows,arm64 LDFLAGS: ${SRCDIR}/build/windows/arm64_static/libllama.a -static -lstdc++
// #cgo linux,amd64 LDFLAGS: ${SRCDIR}/build/linux/x86_64_static/libllama.a -lstdc++
// #cgo linux,arm64 LDFLAGS: ${SRCDIR}/build/linux/arm64_static/libllama.a -lstdc++
// #include <stdlib.h>

View File

@@ -3,12 +3,11 @@ package llm
import (
"fmt"
"log/slog"
"os"
"strconv"
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/gpu"
"github.com/ollama/ollama/server/envconfig"
)
// This algorithm looks for a complete fit to determine if we need to unload other models
@@ -50,15 +49,8 @@ func EstimateGPULayers(gpus []gpu.GpuInfo, ggml *GGML, projectors []string, opts
for _, info := range gpus {
memoryAvailable += info.FreeMemory
}
userLimit := os.Getenv("OLLAMA_MAX_VRAM")
if userLimit != "" {
avail, err := strconv.ParseUint(userLimit, 10, 64)
if err != nil {
slog.Error("invalid setting, ignoring", "OLLAMA_MAX_VRAM", userLimit, "error", err)
} else {
slog.Info("user override memory limit", "OLLAMA_MAX_VRAM", avail, "actual", memoryAvailable)
memoryAvailable = avail
}
if envconfig.MaxVRAM > 0 {
memoryAvailable = envconfig.MaxVRAM
}
slog.Debug("evaluating", "library", gpus[0].Library, "gpu_count", len(gpus), "available", format.HumanBytes2(memoryAvailable))
@@ -88,19 +80,24 @@ func EstimateGPULayers(gpus []gpu.GpuInfo, ggml *GGML, projectors []string, opts
graphFullOffload *= uint64(len(gpus))
graphPartialOffload *= uint64(len(gpus))
// on metal there's no partial offload overhead
if gpus[0].Library == "metal" {
graphPartialOffload = graphFullOffload
}
layers := ggml.Tensors().Layers()
// memoryRequiredTotal represents the memory required for full GPU offloading (all layers)
memoryRequiredTotal := memoryMinimum + graphFullOffload
memoryRequiredTotal := memoryMinimum + graphFullOffload + layers["blk.0"].size()
// memoryRequiredPartial represents the memory required for partial GPU offloading (n > 0, n < layers)
memoryRequiredPartial := memoryMinimum + graphPartialOffload
memoryRequiredPartial := memoryMinimum + graphPartialOffload + layers["blk.0"].size()
if memoryRequiredPartial > memoryAvailable {
slog.Debug("insufficient VRAM to load any model layers")
return 0, 0
}
layers := ggml.Tensors().Layers()
var memoryLayerOutput uint64
if layer, ok := layers["output_norm"]; ok {
memoryLayerOutput += layer.size()

View File

@@ -0,0 +1,24 @@
diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp
index e3c9bcd4..b43f892d 100644
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -573,14 +573,16 @@ static ggml_cgraph * clip_image_build_graph(clip_ctx * ctx, const clip_image_f32
struct ggml_tensor * embeddings = inp;
if (ctx->has_class_embedding) {
embeddings = ggml_new_tensor_3d(ctx0, GGML_TYPE_F32, hidden_size, num_positions, batch_size);
+ }
+ ggml_set_name(embeddings, "embeddings");
+ ggml_set_input(embeddings);
+
+ if (ctx->has_class_embedding) {
embeddings = ggml_acc(ctx0, embeddings, model.class_embedding,
embeddings->nb[1], embeddings->nb[2], embeddings->nb[3], 0);
embeddings = ggml_acc(ctx0, embeddings, inp,
embeddings->nb[1], embeddings->nb[2], embeddings->nb[3], model.class_embedding->nb[1]);
}
- ggml_set_name(embeddings, "embeddings");
- ggml_set_input(embeddings);
-
struct ggml_tensor * positions = ggml_new_tensor_1d(ctx0, GGML_TYPE_I32, num_positions);
ggml_set_name(positions, "positions");

View File

@@ -26,6 +26,7 @@ import (
"github.com/ollama/ollama/api"
"github.com/ollama/ollama/format"
"github.com/ollama/ollama/gpu"
"github.com/ollama/ollama/server/envconfig"
)
type LlamaServer interface {
@@ -73,8 +74,7 @@ func LoadModel(model string) (*GGML, error) {
func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, projectors []string, opts api.Options) (LlamaServer, error) {
var err error
if opts.NumCtx > int(ggml.KV().ContextLength()) {
slog.Warn("requested context length is greater than model max context length", "requested", opts.NumCtx, "model", ggml.KV().ContextLength())
opts.NumCtx = int(ggml.KV().ContextLength())
slog.Warn("requested context length is greater than the model's training context window size", "requested", opts.NumCtx, "training size", ggml.KV().ContextLength())
}
if opts.NumCtx < 4 {
@@ -125,7 +125,7 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
} else {
servers = serversForGpu(gpus[0]) // All GPUs in the list are matching Library and Variant
}
demandLib := strings.Trim(os.Getenv("OLLAMA_LLM_LIBRARY"), "\"' ")
demandLib := envconfig.LLMLibrary
if demandLib != "" {
serverPath := availableServers[demandLib]
if serverPath == "" {
@@ -146,7 +146,7 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
"--batch-size", fmt.Sprintf("%d", opts.NumBatch),
"--embedding",
}
if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
if envconfig.Debug {
params = append(params, "--log-format", "json")
} else {
params = append(params, "--log-disable")
@@ -156,7 +156,7 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
params = append(params, "--n-gpu-layers", fmt.Sprintf("%d", opts.NumGPU))
}
if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
if envconfig.Debug {
params = append(params, "--verbose")
}
@@ -194,16 +194,15 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
params = append(params, "--numa")
}
// "--cont-batching", // TODO - doesn't seem to have any noticeable perf change for multiple requests
numParallel := 1
if onp := os.Getenv("OLLAMA_NUM_PARALLEL"); onp != "" {
numParallel, err = strconv.Atoi(onp)
if err != nil || numParallel <= 0 {
err = fmt.Errorf("invalid OLLAMA_NUM_PARALLEL=%s must be greater than zero - %w", onp, err)
slog.Error("misconfiguration", "error", err)
return nil, err
}
numParallel := envconfig.NumParallel
// TODO (jmorganca): multimodal models don't support parallel yet
// see https://github.com/ollama/ollama/issues/4165
if len(projectors) > 0 {
numParallel = 1
slog.Warn("multimodal models don't support parallel requests yet")
}
params = append(params, "--parallel", fmt.Sprintf("%d", numParallel))
for i := 0; i < len(servers); i++ {
@@ -234,13 +233,13 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
if runtime.GOOS == "windows" {
pathEnv = "PATH"
}
// append the server directory to LD_LIBRARY_PATH/PATH
// prepend the server directory to LD_LIBRARY_PATH/PATH
libraryPaths := []string{dir}
if libraryPath, ok := os.LookupEnv(pathEnv); ok {
// Append our runner directory to the path
// This will favor system libraries over our bundled library dependencies
libraryPaths = append(filepath.SplitList(libraryPath), libraryPaths...)
libraryPaths = append(libraryPaths, filepath.SplitList(libraryPath)...)
}
// Note: we always put the dependency path first
@@ -276,15 +275,31 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
sem: semaphore.NewWeighted(int64(numParallel)),
}
libEnv := fmt.Sprintf("%s=%s", pathEnv, strings.Join(libraryPaths, string(filepath.ListSeparator)))
s.cmd.Env = append(os.Environ(), libEnv)
s.cmd.Env = os.Environ()
s.cmd.Stdout = os.Stdout
s.cmd.Stderr = s.status
// TODO - multiple GPU selection logic...
key, val := gpu.GpuInfoList(gpus).GetVisibleDevicesEnv()
if key != "" {
s.cmd.Env = append(s.cmd.Env, key+"="+val)
visibleDevicesEnv, visibleDevicesEnvVal := gpu.GpuInfoList(gpus).GetVisibleDevicesEnv()
pathEnvVal := strings.Join(libraryPaths, string(filepath.ListSeparator))
// Update or add the path and visible devices variable with our adjusted version
pathNeeded := true
devicesNeeded := visibleDevicesEnv != ""
for i := range s.cmd.Env {
cmp := strings.SplitN(s.cmd.Env[i], "=", 2)
if strings.EqualFold(cmp[0], pathEnv) {
s.cmd.Env[i] = pathEnv + "=" + pathEnvVal
pathNeeded = false
} else if devicesNeeded && strings.EqualFold(cmp[0], visibleDevicesEnv) {
s.cmd.Env[i] = visibleDevicesEnv + "=" + visibleDevicesEnvVal
devicesNeeded = false
}
}
if pathNeeded {
s.cmd.Env = append(s.cmd.Env, pathEnv+"="+pathEnvVal)
}
if devicesNeeded {
s.cmd.Env = append(s.cmd.Env, visibleDevicesEnv+"="+visibleDevicesEnvVal)
}
slog.Info("starting llama server", "cmd", s.cmd.String())
@@ -301,19 +316,6 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
continue
}
// reap subprocess when it exits
go func() {
// Exit status managed via getServerStatus
_ = s.cmd.Wait()
}()
// TODO - make sure this is all wired up correctly
// if err = s.WaitUntilRunning(); err != nil {
// slog.Error("error starting llama server", "server", servers[i], "error", err)
// s.Close()
// finalErr = err
// continue
// }
return s, nil
}
@@ -345,7 +347,7 @@ type ServerStatus int
const ( // iota is reset to 0
ServerStatusReady ServerStatus = iota
ServerStatusNoSlotsAvaialble
ServerStatusNoSlotsAvailable
ServerStatusLoadingModel
ServerStatusNotResponding
ServerStatusError
@@ -355,7 +357,7 @@ func (s ServerStatus) ToString() string {
switch s {
case ServerStatusReady:
return "llm server ready"
case ServerStatusNoSlotsAvaialble:
case ServerStatusNoSlotsAvailable:
return "llm busy - no slots available"
case ServerStatusLoadingModel:
return "llm server loading model"
@@ -412,7 +414,7 @@ func (s *llmServer) getServerStatus(ctx context.Context) (ServerStatus, error) {
case "ok":
return ServerStatusReady, nil
case "no slot available":
return ServerStatusNoSlotsAvaialble, nil
return ServerStatusNoSlotsAvailable, nil
case "loading model":
return ServerStatusLoadingModel, nil
default:
@@ -420,6 +422,29 @@ func (s *llmServer) getServerStatus(ctx context.Context) (ServerStatus, error) {
}
}
// getServerStatusRetry will retry if ServerStatusNoSlotsAvailable is received
func (s *llmServer) getServerStatusRetry(ctx context.Context) (ServerStatus, error) {
var retries int
for {
status, err := s.getServerStatus(ctx)
if err != nil {
return status, err
}
if status == ServerStatusNoSlotsAvailable {
if retries >= 10 {
return status, fmt.Errorf("no slots available after %d retries", retries)
}
time.Sleep(5 * time.Millisecond)
retries++
continue
}
return status, nil
}
}
func (s *llmServer) Ping(ctx context.Context) error {
_, err := s.getServerStatus(ctx)
if err != nil {
@@ -517,7 +542,6 @@ ws ::= ([ \t\n] ws)?
`
const maxBufferSize = 512 * format.KiloByte
const maxRetries = 3
type ImageData struct {
Data []byte `json:"data"`
@@ -593,7 +617,7 @@ func (s *llmServer) Completion(ctx context.Context, req CompletionRequest, fn fu
}
// Make sure the server is ready
status, err := s.getServerStatus(ctx)
status, err := s.getServerStatusRetry(ctx)
if err != nil {
return err
} else if status != ServerStatusReady {
@@ -607,133 +631,113 @@ func (s *llmServer) Completion(ctx context.Context, req CompletionRequest, fn fu
}
}
retryDelay := 100 * time.Microsecond
for retries := 0; retries < maxRetries; retries++ {
if retries > 0 {
time.Sleep(retryDelay) // wait before retrying
retryDelay *= 2 // exponential backoff
}
// Handling JSON marshaling with special characters unescaped.
buffer := &bytes.Buffer{}
enc := json.NewEncoder(buffer)
enc.SetEscapeHTML(false)
// Handling JSON marshaling with special characters unescaped.
buffer := &bytes.Buffer{}
enc := json.NewEncoder(buffer)
enc.SetEscapeHTML(false)
if err := enc.Encode(request); err != nil {
return fmt.Errorf("failed to marshal data: %v", err)
}
if err := enc.Encode(request); err != nil {
return fmt.Errorf("failed to marshal data: %v", err)
}
endpoint := fmt.Sprintf("http://127.0.0.1:%d/completion", s.port)
serverReq, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, buffer)
if err != nil {
return fmt.Errorf("error creating POST request: %v", err)
}
serverReq.Header.Set("Content-Type", "application/json")
endpoint := fmt.Sprintf("http://127.0.0.1:%d/completion", s.port)
req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, buffer)
res, err := http.DefaultClient.Do(serverReq)
if err != nil {
return fmt.Errorf("POST predict: %v", err)
}
defer res.Body.Close()
if res.StatusCode >= 400 {
bodyBytes, err := io.ReadAll(res.Body)
if err != nil {
return fmt.Errorf("error creating POST request: %v", err)
return fmt.Errorf("failed reading llm error response: %w", err)
}
req.Header.Set("Content-Type", "application/json")
log.Printf("llm predict error: %s", bodyBytes)
return fmt.Errorf("%s", bodyBytes)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return fmt.Errorf("POST predict: %v", err)
}
defer resp.Body.Close()
scanner := bufio.NewScanner(res.Body)
buf := make([]byte, 0, maxBufferSize)
scanner.Buffer(buf, maxBufferSize)
if resp.StatusCode >= 400 {
bodyBytes, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("failed reading llm error response: %w", err)
// keep track of the last token generated, this is used to abort if the model starts looping
var lastToken string
var tokenRepeat int
for scanner.Scan() {
select {
case <-ctx.Done():
// This handles the request cancellation
return ctx.Err()
default:
line := scanner.Bytes()
if len(line) == 0 {
continue
}
log.Printf("llm predict error: %s", bodyBytes)
return fmt.Errorf("%s", bodyBytes)
}
scanner := bufio.NewScanner(resp.Body)
buf := make([]byte, 0, maxBufferSize)
scanner.Buffer(buf, maxBufferSize)
evt, ok := bytes.CutPrefix(line, []byte("data: "))
if !ok {
return fmt.Errorf("error parsing llm response stream: %s", line)
}
retryNeeded := false
// keep track of the last token generated, this is used to abort if the model starts looping
var lastToken string
var tokenRepeat int
var c completion
if err := json.Unmarshal(evt, &c); err != nil {
return fmt.Errorf("error unmarshaling llm prediction response: %v", err)
}
for scanner.Scan() {
select {
case <-ctx.Done():
// This handles the request cancellation
return ctx.Err()
switch {
case strings.TrimSpace(c.Content) == lastToken:
tokenRepeat++
default:
line := scanner.Bytes()
if len(line) == 0 {
continue
}
// try again on slot unavailable
if bytes.Contains(line, []byte("slot unavailable")) {
retryNeeded = true
break
}
evt, ok := bytes.CutPrefix(line, []byte("data: "))
if !ok {
return fmt.Errorf("error parsing llm response stream: %s", line)
}
var c completion
if err := json.Unmarshal(evt, &c); err != nil {
return fmt.Errorf("error unmarshaling llm prediction response: %v", err)
}
switch {
case strings.TrimSpace(c.Content) == lastToken:
tokenRepeat++
default:
lastToken = strings.TrimSpace(c.Content)
tokenRepeat = 0
}
// 30 picked as an arbitrary max token repeat limit, modify as needed
if tokenRepeat > 30 {
slog.Debug("prediction aborted, token repeat limit reached")
return ctx.Err()
}
if c.Content != "" {
fn(CompletionResponse{
Content: c.Content,
})
}
if c.Stop {
fn(CompletionResponse{
Done: true,
PromptEvalCount: c.Timings.PromptN,
PromptEvalDuration: parseDurationMs(c.Timings.PromptMS),
EvalCount: c.Timings.PredictedN,
EvalDuration: parseDurationMs(c.Timings.PredictedMS),
})
return nil
}
lastToken = strings.TrimSpace(c.Content)
tokenRepeat = 0
}
}
if err := scanner.Err(); err != nil {
if strings.Contains(err.Error(), "unexpected EOF") {
s.Close()
msg := ""
if s.status != nil && s.status.LastErrMsg != "" {
msg = s.status.LastErrMsg
}
return fmt.Errorf("an unknown error was encountered while running the model %s", msg)
// 30 picked as an arbitrary max token repeat limit, modify as needed
if tokenRepeat > 30 {
slog.Debug("prediction aborted, token repeat limit reached")
return ctx.Err()
}
return fmt.Errorf("error reading llm response: %v", err)
}
if !retryNeeded {
return nil // success
if c.Content != "" {
fn(CompletionResponse{
Content: c.Content,
})
}
if c.Stop {
fn(CompletionResponse{
Done: true,
PromptEvalCount: c.Timings.PromptN,
PromptEvalDuration: parseDurationMs(c.Timings.PromptMS),
EvalCount: c.Timings.PredictedN,
EvalDuration: parseDurationMs(c.Timings.PredictedMS),
})
return nil
}
}
}
// should never reach here ideally
return fmt.Errorf("max retries exceeded")
if err := scanner.Err(); err != nil {
if strings.Contains(err.Error(), "unexpected EOF") {
s.Close()
msg := ""
if s.status != nil && s.status.LastErrMsg != "" {
msg = s.status.LastErrMsg
}
return fmt.Errorf("an unknown error was encountered while running the model %s", msg)
}
return fmt.Errorf("error reading llm response: %v", err)
}
return nil
}
type EmbeddingRequest struct {
@@ -750,8 +754,9 @@ func (s *llmServer) Embedding(ctx context.Context, prompt string) ([]float64, er
return nil, err
}
defer s.sem.Release(1)
// Make sure the server is ready
status, err := s.getServerStatus(ctx)
status, err := s.getServerStatusRetry(ctx)
if err != nil {
return nil, err
} else if status != ServerStatusReady {
@@ -806,7 +811,7 @@ func (s *llmServer) Tokenize(ctx context.Context, content string) ([]int, error)
status, err := s.getServerStatus(ctx)
if err != nil {
return nil, err
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvaialble {
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvailable {
return nil, fmt.Errorf("unexpected server status: %s", status.ToString())
}
@@ -858,7 +863,7 @@ func (s *llmServer) Detokenize(ctx context.Context, tokens []int) (string, error
status, err := s.getServerStatus(ctx)
if err != nil {
return "", err
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvaialble {
} else if status != ServerStatusReady && status != ServerStatusNoSlotsAvailable {
return "", fmt.Errorf("unexpected server status: %s", status.ToString())
}
@@ -900,7 +905,13 @@ func (s *llmServer) Detokenize(ctx context.Context, tokens []int) (string, error
func (s *llmServer) Close() error {
if s.cmd != nil {
slog.Debug("stopping llama server")
return s.cmd.Process.Kill()
if err := s.cmd.Process.Kill(); err != nil {
return err
}
_ = s.cmd.Wait()
slog.Debug("llama server stopped")
}
return nil

16
macapp/.eslintrc.json Normal file
View File

@@ -0,0 +1,16 @@
{
"env": {
"browser": true,
"es6": true,
"node": true
},
"extends": [
"eslint:recommended",
"plugin:@typescript-eslint/eslint-recommended",
"plugin:@typescript-eslint/recommended",
"plugin:import/recommended",
"plugin:import/electron",
"plugin:import/typescript"
],
"parser": "@typescript-eslint/parser"
}

92
macapp/.gitignore vendored Normal file
View File

@@ -0,0 +1,92 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
# Runtime data
pids
*.pid
*.seed
*.pid.lock
.DS_Store
# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov
# Coverage directory used by tools like istanbul
coverage
*.lcov
# nyc test coverage
.nyc_output
# node-waf configuration
.lock-wscript
# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release
# Dependency directories
node_modules/
jspm_packages/
# TypeScript v1 declaration files
typings/
# TypeScript cache
*.tsbuildinfo
# Optional npm cache directory
.npm
# Optional eslint cache
.eslintcache
# Optional REPL history
.node_repl_history
# Output of 'npm pack'
*.tgz
# Yarn Integrity file
.yarn-integrity
# dotenv environment variables file
.env
.env.test
# parcel-bundler cache (https://parceljs.org/)
.cache
# next.js build output
.next
# nuxt.js build output
.nuxt
# vuepress build output
.vuepress/dist
# Serverless directories
.serverless/
# FuseBox cache
.fusebox/
# DynamoDB Local files
.dynamodb/
# Webpack
.webpack/
# Vite
.vite/
# Electron-Forge
out/

21
macapp/README.md Normal file
View File

@@ -0,0 +1,21 @@
# Desktop
This app builds upon Ollama to provide a desktop experience for running models.
## Developing
First, build the `ollama` binary:
```
cd ..
go build .
```
Then run the desktop app with `npm start`:
```
cd macapp
npm install
npm start
```

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 402 B

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 741 B

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 440 B

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 763 B

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 447 B

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 891 B

View File

Binary file not shown.

After

Width:  |  Height:  |  Size: 443 B

Some files were not shown because too many files have changed in this diff Show More