improve scratch buffer estimates

2026-02-09 15:12:59 -05:00 · 2024-01-19 13:24:24 -05:00
239 changed files with 4845 additions and 53084 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,9 +1,8 @@
 .vscode
 ollama
 app
-macapp
 dist
 llm/llama.cpp
 .env
 .cache
-test_data
+test_data
--- a/.gitattributes
+++ b/.gitattributes
@@ -1 +0,0 @@
-llm/ext_server/* linguist-vendored
--- a/.github/ISSUE_TEMPLATE/10_model_request.yml
+++ b/.github/ISSUE_TEMPLATE/10_model_request.yml
@@ -1,18 +0,0 @@
-name: Model request
-description: Request a new model for the library
-labels: [mr]
-body:
-  - type: markdown
-    attributes:
-      value: |
-        Please check if your Model request is [already available](https://ollama.com/search) or that you cannot [import it](https://github.com/ollama/ollama/blob/main/docs/import.md#import-a-model) yourself.
-        Tell us about which Model you'd like to see in the library!
-  - type: textarea
-    id: problem
-    attributes:
-      label: What model would you like?
-      description: Please provide a link to the model.
-  - type: markdown
-    attributes:
-      value: |
-        Thanks for filing a model request!
--- a/.github/ISSUE_TEMPLATE/20_feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/20_feature_request.yml
@@ -1,41 +0,0 @@
-name: Feature request
-description: Propose a new feature
-labels: [needs-triage, fr]
-body:
-  - type: markdown
-    attributes:
-      value: |
-        Please check if your feature request is [already filed](https://github.com/ollama/ollama/issues).
-        Tell us about your idea!
-  - type: textarea
-    id: problem
-    attributes:
-      label: What are you trying to do?
-      description: Tell us about the problem you're trying to solve.
-    validations:
-      required: false
-  - type: textarea
-    id: solution
-    attributes:
-      label: How should we solve this?
-      description: If you have an idea of how you'd like to see this feature work, let us know.
-    validations:
-      required: false
-  - type: textarea
-    id: alternative
-    attributes:
-      label: What is the impact of not solving this?
-      description: (How) Are you currently working around the issue?
-    validations:
-      required: false
-  - type: textarea
-    id: context
-    attributes:
-      label: Anything else?
-      description: Any additional context to share, e.g., links
-    validations:
-      required: false
-  - type: markdown
-    attributes:
-      value: |
-        Thanks for filing a feature request!
--- a/.github/ISSUE_TEMPLATE/90_bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/90_bug_report.yml
@@ -1,125 +0,0 @@
-name: Bug report
-description: File a bug report. If you need help, please join our Discord server.
-labels: [needs-triage, bug]
-body:
-  - type: markdown
-    attributes:
-      value: |
-        Please check if your bug is [already filed](https://github.com/ollama/ollama/issues) before filing a new one.
-  - type: textarea
-    id: what-happened
-    attributes:
-      label: What is the issue?
-      description: What happened? What did you expect to happen?
-    validations:
-      required: true
-  - type: textarea
-    id: what-was-expected
-    attributes:
-      label: What did you expect to see?
-      description: What did you expect to see/happen instead?
-    validations:
-      required: false
-  - type: textarea
-    id: steps
-    attributes:
-      label: Steps to reproduce
-      description: What are the steps you took that hit this issue?
-    validations:
-      required: false
-  - type: textarea
-    id: changes
-    attributes:
-      label: Are there any recent changes that introduced the issue?
-      description: If so, what are those changes?
-    validations:
-      required: false
-  - type: dropdown
-    id: os
-    attributes:
-      label: OS
-      description: What OS are you using? You may select more than one.
-      multiple: true
-      options:
-        - Linux
-        - macOS
-        - Windows
-        - Other
-    validations:
-      required: false
-  - type: dropdown
-    id: architecture
-    attributes:
-      label: Architecture
-      description: What architecture are you using? You may select more than one.
-      multiple: true
-      options:
-        - arm64
-        - amd64
-        - x86
-        - Other
-  - type: dropdown
-    id: platform
-    attributes:
-      label: Platform
-      description: What platform are you using? You may select more than one.
-      multiple: true
-      options:
-        - Docker
-        - WSL
-        - WSL2
-    validations:
-      required: false
-  - type: input
-    id: ollama-version
-    attributes:
-      label: Ollama version
-      description: What Ollama version are you using? (`ollama --version`)
-      placeholder: e.g., 1.14.4
-    validations:
-      required: false
-  - type: dropdown
-    id: gpu
-    attributes:
-      label: GPU
-      description: What GPU, if any, are you using? You may select more than one.
-      multiple: true
-      options:
-        - Nvidia
-        - AMD
-        - Intel
-        - Apple
-        - Other
-    validations:
-      required: false
-  - type: textarea
-    id: gpu-info
-    attributes:
-      label: GPU info
-      description: What GPU info do you have? (`nvidia-smi`, `rocminfo`, `system_profiler SPDisplaysDataType`, etc.)
-    validations:
-      required: false
-  - type: dropdown
-    id: cpu
-    attributes:
-      label: CPU
-      description: What CPU are you using? You may select more than one.
-      multiple: true
-      options:
-        - Intel
-        - AMD
-        - Apple
-        - Other
-    validations:
-      required: false
-  - type: textarea
-    id: other-software
-    attributes:
-      label: Other software
-      description: What other software are you using that might be related to this issue?
-    validations:
-      required: false
-  - type: markdown
-    attributes:
-      value: |
-        Thanks for filing a bug report!
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -1,8 +0,0 @@
-blank_issues_enabled: true
-contact_links:
-  - name: Help
-    url: https://discord.com/invite/ollama
-    about: Please join our Discord server for help using Ollama
-  - name: Troubleshooting
-    url: https://github.com/ollama/ollama/blob/main/docs/faq.md#faq
-    about: See the FAQ for common issues and solutions
--- a/.github/workflows/latest.yaml
+++ b/.github/workflows/latest.yaml
@@ -1,24 +0,0 @@
-name: latest
-
-on:
-  release:
-    types: [released]
-
-jobs:
-  update-latest:
-    environment: release
-    runs-on: linux
-    steps:
-      - uses: actions/checkout@v4
-      - name: Login to Docker Hub
-        uses: docker/login-action@v3
-        with:
-          username: ${{ vars.DOCKER_USER }}
-          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
-      - name: Tag images as latest
-        env:
-          PUSH: "1"
-        shell: bash
-        run: |
-          export "VERSION=${GITHUB_REF_NAME#v}"
-          ./scripts/tag_latest.sh
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -1,473 +0,0 @@
-name: release
-
-on:
-  push:
-    tags:
-      - 'v*'
-
-jobs:
-  # Full build of the Mac assets
-  build-darwin:
-    runs-on: macos-12
-    environment: release
-    steps:
-      - uses: actions/checkout@v4
-      - name: Set Version
-        shell: bash
-        run: |
-          echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-          echo "RELEASE_VERSION=$(echo ${GITHUB_REF_NAME} | cut -f1 -d-)" >> $GITHUB_ENV
-      - name: key
-        env:
-          MACOS_SIGNING_KEY: ${{ secrets.MACOS_SIGNING_KEY }}
-          MACOS_SIGNING_KEY_PASSWORD: ${{ secrets.MACOS_SIGNING_KEY_PASSWORD }}
-        run: |
-          echo $MACOS_SIGNING_KEY | base64 --decode > certificate.p12
-          security create-keychain -p password build.keychain
-          security default-keychain -s build.keychain
-          security unlock-keychain -p password build.keychain
-          security import certificate.p12 -k build.keychain -P $MACOS_SIGNING_KEY_PASSWORD -T /usr/bin/codesign
-          security set-key-partition-list -S apple-tool:,apple:,codesign: -s -k password build.keychain
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - name: Build Darwin
-        env:
-          APPLE_IDENTITY: ${{ secrets.APPLE_IDENTITY }}
-          APPLE_PASSWORD: ${{ secrets.APPLE_PASSWORD }}
-          APPLE_TEAM_ID: ${{ vars.APPLE_TEAM_ID }}
-          APPLE_ID: ${{ vars.APPLE_ID }}
-          SDKROOT: /Applications/Xcode_13.4.1.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
-          DEVELOPER_DIR: /Applications/Xcode_13.4.1.app/Contents/Developer
-        run: |
-          ./scripts/build_darwin.sh
-
-      - uses: actions/upload-artifact@v4
-        with:
-          name: dist-darwin
-          path: |
-            dist/*arwin*
-            !dist/*-cov
-
-  # Windows builds take a long time to both install the dependencies and build, so parallelize
-  # CPU generation step
-  generate-windows-cpu:
-    environment: release
-    runs-on: windows
-    env:
-      KEY_CONTAINER: ${{ vars.KEY_CONTAINER }}
-    steps:
-      - uses: actions/checkout@v4
-      - name: Set Version
-        shell: bash
-        run: echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-      - uses: 'google-github-actions/auth@v2'
-        with:
-          project_id: 'ollama'
-          credentials_json: '${{ secrets.GOOGLE_SIGNING_CREDENTIALS }}'
-      - run: echo "${{ vars.OLLAMA_CERT }}" > ollama_inc.crt
-      - name: install Windows SDK 8.1 to get signtool
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading SDK"
-          Invoke-WebRequest -Uri "https://go.microsoft.com/fwlink/p/?LinkId=323507" -OutFile "${env:RUNNER_TEMP}\sdksetup.exe"
-          Start-Process "${env:RUNNER_TEMP}\sdksetup.exe" -ArgumentList @("/q") -NoNewWindow -Wait
-          write-host "Win SDK 8.1 installed"
-          gci -path 'C:\Program Files (x86)\Windows Kits\' -r -fi 'signtool.exe'
-      - name: install signing plugin
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading plugin"
-          Invoke-WebRequest -Uri "https://github.com/GoogleCloudPlatform/kms-integrations/releases/download/cng-v1.0/kmscng-1.0-windows-amd64.zip" -OutFile "${env:RUNNER_TEMP}\plugin.zip"
-          Expand-Archive -Path "${env:RUNNER_TEMP}\plugin.zip" -DestinationPath ${env:RUNNER_TEMP}\plugin\
-          write-host "Installing plugin"
-          & "${env:RUNNER_TEMP}\plugin\*\kmscng.msi" /quiet
-          write-host "plugin installed"
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - run: go get ./...
-      - run: |
-          $gopath=(get-command go).source | split-path -parent
-          & "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
-          cd $env:GITHUB_WORKSPACE
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$env:PATH"
-          go generate -x ./...
-        name: go generate
-      - uses: actions/upload-artifact@v4
-        with:
-          name: generate-windows-cpu
-          path: |
-            llm/build/**/bin/*
-            llm/build/**/*.a
-
-  # ROCm generation step
-  generate-windows-rocm:
-    environment: release
-    runs-on: windows
-    env:
-      KEY_CONTAINER: ${{ vars.KEY_CONTAINER }}
-    steps:
-      - uses: actions/checkout@v4
-      - name: Set Version
-        shell: bash
-        run: echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-      - uses: 'google-github-actions/auth@v2'
-        with:
-          project_id: 'ollama'
-          credentials_json: '${{ secrets.GOOGLE_SIGNING_CREDENTIALS }}'
-      - run: echo "${{ vars.OLLAMA_CERT }}" > ollama_inc.crt
-      - name: install Windows SDK 8.1 to get signtool
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading SDK"
-          Invoke-WebRequest -Uri "https://go.microsoft.com/fwlink/p/?LinkId=323507" -OutFile "${env:RUNNER_TEMP}\sdksetup.exe"
-          Start-Process "${env:RUNNER_TEMP}\sdksetup.exe" -ArgumentList @("/q") -NoNewWindow -Wait
-          write-host "Win SDK 8.1 installed"
-          gci -path 'C:\Program Files (x86)\Windows Kits\' -r -fi 'signtool.exe'
-      - name: install signing plugin
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading plugin"
-          Invoke-WebRequest -Uri "https://github.com/GoogleCloudPlatform/kms-integrations/releases/download/cng-v1.0/kmscng-1.0-windows-amd64.zip" -OutFile "${env:RUNNER_TEMP}\plugin.zip"
-          Expand-Archive -Path "${env:RUNNER_TEMP}\plugin.zip" -DestinationPath ${env:RUNNER_TEMP}\plugin\
-          write-host "Installing plugin"
-          & "${env:RUNNER_TEMP}\plugin\*\kmscng.msi" /quiet
-          write-host "plugin installed"
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - name: 'Install ROCm'
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading AMD HIP Installer"
-          Invoke-WebRequest -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-23.Q4-WinSvr2022-For-HIP.exe" -OutFile "${env:RUNNER_TEMP}\rocm-install.exe"
-          write-host "Installing AMD HIP"
-          Start-Process "${env:RUNNER_TEMP}\rocm-install.exe" -ArgumentList '-install' -NoNewWindow -Wait
-          write-host "Completed AMD HIP"
-      - name: 'Verify ROCm'
-        run: |
-          & 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' --version
-      - run: go get ./...
-      - run: |
-          $gopath=(get-command go).source | split-path -parent
-          & "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
-          cd $env:GITHUB_WORKSPACE
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$env:PATH"
-          $env:OLLAMA_SKIP_CPU_GENERATE="1"
-          $env:HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path)
-          go generate -x ./...
-        name: go generate
-      - name: 'gather rocm dependencies'
-        run: |
-          $HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path)
-          md "dist\deps\bin\rocblas\library"
-          cp "${HIP_PATH}\bin\hipblas.dll" "dist\deps\bin\"
-          cp "${HIP_PATH}\bin\rocblas.dll" "dist\deps\bin\"
-          cp "${HIP_PATH}\bin\rocblas\library\*" "dist\deps\bin\rocblas\library\"
-      - uses: actions/upload-artifact@v4
-        with:
-          name: generate-windows-rocm
-          path: llm/build/**/bin/*
-      - uses: actions/upload-artifact@v4
-        with:
-          name: windows-rocm-deps
-          path: dist/deps/*
-
-  # CUDA generation step
-  generate-windows-cuda:
-    environment: release
-    runs-on: windows
-    env:
-      KEY_CONTAINER: ${{ vars.KEY_CONTAINER }}
-    steps:
-      - uses: actions/checkout@v4
-      - name: Set Version
-        shell: bash
-        run: echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-      - uses: 'google-github-actions/auth@v2'
-        with:
-          project_id: 'ollama'
-          credentials_json: '${{ secrets.GOOGLE_SIGNING_CREDENTIALS }}'
-      - run: echo "${{ vars.OLLAMA_CERT }}" > ollama_inc.crt
-      - name: install Windows SDK 8.1 to get signtool
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading SDK"
-          Invoke-WebRequest -Uri "https://go.microsoft.com/fwlink/p/?LinkId=323507" -OutFile "${env:RUNNER_TEMP}\sdksetup.exe"
-          Start-Process "${env:RUNNER_TEMP}\sdksetup.exe" -ArgumentList @("/q") -NoNewWindow -Wait
-          write-host "Win SDK 8.1 installed"
-          gci -path 'C:\Program Files (x86)\Windows Kits\' -r -fi 'signtool.exe'
-      - name: install signing plugin
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading plugin"
-          Invoke-WebRequest -Uri "https://github.com/GoogleCloudPlatform/kms-integrations/releases/download/cng-v1.0/kmscng-1.0-windows-amd64.zip" -OutFile "${env:RUNNER_TEMP}\plugin.zip"
-          Expand-Archive -Path "${env:RUNNER_TEMP}\plugin.zip" -DestinationPath ${env:RUNNER_TEMP}\plugin\
-          write-host "Installing plugin"
-          & "${env:RUNNER_TEMP}\plugin\*\kmscng.msi" /quiet
-          write-host "plugin installed"
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - name: 'Install CUDA'
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading CUDA Installer"
-          Invoke-WebRequest -Uri "https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.89_win10.exe" -OutFile "${env:RUNNER_TEMP}\cuda-install.exe"
-          write-host "Installing CUDA"
-          Start-Process "${env:RUNNER_TEMP}\cuda-install.exe" -ArgumentList '-s' -NoNewWindow -Wait
-          write-host "Completed CUDA"
-          $cudaPath=((resolve-path "c:\Program Files\NVIDIA*\CUDA\v*\bin\nvcc.exe")[0].path | split-path | split-path)
-          $cudaVer=($cudaPath | split-path -leaf ) -replace 'v(\d+).(\d+)', '$1_$2' 
-          echo "$cudaPath\bin" >> $env:GITHUB_PATH
-          echo "CUDA_PATH=$cudaPath" >> $env:GITHUB_ENV
-          echo "CUDA_PATH_V${cudaVer}=$cudaPath" >> $env:GITHUB_ENV
-          echo "CUDA_PATH_VX_Y=CUDA_PATH_V${cudaVer}" >> $env:GITHUB_ENV
-      - name: 'Verify CUDA'
-        run: nvcc -V
-      - run: go get ./...
-      - name: go generate
-        run: |
-          $gopath=(get-command go).source | split-path -parent
-          $cudabin=(get-command nvcc).source | split-path
-          & "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
-          cd $env:GITHUB_WORKSPACE
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$cudabin;$env:PATH"
-          $env:OLLAMA_SKIP_CPU_GENERATE="1"
-          go generate -x ./...
-      - name: 'gather cuda dependencies'
-        run: |
-          $NVIDIA_DIR=(resolve-path 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*\bin\')[0]
-          md "dist\deps"
-          cp "${NVIDIA_DIR}\cudart64_*.dll" "dist\deps\"
-          cp "${NVIDIA_DIR}\cublas64_*.dll" "dist\deps\"
-          cp "${NVIDIA_DIR}\cublasLt64_*.dll" "dist\deps\"
-      - uses: actions/upload-artifact@v4
-        with:
-          name: generate-windows-cuda
-          path: llm/build/**/bin/*
-      - uses: actions/upload-artifact@v4
-        with:
-          name: windows-cuda-deps
-          path: dist/deps/*
-
-  # Import the prior generation steps and build the final windows assets
-  build-windows:
-    environment: release
-    runs-on: windows
-    needs:
-      - generate-windows-cuda
-      - generate-windows-rocm
-      - generate-windows-cpu
-    env:
-      KEY_CONTAINER: ${{ vars.KEY_CONTAINER }}
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: recursive
-      - name: Set Version
-        shell: bash
-        run: echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-      - uses: 'google-github-actions/auth@v2'
-        with:
-          project_id: 'ollama'
-          credentials_json: '${{ secrets.GOOGLE_SIGNING_CREDENTIALS }}'
-      - run: echo "${{ vars.OLLAMA_CERT }}" > ollama_inc.crt
-      - name: install Windows SDK 8.1 to get signtool
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading SDK"
-          Invoke-WebRequest -Uri "https://go.microsoft.com/fwlink/p/?LinkId=323507" -OutFile "${env:RUNNER_TEMP}\sdksetup.exe"
-          Start-Process "${env:RUNNER_TEMP}\sdksetup.exe" -ArgumentList @("/q") -NoNewWindow -Wait
-          write-host "Win SDK 8.1 installed"
-          gci -path 'C:\Program Files (x86)\Windows Kits\' -r -fi 'signtool.exe'
-      - name: install signing plugin
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading plugin"
-          Invoke-WebRequest -Uri "https://github.com/GoogleCloudPlatform/kms-integrations/releases/download/cng-v1.0/kmscng-1.0-windows-amd64.zip" -OutFile "${env:RUNNER_TEMP}\plugin.zip"
-          Expand-Archive -Path "${env:RUNNER_TEMP}\plugin.zip" -DestinationPath ${env:RUNNER_TEMP}\plugin\
-          write-host "Installing plugin"
-          & "${env:RUNNER_TEMP}\plugin\*\kmscng.msi" /quiet
-          write-host "plugin installed"
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - run: go get
-      - uses: actions/download-artifact@v4
-        with:
-          name: generate-windows-cpu
-          path: llm/build
-      - uses: actions/download-artifact@v4
-        with:
-          name: generate-windows-cuda
-          path: llm/build
-      - uses: actions/download-artifact@v4
-        with:
-          name: windows-cuda-deps
-          path: dist/deps
-      - uses: actions/download-artifact@v4
-        with:
-          name: windows-rocm-deps
-          path: dist/deps
-      - uses: actions/download-artifact@v4
-        with:
-          name: generate-windows-rocm
-          path: llm/build
-      - run: dir llm/build
-      - run: |
-          $gopath=(get-command go).source | split-path -parent
-          & "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
-          cd $env:GITHUB_WORKSPACE
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$env:PATH"
-          $env:OLLAMA_SKIP_GENERATE="1"
-          $env:NVIDIA_DIR=$(resolve-path ".\dist\deps")
-          $env:HIP_PATH=$(resolve-path ".\dist\deps")
-          & .\scripts\build_windows.ps1
-      - uses: actions/upload-artifact@v4
-        with:
-          name: dist-windows
-          path: dist/*.exe
-
-  # Linux x86 assets built using the container based build
-  build-linux-amd64:
-    environment: release
-    runs-on: linux
-    env:
-      OLLAMA_SKIP_MANIFEST_CREATE: '1'
-      BUILD_ARCH: amd64
-      PUSH: '1'
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: recursive
-      - name: Set Version
-        shell: bash
-        run: echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-      - name: Login to Docker Hub
-        uses: docker/login-action@v3
-        with:
-          username: ${{ vars.DOCKER_USER }}
-          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
-      - run: |
-          ./scripts/build_linux.sh
-          ./scripts/build_docker.sh
-          mv dist/deps/* dist/
-      - uses: actions/upload-artifact@v4
-        with:
-          name: dist-linux-amd64
-          path: |
-            dist/*linux*
-            !dist/*-cov
-
-  # Linux ARM assets built using the container based build
-  # (at present, docker isn't pre-installed on arm ubunutu images)
-  build-linux-arm64:
-    environment: release
-    runs-on: linux-arm64
-    env:
-      OLLAMA_SKIP_MANIFEST_CREATE: '1'
-      BUILD_ARCH: arm64
-      PUSH: '1'
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: recursive
-      - name: Set Version
-        shell: bash
-        run: echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-      - name: 'Install Docker'
-        run: |
-          # Add Docker's official GPG key:
-          env
-          uname -a
-          sudo apt-get update
-          sudo apt-get install -y ca-certificates curl
-          sudo install -m 0755 -d /etc/apt/keyrings
-          sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
-          sudo chmod a+r /etc/apt/keyrings/docker.asc
-
-          # Add the repository to Apt sources:
-          echo \
-            "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
-            $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
-            sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
-          sudo apt-get update
-          sudo apt-get install -y docker-ce docker-ce-cli containerd.io
-          sudo usermod -aG docker $USER
-          sudo apt-get install acl
-          sudo setfacl --modify user:$USER:rw /var/run/docker.sock
-      - name: Login to Docker Hub
-        uses: docker/login-action@v3
-        with:
-          username: ${{ vars.DOCKER_USER }}
-          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
-      - run: |
-          ./scripts/build_linux.sh
-          ./scripts/build_docker.sh
-      - uses: actions/upload-artifact@v4
-        with:
-          name: dist-linux-arm64
-          path: |
-            dist/*linux*
-            !dist/*-cov
-
-  # Aggregate all the assets and ship a release
-  release:
-    needs:
-      - build-darwin
-      - build-windows
-      - build-linux-amd64
-      - build-linux-arm64
-    runs-on: linux
-    environment: release
-    permissions:
-      contents: write
-    env:
-      OLLAMA_SKIP_IMAGE_BUILD: '1'
-      PUSH: '1'
-    steps:
-      - uses: actions/checkout@v4
-      - name: Set Version
-        shell: bash
-        run: |
-          echo "VERSION=${GITHUB_REF_NAME#v}" >> $GITHUB_ENV
-          echo "RELEASE_VERSION=$(echo ${GITHUB_REF_NAME} | cut -f1 -d-)" >> $GITHUB_ENV
-      - name: Login to Docker Hub
-        uses: docker/login-action@v3
-        with:
-          username: ${{ vars.DOCKER_USER }}
-          password: ${{ secrets.DOCKER_ACCESS_TOKEN }}
-      - run: ./scripts/build_docker.sh
-      - name: Retrieve built artifact
-        uses: actions/download-artifact@v4
-        with:
-          path: dist
-          pattern: dist-*
-          merge-multiple: true
-      - run: |
-          ls -lh dist/
-          (cd dist; sha256sum * > sha256sum.txt)
-          cat dist/sha256sum.txt
-      - uses: ncipollo/release-action@v1
-        with:
-          name: ${{ env.RELEASE_VERSION }}
-          allowUpdates: true
-          artifacts: 'dist/*'
-          draft: true
-          prerelease: true
-          omitBodyDuringUpdate: true
-          generateReleaseNotes: true
-          omitDraftDuringUpdate: true
-          omitPrereleaseDuringUpdate: true
-          replacesArtifacts: true
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
@@ -2,46 +2,17 @@ name: test

 on:
  pull_request:
-    paths:
-      - '**/*'
-      - '!docs/**'
-      - '!README.md'

 jobs:
-  changes:
-    runs-on: ubuntu-latest
-    outputs:
-      GENERATE: ${{ steps.changes.outputs.GENERATE }}
-      GENERATE_CUDA: ${{ steps.changes.outputs.GENERATE_CUDA }}
-      GENERATE_ROCM: ${{ steps.changes.outputs.GENERATE_ROCM }}
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-      - id: changes
-        run: |
-          changed() {
-            git diff-tree -r --no-commit-id --name-only ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }} \
-              | xargs python3 -c "import sys; print(any([x.startswith('$1') for x in sys.argv[1:]]))"
-          }
-
-          {
-            echo GENERATE=$(changed llm/)
-            echo GENERATE_CUDA=$(changed llm/)
-            echo GENERATE_ROCM=$(changed llm/)
-          } >>$GITHUB_OUTPUT
-
  generate:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.GENERATE == 'True' }}
    strategy:
      matrix:
-        os: [ubuntu-latest, macos-latest, windows-2019]
+        os: [ubuntu-latest, macos-latest, windows-latest]
        arch: [amd64, arm64]
        exclude:
          - os: ubuntu-latest
            arch: arm64
-          - os: windows-2019
+          - os: windows-latest
            arch: arm64
    runs-on: ${{ matrix.os }}
    env:
@@ -50,264 +21,86 @@ jobs:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
-          go-version-file: go.mod
+          go-version: '1.21'
          cache: true
+      - if: ${{ startsWith(matrix.os, 'windows-') }}
+        shell: pwsh
+        run: |
+          $path = vswhere -latest -products * -requires Microsoft.VisualStudio.Component.VC.Tools.x86.x64 -property installationPath
+          if ($path) {
+              $path = join-path $path 'Common7\Tools\vsdevcmd.bat'
+              if (test-path $path) {
+                  cmd /s /c """$path"" $args && set" | where { $_ -match '(\w+)=(.*)' } | foreach {
+                      echo "$($Matches[1])=$($Matches[2])" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf8 -Append
+                  }
+              }
+          }
+
+          echo "C:\Program Files\Git\usr\bin" | Out-File -FilePath $Env:GITHUB_PATH -Encoding utf8 -Append
      - run: go get ./...
-      - run: |
-          $gopath=(get-command go).source | split-path -parent
-          $gccpath=(get-command gcc).source | split-path -parent
-          & "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
-          cd $env:GITHUB_WORKSPACE
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$gccpath;$env:PATH"
-          echo $env:PATH
-          go generate -x ./...
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        name: 'Windows Go Generate'
      - run: go generate -x ./...
-        if: ${{ ! startsWith(matrix.os, 'windows-') }}
-        name: 'Unix Go Generate'
      - uses: actions/upload-artifact@v4
        with:
          name: ${{ matrix.os }}-${{ matrix.arch }}-libraries
          path: |
-            llm/build/**/bin/*
-            llm/build/**/*.a
-  generate-cuda:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.GENERATE_CUDA == 'True' }}
-    strategy:
-      matrix:
-        cuda-version:
-          - '11.8.0'
-    runs-on: linux
-    container: nvidia/cuda:${{ matrix.cuda-version }}-devel-ubuntu20.04
-    steps:
-      - run: |
-          apt-get update && apt-get install -y git build-essential curl
-          curl -fsSL https://github.com/Kitware/CMake/releases/download/v3.28.1/cmake-3.28.1-linux-x86_64.tar.gz \
-            | tar -zx -C /usr --strip-components 1
-        env:
-          DEBIAN_FRONTEND: noninteractive
-      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v4
-        with:
-          go-version-file: go.mod
-          cache: true
-      - run: go get ./...
-      - run: |
-          git config --global --add safe.directory /__w/ollama/ollama
-          go generate -x ./...
-        env:
-          OLLAMA_SKIP_CPU_GENERATE: '1'
-      - uses: actions/upload-artifact@v4
-        with:
-          name: cuda-${{ matrix.cuda-version }}-libraries
-          path: llm/build/**/bin/*
-  generate-rocm:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.GENERATE_ROCM == 'True' }}
-    strategy:
-      matrix:
-        rocm-version:
-          - '6.0.2'
-    runs-on: linux
-    container: rocm/dev-ubuntu-20.04:${{ matrix.rocm-version }}
-    steps:
-      - run: |
-          apt-get update && apt-get install -y git build-essential curl rocm-libs
-          curl -fsSL https://github.com/Kitware/CMake/releases/download/v3.28.1/cmake-3.28.1-linux-x86_64.tar.gz \
-            | tar -zx -C /usr --strip-components 1
-        env:
-          DEBIAN_FRONTEND: noninteractive
-      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v4
-        with:
-          go-version-file: go.mod
-          cache: true
-      - run: go get ./...
-      - run: |
-          git config --global --add safe.directory /__w/ollama/ollama
-          go generate -x ./...
-        env:
-          OLLAMA_SKIP_CPU_GENERATE: '1'
-      - uses: actions/upload-artifact@v4
-        with:
-          name: rocm-${{ matrix.rocm-version }}-libraries
-          path: llm/build/**/bin/*
-
-  # ROCm generation step
-  generate-windows-rocm:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.GENERATE_ROCM == 'True' }}
-    runs-on: windows
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - name: 'Install ROCm'
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading AMD HIP Installer"
-          Invoke-WebRequest -Uri "https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-23.Q4-WinSvr2022-For-HIP.exe" -OutFile "${env:RUNNER_TEMP}\rocm-install.exe"
-          write-host "Installing AMD HIP"
-          Start-Process "${env:RUNNER_TEMP}\rocm-install.exe" -ArgumentList '-install' -NoNewWindow -Wait
-          write-host "Completed AMD HIP"
-      - name: 'Verify ROCm'
-        run: |
-          & 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' --version
-      - run: go get ./...
-      - run: |
-          $gopath=(get-command go).source | split-path -parent
-          & "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
-          cd $env:GITHUB_WORKSPACE
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$env:PATH"
-          $env:OLLAMA_SKIP_CPU_GENERATE="1"
-          $env:HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path)
-          go generate -x ./...
-        name: go generate
-        env:
-          OLLAMA_SKIP_CPU_GENERATE: '1'
-      # TODO - do we need any artifacts?
-
-  # CUDA generation step
-  generate-windows-cuda:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.GENERATE_CUDA == 'True' }}
-    runs-on: windows
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - name: 'Install CUDA'
-        run: |
-          $ErrorActionPreference = "Stop"
-          write-host "downloading CUDA Installer"
-          Invoke-WebRequest -Uri "https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda_11.3.1_465.89_win10.exe" -OutFile "${env:RUNNER_TEMP}\cuda-install.exe"
-          write-host "Installing CUDA"
-          Start-Process "${env:RUNNER_TEMP}\cuda-install.exe" -ArgumentList '-s' -NoNewWindow -Wait
-          write-host "Completed CUDA"
-          $cudaPath=((resolve-path "c:\Program Files\NVIDIA*\CUDA\v*\bin\nvcc.exe")[0].path | split-path | split-path)
-          $cudaVer=($cudaPath | split-path -leaf ) -replace 'v(\d+).(\d+)', '$1_$2' 
-          echo "$cudaPath\bin" >> $env:GITHUB_PATH
-          echo "CUDA_PATH=$cudaPath" >> $env:GITHUB_ENV
-          echo "CUDA_PATH_V${cudaVer}=$cudaPath" >> $env:GITHUB_ENV
-          echo "CUDA_PATH_VX_Y=CUDA_PATH_V${cudaVer}" >> $env:GITHUB_ENV
-      - name: 'Verify CUDA'
-        run: nvcc -V
-      - run: go get ./...
-      - name: go generate
-        run: |
-          $gopath=(get-command go).source | split-path -parent
-          $cudabin=(get-command nvcc).source | split-path
-          & "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Launch-VsDevShell.ps1"
-          cd $env:GITHUB_WORKSPACE
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$cudabin;$env:PATH"
-          $env:OLLAMA_SKIP_CPU_GENERATE="1"
-          go generate -x ./...
-        env:
-          OLLAMA_SKIP_CPU_GENERATE: '1'
-      # TODO - do we need any artifacts?
-
+            llm/llama.cpp/build/**/lib/*
  lint:
+    needs: generate
    strategy:
      matrix:
-        os: [ubuntu-latest, macos-latest, windows-2019]
+        os: [ubuntu-latest, macos-latest, windows-latest]
        arch: [amd64, arm64]
        exclude:
          - os: ubuntu-latest
            arch: arm64
-          - os: windows-2019
+          - os: windows-latest
            arch: arm64
          - os: macos-latest
            arch: amd64
    runs-on: ${{ matrix.os }}
    env:
      GOARCH: ${{ matrix.arch }}
-      CGO_ENABLED: '1'
+      CGO_ENABLED: "1"
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - uses: actions/setup-go@v5
        with:
-          go-version-file: go.mod
+          go-version: '1.21'
          cache: false
-      - run: |
-          case ${{ matrix.arch }} in
-            amd64) echo ARCH=x86_64 ;;
-            arm64) echo ARCH=arm64 ;;
-          esac >>$GITHUB_ENV
-        shell: bash
-      - run: |
-          mkdir -p llm/build/linux/$ARCH/stub/bin
-          touch llm/build/linux/$ARCH/stub/bin/ollama_llama_server
-        if: ${{ startsWith(matrix.os, 'ubuntu-') }}
-      - run: |
-          mkdir -p llm/build/darwin/$ARCH/stub/bin
-          touch llm/build/darwin/$ARCH/stub/bin/ollama_llama_server
-        if: ${{ startsWith(matrix.os, 'macos-') }}
-      - run: |
-          mkdir -p llm/build/windows/$ARCH/stub/bin
-          touch llm/build/windows/$ARCH/stub/bin/ollama_llama_server
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        shell: bash
-      - uses: golangci/golangci-lint-action@v4
+      - uses: actions/download-artifact@v4
        with:
-          args: --timeout 8m0s
+          name: ${{ matrix.os }}-${{ matrix.arch }}-libraries
+          path: llm/llama.cpp/build
+      - uses: golangci/golangci-lint-action@v3
  test:
+    needs: generate
    strategy:
      matrix:
-        os: [ubuntu-latest, macos-latest, windows-2019]
+        os: [ubuntu-latest, macos-latest, windows-latest]
        arch: [amd64]
        exclude:
          - os: ubuntu-latest
            arch: arm64
-          - os: windows-2019
+          - os: windows-latest
            arch: arm64
    runs-on: ${{ matrix.os }}
    env:
      GOARCH: ${{ matrix.arch }}
-      CGO_ENABLED: '1'
-      OLLAMA_CPU_TARGET: 'static'
+      CGO_ENABLED: "1"
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: recursive
      - uses: actions/setup-go@v5
        with:
-          go-version-file: go.mod
+          go-version: '1.21'
          cache: true
      - run: go get
-      - run: |
-          case ${{ matrix.arch }} in
-            amd64) echo ARCH=x86_64 ;;
-            arm64) echo ARCH=arm64 ;;
-          esac >>$GITHUB_ENV
-        shell: bash
-      - run: |
-          mkdir -p llm/build/linux/$ARCH/stub/bin
-          touch llm/build/linux/$ARCH/stub/bin/ollama_llama_server
-        if: ${{ startsWith(matrix.os, 'ubuntu-') }}
-      - run: |
-          mkdir -p llm/build/darwin/$ARCH/stub/bin
-          touch llm/build/darwin/$ARCH/stub/bin/ollama_llama_server
-        if: ${{ startsWith(matrix.os, 'macos-') }}
-      - run: |
-          mkdir -p llm/build/windows/$ARCH/stub/bin
-          touch llm/build/windows/$ARCH/stub/bin/ollama_llama_server
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        shell: bash
-      - run: go generate ./...
+      - uses: actions/download-artifact@v4
+        with:
+          name: ${{ matrix.os }}-${{ matrix.arch }}-libraries
+          path: llm/llama.cpp/build
      - run: go build
      - run: go test -v ./...
-      - uses: actions/upload-artifact@v4
-        with:
-          name: ${{ matrix.os }}-binaries
-          path: ollama
--- a/.gitignore
+++ b/.gitignore
@@ -9,6 +9,4 @@ ggml-metal.metal
 .cache
 *.exe
 .idea
-test_data
-*.crt
-llm/build
+test_data
--- a/.golangci.yaml
+++ b/.golangci.yaml
@@ -15,3 +15,13 @@ linters:
    - misspell
    - nilerr
    - unused
+linters-settings:
+  errcheck:
+    # exclude the following functions since we don't generally
+    # need to be concerned with the returned errors
+    exclude-functions:
+      - encoding/binary.Read
+      - (*os.File).Seek
+      - (*bufio.Writer).WriteString
+      - (*github.com/spf13/pflag.FlagSet).Set
+      - (*github.com/jmorganca/ollama/llm.readSeekOffset).Seek
--- a/145
+++ b/145
@@ -1,144 +1,29 @@
-ARG GOLANG_VERSION=1.22.1
-ARG CMAKE_VERSION=3.22.1
-# this CUDA_VERSION corresponds with the one specified in docs/gpu.md
-ARG CUDA_VERSION=11.3.1
-ARG ROCM_VERSION=6.0.2
+FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

-# Copy the minimal context we need to run the generate scripts
-FROM scratch AS llm-code
-COPY .git .git
-COPY .gitmodules .gitmodules
-COPY llm llm
+ARG TARGETARCH
+ARG GOFLAGS="'-ldflags=-w -s'"

-FROM --platform=linux/amd64 nvidia/cuda:$CUDA_VERSION-devel-centos7 AS cuda-build-amd64
-ARG CMAKE_VERSION
-COPY ./scripts/rh_linux_deps.sh /
-RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
-ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
-COPY --from=llm-code / /go/src/github.com/ollama/ollama/
-WORKDIR /go/src/github.com/ollama/ollama/llm/generate
-ARG CGO_CFLAGS
-RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
+WORKDIR /go/src/github.com/jmorganca/ollama
+RUN apt-get update && apt-get install -y git build-essential cmake
+ADD https://dl.google.com/go/go1.21.3.linux-$TARGETARCH.tar.gz /tmp/go1.21.3.tar.gz
+RUN mkdir -p /usr/local && tar xz -C /usr/local </tmp/go1.21.3.tar.gz

-FROM --platform=linux/arm64 nvidia/cuda:$CUDA_VERSION-devel-rockylinux8 AS cuda-build-arm64
-ARG CMAKE_VERSION
-COPY ./scripts/rh_linux_deps.sh /
-RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
-ENV PATH /opt/rh/gcc-toolset-10/root/usr/bin:$PATH
-COPY --from=llm-code / /go/src/github.com/ollama/ollama/
-WORKDIR /go/src/github.com/ollama/ollama/llm/generate
-ARG CGO_CFLAGS
-RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
-
-FROM --platform=linux/amd64 rocm/dev-centos-7:${ROCM_VERSION}-complete AS rocm-build-amd64
-ARG CMAKE_VERSION
-COPY ./scripts/rh_linux_deps.sh /
-RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
-ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
-ENV LIBRARY_PATH /opt/amdgpu/lib64
-COPY --from=llm-code / /go/src/github.com/ollama/ollama/
-WORKDIR /go/src/github.com/ollama/ollama/llm/generate
-ARG CGO_CFLAGS
-ARG AMDGPU_TARGETS
-RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
-RUN mkdir /tmp/scratch && \
-    for dep in $(cat /go/src/github.com/ollama/ollama/llm/llama.cpp/build/linux/x86_64/rocm*/lib/deps.txt) ; do \
-        cp ${dep} /tmp/scratch/ || exit 1 ; \
-    done && \
-    (cd /opt/rocm/lib && tar cf - rocblas/library) | (cd /tmp/scratch/ && tar xf - ) && \
-    mkdir -p /go/src/github.com/ollama/ollama/dist/deps/ && \
-    (cd /tmp/scratch/ && tar czvf /go/src/github.com/ollama/ollama/dist/deps/ollama-linux-amd64-rocm.tgz . )
-
-
-FROM --platform=linux/amd64 centos:7 AS cpu-builder-amd64
-ARG CMAKE_VERSION
-ARG GOLANG_VERSION
-COPY ./scripts/rh_linux_deps.sh /
-RUN CMAKE_VERSION=${CMAKE_VERSION} GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
-ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
-COPY --from=llm-code / /go/src/github.com/ollama/ollama/
-ARG OLLAMA_CUSTOM_CPU_DEFS
-ARG CGO_CFLAGS
-WORKDIR /go/src/github.com/ollama/ollama/llm/generate
-
-FROM --platform=linux/amd64 cpu-builder-amd64 AS static-build-amd64
-RUN OLLAMA_CPU_TARGET="static" sh gen_linux.sh
-FROM --platform=linux/amd64 cpu-builder-amd64 AS cpu-build-amd64
-RUN OLLAMA_CPU_TARGET="cpu" sh gen_linux.sh
-FROM --platform=linux/amd64 cpu-builder-amd64 AS cpu_avx-build-amd64
-RUN OLLAMA_CPU_TARGET="cpu_avx" sh gen_linux.sh
-FROM --platform=linux/amd64 cpu-builder-amd64 AS cpu_avx2-build-amd64
-RUN OLLAMA_CPU_TARGET="cpu_avx2" sh gen_linux.sh
-
-FROM --platform=linux/arm64 centos:7 AS cpu-builder-arm64
-ARG CMAKE_VERSION
-ARG GOLANG_VERSION
-COPY ./scripts/rh_linux_deps.sh /
-RUN CMAKE_VERSION=${CMAKE_VERSION} GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
-ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
-COPY --from=llm-code / /go/src/github.com/ollama/ollama/
-ARG OLLAMA_CUSTOM_CPU_DEFS
-ARG CGO_CFLAGS
-WORKDIR /go/src/github.com/ollama/ollama/llm/generate
-
-FROM --platform=linux/arm64 cpu-builder-arm64 AS static-build-arm64
-RUN OLLAMA_CPU_TARGET="static" sh gen_linux.sh
-FROM --platform=linux/arm64 cpu-builder-arm64 AS cpu-build-arm64
-RUN OLLAMA_CPU_TARGET="cpu" sh gen_linux.sh
-
-
-# Intermediate stage used for ./scripts/build_linux.sh
-FROM --platform=linux/amd64 cpu-build-amd64 AS build-amd64
-ENV CGO_ENABLED 1
-WORKDIR /go/src/github.com/ollama/ollama
 COPY . .
-COPY --from=static-build-amd64 /go/src/github.com/ollama/ollama/llm/build/linux/ llm/build/linux/
-COPY --from=cpu_avx-build-amd64 /go/src/github.com/ollama/ollama/llm/build/linux/ llm/build/linux/
-COPY --from=cpu_avx2-build-amd64 /go/src/github.com/ollama/ollama/llm/build/linux/ llm/build/linux/
-COPY --from=cuda-build-amd64 /go/src/github.com/ollama/ollama/llm/build/linux/ llm/build/linux/
-COPY --from=rocm-build-amd64 /go/src/github.com/ollama/ollama/llm/build/linux/ llm/build/linux/
-COPY --from=rocm-build-amd64 /go/src/github.com/ollama/ollama/dist/deps/ ./dist/deps/
-ARG GOFLAGS
-ARG CGO_CFLAGS
-RUN go build -trimpath .
+ENV GOARCH=$TARGETARCH
+ENV GOFLAGS=$GOFLAGS
+RUN /usr/local/go/bin/go generate ./... \
+    && /usr/local/go/bin/go build .

-# Intermediate stage used for ./scripts/build_linux.sh
-FROM --platform=linux/arm64 cpu-build-arm64 AS build-arm64
-ENV CGO_ENABLED 1
-ARG GOLANG_VERSION
-WORKDIR /go/src/github.com/ollama/ollama
-COPY . .
-COPY --from=static-build-arm64 /go/src/github.com/ollama/ollama/llm/build/linux/ llm/build/linux/
-COPY --from=cuda-build-arm64 /go/src/github.com/ollama/ollama/llm/build/linux/ llm/build/linux/
-ARG GOFLAGS
-ARG CGO_CFLAGS
-RUN go build -trimpath .
-
-# Runtime stages
-FROM --platform=linux/amd64 ubuntu:22.04 as runtime-amd64
+FROM ubuntu:22.04
 RUN apt-get update && apt-get install -y ca-certificates
-COPY --from=build-amd64 /go/src/github.com/ollama/ollama/ollama /bin/ollama
-FROM --platform=linux/arm64 ubuntu:22.04 as runtime-arm64
-RUN apt-get update && apt-get install -y ca-certificates
-COPY --from=build-arm64 /go/src/github.com/ollama/ollama/ollama /bin/ollama
-
-# Radeon images are much larger so we keep it distinct from the CPU/CUDA image
-FROM --platform=linux/amd64 rocm/dev-centos-7:${ROCM_VERSION}-complete as runtime-rocm
-RUN update-pciids
-COPY --from=build-amd64 /go/src/github.com/ollama/ollama/ollama /bin/ollama
+COPY --from=0 /go/src/github.com/jmorganca/ollama/ollama /bin/ollama
 EXPOSE 11434
 ENV OLLAMA_HOST 0.0.0.0

-ENTRYPOINT ["/bin/ollama"]
-CMD ["serve"]
-
-FROM runtime-$TARGETARCH
-EXPOSE 11434
-ENV OLLAMA_HOST 0.0.0.0
-ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+# set some environment variable for better NVIDIA compatibility
+ENV PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
-ENV NVIDIA_VISIBLE_DEVICES=all

 ENTRYPOINT ["/bin/ollama"]
 CMD ["serve"]
--- a/Dockerfile.build
+++ b/Dockerfile.build
@@ -0,0 +1,99 @@
+ARG GOLANG_VERSION=1.21.3
+ARG CMAKE_VERSION=3.22.1
+ARG CUDA_VERSION=11.3.1
+
+# Copy the minimal context we need to run the generate scripts
+FROM scratch AS llm-code
+COPY .git .git
+COPY .gitmodules .gitmodules
+COPY llm llm
+
+FROM --platform=linux/amd64 nvidia/cuda:$CUDA_VERSION-devel-centos7 AS cuda-build-amd64
+ARG CMAKE_VERSION
+ARG CGO_CFLAGS
+COPY ./scripts/rh_linux_deps.sh /
+RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
+ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
+COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
+WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
+RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
+
+FROM --platform=linux/arm64 nvidia/cuda:$CUDA_VERSION-devel-rockylinux8 AS cuda-build-arm64
+ARG CMAKE_VERSION
+ARG CGO_CFLAGS
+COPY ./scripts/rh_linux_deps.sh /
+RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
+ENV PATH /opt/rh/gcc-toolset-10/root/usr/bin:$PATH
+COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
+WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
+RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
+
+FROM --platform=linux/amd64 rocm/dev-centos-7:5.7.1-complete AS rocm-5-build-amd64
+ARG CMAKE_VERSION
+ARG CGO_CFLAGS
+COPY ./scripts/rh_linux_deps.sh /
+RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
+ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
+ENV LIBRARY_PATH /opt/amdgpu/lib64
+COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
+WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
+RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
+
+FROM --platform=linux/amd64 rocm/dev-centos-7:6.0-complete AS rocm-6-build-amd64
+ARG CMAKE_VERSION
+ARG CGO_CFLAGS
+COPY ./scripts/rh_linux_deps.sh /
+RUN CMAKE_VERSION=${CMAKE_VERSION} sh /rh_linux_deps.sh
+ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
+ENV LIBRARY_PATH /opt/amdgpu/lib64
+COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
+WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
+RUN OLLAMA_SKIP_CPU_GENERATE=1 sh gen_linux.sh
+
+FROM --platform=linux/amd64 centos:7 AS cpu-build-amd64
+ARG CMAKE_VERSION
+ARG GOLANG_VERSION
+ARG OLLAMA_CUSTOM_CPU_DEFS
+ARG CGO_CFLAGS
+COPY ./scripts/rh_linux_deps.sh /
+RUN CMAKE_VERSION=${CMAKE_VERSION} GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
+ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
+COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
+WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
+RUN sh gen_linux.sh
+
+FROM --platform=linux/arm64 centos:7 AS cpu-build-arm64
+ARG CMAKE_VERSION
+ARG GOLANG_VERSION
+ARG OLLAMA_CUSTOM_CPU_DEFS
+ARG CGO_CFLAGS
+COPY ./scripts/rh_linux_deps.sh /
+RUN CMAKE_VERSION=${CMAKE_VERSION} GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
+ENV PATH /opt/rh/devtoolset-10/root/usr/bin:$PATH
+COPY --from=llm-code / /go/src/github.com/jmorganca/ollama/
+WORKDIR /go/src/github.com/jmorganca/ollama/llm/generate
+RUN sh gen_linux.sh
+
+
+FROM --platform=linux/amd64 cpu-build-amd64 AS build-amd64
+ENV CGO_ENABLED 1
+ARG GOFLAGS
+ARG CGO_CFLAGS
+WORKDIR /go/src/github.com/jmorganca/ollama
+COPY . .
+COPY --from=cuda-build-amd64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
+COPY --from=rocm-5-build-amd64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
+COPY --from=rocm-6-build-amd64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
+RUN go build .
+
+FROM --platform=linux/arm64 cpu-build-arm64 AS build-arm64
+ENV CGO_ENABLED 1
+ARG GOLANG_VERSION
+ARG GOFLAGS
+ARG CGO_CFLAGS
+WORKDIR /go/src/github.com/jmorganca/ollama
+COPY . .
+COPY --from=cuda-build-arm64 /go/src/github.com/jmorganca/ollama/llm/llama.cpp/build/linux/ llm/llama.cpp/build/linux/
+RUN go build .
+
+FROM build-$TARGETARCH
--- a/README.md
+++ b/README.md
@@ -1,5 +1,8 @@
 <div align="center">
-  <img alt="ollama" height="200px" src="https://github.com/ollama/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" height="200px" srcset="https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c">
+    <img alt="logo" height="200px" src="https://github.com/jmorganca/ollama/assets/3325447/0d0b44e2-8f4a-4e99-9b52-a5c1c741c8f7">
+  </picture>
 </div>

 # Ollama
@@ -10,32 +13,27 @@ Get up and running with large language models locally.

 ### macOS

-[Download](https://ollama.com/download/Ollama-darwin.zip)
+[Download](https://ollama.ai/download/Ollama-darwin.zip)

-### Windows preview
+### Windows

-[Download](https://ollama.com/download/OllamaSetup.exe)
+Coming soon! For now, you can install Ollama on Windows via WSL2.

-### Linux
+### Linux & WSL2

 ```
-curl -fsSL https://ollama.com/install.sh | sh
+curl https://ollama.ai/install.sh | sh
 ```

-[Manual install instructions](https://github.com/ollama/ollama/blob/main/docs/linux.md)
+[Manual install instructions](https://github.com/jmorganca/ollama/blob/main/docs/linux.md)

 ### Docker

 The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `ollama/ollama` is available on Docker Hub.

-### Libraries
-
- [ollama-python](https://github.com/ollama/ollama-python)
- [ollama-js](https://github.com/ollama/ollama-js)
-
 ## Quickstart

-To run and chat with [Llama 2](https://ollama.com/library/llama2):
+To run and chat with [Llama 2](https://ollama.ai/library/llama2):

 ```
 ollama run llama2
@@ -43,9 +41,9 @@ ollama run llama2

 ## Model library

-Ollama supports a list of models available on [ollama.com/library](https://ollama.com/library 'ollama model library')
+Ollama supports a list of open-source models available on [ollama.ai/library](https://ollama.ai/library 'ollama model library')

-Here are some example models that can be downloaded:
+Here are some example open-source models that can be downloaded:

 | Model              | Parameters | Size  | Download                       |
 | ------------------ | ---------- | ----- | ------------------------------ |
@@ -62,8 +60,6 @@ Here are some example models that can be downloaded:
 | Orca Mini          | 3B         | 1.9GB | `ollama run orca-mini`         |
 | Vicuna             | 7B         | 3.8GB | `ollama run vicuna`            |
 | LLaVA              | 7B         | 4.5GB | `ollama run llava`             |
-| Gemma              | 2B         | 1.4GB | `ollama run gemma:2b`          |
-| Gemma              | 7B         | 4.8GB | `ollama run gemma:7b`          |

 > Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

@@ -202,21 +198,18 @@ brew install cmake go
 ```

 Then generate dependencies:
-
 ```
 go generate ./...
 ```
-
 Then build the binary:
-
 ```
 go build .
 ```

-More detailed instructions can be found in the [developer guide](https://github.com/ollama/ollama/blob/main/docs/development.md)
+More detailed instructions can be found in the [developer guide](https://github.com/jmorganca/ollama/blob/main/docs/development.md)
+

 ### Running local builds
-
 Next, start the server:

 ```
@@ -255,44 +248,26 @@ curl http://localhost:11434/api/chat -d '{

 See the [API documentation](./docs/api.md) for all endpoints.

+## Integrations
+
+- [ollama-python](https://github.com/jmorganca/ollama-python)
+
 ## Community Integrations

 ### Web & Desktop
-
- [Lollms-Webui](https://github.com/ParisNeo/lollms-webui)
- [LibreChat](https://github.com/danny-avila/LibreChat)
 - [Bionic GPT](https://github.com/bionic-gpt/bionic-gpt)
- [Enchanted (macOS native)](https://github.com/AugustDev/enchanted)
 - [HTML UI](https://github.com/rtcfirefly/ollama-ui)
- [Saddle](https://github.com/jikkuatwork/saddle)
 - [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama)
 - [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file)
 - [Minimalistic React UI for Ollama Models](https://github.com/richawo/minimal-llm-ui)
- [Open WebUI](https://github.com/open-webui/open-webui)
+- [Web UI](https://github.com/ollama-webui/ollama-webui)
 - [Ollamac](https://github.com/kevinhermawan/Ollamac)
- [big-AGI](https://github.com/enricoros/big-AGI/blob/main/docs/config-local-ollama.md)
+- [big-AGI](https://github.com/enricoros/big-agi/blob/main/docs/config-ollama.md)
 - [Cheshire Cat assistant framework](https://github.com/cheshire-cat-ai/core)
 - [Amica](https://github.com/semperai/amica)
 - [chatd](https://github.com/BruceMacD/chatd)
 - [Ollama-SwiftUI](https://github.com/kghandour/Ollama-SwiftUI)
- [Dify.AI](https://github.com/langgenius/dify)
- [MindMac](https://mindmac.app)
- [NextJS Web Interface for Ollama](https://github.com/jakobhoeg/nextjs-ollama-llm-ui)
- [Msty](https://msty.app)
- [Chatbox](https://github.com/Bin-Huang/Chatbox)
- [WinForm Ollama Copilot](https://github.com/tgraupmann/WinForm_Ollama_Copilot)
- [NextChat](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web) with [Get Started Doc](https://docs.nextchat.dev/models/ollama)
- [Alpaca WebUI](https://github.com/mmo80/alpaca-webui)
- [OllamaGUI](https://github.com/enoch1118/ollamaGUI)
- [OpenAOE](https://github.com/InternLM/OpenAOE)
- [Odin Runes](https://github.com/leonid20000/OdinRunes)
- [LLM-X: Progressive Web App](https://github.com/mrdjohnson/llm-x)
- [AnythingLLM (Docker + MacOs/Windows/Linux native app)](https://github.com/Mintplex-Labs/anything-llm)
- [Ollama Basic Chat: Uses HyperDiv Reactive UI](https://github.com/rapidarchitect/ollama_basic_chat)
- [Ollama-chats RPG](https://github.com/drazdra/ollama-chats)
- [ChatOllama: Open Source Chatbot based on Ollama with Knowledge Bases](https://github.com/sugarforever/chat-ollama)
- [CRAG Ollama Chat: Simple Web Search with Corrective RAG](https://github.com/Nagi-ovo/CRAG-Ollama-Chat)
- [RAGFlow: Open-source Retrieval-Augmented Generation engine based on deep document understanding](https://github.com/infiniflow/ragflow)
+

 ### Terminal

@@ -301,34 +276,23 @@ See the [API documentation](./docs/api.md) for all endpoints.
 - [Emacs client](https://github.com/zweifisch/ollama)
 - [gen.nvim](https://github.com/David-Kunz/gen.nvim)
 - [ollama.nvim](https://github.com/nomnivore/ollama.nvim)
- [ollero.nvim](https://github.com/marco-souza/ollero.nvim)
- [ollama-chat.nvim](https://github.com/gerazov/ollama-chat.nvim)
 - [ogpt.nvim](https://github.com/huynle/ogpt.nvim)
 - [gptel Emacs client](https://github.com/karthink/gptel)
 - [Oatmeal](https://github.com/dustinblackman/oatmeal)
 - [cmdh](https://github.com/pgibler/cmdh)
- [ooo](https://github.com/npahlfer/ooo)
- [tenere](https://github.com/pythops/tenere)
- [llm-ollama](https://github.com/taketwo/llm-ollama) for [Datasette's LLM CLI](https://llm.datasette.io/en/stable/).
- [typechat-cli](https://github.com/anaisbetts/typechat-cli)
- [ShellOracle](https://github.com/djcopley/ShellOracle)
- [tlm](https://github.com/yusufcanb/tlm)

 ### Database

 - [MindsDB](https://github.com/mindsdb/mindsdb/blob/staging/mindsdb/integrations/handlers/ollama_handler/README.md)
- [chromem-go](https://github.com/philippgille/chromem-go/blob/v0.5.0/embed_ollama.go) with [example](https://github.com/philippgille/chromem-go/tree/v0.5.0/examples/rag-wikipedia-ollama)

 ### Package managers

 - [Pacman](https://archlinux.org/packages/extra/x86_64/ollama/)
- [Helm Chart](https://artifacthub.io/packages/helm/ollama-helm/ollama)

 ### Libraries

 - [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa)
 - [LangChainGo](https://github.com/tmc/langchaingo/) with [example](https://github.com/tmc/langchaingo/tree/main/examples/ollama-completion-example)
- [LangChain4j](https://github.com/langchain4j/langchain4j) with [example](https://github.com/langchain4j/langchain4j-examples/tree/main/ollama-examples/src/main/java)
 - [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/ollama.html)
 - [LiteLLM](https://github.com/BerriAI/litellm)
 - [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp)
@@ -342,11 +306,7 @@ See the [API documentation](./docs/api.md) for all endpoints.
 - [LangChainDart](https://github.com/davidmigloz/langchain_dart)
 - [Semantic Kernel - Python](https://github.com/microsoft/semantic-kernel/tree/main/python/semantic_kernel/connectors/ai/ollama)
 - [Haystack](https://github.com/deepset-ai/haystack-integrations/blob/main/integrations/ollama.md)
- [Elixir LangChain](https://github.com/brainlid/langchain)
- [Ollama for R - rollama](https://github.com/JBGruber/rollama)
- [Ollama-ex for Elixir](https://github.com/lebrunel/ollama-ex)
- [Ollama Connector for SAP ABAP](https://github.com/b-tocs/abap_btocs_ollama)
- [Testcontainers](https://testcontainers.com/modules/ollama/)
+

 ### Mobile

@@ -360,7 +320,6 @@ See the [API documentation](./docs/api.md) for all endpoints.
 - [Continue](https://github.com/continuedev/continue)
 - [Obsidian Ollama plugin](https://github.com/hinterdupfinger/obsidian-ollama)
 - [Logseq Ollama plugin](https://github.com/omagdy7/ollama-logseq)
- [NotesOllama](https://github.com/andersrex/notesollama) (Apple Notes Ollama plugin)
 - [Dagger Chatbot](https://github.com/samalba/dagger-chatbot)
 - [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot)
 - [Ollama Telegram Bot](https://github.com/ruecat/ollama-telegram)
@@ -368,12 +327,4 @@ See the [API documentation](./docs/api.md) for all endpoints.
 - [Rivet plugin](https://github.com/abrenneke/rivet-plugin-ollama)
 - [Llama Coder](https://github.com/ex3ndr/llama-coder) (Copilot alternative using Ollama)
 - [Obsidian BMO Chatbot plugin](https://github.com/longy2k/obsidian-bmo-chatbot)
- [Cliobot](https://github.com/herval/cliobot) (Telegram bot with Ollama support)
- [Copilot for Obsidian plugin](https://github.com/logancyang/obsidian-copilot)
- [Obsidian Local GPT plugin](https://github.com/pfrankov/obsidian-local-gpt)
- [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama)
- [twinny](https://github.com/rjmacarthy/twinny) (Copilot and Copilot chat alternative using Ollama)
- [Wingman-AI](https://github.com/RussellCanfield/wingman-ai) (Copilot code and chat alternative using Ollama and HuggingFace)
- [Page Assist](https://github.com/n4ze3m/page-assist) (Chrome Extension)
- [AI Telegram Bot](https://github.com/tusharhero/aitelegrambot) (Telegram bot using Ollama in backend)
- [AI ST Completion](https://github.com/yaroslavyaroslav/OpenAI-sublime-text) (Sublime Text 4 AI assistant plugin with Ollama support)
+- [Open Interpreter](https://docs.openinterpreter.com/language-model-setup/local-models/ollama)
--- a/api/client.go
+++ b/api/client.go
@@ -1,9 +1,3 @@
-// Package api implements the client-side API for code wishing to interact
-// with the ollama service. The methods of the [Client] type correspond to
-// the ollama REST API as described in https://github.com/ollama/ollama/blob/main/docs/api.md
-//
-// The ollama command-line client itself uses this package to interact with
-// the backend service.
 package api

 import (
@@ -11,6 +5,7 @@ import (
 	"bytes"
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"io"
 	"net"
@@ -20,15 +15,13 @@ import (
 	"runtime"
 	"strings"

-	"github.com/ollama/ollama/format"
-	"github.com/ollama/ollama/version"
+	"github.com/jmorganca/ollama/format"
+	"github.com/jmorganca/ollama/version"
 )

-// Client encapsulates client state for interacting with the ollama
-// service. Use [ClientFromEnvironment] to create new Clients.
 type Client struct {
 	base *url.URL
-	http *http.Client
+	http http.Client
 }

 func checkError(resp *http.Response, body []byte) error {
@@ -47,15 +40,6 @@ func checkError(resp *http.Response, body []byte) error {
 	return apiError
 }

-// ClientFromEnvironment creates a new [Client] using configuration from the
-// environment variable OLLAMA_HOST, which points to the network host and
-// port on which the ollama service is listenting. The format of this variable
-// is:
-//
-//	<scheme>://<host>:<port>
-//
-// If the variable is not specified, a default ollama host and port will be
-// used.
 func ClientFromEnvironment() (*Client, error) {
 	defaultPort := "11434"

@@ -82,13 +66,30 @@ func ClientFromEnvironment() (*Client, error) {
 		}
 	}

-	return &Client{
+	client := Client{
 		base: &url.URL{
 			Scheme: scheme,
 			Host:   net.JoinHostPort(host, port),
 		},
-		http: http.DefaultClient,
-	}, nil
+	}
+
+	mockRequest, err := http.NewRequest(http.MethodHead, client.base.String(), nil)
+	if err != nil {
+		return nil, err
+	}
+
+	proxyURL, err := http.ProxyFromEnvironment(mockRequest)
+	if err != nil {
+		return nil, err
+	}
+
+	client.http = http.Client{
+		Transport: &http.Transport{
+			Proxy: http.ProxyURL(proxyURL),
+		},
+	}
+
+	return &client, nil
 }

 func (c *Client) do(ctx context.Context, method, path string, reqData, respData any) error {
@@ -207,14 +208,8 @@ func (c *Client) stream(ctx context.Context, method, path string, data any, fn f
 	return nil
 }

-// GenerateResponseFunc is a function that [Client.Generate] invokes every time
-// a response is received from the service. If this function returns an error,
-// [Client.Generate] will stop generating and return this error.
 type GenerateResponseFunc func(GenerateResponse) error

-// Generate generates a response for a given prompt. The req parameter should
-// be populated with prompt details. fn is called for each response (there may
-// be multiple responses, e.g. in case streaming is enabled).
 func (c *Client) Generate(ctx context.Context, req *GenerateRequest, fn GenerateResponseFunc) error {
 	return c.stream(ctx, http.MethodPost, "/api/generate", req, func(bts []byte) error {
 		var resp GenerateResponse
@@ -226,15 +221,8 @@ func (c *Client) Generate(ctx context.Context, req *GenerateRequest, fn Generate
 	})
 }

-// ChatResponseFunc is a function that [Client.Chat] invokes every time
-// a response is received from the service. If this function returns an error,
-// [Client.Chat] will stop generating and return this error.
 type ChatResponseFunc func(ChatResponse) error

-// Chat generates the next message in a chat. [ChatRequest] may contain a
-// sequence of messages which can be used to maintain chat history with a model.
-// fn is called for each response (there may be multiple responses, e.g. if case
-// streaming is enabled).
 func (c *Client) Chat(ctx context.Context, req *ChatRequest, fn ChatResponseFunc) error {
 	return c.stream(ctx, http.MethodPost, "/api/chat", req, func(bts []byte) error {
 		var resp ChatResponse
@@ -246,14 +234,8 @@ func (c *Client) Chat(ctx context.Context, req *ChatRequest, fn ChatResponseFunc
 	})
 }

-// PullProgressFunc is a function that [Client.Pull] invokes every time there
-// is progress with a "pull" request sent to the service. If this function
-// returns an error, [Client.Pull] will stop the process and return this error.
 type PullProgressFunc func(ProgressResponse) error

-// Pull downloads a model from the ollama library. fn is called each time
-// progress is made on the request and can be used to display a progress bar,
-// etc.
 func (c *Client) Pull(ctx context.Context, req *PullRequest, fn PullProgressFunc) error {
 	return c.stream(ctx, http.MethodPost, "/api/pull", req, func(bts []byte) error {
 		var resp ProgressResponse
@@ -336,7 +318,18 @@ func (c *Client) Embeddings(ctx context.Context, req *EmbeddingRequest) (*Embedd
 }

 func (c *Client) CreateBlob(ctx context.Context, digest string, r io.Reader) error {
-	return c.do(ctx, http.MethodPost, fmt.Sprintf("/api/blobs/%s", digest), r, nil)
+	if err := c.do(ctx, http.MethodHead, fmt.Sprintf("/api/blobs/%s", digest), nil, nil); err != nil {
+		var statusError StatusError
+		if !errors.As(err, &statusError) || statusError.StatusCode != http.StatusNotFound {
+			return err
+		}
+
+		if err := c.do(ctx, http.MethodPost, fmt.Sprintf("/api/blobs/%s", digest), r, nil); err != nil {
+			return err
+		}
+	}
+
+	return nil
 }

 func (c *Client) Version(ctx context.Context) (string, error) {
--- a/api/types.go
+++ b/api/types.go
@@ -33,55 +33,25 @@ func (e StatusError) Error() string {

 type ImageData []byte

-// GenerateRequest describes a request sent by [Client.Generate]. While you
-// have to specify the Model and Prompt fields, all the other fields have
-// reasonable defaults for basic uses.
 type GenerateRequest struct {
-	// Model is the model name; it should be a name familiar to Ollama from
-	// the library at https://ollama.com/library
-	Model string `json:"model"`
+	Model    string      `json:"model"`
+	Prompt   string      `json:"prompt"`
+	System   string      `json:"system"`
+	Template string      `json:"template"`
+	Context  []int       `json:"context,omitempty"`
+	Stream   *bool       `json:"stream,omitempty"`
+	Raw      bool        `json:"raw,omitempty"`
+	Format   string      `json:"format"`
+	Images   []ImageData `json:"images,omitempty"`

-	// Prompt is the textual prompt to send to the model.
-	Prompt string `json:"prompt"`
-
-	// System overrides the model's default system message/prompt.
-	System string `json:"system"`
-
-	// Template overrides the model's default prompt template.
-	Template string `json:"template"`
-
-	// Context is the context parameter returned from a previous call to
-	// Generate call. It can be used to keep a short conversational memory.
-	Context []int `json:"context,omitempty"`
-
-	// Stream specifies whether the response is streaming; it is true by default.
-	Stream *bool `json:"stream,omitempty"`
-
-	// Raw set to true means that no formatting will be applied to the prompt.
-	Raw bool `json:"raw,omitempty"`
-
-	// Format specifies the format to return a response in.
-	Format string `json:"format"`
-
-	// KeepAlive controls how long the model will stay loaded in memory following
-	// this request.
-	KeepAlive *Duration `json:"keep_alive,omitempty"`
-
-	// Images is an optional list of base64-encoded images accompanying this
-	// request, for multimodal models.
-	Images []ImageData `json:"images,omitempty"`
-
-	// Options lists model-specific options. For example, temperature can be
-	// set through this field, if the model supports it.
 	Options map[string]interface{} `json:"options"`
 }

 type ChatRequest struct {
-	Model     string    `json:"model"`
-	Messages  []Message `json:"messages"`
-	Stream    *bool     `json:"stream,omitempty"`
-	Format    string    `json:"format"`
-	KeepAlive *Duration `json:"keep_alive,omitempty"`
+	Model    string    `json:"model"`
+	Messages []Message `json:"messages"`
+	Stream   *bool     `json:"stream,omitempty"`
+	Format   string    `json:"format"`

 	Options map[string]interface{} `json:"options"`
 }
@@ -111,7 +81,7 @@ type Metrics struct {
 	EvalDuration       time.Duration `json:"eval_duration,omitempty"`
 }

-// Options specified in GenerateRequest, if you add a new option here add it to the API docs also
+// Options specfied in GenerateRequest, if you add a new option here add it to the API docs also
 type Options struct {
 	Runner

@@ -137,30 +107,27 @@ type Options struct {

 // Runner options which must be set when the model is loaded into memory
 type Runner struct {
-	UseNUMA   bool `json:"numa,omitempty"`
-	NumCtx    int  `json:"num_ctx,omitempty"`
-	NumBatch  int  `json:"num_batch,omitempty"`
-	NumGQA    int  `json:"num_gqa,omitempty"`
-	NumGPU    int  `json:"num_gpu,omitempty"`
-	MainGPU   int  `json:"main_gpu,omitempty"`
-	LowVRAM   bool `json:"low_vram,omitempty"`
-	F16KV     bool `json:"f16_kv,omitempty"`
-	LogitsAll bool `json:"logits_all,omitempty"`
-	VocabOnly bool `json:"vocab_only,omitempty"`
-	UseMMap   bool `json:"use_mmap,omitempty"`
-	UseMLock  bool `json:"use_mlock,omitempty"`
-	NumThread int  `json:"num_thread,omitempty"`
-
-	// Unused: RopeFrequencyBase is ignored. Instead the value in the model will be used
-	RopeFrequencyBase float32 `json:"rope_frequency_base,omitempty"`
-	// Unused: RopeFrequencyScale is ignored. Instead the value in the model will be used
+	UseNUMA            bool    `json:"numa,omitempty"`
+	NumCtx             int     `json:"num_ctx,omitempty"`
+	NumBatch           int     `json:"num_batch,omitempty"`
+	NumGQA             int     `json:"num_gqa,omitempty"`
+	NumGPU             int     `json:"num_gpu,omitempty"`
+	MainGPU            int     `json:"main_gpu,omitempty"`
+	LowVRAM            bool    `json:"low_vram,omitempty"`
+	F16KV              bool    `json:"f16_kv,omitempty"`
+	LogitsAll          bool    `json:"logits_all,omitempty"`
+	VocabOnly          bool    `json:"vocab_only,omitempty"`
+	UseMMap            bool    `json:"use_mmap,omitempty"`
+	UseMLock           bool    `json:"use_mlock,omitempty"`
+	EmbeddingOnly      bool    `json:"embedding_only,omitempty"`
+	RopeFrequencyBase  float32 `json:"rope_frequency_base,omitempty"`
 	RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
+	NumThread          int     `json:"num_thread,omitempty"`
 }

 type EmbeddingRequest struct {
-	Model     string    `json:"model"`
-	Prompt    string    `json:"prompt"`
-	KeepAlive *Duration `json:"keep_alive,omitempty"`
+	Model  string `json:"model"`
+	Prompt string `json:"prompt"`

 	Options map[string]interface{} `json:"options"`
 }
@@ -170,11 +137,10 @@ type EmbeddingResponse struct {
 }

 type CreateRequest struct {
-	Model        string `json:"model"`
-	Path         string `json:"path"`
-	Modelfile    string `json:"modelfile"`
-	Stream       *bool  `json:"stream,omitempty"`
-	Quantization string `json:"quantization,omitempty"`
+	Model     string `json:"model"`
+	Path      string `json:"path"`
+	Modelfile string `json:"modelfile"`
+	Stream    *bool  `json:"stream,omitempty"`

 	// Name is deprecated, see Model
 	Name string `json:"name"`
@@ -205,7 +171,6 @@ type ShowResponse struct {
 	Template   string       `json:"template,omitempty"`
 	System     string       `json:"system,omitempty"`
 	Details    ModelDetails `json:"details,omitempty"`
-	Messages   []Message    `json:"messages,omitempty"`
 }

 type CopyRequest struct {
@@ -271,7 +236,6 @@ type GenerateResponse struct {
 }

 type ModelDetails struct {
-	ParentModel       string   `json:"parent_model"`
 	Format            string   `json:"format"`
 	Family            string   `json:"family"`
 	Families          []string `json:"families"`
@@ -414,16 +378,19 @@ func DefaultOptions() Options {

 		Runner: Runner{
 			// options set when the model is loaded
-			NumCtx:    2048,
-			NumBatch:  512,
-			NumGPU:    -1, // -1 here indicates that NumGPU should be set dynamically
-			NumGQA:    1,
-			NumThread: 0, // let the runtime decide
-			LowVRAM:   false,
-			F16KV:     true,
-			UseMLock:  false,
-			UseMMap:   true,
-			UseNUMA:   false,
+			NumCtx:             2048,
+			RopeFrequencyBase:  10000.0,
+			RopeFrequencyScale: 1.0,
+			NumBatch:           512,
+			NumGPU:             -1, // -1 here indicates that NumGPU should be set dynamically
+			NumGQA:             1,
+			NumThread:          0, // let the runtime decide
+			LowVRAM:            false,
+			F16KV:              true,
+			UseMLock:           false,
+			UseMMap:            true,
+			UseNUMA:            false,
+			EmbeddingOnly:      true,
 		},
 	}
 }
@@ -443,18 +410,15 @@ func (d *Duration) UnmarshalJSON(b []byte) (err error) {
 	switch t := v.(type) {
 	case float64:
 		if t < 0 {
-			d.Duration = time.Duration(math.MaxInt64)
-		} else {
-			d.Duration = time.Duration(t * float64(time.Second))
+			t = math.MaxFloat64
 		}
+
+		d.Duration = time.Duration(t)
 	case string:
 		d.Duration, err = time.ParseDuration(t)
 		if err != nil {
 			return err
 		}
-		if d.Duration < 0 {
-			d.Duration = time.Duration(math.MaxInt64)
-		}
 	}

 	return nil
--- a/api/types_test.go
+++ b/api/types_test.go
@@ -1,50 +0,0 @@
-package api
-
-import (
-	"encoding/json"
-	"math"
-	"testing"
-	"time"
-
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"
-)
-
-func TestKeepAliveParsingFromJSON(t *testing.T) {
-	tests := []struct {
-		name string
-		req  string
-		exp  *Duration
-	}{
-		{
-			name: "Positive Integer",
-			req:  `{ "keep_alive": 42 }`,
-			exp:  &Duration{42 * time.Second},
-		},
-		{
-			name: "Positive Integer String",
-			req:  `{ "keep_alive": "42m" }`,
-			exp:  &Duration{42 * time.Minute},
-		},
-		{
-			name: "Negative Integer",
-			req:  `{ "keep_alive": -1 }`,
-			exp:  &Duration{math.MaxInt64},
-		},
-		{
-			name: "Negative Integer String",
-			req:  `{ "keep_alive": "-1m" }`,
-			exp:  &Duration{math.MaxInt64},
-		},
-	}
-
-	for _, test := range tests {
-		t.Run(test.name, func(t *testing.T) {
-			var dec ChatRequest
-			err := json.Unmarshal([]byte(test.req), &dec)
-			require.NoError(t, err)
-
-			assert.Equal(t, test.exp, dec.KeepAlive)
-		})
-	}
-}
--- a/macapp/.eslintrc.json
+++ b/macapp/.eslintrc.json
--- a/app/.gitignore
+++ b/app/.gitignore
@@ -1 +1,92 @@
-ollama.syso
+# Logs
+logs
+*.log
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+lerna-debug.log*
+
+# Diagnostic reports (https://nodejs.org/api/report.html)
+report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
+
+# Runtime data
+pids
+*.pid
+*.seed
+*.pid.lock
+.DS_Store
+
+# Directory for instrumented libs generated by jscoverage/JSCover
+lib-cov
+
+# Coverage directory used by tools like istanbul
+coverage
+*.lcov
+
+# nyc test coverage
+.nyc_output
+
+# node-waf configuration
+.lock-wscript
+
+# Compiled binary addons (https://nodejs.org/api/addons.html)
+build/Release
+
+# Dependency directories
+node_modules/
+jspm_packages/
+
+# TypeScript v1 declaration files
+typings/
+
+# TypeScript cache
+*.tsbuildinfo
+
+# Optional npm cache directory
+.npm
+
+# Optional eslint cache
+.eslintcache
+
+# Optional REPL history
+.node_repl_history
+
+# Output of 'npm pack'
+*.tgz
+
+# Yarn Integrity file
+.yarn-integrity
+
+# dotenv environment variables file
+.env
+.env.test
+
+# parcel-bundler cache (https://parceljs.org/)
+.cache
+
+# next.js build output
+.next
+
+# nuxt.js build output
+.nuxt
+
+# vuepress build output
+.vuepress/dist
+
+# Serverless directories
+.serverless/
+
+# FuseBox cache
+.fusebox/
+
+# DynamoDB Local files
+.dynamodb/
+
+# Webpack
+.webpack/
+
+# Vite
+.vite/
+
+# Electron-Forge
+out/
--- a/app/README.md
+++ b/app/README.md
@@ -1,22 +1,21 @@
-# Ollama App
+# Desktop

-## Linux
+This app builds upon Ollama to provide a desktop experience for running models.

-TODO
+## Developing

-## MacOS
-
-TODO
-
-## Windows
-
-If you want to build the installer, youll need to install
- https://jrsoftware.org/isinfo.php
-
-
-In the top directory of this repo, run the following powershell script
-to build the ollama CLI, ollama app, and ollama installer.
+First, build the `ollama` binary:

 ```
-powershell -ExecutionPolicy Bypass -File .\scripts\build_windows.ps1
+cd ..
+go build .
 ```
+
+Then run the desktop app with `npm start`:
+
+```
+cd app
+npm install
+npm start
+```
+
--- a/app/assets/app.ico
+++ b/app/assets/app.ico
--- a/app/assets/assets.go
+++ b/app/assets/assets.go
@@ -1,17 +0,0 @@
-package assets
-
-import (
-	"embed"
-	"io/fs"
-)
-
-//go:embed *.ico
-var icons embed.FS
-
-func ListIcons() ([]string, error) {
-	return fs.Glob(icons, "*")
-}
-
-func GetIcon(filename string) ([]byte, error) {
-	return icons.ReadFile(filename)
-}
--- a/macapp/assets/icon.icns
+++ b/macapp/assets/icon.icns
--- a/macapp/assets/iconDarkTemplate.png
+++ b/macapp/assets/iconDarkTemplate.png
--- a/macapp/assets/iconDarkTemplate@2x.png
+++ b/macapp/assets/iconDarkTemplate@2x.png
--- a/macapp/assets/iconDarkUpdateTemplate.png
+++ b/macapp/assets/iconDarkUpdateTemplate.png
--- a/macapp/assets/iconDarkUpdateTemplate@2x.png
+++ b/macapp/assets/iconDarkUpdateTemplate@2x.png
--- a/macapp/assets/iconTemplate.png
+++ b/macapp/assets/iconTemplate.png
--- a/macapp/assets/iconTemplate@2x.png
+++ b/macapp/assets/iconTemplate@2x.png
--- a/macapp/assets/iconUpdateTemplate.png
+++ b/macapp/assets/iconUpdateTemplate.png
--- a/macapp/assets/iconUpdateTemplate@2x.png
+++ b/macapp/assets/iconUpdateTemplate@2x.png
--- a/app/assets/setup.bmp
+++ b/app/assets/setup.bmp
--- a/app/assets/tray.ico
+++ b/app/assets/tray.ico
--- a/app/assets/tray_upgrade.ico
+++ b/app/assets/tray_upgrade.ico
--- a/macapp/forge.config.ts
+++ b/macapp/forge.config.ts
--- a/app/lifecycle/getstarted_nonwindows.go
+++ b/app/lifecycle/getstarted_nonwindows.go
@@ -1,9 +0,0 @@
-//go:build !windows
-
-package lifecycle
-
-import "fmt"
-
-func GetStarted() error {
-	return fmt.Errorf("GetStarted not implemented")
-}
--- a/app/lifecycle/getstarted_windows.go
+++ b/app/lifecycle/getstarted_windows.go
@@ -1,44 +0,0 @@
-package lifecycle
-
-import (
-	"fmt"
-	"log/slog"
-	"os"
-	"os/exec"
-	"path/filepath"
-	"syscall"
-)
-
-func GetStarted() error {
-	const CREATE_NEW_CONSOLE = 0x00000010
-	var err error
-	bannerScript := filepath.Join(AppDir, "ollama_welcome.ps1")
-	args := []string{
-		// TODO once we're signed, the execution policy bypass should be removed
-		"powershell", "-noexit", "-ExecutionPolicy", "Bypass", "-nologo", "-file", bannerScript,
-	}
-	args[0], err = exec.LookPath(args[0])
-	if err != nil {
-		return err
-	}
-
-	// Make sure the script actually exists
-	_, err = os.Stat(bannerScript)
-	if err != nil {
-		return fmt.Errorf("getting started banner script error %s", err)
-	}
-
-	slog.Info(fmt.Sprintf("opening getting started terminal with %v", args))
-	attrs := &os.ProcAttr{
-		Files: []*os.File{os.Stdin, os.Stdout, os.Stderr},
-		Sys:   &syscall.SysProcAttr{CreationFlags: CREATE_NEW_CONSOLE, HideWindow: false},
-	}
-	proc, err := os.StartProcess(args[0], args, attrs)
-
-	if err != nil {
-		return fmt.Errorf("unable to start getting started shell %w", err)
-	}
-
-	slog.Debug(fmt.Sprintf("getting started terminal PID: %d", proc.Pid))
-	return proc.Release()
-}
--- a/app/lifecycle/lifecycle.go
+++ b/app/lifecycle/lifecycle.go
@@ -1,92 +0,0 @@
-package lifecycle
-
-import (
-	"context"
-	"fmt"
-	"log"
-	"log/slog"
-	"os"
-	"os/signal"
-	"syscall"
-
-	"github.com/ollama/ollama/app/store"
-	"github.com/ollama/ollama/app/tray"
-)
-
-func Run() {
-	InitLogging()
-
-	ctx, cancel := context.WithCancel(context.Background())
-	var done chan int
-
-	t, err := tray.NewTray()
-	if err != nil {
-		log.Fatalf("Failed to start: %s", err)
-	}
-	callbacks := t.GetCallbacks()
-
-	signals := make(chan os.Signal, 1)
-	signal.Notify(signals, syscall.SIGINT, syscall.SIGTERM)
-
-	go func() {
-		slog.Debug("starting callback loop")
-		for {
-			select {
-			case <-callbacks.Quit:
-				slog.Debug("quit called")
-				t.Quit()
-			case <-signals:
-				slog.Debug("shutting down due to signal")
-				t.Quit()
-			case <-callbacks.Update:
-				err := DoUpgrade(cancel, done)
-				if err != nil {
-					slog.Warn(fmt.Sprintf("upgrade attempt failed: %s", err))
-				}
-			case <-callbacks.ShowLogs:
-				ShowLogs()
-			case <-callbacks.DoFirstUse:
-				err := GetStarted()
-				if err != nil {
-					slog.Warn(fmt.Sprintf("Failed to launch getting started shell: %s", err))
-				}
-			}
-		}
-	}()
-
-	// Are we first use?
-	if !store.GetFirstTimeRun() {
-		slog.Debug("First time run")
-		err = t.DisplayFirstUseNotification()
-		if err != nil {
-			slog.Debug(fmt.Sprintf("XXX failed to display first use notification %v", err))
-		}
-		store.SetFirstTimeRun(true)
-	} else {
-		slog.Debug("Not first time, skipping first run notification")
-	}
-
-	if IsServerRunning(ctx) {
-		slog.Info("Detected another instance of ollama running, exiting")
-		os.Exit(1)
-	} else {
-		done, err = SpawnServer(ctx, CLIName)
-		if err != nil {
-			// TODO - should we retry in a backoff loop?
-			// TODO - should we pop up a warning and maybe add a menu item to view application logs?
-			slog.Error(fmt.Sprintf("Failed to spawn ollama server %s", err))
-			done = make(chan int, 1)
-			done <- 1
-		}
-	}
-
-	StartBackgroundUpdaterChecker(ctx, t.UpdateAvailable)
-
-	t.Run()
-	cancel()
-	slog.Info("Waiting for ollama server to shutdown...")
-	if done != nil {
-		<-done
-	}
-	slog.Info("Ollama app exiting")
-}
--- a/app/lifecycle/logging.go
+++ b/app/lifecycle/logging.go
@@ -1,46 +0,0 @@
-package lifecycle
-
-import (
-	"fmt"
-	"log/slog"
-	"os"
-	"path/filepath"
-)
-
-func InitLogging() {
-	level := slog.LevelInfo
-
-	if debug := os.Getenv("OLLAMA_DEBUG"); debug != "" {
-		level = slog.LevelDebug
-	}
-
-	var logFile *os.File
-	var err error
-	// Detect if we're a GUI app on windows, and if not, send logs to console
-	if os.Stderr.Fd() != 0 {
-		// Console app detected
-		logFile = os.Stderr
-		// TODO - write one-line to the app.log file saying we're running in console mode to help avoid confusion
-	} else {
-		logFile, err = os.OpenFile(AppLogFile, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0755)
-		if err != nil {
-			slog.Error(fmt.Sprintf("failed to create server log %v", err))
-			return
-		}
-	}
-	handler := slog.NewTextHandler(logFile, &slog.HandlerOptions{
-		Level:     level,
-		AddSource: true,
-		ReplaceAttr: func(_ []string, attr slog.Attr) slog.Attr {
-			if attr.Key == slog.SourceKey {
-				source := attr.Value.Any().(*slog.Source)
-				source.File = filepath.Base(source.File)
-			}
-			return attr
-		},
-	})
-
-	slog.SetDefault(slog.New(handler))
-
-	slog.Info("ollama app started")
-}
--- a/app/lifecycle/logging_nonwindows.go
+++ b/app/lifecycle/logging_nonwindows.go
@@ -1,9 +0,0 @@
-//go:build !windows
-
-package lifecycle
-
-import "log/slog"
-
-func ShowLogs() {
-	slog.Warn("ShowLogs not yet implemented")
-}
--- a/app/lifecycle/logging_windows.go
+++ b/app/lifecycle/logging_windows.go
@@ -1,19 +0,0 @@
-package lifecycle
-
-import (
-	"fmt"
-	"log/slog"
-	"os/exec"
-	"syscall"
-)
-
-func ShowLogs() {
-	cmd_path := "c:\\Windows\\system32\\cmd.exe"
-	slog.Debug(fmt.Sprintf("viewing logs with start %s", AppDataDir))
-	cmd := exec.Command(cmd_path, "/c", "start", AppDataDir)
-	cmd.SysProcAttr = &syscall.SysProcAttr{HideWindow: false, CreationFlags: 0x08000000}
-	err := cmd.Start()
-	if err != nil {
-		slog.Error(fmt.Sprintf("Failed to open log dir: %s", err))
-	}
-}
--- a/app/lifecycle/paths.go
+++ b/app/lifecycle/paths.go
@@ -1,79 +0,0 @@
-package lifecycle
-
-import (
-	"errors"
-	"fmt"
-	"log/slog"
-	"os"
-	"path/filepath"
-	"runtime"
-	"strings"
-)
-
-var (
-	AppName    = "ollama app"
-	CLIName    = "ollama"
-	AppDir     = "/opt/Ollama"
-	AppDataDir = "/opt/Ollama"
-	// TODO - should there be a distinct log dir?
-	UpdateStageDir = "/tmp"
-	AppLogFile     = "/tmp/ollama_app.log"
-	ServerLogFile  = "/tmp/ollama.log"
-	UpgradeLogFile = "/tmp/ollama_update.log"
-	Installer      = "OllamaSetup.exe"
-)
-
-func init() {
-	if runtime.GOOS == "windows" {
-		AppName += ".exe"
-		CLIName += ".exe"
-		// Logs, configs, downloads go to LOCALAPPDATA
-		localAppData := os.Getenv("LOCALAPPDATA")
-		AppDataDir = filepath.Join(localAppData, "Ollama")
-		UpdateStageDir = filepath.Join(AppDataDir, "updates")
-		AppLogFile = filepath.Join(AppDataDir, "app.log")
-		ServerLogFile = filepath.Join(AppDataDir, "server.log")
-		UpgradeLogFile = filepath.Join(AppDataDir, "upgrade.log")
-
-		// Executables are stored in APPDATA
-		AppDir = filepath.Join(localAppData, "Programs", "Ollama")
-
-		// Make sure we have PATH set correctly for any spawned children
-		paths := strings.Split(os.Getenv("PATH"), ";")
-		// Start with whatever we find in the PATH/LD_LIBRARY_PATH
-		found := false
-		for _, path := range paths {
-			d, err := filepath.Abs(path)
-			if err != nil {
-				continue
-			}
-			if strings.EqualFold(AppDir, d) {
-				found = true
-			}
-		}
-		if !found {
-			paths = append(paths, AppDir)
-
-			pathVal := strings.Join(paths, ";")
-			slog.Debug("setting PATH=" + pathVal)
-			err := os.Setenv("PATH", pathVal)
-			if err != nil {
-				slog.Error(fmt.Sprintf("failed to update PATH: %s", err))
-			}
-		}
-
-		// Make sure our logging dir exists
-		_, err := os.Stat(AppDataDir)
-		if errors.Is(err, os.ErrNotExist) {
-			if err := os.MkdirAll(AppDataDir, 0o755); err != nil {
-				slog.Error(fmt.Sprintf("create ollama dir %s: %v", AppDataDir, err))
-			}
-		}
-
-	} else if runtime.GOOS == "darwin" {
-		// TODO
-		AppName += ".app"
-		// } else if runtime.GOOS == "linux" {
-		// TODO
-	}
-}
--- a/app/lifecycle/server.go
+++ b/app/lifecycle/server.go
@@ -1,162 +0,0 @@
-package lifecycle
-
-import (
-	"context"
-	"errors"
-	"fmt"
-	"io"
-	"log/slog"
-	"os"
-	"os/exec"
-	"path/filepath"
-	"syscall"
-	"time"
-
-	"github.com/ollama/ollama/api"
-)
-
-func getCLIFullPath(command string) string {
-	cmdPath := ""
-	appExe, err := os.Executable()
-	if err == nil {
-		cmdPath = filepath.Join(filepath.Dir(appExe), command)
-		_, err := os.Stat(cmdPath)
-		if err == nil {
-			return cmdPath
-		}
-	}
-	cmdPath, err = exec.LookPath(command)
-	if err == nil {
-		_, err := os.Stat(cmdPath)
-		if err == nil {
-			return cmdPath
-		}
-	}
-	pwd, err := os.Getwd()
-	if err == nil {
-		cmdPath = filepath.Join(pwd, command)
-		_, err = os.Stat(cmdPath)
-		if err == nil {
-			return cmdPath
-		}
-	}
-
-	return command
-}
-
-func SpawnServer(ctx context.Context, command string) (chan int, error) {
-	done := make(chan int)
-
-	logDir := filepath.Dir(ServerLogFile)
-	_, err := os.Stat(logDir)
-	if errors.Is(err, os.ErrNotExist) {
-		if err := os.MkdirAll(logDir, 0o755); err != nil {
-			return done, fmt.Errorf("create ollama server log dir %s: %v", logDir, err)
-		}
-	}
-
-	cmd := getCmd(ctx, getCLIFullPath(command))
-	// send stdout and stderr to a file
-	stdout, err := cmd.StdoutPipe()
-	if err != nil {
-		return done, fmt.Errorf("failed to spawn server stdout pipe %s", err)
-	}
-	stderr, err := cmd.StderrPipe()
-	if err != nil {
-		return done, fmt.Errorf("failed to spawn server stderr pipe %s", err)
-	}
-	stdin, err := cmd.StdinPipe()
-	if err != nil {
-		return done, fmt.Errorf("failed to spawn server stdin pipe %s", err)
-	}
-
-	// TODO - rotation
-	logFile, err := os.OpenFile(ServerLogFile, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0755)
-	if err != nil {
-		return done, fmt.Errorf("failed to create server log %w", err)
-	}
-	go func() {
-		defer logFile.Close()
-		io.Copy(logFile, stdout) //nolint:errcheck
-	}()
-	go func() {
-		defer logFile.Close()
-		io.Copy(logFile, stderr) //nolint:errcheck
-	}()
-
-	// Re-wire context done behavior to attempt a graceful shutdown of the server
-	cmd.Cancel = func() error {
-		if cmd.Process != nil {
-			cmd.Process.Signal(os.Interrupt) //nolint:errcheck
-			tick := time.NewTicker(10 * time.Millisecond)
-			defer tick.Stop()
-			for {
-				select {
-				case <-tick.C:
-					// OS agnostic "is it still running"
-					if proc, err := os.FindProcess(int(cmd.Process.Pid)); err != nil || errors.Is(proc.Signal(syscall.Signal(0)), os.ErrProcessDone) {
-						return nil //nolint:nilerr
-					}
-				case <-time.After(5 * time.Second):
-					slog.Warn("graceful server shutdown timeout, killing", "pid", cmd.Process.Pid)
-					cmd.Process.Kill() //nolint:errcheck
-				}
-			}
-		}
-		return nil
-	}
-
-	// run the command and wait for it to finish
-	if err := cmd.Start(); err != nil {
-		return done, fmt.Errorf("failed to start server %w", err)
-	}
-	if cmd.Process != nil {
-		slog.Info(fmt.Sprintf("started ollama server with pid %d", cmd.Process.Pid))
-	}
-	slog.Info(fmt.Sprintf("ollama server logs %s", ServerLogFile))
-
-	go func() {
-		// Keep the server running unless we're shuttind down the app
-		crashCount := 0
-		for {
-			cmd.Wait() //nolint:errcheck
-			stdin.Close()
-			var code int
-			if cmd.ProcessState != nil {
-				code = cmd.ProcessState.ExitCode()
-			}
-
-			select {
-			case <-ctx.Done():
-				slog.Info(fmt.Sprintf("server shutdown with exit code %d", code))
-				done <- code
-				return
-			default:
-				crashCount++
-				slog.Warn(fmt.Sprintf("server crash %d - exit code %d - respawning", crashCount, code))
-				time.Sleep(500 * time.Millisecond)
-				if err := cmd.Start(); err != nil {
-					slog.Error(fmt.Sprintf("failed to restart server %s", err))
-					// Keep trying, but back off if we keep failing
-					time.Sleep(time.Duration(crashCount) * time.Second)
-				}
-			}
-		}
-	}()
-	return done, nil
-}
-
-func IsServerRunning(ctx context.Context) bool {
-	client, err := api.ClientFromEnvironment()
-	if err != nil {
-		slog.Info("unable to connect to server")
-		return false
-	}
-	err = client.Heartbeat(ctx)
-	if err != nil {
-		slog.Debug(fmt.Sprintf("heartbeat from server: %s", err))
-		slog.Info("unable to connect to server")
-		return false
-	}
-	return true
-}
--- a/app/lifecycle/server_unix.go
+++ b/app/lifecycle/server_unix.go
@@ -1,12 +0,0 @@
-//go:build !windows
-
-package lifecycle
-
-import (
-	"context"
-	"os/exec"
-)
-
-func getCmd(ctx context.Context, cmd string) *exec.Cmd {
-	return exec.CommandContext(ctx, cmd, "serve")
-}
--- a/app/lifecycle/server_windows.go
+++ b/app/lifecycle/server_windows.go
@@ -1,13 +0,0 @@
-package lifecycle
-
-import (
-	"context"
-	"os/exec"
-	"syscall"
-)
-
-func getCmd(ctx context.Context, exePath string) *exec.Cmd {
-	cmd := exec.CommandContext(ctx, exePath, "serve")
-	cmd.SysProcAttr = &syscall.SysProcAttr{HideWindow: true, CreationFlags: 0x08000000}
-	return cmd
-}
--- a/app/lifecycle/updater.go
+++ b/app/lifecycle/updater.go
@@ -1,228 +0,0 @@
-package lifecycle
-
-import (
-	"context"
-	"crypto/rand"
-	"encoding/json"
-	"errors"
-	"fmt"
-	"io"
-	"log/slog"
-	"mime"
-	"net/http"
-	"net/url"
-	"os"
-	"path"
-	"path/filepath"
-	"runtime"
-	"strings"
-	"time"
-
-	"github.com/ollama/ollama/auth"
-	"github.com/ollama/ollama/version"
-)
-
-var (
-	UpdateCheckURLBase  = "https://ollama.com/api/update"
-	UpdateDownloaded    = false
-	UpdateCheckInterval = 60 * 60 * time.Second
-)
-
-// TODO - maybe move up to the API package?
-type UpdateResponse struct {
-	UpdateURL     string `json:"url"`
-	UpdateVersion string `json:"version"`
-}
-
-func IsNewReleaseAvailable(ctx context.Context) (bool, UpdateResponse) {
-	var updateResp UpdateResponse
-
-	requestURL, err := url.Parse(UpdateCheckURLBase)
-	if err != nil {
-		return false, updateResp
-	}
-
-	query := requestURL.Query()
-	query.Add("os", runtime.GOOS)
-	query.Add("arch", runtime.GOARCH)
-	query.Add("version", version.Version)
-	query.Add("ts", fmt.Sprintf("%d", time.Now().Unix()))
-
-	nonce, err := auth.NewNonce(rand.Reader, 16)
-	if err != nil {
-		return false, updateResp
-	}
-
-	query.Add("nonce", nonce)
-	requestURL.RawQuery = query.Encode()
-
-	data := []byte(fmt.Sprintf("%s,%s", http.MethodGet, requestURL.RequestURI()))
-	signature, err := auth.Sign(ctx, data)
-	if err != nil {
-		return false, updateResp
-	}
-
-	req, err := http.NewRequestWithContext(ctx, http.MethodGet, requestURL.String(), nil)
-	if err != nil {
-		slog.Warn(fmt.Sprintf("failed to check for update: %s", err))
-		return false, updateResp
-	}
-	req.Header.Set("Authorization", signature)
-	req.Header.Set("User-Agent", fmt.Sprintf("ollama/%s (%s %s) Go/%s", version.Version, runtime.GOARCH, runtime.GOOS, runtime.Version()))
-
-	slog.Debug("checking for available update", "requestURL", requestURL)
-	resp, err := http.DefaultClient.Do(req)
-	if err != nil {
-		slog.Warn(fmt.Sprintf("failed to check for update: %s", err))
-		return false, updateResp
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode == 204 {
-		slog.Debug("check update response 204 (current version is up to date)")
-		return false, updateResp
-	}
-	body, err := io.ReadAll(resp.Body)
-	if err != nil {
-		slog.Warn(fmt.Sprintf("failed to read body response: %s", err))
-	}
-
-	if resp.StatusCode != 200 {
-		slog.Info(fmt.Sprintf("check update error %d - %.96s", resp.StatusCode, string(body)))
-		return false, updateResp
-	}
-	err = json.Unmarshal(body, &updateResp)
-	if err != nil {
-		slog.Warn(fmt.Sprintf("malformed response checking for update: %s", err))
-		return false, updateResp
-	}
-	// Extract the version string from the URL in the github release artifact path
-	updateResp.UpdateVersion = path.Base(path.Dir(updateResp.UpdateURL))
-
-	slog.Info("New update available at " + updateResp.UpdateURL)
-	return true, updateResp
-}
-
-func DownloadNewRelease(ctx context.Context, updateResp UpdateResponse) error {
-	// Do a head first to check etag info
-	req, err := http.NewRequestWithContext(ctx, http.MethodHead, updateResp.UpdateURL, nil)
-	if err != nil {
-		return err
-	}
-
-	resp, err := http.DefaultClient.Do(req)
-	if err != nil {
-		return fmt.Errorf("error checking update: %w", err)
-	}
-	if resp.StatusCode != 200 {
-		return fmt.Errorf("unexpected status attempting to download update %d", resp.StatusCode)
-	}
-	resp.Body.Close()
-	etag := strings.Trim(resp.Header.Get("etag"), "\"")
-	if etag == "" {
-		slog.Debug("no etag detected, falling back to filename based dedup")
-		etag = "_"
-	}
-	filename := Installer
-	_, params, err := mime.ParseMediaType(resp.Header.Get("content-disposition"))
-	if err == nil {
-		filename = params["filename"]
-	}
-
-	stageFilename := filepath.Join(UpdateStageDir, etag, filename)
-
-	// Check to see if we already have it downloaded
-	_, err = os.Stat(stageFilename)
-	if err == nil {
-		slog.Info("update already downloaded")
-		return nil
-	}
-
-	cleanupOldDownloads()
-
-	req.Method = http.MethodGet
-	resp, err = http.DefaultClient.Do(req)
-	if err != nil {
-		return fmt.Errorf("error checking update: %w", err)
-	}
-	defer resp.Body.Close()
-	etag = strings.Trim(resp.Header.Get("etag"), "\"")
-	if etag == "" {
-		slog.Debug("no etag detected, falling back to filename based dedup") // TODO probably can get rid of this redundant log
-		etag = "_"
-	}
-
-	stageFilename = filepath.Join(UpdateStageDir, etag, filename)
-
-	_, err = os.Stat(filepath.Dir(stageFilename))
-	if errors.Is(err, os.ErrNotExist) {
-		if err := os.MkdirAll(filepath.Dir(stageFilename), 0o755); err != nil {
-			return fmt.Errorf("create ollama dir %s: %v", filepath.Dir(stageFilename), err)
-		}
-	}
-
-	payload, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return fmt.Errorf("failed to read body response: %w", err)
-	}
-	fp, err := os.OpenFile(stageFilename, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o755)
-	if err != nil {
-		return fmt.Errorf("write payload %s: %w", stageFilename, err)
-	}
-	defer fp.Close()
-	if n, err := fp.Write(payload); err != nil || n != len(payload) {
-		return fmt.Errorf("write payload %s: %d vs %d -- %w", stageFilename, n, len(payload), err)
-	}
-	slog.Info("new update downloaded " + stageFilename)
-
-	UpdateDownloaded = true
-	return nil
-}
-
-func cleanupOldDownloads() {
-	files, err := os.ReadDir(UpdateStageDir)
-	if err != nil && errors.Is(err, os.ErrNotExist) {
-		// Expected behavior on first run
-		return
-	} else if err != nil {
-		slog.Warn(fmt.Sprintf("failed to list stage dir: %s", err))
-		return
-	}
-	for _, file := range files {
-		fullname := filepath.Join(UpdateStageDir, file.Name())
-		slog.Debug("cleaning up old download: " + fullname)
-		err = os.RemoveAll(fullname)
-		if err != nil {
-			slog.Warn(fmt.Sprintf("failed to cleanup stale update download %s", err))
-		}
-	}
-}
-
-func StartBackgroundUpdaterChecker(ctx context.Context, cb func(string) error) {
-	go func() {
-		// Don't blast an update message immediately after startup
-		// time.Sleep(30 * time.Second)
-		time.Sleep(3 * time.Second)
-
-		for {
-			available, resp := IsNewReleaseAvailable(ctx)
-			if available {
-				err := DownloadNewRelease(ctx, resp)
-				if err != nil {
-					slog.Error(fmt.Sprintf("failed to download new release: %s", err))
-				}
-				err = cb(resp.UpdateVersion)
-				if err != nil {
-					slog.Warn(fmt.Sprintf("failed to register update available with tray: %s", err))
-				}
-			}
-			select {
-			case <-ctx.Done():
-				slog.Debug("stopping background update checker")
-				return
-			default:
-				time.Sleep(UpdateCheckInterval)
-			}
-		}
-	}()
-}
--- a/app/lifecycle/updater_nonwindows.go
+++ b/app/lifecycle/updater_nonwindows.go
@@ -1,12 +0,0 @@
-//go:build !windows
-
-package lifecycle
-
-import (
-	"context"
-	"fmt"
-)
-
-func DoUpgrade(cancel context.CancelFunc, done chan int) error {
-	return fmt.Errorf("DoUpgrade not yet implemented")
-}
--- a/app/lifecycle/updater_windows.go
+++ b/app/lifecycle/updater_windows.go
@@ -1,80 +0,0 @@
-package lifecycle
-
-import (
-	"context"
-	"fmt"
-	"log/slog"
-	"os"
-	"os/exec"
-	"path/filepath"
-)
-
-func DoUpgrade(cancel context.CancelFunc, done chan int) error {
-	files, err := filepath.Glob(filepath.Join(UpdateStageDir, "*", "*.exe")) // TODO generalize for multiplatform
-	if err != nil {
-		return fmt.Errorf("failed to lookup downloads: %s", err)
-	}
-	if len(files) == 0 {
-		return fmt.Errorf("no update downloads found")
-	} else if len(files) > 1 {
-		// Shouldn't happen
-		slog.Warn(fmt.Sprintf("multiple downloads found, using first one %v", files))
-	}
-	installerExe := files[0]
-
-	slog.Info("starting upgrade with " + installerExe)
-	slog.Info("upgrade log file " + UpgradeLogFile)
-
-	// When running in debug mode, we'll be "verbose" and let the installer pop up and prompt
-	installArgs := []string{
-		"/CLOSEAPPLICATIONS",                    // Quit the tray app if it's still running
-		"/LOG=" + filepath.Base(UpgradeLogFile), // Only relative seems reliable, so set pwd
-		"/FORCECLOSEAPPLICATIONS",               // Force close the tray app - might be needed
-	}
-	// When we're not in debug mode, make the upgrade as quiet as possible (no GUI, no prompts)
-	// TODO - temporarily disable since we're pinning in debug mode for the preview
-	// if debug := os.Getenv("OLLAMA_DEBUG"); debug == "" {
-	installArgs = append(installArgs,
-		"/SP", // Skip the "This will install... Do you wish to continue" prompt
-		"/SUPPRESSMSGBOXES",
-		"/SILENT",
-		"/VERYSILENT",
-	)
-	// }
-
-	// Safeguard in case we have requests in flight that need to drain...
-	slog.Info("Waiting for server to shutdown")
-	cancel()
-	if done != nil {
-		<-done
-	} else {
-		// Shouldn't happen
-		slog.Warn("done chan was nil, not actually waiting")
-	}
-
-	slog.Debug(fmt.Sprintf("starting installer: %s %v", installerExe, installArgs))
-	os.Chdir(filepath.Dir(UpgradeLogFile)) //nolint:errcheck
-	cmd := exec.Command(installerExe, installArgs...)
-
-	if err := cmd.Start(); err != nil {
-		return fmt.Errorf("unable to start ollama app %w", err)
-	}
-
-	if cmd.Process != nil {
-		err = cmd.Process.Release()
-		if err != nil {
-			slog.Error(fmt.Sprintf("failed to release server process: %s", err))
-		}
-	} else {
-		// TODO - some details about why it didn't start, or is this a pedantic error case?
-		return fmt.Errorf("installer process did not start")
-	}
-
-	// TODO should we linger for a moment and check to make sure it's actually running by checking the pid?
-
-	slog.Info("Installer started in background, exiting")
-
-	os.Exit(0)
-	// Not reached
-	return nil
-}
--- a/app/main.go
+++ b/app/main.go
@@ -1,12 +0,0 @@
-package main
-
-// Compile with the following to get rid of the cmd pop up on windows
-// go build -ldflags="-H windowsgui" .
-
-import (
-	"github.com/ollama/ollama/app/lifecycle"
-)
-
-func main() {
-	lifecycle.Run()
-}
--- a/app/ollama.iss
+++ b/app/ollama.iss
@@ -1,159 +0,0 @@
-; Inno Setup Installer for Ollama
-;
-; To build the installer use the build script invoked from the top of the source tree
-; 
-; powershell -ExecutionPolicy Bypass -File .\scripts\build_windows.ps1
-
-
-#define MyAppName "Ollama"
-#if GetEnv("PKG_VERSION") != ""
-  #define MyAppVersion GetEnv("PKG_VERSION")
-#else
-  #define MyAppVersion "0.0.0"
-#endif
-#define MyAppPublisher "Ollama"
-#define MyAppURL "https://ollama.com/"
-#define MyAppExeName "ollama app.exe"
-#define MyIcon ".\assets\app.ico"
-
-[Setup]
-; NOTE: The value of AppId uniquely identifies this application. Do not use the same AppId value in installers for other applications.
-; (To generate a new GUID, click Tools | Generate GUID inside the IDE.)
-AppId={{44E83376-CE68-45EB-8FC1-393500EB558C}
-AppName={#MyAppName}
-AppVersion={#MyAppVersion}
-VersionInfoVersion={#MyAppVersion}
-;AppVerName={#MyAppName} {#MyAppVersion}
-AppPublisher={#MyAppPublisher}
-AppPublisherURL={#MyAppURL}
-AppSupportURL={#MyAppURL}
-AppUpdatesURL={#MyAppURL}
-ArchitecturesAllowed=x64 arm64
-ArchitecturesInstallIn64BitMode=x64 arm64
-DefaultDirName={localappdata}\Programs\{#MyAppName}
-DefaultGroupName={#MyAppName}
-DisableProgramGroupPage=yes
-PrivilegesRequired=lowest
-OutputBaseFilename="OllamaSetup"
-SetupIconFile={#MyIcon}
-UninstallDisplayIcon={uninstallexe}
-Compression=lzma2
-SolidCompression=no
-WizardStyle=modern
-ChangesEnvironment=yes
-OutputDir=..\dist\
-
-; Disable logging once everything's battle tested
-; Filename will be %TEMP%\Setup Log*.txt
-SetupLogging=yes
-CloseApplications=yes
-RestartApplications=no
-
-; https://jrsoftware.org/ishelp/index.php?topic=setup_wizardimagefile
-WizardSmallImageFile=.\assets\setup.bmp
-
-; TODO verify actual min windows version...
-; OG Win 10
-MinVersion=10.0.10240
-
-; First release that supports WinRT UI Composition for win32 apps
-; MinVersion=10.0.17134
-; First release with XAML Islands - possible UI path forward
-; MinVersion=10.0.18362
-
-; quiet...
-DisableDirPage=yes
-DisableFinishedPage=yes
-DisableReadyMemo=yes
-DisableReadyPage=yes
-DisableStartupPrompt=yes
-DisableWelcomePage=yes
-
-; TODO - percentage can't be set less than 100, so how to make it shorter?
-; WizardSizePercent=100,80
-
-#if GetEnv("KEY_CONTAINER")
-SignTool=MySignTool
-SignedUninstaller=yes
-#endif
-
-SetupMutex=OllamaSetupMutex
-
-[Languages]
-Name: "english"; MessagesFile: "compiler:Default.isl"
-
-[LangOptions]
-DialogFontSize=12
-
-[Files]
-Source: ".\app.exe"; DestDir: "{app}"; DestName: "{#MyAppExeName}" ; Flags: ignoreversion 64bit
-Source: "..\ollama.exe"; DestDir: "{app}"; Flags: ignoreversion 64bit
-Source: "..\dist\windeps\*.dll"; DestDir: "{app}"; Flags: ignoreversion 64bit
-Source: "..\dist\ollama_welcome.ps1"; DestDir: "{app}"; Flags: ignoreversion
-Source: ".\assets\app.ico"; DestDir: "{app}"; Flags: ignoreversion
-; Assumes v5.7, may need adjustments for v6
-#if GetEnv("HIP_PATH") != ""
-  Source: "{#GetEnv('HIP_PATH')}\bin\hipblas.dll"; DestDir: "{app}\rocm\"; Flags: ignoreversion
-  Source: "{#GetEnv('HIP_PATH')}\bin\rocblas.dll"; DestDir: "{app}\rocm\"; Flags: ignoreversion
-  ; amdhip64.dll dependency comes from the driver and must be installed already
-  Source: "{#GetEnv('HIP_PATH')}\bin\rocblas\library\*"; DestDir: "{app}\rocm\rocblas\library\"; Flags: ignoreversion
-#endif
-
-
-[Icons]
-Name: "{group}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"; IconFilename: "{app}\app.ico"
-Name: "{userstartup}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"; IconFilename: "{app}\app.ico"
-Name: "{userprograms}\{#MyAppName}"; Filename: "{app}\{#MyAppExeName}"; IconFilename: "{app}\app.ico"
-
-[Run]
-Filename: "{cmd}"; Parameters: "/C set PATH={app};%PATH% & ""{app}\{#MyAppExeName}"""; Flags: postinstall nowait runhidden
-
-[UninstallRun]
-; Filename: "{cmd}"; Parameters: "/C ""taskkill /im ''{#MyAppExeName}'' /f /t"; Flags: runhidden
-; Filename: "{cmd}"; Parameters: "/C ""taskkill /im ollama.exe /f /t"; Flags: runhidden
-Filename: "taskkill"; Parameters: "/im ""{#MyAppExeName}"" /f /t"; Flags: runhidden
-Filename: "taskkill"; Parameters: "/im ""ollama.exe"" /f /t"; Flags: runhidden
-; HACK!  need to give the server and app enough time to exit
-; TODO - convert this to a Pascal code script so it waits until they're no longer running, then completes
-Filename: "{cmd}"; Parameters: "/c timeout 5"; Flags: runhidden
-
-[UninstallDelete]
-Type: filesandordirs; Name: "{%TEMP}\ollama*"
-Type: filesandordirs; Name: "{%LOCALAPPDATA}\Ollama"
-Type: filesandordirs; Name: "{%LOCALAPPDATA}\Programs\Ollama"
-Type: filesandordirs; Name: "{%USERPROFILE}\.ollama\models"
-Type: filesandordirs; Name: "{%USERPROFILE}\.ollama\history"
-; NOTE: if the user has a custom OLLAMA_MODELS it will be preserved
-
-[Messages]
-WizardReady=Ollama Windows Preview
-ReadyLabel1=%nLet's get you up and running with your own large language models.
-SetupAppRunningError=Another Ollama installer is running.%n%nPlease cancel or finish the other installer, then click OK to continue with this install, or Cancel to exit.
-
-
-;FinishedHeadingLabel=Run your first model
-;FinishedLabel=%nRun this command in a PowerShell or cmd terminal.%n%n%n    ollama run llama2
-;ClickFinish=%n
-
-[Registry]
-Root: HKCU; Subkey: "Environment"; \
-    ValueType: expandsz; ValueName: "Path"; ValueData: "{olddata};{app}"; \
-    Check: NeedsAddPath('{app}')
-
-[Code]
-
-function NeedsAddPath(Param: string): boolean;
-var
-  OrigPath: string;
-begin
-  if not RegQueryStringValue(HKEY_CURRENT_USER,
-    'Environment',
-    'Path', OrigPath)
-  then begin
-    Result := True;
-    exit;
-  end;
-  { look for the path with leading and trailing semicolon }
-  { Pos() returns 0 if not found }
-  Result := Pos(';' + ExpandConstant(Param) + ';', ';' + OrigPath + ';') = 0;
-end;
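Before searching, `NeedsAddPath` wraps both the stored `Path` and the candidate directory in semicolons. As a result, only a whole entry matches, and a directory such as `C:\Ollama` is not falsely found inside `C:\Ollama2`. Below is the same check, sketched in Go for illustration. The function name is hypothetical, and this Go version compares case-sensitively.

```go
package main

import "strings"

// needsAddPath reports whether dir is absent from a semicolon-separated PATH value.
// Wrapping both sides in ";" means only complete entries match, never substrings.
// The comparison is case-sensitive and does not normalize trailing backslashes.
func needsAddPath(origPath, dir string) bool {
	return !strings.Contains(";"+origPath+";", ";"+dir+";")
}
```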
--- a/app/ollama.rc
+++ b/app/ollama.rc
@@ -1,29 +0,0 @@
-#include <winver.h>
-
-VS_VERSION_INFO VERSIONINFO
- FILEFLAGSMASK 0x3fL
-#ifdef _DEBUG
- FILEFLAGS 0x1L
-#else
- FILEFLAGS 0x0L
-#endif
- FILEOS 0x40004L
- FILETYPE 0x1L
- FILESUBTYPE 0x0L
-BEGIN
-    BLOCK "StringFileInfo"
-    BEGIN
-        BLOCK "040904b0"
-        BEGIN
-            VALUE "FileDescription", "Ollama"
-            VALUE "InternalName", "Ollama"
-            VALUE "OriginalFilename", "ollama app.exe"
-            VALUE "ProductName", "Ollama"
-        END
-    END
-
-    BLOCK "VarFileInfo"
-    BEGIN
-        VALUE "Translation", 0x409, 1200
-    END
-END
--- a/app/ollama_welcome.ps1
+++ b/app/ollama_welcome.ps1
@@ -1,8 +0,0 @@
-# TODO - consider ANSI colors and maybe ASCII art...
-write-host ""
-write-host "Welcome to Ollama!"
-write-host ""
-write-host "Run your first model:"
-write-host ""
-write-host "`tollama run llama2"
-write-host ""
--- a/macapp/package-lock.json
+++ b/macapp/package-lock.json
--- a/macapp/package.json
+++ b/macapp/package.json
--- a/macapp/postcss.config.js
+++ b/macapp/postcss.config.js
--- a/macapp/src/app.css
+++ b/macapp/src/app.css
--- a/macapp/src/app.tsx
+++ b/macapp/src/app.tsx
--- a/macapp/src/declarations.d.ts
+++ b/macapp/src/declarations.d.ts
--- a/macapp/src/index.html
+++ b/macapp/src/index.html
--- a/macapp/src/index.ts
+++ b/macapp/src/index.ts
--- a/macapp/src/install.ts
+++ b/macapp/src/install.ts
--- a/macapp/src/ollama.svg
+++ b/macapp/src/ollama.svg
--- a/macapp/src/preload.ts
+++ b/macapp/src/preload.ts
--- a/macapp/src/renderer.tsx
+++ b/macapp/src/renderer.tsx
--- a/app/store/store.go
+++ b/app/store/store.go
@@ -1,98 +0,0 @@
-package store
-
-import (
-	"encoding/json"
-	"errors"
-	"fmt"
-	"log/slog"
-	"os"
-	"path/filepath"
-	"sync"
-
-	"github.com/google/uuid"
-)
-
-type Store struct {
-	ID           string `json:"id"`
-	FirstTimeRun bool   `json:"first-time-run"`
-}
-
-var (
-	lock  sync.Mutex
-	store Store
-)
-
-func GetID() string {
-	lock.Lock()
-	defer lock.Unlock()
-	if store.ID == "" {
-		initStore()
-	}
-	return store.ID
-
-}
-
-func GetFirstTimeRun() bool {
-	lock.Lock()
-	defer lock.Unlock()
-	if store.ID == "" {
-		initStore()
-	}
-	return store.FirstTimeRun
-}
-
-func SetFirstTimeRun(val bool) {
-	lock.Lock()
-	defer lock.Unlock()
-	if store.FirstTimeRun == val {
-		return
-	}
-	store.FirstTimeRun = val
-	writeStore(getStorePath())
-}
-
-// lock must be held
-func initStore() {
-	storeFile, err := os.Open(getStorePath())
-	if err == nil {
-		defer storeFile.Close()
-		err = json.NewDecoder(storeFile).Decode(&store)
-		if err == nil {
-			slog.Debug(fmt.Sprintf("loaded existing store %s - ID: %s", getStorePath(), store.ID))
-			return
-		}
-	} else if !errors.Is(err, os.ErrNotExist) {
-		slog.Debug(fmt.Sprintf("unexpected error searching for store: %s", err))
-	}
-	slog.Debug("initializing new store")
-	store.ID = uuid.New().String()
-	writeStore(getStorePath())
-}
-
-func writeStore(storeFilename string) {
-	ollamaDir := filepath.Dir(storeFilename)
-	_, err := os.Stat(ollamaDir)
-	if errors.Is(err, os.ErrNotExist) {
-		if err := os.MkdirAll(ollamaDir, 0o755); err != nil {
-			slog.Error(fmt.Sprintf("create ollama dir %s: %v", ollamaDir, err))
-			return
-		}
-	}
-	payload, err := json.Marshal(store)
-	if err != nil {
-		slog.Error(fmt.Sprintf("failed to marshal store: %s", err))
-		return
-	}
-	fp, err := os.OpenFile(storeFilename, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o755)
-	if err != nil {
-		slog.Error(fmt.Sprintf("write store payload %s: %v", storeFilename, err))
-		return
-	}
-	defer fp.Close()
-	if n, err := fp.Write(payload); err != nil || n != len(payload) {
-		slog.Error(fmt.Sprintf("write store payload %s: %d vs %d -- %v", storeFilename, n, len(payload), err))
-		return
-	}
-	slog.Debug("Store contents: " + string(payload))
-	slog.Info(fmt.Sprintf("wrote store: %s", storeFilename))
-}
--- a/app/store/store_darwin.go
+++ b/app/store/store_darwin.go
@@ -1,13 +0,0 @@
-package store
-
-import (
-	"os"
-	"path/filepath"
-)
-
-func getStorePath() string {
-	// TODO - system wide location?
-
-	home := os.Getenv("HOME")
-	return filepath.Join(home, "Library", "Application Support", "Ollama", "config.json")
-}
--- a/app/store/store_linux.go
+++ b/app/store/store_linux.go
@@ -1,16 +0,0 @@
-package store
-
-import (
-	"os"
-	"path/filepath"
-)
-
-func getStorePath() string {
-	if os.Geteuid() == 0 {
-		// TODO where should we store this on linux for system-wide operation?
-		return "/etc/ollama/config.json"
-	}
-
-	home := os.Getenv("HOME")
-	return filepath.Join(home, ".ollama", "config.json")
-}
--- a/app/store/store_windows.go
+++ b/app/store/store_windows.go
@@ -1,11 +0,0 @@
-package store
-
-import (
-	"os"
-	"path/filepath"
-)
-
-func getStorePath() string {
-	localAppData := os.Getenv("LOCALAPPDATA")
-	return filepath.Join(localAppData, "Ollama", "config.json")
-}
--- a/macapp/tailwind.config.js
+++ b/macapp/tailwind.config.js
--- a/app/tray/commontray/types.go
+++ b/app/tray/commontray/types.go
@@ -1,24 +0,0 @@
-package commontray
-
-var (
-	Title   = "Ollama"
-	ToolTip = "Ollama"
-
-	UpdateIconName = "tray_upgrade"
-	IconName       = "tray"
-)
-
-type Callbacks struct {
-	Quit       chan struct{}
-	Update     chan struct{}
-	DoFirstUse chan struct{}
-	ShowLogs   chan struct{}
-}
-
-type OllamaTray interface {
-	GetCallbacks() Callbacks
-	Run()
-	UpdateAvailable(ver string) error
-	DisplayFirstUseNotification() error
-	Quit()
-}
--- a/app/tray/tray.go
+++ b/app/tray/tray.go
@@ -1,28 +0,0 @@
-package tray
-
-import (
-	"fmt"
-	"runtime"
-
-	"github.com/ollama/ollama/app/assets"
-	"github.com/ollama/ollama/app/tray/commontray"
-)
-
-func NewTray() (commontray.OllamaTray, error) {
-	extension := ".png"
-	if runtime.GOOS == "windows" {
-		extension = ".ico"
-	}
-	iconName := commontray.UpdateIconName + extension
-	updateIcon, err := assets.GetIcon(iconName)
-	if err != nil {
-		return nil, fmt.Errorf("failed to load icon %s: %w", iconName, err)
-	}
-	iconName = commontray.IconName + extension
-	icon, err := assets.GetIcon(iconName)
-	if err != nil {
-		return nil, fmt.Errorf("failed to load icon %s: %w", iconName, err)
-	}
-
-	return InitPlatformTray(icon, updateIcon)
-}
--- a/app/tray/tray_nonwindows.go
+++ b/app/tray/tray_nonwindows.go
@@ -1,13 +0,0 @@
-//go:build !windows
-
-package tray
-
-import (
-	"fmt"
-
-	"github.com/ollama/ollama/app/tray/commontray"
-)
-
-func InitPlatformTray(icon, updateIcon []byte) (commontray.OllamaTray, error) {
-	return nil, fmt.Errorf("NOT IMPLEMENTED YET")
-}
--- a/app/tray/tray_windows.go
+++ b/app/tray/tray_windows.go
@@ -1,10 +0,0 @@
-package tray
-
-import (
-	"github.com/ollama/ollama/app/tray/commontray"
-	"github.com/ollama/ollama/app/tray/wintray"
-)
-
-func InitPlatformTray(icon, updateIcon []byte) (commontray.OllamaTray, error) {
-	return wintray.InitTray(icon, updateIcon)
-}
--- a/app/tray/wintray/eventloop.go
+++ b/app/tray/wintray/eventloop.go
@@ -1,184 +0,0 @@
-//go:build windows
-
-package wintray
-
-import (
-	"fmt"
-	"log/slog"
-	"sync"
-	"unsafe"
-
-	"golang.org/x/sys/windows"
-)
-
-var (
-	quitOnce sync.Once
-)
-
-func (t *winTray) Run() {
-	nativeLoop()
-}
-
-func nativeLoop() {
-	// Main message pump.
-	slog.Debug("starting event handling loop")
-	m := &struct {
-		WindowHandle windows.Handle
-		Message      uint32
-		Wparam       uintptr
-		Lparam       uintptr
-		Time         uint32
-		Pt           point
-		LPrivate     uint32
-	}{}
-	for {
-		ret, _, err := pGetMessage.Call(uintptr(unsafe.Pointer(m)), 0, 0, 0)
-
-		// If the function retrieves a message other than WM_QUIT, the return value is nonzero.
-		// If the function retrieves the WM_QUIT message, the return value is zero.
-		// If there is an error, the return value is -1
-		// https://msdn.microsoft.com/en-us/library/windows/desktop/ms644936(v=vs.85).aspx
-		switch int32(ret) {
-		case -1:
-			slog.Error(fmt.Sprintf("get message failure: %v", err))
-			return
-		case 0:
-			return
-		default:
-			pTranslateMessage.Call(uintptr(unsafe.Pointer(m))) //nolint:errcheck
-			pDispatchMessage.Call(uintptr(unsafe.Pointer(m)))  //nolint:errcheck
-
-		}
-	}
-}
-
-// WindowProc callback function that processes messages sent to a window.
-// https://msdn.microsoft.com/en-us/library/windows/desktop/ms633573(v=vs.85).aspx
-func (t *winTray) wndProc(hWnd windows.Handle, message uint32, wParam, lParam uintptr) (lResult uintptr) {
-	const (
-		WM_RBUTTONUP   = 0x0205
-		WM_LBUTTONUP   = 0x0202
-		WM_COMMAND     = 0x0111
-		WM_ENDSESSION  = 0x0016
-		WM_CLOSE       = 0x0010
-		WM_DESTROY     = 0x0002
-		WM_MOUSEMOVE   = 0x0200
-		WM_LBUTTONDOWN = 0x0201
-	)
-	switch message {
-	case WM_COMMAND:
-		menuItemId := int32(wParam)
-		// https://docs.microsoft.com/en-us/windows/win32/menurc/wm-command#menus
-		switch menuItemId {
-		case quitMenuID:
-			select {
-			case t.callbacks.Quit <- struct{}{}:
-			// should not happen but in case not listening
-			default:
-				slog.Error("no listener on Quit")
-			}
-		case updateMenuID:
-			select {
-			case t.callbacks.Update <- struct{}{}:
-			// should not happen but in case not listening
-			default:
-				slog.Error("no listener on Update")
-			}
-		case diagLogsMenuID:
-			select {
-			case t.callbacks.ShowLogs <- struct{}{}:
-			// should not happen but in case not listening
-			default:
-				slog.Error("no listener on ShowLogs")
-			}
-		default:
-			slog.Debug(fmt.Sprintf("Unexpected menu item id: %d", menuItemId))
-		}
-	case WM_CLOSE:
-		boolRet, _, err := pDestroyWindow.Call(uintptr(t.window))
-		if boolRet == 0 {
-			slog.Error(fmt.Sprintf("failed to destroy window: %s", err))
-		}
-		err = t.wcex.unregister()
-		if err != nil {
-			slog.Error(fmt.Sprintf("failed to unregister window: %s", err))
-		}
-	case WM_DESTROY:
-		// same as WM_ENDSESSION, but throws 0 exit code after all
-		defer pPostQuitMessage.Call(uintptr(int32(0))) //nolint:errcheck
-		fallthrough
-	case WM_ENDSESSION:
-		t.muNID.Lock()
-		if t.nid != nil {
-			err := t.nid.delete()
-			if err != nil {
-				slog.Error(fmt.Sprintf("failed to delete nid: %s", err))
-			}
-		}
-		t.muNID.Unlock()
-	case t.wmSystrayMessage:
-		switch lParam {
-		case WM_MOUSEMOVE, WM_LBUTTONDOWN:
-			// Ignore these...
-		case WM_RBUTTONUP, WM_LBUTTONUP:
-			err := t.showMenu()
-			if err != nil {
-				slog.Error(fmt.Sprintf("failed to show menu: %s", err))
-			}
-		case 0x405: // TODO - how is this magic value derived for the notification left click
-			if t.pendingUpdate {
-				select {
-				case t.callbacks.Update <- struct{}{}:
-				// should not happen but in case not listening
-				default:
-					slog.Error("no listener on Update")
-				}
-			} else {
-				select {
-				case t.callbacks.DoFirstUse <- struct{}{}:
-				// should not happen but in case not listening
-				default:
-					slog.Error("no listener on DoFirstUse")
-				}
-			}
-		case 0x404: // Middle click or close notification
-			// slog.Debug("doing nothing on close of first time notification")
-		default:
-			// 0x402 also seems common - what is it?
-			slog.Debug(fmt.Sprintf("unmanaged app message, lParam: 0x%x", lParam))
-		}
-	case t.wmTaskbarCreated: // on explorer.exe restarts
-		t.muNID.Lock()
-		err := t.nid.add()
-		if err != nil {
-			slog.Error(fmt.Sprintf("failed to refresh the taskbar on explorer restart: %s", err))
-		}
-		t.muNID.Unlock()
-	default:
-		// Calls the default window procedure to provide default processing for any window messages that an application does not process.
-		// https://msdn.microsoft.com/en-us/library/windows/desktop/ms633572(v=vs.85).aspx
-		lResult, _, _ = pDefWindowProc.Call(
-			uintptr(hWnd),
-			uintptr(message),
-			uintptr(wParam),
-			uintptr(lParam),
-		)
-	}
-	return
-}
-
-func (t *winTray) Quit() {
-	quitOnce.Do(quit)
-}
-
-func quit() {
-	boolRet, _, err := pPostMessage.Call(
-		uintptr(wt.window),
-		WM_CLOSE,
-		0,
-		0,
-	)
-	if boolRet == 0 {
-		slog.Error(fmt.Sprintf("failed to post close message on shutdown %s", err))
-	}
-}
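Each menu handler in `wndProc` forwards a click with a non-blocking send. The `select` with a `default` branch delivers the signal only if a receiver is already waiting on the unbuffered callback channel. Otherwise it logs "no listener" instead of stalling the Windows message pump. Below is a platform-independent sketch of that pattern. `trySignal` is a hypothetical name.

```go
package main

// trySignal attempts to deliver an event on ch without blocking. It returns
// false when no receiver is ready (unbuffered) or the buffer is full, which is
// the case the tray code logs as "no listener".
func trySignal(ch chan<- struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}
```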
--- a/app/tray/wintray/menus.go
+++ b/app/tray/wintray/menus.go
@@ -1,71 +0,0 @@
-//go:build windows
-
-package wintray
-
-import (
-	"fmt"
-	"log/slog"
-	"unsafe"
-
-	"golang.org/x/sys/windows"
-)
-
-const (
-	updatAvailableMenuID = 1
-	updateMenuID         = updatAvailableMenuID + 1
-	separatorMenuID      = updateMenuID + 1
-	diagLogsMenuID       = separatorMenuID + 1
-	diagSeparatorMenuID  = diagLogsMenuID + 1
-	quitMenuID           = diagSeparatorMenuID + 1
-)
-
-func (t *winTray) initMenus() error {
-	if err := t.addOrUpdateMenuItem(diagLogsMenuID, 0, diagLogsMenuTitle, false); err != nil {
-		return fmt.Errorf("unable to create menu entries %w\n", err)
-	}
-	if err := t.addSeparatorMenuItem(diagSeparatorMenuID, 0); err != nil {
-		return fmt.Errorf("unable to create menu entries %w", err)
-	}
-	if err := t.addOrUpdateMenuItem(quitMenuID, 0, quitMenuTitle, false); err != nil {
-		return fmt.Errorf("unable to create menu entries %w\n", err)
-	}
-	return nil
-}
-
-func (t *winTray) UpdateAvailable(ver string) error {
-	if !t.updateNotified {
-		slog.Debug("updating menu and sending notification for new update")
-		if err := t.addOrUpdateMenuItem(updatAvailableMenuID, 0, updateAvailableMenuTitle, true); err != nil {
-			return fmt.Errorf("unable to create menu entries %w", err)
-		}
-		if err := t.addOrUpdateMenuItem(updateMenuID, 0, updateMenutTitle, false); err != nil {
-			return fmt.Errorf("unable to create menu entries %w", err)
-		}
-		if err := t.addSeparatorMenuItem(separatorMenuID, 0); err != nil {
-			return fmt.Errorf("unable to create menu entries %w", err)
-		}
-		iconFilePath, err := iconBytesToFilePath(wt.updateIcon)
-		if err != nil {
-			return fmt.Errorf("unable to write icon data to temp file: %w", err)
-		}
-		if err := wt.setIcon(iconFilePath); err != nil {
-			return fmt.Errorf("unable to set icon: %w", err)
-		}
-		t.updateNotified = true
-
-		t.pendingUpdate = true
-		// Now pop up the notification
-		t.muNID.Lock()
-		defer t.muNID.Unlock()
-		copy(t.nid.InfoTitle[:], windows.StringToUTF16(updateTitle))
-		copy(t.nid.Info[:], windows.StringToUTF16(fmt.Sprintf(updateMessage, ver)))
-		t.nid.Flags |= NIF_INFO
-		t.nid.Timeout = 10
-		t.nid.Size = uint32(unsafe.Sizeof(*wt.nid))
-		err = t.nid.modify()
-		if err != nil {
-			return err
-		}
-	}
-	return nil
-}
--- a/app/tray/wintray/messages.go
+++ b/app/tray/wintray/messages.go
@@ -1,15 +0,0 @@
-//go:build windows
-
-package wintray
-
-const (
-	firstTimeTitle   = "Ollama is running"
-	firstTimeMessage = "Click here to get started"
-	updateTitle      = "Update available"
-	updateMessage    = "Ollama version %s is ready to install"
-
-	quitMenuTitle            = "Quit Ollama"
-	updateAvailableMenuTitle = "An update is available"
-	updateMenutTitle         = "Restart to update"
-	diagLogsMenuTitle        = "View logs"
-)
--- a/app/tray/wintray/notifyicon.go
+++ b/app/tray/wintray/notifyicon.go
@@ -1,66 +0,0 @@
-//go:build windows
-
-package wintray
-
-import (
-	"unsafe"
-
-	"golang.org/x/sys/windows"
-)
-
-// Contains information that the system needs to display notifications in the notification area.
-// Used by Shell_NotifyIcon.
-// https://msdn.microsoft.com/en-us/library/windows/desktop/bb773352(v=vs.85).aspx
-// https://msdn.microsoft.com/en-us/library/windows/desktop/bb762159
-type notifyIconData struct {
-	Size                       uint32
-	Wnd                        windows.Handle
-	ID, Flags, CallbackMessage uint32
-	Icon                       windows.Handle
-	Tip                        [128]uint16
-	State, StateMask           uint32
-	Info                       [256]uint16
-	// Timeout, Version           uint32
-	Timeout uint32
-
-	InfoTitle   [64]uint16
-	InfoFlags   uint32
-	GuidItem    windows.GUID
-	BalloonIcon windows.Handle
-}
-
-func (nid *notifyIconData) add() error {
-	const NIM_ADD = 0x00000000
-	res, _, err := pShellNotifyIcon.Call(
-		uintptr(NIM_ADD),
-		uintptr(unsafe.Pointer(nid)),
-	)
-	if res == 0 {
-		return err
-	}
-	return nil
-}
-
-func (nid *notifyIconData) modify() error {
-	const NIM_MODIFY = 0x00000001
-	res, _, err := pShellNotifyIcon.Call(
-		uintptr(NIM_MODIFY),
-		uintptr(unsafe.Pointer(nid)),
-	)
-	if res == 0 {
-		return err
-	}
-	return nil
-}
-
-func (nid *notifyIconData) delete() error {
-	const NIM_DELETE = 0x00000002
-	res, _, err := pShellNotifyIcon.Call(
-		uintptr(NIM_DELETE),
-		uintptr(unsafe.Pointer(nid)),
-	)
-	if res == 0 {
-		return err
-	}
-	return nil
-}
--- a/app/tray/wintray/tray.go
+++ b/app/tray/wintray/tray.go
@@ -1,485 +0,0 @@
-//go:build windows
-
-package wintray
-
-import (
-	"crypto/md5"
-	"encoding/hex"
-	"fmt"
-	"log/slog"
-	"os"
-	"path/filepath"
-	"sort"
-	"sync"
-	"unsafe"
-
-	"github.com/ollama/ollama/app/tray/commontray"
-	"golang.org/x/sys/windows"
-)
-
-// Helpful sources: https://github.com/golang/exp/blob/master/shiny/driver/internal/win32
-
-// Contains information about loaded resources
-type winTray struct {
-	instance,
-	icon,
-	cursor,
-	window windows.Handle
-
-	loadedImages   map[string]windows.Handle
-	muLoadedImages sync.RWMutex
-
-	// menus keeps track of the submenus keyed by the menu item ID, plus 0
-	// which corresponds to the main popup menu.
-	menus    map[uint32]windows.Handle
-	muMenus  sync.RWMutex
-	menuOf   map[uint32]windows.Handle
-	muMenuOf sync.RWMutex
-	// menuItemIcons maintains the bitmap of each menu item (if applies). It's
-	// needed to show the icon correctly when showing a previously hidden menu
-	// item again.
-	// menuItemIcons   map[uint32]windows.Handle
-	// muMenuItemIcons sync.RWMutex
-	visibleItems   map[uint32][]uint32
-	muVisibleItems sync.RWMutex
-
-	nid   *notifyIconData
-	muNID sync.RWMutex
-	wcex  *wndClassEx
-
-	wmSystrayMessage,
-	wmTaskbarCreated uint32
-
-	pendingUpdate  bool
-	updateNotified bool // Only pop up the notification once - TODO consider daily nag?
-	// Callbacks
-	callbacks  commontray.Callbacks
-	normalIcon []byte
-	updateIcon []byte
-}
-
-var wt winTray
-
-func (t *winTray) GetCallbacks() commontray.Callbacks {
-	return t.callbacks
-}
-
-func InitTray(icon, updateIcon []byte) (*winTray, error) {
-	wt.callbacks.Quit = make(chan struct{})
-	wt.callbacks.Update = make(chan struct{})
-	wt.callbacks.ShowLogs = make(chan struct{})
-	wt.callbacks.DoFirstUse = make(chan struct{})
-	wt.normalIcon = icon
-	wt.updateIcon = updateIcon
-	if err := wt.initInstance(); err != nil {
-		return nil, fmt.Errorf("Unable to init instance: %w\n", err)
-	}
-
-	if err := wt.createMenu(); err != nil {
-		return nil, fmt.Errorf("Unable to create menu: %w\n", err)
-	}
-
-	iconFilePath, err := iconBytesToFilePath(wt.normalIcon)
-	if err != nil {
-		return nil, fmt.Errorf("Unable to write icon data to temp file: %w", err)
-	}
-	if err := wt.setIcon(iconFilePath); err != nil {
-		return nil, fmt.Errorf("Unable to set icon: %w", err)
-	}
-
-	return &wt, wt.initMenus()
-}
-
-func (t *winTray) initInstance() error {
-	const (
-		className  = "OllamaClass"
-		windowName = ""
-	)
-
-	t.wmSystrayMessage = WM_USER + 1
-	t.visibleItems = make(map[uint32][]uint32)
-	t.menus = make(map[uint32]windows.Handle)
-	t.menuOf = make(map[uint32]windows.Handle)
-
-	t.loadedImages = make(map[string]windows.Handle)
-
-	taskbarEventNamePtr, _ := windows.UTF16PtrFromString("TaskbarCreated")
-	// https://msdn.microsoft.com/en-us/library/windows/desktop/ms644947
-	res, _, err := pRegisterWindowMessage.Call(
-		uintptr(unsafe.Pointer(taskbarEventNamePtr)),
-	)
-	if res == 0 { // success returns a message ID in 0xC000-0xFFFF
-		return fmt.Errorf("failed to register window: %w", err)
-	}
-	t.wmTaskbarCreated = uint32(res)
-
-	instanceHandle, _, err := pGetModuleHandle.Call(0)
-	if instanceHandle == 0 {
-		return err
-	}
-	t.instance = windows.Handle(instanceHandle)
-
-	// https://msdn.microsoft.com/en-us/library/windows/desktop/ms648072(v=vs.85).aspx
-	iconHandle, _, err := pLoadIcon.Call(0, uintptr(IDI_APPLICATION))
-	if iconHandle == 0 {
-		return err
-	}
-	t.icon = windows.Handle(iconHandle)
-
-	// https://msdn.microsoft.com/en-us/library/windows/desktop/ms648391(v=vs.85).aspx
-	cursorHandle, _, err := pLoadCursor.Call(0, uintptr(IDC_ARROW))
-	if cursorHandle == 0 {
-		return err
-	}
-	t.cursor = windows.Handle(cursorHandle)
-
-	classNamePtr, err := windows.UTF16PtrFromString(className)
-	if err != nil {
-		return err
-	}
-
-	windowNamePtr, err := windows.UTF16PtrFromString(windowName)
-	if err != nil {
-		return err
-	}
-
-	t.wcex = &wndClassEx{
-		Style:      CS_HREDRAW | CS_VREDRAW,
-		WndProc:    windows.NewCallback(t.wndProc),
-		Instance:   t.instance,
-		Icon:       t.icon,
-		Cursor:     t.cursor,
-		Background: windows.Handle(6), // (COLOR_WINDOW + 1)
-		ClassName:  classNamePtr,
-		IconSm:     t.icon,
-	}
-	if err := t.wcex.register(); err != nil {
-		return err
-	}
-
-	windowHandle, _, err := pCreateWindowEx.Call(
-		uintptr(0),
-		uintptr(unsafe.Pointer(classNamePtr)),
-		uintptr(unsafe.Pointer(windowNamePtr)),
-		uintptr(WS_OVERLAPPEDWINDOW),
-		uintptr(CW_USEDEFAULT),
-		uintptr(CW_USEDEFAULT),
-		uintptr(CW_USEDEFAULT),
-		uintptr(CW_USEDEFAULT),
-		uintptr(0),
-		uintptr(0),
-		uintptr(t.instance),
-		uintptr(0),
-	)
-	if windowHandle == 0 {
-		return err
-	}
-	t.window = windows.Handle(windowHandle)
-
-	pShowWindow.Call(uintptr(t.window), uintptr(SW_HIDE)) //nolint:errcheck
-
-	boolRet, _, err := pUpdateWindow.Call(uintptr(t.window))
-	if boolRet == 0 {
-		slog.Error(fmt.Sprintf("failed to update window: %s", err))
-	}
-
-	t.muNID.Lock()
-	defer t.muNID.Unlock()
-	t.nid = &notifyIconData{
-		Wnd:             windows.Handle(t.window),
-		ID:              100,
-		Flags:           NIF_MESSAGE,
-		CallbackMessage: t.wmSystrayMessage,
-	}
-	t.nid.Size = uint32(unsafe.Sizeof(*t.nid))
-
-	return t.nid.add()
-}
-
-func (t *winTray) createMenu() error {
-
-	menuHandle, _, err := pCreatePopupMenu.Call()
-	if menuHandle == 0 {
-		return err
-	}
-	t.menus[0] = windows.Handle(menuHandle)
-
-	// https://msdn.microsoft.com/en-us/library/windows/desktop/ms647575(v=vs.85).aspx
-	mi := struct {
-		Size, Mask, Style, Max uint32
-		Background             windows.Handle
-		ContextHelpID          uint32
-		MenuData               uintptr
-	}{
-		Mask: MIM_APPLYTOSUBMENUS,
-	}
-	mi.Size = uint32(unsafe.Sizeof(mi))
-
-	res, _, err := pSetMenuInfo.Call(
-		uintptr(t.menus[0]),
-		uintptr(unsafe.Pointer(&mi)),
-	)
-	if res == 0 {
-		return err
-	}
-	return nil
-}
-
-// Contains information about a menu item.
-// https://msdn.microsoft.com/en-us/library/windows/desktop/ms647578(v=vs.85).aspx
-type menuItemInfo struct {
-	Size, Mask, Type, State     uint32
-	ID                          uint32
-	SubMenu, Checked, Unchecked windows.Handle
-	ItemData                    uintptr
-	TypeData                    *uint16
-	Cch                         uint32
-	BMPItem                     windows.Handle
-}
-
-func (t *winTray) addOrUpdateMenuItem(menuItemId uint32, parentId uint32, title string, disabled bool) error {
-	titlePtr, err := windows.UTF16PtrFromString(title)
-	if err != nil {
-		return err
-	}
-
-	mi := menuItemInfo{
-		Mask:     MIIM_FTYPE | MIIM_STRING | MIIM_ID | MIIM_STATE,
-		Type:     MFT_STRING,
-		ID:       uint32(menuItemId),
-		TypeData: titlePtr,
-		Cch:      uint32(len(title)),
-	}
-	mi.Size = uint32(unsafe.Sizeof(mi))
-	if disabled {
-		mi.State |= MFS_DISABLED
-	}
-
-	var res uintptr
-	t.muMenus.RLock()
-	menu := t.menus[parentId]
-	t.muMenus.RUnlock()
-	if t.getVisibleItemIndex(parentId, menuItemId) != -1 {
-		// We set the menu item info based on the menuID
-		boolRet, _, err := pSetMenuItemInfo.Call(
-			uintptr(menu),
-			uintptr(menuItemId),
-			0,
-			uintptr(unsafe.Pointer(&mi)),
-		)
-		if boolRet == 0 {
-			return fmt.Errorf("failed to set menu item: %w", err)
-		}
-	}
-
-	if res == 0 {
-		// Menu item does not already exist, create it
-		t.muMenus.RLock()
-		submenu, exists := t.menus[menuItemId]
-		t.muMenus.RUnlock()
-		if exists {
-			mi.Mask |= MIIM_SUBMENU
-			mi.SubMenu = submenu
-		}
-		t.addToVisibleItems(parentId, menuItemId)
-		position := t.getVisibleItemIndex(parentId, menuItemId)
-		res, _, err = pInsertMenuItem.Call(
-			uintptr(menu),
-			uintptr(position),
-			1,
-			uintptr(unsafe.Pointer(&mi)),
-		)
-		if res == 0 {
-			t.delFromVisibleItems(parentId, menuItemId)
-			return err
-		}
-		t.muMenuOf.Lock()
-		t.menuOf[menuItemId] = menu
-		t.muMenuOf.Unlock()
-	}
-
-	return nil
-}
-
-func (t *winTray) addSeparatorMenuItem(menuItemId, parentId uint32) error {
-
-	mi := menuItemInfo{
-		Mask: MIIM_FTYPE | MIIM_ID | MIIM_STATE,
-		Type: MFT_SEPARATOR,
-		ID:   uint32(menuItemId),
-	}
-
-	mi.Size = uint32(unsafe.Sizeof(mi))
-
-	t.addToVisibleItems(parentId, menuItemId)
-	position := t.getVisibleItemIndex(parentId, menuItemId)
-	t.muMenus.RLock()
-	menu := uintptr(t.menus[parentId])
-	t.muMenus.RUnlock()
-	res, _, err := pInsertMenuItem.Call(
-		menu,
-		uintptr(position),
-		1,
-		uintptr(unsafe.Pointer(&mi)),
-	)
-	if res == 0 {
-		return err
-	}
-
-	return nil
-}
-
-// func (t *winTray) hideMenuItem(menuItemId, parentId uint32) error {
-// 	const ERROR_SUCCESS syscall.Errno = 0
-
-// 	t.muMenus.RLock()
-// 	menu := uintptr(t.menus[parentId])
-// 	t.muMenus.RUnlock()
-// 	res, _, err := pRemoveMenu.Call(
-// 		menu,
-// 		uintptr(menuItemId),
-// 		MF_BYCOMMAND,
-// 	)
-// 	if res == 0 && err.(syscall.Errno) != ERROR_SUCCESS {
-// 		return err
-// 	}
-// 	t.delFromVisibleItems(parentId, menuItemId)
-
-// 	return nil
-// }
-
-func (t *winTray) showMenu() error {
-	p := point{}
-	boolRet, _, err := pGetCursorPos.Call(uintptr(unsafe.Pointer(&p)))
-	if boolRet == 0 {
-		return err
-	}
-	boolRet, _, err = pSetForegroundWindow.Call(uintptr(t.window))
-	if boolRet == 0 {
-		slog.Warn(fmt.Sprintf("failed to bring menu to foreground: %s", err))
-	}
-
-	boolRet, _, err = pTrackPopupMenu.Call(
-		uintptr(t.menus[0]),
-		TPM_BOTTOMALIGN|TPM_LEFTALIGN,
-		uintptr(p.X),
-		uintptr(p.Y),
-		0,
-		uintptr(t.window),
-		0,
-	)
-	if boolRet == 0 {
-		return err
-	}
-
-	return nil
-}
-
-func (t *winTray) delFromVisibleItems(parent, val uint32) {
-	t.muVisibleItems.Lock()
-	defer t.muVisibleItems.Unlock()
-	visibleItems := t.visibleItems[parent]
-	for i, itemval := range visibleItems {
-		if val == itemval {
-			t.visibleItems[parent] = append(visibleItems[:i], visibleItems[i+1:]...)
-			break
-		}
-	}
-}
-
-func (t *winTray) addToVisibleItems(parent, val uint32) {
-	t.muVisibleItems.Lock()
-	defer t.muVisibleItems.Unlock()
-	if visibleItems, exists := t.visibleItems[parent]; !exists {
-		t.visibleItems[parent] = []uint32{val}
-	} else {
-		newvisible := append(visibleItems, val)
-		sort.Slice(newvisible, func(i, j int) bool { return newvisible[i] < newvisible[j] })
-		t.visibleItems[parent] = newvisible
-	}
-}
-
-func (t *winTray) getVisibleItemIndex(parent, val uint32) int {
-	t.muVisibleItems.RLock()
-	defer t.muVisibleItems.RUnlock()
-	for i, itemval := range t.visibleItems[parent] {
-		if val == itemval {
-			return i
-		}
-	}
-	return -1
-}
-
-func iconBytesToFilePath(iconBytes []byte) (string, error) {
-	bh := md5.Sum(iconBytes)
-	dataHash := hex.EncodeToString(bh[:])
-	iconFilePath := filepath.Join(os.TempDir(), "ollama_temp_icon_"+dataHash)
-
-	if _, err := os.Stat(iconFilePath); os.IsNotExist(err) {
-		if err := os.WriteFile(iconFilePath, iconBytes, 0644); err != nil {
-			return "", err
-		}
-	}
-	return iconFilePath, nil
-}
-
-// Loads an image from file and shows it in tray.
-// Shell_NotifyIcon: https://msdn.microsoft.com/en-us/library/windows/desktop/bb762159(v=vs.85).aspx
-func (t *winTray) setIcon(src string) error {
-
-	h, err := t.loadIconFrom(src)
-	if err != nil {
-		return err
-	}
-
-	t.muNID.Lock()
-	defer t.muNID.Unlock()
-	t.nid.Icon = h
-	t.nid.Flags |= NIF_ICON
-	t.nid.Size = uint32(unsafe.Sizeof(*t.nid))
-
-	return t.nid.modify()
-}
-
-// Loads an image from file to be shown in tray or menu item.
-// LoadImage: https://msdn.microsoft.com/en-us/library/windows/desktop/ms648045(v=vs.85).aspx
-func (t *winTray) loadIconFrom(src string) (windows.Handle, error) {
-
-	// Save and reuse handles of loaded images
-	t.muLoadedImages.RLock()
-	h, ok := t.loadedImages[src]
-	t.muLoadedImages.RUnlock()
-	if !ok {
-		srcPtr, err := windows.UTF16PtrFromString(src)
-		if err != nil {
-			return 0, err
-		}
-		res, _, err := pLoadImage.Call(
-			0,
-			uintptr(unsafe.Pointer(srcPtr)),
-			IMAGE_ICON,
-			0,
-			0,
-			LR_LOADFROMFILE|LR_DEFAULTSIZE,
-		)
-		if res == 0 {
-			return 0, err
-		}
-		h = windows.Handle(res)
-		t.muLoadedImages.Lock()
-		t.loadedImages[src] = h
-		t.muLoadedImages.Unlock()
-	}
-	return h, nil
-}
-
-func (t *winTray) DisplayFirstUseNotification() error {
-	t.muNID.Lock()
-	defer t.muNID.Unlock()
-	copy(t.nid.InfoTitle[:], windows.StringToUTF16(firstTimeTitle))
-	copy(t.nid.Info[:], windows.StringToUTF16(firstTimeMessage))
-	t.nid.Flags |= NIF_INFO
-	t.nid.Size = uint32(unsafe.Sizeof(*wt.nid))
-
-	return t.nid.modify()
-}
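The menu code above keeps a sorted slice of visible item IDs for each parent menu. It uses an item's index in that slice as its `InsertMenuItem` position. The sketch below shows that bookkeeping without any Win32 calls, so it runs on any platform. The `visibleItems` type and its `add`/`index`/`remove` methods are hypothetical stand-ins for the `winTray` fields and the `addToVisibleItems`/`getVisibleItemIndex`/`delFromVisibleItems` methods.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// visibleItems is a hypothetical stand-in for winTray's visibleItems map and
// muVisibleItems lock: for each parent menu ID it keeps the IDs of the items
// currently shown, sorted in ascending order.
type visibleItems struct {
	mu    sync.RWMutex
	items map[uint32][]uint32
}

func newVisibleItems() *visibleItems {
	return &visibleItems{items: make(map[uint32][]uint32)}
}

// add records val under parent and re-sorts the slice, so an item's index
// can double as its insertion position in the native menu.
func (v *visibleItems) add(parent, val uint32) {
	v.mu.Lock()
	defer v.mu.Unlock()
	s := append(v.items[parent], val)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	v.items[parent] = s
}

// index returns the position of val under parent, or -1 if it is not visible.
func (v *visibleItems) index(parent, val uint32) int {
	v.mu.RLock()
	defer v.mu.RUnlock()
	for i, id := range v.items[parent] {
		if id == val {
			return i
		}
	}
	return -1
}

// remove deletes the first occurrence of val under parent, if present.
func (v *visibleItems) remove(parent, val uint32) {
	v.mu.Lock()
	defer v.mu.Unlock()
	s := v.items[parent]
	for i, id := range s {
		if id == val {
			v.items[parent] = append(s[:i], s[i+1:]...)
			return
		}
	}
}

func main() {
	v := newVisibleItems()
	v.add(0, 30)
	v.add(0, 10)
	v.add(0, 20)
	fmt.Println(v.index(0, 20)) // IDs are kept in ascending order
	v.remove(0, 10)
	fmt.Println(v.index(0, 20), v.index(0, 10))
}
```

Because the slice is sorted, an item's on-screen position follows its numeric ID rather than the order in which items were added.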
--- a/app/tray/wintray/w32api.go
+++ b/app/tray/wintray/w32api.go
@@ -1,89 +0,0 @@
-//go:build windows
-
-package wintray
-
-import (
-	"runtime"
-
-	"golang.org/x/sys/windows"
-)
-
-var (
-	k32 = windows.NewLazySystemDLL("Kernel32.dll")
-	u32 = windows.NewLazySystemDLL("User32.dll")
-	s32 = windows.NewLazySystemDLL("Shell32.dll")
-
-	pCreatePopupMenu       = u32.NewProc("CreatePopupMenu")
-	pCreateWindowEx        = u32.NewProc("CreateWindowExW")
-	pDefWindowProc         = u32.NewProc("DefWindowProcW")
-	pDestroyWindow         = u32.NewProc("DestroyWindow")
-	pDispatchMessage       = u32.NewProc("DispatchMessageW")
-	pGetCursorPos          = u32.NewProc("GetCursorPos")
-	pGetMessage            = u32.NewProc("GetMessageW")
-	pGetModuleHandle       = k32.NewProc("GetModuleHandleW")
-	pInsertMenuItem        = u32.NewProc("InsertMenuItemW")
-	pLoadCursor            = u32.NewProc("LoadCursorW")
-	pLoadIcon              = u32.NewProc("LoadIconW")
-	pLoadImage             = u32.NewProc("LoadImageW")
-	pPostMessage           = u32.NewProc("PostMessageW")
-	pPostQuitMessage       = u32.NewProc("PostQuitMessage")
-	pRegisterClass         = u32.NewProc("RegisterClassExW")
-	pRegisterWindowMessage = u32.NewProc("RegisterWindowMessageW")
-	pSetForegroundWindow   = u32.NewProc("SetForegroundWindow")
-	pSetMenuInfo           = u32.NewProc("SetMenuInfo")
-	pSetMenuItemInfo       = u32.NewProc("SetMenuItemInfoW")
-	pShellNotifyIcon       = s32.NewProc("Shell_NotifyIconW")
-	pShowWindow            = u32.NewProc("ShowWindow")
-	pTrackPopupMenu        = u32.NewProc("TrackPopupMenu")
-	pTranslateMessage      = u32.NewProc("TranslateMessage")
-	pUnregisterClass       = u32.NewProc("UnregisterClassW")
-	pUpdateWindow          = u32.NewProc("UpdateWindow")
-)
-
-const (
-	CS_HREDRAW          = 0x0002
-	CS_VREDRAW          = 0x0001
-	CW_USEDEFAULT       = 0x80000000
-	IDC_ARROW           = 32512 // Standard arrow
-	IDI_APPLICATION     = 32512
-	IMAGE_ICON          = 1          // Loads an icon
-	LR_DEFAULTSIZE      = 0x00000040 // Loads default-size icon for windows(SM_CXICON x SM_CYICON) if cx, cy are set to zero
-	LR_LOADFROMFILE     = 0x00000010 // Loads the stand-alone image from the file
-	MF_BYCOMMAND        = 0x00000000
-	MFS_DISABLED        = 0x00000003
-	MFT_SEPARATOR       = 0x00000800
-	MFT_STRING          = 0x00000000
-	MIIM_BITMAP         = 0x00000080
-	MIIM_FTYPE          = 0x00000100
-	MIIM_ID             = 0x00000002
-	MIIM_STATE          = 0x00000001
-	MIIM_STRING         = 0x00000040
-	MIIM_SUBMENU        = 0x00000004
-	MIM_APPLYTOSUBMENUS = 0x80000000
-	NIF_ICON            = 0x00000002
-	NIF_INFO            = 0x00000010
-	NIF_MESSAGE         = 0x00000001
-	SW_HIDE             = 0
-	TPM_BOTTOMALIGN     = 0x0020
-	TPM_LEFTALIGN       = 0x0000
-	WM_CLOSE            = 0x0010
-	WM_USER             = 0x0400
-	WS_CAPTION          = 0x00C00000
-	WS_MAXIMIZEBOX      = 0x00010000
-	WS_MINIMIZEBOX      = 0x00020000
-	WS_OVERLAPPED       = 0x00000000
-	WS_OVERLAPPEDWINDOW = WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU | WS_THICKFRAME | WS_MINIMIZEBOX | WS_MAXIMIZEBOX
-	WS_SYSMENU          = 0x00080000
-	WS_THICKFRAME       = 0x00040000
-)
-
-// Not sure if this is actually needed on windows
-func init() {
-	runtime.LockOSThread()
-}
-
-// The POINT structure defines the x- and y- coordinates of a point.
-// https://msdn.microsoft.com/en-us/library/windows/desktop/dd162805(v=vs.85).aspx
-type point struct {
-	X, Y int32
-}
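In the constant block above, `WS_OVERLAPPEDWINDOW` is the OR of `WS_OVERLAPPED` (which is zero) and five style bits. It references `WS_SYSMENU` and `WS_THICKFRAME` before they are declared. Go permits this because constants in a block may refer to each other regardless of order, as long as there is no cycle. The sketch below copies just those constants to show that the combined value is `0x00CF0000`, the value Win32 defines for this style.

```go
package main

import "fmt"

// Window style bits, copied from the constant block in w32api.go.
const (
	WS_OVERLAPPED  = 0x00000000
	WS_CAPTION     = 0x00C00000
	WS_SYSMENU     = 0x00080000
	WS_THICKFRAME  = 0x00040000
	WS_MINIMIZEBOX = 0x00020000
	WS_MAXIMIZEBOX = 0x00010000

	// Same expression as in w32api.go.
	WS_OVERLAPPEDWINDOW = WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU | WS_THICKFRAME | WS_MINIMIZEBOX | WS_MAXIMIZEBOX
)

func main() {
	fmt.Printf("WS_OVERLAPPEDWINDOW = %#x\n", WS_OVERLAPPEDWINDOW)
}
```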
--- a/app/tray/wintray/winclass.go
+++ b/app/tray/wintray/winclass.go
@@ -1,45 +0,0 @@
-//go:build windows
-
-package wintray
-
-import (
-	"unsafe"
-
-	"golang.org/x/sys/windows"
-)
-
-// Contains window class information.
-// It is used with the RegisterClassEx and GetClassInfoEx functions.
-// https://msdn.microsoft.com/en-us/library/ms633577.aspx
-type wndClassEx struct {
-	Size, Style                        uint32
-	WndProc                            uintptr
-	ClsExtra, WndExtra                 int32
-	Instance, Icon, Cursor, Background windows.Handle
-	MenuName, ClassName                *uint16
-	IconSm                             windows.Handle
-}
-
-// Registers a window class for subsequent use in calls to the CreateWindow or CreateWindowEx function.
-// https://msdn.microsoft.com/en-us/library/ms633587.aspx
-func (w *wndClassEx) register() error {
-	w.Size = uint32(unsafe.Sizeof(*w))
-	res, _, err := pRegisterClass.Call(uintptr(unsafe.Pointer(w)))
-	if res == 0 {
-		return err
-	}
-	return nil
-}
-
-// Unregisters a window class, freeing the memory required for the class.
-// https://msdn.microsoft.com/en-us/library/ms644899.aspx
-func (w *wndClassEx) unregister() error {
-	res, _, err := pUnregisterClass.Call(
-		uintptr(unsafe.Pointer(w.ClassName)),
-		uintptr(w.Instance),
-	)
-	if res == 0 {
-		return err
-	}
-	return nil
-}
--- a/macapp/tsconfig.json
+++ b/macapp/tsconfig.json
--- a/macapp/webpack.main.config.ts
+++ b/macapp/webpack.main.config.ts
--- a/macapp/webpack.plugins.ts
+++ b/macapp/webpack.plugins.ts
--- a/macapp/webpack.renderer.config.ts
+++ b/macapp/webpack.renderer.config.ts
--- a/macapp/webpack.rules.ts
+++ b/macapp/webpack.rules.ts
--- a/auth/auth.go
+++ b/auth/auth.go
@@ -1,61 +0,0 @@
-package auth
-
-import (
-	"bytes"
-	"context"
-	"crypto/rand"
-	"encoding/base64"
-	"fmt"
-	"io"
-	"log/slog"
-	"os"
-	"path/filepath"
-
-	"golang.org/x/crypto/ssh"
-)
-
-const defaultPrivateKey = "id_ed25519"
-
-func NewNonce(r io.Reader, length int) (string, error) {
-	nonce := make([]byte, length)
-	if _, err := io.ReadFull(r, nonce); err != nil {
-		return "", err
-	}
-
-	return base64.RawURLEncoding.EncodeToString(nonce), nil
-}
-
-func Sign(ctx context.Context, bts []byte) (string, error) {
-	home, err := os.UserHomeDir()
-	if err != nil {
-		return "", err
-	}
-
-	keyPath := filepath.Join(home, ".ollama", defaultPrivateKey)
-
-	privateKeyFile, err := os.ReadFile(keyPath)
-	if err != nil {
-		slog.Info(fmt.Sprintf("Failed to load private key: %v", err))
-		return "", err
-	}
-
-	privateKey, err := ssh.ParsePrivateKey(privateKeyFile)
-	if err != nil {
-		return "", err
-	}
-
-	// get the pubkey, but remove the type
-	publicKey := ssh.MarshalAuthorizedKey(privateKey.PublicKey())
-	parts := bytes.Split(publicKey, []byte(" "))
-	if len(parts) < 2 {
-		return "", fmt.Errorf("malformed public key")
-	}
-
-	signedData, err := privateKey.Sign(rand.Reader, bts)
-	if err != nil {
-		return "", err
-	}
-
-	// signature is <pubkey>:<signature>
-	return fmt.Sprintf("%s:%s", bytes.TrimSpace(parts[1]), base64.StdEncoding.EncodeToString(signedData.Blob)), nil
-}
--- a/cmd/cmd.go
+++ b/cmd/cmd.go
@@ -1,7 +1,6 @@
 package cmd

 import (
-	"archive/zip"
 	"bytes"
 	"context"
 	"crypto/ed25519"
@@ -15,6 +14,7 @@ import (
 	"net"
 	"net/http"
 	"os"
+	"os/exec"
 	"os/signal"
 	"path/filepath"
 	"runtime"
@@ -22,20 +22,17 @@ import (
 	"syscall"
 	"time"

-	"github.com/containerd/console"
-
 	"github.com/olekukonko/tablewriter"
 	"github.com/spf13/cobra"
 	"golang.org/x/crypto/ssh"
-	"golang.org/x/exp/slices"
 	"golang.org/x/term"

-	"github.com/ollama/ollama/api"
-	"github.com/ollama/ollama/format"
-	"github.com/ollama/ollama/parser"
-	"github.com/ollama/ollama/progress"
-	"github.com/ollama/ollama/server"
-	"github.com/ollama/ollama/version"
+	"github.com/jmorganca/ollama/api"
+	"github.com/jmorganca/ollama/format"
+	"github.com/jmorganca/ollama/parser"
+	"github.com/jmorganca/ollama/progress"
+	"github.com/jmorganca/ollama/server"
+	"github.com/jmorganca/ollama/version"
 )

 func CreateHandler(cmd *cobra.Command, args []string) error {
@@ -88,82 +85,22 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
 				path = filepath.Join(filepath.Dir(filename), path)
 			}

-			fi, err := os.Stat(path)
+			bin, err := os.Open(path)
 			if errors.Is(err, os.ErrNotExist) && c.Name == "model" {
 				continue
 			} else if err != nil {
 				return err
 			}
+			defer bin.Close()

-			// TODO make this work w/ adapters
-			if fi.IsDir() {
-				tf, err := os.CreateTemp("", "ollama-tf")
-				if err != nil {
-					return err
-				}
-				defer os.RemoveAll(tf.Name())
-
-				zf := zip.NewWriter(tf)
-
-				files, err := filepath.Glob(filepath.Join(path, "model-*.safetensors"))
-				if err != nil {
-					return err
-				}
-
-				if len(files) == 0 {
-					return fmt.Errorf("no safetensors files were found in '%s'", path)
-				}
-
-				// add the safetensor config file + tokenizer
-				files = append(files, filepath.Join(path, "config.json"))
-				files = append(files, filepath.Join(path, "added_tokens.json"))
-				files = append(files, filepath.Join(path, "tokenizer.model"))
-
-				for _, fn := range files {
-					f, err := os.Open(fn)
-					if os.IsNotExist(err) && strings.HasSuffix(fn, "added_tokens.json") {
-						continue
-					} else if err != nil {
-						return err
-					}
-
-					fi, err := f.Stat()
-					if err != nil {
-						return err
-					}
-
-					h, err := zip.FileInfoHeader(fi)
-					if err != nil {
-						return err
-					}
-
-					h.Name = filepath.Base(fn)
-					h.Method = zip.Store
-
-					w, err := zf.CreateHeader(h)
-					if err != nil {
-						return err
-					}
-
-					_, err = io.Copy(w, f)
-					if err != nil {
-						return err
-					}
-
-				}
-
-				if err := zf.Close(); err != nil {
-					return err
-				}
-
-				if err := tf.Close(); err != nil {
-					return err
-				}
-				path = tf.Name()
+			hash := sha256.New()
+			if _, err := io.Copy(hash, bin); err != nil {
+				return err
 			}
+			bin.Seek(0, io.SeekStart)

-			digest, err := createBlob(cmd, client, path)
-			if err != nil {
+			digest := fmt.Sprintf("sha256:%x", hash.Sum(nil))
+			if err = client.CreateBlob(cmd.Context(), digest, bin); err != nil {
 				return err
 			}

@@ -194,9 +131,7 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
 		return nil
 	}

-	quantization, _ := cmd.Flags().GetString("quantization")
-
-	request := api.CreateRequest{Name: args[0], Modelfile: string(modelfile), Quantization: quantization}
+	request := api.CreateRequest{Name: args[0], Modelfile: string(modelfile)}
 	if err := client.Create(cmd.Context(), &request, fn); err != nil {
 		return err
 	}
@@ -204,106 +139,26 @@ func CreateHandler(cmd *cobra.Command, args []string) error {
 	return nil
 }

-func createBlob(cmd *cobra.Command, client *api.Client, path string) (string, error) {
-	bin, err := os.Open(path)
-	if err != nil {
-		return "", err
-	}
-	defer bin.Close()
-
-	hash := sha256.New()
-	if _, err := io.Copy(hash, bin); err != nil {
-		return "", err
-	}
-
-	if _, err := bin.Seek(0, io.SeekStart); err != nil {
-		return "", err
-	}
-
-	digest := fmt.Sprintf("sha256:%x", hash.Sum(nil))
-	if err = client.CreateBlob(cmd.Context(), digest, bin); err != nil {
-		return "", err
-	}
-	return digest, nil
-}
-
 func RunHandler(cmd *cobra.Command, args []string) error {
-	if os.Getenv("OLLAMA_MODELS") != "" {
-		return errors.New("OLLAMA_MODELS must only be set for 'ollama serve'")
-	}
-
-	if err := checkServerHeartbeat(cmd, args); err != nil {
-		return err
-	}
-
 	client, err := api.ClientFromEnvironment()
 	if err != nil {
 		return err
 	}

 	name := args[0]
-
 	// check if the model exists on the server
-	show, err := client.Show(cmd.Context(), &api.ShowRequest{Name: name})
+	_, err = client.Show(cmd.Context(), &api.ShowRequest{Name: name})
 	var statusError api.StatusError
 	switch {
 	case errors.As(err, &statusError) && statusError.StatusCode == http.StatusNotFound:
 		if err := PullHandler(cmd, []string{name}); err != nil {
 			return err
 		}
-
-		show, err = client.Show(cmd.Context(), &api.ShowRequest{Name: name})
-		if err != nil {
-			return err
-		}
 	case err != nil:
 		return err
 	}

-	interactive := true
-
-	opts := runOptions{
-		Model:       args[0],
-		WordWrap:    os.Getenv("TERM") == "xterm-256color",
-		Options:     map[string]interface{}{},
-		MultiModal:  slices.Contains(show.Details.Families, "clip"),
-		ParentModel: show.Details.ParentModel,
-	}
-
-	format, err := cmd.Flags().GetString("format")
-	if err != nil {
-		return err
-	}
-	opts.Format = format
-
-	prompts := args[1:]
-	// prepend stdin to the prompt if provided
-	if !term.IsTerminal(int(os.Stdin.Fd())) {
-		in, err := io.ReadAll(os.Stdin)
-		if err != nil {
-			return err
-		}
-
-		prompts = append([]string{string(in)}, prompts...)
-		opts.WordWrap = false
-		interactive = false
-	}
-	opts.Prompt = strings.Join(prompts, " ")
-	if len(prompts) > 0 {
-		interactive = false
-	}
-
-	nowrap, err := cmd.Flags().GetBool("nowordwrap")
-	if err != nil {
-		return err
-	}
-	opts.WordWrap = !nowrap
-
-	if !interactive {
-		return generate(cmd, opts)
-	}
-
-	return generateInteractive(cmd, opts)
+	return RunGenerate(cmd, args)
 }

 func PushHandler(cmd *cobra.Command, args []string) error {
@@ -555,20 +410,63 @@ func PullHandler(cmd *cobra.Command, args []string) error {
 	return nil
 }

+func RunGenerate(cmd *cobra.Command, args []string) error {
+	interactive := true
+
+	opts := runOptions{
+		Model:    args[0],
+		WordWrap: os.Getenv("TERM") == "xterm-256color",
+		Options:  map[string]interface{}{},
+	}
+
+	format, err := cmd.Flags().GetString("format")
+	if err != nil {
+		return err
+	}
+	opts.Format = format
+
+	prompts := args[1:]
+	// prepend stdin to the prompt if provided
+	if !term.IsTerminal(int(os.Stdin.Fd())) {
+		in, err := io.ReadAll(os.Stdin)
+		if err != nil {
+			return err
+		}
+
+		prompts = append([]string{string(in)}, prompts...)
+		opts.WordWrap = false
+		interactive = false
+	}
+	opts.Prompt = strings.Join(prompts, " ")
+	if len(prompts) > 0 {
+		interactive = false
+	}
+
+	nowrap, err := cmd.Flags().GetBool("nowordwrap")
+	if err != nil {
+		return err
+	}
+	opts.WordWrap = !nowrap
+
+	if !interactive {
+		return generate(cmd, opts)
+	}
+
+	return generateInteractive(cmd, opts)
+}
+
 type generateContextKey string

 type runOptions struct {
-	Model       string
-	ParentModel string
-	Prompt      string
-	Messages    []api.Message
-	WordWrap    bool
-	Format      string
-	System      string
-	Template    string
-	Images      []api.ImageData
-	Options     map[string]interface{}
-	MultiModal  bool
+	Model    string
+	Prompt   string
+	Messages []api.Message
+	WordWrap bool
+	Format   string
+	System   string
+	Template string
+	Images   []api.ImageData
+	Options  map[string]interface{}
 }

 type displayResponseState struct {
@@ -730,18 +628,10 @@ func generate(cmd *cobra.Command, opts runOptions) error {
 		return nil
 	}

-	if opts.MultiModal {
-		opts.Prompt, opts.Images, err = extractFileData(opts.Prompt)
-		if err != nil {
-			return err
-		}
-	}
-
 	request := api.GenerateRequest{
 		Model:    opts.Model,
 		Prompt:   opts.Prompt,
 		Context:  generateContext,
-		Images:   opts.Images,
 		Format:   opts.Format,
 		System:   opts.System,
 		Template: opts.Template,
@@ -780,7 +670,7 @@ func generate(cmd *cobra.Command, opts runOptions) error {
 }

 func RunServer(cmd *cobra.Command, _ []string) error {
-	host, port, err := net.SplitHostPort(strings.Trim(os.Getenv("OLLAMA_HOST"), "\"'"))
+	host, port, err := net.SplitHostPort(os.Getenv("OLLAMA_HOST"))
 	if err != nil {
 		host, port = "127.0.0.1", "11434"
 		if ip := net.ParseIP(strings.Trim(os.Getenv("OLLAMA_HOST"), "[]")); ip != nil {
@@ -812,42 +702,59 @@ func initializeKeypair() error {
 	_, err = os.Stat(privKeyPath)
 	if os.IsNotExist(err) {
 		fmt.Printf("Couldn't find '%s'. Generating new private key.\n", privKeyPath)
-		cryptoPublicKey, cryptoPrivateKey, err := ed25519.GenerateKey(rand.Reader)
+		_, privKey, err := ed25519.GenerateKey(rand.Reader)
 		if err != nil {
 			return err
 		}

-		privateKeyBytes, err := ssh.MarshalPrivateKey(cryptoPrivateKey, "")
+		privKeyBytes, err := format.OpenSSHPrivateKey(privKey, "")
 		if err != nil {
 			return err
 		}

-		if err := os.MkdirAll(filepath.Dir(privKeyPath), 0o755); err != nil {
+		err = os.MkdirAll(filepath.Dir(privKeyPath), 0o755)
+		if err != nil {
 			return fmt.Errorf("could not create directory %w", err)
 		}

-		if err := os.WriteFile(privKeyPath, pem.EncodeToMemory(privateKeyBytes), 0o600); err != nil {
-			return err
-		}
-
-		sshPublicKey, err := ssh.NewPublicKey(cryptoPublicKey)
+		err = os.WriteFile(privKeyPath, pem.EncodeToMemory(privKeyBytes), 0o600)
 		if err != nil {
 			return err
 		}

-		publicKeyBytes := ssh.MarshalAuthorizedKey(sshPublicKey)
-
-		if err := os.WriteFile(pubKeyPath, publicKeyBytes, 0o644); err != nil {
+		sshPrivateKey, err := ssh.NewSignerFromKey(privKey)
+		if err != nil {
 			return err
 		}

-		fmt.Printf("Your new public key is: \n\n%s\n", publicKeyBytes)
+		pubKeyData := ssh.MarshalAuthorizedKey(sshPrivateKey.PublicKey())
+
+		err = os.WriteFile(pubKeyPath, pubKeyData, 0o644)
+		if err != nil {
+			return err
+		}
+
+		fmt.Printf("Your new public key is: \n\n%s\n", string(pubKeyData))
 	}
 	return nil
 }

-//nolint:unused
-func waitForServer(ctx context.Context, client *api.Client) error {
+func startMacApp(ctx context.Context, client *api.Client) error {
+	exe, err := os.Executable()
+	if err != nil {
+		return err
+	}
+	link, err := os.Readlink(exe)
+	if err != nil {
+		return err
+	}
+	if !strings.Contains(link, "Ollama.app") {
+		return fmt.Errorf("could not find ollama app")
+	}
+	path := strings.Split(link, "Ollama.app")
+	if err := exec.Command("/usr/bin/open", "-a", path[0]+"Ollama.app").Run(); err != nil {
+		return err
+	}
 	// wait for the server to start
 	timeout := time.After(5 * time.Second)
 	tick := time.Tick(500 * time.Millisecond)
@@ -861,7 +768,6 @@ func waitForServer(ctx context.Context, client *api.Client) error {
 			}
 		}
 	}
-
 }

 func checkServerHeartbeat(cmd *cobra.Command, _ []string) error {
@@ -870,11 +776,15 @@ func checkServerHeartbeat(cmd *cobra.Command, _ []string) error {
 		return err
 	}
 	if err := client.Heartbeat(cmd.Context()); err != nil {
-		if !strings.Contains(err.Error(), " refused") {
+		if !strings.Contains(err.Error(), "connection refused") {
 			return err
 		}
-		if err := startApp(cmd.Context(), client); err != nil {
-			return fmt.Errorf("could not connect to ollama app, is it running?")
+		if runtime.GOOS == "darwin" {
+			if err := startMacApp(cmd.Context(), client); err != nil {
+				return fmt.Errorf("could not connect to ollama app, is it running?")
+			}
+		} else {
+			return fmt.Errorf("could not connect to ollama server, run 'ollama serve' to start it")
 		}
 	}
 	return nil
@@ -900,22 +810,10 @@ func versionHandler(cmd *cobra.Command, _ []string) {
 	}
 }

-func appendHostEnvDocs(cmd *cobra.Command) {
-	const hostEnvDocs = `
-Environment Variables:
-      OLLAMA_HOST        The host:port or base URL of the Ollama server (e.g. http://localhost:11434)
-`
-	cmd.SetUsageTemplate(cmd.UsageTemplate() + hostEnvDocs)
-}
-
 func NewCLI() *cobra.Command {
 	log.SetFlags(log.LstdFlags | log.Lshortfile)
 	cobra.EnableCommandSorting = false

-	if runtime.GOOS == "windows" {
-		console.ConsoleFromFile(os.Stdin) //nolint:errcheck
-	}
-
 	rootCmd := &cobra.Command{
 		Use:           "ollama",
 		Short:         "Large language model runner",
@@ -945,7 +843,6 @@ func NewCLI() *cobra.Command {
 	}

 	createCmd.Flags().StringP("file", "f", "Modelfile", "Name of the Modelfile (default \"Modelfile\")")
-	createCmd.Flags().StringP("quantization", "q", "", "Quantization level.")

 	showCmd := &cobra.Command{
 		Use:     "show MODEL",
@@ -962,16 +859,18 @@ func NewCLI() *cobra.Command {
 	showCmd.Flags().Bool("system", false, "Show system message of a model")

 	runCmd := &cobra.Command{
-		Use:   "run MODEL [PROMPT]",
-		Short: "Run a model",
-		Args:  cobra.MinimumNArgs(1),
-		RunE:  RunHandler,
+		Use:     "run MODEL [PROMPT]",
+		Short:   "Run a model",
+		Args:    cobra.MinimumNArgs(1),
+		PreRunE: checkServerHeartbeat,
+		RunE:    RunHandler,
 	}

 	runCmd.Flags().Bool("verbose", false, "Show timings for response")
 	runCmd.Flags().Bool("insecure", false, "Use an insecure registry")
 	runCmd.Flags().Bool("nowordwrap", false, "Don't wrap words to the next line automatically")
 	runCmd.Flags().String("format", "", "Response format (e.g. json)")
+
 	serveCmd := &cobra.Command{
 		Use:     "serve",
 		Aliases: []string{"start"},
@@ -979,15 +878,6 @@ func NewCLI() *cobra.Command {
 		Args:    cobra.ExactArgs(0),
 		RunE:    RunServer,
 	}
-	serveCmd.SetUsageTemplate(serveCmd.UsageTemplate() + `
-Environment Variables:
-
-    OLLAMA_HOST         The host:port to bind to (default "127.0.0.1:11434")
-    OLLAMA_ORIGINS      A comma separated list of allowed origins.
-    OLLAMA_MODELS       The path to the models directory (default is "~/.ollama/models")
-    OLLAMA_KEEP_ALIVE   The duration that models stay loaded in memory (default is "5m")
-    OLLAMA_DEBUG        Set to 1 to enable additional debug logging
-`)

 	pullCmd := &cobra.Command{
 		Use:     "pull MODEL",
@@ -1016,6 +906,7 @@ Environment Variables:
 		PreRunE: checkServerHeartbeat,
 		RunE:    ListHandler,
 	}
+
 	copyCmd := &cobra.Command{
 		Use:     "cp SOURCE TARGET",
 		Short:   "Copy a model",
@@ -1032,19 +923,6 @@ Environment Variables:
 		RunE:    DeleteHandler,
 	}

-	for _, cmd := range []*cobra.Command{
-		createCmd,
-		showCmd,
-		runCmd,
-		pullCmd,
-		pushCmd,
-		listCmd,
-		copyCmd,
-		deleteCmd,
-	} {
-		appendHostEnvDocs(cmd)
-	}
-
 	rootCmd.AddCommand(
 		serveCmd,
 		createCmd,
--- a/cmd/interactive.go
+++ b/cmd/interactive.go
@@ -6,17 +6,14 @@ import (
 	"io"
 	"net/http"
 	"os"
-	"path/filepath"
 	"regexp"
-	"sort"
 	"strings"

 	"github.com/spf13/cobra"
 	"golang.org/x/exp/slices"

-	"github.com/ollama/ollama/api"
-	"github.com/ollama/ollama/progress"
-	"github.com/ollama/ollama/readline"
+	"github.com/jmorganca/ollama/api"
+	"github.com/jmorganca/ollama/readline"
 )

 type MultilineState int
@@ -28,82 +25,45 @@ const (
 	MultilineTemplate
 )

-func loadModel(cmd *cobra.Command, opts *runOptions) error {
+func modelIsMultiModal(cmd *cobra.Command, name string) bool {
+	// get model details
 	client, err := api.ClientFromEnvironment()
 	if err != nil {
-		return err
+		fmt.Println("error: couldn't connect to ollama server")
+		return false
 	}

-	p := progress.NewProgress(os.Stderr)
-	defer p.StopAndClear()
-
-	spinner := progress.NewSpinner("")
-	p.Add("", spinner)
-
-	showReq := api.ShowRequest{Name: opts.Model}
-	showResp, err := client.Show(cmd.Context(), &showReq)
+	req := api.ShowRequest{Name: name}
+	resp, err := client.Show(cmd.Context(), &req)
 	if err != nil {
-		return err
-	}
-	opts.MultiModal = slices.Contains(showResp.Details.Families, "clip")
-	opts.ParentModel = showResp.Details.ParentModel
-
-	if len(showResp.Messages) > 0 {
-		opts.Messages = append(opts.Messages, showResp.Messages...)
+		return false
 	}

-	chatReq := &api.ChatRequest{
-		Model:    opts.Model,
-		Messages: []api.Message{},
-	}
-	err = client.Chat(cmd.Context(), chatReq, func(resp api.ChatResponse) error {
-		p.StopAndClear()
-		if len(opts.Messages) > 0 {
-			for _, msg := range opts.Messages {
-				switch msg.Role {
-				case "user":
-					fmt.Printf(">>> %s\n", msg.Content)
-				case "assistant":
-					state := &displayResponseState{}
-					displayResponse(msg.Content, opts.WordWrap, state)
-					fmt.Println()
-					fmt.Println()
-				}
-			}
-		}
-		return nil
-	})
-	if err != nil {
-		return err
-	}
-
-	return nil
+	return slices.Contains(resp.Details.Families, "clip")
 }

 func generateInteractive(cmd *cobra.Command, opts runOptions) error {
-	opts.Messages = make([]api.Message, 0)
+	multiModal := modelIsMultiModal(cmd, opts.Model)

-	err := loadModel(cmd, &opts)
-	if err != nil {
+	// load the model
+	loadOpts := runOptions{
+		Model:    opts.Model,
+		Prompt:   "",
+		Messages: []api.Message{},
+	}
+	if _, err := chat(cmd, loadOpts); err != nil {
 		return err
 	}

 	usage := func() {
 		fmt.Fprintln(os.Stderr, "Available Commands:")
-		fmt.Fprintln(os.Stderr, "  /set            Set session variables")
-		fmt.Fprintln(os.Stderr, "  /show           Show model information")
-		fmt.Fprintln(os.Stderr, "  /load <model>   Load a session or model")
-		fmt.Fprintln(os.Stderr, "  /save <model>   Save your current session")
-		fmt.Fprintln(os.Stderr, "  /bye            Exit")
-		fmt.Fprintln(os.Stderr, "  /?, /help       Help for a command")
-		fmt.Fprintln(os.Stderr, "  /? shortcuts    Help for keyboard shortcuts")
+		fmt.Fprintln(os.Stderr, "  /set          Set session variables")
+		fmt.Fprintln(os.Stderr, "  /show         Show model information")
+		fmt.Fprintln(os.Stderr, "  /bye          Exit")
+		fmt.Fprintln(os.Stderr, "  /?, /help     Help for a command")
+		fmt.Fprintln(os.Stderr, "  /? shortcuts  Help for keyboard shortcuts")
 		fmt.Fprintln(os.Stderr, "")
 		fmt.Fprintln(os.Stderr, "Use \"\"\" to begin a multi-line message.")
-
-		if opts.MultiModal {
-			fmt.Fprintf(os.Stderr, "Use %s to include .jpg or .png images.\n", filepath.FromSlash("/path/to/file"))
-		}
-
 		fmt.Fprintln(os.Stderr, "")
 	}

@@ -180,6 +140,7 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {

 	var sb strings.Builder
 	var multiline MultilineState
+	opts.Messages = make([]api.Message, 0)

 	for {
 		line, err := scanner.Readline()
@@ -213,7 +174,6 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 			switch multiline {
 			case MultilineSystem:
 				opts.System = sb.String()
-				opts.Messages = append(opts.Messages, api.Message{Role: "system", Content: opts.System})
 				fmt.Println("Set system message.")
 				sb.Reset()
 			case MultilineTemplate:
@@ -233,6 +193,7 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 				fmt.Fprintln(&sb)
 				multiline = MultilinePrompt
 				scanner.Prompt.UseAlt = true
+				break
 			}
 		case scanner.Pasting:
 			fmt.Fprintln(&sb, line)
@@ -242,44 +203,6 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 			if err := ListHandler(cmd, args[1:]); err != nil {
 				return err
 			}
-		case strings.HasPrefix(line, "/load"):
-			args := strings.Fields(line)
-			if len(args) != 2 {
-				fmt.Println("Usage:\n  /load <modelname>")
-				continue
-			}
-			opts.Model = args[1]
-			opts.Messages = []api.Message{}
-			fmt.Printf("Loading model '%s'\n", opts.Model)
-			if err := loadModel(cmd, &opts); err != nil {
-				return err
-			}
-			continue
-		case strings.HasPrefix(line, "/save"):
-			args := strings.Fields(line)
-			if len(args) != 2 {
-				fmt.Println("Usage:\n  /save <modelname>")
-				continue
-			}
-
-			client, err := api.ClientFromEnvironment()
-			if err != nil {
-				fmt.Println("error: couldn't connect to ollama server")
-				return err
-			}
-
-			req := &api.CreateRequest{
-				Name:      args[1],
-				Modelfile: buildModelfile(opts),
-			}
-			fn := func(resp api.ProgressResponse) error { return nil }
-			err = client.Create(cmd.Context(), req, fn)
-			if err != nil {
-				fmt.Println("error: couldn't save model")
-				return err
-			}
-			fmt.Printf("Created new model '%s'\n", args[1])
-			continue
 		case strings.HasPrefix(line, "/set"):
 			args := strings.Fields(line)
 			if len(args) > 1 {
@@ -295,14 +218,10 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 					opts.WordWrap = false
 					fmt.Println("Set 'nowordwrap' mode.")
 				case "verbose":
-					if err := cmd.Flags().Set("verbose", "true"); err != nil {
-						return err
-					}
+					cmd.Flags().Set("verbose", "true")
 					fmt.Println("Set 'verbose' mode.")
 				case "quiet":
-					if err := cmd.Flags().Set("verbose", "false"); err != nil {
-						return err
-					}
+					cmd.Flags().Set("verbose", "false")
 					fmt.Println("Set 'quiet' mode.")
 				case "format":
 					if len(args) < 3 || args[2] != "json" {
@@ -358,21 +277,11 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 					}

 					if args[1] == "system" {
-						opts.System = sb.String() // for display in modelfile
-						newMessage := api.Message{Role: "system", Content: sb.String()}
-						// Check if the slice is not empty and the last message is from 'system'
-						if len(opts.Messages) > 0 && opts.Messages[len(opts.Messages)-1].Role == "system" {
-							// Replace the last message
-							opts.Messages[len(opts.Messages)-1] = newMessage
-						} else {
-							opts.Messages = append(opts.Messages, newMessage)
-						}
+						opts.System = sb.String()
 						fmt.Println("Set system message.")
-						sb.Reset()
 					} else if args[1] == "template" {
 						opts.Template = sb.String()
 						fmt.Println("Set prompt template.")
-						sb.Reset()
 					}

 					sb.Reset()
@@ -474,13 +383,13 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 			} else {
 				usage()
 			}
-		case strings.HasPrefix(line, "/exit"), strings.HasPrefix(line, "/bye"):
+		case line == "/exit", line == "/bye":
 			return nil
 		case strings.HasPrefix(line, "/"):
 			args := strings.Fields(line)
 			isFile := false

-			if opts.MultiModal {
+			if multiModal {
 				for _, f := range extractFileNames(line) {
 					if strings.HasPrefix(f, args[0]) {
 						isFile = true
@@ -502,23 +411,34 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 		if sb.Len() > 0 && multiline == MultilineNone {
 			newMessage := api.Message{Role: "user", Content: sb.String()}

-			if opts.MultiModal {
+			if multiModal {
 				msg, images, err := extractFileData(sb.String())
 				if err != nil {
 					return err
 				}
+				newMessage.Content = msg

-				// clear all previous images for better responses
+				// reset the context if we find another image
 				if len(images) > 0 {
-					for i := range opts.Messages {
-						opts.Messages[i].Images = nil
+					newMessage.Images = append(newMessage.Images, images...)
+					// reset the context for the new image
+					opts.Messages = []api.Message{}
+				} else {
+					if len(opts.Messages) > 1 {
+						newMessage.Images = append(newMessage.Images, opts.Messages[len(opts.Messages)-2].Images...)
 					}
 				}
-
-				newMessage.Content = msg
-				newMessage.Images = images
+				if len(newMessage.Images) == 0 {
+					fmt.Println("This model requires you to add a jpeg, png, or svg image.")
+					fmt.Println()
+					sb.Reset()
+					continue
+				}
 			}

+			if opts.System != "" {
+				opts.Messages = append(opts.Messages, api.Message{Role: "system", Content: opts.System})
+			}
 			opts.Messages = append(opts.Messages, newMessage)

 			assistant, err := chat(cmd, opts)
@@ -534,38 +454,6 @@ func generateInteractive(cmd *cobra.Command, opts runOptions) error {
 	}
 }

-func buildModelfile(opts runOptions) string {
-	var mf strings.Builder
-	model := opts.ParentModel
-	if model == "" {
-		model = opts.Model
-	}
-	fmt.Fprintf(&mf, "FROM %s\n", model)
-	if opts.System != "" {
-		fmt.Fprintf(&mf, "SYSTEM \"\"\"%s\"\"\"\n", opts.System)
-	}
-
-	if opts.Template != "" {
-		fmt.Fprintf(&mf, "TEMPLATE \"\"\"%s\"\"\"\n", opts.Template)
-	}
-
-	keys := make([]string, 0)
-	for k := range opts.Options {
-		keys = append(keys, k)
-	}
-	sort.Strings(keys)
-	for _, k := range keys {
-		fmt.Fprintf(&mf, "PARAMETER %s %v\n", k, opts.Options[k])
-	}
-	fmt.Fprintln(&mf)
-
-	for _, msg := range opts.Messages {
-		fmt.Fprintf(&mf, "MESSAGE %s \"\"\"%s\"\"\"\n", msg.Role, msg.Content)
-	}
-
-	return mf.String()
-}
-
 func normalizeFilePath(fp string) string {
 	// Define a map of escaped characters and their replacements
 	replacements := map[string]string{
@@ -612,10 +500,10 @@ func extractFileData(input string) (string, []api.ImageData, error) {
 			if os.IsNotExist(err) {
 				continue
 			}
-			fmt.Fprintf(os.Stderr, "Couldn't process image: %q\n", err)
+			fmt.Printf("Couldn't process image: %q\n", err)
 			return "", imgs, err
 		}
-		fmt.Fprintf(os.Stderr, "Added image '%s'\n", nfp)
+		fmt.Printf("Added image '%s'\n", nfp)
 		input = strings.ReplaceAll(input, fp, "")
 		imgs = append(imgs, data)
 	}
@@ -636,7 +524,7 @@ func getImageData(filePath string) ([]byte, error) {
 	}

 	contentType := http.DetectContentType(buf)
-	allowedTypes := []string{"image/jpeg", "image/jpg", "image/png"}
+	allowedTypes := []string{"image/jpeg", "image/jpg", "image/svg+xml", "image/png"}
 	if !slices.Contains(allowedTypes, contentType) {
 		return nil, fmt.Errorf("invalid image type: %s", contentType)
 	}
--- a/cmd/interactive_test.go
+++ b/cmd/interactive_test.go
@@ -1,13 +1,9 @@
 package cmd

 import (
-	"bytes"
 	"testing"
-	"text/template"

 	"github.com/stretchr/testify/assert"
-
-	"github.com/ollama/ollama/api"
 )

 func TestExtractFilenames(t *testing.T) {
@@ -53,64 +49,3 @@ d:\path with\spaces\seven.svg inbetween7 c:\users\jdoe\eight.png inbetween8
 	assert.Contains(t, res[9], "ten.svg")
 	assert.Contains(t, res[9], "E:")
 }
-
-func TestModelfileBuilder(t *testing.T) {
-	opts := runOptions{
-		Model:    "hork",
-		System:   "You are part horse and part shark, but all hork. Do horklike things",
-		Template: "This is a template.",
-		Messages: []api.Message{
-			{Role: "user", Content: "Hey there hork!"},
-			{Role: "assistant", Content: "Yes it is true, I am half horse, half shark."},
-		},
-		Options: map[string]interface{}{},
-	}
-
-	opts.Options["temperature"] = 0.9
-	opts.Options["seed"] = 42
-	opts.Options["penalize_newline"] = false
-	opts.Options["stop"] = []string{"hi", "there"}
-
-	mf := buildModelfile(opts)
-	expectedModelfile := `FROM {{.Model}}
-SYSTEM """{{.System}}"""
-TEMPLATE """{{.Template}}"""
-PARAMETER penalize_newline false
-PARAMETER seed 42
-PARAMETER stop [hi there]
-PARAMETER temperature 0.9
-
-MESSAGE user """Hey there hork!"""
-MESSAGE assistant """Yes it is true, I am half horse, half shark."""
-`
-
-	tmpl, err := template.New("").Parse(expectedModelfile)
-	assert.Nil(t, err)
-
-	var buf bytes.Buffer
-	err = tmpl.Execute(&buf, opts)
-	assert.Nil(t, err)
-	assert.Equal(t, buf.String(), mf)
-
-	opts.ParentModel = "horseshark"
-	mf = buildModelfile(opts)
-	expectedModelfile = `FROM {{.ParentModel}}
-SYSTEM """{{.System}}"""
-TEMPLATE """{{.Template}}"""
-PARAMETER penalize_newline false
-PARAMETER seed 42
-PARAMETER stop [hi there]
-PARAMETER temperature 0.9
-
-MESSAGE user """Hey there hork!"""
-MESSAGE assistant """Yes it is true, I am half horse, half shark."""
-`
-
-	tmpl, err = template.New("").Parse(expectedModelfile)
-	assert.Nil(t, err)
-
-	var parentBuf bytes.Buffer
-	err = tmpl.Execute(&parentBuf, opts)
-	assert.Nil(t, err)
-	assert.Equal(t, parentBuf.String(), mf)
-}
--- a/cmd/start_darwin.go
+++ b/cmd/start_darwin.go
@@ -1,30 +0,0 @@
-package cmd
-
-import (
-	"context"
-	"fmt"
-	"os"
-	"os/exec"
-	"strings"
-
-	"github.com/ollama/ollama/api"
-)
-
-func startApp(ctx context.Context, client *api.Client) error {
-	exe, err := os.Executable()
-	if err != nil {
-		return err
-	}
-	link, err := os.Readlink(exe)
-	if err != nil {
-		return err
-	}
-	if !strings.Contains(link, "Ollama.app") {
-		return fmt.Errorf("could not find ollama app")
-	}
-	path := strings.Split(link, "Ollama.app")
-	if err := exec.Command("/usr/bin/open", "-a", path[0]+"Ollama.app").Run(); err != nil {
-		return err
-	}
-	return waitForServer(ctx, client)
-}
--- a/cmd/start_default.go
+++ b/cmd/start_default.go
@@ -1,14 +0,0 @@
-//go:build !windows && !darwin
-
-package cmd
-
-import (
-	"context"
-	"fmt"
-
-	"github.com/ollama/ollama/api"
-)
-
-func startApp(ctx context.Context, client *api.Client) error {
-	return fmt.Errorf("could not connect to ollama server, run 'ollama serve' to start it")
-}
--- a/cmd/start_windows.go
+++ b/cmd/start_windows.go
@@ -1,58 +0,0 @@
-package cmd
-
-import (
-	"context"
-	"errors"
-	"fmt"
-	"os"
-	"os/exec"
-	"path/filepath"
-	"strings"
-	"syscall"
-
-	"github.com/ollama/ollama/api"
-)
-
-func startApp(ctx context.Context, client *api.Client) error {
-	// log.Printf("XXX Attempting to find and start ollama app")
-	AppName := "ollama app.exe"
-	exe, err := os.Executable()
-	if err != nil {
-		return err
-	}
-	appExe := filepath.Join(filepath.Dir(exe), AppName)
-	_, err = os.Stat(appExe)
-	if errors.Is(err, os.ErrNotExist) {
-		// Try the standard install location
-		localAppData := os.Getenv("LOCALAPPDATA")
-		appExe = filepath.Join(localAppData, "Ollama", AppName)
-		_, err := os.Stat(appExe)
-		if errors.Is(err, os.ErrNotExist) {
-			// Finally look in the path
-			appExe, err = exec.LookPath(AppName)
-			if err != nil {
-				return fmt.Errorf("could not locate ollama app")
-			}
-		}
-	}
-	// log.Printf("XXX attempting to start app %s", appExe)
-
-	cmd_path := "c:\\Windows\\system32\\cmd.exe"
-	cmd := exec.Command(cmd_path, "/c", appExe)
-	// TODO - these hide flags aren't working - still pops up a command window for some reason
-	cmd.SysProcAttr = &syscall.SysProcAttr{CreationFlags: 0x08000000, HideWindow: true}
-
-	// TODO this didn't help either...
-	cmd.Stdin = strings.NewReader("")
-	cmd.Stdout = os.Stdout
-	cmd.Stderr = os.Stderr
-
-	if err := cmd.Start(); err != nil {
-		return fmt.Errorf("unable to start ollama app %w", err)
-	}
-
-	if cmd.Process != nil {
-		defer cmd.Process.Release() //nolint:errcheck
-	}
-	return waitForServer(ctx, client)
-}
--- a/convert/convert.go
+++ b/convert/convert.go
@@ -1,433 +0,0 @@
-package convert
-
-import (
-	"bytes"
-	"cmp"
-	"encoding/binary"
-	"encoding/json"
-	"fmt"
-	"io"
-	"log/slog"
-	"os"
-	"path/filepath"
-	"regexp"
-	"slices"
-
-	"github.com/d4l3k/go-bfloat16"
-	"github.com/mitchellh/mapstructure"
-	"github.com/x448/float16"
-	"google.golang.org/protobuf/proto"
-
-	"github.com/ollama/ollama/convert/sentencepiece"
-	"github.com/ollama/ollama/llm"
-)
-
-type Params struct {
-	Architectures    []string `json:"architectures"`
-	VocabSize        int      `json:"vocab_size"`
-	HiddenSize       int      `json:"hidden_size"`       // n_embd
-	HiddenLayers     int      `json:"num_hidden_layers"` // n_layer
-	ContextSize      int      `json:"max_position_embeddings"`
-	IntermediateSize int      `json:"intermediate_size"`
-	AttentionHeads   int      `json:"num_attention_heads"` // n_head
-	KeyValHeads      int      `json:"num_key_value_heads"`
-	NormEPS          float64  `json:"rms_norm_eps"`
-	BoSTokenID       int      `json:"bos_token_id"`
-	EoSTokenID       int      `json:"eos_token_id"`
-	HeadDimension    int      `json:"head_dim"`
-	PaddingTokenID   int      `json:"pad_token_id"`
-
-	ByteOrder
-}
-
-type ByteOrder interface {
-	binary.ByteOrder
-	binary.AppendByteOrder
-}
-
-type MetaData struct {
-	Type    string `mapstructure:"dtype"`
-	Shape   []int  `mapstructure:"shape"`
-	Offsets []int  `mapstructure:"data_offsets"`
-}
-
-type ModelArch interface {
-	GetTensors() error
-	LoadVocab() error
-	WriteGGUF() (string, error)
-}
-
-type ModelData struct {
-	Path    string
-	Name    string
-	Params  *Params
-	Vocab   *Vocab
-	Tensors []llm.Tensor
-}
-
-func ReadSafeTensors(fn string, offset uint64, params *Params) ([]llm.Tensor, uint64, error) {
-	f, err := os.Open(fn)
-	if err != nil {
-		return nil, 0, err
-	}
-	defer f.Close()
-
-	var jsonSize uint64
-	if err := binary.Read(f, binary.LittleEndian, &jsonSize); err != nil {
-		return nil, 0, err
-	}
-
-	buf := make([]byte, jsonSize)
-	_, err = io.ReadFull(f, buf)
-	if err != nil {
-		return nil, 0, err
-	}
-
-	d := json.NewDecoder(bytes.NewBuffer(buf))
-	d.UseNumber()
-	var parsed map[string]interface{}
-	if err = d.Decode(&parsed); err != nil {
-		return nil, 0, err
-	}
-
-	var keys []string
-	for k := range parsed {
-		keys = append(keys, k)
-	}
-
-	slices.Sort(keys)
-
-	slog.Info("converting layers")
-
-	var tensors []llm.Tensor
-	for _, k := range keys {
-		vals := parsed[k].(map[string]interface{})
-		var data MetaData
-		if err = mapstructure.Decode(vals, &data); err != nil {
-			return nil, 0, err
-		}
-
-		var size uint64
-		var kind uint32
-		switch len(data.Shape) {
-		case 0:
-			// metadata
-			continue
-		case 1:
-			// convert to float32
-			kind = 0
-			size = uint64(data.Shape[0] * 4)
-		case 2:
-			// convert to float16
-			kind = 1
-			size = uint64(data.Shape[0] * data.Shape[1] * 2)
-		}
-
-		ggufName, err := GetTensorName(k)
-		if err != nil {
-			slog.Error("%v", err)
-			return nil, 0, err
-		}
-
-		shape := []uint64{0, 0, 0, 0}
-		for i := range data.Shape {
-			shape[i] = uint64(data.Shape[i])
-		}
-
-		t := llm.Tensor{
-			Name:   ggufName,
-			Kind:   kind,
-			Offset: offset,
-			Shape:  shape[:],
-		}
-
-		t.WriterTo = safetensorWriterTo{
-			t:        &t,
-			params:   params,
-			bo:       params.ByteOrder,
-			filename: fn,
-			start:    uint64(data.Offsets[0]),
-			end:      uint64(data.Offsets[1]),
-			padding:  8 + jsonSize,
-		}
-
-		slog.Debug(fmt.Sprintf("%v", t))
-		tensors = append(tensors, t)
-		offset += size
-	}
-	return tensors, offset, nil
-}
-
-func GetSafeTensors(dirpath string, params *Params) ([]llm.Tensor, error) {
-	var tensors []llm.Tensor
-	files, err := filepath.Glob(filepath.Join(dirpath, "/model-*.safetensors"))
-	if err != nil {
-		return nil, err
-	}
-
-	var offset uint64
-	for _, f := range files {
-		var t []llm.Tensor
-		var err error
-		t, offset, err = ReadSafeTensors(f, offset, params)
-		if err != nil {
-			slog.Error("%v", err)
-			return nil, err
-		}
-		tensors = append(tensors, t...)
-	}
-	return tensors, nil
-}
-
-func GetParams(dirpath string) (*Params, error) {
-	f, err := os.Open(filepath.Join(dirpath, "config.json"))
-	if err != nil {
-		return nil, err
-	}
-	defer f.Close()
-
-	var params Params
-
-	d := json.NewDecoder(f)
-	err = d.Decode(&params)
-	if err != nil {
-		return nil, err
-	}
-
-	params.ByteOrder = binary.LittleEndian
-	return &params, nil
-}
-
-// Details on gguf's tokenizer can be found at:
-// https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#tokenizer
-type Vocab struct {
-	Tokens []string
-	Scores []float32
-	Types  []int32
-}
-
-func LoadSentencePieceTokens(dirpath string, vocabSize int) (*Vocab, error) {
-	slog.Info(fmt.Sprintf("reading vocab from %s", filepath.Join(dirpath, "tokenizer.model")))
-	in, err := os.ReadFile(filepath.Join(dirpath, "tokenizer.model"))
-	if err != nil {
-		return nil, err
-	}
-
-	// To regenerate sentencepiece from the protobufs use:
-	// protoc -I=./ --go_out=./ sentencepiece_model.proto
-	modelProto := &sentencepiece.ModelProto{}
-	if err := proto.Unmarshal(in, modelProto); err != nil {
-		return nil, err
-	}
-
-	v := &Vocab{
-		Tokens: make([]string, 0),
-		Scores: make([]float32, 0),
-		Types:  make([]int32, 0),
-	}
-
-	pieces := modelProto.GetPieces()
-	for _, p := range pieces {
-		v.Tokens = append(v.Tokens, p.GetPiece())
-		v.Scores = append(v.Scores, p.GetScore())
-		t := p.GetType()
-		switch t {
-		case sentencepiece.ModelProto_SentencePiece_UNKNOWN:
-		case sentencepiece.ModelProto_SentencePiece_CONTROL:
-		case sentencepiece.ModelProto_SentencePiece_UNUSED:
-		case sentencepiece.ModelProto_SentencePiece_BYTE:
-		default:
-			t = sentencepiece.ModelProto_SentencePiece_NORMAL
-		}
-		v.Types = append(v.Types, int32(t))
-	}
-
-	slog.Info(fmt.Sprintf("vocab size: %d", len(v.Tokens)))
-
-	// add any additional tokens
-	addIn, err := os.ReadFile(filepath.Join(dirpath, "added_tokens.json"))
-	if os.IsNotExist(err) {
-		return v, nil
-	} else if err != nil {
-		return nil, err
-	}
-
-	slog.Info("reading user defined tokens")
-
-	var extraTokenData map[string]int
-	if err := json.Unmarshal(addIn, &extraTokenData); err != nil {
-		return nil, err
-	}
-
-	type token struct {
-		key string
-		pos int
-	}
-
-	extraTokens := make([]token, 0)
-	for k, id := range extraTokenData {
-		extraTokens = append(extraTokens, token{k, id})
-	}
-
-	slices.SortFunc(extraTokens, func(a, b token) int {
-		return cmp.Compare(a.pos, b.pos)
-	})
-
-	numToks := len(v.Tokens)
-
-	for cnt, t := range extraTokens {
-		// the token id should match the specific index for the total number of tokens
-		if t.pos != cnt+numToks {
-			return nil, fmt.Errorf("token ID '%d' for '%s' doesn't match total token size", t.pos, t.key)
-		}
-		v.Tokens = append(v.Tokens, t.key)
-		v.Scores = append(v.Scores, -1000.0)
-		v.Types = append(v.Types, int32(llm.GGUFTokenUserDefined))
-	}
-	slog.Info(fmt.Sprintf("vocab size w/ extra tokens: %d", len(v.Tokens)))
-
-	if vocabSize > len(v.Tokens) {
-		missingTokens := vocabSize - len(v.Tokens)
-		slog.Warn(fmt.Sprintf("vocab is missing %d tokens", missingTokens))
-		for cnt := 0; cnt < missingTokens; cnt++ {
-			v.Tokens = append(v.Tokens, fmt.Sprintf("<dummy%05d>", cnt+1))
-			v.Scores = append(v.Scores, -1)
-			v.Types = append(v.Types, int32(llm.GGUFTokenUserDefined))
-		}
-	}
-
-	return v, nil
-}
-
-func GetTensorName(n string) (string, error) {
-	tMap := map[string]string{
-		"model.embed_tokens.weight":                           "token_embd.weight",
-		"model.layers.(\\d+).input_layernorm.weight":          "blk.$1.attn_norm.weight",
-		"model.layers.(\\d+).mlp.down_proj.weight":            "blk.$1.ffn_down.weight",
-		"model.layers.(\\d+).mlp.gate_proj.weight":            "blk.$1.ffn_gate.weight",
-		"model.layers.(\\d+).mlp.up_proj.weight":              "blk.$1.ffn_up.weight",
-		"model.layers.(\\d+).post_attention_layernorm.weight": "blk.$1.ffn_norm.weight",
-		"model.layers.(\\d+).self_attn.k_proj.weight":         "blk.$1.attn_k.weight",
-		"model.layers.(\\d+).self_attn.o_proj.weight":         "blk.$1.attn_output.weight",
-		"model.layers.(\\d+).self_attn.q_proj.weight":         "blk.$1.attn_q.weight",
-		"model.layers.(\\d+).self_attn.v_proj.weight":         "blk.$1.attn_v.weight",
-		"lm_head.weight":    "output.weight",
-		"model.norm.weight": "output_norm.weight",
-	}
-
-	v, ok := tMap[n]
-	if ok {
-		return v, nil
-	}
-
-	// quick hack to rename the layers to gguf format
-	for k, v := range tMap {
-		re := regexp.MustCompile(k)
-		newName := re.ReplaceAllString(n, v)
-		if newName != n {
-			return newName, nil
-		}
-	}
-
-	return "", fmt.Errorf("couldn't find a layer name for '%s'", n)
-}
-
-type safetensorWriterTo struct {
-	t *llm.Tensor
-
-	params *Params
-	bo     ByteOrder
-
-	filename string
-
-	start, end, padding uint64
-	handler             func(w io.Writer, r safetensorWriterTo, f *os.File) error
-}
-
-func (r safetensorWriterTo) WriteTo(w io.Writer) (n int64, err error) {
-	f, err := os.Open(r.filename)
-	if err != nil {
-		return 0, err
-	}
-	defer f.Close()
-
-	if _, err = f.Seek(int64(r.padding+r.start), 0); err != nil {
-		return 0, err
-	}
-
-	// use the handler if one is present
-	if r.handler != nil {
-		return 0, r.handler(w, r, f)
-	}
-
-	remaining := r.end - r.start
-
-	bufSize := uint64(10240)
-	var finished bool
-	for {
-		data := make([]byte, min(bufSize, remaining))
-
-		b, err := io.ReadFull(f, data)
-		remaining -= uint64(b)
-
-		if err == io.EOF || remaining <= 0 {
-			finished = true
-		} else if err != nil {
-			return 0, err
-		}
-
-		// convert bfloat16 -> ieee float32
-		tDataF32 := bfloat16.DecodeFloat32(data)
-
-		switch r.t.Kind {
-		case 0:
-			if err := binary.Write(w, r.bo, tDataF32); err != nil {
-				return 0, err
-			}
-		case 1:
-			// convert float32 -> float16
-			tempBuf := make([]uint16, len(data)/2)
-			for cnt, v := range tDataF32 {
-				tDataF16 := float16.Fromfloat32(v)
-				tempBuf[cnt] = uint16(tDataF16)
-			}
-			if err := binary.Write(w, binary.LittleEndian, tempBuf); err != nil {
-				return 0, err
-			}
-		}
-		if finished {
-			break
-		}
-	}
-	return 0, nil
-}
-
-func GetModelArchFromParams(name, dirPath string, params *Params) (ModelArch, error) {
-	switch len(params.Architectures) {
-	case 0:
-		return nil, fmt.Errorf("No architecture specified to convert")
-	case 1:
-		switch params.Architectures[0] {
-		case "MistralForCausalLM":
-			return &MistralModel{
-				ModelData{
-					Name:   name,
-					Path:   dirPath,
-					Params: params,
-				},
-			}, nil
-		case "GemmaForCausalLM":
-			return &GemmaModel{
-				ModelData{
-					Name:   name,
-					Path:   dirPath,
-					Params: params,
-				},
-			}, nil
-		default:
-			return nil, fmt.Errorf("Models based on '%s' are not yet supported", params.Architectures[0])
-		}
-	}
-
-	return nil, fmt.Errorf("Unknown error")
-}
--- a/convert/gemma.go
+++ b/convert/gemma.go
@@ -1,136 +0,0 @@
-package convert
-
-import (
-	"encoding/binary"
-	"fmt"
-	"io"
-	"log/slog"
-	"os"
-	"strings"
-
-	"github.com/d4l3k/go-bfloat16"
-	"github.com/pdevine/tensor"
-	"github.com/pdevine/tensor/native"
-
-	"github.com/ollama/ollama/llm"
-)
-
-type GemmaModel struct {
-	ModelData
-}
-
-func gemmaLayerHandler(w io.Writer, r safetensorWriterTo, f *os.File) error {
-	slog.Debug(fmt.Sprintf("converting '%s'", r.t.Name))
-
-	data := make([]byte, r.end-r.start)
-	if err := binary.Read(f, r.bo, data); err != nil {
-		return err
-	}
-
-	tDataF32 := bfloat16.DecodeFloat32(data)
-
-	var err error
-	tDataF32, err = addOnes(tDataF32, int(r.t.Shape[0]))
-	if err != nil {
-		return err
-	}
-
-	if err := binary.Write(w, r.bo, tDataF32); err != nil {
-		return err
-	}
-	return nil
-}
-
-func addOnes(data []float32, vectorSize int) ([]float32, error) {
-	n := tensor.New(tensor.WithShape(vectorSize), tensor.WithBacking(data))
-	ones := tensor.Ones(tensor.Float32, vectorSize)
-
-	var err error
-	n, err = n.Add(ones)
-	if err != nil {
-		return []float32{}, err
-	}
-
-	newN, err := native.SelectF32(n, 0)
-	if err != nil {
-		return []float32{}, err
-	}
-
-	var fullTensor []float32
-	for _, v := range newN {
-		fullTensor = append(fullTensor, v...)
-	}
-
-	return fullTensor, nil
-}
-
-func (m *GemmaModel) GetTensors() error {
-	t, err := GetSafeTensors(m.Path, m.Params)
-	if err != nil {
-		return err
-	}
-
-	m.Tensors = []llm.Tensor{}
-
-	for _, l := range t {
-		if strings.HasSuffix(l.Name, "norm.weight") {
-			wt := l.WriterTo.(safetensorWriterTo)
-			wt.handler = gemmaLayerHandler
-			l.WriterTo = wt
-		}
-		m.Tensors = append(m.Tensors, l)
-	}
-
-	return nil
-}
-
-func (m *GemmaModel) LoadVocab() error {
-	v, err := LoadSentencePieceTokens(m.Path, m.Params.VocabSize)
-	if err != nil {
-		return err
-	}
-	m.Vocab = v
-	return nil
-}
-
-func (m *GemmaModel) WriteGGUF() (string, error) {
-	kv := llm.KV{
-		"general.architecture":                   "gemma",
-		"general.name":                           m.Name,
-		"gemma.context_length":                   uint32(m.Params.ContextSize),
-		"gemma.embedding_length":                 uint32(m.Params.HiddenSize),
-		"gemma.block_count":                      uint32(m.Params.HiddenLayers),
-		"gemma.feed_forward_length":              uint32(m.Params.IntermediateSize),
-		"gemma.attention.head_count":             uint32(m.Params.AttentionHeads),
-		"gemma.attention.head_count_kv":          uint32(m.Params.KeyValHeads),
-		"gemma.attention.layer_norm_rms_epsilon": float32(m.Params.NormEPS),
-		"gemma.attention.key_length":             uint32(m.Params.HeadDimension),
-		"gemma.attention.value_length":           uint32(m.Params.HeadDimension),
-		"general.file_type":                      uint32(1),
-		"tokenizer.ggml.model":                   "llama",
-
-		"tokenizer.ggml.tokens":     m.Vocab.Tokens,
-		"tokenizer.ggml.scores":     m.Vocab.Scores,
-		"tokenizer.ggml.token_type": m.Vocab.Types,
-
-		"tokenizer.ggml.bos_token_id":     uint32(m.Params.BoSTokenID),
-		"tokenizer.ggml.eos_token_id":     uint32(m.Params.EoSTokenID),
-		"tokenizer.ggml.padding_token_id": uint32(m.Params.PaddingTokenID),
-		"tokenizer.ggml.unknown_token_id": uint32(3),
-		"tokenizer.ggml.add_bos_token":    true,
-		"tokenizer.ggml.add_eos_token":    false,
-	}
-
-	f, err := os.CreateTemp("", "ollama-gguf")
-	if err != nil {
-		return "", err
-	}
-	defer f.Close()
-
-	mod := llm.NewGGUFV3(m.Params.ByteOrder)
-	if err := mod.Encode(f, kv, m.Tensors); err != nil {
-		return "", err
-	}
-
-	return f.Name(), nil
-}
--- a/convert/mistral.go
+++ b/convert/mistral.go
@@ -1,173 +0,0 @@
-package convert
-
-import (
-	"encoding/binary"
-	"fmt"
-	"io"
-	"os"
-	"regexp"
-	"strings"
-
-	"github.com/d4l3k/go-bfloat16"
-	"github.com/pdevine/tensor"
-	"github.com/pdevine/tensor/native"
-	"github.com/x448/float16"
-
-	"github.com/ollama/ollama/llm"
-)
-
-type MistralModel struct {
-	ModelData
-}
-
-func mistralLayerHandler(w io.Writer, r safetensorWriterTo, f *os.File) error {
-	layerSize := r.end - r.start
-
-	var err error
-	tData := make([]uint16, layerSize/2)
-	if err = binary.Read(f, r.bo, tData); err != nil {
-		return err
-	}
-
-	var heads uint32
-	if strings.Contains(r.t.Name, "attn_q") {
-		heads = uint32(r.params.AttentionHeads)
-	} else if strings.Contains(r.t.Name, "attn_k") {
-		heads = uint32(r.params.KeyValHeads)
-		if heads == 0 {
-			heads = uint32(r.params.AttentionHeads)
-		}
-	} else {
-		return fmt.Errorf("unknown layer type")
-	}
-
-	tData, err = repack(tData, int(heads), r.t.Shape)
-	if err != nil {
-		return err
-	}
-
-	var buf []byte
-	for _, n := range tData {
-		buf = r.bo.AppendUint16(buf, n)
-	}
-
-	tempBuf := make([]uint16, len(tData))
-	tDataF32 := bfloat16.DecodeFloat32(buf)
-	for cnt, v := range tDataF32 {
-		tDataF16 := float16.Fromfloat32(v)
-		tempBuf[cnt] = uint16(tDataF16)
-	}
-
-	if err = binary.Write(w, r.bo, tempBuf); err != nil {
-		return err
-	}
-	return nil
-}
-
-func repack(data []uint16, heads int, shape []uint64) ([]uint16, error) {
-	n := tensor.New(tensor.WithShape(int(shape[0]), int(shape[1])), tensor.WithBacking(data))
-	origShape := n.Shape().Clone()
-
-	// reshape the tensor and swap axes 1 and 2 to unpack the layer for gguf
-	if err := n.Reshape(heads, 2, origShape[0]/heads/2, origShape[1]); err != nil {
-		return nil, err
-	}
-
-	if err := n.T(0, 2, 1, 3); err != nil {
-		return nil, err
-	}
-
-	if err := n.Reshape(origShape...); err != nil {
-		return nil, err
-	}
-
-	if err := n.Transpose(); err != nil {
-		return nil, err
-	}
-	newN, err := native.SelectU16(n, 1)
-	if err != nil {
-		return nil, err
-	}
-
-	var fullTensor []uint16
-	for _, v := range newN {
-		fullTensor = append(fullTensor, v...)
-	}
-	return fullTensor, nil
-}
-
-func (m *MistralModel) GetTensors() error {
-	t, err := GetSafeTensors(m.Path, m.Params)
-	if err != nil {
-		return err
-	}
-
-	m.Tensors = []llm.Tensor{}
-
-	pattern := `^blk\.[0-9]+\.attn_(?P<layer>q|k)\.weight$`
-	re, err := regexp.Compile(pattern)
-	if err != nil {
-		return err
-	}
-
-	for _, l := range t {
-		matches := re.FindAllStringSubmatch(l.Name, -1)
-		if len(matches) > 0 {
-			wt := l.WriterTo.(safetensorWriterTo)
-			wt.handler = mistralLayerHandler
-			l.WriterTo = wt
-		}
-		m.Tensors = append(m.Tensors, l)
-	}
-
-	return nil
-}
-
-func (m *MistralModel) LoadVocab() error {
-	v, err := LoadSentencePieceTokens(m.Path, m.Params.VocabSize)
-	if err != nil {
-		return err
-	}
-	m.Vocab = v
-	return nil
-}
-
-func (m *MistralModel) WriteGGUF() (string, error) {
-	kv := llm.KV{
-		"general.architecture":                   "llama",
-		"general.name":                           m.Name,
-		"llama.context_length":                   uint32(m.Params.ContextSize),
-		"llama.embedding_length":                 uint32(m.Params.HiddenSize),
-		"llama.block_count":                      uint32(m.Params.HiddenLayers),
-		"llama.feed_forward_length":              uint32(m.Params.IntermediateSize),
-		"llama.rope.dimension_count":             uint32(m.Params.HiddenSize / m.Params.AttentionHeads),
-		"llama.attention.head_count":             uint32(m.Params.AttentionHeads),
-		"llama.attention.head_count_kv":          uint32(m.Params.KeyValHeads),
-		"llama.attention.layer_norm_rms_epsilon": float32(m.Params.NormEPS),
-		"general.file_type":                      uint32(1),
-		"tokenizer.ggml.model":                   "llama",
-
-		"tokenizer.ggml.tokens":     m.Vocab.Tokens,
-		"tokenizer.ggml.scores":     m.Vocab.Scores,
-		"tokenizer.ggml.token_type": m.Vocab.Types,
-
-		"tokenizer.ggml.bos_token_id":     uint32(m.Params.BoSTokenID),
-		"tokenizer.ggml.eos_token_id":     uint32(m.Params.EoSTokenID),
-		"tokenizer.ggml.add_bos_token":    true,
-		"tokenizer.ggml.add_eos_token":    false,
-		"tokenizer.ggml.unknown_token_id": uint32(0),
-	}
-
-	f, err := os.CreateTemp("", "ollama-gguf")
-	if err != nil {
-		return "", err
-	}
-	defer f.Close()
-
-	mod := llm.NewGGUFV3(m.Params.ByteOrder)
-	if err := mod.Encode(f, kv, m.Tensors); err != nil {
-		return "", err
-	}
-
-	return f.Name(), nil
-}
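The comment in `mistralLayerHandler` says each q/k attention weight is reshaped to `(heads, 2, rows/heads/2, cols)` and has axes 1 and 2 swapped "to unpack the layer for gguf". The sketch below models only that net row reordering on a plain row-major `[]uint16` (raw f16 bits). It does not use the gorgonia tensor calls from the handler. `permuteQK` is a hypothetical name, and this is an illustration of the reshape/swap described in the comment, not the removed implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// permuteQK is a hypothetical helper. It reorders the rows of a row-major
// [rows x cols] matrix by viewing the rows as (heads, 2, half), where
// half = rows/heads/2. It then swaps the last two of those axes to get
// (heads, half, 2) and flattens the result. Within each head, rows from
// the first and second halves end up interleaved.
func permuteQK(data []uint16, rows, cols, heads int) ([]uint16, error) {
	if rows <= 0 || cols <= 0 || heads <= 0 {
		return nil, errors.New("rows, cols and heads must be positive")
	}
	if len(data) != rows*cols {
		return nil, fmt.Errorf("expected %d elements, got %d", rows*cols, len(data))
	}
	if rows%(2*heads) != 0 {
		return nil, fmt.Errorf("rows (%d) not divisible by 2*heads (%d)", rows, 2*heads)
	}

	half := rows / heads / 2
	out := make([]uint16, len(data))
	for h := 0; h < heads; h++ {
		for i := 0; i < half; i++ {
			for j := 0; j < 2; j++ {
				src := (h*2+j)*half + i // row index in the (heads, 2, half) view
				dst := (h*half+i)*2 + j // row index in the (heads, half, 2) view
				copy(out[dst*cols:(dst+1)*cols], data[src*cols:(src+1)*cols])
			}
		}
	}
	return out, nil
}

func main() {
	// One head, four single-column rows.
	got, err := permuteQK([]uint16{0, 1, 2, 3}, 4, 1, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [0 2 1 3]
}
```

The permutation is done with index arithmetic because it only moves whole rows. No tensor library is needed to reproduce it.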
--- a/convert/sentencepiece/sentencepiece_model.pb.go
+++ b/convert/sentencepiece/sentencepiece_model.pb.go
--- a/convert/sentencepiece_model.proto
+++ b/convert/sentencepiece_model.proto
@@ -1,333 +0,0 @@
-// Copyright 2016 Google Inc.
-//
-// Licensed under the Apache License, Version 2.0 (the "License");
-// you may not use this file except in compliance with the License.
-// You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.!
-
-syntax = "proto2";
-
-// TODO(taku): Needs to use LITE RUNTIME in OSS release.
-option optimize_for = LITE_RUNTIME;
-option go_package = "./sentencepiece";
-
-package sentencepiece;
-
-// TrainerSpec encodes a various parameters for SentencePiece training.
-// Next id: 55
-message TrainerSpec {
-  ///////////////////////////////////////////////////////////////////
-  // General parameters
-  //
-  // Input corpus files.
-  //  Trainer accepts the following two formats:
-  //  A) Monolingual: plain text, one sentence per line.
-  //  B) Bilingual:   TSV, source sentence <tab> target sentence
-  //  When bilingual data is passed, shared vocabulary model is built.
-  //  Note that the input file must be raw corpus, not a preprocessed corpus.
-  //  Trainer only loads the first `input_sentence_size` sentences specified
-  //  with this parameter.
-  repeated string input = 1;
-
-  // Input corpus format:
-  // "text": one-sentence-per-line text format (default)
-  // "tsv":  sentence <tab> freq
-  optional string input_format = 7;
-
-  // Output model file prefix.
-  // <model_prefix>.model and <model_prefix>.vocab are generated.
-  optional string model_prefix = 2;
-
-  // Model type. only have UNIGRAM now.
-  enum ModelType {
-    UNIGRAM = 1;  // Unigram language model with dynamic algorithm
-    BPE = 2;      // Byte Pair Encoding
-    WORD = 3;     // Delimitered by whitespace.
-    CHAR = 4;     // tokenizes into character sequence
-  }
-  optional ModelType model_type = 3 [default = UNIGRAM];
-
-  // Vocabulary size. 8k is the default size.
-  optional int32 vocab_size = 4 [default = 8000];
-
-  // List of the languages this model can accept.
-  // Since the model is language-agnostic, this field is used as a reference.
-  repeated string accept_language = 5;
-
-  // Size of self-test samples, which are encoded in the model file.
-  optional int32 self_test_sample_size = 6 [default = 0];
-
-  // Whether to use DP version of sentencepiece. Use it with TSV input format
-  // (requires precomputed word tab counts to work).
-  optional bool enable_differential_privacy = 50 [default = false];
-  // Set these parameters if you need DP version of sentencepiece.
-  // std of noise to add.
-  optional float differential_privacy_noise_level = 51 [default = 0.0];
-  // Clipping threshold to apply after adding noise. All the words with
-  // frequency less than this value are dropped.
-  optional uint64 differential_privacy_clipping_threshold = 52 [default = 0];
-
-  ///////////////////////////////////////////////////////////////////
-  // Training parameters.
-  //
-  // Uses characters which cover the corpus with the ratio of `chars_coverage`.
-  // This parameter determines the set of basic Alphabet of sentence piece.
-  // 1.0 - `chars_coverage` characters are treated as UNK.
-  // See also required_chars field.
-  optional float character_coverage = 10 [default = 0.9995];
-
-  // Maximum size of sentences the trainer loads from `input` parameter.
-  // Trainer simply loads the `input` files in sequence.
-  // It is better to shuffle the input corpus randomly.
-  optional uint64 input_sentence_size = 11 [default = 0];
-  optional bool shuffle_input_sentence = 19 [default = true];
-
-  // Maximum size of sentences to make seed sentence pieces.
-  // Extended suffix array is constructed to extract frequent
-  // sub-strings from the corpus. This uses 20N working space,
-  // where N is the size of corpus.
-  optional int32 mining_sentence_size = 12 [deprecated = true];
-
-  // Maximum size of sentences to train sentence pieces.
-  optional int32 training_sentence_size = 13 [deprecated = true];
-
-  // The size of seed sentencepieces.
-  // `seed_sentencepiece_size` must be larger than `vocab_size`.
-  optional int32 seed_sentencepiece_size = 14 [default = 1000000];
-
-  // In every EM sub-iterations, keeps top
-  // `shrinking_factor` * `current sentencepieces size` with respect to
-  // the loss of the sentence piece. This value should be smaller than 1.0.
-  optional float shrinking_factor = 15 [default = 0.75];
-
-  // The maximum sentence length in byte. The sentences with the length
-  // larger than `max_sentence_length` is simply ignored.
-  // Longer input tends to bring the following risks:
-  //  * Overflow during EM training (unigram language model only)
-  //  * Performance drop because of O(n log n) cost in BPE.
-  optional int32 max_sentence_length = 18 [default = 4192];
-
-  // Number of threads in the training.
-  optional int32 num_threads = 16 [default = 16];
-
-  // Number of EM sub iterations.
-  optional int32 num_sub_iterations = 17 [default = 2];
-
-  ///////////////////////////////////////////////////////////////////
-  // SentencePiece parameters which control the shapes of sentence piece.
-  //
-  // Maximum length of sentencepiece.
-  optional int32 max_sentencepiece_length = 20 [default = 16];
-
-  // Uses Unicode script to split sentence pieces.
-  // When `split_by_unicode_script` is true, we do not allow sentence piece to
-  // include multiple Unicode scripts, e.g. "F1" is not a valid piece.
-  // Exception: CJ characters (Hiragana/Katakana/Han) are all handled
-  // as one script type, since Japanese word can consist of multiple scripts.
-  // This exception is always applied regardless of the accept-language
-  // parameter.
-  optional bool split_by_unicode_script = 21 [default = true];
-
-  // When `split_by_number` is true, put a boundary between number and
-  // non-number transition. If we want to treat "F1" is one token, set this flag
-  // to be false.
-  optional bool split_by_number = 23 [default = true];
-
-  // Use a white space to split sentence pieces.
-  // When `split_by_whitespace` is false, we may have the piece containing
-  // a white space in the middle. e.g., "in_the".
-  optional bool split_by_whitespace = 22 [default = true];
-
-  // Adds whitespace symbol (_) as a suffix instead of prefix. e.g., _hello =>
-  // hello_. When `treat_whitespace_as_suffix` is true,
-  // NormalizerSpec::add_dummy_prefix will add the dummy whitespace to the end
-  // of sentence.
-  optional bool treat_whitespace_as_suffix = 24 [default = false];
-
-  // Allows pieces that only contain whitespaces instead of appearing only as
-  // prefix or suffix of other pieces.
-  optional bool allow_whitespace_only_pieces = 26 [default = false];
-
-  // Split all digits (0-9) into separate pieces.
-  optional bool split_digits = 25 [default = false];
-
-  // Defines the pre-tokenization delimiter.
-  // When specified, no pieces crossing this delimiter is not included
-  // in the vocab. Then the delimiter string is virtually ignored
-  // during the training. This field can allows constraints on the vocabulary
-  // selection. Note that this field is available on unigram mode.
-  optional string pretokenization_delimiter = 53 [ default = ""];
-
-  ///////////////////////////////////////////////////////////////////
-  // Vocabulary management
-  //
-  // Defines control symbols used as an indicator to
-  // change the behavior of the decoder. <s> and </s> are pre-defined.
-  // We can use this field to encode various meta information,
-  // including language indicator in multilingual model.
-  // These symbols are not visible to users, but visible to
-  // the decoder. Note that when the input sentence contains control symbols,
-  // they are not treated as one token, but segmented into normal pieces.
-  // Control symbols must be inserted independently from the segmentation.
-  repeated string control_symbols = 30;
-
-  // Defines user defined symbols.
-  // These symbols are added with extremely high score
-  // so they are always treated as one unique symbol in any context.
-  // Typical usage of user_defined_symbols is placeholder for named entities.
-  repeated string user_defined_symbols = 31;
-
-  // Defines required characters. Each UTF8 character in this string is included
-  // in the character set regardless of character_coverage value. Unlike
-  // user_defined_symbols, these characters have scores based on the frequency
-  // on input sentences, and the model can form subwords using characters
-  // in this field.
-  optional string required_chars = 36;
-
-  // Decomposes unknown pieces into UTF-8 bytes.
-  optional bool byte_fallback = 35 [default = false];
-
-  // When creating the vocabulary file, defines whether or not to additionally
-  // output the score for each piece.
-  optional bool vocabulary_output_piece_score = 32 [default = true];
-
-  // `vocab_size` is treated as hard limit. Crash if
-  // the model can not produce the vocab of size `vocab_size`,
-  // When `hard_vocab_limit` is false, vocab_size is treated
-  // as soft limit. Note that when model_type=char,
-  // always assumes hard_vocab_limit = false.
-  optional bool hard_vocab_limit = 33 [default = true];
-
-  // use all symbols for vocab extraction. This flag is valid
-  // if model type is either CHAR or WORD
-  optional bool use_all_vocab = 34 [default = false];
-
-  ///////////////////////////////////////////////////////////////////
-  // Reserved special meta tokens.
-  // * -1 is not used.
-  // * unk_id must not be -1.
-  // Id must starts with 0 and be contigous.
-  optional int32 unk_id = 40 [default = 0];   // <unk>
-  optional int32 bos_id = 41 [default = 1];   // <s>
-  optional int32 eos_id = 42 [default = 2];   // </s>
-  optional int32 pad_id = 43 [default = -1];  // <pad> (padding)
-  optional string unk_piece = 45 [default = "<unk>"];
-  optional string bos_piece = 46 [default = "<s>"];
-  optional string eos_piece = 47 [default = "</s>"];
-  optional string pad_piece = 48 [default = "<pad>"];
-
-  // Encodes <unk> into U+2047 (DOUBLE QUESTION MARK),
-  // since this character can be useful both for user and
-  // developer. We can easily figure out that <unk> is emitted.
-  optional string unk_surface = 44 [default = " \xE2\x81\x87 "];
-
-  // Increase bit depth to allow unigram model training on large
-  // (>10M sentences) corpora. A Side-effect of enabling this flag
-  // is increased memory usage.
-  optional bool train_extremely_large_corpus = 49 [default = false];
-
- // Path to a seed sentencepieces file, with one tab-separated
-  // seed sentencepiece <tab> frequency per line.
-  optional string seed_sentencepieces_file = 54 [default = ""];
-
-  // Customized extensions: the range of field numbers
-  // are open to third-party extensions.
-  extensions 200 to max;
-}
-
-// NormalizerSpec encodes a various parameters for string normalizaiton
-message NormalizerSpec {
-  // name of normalization rule.
-  optional string name = 1;
-
-  // Pre-compiled normalization rule created by
-  // Builder::GetPrecompiledCharsMap() or Builder::CompileCharsMap() method.
-  // Usually this field is set by Builder::GetNormalizerSpec() method.
-  optional bytes precompiled_charsmap = 2;
-
-  // Adds dummy whitespace at the beginning of text in order to
-  // treat "world" in "world" and "hello world" in the same way.
-  optional bool add_dummy_prefix = 3 [default = true];
-
-  // Removes leading, trailing, and duplicate internal whitespace.
-  optional bool remove_extra_whitespaces = 4 [default = true];
-
-  // Replaces whitespace with meta symbol.
-  // This field must be true to train sentence piece model.
-  optional bool escape_whitespaces = 5 [default = true];
-
-  // Custom normalization rule file in TSV format.
-  // https://github.com/google/sentencepiece/blob/master/doc/normalization.md
-  // This field is only used in SentencePieceTrainer::Train() method, which
-  // compiles the rule into the binary rule stored in `precompiled_charsmap`.
-  optional string normalization_rule_tsv = 6;
-
-  // Customized extensions: the range of field numbers
-  // are open to third-party extensions.
-  extensions 200 to max;
-}
-
-// Proto to store samples for self-testing.
-message SelfTestData {
-  message Sample {
-    optional string input = 1;
-    optional string expected = 2;
-  }
-  repeated Sample samples = 1;
-
-  // Customized extensions: the range of field numbers
-  // are open to third-party extensions.
-  extensions 200 to max;
-}
-
-// ModelProto stores model parameters.
-// SentencePieceProcessor is supposed to be self-contained.
-// All settings/parameters which may change the behavior must be encoded
-// in ModelProto.
-message ModelProto {
-  message SentencePiece {
-    enum Type {
-      NORMAL = 1;        // normal symbol
-      UNKNOWN = 2;       // unknown symbol. only <unk> for now.
-      CONTROL = 3;       // control symbols. </s>, <s>, <2ja> etc.
-      USER_DEFINED = 4;  // user defined symbols.
-                         // Typical usage of USER_DEFINED symbol
-                         // is placeholder.
-      BYTE = 6;          // byte symbols. Used when `byte_fallback` is true.
-      UNUSED = 5;        // this piece is not used.
-    }
-    optional string piece = 1;  // piece must not be empty.
-    optional float score = 2;
-    optional Type type = 3 [default = NORMAL];
-
-    // Customized extensions: the range of field numbers
-    // are open to third-party extensions.
-    extensions 200 to max;
-  }
-
-  // Sentence pieces with scores.
-  repeated SentencePiece pieces = 1;
-
-  // Spec used to generate this model file.
-  optional TrainerSpec trainer_spec = 2;
-
-  // Spec for text normalization.
-  optional NormalizerSpec normalizer_spec = 3;
-
-  // Stores sample input and its expected segmentation to verify the model.
-  optional SelfTestData self_test_data = 4;
-
-  // Spec for text de-normalization.
-  optional NormalizerSpec denormalizer_spec = 5;
-
-  // Customized extensions: the range of field numbers
-  // are open to third-party extensions.
-  extensions 200 to max;
-}
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,21 +1,25 @@
 # Documentation

-### Getting Started
-* [Quickstart](../README.md#quickstart)
-* [Examples](../examples)
-* [Importing models](./import.md)
-* [Linux Documentation](./linux.md)
-* [Windows Documentation](./windows.md)
-* [Docker Documentation](https://hub.docker.com/r/ollama/ollama)
+To get started, see the project's **[quickstart](../README.md#quickstart)**.

-### Reference
+Ollama is a tool for running AI models on your hardware. Many users will choose to use the Command Line Interface (CLI) to work with Ollama. Learn more about all the commands in the CLI in the **[Main Readme](../README.md)**.

-* [API Reference](./api.md)
-* [Modelfile Reference](./modelfile.md)
-* [OpenAI Compatibility](./openai.md)
+Use the RESTful API using any language, including Python, JavaScript, Typescript, Go, Rust, and many more. Learn more about using the API in the **[API Documentation](./api.md)**.

-### Resources
+Create new models or modify models already in the library using the Modelfile. Learn more about the Modelfile syntax in the **[Modelfile Documentation](./modelfile.md)**.

-* [Troubleshooting Guide](./troubleshooting.md)
-* [FAQ](./faq.md)
-* [Development guide](./development.md)
+Import models using source model weights found on Hugging Face and similar sites by referring to the **[Import Documentation](./import.md)**.
+
+Installing on Linux in most cases is easy using the script on Ollama.ai. To get more detail about the install, including CUDA drivers, see the **[Linux Documentation](./linux.md)**.
+
+Many of our users like the flexibility of using our official Docker Image. Learn more about using Docker with Ollama using the **[Docker Documentation](https://hub.docker.com/r/ollama/ollama)**.
+
+It is easy to install on Linux and Mac, but many users will choose to build Ollama on their own. To do this, refer to the **[Development Documentation](./development.md)**.
+
+If encountering a problem with Ollama, the best place to start is the logs. Find more information about them here in the **[Troubleshooting Guide](./troubleshooting.md)**.
+
+Finally for all the questions that don't fit anywhere else, there is the **[FAQ](./faq.md)**
+
+[Tutorials](./tutorials.md) apply the documentation to tasks.
+
+For working code examples of using Ollama, see [Examples](../examples).
--- a/docs/api.md
+++ b/docs/api.md
@@ -49,12 +49,11 @@ Advanced parameters (optional):
 - `template`: the prompt template to use (overrides what is defined in the `Modelfile`)
 - `context`: the context parameter returned from a previous request to `/generate`, this can be used to keep a short conversational memory
 - `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
-- `raw`: if `true` no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API
-- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)
+- `raw`: if `true` no formatting will be applied to the prompt. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API.

 #### JSON mode

-Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode [example](#request-json-mode) below.
+Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode [example](#generate-request-json-mode) below.

 > Note: it's important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts whitespace.

@@ -247,23 +246,6 @@ curl http://localhost:11434/api/generate -d '{
 }'
 ```

-#### Request (Reproducible outputs)
-
-For reproducible outputs, set `temperature` to 0 and `seed` to a number:
-
-##### Request
-
-```shell
-curl http://localhost:11434/api/generate -d '{
-  "model": "mistral",
-  "prompt": "Why is the sky blue?",
-  "options": {
-    "seed": 123,
-    "temperature": 0
-  }
-}'
-```
-
 ##### Response

 ```json
@@ -321,6 +303,7 @@ curl http://localhost:11434/api/generate -d '{
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
+    "embedding_only": false,
    "rope_frequency_base": 1.1,
    "rope_frequency_scale": 0.8,
    "num_thread": 8
@@ -394,8 +377,8 @@ Advanced parameters (optional):

 - `format`: the format to return a response in. Currently the only accepted value is `json`
 - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
+- `template`: the prompt template to use (overrides what is defined in the `Modelfile`)
 - `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
-- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)

 ### Examples

@@ -559,7 +542,7 @@ curl http://localhost:11434/api/chat -d '{
      "role": "user",
      "content": "what is in this image?",
      "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS
4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrU
wKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
-    }
+    },
  ]
 }'
 ```
@@ -585,46 +568,6 @@ curl http://localhost:11434/api/chat -d '{
 }
 ```

-#### Chat request (Reproducible outputs)
-
-##### Request
-
-```shell
-curl http://localhost:11434/api/chat -d '{
-  "model": "llama2",
-  "messages": [
-    {
-      "role": "user",
-      "content": "Hello!"
-    }
-  ],
-  "options": {
-    "seed": 101,
-    "temperature": 0
-  }
-}'
-```
-
-##### Response
-
-```json
-{
-  "model": "registry.ollama.ai/library/llama2:latest",
-  "created_at": "2023-12-12T14:13:43.416799Z",
-  "message": {
-    "role": "assistant",
-    "content": "Hello! How are you today?"
-  },
-  "done": true,
-  "total_duration": 5191566416,
-  "load_duration": 2154458,
-  "prompt_eval_count": 26,
-  "prompt_eval_duration": 383809000,
-  "eval_count": 298,
-  "eval_duration": 4799921000
-}
-```
-
 ## Create a Model

 ```shell
@@ -1015,7 +958,6 @@ Generate embeddings from a model
 Advanced parameters:

 - `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
-- `keep_alive`: controls how long the model will stay loaded into memory following the request (default: `5m`)

 ### Examples

@@ -1023,7 +965,7 @@ Advanced parameters:

 ```shell
 curl http://localhost:11434/api/embeddings -d '{
-  "model": "all-minilm",
+  "model": "llama2",
  "prompt": "Here is an article about llamas..."
 }'
 ```
--- a/docs/development.md
+++ b/docs/development.md
@@ -3,7 +3,7 @@
 Install required tools:

 - cmake version 3.24 or higher
-- go version 1.22 or higher
+- go version 1.21 or higher
 - gcc version 11.4.0 or higher

 ```bash
@@ -42,16 +42,15 @@ Now you can run `ollama`:

 #### Linux CUDA (NVIDIA)

-_Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_
+*Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!*

 Install `cmake` and `golang` as well as [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads)
-development and runtime packages.
+development and runtime packages. 

 Typically the build scripts will auto-detect CUDA, however, if your Linux distro
 or installation approach uses unusual paths, you can specify the location by
 specifying an environment variable `CUDA_LIB_DIR` to the location of the shared
-libraries, and `CUDACXX` to the location of the nvcc compiler. You can customize
-set set of target CUDA architectues by setting `CMAKE_CUDA_ARCHITECTURES` (e.g. "50;60;70")
+libraries, and `CUDACXX` to the location of the nvcc compiler.

 Then generate dependencies:

@@ -67,16 +66,15 @@ go build .

 #### Linux ROCm (AMD)

-_Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_
+*Your operating system distribution may already have packages for AMD ROCm and CLBlast. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!*

-Install [CLBlast](https://github.com/CNugteren/CLBlast/blob/master/doc/installation.md) and [ROCm](https://rocm.docs.amd.com/en/latest/) development packages first, as well as `cmake` and `golang`.
+Install [CLBlast](https://github.com/CNugteren/CLBlast/blob/master/doc/installation.md) and [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/quick_start.html) developement packages first, as well as `cmake` and `golang`.

 Typically the build scripts will auto-detect ROCm, however, if your Linux distro
 or installation approach uses unusual paths, you can specify the location by
 specifying an environment variable `ROCM_PATH` to the location of the ROCm
 install (typically `/opt/rocm`), and `CLBlast_DIR` to the location of the
-CLBlast install (typically `/usr/lib/cmake/CLBlast`). You can also customize
-the AMD GPU targets by setting AMDGPU_TARGETS (e.g. `AMDGPU_TARGETS="gfx1101;gfx1102"`)
+CLBlast install (typically `/usr/lib/cmake/CLBlast`).

 ```
 go generate ./...
@@ -88,17 +86,17 @@ Then build the binary:
 go build .
 ```

-ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the `render` group, or run as root.
+ROCm requires elevated privileges to access the GPU at runtime.  On most distros you can add your user account to the `render` group, or run as root.

 #### Advanced CPU Settings

 By default, running `go generate ./...` will compile a few different variations
 of the LLM library based on common CPU families and vector math capabilities,
 including a lowest-common-denominator which should run on almost any 64 bit CPU
-somewhat slowly. At runtime, Ollama will auto-detect the optimal variation to
-load. If you would like to build a CPU-based build customized for your
+somewhat slowly.  At runtime, Ollama will auto-detect the optimal variation to
+load.  If you would like to build a CPU-based build customized for your
 processor, you can set `OLLAMA_CUSTOM_CPU_DEFS` to the llama.cpp flags you would
-like to use. For example, to compile an optimized binary for an Intel i9-9880H,
+like to use.  For example, to compile an optimized binary for an Intel i9-9880H,
 you might use:

 ```
@@ -108,7 +106,8 @@ go build .

 #### Containerized Linux Build

-If you have Docker available, you can build linux binaries with `./scripts/build_linux.sh` which has the CUDA and ROCm dependencies included. The resulting binary is placed in `./dist`
+If you have Docker available, you can build linux binaries with `./scripts/build_linux.sh` which has the CUDA and ROCm dependencies included.  The resulting binary is placed in `./dist`
+

 ### Windows

@@ -117,29 +116,21 @@ Note: The windows build for Ollama is still under development.
 Install required tools:

 - MSVC toolchain - C/C++ and cmake as minimal requirements
-- Go version 1.22 or higher
+- go version 1.21 or higher
 - MinGW (pick one variant) with GCC.
-  - [MinGW-w64](https://www.mingw-w64.org/)
-  - [MSYS2](https://www.msys2.org/)
+  - <https://www.mingw-w64.org/>
+  - <https://www.msys2.org/>

 ```powershell
 $env:CGO_ENABLED="1"
+
 go generate ./...
+
 go build .
 ```

 #### Windows CUDA (NVIDIA)

-In addition to the common Windows development tools described above, install CUDA after installing MSVC.
+In addition to the common Windows development tools described above, install:

 - [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html)
-
-
-#### Windows ROCm (AMD Radeon)
-
-In addition to the common Windows development tools described above, install AMDs HIP package after installing MSVC.
-
-- [AMD HIP](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
-- [Strawberry Perl](https://strawberryperl.com/)
-
-Lastly, add `ninja.exe` included with MSVC to the system path (e.g. `C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja`).