Compare commits

...

62 Commits

Author SHA1 Message Date
Patrick Devine
ad83c87454 add back in the windows terminal file 2023-11-14 16:52:34 -08:00
Patrick Devine
8627f6c66c initial commit of the readline editor replacement 2023-11-14 15:59:35 -08:00
Jeffrey Morgan
423862042a treat ollama run model < file as entire prompt, not prompt-per-line (#1126)
Previously, `ollama run` treated a non-terminal stdin (such as `ollama run model < file`) as containing one prompt per line. To run inference on a multi-line prompt, the only non-API workaround was to run `ollama run` interactively and wrap the prompt in `"""..."""`.

Now, `ollama run` treats a non-terminal stdin as containing a single prompt. For example, if `myprompt.txt` is a multi-line file, then `ollama run model < myprompt.txt` would treat `myprompt.txt`'s entire contents as the prompt.

Co-authored-by: Quinn Slack <quinn@slack.org>
2023-11-14 16:42:21 -05:00
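As a rough sketch of the new behaviour (the actual change is in the cmd.go diff further down this page), a non-terminal stdin is now read in one go and prepended to any prompt arguments rather than being consumed line by line:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"

	"golang.org/x/term"
)

func main() {
	// Arguments after the model name, e.g. `ollama run model "and summarize it"`.
	prompts := os.Args[1:]

	// If stdin is not a terminal (e.g. `ollama run model < file`), read it in
	// full and treat the entire contents as a single prompt instead of one
	// prompt per line.
	if !term.IsTerminal(int(os.Stdin.Fd())) {
		in, err := io.ReadAll(os.Stdin)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		prompts = append([]string{string(in)}, prompts...)
	}

	fmt.Println(strings.Join(prompts, " "))
}
```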
Bruce MacDonald
df18486c35 Move /generate format to optional parameters (#1127)
This field is optional and should be under the `Advanced parameters` header
2023-11-14 16:12:30 -05:00
Jeffrey Morgan
4e612a2e92 use stdout fd for terminal size (#1125) 2023-11-14 16:09:09 -05:00
Jeffrey Morgan
6e0f686afa --format json should work in interactive mode 2023-11-14 10:22:03 -05:00
Jeffrey Morgan
c1844bbee2 add json mode to cli (#1095) 2023-11-13 21:54:02 -05:00
Huy Le
cb745965ce adding ollama.nvim for visibility (#1115) 2023-11-13 17:00:17 -05:00
Enrico Ros
8d29b6a2b6 New big-AGI integration (#1078)
* New big-AGI integration

Ollama works great in big-AGI, and this document explains how to link the two projects.

* Update README.md
2023-11-13 16:59:00 -05:00
Ilya Breitburg
724aa64bee Add Dart library to README.md (#1106) 2023-11-13 14:50:42 -05:00
Michael Yang
d91c103e74 Merge pull request #1055 from dansreis/946-fix-incorrect-base-model-name
Fixed incorrect base model name
2023-11-13 08:42:55 -08:00
Kevin Hermawan
98ec7d81e3 Add OllamaKit to the community integrations (#1085) 2023-11-11 14:41:42 -08:00
Daniel Reis
7c438f2c53 Replaced method 2023-11-10 20:22:03 +00:00
Daniel Reis
6e46338d44 Reverting previous changes 2023-11-10 20:21:35 +00:00
Jeffrey Morgan
cdddd3df65 add format to example python client 2023-11-10 10:22:21 -08:00
Daniel Hiltgen
afa61bdf45 Merge pull request #1075 from jmorganca/dhiltgen/unexpected-eof
Resume chunk download on UnexpectedEOF errors
2023-11-10 08:48:27 -08:00
Daniel Hiltgen
cc54a416c6 Resume chunk download on UnexpectedEOF errors
If the chunk download is interrupted, resume from where we left off
2023-11-10 08:29:42 -08:00
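The downloader itself lives in server code that is not shown in this diff; purely as an illustration of the idea (the URL and file name below are hypothetical), a retry after an `io.ErrUnexpectedEOF` can pick up where the previous attempt stopped by requesting only the missing byte range:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// resume appends to dest, asking the server for the bytes a previously
// interrupted download did not deliver.
func resume(url, dest string) error {
	f, err := os.OpenFile(dest, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	// Start from the current size of the partial file.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-", info.Size()))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// An io.ErrUnexpectedEOF from Copy is the cue to call resume again.
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// Hypothetical blob URL, for illustration only.
	if err := resume("https://example.com/some-blob", "some-blob.partial"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```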
Matt Williams
c819d7f68a Merge pull request #955 from jmorganca/mattw/example-bash-compare
docs: add examples using bash to compare models
2023-11-10 08:59:32 -06:00
Jeffrey Morgan
5cba29b9d6 JSON mode: add `"format": "json"` as an api parameter (#1051)
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
Daniel Reis
d17730356a Removed inline parse model path 2023-11-09 22:44:26 +00:00
Daniel Reis
32d79a6eea Using 'GetShortTagname' method instead 2023-11-09 22:40:37 +00:00
Bruce MacDonald
5b39503bcd document specifying multiple stop params (#1061) 2023-11-09 13:16:26 -08:00
Bruce MacDonald
1ae84bc2a2 skip gpu if less than 2GB VRAM are available (#1059) 2023-11-09 13:16:16 -08:00
Bruce MacDonald
db8bf336fc Update README.md 2023-11-09 12:53:24 -08:00
Nick Anderson
d77e094a90 Added gptel to list of integrations (#1062) 2023-11-09 12:52:36 -08:00
Matt Williams
dd3dc47ddb Merge pull request #992 from aashish2057/aashish2057/langchainjs_doc_update 2023-11-09 05:08:31 -08:00
Michael Yang
c5e1bbabda instead of a static number of parameters for each model family, get the real number from the tensors (#1022)
* parse tensor info

* refactor decoder

* return actual parameter count

* explicit rounding

* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
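A minimal sketch of the counting logic this commit adds (the real implementation decodes tensor headers from the GGUF file, see the llm/gguf.go diff further down; here the dimensions are just an in-memory slice):

```go
package main

import "fmt"

// tensor holds only the dimension sizes read from a GGUF tensor header.
type tensor struct {
	dims []uint64
}

// countParameters sums, over all tensors, the product of each tensor's
// dimensions, giving the model's actual parameter count.
func countParameters(tensors []tensor) uint64 {
	var parameters uint64
	for _, t := range tensors {
		var elements uint64 = 1
		for _, d := range t.dims {
			elements *= d
		}
		parameters += elements
	}
	return parameters
}

func main() {
	// e.g. one 4096x4096 weight matrix and one 4096-element bias vector
	n := countParameters([]tensor{
		{dims: []uint64{4096, 4096}},
		{dims: []uint64{4096}},
	})
	fmt.Println(n) // 16781312
}
```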
Bruce MacDonald
a49d6acc1e add a complete /generate options example (#1035) 2023-11-08 16:44:36 -08:00
Moritz Poldrack
6e9bcdb9b3 progressbar: make start and end seamless (#1042) 2023-11-08 16:42:40 -08:00
Matt Williams
13086363bd Update as per bmacd
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-08 18:09:05 -06:00
Bruce MacDonald
ec2a31e9b3 support raw generation requests (#952)
- add the optional `raw` generate request parameter to bypass prompt formatting and response context
- add raw request to docs
2023-11-08 14:05:02 -08:00
Amith Koujalgi
ec84c02d54 Add Ollama4j Java library to the list of community libraries (#1044) 2023-11-08 11:04:32 -08:00
Kevin Hermawan
2a88b66bc9 Add Ollamac to community integrations (#1043) 2023-11-08 11:01:09 -08:00
Jeffrey Morgan
2d0faea96c clean up README.md 2023-11-08 00:03:29 -08:00
Jeffrey Morgan
637142181a clean up README.md 2023-11-07 23:52:31 -08:00
Matt Williams
bcbff421c9 Merge pull request #1023 from jmorganca/mattw/wherearemodelsfaq 2023-11-07 17:59:54 -08:00
thealhu
1359d6cf3b Fix sudo variable in install.sh (#1034)
One occurrence of `sudo` was not replaced with the sudo variable.
2023-11-07 09:59:57 -08:00
Omar Magdy
6e2d0224d9 Added logseq ollama plugin (#1029) 2023-11-07 09:58:13 -08:00
Ikko Eltociear Ashimine
921406f721 Update client.py (#1026)
recieve -> receive
2023-11-07 09:55:47 -08:00
Michael Yang
c7047d7353 Merge pull request #959 from jmorganca/mxyng/example-k8s 2023-11-07 10:43:21 -06:00
Matt Williams
1d155caba3 docs: clarify where the models are stored in the faq
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-11-06 14:38:49 -08:00
Michael Yang
866324b9a5 Merge pull request #943 from tjbck/patch-1
doc: categorised community integrations + added ollama-webui
2023-11-06 11:35:39 -08:00
Michael Yang
145e060855 Apply suggestions from code review
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-06 11:32:23 -08:00
Michael Yang
146072113d Merge pull request #993 from jmorganca/mxyng/cleanup
cleanup upload and download errors
2023-11-06 11:32:12 -08:00
Timothy Jaeryang Baek
33d31d1b56 Merge branch 'main' into patch-1 2023-11-06 14:27:02 -05:00
Dr. David A. Kunz
274c6cbf4c Added gen.nvim to community integrations (#996) 2023-11-06 10:51:41 -08:00
Elton Renda
7ebbd89bbf add hass-ollama-conversation (#999) 2023-11-06 10:50:35 -08:00
Lars Grammel
9079b1bb6d Add ModelFusion community integration (#1020) 2023-11-06 10:46:16 -08:00
Timothy Jaeryang Baek
6febde7200 Merge branch 'main' into patch-1 2023-11-04 19:12:18 -05:00
pepperoni21
325cfcd9ff Added ollama-rs to community integrations (#995)
Co-authored-by: pepperoni21 <pepperoni2100@gmail.com>
2023-11-04 14:51:29 -07:00
Jeffrey Morgan
639d0fd070 Update README.md 2023-11-04 12:24:24 -07:00
Michael Yang
434a6f9d46 return last error 2023-11-03 16:49:51 -07:00
aashish2057
b13586cc72 update langchainjs doc 2023-11-03 18:45:19 -05:00
Michael Yang
84725ec7e3 refactor part reset 2023-11-03 09:20:32 -07:00
Michael Yang
dccac8c8fa k8s example 2023-11-01 14:52:58 -07:00
Michael
f31961637f Update README.md 2023-11-01 12:20:55 -04:00
Matt Williams
80362fedce better readme
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 12:40:46 -07:00
Matt Williams
5757925060 add a gif
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 11:52:01 -07:00
Michael
4512301756 Update README.md 2023-10-31 13:25:36 -04:00
Matt Williams
2236a93efc docs: add examples using bash to compare models
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-31 09:12:39 -07:00
Timothy Jaeryang Baek
96da0792e6 doc: OllamaSharp for .NET moved to libraries 2023-10-28 16:18:38 -05:00
Timothy Jaeryang Baek
95d24262fc doc: categorised community integrations + added web-ui 2023-10-28 16:02:13 -05:00
35 changed files with 1235 additions and 818 deletions

View File

@@ -29,8 +29,7 @@ curl https://ollama.ai/install.sh | sh
### Docker
The official [Ollama Docker image `ollama/ollama`](https://hub.docker.com/r/ollama/ollama)
is available on Docker Hub.
The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `ollama/ollama` is available on Docker Hub.
## Quickstart
@@ -160,7 +159,7 @@ I'm a basic program that prints the famous "Hello, world!" message to the consol
### Pass in prompt as arguments
```
$ ollama run llama2 "summarize this file:" "$(cat README.md)"
$ ollama run llama2 "Summarize this file: $(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
```
@@ -217,21 +216,44 @@ See the [API documentation](./docs/api.md) for all endpoints.
## Community Integrations
### Web & Desktop
- [HTML UI](https://github.com/rtcfirefly/ollama-ui)
- [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama)
- [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file)
- [Minimalistic React UI for Ollama Models](https://github.com/richawo/minimal-llm-ui)
- [Web UI](https://github.com/ollama-webui/ollama-webui)
- [Ollamac](https://github.com/kevinhermawan/Ollamac)
- [big-AGI](https://github.com/enricoros/big-agi/blob/main/docs/config-ollama.md)
### Terminal
- [oterm](https://github.com/ggozad/oterm)
- [Ellama Emacs client](https://github.com/s-kostyaev/ellama)
- [Emacs client](https://github.com/zweifisch/ollama)
- [gen.nvim](https://github.com/David-Kunz/gen.nvim)
- [ollama.nvim](https://github.com/nomnivore/ollama.nvim)
- [gptel Emacs client](https://github.com/karthink/gptel)
### Libraries
- [LangChain](https://python.langchain.com/docs/integrations/llms/ollama) and [LangChain.js](https://js.langchain.com/docs/modules/model_io/models/llms/integrations/ollama) with [example](https://js.langchain.com/docs/use_cases/question_answering/local_retrieval_qa)
- [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/examples/llm/ollama.html)
- [LiteLLM](https://github.com/BerriAI/litellm)
- [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp)
- [Ollama-rs for Rust](https://github.com/pepperoni21/ollama-rs)
- [Ollama4j for Java](https://github.com/amithkoujalgi/ollama4j)
- [ModelFusion Typescript Library](https://modelfusion.dev/integration/model-provider/ollama)
- [OllamaKit for Swift](https://github.com/kevinhermawan/OllamaKit)
- [Ollama for Dart](https://github.com/breitburg/dart-ollama)
### Extensions & Plugins
- [Raycast extension](https://github.com/MassimilianoPasquini97/raycast_ollama)
- [Discollama](https://github.com/mxyng/discollama) (Discord bot inside the Ollama discord channel)
- [Continue](https://github.com/continuedev/continue)
- [Obsidian Ollama plugin](https://github.com/hinterdupfinger/obsidian-ollama)
- [Logseq Ollama plugin](https://github.com/omagdy7/ollama-logseq)
- [Dagger Chatbot](https://github.com/samalba/dagger-chatbot)
- [LiteLLM](https://github.com/BerriAI/litellm)
- [Discord AI Bot](https://github.com/mekb-turtle/discord-ai-bot)
- [Chatbot UI](https://github.com/ivanfioravanti/chatbot-ollama)
- [HTML UI](https://github.com/rtcfirefly/ollama-ui)
- [Typescript UI](https://github.com/ollama-interface/Ollama-Gui?tab=readme-ov-file)
- [Dumbar](https://github.com/JerrySievert/Dumbar)
- [Emacs client](https://github.com/zweifisch/ollama)
- [oterm](https://github.com/ggozad/oterm)
- [Ellama Emacs client](https://github.com/s-kostyaev/ellama)
- [OllamaSharp for .NET](https://github.com/awaescher/OllamaSharp)
- [Minimalistic React UI for Ollama Models](https://github.com/richawo/minimal-llm-ui)
- [Hass Ollama Conversation](https://github.com/ej52/hass-ollama-conversation)

View File

@@ -7,7 +7,7 @@ BASE_URL = os.environ.get('OLLAMA_HOST', 'http://localhost:11434')
# Generate a response for a given prompt with a provided model. This is a streaming endpoint, so will be a series of responses.
# The final response object will include statistics and additional data from the request. Use the callback function to override
# the default handler.
def generate(model_name, prompt, system=None, template=None, context=None, options=None, callback=None):
def generate(model_name, prompt, system=None, template=None, format="", context=None, options=None, callback=None):
try:
url = f"{BASE_URL}/api/generate"
payload = {
@@ -16,7 +16,8 @@ def generate(model_name, prompt, system=None, template=None, context=None, optio
"system": system,
"template": template,
"context": context,
"options": options
"options": options,
"format": format,
}
# Remove keys with None values

View File

@@ -37,10 +37,56 @@ type GenerateRequest struct {
Template string `json:"template"`
Context []int `json:"context,omitempty"`
Stream *bool `json:"stream,omitempty"`
Raw bool `json:"raw,omitempty"`
Format string `json:"format"`
Options map[string]interface{} `json:"options"`
}
// Options specified in GenerateRequest, if you add a new option here add it to the API docs also
type Options struct {
Runner
// Predict options used at runtime
NumKeep int `json:"num_keep,omitempty"`
Seed int `json:"seed,omitempty"`
NumPredict int `json:"num_predict,omitempty"`
TopK int `json:"top_k,omitempty"`
TopP float32 `json:"top_p,omitempty"`
TFSZ float32 `json:"tfs_z,omitempty"`
TypicalP float32 `json:"typical_p,omitempty"`
RepeatLastN int `json:"repeat_last_n,omitempty"`
Temperature float32 `json:"temperature,omitempty"`
RepeatPenalty float32 `json:"repeat_penalty,omitempty"`
PresencePenalty float32 `json:"presence_penalty,omitempty"`
FrequencyPenalty float32 `json:"frequency_penalty,omitempty"`
Mirostat int `json:"mirostat,omitempty"`
MirostatTau float32 `json:"mirostat_tau,omitempty"`
MirostatEta float32 `json:"mirostat_eta,omitempty"`
PenalizeNewline bool `json:"penalize_newline,omitempty"`
Stop []string `json:"stop,omitempty"`
}
// Runner options which must be set when the model is loaded into memory
type Runner struct {
UseNUMA bool `json:"numa,omitempty"`
NumCtx int `json:"num_ctx,omitempty"`
NumBatch int `json:"num_batch,omitempty"`
NumGQA int `json:"num_gqa,omitempty"`
NumGPU int `json:"num_gpu,omitempty"`
MainGPU int `json:"main_gpu,omitempty"`
LowVRAM bool `json:"low_vram,omitempty"`
F16KV bool `json:"f16_kv,omitempty"`
LogitsAll bool `json:"logits_all,omitempty"`
VocabOnly bool `json:"vocab_only,omitempty"`
UseMMap bool `json:"use_mmap,omitempty"`
UseMLock bool `json:"use_mlock,omitempty"`
EmbeddingOnly bool `json:"embedding_only,omitempty"`
RopeFrequencyBase float32 `json:"rope_frequency_base,omitempty"`
RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
NumThread int `json:"num_thread,omitempty"`
}
type EmbeddingRequest struct {
Model string `json:"model"`
Prompt string `json:"prompt"`
@@ -161,49 +207,6 @@ func (r *GenerateResponse) Summary() {
}
}
// Runner options which must be set when the model is loaded into memory
type Runner struct {
UseNUMA bool `json:"numa,omitempty"`
NumCtx int `json:"num_ctx,omitempty"`
NumBatch int `json:"num_batch,omitempty"`
NumGQA int `json:"num_gqa,omitempty"`
NumGPU int `json:"num_gpu,omitempty"`
MainGPU int `json:"main_gpu,omitempty"`
LowVRAM bool `json:"low_vram,omitempty"`
F16KV bool `json:"f16_kv,omitempty"`
LogitsAll bool `json:"logits_all,omitempty"`
VocabOnly bool `json:"vocab_only,omitempty"`
UseMMap bool `json:"use_mmap,omitempty"`
UseMLock bool `json:"use_mlock,omitempty"`
EmbeddingOnly bool `json:"embedding_only,omitempty"`
RopeFrequencyBase float32 `json:"rope_frequency_base,omitempty"`
RopeFrequencyScale float32 `json:"rope_frequency_scale,omitempty"`
NumThread int `json:"num_thread,omitempty"`
}
type Options struct {
Runner
// Predict options used at runtime
NumKeep int `json:"num_keep,omitempty"`
Seed int `json:"seed,omitempty"`
NumPredict int `json:"num_predict,omitempty"`
TopK int `json:"top_k,omitempty"`
TopP float32 `json:"top_p,omitempty"`
TFSZ float32 `json:"tfs_z,omitempty"`
TypicalP float32 `json:"typical_p,omitempty"`
RepeatLastN int `json:"repeat_last_n,omitempty"`
Temperature float32 `json:"temperature,omitempty"`
RepeatPenalty float32 `json:"repeat_penalty,omitempty"`
PresencePenalty float32 `json:"presence_penalty,omitempty"`
FrequencyPenalty float32 `json:"frequency_penalty,omitempty"`
Mirostat int `json:"mirostat,omitempty"`
MirostatTau float32 `json:"mirostat_tau,omitempty"`
MirostatEta float32 `json:"mirostat_eta,omitempty"`
PenalizeNewline bool `json:"penalize_newline,omitempty"`
Stop []string `json:"stop,omitempty"`
}
var ErrInvalidOpts = fmt.Errorf("invalid options")
func (opts *Options) FromMap(m map[string]interface{}) error {
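Assuming the `github.com/jmorganca/ollama/api` import path used elsewhere in this diff, a small sketch of building a request that exercises the new `Format` field together with runtime `Options` (only fields visible in this diff are used):

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/jmorganca/ollama/api"
)

func main() {
	stream := false
	req := api.GenerateRequest{
		Model:  "llama2",
		Prompt: "What color is the sky at different times of the day? Respond using JSON",
		Format: "json", // the new `format` parameter
		Stream: &stream,
		// Options takes the same keys as the Options/Runner structs above,
		// e.g. sampling temperature or stop sequences.
		Options: map[string]interface{}{
			"temperature": 0.8,
			"stop":        []string{"user:"},
		},
	}

	out, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(out))
}
```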

View File

@@ -1,7 +1,6 @@
package cmd
import (
"bufio"
"context"
"crypto/ed25519"
"crypto/rand"
@@ -28,9 +27,9 @@ import (
"golang.org/x/term"
"github.com/jmorganca/ollama/api"
"github.com/jmorganca/ollama/editor"
"github.com/jmorganca/ollama/format"
"github.com/jmorganca/ollama/progressbar"
"github.com/jmorganca/ollama/readline"
"github.com/jmorganca/ollama/server"
"github.com/jmorganca/ollama/version"
)
@@ -350,34 +349,49 @@ func pull(model string, insecure bool) error {
}
func RunGenerate(cmd *cobra.Command, args []string) error {
if len(args) > 1 {
// join all args into a single prompt
wordWrap := false
if term.IsTerminal(int(os.Stdout.Fd())) {
wordWrap = true
}
format, err := cmd.Flags().GetString("format")
if err != nil {
return err
}
nowrap, err := cmd.Flags().GetBool("nowordwrap")
prompts := args[1:]
// prepend stdin to the prompt if provided
if !term.IsTerminal(int(os.Stdin.Fd())) {
in, err := io.ReadAll(os.Stdin)
if err != nil {
return err
}
if nowrap {
wordWrap = false
}
return generate(cmd, args[0], strings.Join(args[1:], " "), wordWrap)
prompts = append([]string{string(in)}, prompts...)
}
if readline.IsTerminal(int(os.Stdin.Fd())) {
return generateInteractive(cmd, args[0])
// output is being piped
if !term.IsTerminal(int(os.Stdout.Fd())) {
return generate(cmd, args[0], strings.Join(prompts, " "), false, format)
}
return generateBatch(cmd, args[0])
wordWrap := os.Getenv("TERM") == "xterm-256color"
nowrap, err := cmd.Flags().GetBool("nowordwrap")
if err != nil {
return err
}
if nowrap {
wordWrap = false
}
// prompts are provided via stdin or args so don't enter interactive mode
if len(prompts) > 0 {
return generate(cmd, args[0], strings.Join(prompts, " "), wordWrap, format)
}
return generateInteractive(cmd, args[0], wordWrap, format)
}
type generateContextKey string
func generate(cmd *cobra.Command, model, prompt string, wordWrap bool) error {
func generate(cmd *cobra.Command, model, prompt string, wordWrap bool, format string) error {
client, err := api.ClientFromEnvironment()
if err != nil {
return err
@@ -393,7 +407,7 @@ func generate(cmd *cobra.Command, model, prompt string, wordWrap bool) error {
generateContext = []int{}
}
termWidth, _, err := term.GetSize(int(0))
termWidth, _, err := term.GetSize(int(os.Stdout.Fd()))
if err != nil {
wordWrap = false
}
@@ -414,7 +428,7 @@ func generate(cmd *cobra.Command, model, prompt string, wordWrap bool) error {
var currentLineLength int
var wordBuffer string
request := api.GenerateRequest{Model: model, Prompt: prompt, Context: generateContext}
request := api.GenerateRequest{Model: model, Prompt: prompt, Context: generateContext, Format: format}
fn := func(response api.GenerateResponse) error {
if !spinner.IsFinished() {
spinner.Finish()
@@ -485,9 +499,9 @@ func generate(cmd *cobra.Command, model, prompt string, wordWrap bool) error {
return nil
}
func generateInteractive(cmd *cobra.Command, model string) error {
func generateInteractive(cmd *cobra.Command, model string, wordWrap bool, format string) error {
// load the model
if err := generate(cmd, model, "", false); err != nil {
if err := generate(cmd, model, "", false, ""); err != nil {
return err
}
@@ -508,6 +522,8 @@ func generateInteractive(cmd *cobra.Command, model string) error {
fmt.Fprintln(os.Stderr, " /set nohistory Disable history")
fmt.Fprintln(os.Stderr, " /set wordwrap Enable wordwrap")
fmt.Fprintln(os.Stderr, " /set nowordwrap Disable wordwrap")
fmt.Fprintln(os.Stderr, " /set format json Enable JSON mode")
fmt.Fprintln(os.Stderr, " /set noformat Disable formatting")
fmt.Fprintln(os.Stderr, " /set verbose Show LLM stats")
fmt.Fprintln(os.Stderr, " /set quiet Disable LLM stats")
fmt.Fprintln(os.Stderr, "")
@@ -523,45 +539,24 @@ func generateInteractive(cmd *cobra.Command, model string) error {
fmt.Fprintln(os.Stderr, "")
}
prompt := readline.Prompt{
Prompt: ">>> ",
AltPrompt: "... ",
Placeholder: "Send a message (/? for help)",
AltPlaceholder: `Use """ to end multi-line input`,
prompt := editor.Prompt{
Prompt: ">>> ",
AltPrompt: "... ",
Placeholder: "Send a message (/? for help)",
}
scanner, err := readline.New(prompt)
ed, err := editor.New(prompt)
if err != nil {
return err
}
var wordWrap bool
termType := os.Getenv("TERM")
if termType == "xterm-256color" {
wordWrap = true
}
// override wrapping if the user turned it off
nowrap, err := cmd.Flags().GetBool("nowordwrap")
if err != nil {
return err
}
if nowrap {
wordWrap = false
}
fmt.Print(readline.StartBracketedPaste)
defer fmt.Printf(readline.EndBracketedPaste)
var multiLineBuffer string
for {
line, err := scanner.Readline()
line, err := ed.HandleInput()
switch {
case errors.Is(err, io.EOF):
fmt.Println()
return nil
case errors.Is(err, readline.ErrInterrupt):
case errors.Is(err, editor.ErrInterrupt):
if line == "" {
fmt.Println("\nUse Ctrl-D or /bye to exit.")
}
@@ -574,20 +569,6 @@ func generateInteractive(cmd *cobra.Command, model string) error {
line = strings.TrimSpace(line)
switch {
case scanner.Prompt.UseAlt:
if strings.HasSuffix(line, `"""`) {
scanner.Prompt.UseAlt = false
multiLineBuffer += strings.TrimSuffix(line, `"""`)
line = multiLineBuffer
multiLineBuffer = ""
} else {
multiLineBuffer += line + " "
continue
}
case strings.HasPrefix(line, `"""`):
scanner.Prompt.UseAlt = true
multiLineBuffer = strings.TrimPrefix(line, `"""`) + " "
continue
case strings.HasPrefix(line, "/list"):
args := strings.Fields(line)
if err := ListHandler(cmd, args[1:]); err != nil {
@@ -598,9 +579,9 @@ func generateInteractive(cmd *cobra.Command, model string) error {
if len(args) > 1 {
switch args[1] {
case "history":
scanner.HistoryEnable()
//scanner.HistoryEnable()
case "nohistory":
scanner.HistoryDisable()
//scanner.HistoryDisable()
case "wordwrap":
wordWrap = true
fmt.Println("Set 'wordwrap' mode.")
@@ -613,6 +594,16 @@ func generateInteractive(cmd *cobra.Command, model string) error {
case "quiet":
cmd.Flags().Set("verbose", "false")
fmt.Println("Set 'quiet' mode.")
case "format":
if len(args) < 3 || args[2] != "json" {
fmt.Println("Invalid or missing format. For 'json' mode use '/set format json'")
} else {
format = args[2]
fmt.Printf("Set format to '%s' mode.\n", args[2])
}
case "noformat":
format = ""
fmt.Println("Disabled format.")
default:
fmt.Printf("Unknown command '/set %s'. Type /? for help\n", args[1])
}
@@ -686,26 +677,13 @@ func generateInteractive(cmd *cobra.Command, model string) error {
}
if len(line) > 0 && line[0] != '/' {
if err := generate(cmd, model, line, wordWrap); err != nil {
if err := generate(cmd, model, line, wordWrap, format); err != nil {
return err
}
}
}
}
func generateBatch(cmd *cobra.Command, model string) error {
scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
prompt := scanner.Text()
fmt.Printf(">>> %s\n", prompt)
if err := generate(cmd, model, prompt, false); err != nil {
return err
}
}
return nil
}
func RunServer(cmd *cobra.Command, _ []string) error {
host, port, err := net.SplitHostPort(os.Getenv("OLLAMA_HOST"))
if err != nil {
@@ -883,6 +861,7 @@ func NewCLI() *cobra.Command {
runCmd.Flags().Bool("verbose", false, "Show timings for response")
runCmd.Flags().Bool("insecure", false, "Use an insecure registry")
runCmd.Flags().Bool("nowordwrap", false, "Don't wrap words to the next line automatically")
runCmd.Flags().String("format", "", "Response format (e.g. json)")
serveCmd := &cobra.Command{
Use: "serve",

View File

@@ -41,11 +41,17 @@ Generate a response for a given prompt with a provided model. This is a streamin
Advanced parameters (optional):
- `format`: the format to return a response in. Currently the only accepted value is `json`
- `options`: additional model parameters listed in the documentation for the [Modelfile](./modelfile.md#valid-parameters-and-values) such as `temperature`
- `system`: system prompt (overrides what is defined in the `Modelfile`)
- `template`: the full prompt or prompt template (overrides what is defined in the `Modelfile`)
- `context`: the context parameter returned from a previous request to `/generate`, this can be used to keep a short conversational memory
- `stream`: if `false` the response will be returned as a single response object, rather than a stream of objects
- `raw`: if `true` no formatting will be applied to the prompt and no context will be returned. You may choose to use the `raw` parameter if you are specifying a full templated prompt in your request to the API, and are managing history yourself.
### JSON mode
Enable JSON mode by setting the `format` parameter to `json` and instructing the model, in the `prompt`, to respond with JSON. This will structure the response as valid JSON. See the JSON mode [example](#request-json-mode) below.
### Examples
@@ -53,7 +59,7 @@ Advanced parameters (optional):
```shell
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"model": "llama2",
"prompt": "Why is the sky blue?"
}'
```
@@ -64,7 +70,7 @@ A stream of JSON objects is returned:
```json
{
"model": "llama2:7b",
"model": "llama2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
@@ -88,7 +94,7 @@ To calculate how fast the response is generated in tokens per second (token/s),
```json
{
"model": "llama2:7b",
"model": "llama2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"context": [1, 2, 3],
@@ -104,7 +110,7 @@ To calculate how fast the response is generated in tokens per second (token/s),
}
```
#### Request
#### Request (No streaming)
```shell
curl -X POST http://localhost:11434/api/generate -d '{
@@ -136,6 +142,150 @@ If `stream` is set to `false`, the response will be a single JSON object:
}
```
#### Request (Raw mode)
In some cases you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable formatting and context.
```shell
curl -X POST http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "[INST] why is the sky blue? [/INST]",
"raw": true,
"stream": false
}'
```
#### Response
```json
{
"model": "mistral",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
"done": true,
"total_duration": 14648695333,
"load_duration": 3302671417,
"prompt_eval_count": 14,
"prompt_eval_duration": 286243000,
"eval_count": 129,
"eval_duration": 10931424000
}
```
#### Request (JSON mode)
```shell
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
```
#### Response
```json
{
"model": "llama2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"total_duration": 4661289125,
"load_duration": 1714434500,
"prompt_eval_count": 36,
"prompt_eval_duration": 264132000,
"eval_count": 75,
"eval_duration": 2112149000
}
```
The value of `response` will be a string containing JSON similar to:
```json
{
"morning": {
"color": "blue"
},
"noon": {
"color": "blue-gray"
},
"afternoon": {
"color": "warm gray"
},
"evening": {
"color": "orange"
}
}
```
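For programmatic use, a minimal Go sketch of the same JSON mode request (standard library only; endpoint and field names as documented above):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]interface{}{
		"model":  "llama2",
		"prompt": "What color is the sky at different times of the day? Respond using JSON",
		"format": "json",
		"stream": false,
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// With "stream": false the whole answer arrives as a single JSON object.
	data, _ := io.ReadAll(resp.Body)
	var out struct {
		Response string `json:"response"`
	}
	if err := json.Unmarshal(data, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```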
#### Request (With options)
If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the `options` parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.
```shell
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2:7b",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"tfs_z": 0.5,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"mirostat": 1,
"mirostat_tau": 0.8,
"mirostat_eta": 0.6,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 4,
"num_batch": 2,
"num_gqa": 1,
"num_gpu": 1,
"main_gpu": 0,
"low_vram": false,
"f16_kv": true,
"logits_all": false,
"vocab_only": false,
"use_mmap": true,
"use_mlock": false,
"embedding_only": false,
"rope_frequency_base": 1.1,
"rope_frequency_scale": 0.8,
"num_thread": 8
}
}'
```
#### Response
```json
{
"model": "llama2:7b",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"context": [1, 2, 3],
"done": true,
"total_duration": 5589157167,
"load_duration": 3013701500,
"sample_count": 114,
"sample_duration": 81442000,
"prompt_eval_count": 46,
"prompt_eval_duration": 1160282000,
"eval_count": 13,
"eval_duration": 1325948000
}
```
## Create a Model
```shell

View File

@@ -74,6 +74,25 @@ systemctl restart ollama
- macOS: Raw model data is stored under `~/.ollama/models`.
- Linux: Raw model data is stored under `/usr/share/ollama/.ollama/models`
Below the models directory you will find a structure similar to the following:
```shell
.
├── blobs
└── manifests
└── registry.ollama.ai
├── f0rodo
├── library
├── mattw
└── saikatkumardey
```
There is a `manifests/registry.ollama.ai/namespace` path. In the example above, the user has downloaded models from the official `library`, `f0rodo`, `mattw`, and `saikatkumardey` namespaces. Within each of those directories, you will find directories for each of the models downloaded, and in each of those a file named for each tag. Each tag file is the manifest for the model.
The manifest lists all the layers used in this model. You will see a `media type` for each layer, along with a digest. That digest corresponds with a file in the `models/blobs` directory.
### How can I change where Ollama stores models?
To modify where models are stored, you can use the `OLLAMA_MODELS` environment variable. Note that on Linux this means defining `OLLAMA_MODELS` in a drop-in `/etc/systemd/system/ollama.service.d` service file, reloading systemd, and restarting the ollama service.

View File

@@ -112,8 +112,8 @@ PARAMETER <parameter> <parametervalue>
| repeat_last_n | Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx) | int | repeat_last_n 64 |
| repeat_penalty | Sets how strongly to penalize repetitions. A higher value (e.g., 1.5) will penalize repetitions more strongly, while a lower value (e.g., 0.9) will be more lenient. (Default: 1.1) | float | repeat_penalty 1.1 |
| temperature | The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8) | float | temperature 0.7 |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | int | seed 42 |
| stop | Sets the stop sequences to use. | string | stop "AI assistant:" |
| seed | Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt. (Default: 0) | int | seed 42 |
| stop | Sets the stop sequences to use. When this pattern is encountered the LLM will stop generating text and return. Multiple stop patterns may be set by specifying multiple separate `stop` parameters in a modelfile. | string | stop "AI assistant:" |
| tfs_z | Tail free sampling is used to reduce the impact of less probable tokens from the output. A higher value (e.g., 2.0) will reduce the impact more, while a value of 1.0 disables this setting. (default: 1) | float | tfs_z 1 |
| num_predict | Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context) | int | num_predict 42 |
| top_k | Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40) | int | top_k 40 |

View File

@@ -23,13 +23,17 @@ const answer = await ollama.call(`why is the sky blue?`);
console.log(answer);
```
That will get us the same thing as if we ran `ollama run llama2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's build that part of the app.
That will get us the same thing as if we ran `ollama run llama2 "why is the sky blue"` in the terminal. But we want to load a document from the web to ask a question against. **Cheerio** is a great library for ingesting a webpage, and **LangChain** uses it in their **CheerioWebBaseLoader**. So let's install **Cheerio** and build that part of the app.
```bash
npm install cheerio
```
```javascript
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
const loader = new CheerioWebBaseLoader("https://en.wikipedia.org/wiki/2023_Hawaii_wildfires");
const data = loader.load();
const data = await loader.load();
```
That will load the document. Although this page is smaller than the Odyssey, it is certainly bigger than the context size for most LLMs. So we are going to need to split it into smaller pieces, and then select just the pieces relevant to our question. This is a great use for a vector datastore. In this example, we will use the **MemoryVectorStore** that is part of **LangChain**. But there is one more thing we need to get the content into the datastore. We have to run an embeddings process that converts the tokens in the text into a series of vectors. And for that, we are going to use **Tensorflow**. There is a lot of stuff going on in this one. First, install the **Tensorflow** components that we need.

488
editor/buffer.go Normal file
View File

@@ -0,0 +1,488 @@
package editor
import (
"fmt"
"strings"
"github.com/emirpasic/gods/lists/arraylist"
"golang.org/x/term"
)
type Buffer struct {
PosX int
PosY int
Buf []*arraylist.List
Prompt *Prompt
WordWrap int
ScreenWidth int
ScreenHeight int
}
func NewBuffer(prompt *Prompt) (*Buffer, error) {
width, height, err := term.GetSize(0)
if err != nil {
fmt.Println("Error getting size:", err)
return nil, err
}
b := &Buffer{
PosX: 0,
PosY: 0,
Buf: []*arraylist.List{arraylist.New()},
Prompt: prompt,
ScreenWidth: width,
ScreenHeight: height,
}
return b, nil
}
func (b *Buffer) LineWidth() int {
return b.ScreenWidth - len(b.Prompt.Prompt)
}
func (b *Buffer) findWordAtPos(line string, pos int) string {
return ""
}
func (b *Buffer) addLine(row int) {
if row+1 == len(b.Buf) {
b.Buf = append(b.Buf, arraylist.New())
} else {
b.Buf = append(b.Buf, nil)
copy(b.Buf[row+2:], b.Buf[row+1:])
b.Buf[row+1] = arraylist.New()
}
}
func (b *Buffer) Add(r rune) {
switch r {
case CharCtrlJ, CharEnter:
b.addLine(b.PosY)
// handle Ctrl-J in the middle of a line
var remainingText string
if b.PosX < b.Buf[b.PosY].Size() {
fmt.Print(ClearToEOL)
remainingText = b.StringLine(b.PosX, b.PosY)
for cnt := 0; cnt < len(remainingText); cnt++ {
b.Buf[b.PosY].Remove(b.Buf[b.PosY].Size() - 1)
b.Buf[b.PosY+1].Add(rune(remainingText[cnt]))
}
}
b.PosY++
b.PosX = 0
fmt.Printf("\n... " + ClearToEOL)
b.drawRemaining()
default:
if b.PosX == b.Buf[b.PosY].Size() {
fmt.Printf("%c", r)
b.PosX++
b.Buf[b.PosY].Add(r)
wrap, prefix, offset := b.splitLineInsert(b.PosY, b.PosX)
if wrap {
fmt.Print(CursorHide + cursorLeftN(len(prefix)+1) + ClearToEOL)
fmt.Printf("\n%s... %s%c", ClearToEOL, prefix, r)
b.PosY++
b.PosX = offset
b.ResetCursor()
b.drawRemaining()
fmt.Print(CursorShow)
}
} else {
fmt.Printf("%c", r)
b.Buf[b.PosY].Insert(b.PosX, r)
b.PosX++
_, prefix, offset := b.splitLineInsert(b.PosY, b.PosX)
fmt.Print(CursorHide)
if b.PosX > b.Buf[b.PosY].Size() {
if offset > 0 {
fmt.Print(cursorLeftN(offset))
}
fmt.Print(ClearToEOL + CursorDown + CursorBOL + ClearToEOL)
fmt.Printf("... %s", prefix[:offset])
b.PosY++
b.PosX = offset
b.ResetCursor()
}
b.drawRemaining()
fmt.Print(CursorShow)
}
}
}
func (b *Buffer) ResetCursor() {
fmt.Print(CursorHide + CursorBOL)
fmt.Print(cursorRightN(b.PosX + len(b.Prompt.Prompt)))
fmt.Print(CursorShow)
}
func (b *Buffer) splitLineInsert(posY, posX int) (bool, string, int) {
line := b.StringLine(0, posY)
screenEdge := b.LineWidth() - 5
// if the current line doesn't need to be reflowed, none of the other
// lines will either
if len(line) <= screenEdge {
return false, "", 0
}
// we know we're going to have to insert onto the next line, so
// add another line if there isn't one already
if posY == len(b.Buf)-1 {
b.Buf = append(b.Buf, arraylist.New())
}
// make a truncated version of the current line
currLine := line[:screenEdge]
// figure out where the last space in the line is
idx := strings.LastIndex(currLine, " ")
// deal with strings that don't have spaces in them
if idx == -1 {
idx = len(currLine) - 1
}
// if the next line already has text on it, we need
// to add a space to insert our new word
if b.Buf[posY+1].Size() > 0 {
b.Buf[posY+1].Insert(0, ' ')
}
// calculate the number of characters we need to remove
// from the current line to add to the next one
totalChars := len(line) - idx - 1
for cnt := 0; cnt < totalChars; cnt++ {
b.Buf[posY].Remove(b.Buf[posY].Size() - 1)
b.Buf[posY+1].Insert(0, rune(line[len(line)-1-cnt]))
}
// remove the trailing space
b.Buf[posY].Remove(b.Buf[posY].Size() - 1)
// wrap any further lines
if b.Buf[posY+1].Size() > b.LineWidth()-5 {
b.splitLineInsert(posY+1, 0)
}
return true, currLine[idx+1:], posX - idx - 1
}
func (b *Buffer) drawRemaining() {
remainingText := b.StringFromRow(b.PosY)
remainingText = remainingText[b.PosX:]
fmt.Print(CursorHide + ClearToEOL)
var rowCount int
for _, c := range remainingText {
fmt.Print(string(c))
if c == '\n' {
fmt.Print("... " + ClearToEOL)
rowCount++
}
}
if rowCount > 0 {
fmt.Print(cursorUpN(rowCount))
}
b.ResetCursor()
}
func (b *Buffer) findWordBeginning(posX int) int {
for {
if posX < 0 {
return -1
}
r, ok := b.Buf[b.PosY].Get(posX)
if !ok {
return -1
} else if r.(rune) == ' ' {
return posX
}
posX--
}
}
func (b *Buffer) Delete() {
if b.PosX < b.Buf[b.PosY].Size()-1 {
b.Buf[b.PosY].Remove(b.PosX)
b.drawRemaining()
} else {
b.joinLines()
}
}
func (b *Buffer) joinLines() {
lineLen := b.Buf[b.PosY].Size()
for cnt := 0; cnt < lineLen; cnt++ {
r, _ := b.Buf[b.PosY].Get(0)
b.Buf[b.PosY].Remove(0)
b.Buf[b.PosY-1].Add(r)
}
}
func (b *Buffer) Remove() {
if b.PosX > 0 {
fmt.Print(CursorLeft + " " + CursorLeft)
b.PosX--
b.Buf[b.PosY].Remove(b.PosX)
if b.PosX < b.Buf[b.PosY].Size() {
fmt.Print(ClearToEOL)
b.drawRemaining()
}
} else if b.PosX == 0 && b.PosY > 0 {
b.joinLines()
lastPos := b.Buf[b.PosY-1].Size()
var cnt int
b.PosX = lastPos
b.PosY--
fmt.Print(CursorHide)
for {
if b.PosX+cnt > b.LineWidth()-5 {
// the concatenated line won't fit, so find the beginning of the word
// and copy the rest of the string from there
idx := b.findWordBeginning(b.PosX)
lineLen := b.Buf[b.PosY].Size()
for offset := idx + 1; offset < lineLen; offset++ {
r, _ := b.Buf[b.PosY].Get(idx + 1)
b.Buf[b.PosY].Remove(idx + 1)
b.Buf[b.PosY+1].Add(r)
}
// remove the trailing space
b.Buf[b.PosY].Remove(idx)
fmt.Print(CursorUp + ClearToEOL)
b.PosX = 0
b.drawRemaining()
fmt.Print(CursorDown)
if idx > 0 {
if lastPos-idx-1 > 0 {
b.PosX = lastPos - idx - 1
b.ResetCursor()
}
}
b.PosY++
break
}
r, ok := b.Buf[b.PosY].Get(b.PosX + cnt)
if !ok {
// found the end of the string
fmt.Print(CursorUp + cursorRightN(b.PosX) + ClearToEOL)
b.drawRemaining()
break
}
if r == ' ' {
// found the end of the word
lineLen := b.Buf[b.PosY].Size()
for offset := b.PosX + cnt + 1; offset < lineLen; offset++ {
r, _ := b.Buf[b.PosY].Get(b.PosX + cnt + 1)
b.Buf[b.PosY].Remove(b.PosX + cnt + 1)
b.Buf[b.PosY+1].Add(r)
}
fmt.Print(CursorUp + cursorRightN(b.PosX) + ClearToEOL)
b.drawRemaining()
break
}
cnt++
}
fmt.Print(CursorShow)
}
}
func (b *Buffer) RemoveBefore() {
for {
if b.PosX == 0 && b.PosY == 0 {
break
}
b.Remove()
}
}
func (b *Buffer) RemoveWordBefore() {
if b.PosX > 0 || b.PosY > 0 {
var foundNonspace bool
for {
xPos := b.PosX
yPos := b.PosY
v, _ := b.Buf[yPos].Get(xPos - 1)
if v == ' ' {
if !foundNonspace {
b.Remove()
} else {
break
}
} else {
foundNonspace = true
b.Remove()
}
if xPos == 0 && yPos == 0 {
break
}
}
}
}
func (b *Buffer) StringLine(x, y int) string {
if y >= len(b.Buf) {
return ""
}
var output string
for cnt := x; cnt < b.Buf[y].Size(); cnt++ {
r, _ := b.Buf[y].Get(cnt)
output += string(r.(rune))
}
return output
}
func (b *Buffer) String() string {
return b.StringFromRow(0)
}
func (b *Buffer) StringFromRow(n int) string {
var output []string
for _, row := range b.Buf[n:] {
var currLine string
for cnt := 0; cnt < row.Size(); cnt++ {
r, _ := row.Get(cnt)
currLine += string(r.(rune))
}
currLine = strings.TrimRight(currLine, " ")
output = append(output, currLine)
}
return strings.Join(output, "\n")
}
func (b *Buffer) cursorUp() {
fmt.Print(CursorUp)
b.ResetCursor()
}
func (b *Buffer) cursorDown() {
fmt.Print(CursorDown)
b.ResetCursor()
}
func (b *Buffer) MoveUp() {
if b.PosY > 0 {
b.PosY--
if b.Buf[b.PosY].Size() < b.PosX {
b.PosX = b.Buf[b.PosY].Size()
}
b.cursorUp()
} else {
fmt.Print("\a")
}
}
func (b *Buffer) MoveDown() {
if b.PosY < len(b.Buf)-1 {
b.PosY++
if b.Buf[b.PosY].Size() < b.PosX {
b.PosX = b.Buf[b.PosY].Size()
}
b.cursorDown()
} else {
fmt.Print("\a")
}
}
func (b *Buffer) MoveLeft() {
if b.PosX > 0 {
b.PosX--
fmt.Print(CursorLeft)
} else if b.PosY > 0 {
b.PosX = b.Buf[b.PosY-1].Size()
b.PosY--
b.cursorUp()
} else if b.PosX == 0 && b.PosY == 0 {
fmt.Print("\a")
}
}
func (b *Buffer) MoveRight() {
if b.PosX < b.Buf[b.PosY].Size() {
b.PosX++
fmt.Print(CursorRight)
} else if b.PosY < len(b.Buf)-1 {
b.PosY++
b.PosX = 0
b.cursorDown()
} else {
fmt.Print("\a")
}
}
func (b *Buffer) MoveToBOL() {
if b.PosX > 0 {
b.PosX = 0
b.ResetCursor()
}
}
func (b *Buffer) MoveToEOL() {
if b.PosX < b.Buf[b.PosY].Size() {
b.PosX = b.Buf[b.PosY].Size()
b.ResetCursor()
}
}
func (b *Buffer) MoveToEnd() {
fmt.Print(CursorHide)
yDiff := len(b.Buf)-1 - b.PosY
if yDiff > 0 {
fmt.Print(cursorDownN(yDiff))
}
b.PosY = len(b.Buf)-1
b.MoveToEOL()
fmt.Print(CursorShow)
}
func cursorLeftN(n int) string {
return fmt.Sprintf(CursorLeftN, n)
}
func cursorRightN(n int) string {
return fmt.Sprintf(CursorRightN, n)
}
func cursorUpN(n int) string {
return fmt.Sprintf(CursorUpN, n)
}
func cursorDownN(n int) string {
return fmt.Sprintf(CursorDownN, n)
}
func (b *Buffer) ClearScreen() {
fmt.Printf(CursorHide + ClearScreen + CursorReset + b.Prompt.Prompt)
if b.IsEmpty() {
ph := b.Prompt.Placeholder
fmt.Printf(ColorGrey + ph + cursorLeftN(len(ph)) + ColorDefault)
} else {
currPosX := b.PosX
currPosY := b.PosY
b.PosX = 0
b.PosY = 0
b.drawRemaining()
b.PosX = currPosX
b.PosY = currPosY
fmt.Print(CursorReset + cursorRightN(len(b.Prompt.Prompt)))
if b.PosY > 0 {
fmt.Print(cursorDownN(b.PosY))
}
if b.PosX > 0 {
fmt.Print(cursorRightN(b.PosX))
}
}
fmt.Print(CursorShow)
}
func (b *Buffer) IsEmpty() bool {
return len(b.Buf) == 1 && b.Buf[0].Empty()
}
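Based on how cmd.go drives the new package elsewhere in this diff, a minimal sketch of a read-eval loop built on `editor.New` and `HandleInput` might look like this (the echo handler stands in for the real generate call):

```go
package main

import (
	"errors"
	"fmt"
	"io"

	"github.com/jmorganca/ollama/editor"
)

func main() {
	prompt := editor.Prompt{
		Prompt:      ">>> ",
		AltPrompt:   "... ",
		Placeholder: "Send a message (/? for help)",
	}

	ed, err := editor.New(prompt)
	if err != nil {
		panic(err)
	}

	for {
		line, err := ed.HandleInput()
		switch {
		case errors.Is(err, io.EOF):
			// Ctrl-D on an empty line ends the session.
			fmt.Println()
			return
		case errors.Is(err, editor.ErrInterrupt):
			// Ctrl-C cancels the current line but keeps the session alive.
			continue
		case err != nil:
			panic(err)
		}

		fmt.Printf("you typed: %q\n", line)
	}
}
```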

View File

@@ -1,4 +1,4 @@
package readline
package editor
import (
"bufio"
@@ -23,7 +23,6 @@ type Terminal struct {
type Instance struct {
Prompt *Prompt
Terminal *Terminal
History *History
}
func New(prompt Prompt) (*Instance, error) {
@@ -32,40 +31,33 @@ func New(prompt Prompt) (*Instance, error) {
return nil, err
}
history, err := NewHistory()
if err != nil {
return nil, err
}
return &Instance{
Prompt: &prompt,
Terminal: term,
History: history,
}, nil
}
func (i *Instance) Readline() (string, error) {
func (i *Instance) HandleInput() (string, error) {
prompt := i.Prompt.Prompt
if i.Prompt.UseAlt {
prompt = i.Prompt.AltPrompt
}
fmt.Print(prompt)
fd := int(syscall.Stdin)
termios, err := SetRawMode(fd)
termios, err := SetRawMode(syscall.Stdin)
if err != nil {
return "", err
}
defer UnsetRawMode(fd, termios)
defer UnsetRawMode(syscall.Stdin, termios)
buf, _ := NewBuffer(i.Prompt)
var esc bool
var escex bool
var metaDel bool
var pasteMode PasteMode
var currentLineBuf []rune
fmt.Print(StartBracketedPaste)
defer fmt.Printf(EndBracketedPaste)
for {
if buf.IsEmpty() {
@@ -77,33 +69,22 @@ func (i *Instance) Readline() (string, error) {
}
r, err := i.Terminal.Read()
if err != nil {
return "", io.EOF
}
if buf.IsEmpty() {
fmt.Print(ClearToEOL)
}
if err != nil {
return "", io.EOF
}
if escex {
escex = false
switch r {
case KeyUp:
if i.History.Pos > 0 {
if i.History.Pos == i.History.Size() {
currentLineBuf = []rune(buf.String())
}
buf.Replace(i.History.Prev())
}
buf.MoveUp()
case KeyDown:
if i.History.Pos < i.History.Size() {
buf.Replace(i.History.Next())
if i.History.Pos == i.History.Size() {
buf.Replace(currentLineBuf)
}
}
buf.MoveDown()
case KeyLeft:
buf.MoveLeft()
case KeyRight:
@@ -123,28 +104,16 @@ func (i *Instance) Readline() (string, error) {
} else if code == CharBracketedPasteEnd {
pasteMode = PasteModeEnd
}
case KeyDel:
if buf.Size() > 0 {
buf.Delete()
}
metaDel = true
case MetaStart:
buf.MoveToStart()
buf.MoveToBOL()
case MetaEnd:
buf.MoveToEnd()
default:
// skip any keys we don't know about
continue
buf.MoveToEOL()
}
continue
} else if esc {
esc = false
switch r {
case 'b':
buf.MoveLeftWord()
case 'f':
buf.MoveRightWord()
case CharEscapeEx:
escex = true
}
@@ -159,9 +128,9 @@ func (i *Instance) Readline() (string, error) {
case CharInterrupt:
return "", ErrInterrupt
case CharLineStart:
buf.MoveToStart()
buf.MoveToBOL()
case CharLineEnd:
buf.MoveToEnd()
buf.MoveToEOL()
case CharBackward:
buf.MoveLeft()
case CharForward:
@@ -169,56 +138,38 @@ func (i *Instance) Readline() (string, error) {
case CharBackspace, CharCtrlH:
buf.Remove()
case CharTab:
// todo: convert back to real tabs
for cnt := 0; cnt < 8; cnt++ {
buf.Add(' ')
}
case CharDelete:
if buf.Size() > 0 {
if len(buf.Buf) > 0 && buf.Buf[0].Size() > 0 {
buf.Delete()
} else {
return "", io.EOF
}
case CharKill:
buf.DeleteRemaining()
case CharCtrlU:
buf.DeleteBefore()
buf.RemoveBefore()
case CharCtrlL:
buf.ClearScreen()
case CharCtrlW:
buf.DeleteWord()
buf.RemoveWordBefore()
case CharCtrlJ:
buf.Add(r)
case CharEnter:
output := buf.String()
if output != "" {
i.History.Add([]rune(output))
if pasteMode == PasteModeStart {
buf.Add(r)
continue
}
buf.MoveToEnd()
fmt.Println()
switch pasteMode {
case PasteModeStart:
output = `"""` + output
case PasteModeEnd:
output = output + `"""`
}
return output, nil
return buf.String(), nil
default:
if metaDel {
metaDel = false
continue
}
if r >= CharSpace || r == CharEnter {
buf.Add(r)
}
}
}
}
func (i *Instance) HistoryEnable() {
i.History.Enabled = true
}
func (i *Instance) HistoryDisable() {
i.History.Enabled = false
}
func NewTerminal() (*Terminal, error) {

View File

@@ -1,4 +1,4 @@
package readline
package editor
import (
"errors"

View File

@@ -1,6 +1,6 @@
//go:build aix || darwin || dragonfly || freebsd || (linux && !appengine) || netbsd || openbsd || os400 || solaris
package readline
package editor
import (
"syscall"

View File

@@ -1,6 +1,5 @@
//go:build darwin || freebsd || netbsd || openbsd
package readline
package editor
import (
"syscall"

View File

@@ -1,6 +1,5 @@
//go:build linux || solaris
package readline
package editor
import (
"syscall"

View File

@@ -1,4 +1,4 @@
package readline
package editor
const (
CharNull = 0

View File

@@ -0,0 +1,10 @@
# Bash Shell examples
When calling `ollama`, you can pass it a file to run all the prompts in the file, one after the other:
`ollama run llama2 < sourcequestions.txt`
This concept is used in the following example.
## Compare Models
`comparemodels.sh` is a script that runs all the questions in `sourcequestions.txt` using any 4 models you choose that you have already pulled from the Ollama library or have created locally.

View File

@@ -0,0 +1,64 @@
#! /usr/bin/env bash
# Compare multiple models by running them with the same questions
NUMBEROFCHOICES=4
SELECTIONS=()
declare -a SUMS=()
# Get the list of models
CHOICES=$(ollama list | awk '{print $1}')
# Select which models to run as a comparison
echo "Select $NUMBEROFCHOICES models to compare:"
select ITEM in $CHOICES; do
if [[ -n $ITEM ]]; then
echo "You have selected $ITEM"
SELECTIONS+=("$ITEM")
((COUNT++))
if [[ $COUNT -eq $NUMBEROFCHOICES ]]; then
break
fi
else
echo "Invalid selection"
fi
done
# Loop through each of the selected models
for ITEM in "${SELECTIONS[@]}"; do
echo "--------------------------------------------------------------"
echo "Loading the model $ITEM into memory"
ollama run "$ITEM" ""
echo "--------------------------------------------------------------"
echo "Running the questions through the model $ITEM"
COMMAND_OUTPUT=$(ollama run "$ITEM" --verbose < sourcequestions.txt 2>&1| tee /dev/stderr)
# eval duration is sometimes listed in seconds and sometimes in milliseconds.
# Add up the values for each model
SUM=$(echo "$COMMAND_OUTPUT" | awk '
/eval duration:/ {
value = $3
if (index(value, "ms") > 0) {
gsub("ms", "", value)
value /= 1000
} else {
gsub("s", "", value)
}
sum += value
}
END { print sum }')
SUMS+=("All questions for $ITEM completed in $SUM seconds")
done
echo ""
echo "--------------------------------------------------------------"
echo -e "Sums of eval durations for each run:"
for val in "${SUMS[@]}"; do
echo "$val"
done
echo "--------------------------------------------------------------"
echo "Comparison complete. Now you can decide"
echo "which model is best."
echo "--------------------------------------------------------------"

View File

@@ -0,0 +1,7 @@
Why is the sky blue
What is a black hole
Explain the big bang theory like I am 5?
What is the quickest way to win a game of Monopoly with 3 others?
Why does a vacuum bottle keep my coffee hot and my milkshake cold?
What is the difference between a meteor, a meteorite, and a meteoroid?
Create an array with 5 items and print to the console. Do this in Python, C#, Typescript, and Rust.

View File

@@ -0,0 +1,36 @@
# Deploy Ollama to Kubernetes
## Prerequisites
- Ollama: https://ollama.ai/download
- Kubernetes cluster. This example will use Google Kubernetes Engine.
## Steps
1. Create the Ollama namespace, daemon set, and service
```bash
kubectl apply -f cpu.yaml
```
1. Port forward the Ollama service to connect and use it locally
```bash
kubectl -n ollama port-forward service/ollama 11434:80
```
1. Pull and run a model, for example `orca-mini:3b`
```bash
ollama run orca-mini:3b
```
## (Optional) Hardware Acceleration
Hardware acceleration in Kubernetes requires NVIDIA's [`k8s-device-plugin`](https://github.com/NVIDIA/k8s-device-plugin). Follow the link for more details.
Once configured, create a GPU enabled Ollama deployment.
```bash
kubectl apply -f gpu.yaml
```

View File

@@ -0,0 +1,42 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: ollama
namespace: ollama
spec:
selector:
matchLabels:
name: ollama
template:
metadata:
labels:
name: ollama
spec:
containers:
- name: ollama
image: ollama/ollama:latest
ports:
- name: http
containerPort: 11434
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: ollama
namespace: ollama
spec:
type: ClusterIP
selector:
name: ollama
ports:
- port: 80
name: http
targetPort: http
protocol: TCP

View File

@@ -0,0 +1,56 @@
---
apiVersion: v1
kind: Namespace
metadata:
name: ollama
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: ollama
namespace: ollama
spec:
strategy:
type: Recreate
selector:
matchLabels:
name: ollama
template:
metadata:
labels:
name: ollama
spec:
containers:
- name: ollama
image: ollama/ollama:latest
env:
- name: PATH
value: /usr/local/nvidia/bin:/usr/local/nvidia/lib64:/usr/bin:/usr/sbin:/bin:/sbin
- name: LD_LIBRARY_PATH
value: /usr/local/nvidia/lib64
ports:
- name: http
containerPort: 11434
protocol: TCP
resources:
limits:
nvidia.com/gpu: 1
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
---
apiVersion: v1
kind: Service
metadata:
name: ollama
namespace: ollama
spec:
type: ClusterIP
selector:
name: ollama
ports:
- port: 80
name: http
targetPort: http
protocol: TCP

View File

@@ -17,7 +17,7 @@ def generate(prompt, context):
for line in r.iter_lines():
body = json.loads(line)
response_part = body.get('response', '')
# the response streams one token at a time, print that as we recieve it
# the response streams one token at a time, print that as we receive it
print(response_part, end='', flush=True)
if 'error' in body:
@@ -35,4 +35,4 @@ def main():
print()
if __name__ == "__main__":
main()
main()

25
format/format.go Normal file
View File

@@ -0,0 +1,25 @@
package format
import (
"fmt"
"math"
)
const (
Thousand = 1000
Million = Thousand * 1000
Billion = Million * 1000
)
func HumanNumber(b uint64) string {
switch {
case b > Billion:
return fmt.Sprintf("%.0fB", math.Round(float64(b)/Billion))
case b > Million:
return fmt.Sprintf("%.0fM", math.Round(float64(b)/Million))
case b > Thousand:
return fmt.Sprintf("%.0fK", math.Round(float64(b)/Thousand))
default:
return fmt.Sprintf("%d", b)
}
}
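A quick usage sketch of the new helper (import path as added in this diff):

```go
package main

import (
	"fmt"

	"github.com/jmorganca/ollama/format"
)

func main() {
	// Rounded, human-readable parameter counts.
	fmt.Println(format.HumanNumber(7_241_732_096))  // 7B
	fmt.Println(format.HumanNumber(13_015_864_320)) // 13B
	fmt.Println(format.HumanNumber(125_000))        // 125K
}
```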

View File

@@ -5,6 +5,8 @@ import (
"encoding/binary"
"fmt"
"io"
"github.com/jmorganca/ollama/format"
)
type containerGGUF struct {
@@ -21,6 +23,8 @@ type containerGGUF struct {
NumTensor uint64
NumKV uint64
}
parameters uint64
}
func (c *containerGGUF) Name() string {
@@ -75,6 +79,14 @@ func newGGUFModel(container *containerGGUF) *ggufModel {
}
}
func (llm *ggufModel) NumTensor() uint64 {
if llm.Version == 1 {
return uint64(llm.V1.NumTensor)
}
return llm.V2.NumTensor
}
func (llm *ggufModel) NumKV() uint64 {
if llm.Version == 1 {
return uint64(llm.V1.NumKV)
@@ -93,6 +105,10 @@ func (llm *ggufModel) ModelFamily() string {
}
func (llm *ggufModel) ModelType() string {
if llm.parameters > 0 {
return format.HumanNumber(llm.parameters)
}
switch llm.ModelFamily() {
case "llama":
if blocks, ok := llm.kv["llama.block_count"].(uint32); ok {
@@ -127,13 +143,9 @@ func (llm *ggufModel) FileType() string {
}
func (llm *ggufModel) Decode(r io.Reader) error {
read := llm.readString
if llm.Version == 1 {
read = llm.readStringV1
}
// decode key-values
for i := 0; uint64(i) < llm.NumKV(); i++ {
k, err := read(r)
k, err := llm.readString(r)
if err != nil {
return err
}
@@ -165,24 +177,14 @@ func (llm *ggufModel) Decode(r io.Reader) error {
case ggufTypeBool:
v = llm.readBool(r)
case ggufTypeString:
fn := llm.readString
if llm.Version == 1 {
fn = llm.readStringV1
}
s, err := fn(r)
s, err := llm.readString(r)
if err != nil {
return err
}
v = s
case ggufTypeArray:
fn := llm.readArray
if llm.Version == 1 {
fn = llm.readArrayV1
}
a, err := fn(r)
a, err := llm.readArray(r)
if err != nil {
return err
}
@@ -195,6 +197,25 @@ func (llm *ggufModel) Decode(r io.Reader) error {
llm.kv[k] = v
}
// decode tensors
for i := 0; uint64(i) < llm.NumTensor(); i++ {
if _, err := llm.readString(r); err != nil {
return err
}
dimensions := llm.readU32(r)
var elements uint64 = 1
for i := 0; uint32(i) < dimensions; i++ {
elements *= llm.readU64(r)
}
llm.readU32(r) // type
llm.readU64(r) // offset
llm.parameters += elements
}
return nil
}
@@ -290,6 +311,10 @@ func (llm ggufModel) readStringV1(r io.Reader) (string, error) {
}
func (llm ggufModel) readString(r io.Reader) (string, error) {
if llm.Version == 1 {
return llm.readStringV1(r)
}
var nameLength uint64
binary.Read(r, llm.bo, &nameLength)
@@ -339,6 +364,10 @@ func (llm *ggufModel) readArrayV1(r io.Reader) (arr []any, err error) {
}
func (llm *ggufModel) readArray(r io.Reader) (arr []any, err error) {
if llm.Version == 1 {
return llm.readArrayV1(r)
}
atype := llm.readU32(r)
n := llm.readU64(r)

View File

@@ -27,6 +27,34 @@ import (
"github.com/jmorganca/ollama/format"
)
const jsonGrammar = `
root ::= object
value ::= object | array | string | number | ("true" | "false" | "null") ws
object ::=
"{" ws (
string ":" ws value
("," ws string ":" ws value)*
)? "}" ws
array ::=
"[" ws (
value
("," ws value)*
)? "]" ws
string ::=
"\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
)* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= ([ \t\n] ws)?
`
//go:embed llama.cpp/*/build/*/bin/*
var llamaCppEmbed embed.FS
@@ -196,7 +224,10 @@ type llama struct {
Running
}
var errNoGPU = errors.New("nvidia-smi command failed")
var (
errNvidiaSMI = errors.New("nvidia-smi command failed")
errAvailableVRAM = errors.New("not enough VRAM available, falling back to CPU only")
)
// CheckVRAM returns the free VRAM in bytes on Linux machines with NVIDIA GPUs
func CheckVRAM() (int64, error) {
@@ -205,7 +236,7 @@ func CheckVRAM() (int64, error) {
cmd.Stdout = &stdout
err := cmd.Run()
if err != nil {
return 0, errNoGPU
return 0, errNvidiaSMI
}
var freeMiB int64
@@ -226,8 +257,8 @@ func CheckVRAM() (int64, error) {
freeBytes := freeMiB * 1024 * 1024
if freeBytes < 2*format.GigaByte {
log.Printf("less than 2 GB VRAM available, falling back to CPU only")
freeMiB = 0
log.Printf("less than 2 GB VRAM available")
return 0, errAvailableVRAM
}
return freeBytes, nil
@@ -240,7 +271,7 @@ func NumGPU(numLayer, fileSizeBytes int64, opts api.Options) int {
if runtime.GOOS == "linux" {
freeBytes, err := CheckVRAM()
if err != nil {
if err.Error() != "nvidia-smi command failed" {
if !errors.Is(err, errNvidiaSMI) {
log.Print(err.Error())
}
// nvidia driver not installed or no nvidia GPU found
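
The change above swaps a string comparison on the error message for named sentinel errors, so callers can test with errors.Is even after the error has been wrapped. A small self-contained sketch of the pattern; the function bodies are placeholders, not the server's real GPU detection:

package main

import (
	"errors"
	"fmt"
	"log"
)

// Hypothetical sentinel mirroring errNvidiaSMI in the diff above.
var errNvidiaSMI = errors.New("nvidia-smi command failed")

// checkVRAM stands in for CheckVRAM; wrapping with %w keeps the sentinel
// recoverable via errors.Is further up the stack.
func checkVRAM() (int64, error) {
	return 0, fmt.Errorf("checking VRAM: %w", errNvidiaSMI)
}

func main() {
	if _, err := checkVRAM(); err != nil {
		if !errors.Is(err, errNvidiaSMI) {
			log.Print(err) // only log failures we did not expect
		}
		// otherwise fall back to CPU quietly, as NumGPU does above
	}
}
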
@@ -494,7 +525,7 @@ type prediction struct {
const maxBufferSize = 512 * format.KiloByte
func (llm *llama) Predict(ctx context.Context, prevContext []int, prompt string, fn func(api.GenerateResponse)) error {
func (llm *llama) Predict(ctx context.Context, prevContext []int, prompt string, format string, fn func(api.GenerateResponse)) error {
prevConvo, err := llm.Decode(ctx, prevContext)
if err != nil {
return err
@@ -529,6 +560,10 @@ func (llm *llama) Predict(ctx context.Context, prevContext []int, prompt string,
"stop": llm.Stop,
}
if format == "json" {
request["grammar"] = jsonGrammar
}
// Marshal the request to JSON without escaping special characters.
buffer := &bytes.Buffer{}
enc := json.NewEncoder(buffer)

View File

@@ -14,7 +14,7 @@ import (
)
type LLM interface {
Predict(context.Context, []int, string, func(api.GenerateResponse)) error
Predict(context.Context, []int, string, string, func(api.GenerateResponse)) error
Embedding(context.Context, string) ([]float64, error)
Encode(context.Context, string) ([]int, error)
Decode(context.Context, []int) (string, error)

View File

@@ -291,7 +291,7 @@ func OptionShowDescriptionAtLineEnd() Option {
}
}
var defaultTheme = Theme{Saucer: "█", SaucerPadding: " ", BarStart: "|", BarEnd: "|"}
var defaultTheme = Theme{Saucer: "█", SaucerPadding: " ", BarStart: "", BarEnd: ""}
// NewOptions constructs a new instance of ProgressBar, with any options you specify
func NewOptions(max int, options ...Option) *ProgressBar {

View File

@@ -1,372 +0,0 @@
package readline
import (
"fmt"
"os"
"github.com/emirpasic/gods/lists/arraylist"
"golang.org/x/term"
)
type Buffer struct {
Pos int
Buf *arraylist.List
Prompt *Prompt
LineWidth int
Width int
Height int
}
func NewBuffer(prompt *Prompt) (*Buffer, error) {
fd := int(os.Stdout.Fd())
width, height, err := term.GetSize(fd)
if err != nil {
fmt.Println("Error getting size:", err)
return nil, err
}
lwidth := width - len(prompt.Prompt)
if prompt.UseAlt {
lwidth = width - len(prompt.AltPrompt)
}
b := &Buffer{
Pos: 0,
Buf: arraylist.New(),
Prompt: prompt,
Width: width,
Height: height,
LineWidth: lwidth,
}
return b, nil
}
func (b *Buffer) MoveLeft() {
if b.Pos > 0 {
if b.Pos%b.LineWidth == 0 {
fmt.Printf(CursorUp + CursorBOL + cursorRightN(b.Width))
} else {
fmt.Print(CursorLeft)
}
b.Pos -= 1
}
}
func (b *Buffer) MoveLeftWord() {
if b.Pos > 0 {
var foundNonspace bool
for {
v, _ := b.Buf.Get(b.Pos - 1)
if v == ' ' {
if foundNonspace {
break
}
} else {
foundNonspace = true
}
b.MoveLeft()
if b.Pos == 0 {
break
}
}
}
}
func (b *Buffer) MoveRight() {
if b.Pos < b.Size() {
b.Pos += 1
if b.Pos%b.LineWidth == 0 {
fmt.Printf(CursorDown + CursorBOL + cursorRightN(b.PromptSize()))
} else {
fmt.Print(CursorRight)
}
}
}
func (b *Buffer) MoveRightWord() {
if b.Pos < b.Size() {
for {
b.MoveRight()
v, _ := b.Buf.Get(b.Pos)
if v == ' ' {
break
}
if b.Pos == b.Size() {
break
}
}
}
}
func (b *Buffer) MoveToStart() {
if b.Pos > 0 {
currLine := b.Pos / b.LineWidth
if currLine > 0 {
for cnt := 0; cnt < currLine; cnt++ {
fmt.Print(CursorUp)
}
}
fmt.Printf(CursorBOL + cursorRightN(b.PromptSize()))
b.Pos = 0
}
}
func (b *Buffer) MoveToEnd() {
if b.Pos < b.Size() {
currLine := b.Pos / b.LineWidth
totalLines := b.Size() / b.LineWidth
if currLine < totalLines {
for cnt := 0; cnt < totalLines-currLine; cnt++ {
fmt.Print(CursorDown)
}
remainder := b.Size() % b.LineWidth
fmt.Printf(CursorBOL + cursorRightN(b.PromptSize()+remainder))
} else {
fmt.Print(cursorRightN(b.Size() - b.Pos))
}
b.Pos = b.Size()
}
}
func (b *Buffer) Size() int {
return b.Buf.Size()
}
func min(n, m int) int {
if n > m {
return m
}
return n
}
func (b *Buffer) PromptSize() int {
if b.Prompt.UseAlt {
return len(b.Prompt.AltPrompt)
}
return len(b.Prompt.Prompt)
}
func (b *Buffer) Add(r rune) {
if b.Pos == b.Buf.Size() {
fmt.Printf("%c", r)
b.Buf.Add(r)
b.Pos += 1
if b.Pos > 0 && b.Pos%b.LineWidth == 0 {
fmt.Printf("\n%s", b.Prompt.AltPrompt)
}
} else {
fmt.Printf("%c", r)
b.Buf.Insert(b.Pos, r)
b.Pos += 1
if b.Pos > 0 && b.Pos%b.LineWidth == 0 {
fmt.Printf("\n%s", b.Prompt.AltPrompt)
}
b.drawRemaining()
}
}
func (b *Buffer) drawRemaining() {
var place int
remainingText := b.StringN(b.Pos)
if b.Pos > 0 {
place = b.Pos % b.LineWidth
}
fmt.Print(CursorHide)
// render the rest of the current line
currLine := remainingText[:min(b.LineWidth-place, len(remainingText))]
if len(currLine) > 0 {
fmt.Printf(ClearToEOL + currLine)
fmt.Print(cursorLeftN(len(currLine)))
} else {
fmt.Print(ClearToEOL)
}
// render the other lines
if len(remainingText) > len(currLine) {
remaining := []rune(remainingText[len(currLine):])
var totalLines int
for i, c := range remaining {
if i%b.LineWidth == 0 {
fmt.Printf("\n%s", b.Prompt.AltPrompt)
totalLines += 1
}
fmt.Printf("%c", c)
}
fmt.Print(ClearToEOL)
fmt.Print(cursorUpN(totalLines))
fmt.Printf(CursorBOL + cursorRightN(b.Width-len(currLine)))
}
fmt.Print(CursorShow)
}
func (b *Buffer) Remove() {
if b.Buf.Size() > 0 && b.Pos > 0 {
if b.Pos%b.LineWidth == 0 {
// if the user backspaces across a line-wrap boundary, clear the current line
// and move the cursor to the end of the previous line
fmt.Printf(CursorBOL + ClearToEOL)
fmt.Printf(CursorUp + CursorBOL + cursorRightN(b.Width) + " " + CursorLeft)
} else {
fmt.Printf(CursorLeft + " " + CursorLeft)
}
var eraseExtraLine bool
if (b.Size()-1)%b.LineWidth == 0 {
eraseExtraLine = true
}
b.Pos -= 1
b.Buf.Remove(b.Pos)
if b.Pos < b.Size() {
b.drawRemaining()
// erase the leftover line that remains when backspacing in the middle of a line
// whose trailing characters wrap past the line width boundary
if eraseExtraLine {
remainingLines := (b.Size() - b.Pos) / b.LineWidth
fmt.Printf(cursorDownN(remainingLines+1) + CursorBOL + ClearToEOL)
place := b.Pos % b.LineWidth
fmt.Printf(cursorUpN(remainingLines+1) + cursorRightN(place+len(b.Prompt.Prompt)))
}
}
}
}
func (b *Buffer) Delete() {
if b.Size() > 0 && b.Pos < b.Size() {
b.Buf.Remove(b.Pos)
b.drawRemaining()
if b.Size()%b.LineWidth == 0 {
if b.Pos != b.Size() {
remainingLines := (b.Size() - b.Pos) / b.LineWidth
fmt.Printf(cursorDownN(remainingLines) + CursorBOL + ClearToEOL)
place := b.Pos % b.LineWidth
fmt.Printf(cursorUpN(remainingLines) + cursorRightN(place+len(b.Prompt.Prompt)))
}
}
}
}
func (b *Buffer) DeleteBefore() {
if b.Pos > 0 {
for cnt := b.Pos - 1; cnt >= 0; cnt-- {
b.Remove()
}
}
}
func (b *Buffer) DeleteRemaining() {
if b.Size() > 0 && b.Pos < b.Size() {
charsToDel := b.Size() - b.Pos
for cnt := 0; cnt < charsToDel; cnt++ {
b.Delete()
}
}
}
func (b *Buffer) DeleteWord() {
if b.Buf.Size() > 0 && b.Pos > 0 {
var foundNonspace bool
for {
v, _ := b.Buf.Get(b.Pos - 1)
if v == ' ' {
if !foundNonspace {
b.Remove()
} else {
break
}
} else {
foundNonspace = true
b.Remove()
}
if b.Pos == 0 {
break
}
}
}
}
func (b *Buffer) ClearScreen() {
fmt.Printf(ClearScreen + CursorReset + b.Prompt.Prompt)
if b.IsEmpty() {
ph := b.Prompt.Placeholder
fmt.Printf(ColorGrey + ph + cursorLeftN(len(ph)) + ColorDefault)
} else {
currPos := b.Pos
b.Pos = 0
b.drawRemaining()
fmt.Printf(CursorReset + cursorRightN(len(b.Prompt.Prompt)))
if currPos > 0 {
targetLine := currPos / b.LineWidth
if targetLine > 0 {
for cnt := 0; cnt < targetLine; cnt++ {
fmt.Print(CursorDown)
}
}
remainder := currPos % b.LineWidth
if remainder > 0 {
fmt.Print(cursorRightN(remainder))
}
if currPos%b.LineWidth == 0 {
fmt.Printf(CursorBOL + b.Prompt.AltPrompt)
}
}
b.Pos = currPos
}
}
func (b *Buffer) IsEmpty() bool {
return b.Buf.Empty()
}
func (b *Buffer) Replace(r []rune) {
b.Pos = 0
b.Buf.Clear()
fmt.Printf(ClearLine + CursorBOL + b.Prompt.Prompt)
for _, c := range r {
b.Add(c)
}
}
func (b *Buffer) String() string {
return b.StringN(0)
}
func (b *Buffer) StringN(n int) string {
return b.StringNM(n, 0)
}
func (b *Buffer) StringNM(n, m int) string {
var s string
if m == 0 {
m = b.Size()
}
for cnt := n; cnt < m; cnt++ {
c, _ := b.Buf.Get(cnt)
s += string(c.(rune))
}
return s
}
func cursorLeftN(n int) string {
return fmt.Sprintf(CursorLeftN, n)
}
func cursorRightN(n int) string {
return fmt.Sprintf(CursorRightN, n)
}
func cursorUpN(n int) string {
return fmt.Sprintf(CursorUpN, n)
}
func cursorDownN(n int) string {
return fmt.Sprintf(CursorDownN, n)
}
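
The cursor helpers at the end of the buffer format constants (CursorLeftN, CursorRightN, CursorUpN, CursorDownN and friends) that are defined elsewhere in the readline package and not shown in this diff. A plausible set of ANSI CSI definitions they could expand to, offered purely as an assumption to make the buffer code easier to follow; the real package may differ:

package readline

// Assumed ANSI escape sequences backing the helpers above.
const (
	CursorUp     = "\033[1A"
	CursorDown   = "\033[1B"
	CursorRight  = "\033[1C"
	CursorLeft   = "\033[1D"
	CursorBOL    = "\r"        // beginning of line
	ClearToEOL   = "\033[K"
	ClearLine    = "\033[2K"
	ClearScreen  = "\033[2J"
	CursorReset  = "\033[H"    // home position
	CursorHide   = "\033[?25l"
	CursorShow   = "\033[?25h"
	ColorGrey    = "\033[90m"
	ColorDefault = "\033[0m"
	CursorLeftN  = "\033[%dD" // used by cursorLeftN
	CursorRightN = "\033[%dC" // used by cursorRightN
	CursorUpN    = "\033[%dA" // used by cursorUpN
	CursorDownN  = "\033[%dB" // used by cursorDownN
)
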

View File

@@ -1,152 +0,0 @@
package readline
import (
"bufio"
"errors"
"io"
"os"
"path/filepath"
"strings"
"github.com/emirpasic/gods/lists/arraylist"
)
type History struct {
Buf *arraylist.List
Autosave bool
Pos int
Limit int
Filename string
Enabled bool
}
func NewHistory() (*History, error) {
h := &History{
Buf: arraylist.New(),
Limit: 100, // TODO: make the history limit configurable
Autosave: true,
Enabled: true,
}
err := h.Init()
if err != nil {
return nil, err
}
return h, nil
}
func (h *History) Init() error {
home, err := os.UserHomeDir()
if err != nil {
return err
}
path := filepath.Join(home, ".ollama", "history")
h.Filename = path
// TODO: check whether the history file exists
f, err := os.OpenFile(path, os.O_CREATE|os.O_RDONLY, 0600)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
return nil
}
return err
}
defer f.Close()
r := bufio.NewReader(f)
for {
line, err := r.ReadString('\n')
if err != nil {
if err == io.EOF {
break
}
return err
}
line = strings.TrimSpace(line)
if len(line) == 0 {
continue
}
h.Add([]rune(line))
}
return nil
}
func (h *History) Add(l []rune) {
h.Buf.Add(l)
h.Compact()
h.Pos = h.Size()
if h.Autosave {
h.Save()
}
}
func (h *History) Compact() {
s := h.Buf.Size()
if s > h.Limit {
for cnt := 0; cnt < s-h.Limit; cnt++ {
h.Buf.Remove(0)
}
}
}
func (h *History) Clear() {
h.Buf.Clear()
}
func (h *History) Prev() []rune {
var line []rune
if h.Pos > 0 {
h.Pos -= 1
}
v, _ := h.Buf.Get(h.Pos)
line, _ = v.([]rune)
return line
}
func (h *History) Next() []rune {
var line []rune
if h.Pos < h.Buf.Size() {
h.Pos += 1
v, _ := h.Buf.Get(h.Pos)
line, _ = v.([]rune)
}
return line
}
func (h *History) Size() int {
return h.Buf.Size()
}
func (h *History) Save() error {
if !h.Enabled {
return nil
}
tmpFile := h.Filename + ".tmp"
f, err := os.OpenFile(tmpFile, os.O_CREATE|os.O_WRONLY|os.O_TRUNC|os.O_APPEND, 0666)
if err != nil {
return err
}
defer f.Close()
buf := bufio.NewWriter(f)
for cnt := 0; cnt < h.Size(); cnt++ {
v, _ := h.Buf.Get(cnt)
line, _ := v.([]rune)
buf.WriteString(string(line) + "\n")
}
buf.Flush()
f.Close()
if err = os.Rename(tmpFile, h.Filename); err != nil {
return err
}
return nil
}
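
Save above writes the history to a temporary file and then renames it over the real file, so an interrupted write never truncates the existing history. The same pattern in isolation, as a sketch with hypothetical names and permissions:

package main

import (
	"bufio"
	"os"
)

// writeFileAtomic sketches the write-then-rename pattern used by (*History).Save.
func writeFileAtomic(path string, lines []string) error {
	tmp := path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	w := bufio.NewWriter(f)
	for _, l := range lines {
		w.WriteString(l + "\n")
	}
	if err := w.Flush(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// Rename is atomic on the same filesystem, so readers never observe a partial file.
	return os.Rename(tmp, path)
}

func main() {
	_ = writeFileAtomic("history", []string{"first line", "second line"})
}
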

View File

@@ -180,7 +180,7 @@ install_cuda_driver_apt() {
case $1 in
debian)
status 'Enabling contrib sources...'
$SUDO sed 's/main/contrib/' < /etc/apt/sources.list | sudo tee /etc/apt/sources.list.d/contrib.list > /dev/null
$SUDO sed 's/main/contrib/' < /etc/apt/sources.list | $SUDO tee /etc/apt/sources.list.d/contrib.list > /dev/null
;;
esac

View File

@@ -149,9 +149,10 @@ func (b *blobDownload) run(ctx context.Context, requestURL *url.URL, opts *Regis
i := i
g.Go(func() error {
var err error
for try := 0; try < maxRetries; try++ {
w := io.NewOffsetWriter(file, part.StartsAt())
err := b.downloadChunk(inner, requestURL, w, part, opts)
err = b.downloadChunk(inner, requestURL, w, part, opts)
switch {
case errors.Is(err, context.Canceled), errors.Is(err, syscall.ENOSPC):
// return immediately if the context is canceled or the device is out of space
@@ -160,11 +161,14 @@ func (b *blobDownload) run(ctx context.Context, requestURL *url.URL, opts *Regis
log.Printf("%s part %d attempt %d failed: %v, retrying", b.Digest[7:19], i, try, err)
continue
default:
if try > 0 {
log.Printf("%s part %d completed after %d retries", b.Digest[7:19], i, try)
}
return nil
}
}
return errMaxRetriesExceeded
return fmt.Errorf("%w: %w", errMaxRetriesExceeded, err)
})
}
@@ -201,7 +205,7 @@ func (b *blobDownload) downloadChunk(ctx context.Context, requestURL *url.URL, w
defer resp.Body.Close()
n, err := io.Copy(w, io.TeeReader(resp.Body, b))
if err != nil && !errors.Is(err, context.Canceled) {
if err != nil && !errors.Is(err, context.Canceled) && !errors.Is(err, io.ErrUnexpectedEOF) {
// rollback progress
b.Completed.Add(-n)
return err
@@ -212,7 +216,7 @@ func (b *blobDownload) downloadChunk(ctx context.Context, requestURL *url.URL, w
return err
}
// return nil or context.Canceled
// return nil or context.Canceled or UnexpectedEOF (resumable)
return err
}
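
The download change above treats io.ErrUnexpectedEOF as retryable: bytes already written through the offset writer are kept, and the next attempt resumes the chunk instead of failing the whole download. A self-contained sketch of that idea using an HTTP Range request; the URL, offsets, and file name are placeholders:

package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
)

// downloadRange writes one chunk at a fixed offset and reports
// io.ErrUnexpectedEOF as retryable rather than fatal, mirroring the change above.
func downloadRange(url string, f *os.File, start, end int64) (int64, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	n, err := io.Copy(io.NewOffsetWriter(f, start), resp.Body)
	if errors.Is(err, io.ErrUnexpectedEOF) {
		// The connection dropped mid-chunk: keep the n bytes already written
		// and let the caller retry from start+n.
		return n, err
	}
	return n, err
}

func main() {
	f, _ := os.Create("part.bin")
	defer f.Close()
	if _, err := downloadRange("http://example.com/blob", f, 0, 1<<20-1); err != nil {
		fmt.Println("retry or give up:", err)
	}
}
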

View File

@@ -397,7 +397,7 @@ func CreateModel(ctx context.Context, name string, path string, fn func(resp api
if err != nil {
return err
}
newLayer.From = mp.GetNamespaceRepository()
newLayer.From = mp.GetShortTagname()
layers = append(layers, newLayer)
}
}

View File

@@ -158,9 +158,17 @@ func GenerateHandler(c *gin.Context) {
return
}
if req.Model == "" {
// validate the request
switch {
case req.Model == "":
c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": "model is required"})
return
case len(req.Format) > 0 && req.Format != "json":
c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": "format must be json"})
return
case req.Raw && (req.Template != "" || req.System != "" || len(req.Context) > 0):
c.AbortWithStatusJSON(http.StatusBadRequest, gin.H{"error": "raw mode does not support template, system, or context"})
return
}
model, err := GetModel(req.Model)
@@ -189,10 +197,13 @@ func GenerateHandler(c *gin.Context) {
checkpointLoaded := time.Now()
prompt, err := model.Prompt(req)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
prompt := req.Prompt
if !req.Raw {
prompt, err = model.Prompt(req)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
}
ch := make(chan any)
@@ -215,10 +226,15 @@ func GenerateHandler(c *gin.Context) {
r.LoadDuration = checkpointLoaded.Sub(checkpointStart)
}
if req.Raw {
// in raw mode the client must manage the prompt history on its own
r.Context = nil
}
ch <- r
}
if err := loaded.runner.Predict(c.Request.Context(), req.Context, prompt, fn); err != nil {
if err := loaded.runner.Predict(c.Request.Context(), req.Context, prompt, req.Format, fn); err != nil {
ch <- gin.H{"error": err.Error()}
}
}()
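
The handler changes above reject raw requests that also set a template, system prompt, or context, skip prompt templating, and drop the returned context, since a raw client owns the entire prompt. A minimal request sketch; field names follow the handler above, while the model name and prompt formatting are placeholders:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// In raw mode the prompt is sent to the model exactly as written:
	// no template, no system prompt, and no server-side context is returned.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama2",
		"prompt": "[INST] Why is the sky blue? [/INST]",
		"raw":    true,
		"stream": false,
	})
	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var out map[string]any
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out["response"], out["context"]) // context is nil in raw mode
}
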

View File

@@ -40,14 +40,6 @@ type blobUpload struct {
references atomic.Int32
}
type blobUploadPart struct {
// N is the part number
N int
Offset int64
Size int64
hash.Hash
}
const (
numUploadParts = 64
minUploadPartSize int64 = 95 * 1000 * 1000
@@ -100,7 +92,7 @@ func (b *blobUpload) Prepare(ctx context.Context, requestURL *url.URL, opts *Reg
}
// set part.N to the current number of parts
b.Parts = append(b.Parts, blobUploadPart{N: len(b.Parts), Offset: offset, Size: size, Hash: md5.New()})
b.Parts = append(b.Parts, blobUploadPart{blobUpload: b, N: len(b.Parts), Offset: offset, Size: size})
offset += size
}
@@ -143,9 +135,10 @@ func (b *blobUpload) Run(ctx context.Context, opts *RegistryOptions) {
case <-inner.Done():
case requestURL := <-b.nextURL:
g.Go(func() error {
var err error
for try := 0; try < maxRetries; try++ {
r := io.NewSectionReader(f, part.Offset, part.Size)
err := b.uploadChunk(inner, http.MethodPatch, requestURL, r, part, opts)
part.ReadSeeker = io.NewSectionReader(f, part.Offset, part.Size)
err = b.uploadChunk(inner, http.MethodPatch, requestURL, part, opts)
switch {
case errors.Is(err, context.Canceled):
return err
@@ -159,7 +152,7 @@ func (b *blobUpload) Run(ctx context.Context, opts *RegistryOptions) {
return nil
}
return errMaxRetriesExceeded
return fmt.Errorf("%w: %w", errMaxRetriesExceeded, err)
})
}
}
@@ -197,7 +190,9 @@ func (b *blobUpload) Run(ctx context.Context, opts *RegistryOptions) {
b.done = true
}
func (b *blobUpload) uploadChunk(ctx context.Context, method string, requestURL *url.URL, rs io.ReadSeeker, part *blobUploadPart, opts *RegistryOptions) error {
func (b *blobUpload) uploadChunk(ctx context.Context, method string, requestURL *url.URL, part *blobUploadPart, opts *RegistryOptions) error {
part.Reset()
headers := make(http.Header)
headers.Set("Content-Type", "application/octet-stream")
headers.Set("Content-Length", fmt.Sprintf("%d", part.Size))
@@ -207,8 +202,7 @@ func (b *blobUpload) uploadChunk(ctx context.Context, method string, requestURL
headers.Set("Content-Range", fmt.Sprintf("%d-%d", part.Offset, part.Offset+part.Size-1))
}
buw := blobUploadWriter{blobUpload: b}
resp, err := makeRequest(ctx, method, requestURL, headers, io.TeeReader(rs, io.MultiWriter(&buw, part.Hash)), opts)
resp, err := makeRequest(ctx, method, requestURL, headers, io.TeeReader(part.ReadSeeker, io.MultiWriter(part, part.Hash)), opts)
if err != nil {
return err
}
@@ -234,11 +228,7 @@ func (b *blobUpload) uploadChunk(ctx context.Context, method string, requestURL
}
for try := 0; try < maxRetries; try++ {
rs.Seek(0, io.SeekStart)
b.Completed.Add(-buw.written)
buw.written = 0
part.Hash = md5.New()
err := b.uploadChunk(ctx, http.MethodPut, redirectURL, rs, part, nil)
err = b.uploadChunk(ctx, http.MethodPut, redirectURL, part, nil)
switch {
case errors.Is(err, context.Canceled):
return err
@@ -252,7 +242,7 @@ func (b *blobUpload) uploadChunk(ctx context.Context, method string, requestURL
return nil
}
return errMaxRetriesExceeded
return fmt.Errorf("%w: %w", errMaxRetriesExceeded, err)
case resp.StatusCode == http.StatusUnauthorized:
auth := resp.Header.Get("www-authenticate")
@@ -270,9 +260,6 @@ func (b *blobUpload) uploadChunk(ctx context.Context, method string, requestURL
return err
}
rs.Seek(0, io.SeekStart)
b.Completed.Add(-buw.written)
buw.written = 0
return fmt.Errorf("http status %d %s: %s", resp.StatusCode, resp.Status, body)
}
@@ -318,18 +305,33 @@ func (b *blobUpload) Wait(ctx context.Context, fn func(api.ProgressResponse)) er
}
}
type blobUploadWriter struct {
type blobUploadPart struct {
// N is the part number
N int
Offset int64
Size int64
hash.Hash
written int64
io.ReadSeeker
*blobUpload
}
func (b *blobUploadWriter) Write(p []byte) (n int, err error) {
n = len(p)
b.written += int64(n)
b.Completed.Add(int64(n))
func (p *blobUploadPart) Write(b []byte) (n int, err error) {
n = len(b)
p.written += int64(n)
p.Completed.Add(int64(n))
return n, nil
}
func (p *blobUploadPart) Reset() {
p.Seek(0, io.SeekStart)
p.Completed.Add(-int64(p.written))
p.written = 0
p.Hash = md5.New()
}
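
blobUploadPart now embeds its section reader, its hash, and the parent upload, so a retry only needs Reset: rewind the reader, subtract the progress reported by the failed attempt, and start a fresh MD5. The accounting side of that, reduced to a runnable sketch with stand-in names:

package main

import (
	"crypto/md5"
	"fmt"
	"hash"
	"io"
	"strings"
	"sync/atomic"
)

// uploadPart is a cut-down stand-in for blobUploadPart above: it counts bytes
// as they stream out and can reset itself before a retry.
type uploadPart struct {
	io.ReadSeeker
	hash.Hash
	written   int64
	completed *atomic.Int64 // shared progress counter, like blobUpload.Completed
}

func (p *uploadPart) Write(b []byte) (int, error) {
	n := len(b)
	p.written += int64(n)
	p.completed.Add(int64(n))
	return n, nil
}

func (p *uploadPart) Reset() {
	p.Seek(0, io.SeekStart)
	p.completed.Add(-p.written) // undo progress reported by the failed attempt
	p.written = 0
	p.Hash = md5.New()
}

func main() {
	var total atomic.Int64
	p := &uploadPart{ReadSeeker: strings.NewReader("payload"), Hash: md5.New(), completed: &total}
	io.Copy(io.MultiWriter(p, p.Hash), p) // simulate one attempt
	fmt.Println("progress:", total.Load())
	p.Reset() // retry starts from a clean slate
	fmt.Println("progress after reset:", total.Load())
}
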
func uploadBlob(ctx context.Context, mp ModelPath, layer *Layer, opts *RegistryOptions, fn func(api.ProgressResponse)) error {
requestURL := mp.BaseURL()
requestURL = requestURL.JoinPath("v2", mp.GetNamespaceRepository(), "blobs", layer.Digest)