Mirror of https://github.com/mudler/LocalAI.git (synced 2026-02-03 03:02:38 -05:00)

Compare commits

12 commits:

- a23deb5ec7
- 999676b106
- c61b023bc8
- 650a22aef1
- 17b1724f7c
- e860e62036
- 1f45ff8cd6
- abee34f60a
- dbc70dc13c
- 55142065eb
- d83d2293b5
- 467ce5a7aa
Earthfile (17 changed lines)

```diff
@@ -11,11 +11,6 @@ go-deps:
     SAVE ARTIFACT go.mod AS LOCAL go.mod
     SAVE ARTIFACT go.sum AS LOCAL go.sum
 
-model-image:
-    ARG MODEL_IMAGE=quay.io/go-skynet/models:ggml2-alpaca-7b-v0.2
-    FROM $MODEL_IMAGE
-    SAVE ARTIFACT /models/model.bin
-
 build:
     FROM +go-deps
     WORKDIR /build
@@ -27,21 +22,11 @@ build:
 
 image:
     FROM +go-deps
-    ARG IMAGE=alpaca-cli
-    COPY +model-image/model.bin /model.bin
+    ARG IMAGE=alpaca-cli-nomodel
     COPY +build/llama-cli /llama-cli
     ENV MODEL_PATH=/model.bin
     ENTRYPOINT [ "/llama-cli" ]
     SAVE IMAGE --push $IMAGE
 
-lite-image:
-    FROM +go-deps
-    ARG IMAGE=alpaca-cli-nomodel
-    COPY +build/llama-cli /llama-cli
-    ENV MODEL_PATH=/model.bin
-    ENTRYPOINT [ "/llama-cli" ]
-    SAVE IMAGE --push $IMAGE-lite
-
 image-all:
     BUILD --platform=linux/amd64 --platform=linux/arm64 +image
-    BUILD --platform=linux/amd64 --platform=linux/arm64 +lite-image
```
README.md (61 changed lines)

````diff
@@ -5,10 +5,10 @@ llama-cli is a straightforward golang CLI interface for [llama.cpp](https://gith
 
 ## Container images
 
-The `llama-cli` [container images](https://quay.io/repository/go-skynet/llama-cli?tab=tags&tag=latest) come preloaded with the [alpaca.cpp 7B](https://github.com/antimatter15/alpaca.cpp) model, enabling you to start making predictions immediately! To begin, run:
+To begin, run:
 
 ```
-docker run -ti --rm quay.io/go-skynet/llama-cli:v0.2 --instruction "What's an alpaca?" --topk 10000
+docker run -ti --rm quay.io/go-skynet/llama-cli:v0.3 --instruction "What's an alpaca?" --topk 10000
 ```
 
 You will receive a response like the following:
@@ -38,6 +38,7 @@ llama-cli --model <model_path> --instruction <instruction> [--input <input>] [--
 | top_k | TOP_K | 20 | The number of top-k tokens to consider for text generation. |
 | context-size | CONTEXT_SIZE | 512 | Default token context size. |
 | alpaca | ALPACA | true | Set to true for alpaca models. |
+| gpt4all | GPT4ALL | false | Set to true for gpt4all models. |
 
 Here's an example of using `llama-cli`:
 
@@ -49,12 +50,12 @@ This will generate text based on the given model and instruction.
 
 ## Advanced usage
 
-`llama-cli` also provides an API for running text generation as a service.
+`llama-cli` also provides an API for running text generation as a service. The model will be pre-loaded and kept in memory.
 
 Example of starting the API with `docker`:
 
 ```bash
-docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.2 api
+docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.3 api --context-size 700 --threads 4
 ```
 
 And you'll see:
@@ -84,6 +85,7 @@ The API takes takes the following:
 | address | ADDRESS | :8080 | The address and port to listen on. |
 | context-size | CONTEXT_SIZE | 512 | Default token context size. |
 | alpaca | ALPACA | true | Set to true for alpaca models. |
+| gpt4all | GPT4ALL | false | Set to true for gpt4all models. |
 
 
 Once the server is running, you can make requests to it using HTTP. For example, to generate text based on an instruction, you can send a POST request to the `/predict` endpoint with the instruction as the request body:
@@ -111,30 +113,25 @@ Below is an instruction that describes a task. Write a response that appropriate
 
 ## Using other models
 
-You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.2-lite`) that don't ship any model, and specify a model binary to be used for inference with `--model`.
+You can specify a model binary to be used for inference with `--model`.
 
-13B and 30B models are known to work:
-
-### 13B
+13B and 30B alpaca models are known to work:
 
 ```
 # Download the model image, extract the model
 docker run --name model --entrypoint /models quay.io/go-skynet/models:ggml2-alpaca-13b-v0.2
 docker cp model:/models/model.bin ./
 
 # Use the model with llama-cli
-docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.2-lite api --model /models/model.bin
+docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.3-lite api --model /models/model.bin
 ```
 
-### 30B
-
-```
-# Download the model image, extract the model
-docker run --name model --entrypoint /models quay.io/go-skynet/models:ggml2-alpaca-30b-v0.2
-docker cp model:/models/model.bin ./
-
-# Use the model with llama-cli
-docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.2-lite api --model /models/model.bin
-```
+gpt4all (https://github.com/nomic-ai/gpt4all) works as well, however the original model needs to be converted:
+
+```bash
+wget -O tokenizer.model https://huggingface.co/decapoda-research/llama-30b-hf/resolve/main/tokenizer.model
+mkdir models
+cp gpt4all.. models/
+git clone https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82
+pip install sentencepiece
+python 828bddec6162a023114ce19146cb2b82/gistfile1.txt models tokenizer.model
+```
@@ -169,4 +166,26 @@ You can run the API directly in Kubernetes:
 
 ```bash
 kubectl apply -f https://raw.githubusercontent.com/go-skynet/llama-cli/master/kubernetes/deployment.yaml
 ```
+
+### Build locally
+
+Pre-built images might fit well for most of the modern hardware, however you can and might need to build the images manually.
+
+In order to build the `llama-cli` container image locally you can use `docker`:
+
+```
+# build the image as "alpaca-image"
+docker run --privileged -v /var/run/docker.sock:/var/run/docker.sock --rm -t -v "$(pwd)":/workspace -v earthly-tmp:/tmp/earthly:rw earthly/earthly:v0.7.2 +image --IMAGE=alpaca-image
+# run the image
+docker run alpaca-image --instruction "What's an alpaca?"
+```
+
+Or build the binary with:
+
+```
+# build the binary
+docker run --privileged -v /var/run/docker.sock:/var/run/docker.sock --rm -t -v "$(pwd)":/workspace -v earthly-tmp:/tmp/earthly:rw earthly/earthly:v0.7.2 +build
+# run the binary
+./llama-cli --instruction "What's an alpaca?"
+```
````
api.go (15 changed lines)

```diff
@@ -1,15 +1,25 @@
 package main
 
 import (
+	"embed"
+	"net/http"
 	"strconv"
 	"sync"
 
 	llama "github.com/go-skynet/llama/go"
 	"github.com/gofiber/fiber/v2"
+	"github.com/gofiber/fiber/v2/middleware/filesystem"
 )
 
+//go:embed index.html
+var indexHTML embed.FS
+
 func api(l *llama.LLama, listenAddr string, threads int) error {
 	app := fiber.New()
 
+	app.Use("/", filesystem.New(filesystem.Config{
+		Root:         http.FS(indexHTML),
+		NotFoundFile: "index.html",
+	}))
 	/*
 		curl --location --request POST 'http://localhost:8080/predict' --header 'Content-Type: application/json' --data-raw '{
 			"text": "What is an alpaca?",
@@ -19,9 +29,12 @@ func api(l *llama.LLama, listenAddr string, threads int) error {
 			"tokens": 100
 		}'
 	*/
+	var mutex = &sync.Mutex{}
 
 	// Endpoint to generate the prediction
 	app.Post("/predict", func(c *fiber.Ctx) error {
+		mutex.Lock()
+		defer mutex.Unlock()
 		// Get input data from the request body
 		input := new(struct {
 			Text string `json:"text"`
```
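The `/predict` endpoint shown above can be exercised from any HTTP client. Below is a minimal Go client sketch: the request fields follow the curl example embedded in `api.go`'s comment, and the `prediction` response field is an assumption taken from `index.html`, which reads `data.prediction` (the full response shape is not visible in this diff):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// predictRequest follows the curl example in api.go's handler comment.
type predictRequest struct {
	Text   string `json:"text"`
	Tokens int    `json:"tokens"`
}

// predictResponse assumes the "prediction" field that index.html reads.
type predictResponse struct {
	Prediction string `json:"prediction"`
}

func main() {
	body, err := json.Marshal(predictRequest{Text: "What is an alpaca?", Tokens: 100})
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://localhost:8080/predict", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out predictResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Prediction)
}
```

Because of the mutex added in this change, concurrent requests to `/predict` are served one at a time; clients simply queue while a prediction is running.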
go.mod (6 changed lines)

```diff
@@ -6,7 +6,7 @@ require (
 	github.com/charmbracelet/bubbles v0.15.0
 	github.com/charmbracelet/bubbletea v0.23.2
 	github.com/charmbracelet/lipgloss v0.7.1
-	github.com/go-skynet/llama v0.0.0-20230321172246-7be5326e18cc
+	github.com/go-skynet/llama v0.0.0-20230329165201-84efc8db3647
 	github.com/gofiber/fiber/v2 v2.42.0
 	github.com/urfave/cli/v2 v2.25.0
 )
@@ -40,6 +40,6 @@ require (
 	github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 // indirect
 	golang.org/x/sync v0.1.0 // indirect
 	golang.org/x/sys v0.6.0 // indirect
-	golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 // indirect
-	golang.org/x/text v0.3.7 // indirect
+	golang.org/x/term v0.5.0 // indirect
+	golang.org/x/text v0.7.0 // indirect
 )
```
go.sum (10 changed lines)

```diff
@@ -21,6 +21,10 @@ github.com/cpuguy83/go-md2man/v2 v2.0.2 h1:p1EgwI/C7NhT0JmVkwCD2ZBK8j4aeHQX2pMHH
 github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
 github.com/go-skynet/llama v0.0.0-20230321172246-7be5326e18cc h1:NcmO8mA7iRZIX0Qy2SjcsSaV14+g87MiTey1neUJaFQ=
 github.com/go-skynet/llama v0.0.0-20230321172246-7be5326e18cc/go.mod h1:ZtYsAIud4cvP9VTTI9uhdgR1uCwaO/gGKnZZ95h9i7w=
+github.com/go-skynet/llama v0.0.0-20230325223742-a3563a2690ba h1:u6OhAqlWFHsTjfWKePdK2kP4/mTyXX5vsmKwrK5QX6o=
+github.com/go-skynet/llama v0.0.0-20230325223742-a3563a2690ba/go.mod h1:ZtYsAIud4cvP9VTTI9uhdgR1uCwaO/gGKnZZ95h9i7w=
+github.com/go-skynet/llama v0.0.0-20230329165201-84efc8db3647 h1:W6qHHD/Bv6wRXSzdv38gWMAXgw3fklHyEblfw88uEUU=
+github.com/go-skynet/llama v0.0.0-20230329165201-84efc8db3647/go.mod h1:ZtYsAIud4cvP9VTTI9uhdgR1uCwaO/gGKnZZ95h9i7w=
 github.com/gofiber/fiber/v2 v2.42.0 h1:Fnp7ybWvS+sjNQsFvkhf4G8OhXswvB6Vee8hM/LyS+8=
 github.com/gofiber/fiber/v2 v2.42.0/go.mod h1:3+SGNjqMh5VQH5Vz2Wdi43zTIV16ktlFd3x3R6O1Zlc=
 github.com/google/uuid v1.3.0 h1:t6JiXgmwXMjEs8VusXIJk2BXHsn+wx8BZdTaoZ5fu7I=
@@ -108,13 +112,15 @@ golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBc
 golang.org/x/sys v0.6.0 h1:MVltZSvRTcU2ljQOhs94SXPftV6DCNnZViHeQps87pQ=
 golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
-golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 h1:JGgROgKl9N8DuW20oFS5gxc+lE67/N3FcwmBPMe7ArY=
 golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
+golang.org/x/term v0.5.0 h1:n2a8QNdAb0sZNpU9R1ALUXBbY+w51fCQDN+7EdxNBsY=
+golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
 golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
 golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
 golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
-golang.org/x/text v0.3.7 h1:olpwvP2KacW1ZWvsR7uQhoyTYvKAupfQrRGBFM352Gk=
 golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
+golang.org/x/text v0.7.0 h1:4BRB4x83lYWy72KwLD/qYDuTu7q9PjSagHvijDw7cLo=
+golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
 golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
 golang.org/x/tools v0.0.0-20201022035929-9cf592e881e9/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
```
index.html (new file, 120 lines)

```html
<!DOCTYPE html>
<html>
<head>
  <title>llama-cli</title>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css" crossorigin="anonymous" referrerpolicy="no-referrer" />
  <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">
</head>
<style>
  @keyframes rotating {
    from {
      transform: rotate(0deg);
    }
    to {
      transform: rotate(360deg);
    }
  }

  .waiting {
    animation: rotating 1s linear infinite;
  }
</style>
<body>

  <div class="container mt-5" x-data="{ templates:[
      {
        name: 'Alpaca: Instruction without input',
        text: `Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{{.Instruction}}

### Response:`,
      },
      {
        name: 'Alpaca: Instruction with input',
        text: `Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{{.Instruction}}

### Input:
{{.Input}}

### Response:`,
      }
    ], selectedTemplate: '', selectedTemplateText: '' }">
    <h1>llama-cli API</h1>
    <div class="form-group">
      <label for="inputText">Input Text:</label>
      <textarea class="form-control" id="inputText" rows="6" placeholder="Your text input here..." x-text="selectedTemplateText"></textarea>
    </div>
    <div class="form-group">
      <label for="templateSelect">Select Template:</label>
      <select class="form-control" id="templateSelect" x-model="selectedTemplateText">
        <option value="">None</option>
        <template x-for="(template, index) in templates" :key="index">
          <option :value="template.text" x-text="template.name"></option>
        </template>
      </select>
    </div>
    <div class="form-group">
      <label for="topP">Top P:</label>
      <input type="range" step="0.01" min="0" max="1" class="form-control" id="topP" value="0.20" name="topP" onchange="this.nextElementSibling.value = this.value" required>
      <output>0.20</output>
    </div>
    <div class="form-group">
      <label for="topK">Top K:</label>
      <input type="number" class="form-control" id="topK" value="10000" name="topK" required>
    </div>
    <div class="form-group">
      <label for="temperature">Temperature:</label>
      <input type="range" step="0.01" min="0" max="1" value="0.9" class="form-control" id="temperature" name="temperature" onchange="this.nextElementSibling.value = this.value" required>
      <output>0.9</output>
    </div>
    <div class="form-group">
      <label for="tokens">Tokens:</label>
      <input type="number" class="form-control" id="tokens" name="tokens" value="128" required>
    </div>
    <button class="btn btn-primary" x-on:click="submitRequest()">Submit <i class="fas fa-paper-plane"></i></button>
    <hr>
    <div class="form-group">
      <label for="outputText">Output Text:</label>
      <textarea class="form-control" id="outputText" rows="5" readonly></textarea>
    </div>
  </div>

  <script defer src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"></script>
  <script>
    function submitRequest() {
      var button = document.querySelector("i.fa-paper-plane");
      button.classList.add("waiting");
      var text = document.getElementById("inputText").value;
      var url = "/predict";
      var data = {
        "text": text,
        "topP": document.getElementById("topP").value,
        "topK": document.getElementById("topK").value,
        "temperature": document.getElementById("temperature").value,
        "tokens": document.getElementById("tokens").value
      };
      fetch(url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json"
        },
        body: JSON.stringify(data)
      })
        .then(response => response.json())
        .then(data => {
          document.getElementById("outputText").value = data.prediction;
          button.classList.remove("waiting");
        })
        .catch(error => { console.error(error); button.classList.remove("waiting"); });
    }
  </script>
</body>
</html>
```
kubernetes/deployment.yaml

```diff
@@ -25,7 +25,7 @@ spec:
       - name: llama
         args:
         - api
-        image: quay.io/go-skynet/llama-cli:v0.1
+        image: quay.io/go-skynet/llama-cli:v0.3
 ---
 apiVersion: v1
 kind: Service
```
main.go (14 changed lines)

```diff
@@ -36,7 +36,9 @@ func llamaFromOptions(ctx *cli.Context) (*llama.LLama, error) {
 	if ctx.Bool("alpaca") {
 		opts = append(opts, llama.EnableAlpaca)
 	}
-
+	if ctx.Bool("gpt4all") {
+		opts = append(opts, llama.EnableGPT4All)
+	}
 	return llama.New(ctx.String("model"), opts...)
 }
 
@@ -95,6 +97,11 @@ var modelFlags = []cli.Flag{
 		EnvVars: []string{"ALPACA"},
 		Value:   true,
 	},
+	&cli.BoolFlag{
+		Name:    "gpt4all",
+		EnvVars: []string{"GPT4ALL"},
+		Value:   false,
+	},
 }
 
 func main() {
@@ -168,6 +175,11 @@ echo "An Alpaca (Vicugna pacos) is a domesticated species of South American came
 		EnvVars: []string{"ALPACA"},
 		Value:   true,
 	},
+	&cli.BoolFlag{
+		Name:    "gpt4all",
+		EnvVars: []string{"GPT4ALL"},
+		Value:   false,
+	},
 	&cli.IntFlag{
 		Name:    "context-size",
 		EnvVars: []string{"CONTEXT_SIZE"},
```
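The new `gpt4all` flag is wired through the same functional-options pattern as the existing `alpaca` flag: each enabled flag appends an option value, and `llama.New` applies them to its configuration. A minimal sketch of that pattern with stand-in types (the real `EnableAlpaca`, `EnableGPT4All`, and `New` live in `github.com/go-skynet/llama` and are not shown in this diff):

```go
package main

import "fmt"

// config is an illustrative stand-in for the model options held
// inside github.com/go-skynet/llama.
type config struct {
	alpaca  bool
	gpt4all bool
}

// Option mutates a config; EnableAlpaca and EnableGPT4All are
// option values of this general shape in the real library.
type Option func(*config)

func EnableAlpaca(c *config)  { c.alpaca = true }
func EnableGPT4All(c *config) { c.gpt4all = true }

// New collects whichever options the CLI flags appended,
// mirroring llamaFromOptions in main.go.
func New(model string, opts ...Option) *config {
	c := &config{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	// Equivalent of running llama-cli with --gpt4all.
	fmt.Printf("%+v\n", New("model.bin", EnableGPT4All))
}
```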