Update README.md

2026-02-03 11:13:31 -05:00 · 2023-04-05 22:28:03 +02:00 · 2023-04-05 22:14:00 +02:00 · 2023-04-05 22:04:35 +02:00 · 2023-04-05 22:00:15 +02:00 · 2023-04-05 00:41:02 +02:00
2 changed files with 34 additions and 7 deletions
--- a/1
+++ b/1
@@ -24,7 +24,6 @@ image:
    FROM +go-deps
    ARG IMAGE=alpaca-cli-nomodel
    COPY +build/llama-cli /llama-cli
-    ENV MODEL_PATH=/model.bin
    ENTRYPOINT [ "/llama-cli" ]
    SAVE IMAGE --push $IMAGE

--- a/README.md
+++ b/README.md
@@ -1,14 +1,16 @@
 ## :camel: llama-cli


-llama-cli is a straightforward golang CLI interface for [llama.cpp](https://github.com/ggerganov/llama.cpp), providing a simple API and a command line interface that allows text generation using a GPT-based model like llama directly from the terminal.
+llama-cli is a straightforward golang CLI interface for [llama.cpp](https://github.com/ggerganov/llama.cpp), providing a simple API and a command line interface that allows text generation using a GPT-based model like llama directly from the terminal. It is also compatible with [gpt4all](https://github.com/nomic-ai/gpt4all) and [alpaca](https://github.com/tatsu-lab/stanford_alpaca).
+
+`llama-cli` uses https://github.com/go-skynet/llama, a fork of [llama.cpp](https://github.com/ggerganov/llama.cpp) that provides Go bindings.

 ## Container images

 To begin, run:

 ```
-docker run -ti --rm quay.io/go-skynet/llama-cli:v0.3  --instruction "What's an alpaca?" --topk 10000
+docker run -ti --rm quay.io/go-skynet/llama-cli:v0.4  --instruction "What's an alpaca?" --topk 10000 --model ...
 ```

 You will receive a response like the following:
@@ -55,7 +57,7 @@ This will generate text based on the given model and instruction.
 Example of starting the API with `docker`:

 ```bash
-docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.3 api --context-size 700 --threads 4
+docker run -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.4 api --context-size 700 --threads 4
 ```

 And you'll see:
@@ -88,7 +90,7 @@ The API takes takes the following:
 | gpt4all       | GPT4ALL               | false          | Set to true for gpt4all models. |


-Once the server is running, you can make requests to it using HTTP. For example, to generate text based on an instruction, you can send a POST request to the `/predict` endpoint with the instruction as the request body:
+Once the server is running, you can start making requests to it using HTTP. For example, to generate text based on an instruction, you can send a POST request to the `/predict` endpoint with the instruction as the request body:

 ```
 curl --location --request POST 'http://localhost:8080/predict' --header 'Content-Type: application/json' --data-raw '{
@@ -100,6 +102,8 @@ curl --location --request POST 'http://localhost:8080/predict' --header 'Content
 }'
 ```

+A simple web interface is also available (for instance, at http://localhost:8080/) and can be used as a playground.
+
 Note: The API doesn't inject a template for talking to the instance, while the CLI does. You have to use a prompt similar to what's described in the stanford-alpaca docs: https://github.com/tatsu-lab/stanford_alpaca#data-release, for instance:

 ```
@@ -120,10 +124,10 @@ You can specify a model binary to be used for inference with `--model`.
 ```
 # Download the model image, extract the model
 # Use the model with llama-cli
-docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.3-lite api --model /models/model.bin
+docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.4 api --model /models/model.bin
 ```

-gpt4all (https://github.com/nomic-ai/gpt4all) works as well, however the original model needs to be converted:
+gpt4all (https://github.com/nomic-ai/gpt4all) works as well; however, the original model needs to be converted (the same applies to old alpaca models):

 ```bash
 wget -O tokenizer.model https://huggingface.co/decapoda-research/llama-30b-hf/resolve/main/tokenizer.model
@@ -132,6 +136,7 @@ cp gpt4all.. models/
 git clone https://gist.github.com/eiz/828bddec6162a023114ce19146cb2b82
 pip install sentencepiece
 python 828bddec6162a023114ce19146cb2b82/gistfile1.txt models tokenizer.model
+# The conversion creates a new model file with the ".tmp" extension; use that one!
 ```

 ### Golang client API
@@ -160,6 +165,10 @@ func main() {
 }
 ```

+### Windows compatibility
+
+It should work; however, make sure you give the container enough resources. See https://github.com/go-skynet/llama-cli/issues/2
+
 ### Kubernetes

 You can run the API directly in Kubernetes:
@@ -189,3 +198,22 @@ docker run --privileged -v /var/run/docker.sock:/var/run/docker.sock --rm -t -v
 # run the binary
 ./llama-cli --instruction "What's an alpaca?"
 ```
+
+## Short-term roadmap
+
+- Mimic OpenAI API (https://github.com/go-skynet/llama-cli/issues/10)
+- Binary releases (https://github.com/go-skynet/llama-cli/issues/6)
+- Upstream our golang bindings to llama.cpp (https://github.com/ggerganov/llama.cpp/issues/351)
+- Multi-model support
+- Full deployment and compatibility with https://github.com/mckaywrigley/chatbot-ui
+
+## License
+
+MIT
+
+## Acknowledgements
+
+- [llama.cpp](https://github.com/ggerganov/llama.cpp)
+- https://github.com/tatsu-lab/stanford_alpaca
+- https://github.com/cornelk/llama-go for the initial ideas
+- https://github.com/antimatter15/alpaca.cpp for the light model version (this is compatible and tested only with that checkpoint model!)
Author	SHA1	Message	Date
Ettore Di Giacinto	b7c0a108f5	Update README.md	2023-04-05 22:28:03 +02:00
Ettore Di Giacinto	f694a89c28	Update README.md	2023-04-05 22:14:00 +02:00
Ettore Di Giacinto	be682e6c2f	Update README.md Add short-term roadmap and mention webui	2023-04-05 22:04:35 +02:00
mudler	bf85a31f9e	Don't set a default model path	2023-04-05 22:00:15 +02:00
Ettore Di Giacinto	d69048e0b0	Update README.md	2023-04-05 00:41:02 +02:00
mudler	827f189163	Update README	2023-03-30 18:46:11 +02:00