Mirror of https://github.com/bentoml/OpenLLM.git (synced 2026-04-29 03:13:44 -04:00)
chore: update examples and readme
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
README.md
@@ -42,7 +42,7 @@ For starter, we provide two ways to quickly try out OpenLLM:

### Jupyter Notebooks

Try this [OpenLLM tutorial in Google Colab: Serving Llama 2 with OpenLLM](https://colab.research.google.com/github/bentoml/OpenLLM/blob/main/examples/llama2.ipynb).
Try this [OpenLLM tutorial in Google Colab: Serving Phi 3 with OpenLLM](https://colab.research.google.com/github/bentoml/OpenLLM/blob/main/examples/llama2.ipynb).

## 🏃 Get started

@@ -98,22 +98,20 @@ OpenLLM currently supports the following models. By default, OpenLLM doesn't inc

<summary>Baichuan</summary>

### Quickstart

> **Note:** Baichuan requires installing with:
>
> ```bash
> pip install "openllm[baichuan]"
> ```

Run the following command to quickly spin up a Baichuan server:

```bash
TRUST_REMOTE_CODE=True openllm start baichuan-inc/baichuan-7b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -121,16 +119,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Baichuan variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=baichuan) to see more Baichuan-compatible models.
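
Besides the `openllm query` CLI, a started server can also be reached programmatically. The snippet below is a minimal sketch, not part of this commit, assuming the server is on `localhost:3000` and exposes OpenLLM's OpenAI-compatible `/v1` route (the `openai_chat_completion_client.py` example later in this commit relies on the same route); the same pattern applies to every model section below.

```python
# Minimal sketch: query a running OpenLLM server through its
# OpenAI-compatible endpoint (assumes `pip install openai>=1.0`).
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')  # key is unused for a local server
completion = client.chat.completions.create(
    model='baichuan-inc/baichuan-7b',  # same id passed to `openllm start`
    messages=[{'role': 'user', 'content': 'What are large language models?'}],
)
print(completion.choices[0].message.content)
```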

### Supported models

You can specify any of the following Baichuan models via `openllm start`:

- [baichuan-inc/baichuan2-7b-base](https://huggingface.co/baichuan-inc/baichuan2-7b-base)
- [baichuan-inc/baichuan2-7b-chat](https://huggingface.co/baichuan-inc/baichuan2-7b-chat)
- [baichuan-inc/baichuan2-13b-base](https://huggingface.co/baichuan-inc/baichuan2-13b-base)

@@ -142,22 +136,20 @@ You can specify any of the following Baichuan models via `openllm start`:

<summary>ChatGLM</summary>

### Quickstart

> **Note:** ChatGLM requires installing with:
>
> ```bash
> pip install "openllm[chatglm]"
> ```

Run the following command to quickly spin up a ChatGLM server:

```bash
TRUST_REMOTE_CODE=True openllm start thudm/chatglm-6b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -165,16 +157,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any ChatGLM variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=chatglm) to see more ChatGLM-compatible models.

### Supported models

You can specify any of the following ChatGLM models via `openllm start`:

- [thudm/chatglm-6b](https://huggingface.co/thudm/chatglm-6b)
- [thudm/chatglm-6b-int8](https://huggingface.co/thudm/chatglm-6b-int8)
- [thudm/chatglm-6b-int4](https://huggingface.co/thudm/chatglm-6b-int4)

@@ -188,22 +176,20 @@ You can specify any of the following ChatGLM models via `openllm start`:

<summary>Dbrx</summary>

### Quickstart

> **Note:** Dbrx requires installing with:
>
> ```bash
> pip install "openllm[dbrx]"
> ```

Run the following command to quickly spin up a Dbrx server:

```bash
TRUST_REMOTE_CODE=True openllm start databricks/dbrx-instruct
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -211,16 +197,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Dbrx variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=dbrx) to see more Dbrx-compatible models.

### Supported models

You can specify any of the following Dbrx models via `openllm start`:

- [databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct)
- [databricks/dbrx-base](https://huggingface.co/databricks/dbrx-base)

@@ -230,7 +212,6 @@ You can specify any of the following Dbrx models via `openllm start`:

<summary>DollyV2</summary>

### Quickstart

Run the following command to quickly spin up a DollyV2 server:

@@ -238,6 +219,7 @@ Run the following command to quickly spin up a DollyV2 server:

```bash
TRUST_REMOTE_CODE=True openllm start databricks/dolly-v2-3b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -245,16 +227,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any DollyV2 variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=dolly_v2) to see more DollyV2-compatible models.

### Supported models

You can specify any of the following DollyV2 models via `openllm start`:

- [databricks/dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b)
- [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b)
- [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)

@@ -265,22 +243,20 @@ You can specify any of the following DollyV2 models via `openllm start`:

<summary>Falcon</summary>

### Quickstart

> **Note:** Falcon requires installing with:
>
> ```bash
> pip install "openllm[falcon]"
> ```

Run the following command to quickly spin up a Falcon server:

```bash
TRUST_REMOTE_CODE=True openllm start tiiuae/falcon-7b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -288,16 +264,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Falcon variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=falcon) to see more Falcon-compatible models.

### Supported models

You can specify any of the following Falcon models via `openllm start`:

- [tiiuae/falcon-7b](https://huggingface.co/tiiuae/falcon-7b)
- [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)

@@ -309,22 +281,20 @@ You can specify any of the following Falcon models via `openllm start`:

<summary>Gemma</summary>

### Quickstart

> **Note:** Gemma requires installing with:
>
> ```bash
> pip install "openllm[gemma]"
> ```

Run the following command to quickly spin up a Gemma server:

```bash
TRUST_REMOTE_CODE=True openllm start google/gemma-7b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -332,16 +302,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Gemma variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=gemma) to see more Gemma-compatible models.

### Supported models

You can specify any of the following Gemma models via `openllm start`:

- [google/gemma-7b](https://huggingface.co/google/gemma-7b)
- [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)
- [google/gemma-2b](https://huggingface.co/google/gemma-2b)

@@ -353,7 +319,6 @@ You can specify any of the following Gemma models via `openllm start`:

<summary>GPTNeoX</summary>

### Quickstart

Run the following command to quickly spin up a GPTNeoX server:

@@ -361,6 +326,7 @@ Run the following command to quickly spin up a GPTNeoX server:

```bash
TRUST_REMOTE_CODE=True openllm start eleutherai/gpt-neox-20b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -368,16 +334,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any GPTNeoX variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=gpt_neox) to see more GPTNeoX-compatible models.

### Supported models

You can specify any of the following GPTNeoX models via `openllm start`:

- [eleutherai/gpt-neox-20b](https://huggingface.co/eleutherai/gpt-neox-20b)

</details>

@@ -386,22 +348,20 @@ You can specify any of the following GPTNeoX models via `openllm start`:

<summary>Llama</summary>

### Quickstart

> **Note:** Llama requires installing with:
>
> ```bash
> pip install "openllm[llama]"
> ```

Run the following command to quickly spin up a Llama server:

```bash
TRUST_REMOTE_CODE=True openllm start NousResearch/llama-2-7b-hf
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -409,16 +369,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Llama variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=llama) to see more Llama-compatible models.

### Supported models

You can specify any of the following Llama models via `openllm start`:

- [meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf)
- [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf)
- [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

@@ -438,22 +394,20 @@ You can specify any of the following Llama models via `openllm start`:

<summary>Mistral</summary>

### Quickstart

> **Note:** Mistral requires installing with:
>
> ```bash
> pip install "openllm[mistral]"
> ```

Run the following command to quickly spin up a Mistral server:

```bash
TRUST_REMOTE_CODE=True openllm start mistralai/Mistral-7B-Instruct-v0.1
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -461,16 +415,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Mistral variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=mistral) to see more Mistral-compatible models.

### Supported models

You can specify any of the following Mistral models via `openllm start`:

- [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
- [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

@@ -483,22 +433,20 @@ You can specify any of the following Mistral models via `openllm start`:

<summary>Mixtral</summary>

### Quickstart

> **Note:** Mixtral requires installing with:
>
> ```bash
> pip install "openllm[mixtral]"
> ```

Run the following command to quickly spin up a Mixtral server:

```bash
TRUST_REMOTE_CODE=True openllm start mistralai/Mixtral-8x7B-Instruct-v0.1
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -506,16 +454,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Mixtral variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=mixtral) to see more Mixtral-compatible models.

### Supported models

You can specify any of the following Mixtral models via `openllm start`:

- [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)

@@ -525,22 +469,20 @@ You can specify any of the following Mixtral models via `openllm start`:

<summary>MPT</summary>

### Quickstart

> **Note:** MPT requires installing with:
>
> ```bash
> pip install "openllm[mpt]"
> ```

Run the following command to quickly spin up an MPT server:

```bash
TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b-instruct
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -548,16 +490,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any MPT variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=mpt) to see more MPT-compatible models.

### Supported models

You can specify any of the following MPT models via `openllm start`:

- [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b)
- [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct)
- [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)

@@ -572,22 +510,20 @@ You can specify any of the following MPT models via `openllm start`:

<summary>OPT</summary>

### Quickstart

> **Note:** OPT requires installing with:
>
> ```bash
> pip install "openllm[opt]"
> ```

Run the following command to quickly spin up an OPT server:

```bash
openllm start facebook/opt-1.3b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -595,16 +531,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any OPT variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=opt) to see more OPT-compatible models.

### Supported models

You can specify any of the following OPT models via `openllm start`:

- [facebook/opt-125m](https://huggingface.co/facebook/opt-125m)
- [facebook/opt-350m](https://huggingface.co/facebook/opt-350m)
- [facebook/opt-1.3b](https://huggingface.co/facebook/opt-1.3b)

@@ -618,22 +550,20 @@ You can specify any of the following OPT models via `openllm start`:

<summary>Phi</summary>

### Quickstart

> **Note:** Phi requires installing with:
>
> ```bash
> pip install "openllm[phi]"
> ```

Run the following command to quickly spin up a Phi server:

```bash
TRUST_REMOTE_CODE=True openllm start microsoft/Phi-3-mini-4k-instruct
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -641,16 +571,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Phi variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=phi) to see more Phi-compatible models.

### Supported models

You can specify any of the following Phi models via `openllm start`:

- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)
- [microsoft/Phi-3-small-8k-instruct](https://huggingface.co/microsoft/Phi-3-small-8k-instruct)

@@ -664,22 +590,20 @@ You can specify any of the following Phi models via `openllm start`:

<summary>Qwen</summary>

### Quickstart

> **Note:** Qwen requires installing with:
>
> ```bash
> pip install "openllm[qwen]"
> ```

Run the following command to quickly spin up a Qwen server:

```bash
TRUST_REMOTE_CODE=True openllm start qwen/Qwen-7B-Chat
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -687,16 +611,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Qwen variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=qwen) to see more Qwen-compatible models.

### Supported models

You can specify any of the following Qwen models via `openllm start`:

- [qwen/Qwen-7B-Chat](https://huggingface.co/qwen/Qwen-7B-Chat)
- [qwen/Qwen-7B-Chat-Int8](https://huggingface.co/qwen/Qwen-7B-Chat-Int8)
- [qwen/Qwen-7B-Chat-Int4](https://huggingface.co/qwen/Qwen-7B-Chat-Int4)

@@ -710,22 +630,20 @@ You can specify any of the following Qwen models via `openllm start`:

<summary>StableLM</summary>

### Quickstart

> **Note:** StableLM requires installing with:
>
> ```bash
> pip install "openllm[stablelm]"
> ```

Run the following command to quickly spin up a StableLM server:

```bash
TRUST_REMOTE_CODE=True openllm start stabilityai/stablelm-tuned-alpha-3b
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -733,16 +651,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any StableLM variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=stablelm) to see more StableLM-compatible models.

### Supported models

You can specify any of the following StableLM models via `openllm start`:

- [stabilityai/stablelm-tuned-alpha-3b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b)
- [stabilityai/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
- [stabilityai/stablelm-base-alpha-3b](https://huggingface.co/stabilityai/stablelm-base-alpha-3b)

@@ -754,22 +668,20 @@ You can specify any of the following StableLM models via `openllm start`:

<summary>StarCoder</summary>

### Quickstart

> **Note:** StarCoder requires installing with:
>
> ```bash
> pip install "openllm[starcoder]"
> ```

Run the following command to quickly spin up a StarCoder server:

```bash
TRUST_REMOTE_CODE=True openllm start bigcode/starcoder
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -777,16 +689,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any StarCoder variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=starcoder) to see more StarCoder-compatible models.

### Supported models

You can specify any of the following StarCoder models via `openllm start`:

- [bigcode/starcoder](https://huggingface.co/bigcode/starcoder)
- [bigcode/starcoderbase](https://huggingface.co/bigcode/starcoderbase)

@@ -796,22 +704,20 @@ You can specify any of the following StarCoder models via `openllm start`:

<summary>Yi</summary>

### Quickstart

> **Note:** Yi requires installing with:
>
> ```bash
> pip install "openllm[yi]"
> ```

Run the following command to quickly spin up a Yi server:

```bash
TRUST_REMOTE_CODE=True openllm start 01-ai/Yi-6B
```

In a different terminal, run the following command to interact with the server:

```bash
@@ -819,16 +725,12 @@ export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```

> **Note:** Any Yi variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=yi) to see more Yi-compatible models.

### Supported models

You can specify any of the following Yi models via `openllm start`:

- [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B)
- [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B)
- [01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)

@@ -27,25 +27,3 @@ python openai_chat_completion_client.py
# For streaming set STREAM=True
STREAM=True python openai_chat_completion_client.py
```
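
The client script itself is not shown in this diff. A minimal sketch of what such a script might look like, assuming the server's OpenAI-compatible `/v1` route on `localhost:3000`; the actual `openai_chat_completion_client.py` in the repository may differ:

```python
# Hypothetical sketch of an OpenAI-compatible chat client. The STREAM
# environment variable toggles token streaming, mirroring the usage above.
import os
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')
stream = os.environ.get('STREAM', 'False').lower() == 'true'
resp = client.chat.completions.create(
    model='HuggingFaceH4/zephyr-7b-alpha',  # assumption: whichever model the server runs
    messages=[{'role': 'user', 'content': 'What are large language models?'}],
    stream=stream,
)
if stream:
    for chunk in resp:
        print(chunk.choices[0].delta.content or '', end='', flush=True)
else:
    print(resp.choices[0].message.content)
```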

### TinyLLM

The [`api_server.py`](./api_server.py) demonstrates how one can easily write a production-ready BentoML service with OpenLLM and vLLM.

Install requirements:

```bash
pip install -U "openllm[vllm]"
```

To serve the Bento (given you have access to a GPU):

```bash
bentoml serve api_server:svc
```

To build the Bento, run the following:

```bash
bentoml build -f bentofile.yaml .
```

@@ -1,48 +0,0 @@
from __future__ import annotations
import uuid, os
from typing import Any, AsyncGenerator, Dict, TypedDict, Union

from bentoml import Service
from bentoml.io import JSON, Text
from openllm import LLM

os.environ['IMPLEMENTATION'] = 'deprecated'

llm = LLM[Any, Any]('HuggingFaceH4/zephyr-7b-alpha', backend='vllm')

svc = Service('tinyllm', runners=[llm.runner])


class GenerateInput(TypedDict):
  prompt: str
  stream: bool
  sampling_params: Dict[str, Any]


@svc.api(
  route='/v1/generate',
  input=JSON.from_sample(
    GenerateInput(prompt='What is time?', stream=False, sampling_params={'temperature': 0.73, 'logprobs': 1})
  ),
  output=Text(content_type='text/event-stream'),
)
async def generate(request: GenerateInput) -> Union[AsyncGenerator[str, None], str]:
  n = request['sampling_params'].pop('n', 1)
  request_id = f'tinyllm-{uuid.uuid4().hex}'
  # One text buffer per parallel sequence. Note: `[[]] * n` would alias one
  # list n times, so build independent lists instead.
  previous_texts = [[] for _ in range(n)]

  generator = llm.generate_iterator(request['prompt'], request_id=request_id, n=n, **request['sampling_params'])

  async def streamer() -> AsyncGenerator[str, None]:
    async for request_output in generator:
      for output in request_output.outputs:
        i = output.index
        previous_texts[i].append(output.text)
        yield output.text

  if request['stream']:
    return streamer()

  # Non-streaming path: drain the generator, then return the first sequence.
  async for _ in streamer():
    pass
  return ''.join(previous_texts[0])
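
Once served, the `/v1/generate` route accepts the JSON schema defined by `GenerateInput`. A minimal client sketch, assuming the service runs locally on BentoML's default port 3000:

```python
# Hypothetical client for the TinyLLM /v1/generate route defined above.
import requests

resp = requests.post(
    'http://localhost:3000/v1/generate',
    json={'prompt': 'What is time?', 'stream': False, 'sampling_params': {'temperature': 0.73}},
)
print(resp.text)
```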
@@ -1,6 +0,0 @@
service: 'api_server.py:svc'
include:
  - 'api_server.py'
python:
  packages:
    - openllm[vllm]>=0.4.15
@@ -1,26 +0,0 @@
# LangChain + BentoML + OpenLLM

Run it locally:

```bash
export BENTOML_CONFIG_OPTIONS="api_server.traffic.timeout=900 runners.traffic.timeout=900"
bentoml serve
```

Build Bento:

```bash
bentoml build
```

Generate docker image:

```bash
bentoml containerize ...
docker run \
  -e SERPAPI_API_KEY="__Your_SERP_API_key__" \
  -e BENTOML_CONFIG_OPTIONS="api_server.traffic.timeout=900 runners.traffic.timeout=900" \
  -p 3000:3000 \
  ..image_name
```
@@ -1,19 +0,0 @@
# Copyright 2023 BentoML Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

service: 'service:svc'
include:
  - '*.py'
python:
  requirements_txt: ./requirements.txt
@@ -1,4 +0,0 @@
openllm
langchain>=0.0.212
pydantic
BeautifulSoup4
@@ -1,60 +0,0 @@
from __future__ import annotations
import typing as t

from langchain.chains import LLMChain
from langchain.llms import OpenLLM
from langchain.prompts import PromptTemplate
from pydantic import BaseModel

import bentoml
from bentoml.io import JSON, Text


class Query(BaseModel):
  industry: str
  product_name: str
  keywords: t.List[str]
  llm_config: t.Dict[str, t.Any]


def gen_llm(model_name: str, model_id: str | None = None, **attrs: t.Any) -> OpenLLM:
  # Construct a LangChain OpenLLM wrapper and pre-download the model weights.
  lc_llm = OpenLLM(model_name=model_name, model_id=model_id, embedded=False, **attrs)
  lc_llm.runner.download_model()
  return lc_llm


llm = gen_llm('llama', model_id='TheBloke/Llama-2-13B-chat-GPTQ', quantize='gptq')

prompt = PromptTemplate(
  input_variables=['industry', 'product_name', 'keywords'],
  template="""
You are a Facebook Ads Copywriter with a strong background in persuasive
writing and marketing. You craft compelling copy that appeals to the target
audience's emotions and needs, persuading them to take action or make a
purchase. You are given the following context to create a Facebook ad copy.
It should provide an attention-grabbing headline optimized for captivating
leads and persuasive calls to action.

Industry: {industry}
Product: {product_name}
Keywords: {keywords}
Facebook Ads copy:
""",
)
chain = LLMChain(llm=llm, prompt=prompt)

svc = bentoml.Service('fb-ads-copy', runners=[llm.runner])

SAMPLE_INPUT = Query(
  industry='SAAS',
  product_name='BentoML',
  keywords=['open source', 'developer tool', 'AI application platform', 'serverless', 'cost-efficient'],
  llm_config=llm.runner.config.model_dump(),
)


@svc.api(input=JSON.from_sample(sample=SAMPLE_INPUT), output=Text())
def generate(query: Query):
  return chain.run(
    {'industry': query.industry, 'product_name': query.product_name, 'keywords': ', '.join(query.keywords)}
  )
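
With the service running (`bentoml serve`), the `generate` endpoint takes the `Query` schema as JSON. A minimal client sketch, assuming BentoML's default port 3000; the empty `llm_config` is an assumption to fall back to server defaults:

```python
# Hypothetical client for the fb-ads-copy service defined above.
import requests

payload = {
    'industry': 'SAAS',
    'product_name': 'BentoML',
    'keywords': ['open source', 'developer tool'],
    'llm_config': {},  # assumption: empty config uses the runner's defaults
}
print(requests.post('http://localhost:3000/generate', json=payload).text)
```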
@@ -1,27 +0,0 @@
# LangChain + BentoML + OpenLLM

Run it locally:

```bash
export SERPAPI_API_KEY="__Your_SERP_API_key__"
export BENTOML_CONFIG_OPTIONS="api_server.traffic.timeout=900 runners.traffic.timeout=900"
bentoml serve
```

Build Bento:

```bash
bentoml build
```

Generate docker image:

```bash
bentoml containerize ...
docker run \
  -e SERPAPI_API_KEY="__Your_SERP_API_key__" \
  -e BENTOML_CONFIG_OPTIONS="api_server.traffic.timeout=900 runners.traffic.timeout=900" \
  -p 3000:3000 \
  ..image_name
```
@@ -1,5 +0,0 @@
service: 'service:svc'
include:
  - '*.py'
python:
  requirements_txt: './requirements.txt'
@@ -1,8 +0,0 @@
runners:
  llm-dolly-v2-runner:
    resources:
      nvidia.com/gpu: 2
    workers_per_resource: 0.5
  llm-stablelm-runner:
    resources:
      nvidia.com/gpu: 1
@@ -1,3 +0,0 @@
openllm
langchain
google-search-results
@@ -1,19 +0,0 @@
from __future__ import annotations

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenLLM

import bentoml
from bentoml.io import Text

SAMPLE_INPUT = 'What is the weather in San Francisco?'

llm = OpenLLM(model_name='dolly-v2', model_id='databricks/dolly-v2-7b', embedded=False)
tools = load_tools(['serpapi'], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
svc = bentoml.Service('langchain-openllm', runners=[llm.runner])


@svc.api(input=Text.from_sample(sample=SAMPLE_INPUT), output=Text())
def chat(input_text: str):
  return agent.run(input_text)
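
The `chat` endpoint accepts plain text. A minimal client sketch, assuming the service runs on BentoML's default port 3000 and `SERPAPI_API_KEY` is set on the server side:

```python
# Hypothetical client for the langchain-openllm agent service above.
import requests

resp = requests.post(
    'http://localhost:3000/chat',
    data='What is the weather in San Francisco?',
    headers={'Content-Type': 'text/plain'},
)
print(resp.text)
```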
File diff suppressed because one or more lines are too long
openllm-python/README.md (generated): same changes as README.md above.