mirror of
https://github.com/bentoml/OpenLLM.git
synced 2026-06-11 18:09:52 -04:00
feat(mixtral): correct support for mixtral (#772)
feat(mixtral): support inference with pt Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
69
openllm-python/README.md
generated
@@ -724,6 +724,7 @@ You can specify any of the following Mistral models via `openllm start`:

- [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
- [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
- [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

@@ -765,6 +766,74 @@ openllm start HuggingFaceH4/zephyr-7b-alpha --backend pt

<details>

<summary>Mixtral</summary>


### Quickstart

Run the following command to quickly spin up a Mixtral server:

```bash
openllm start mistralai/Mixtral-8x7B-Instruct-v0.1
```
In a different terminal, run the following command to interact with the server:

```bash
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```
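
The two commands above can also be driven from a script. The following is a minimal Python sketch, not part of OpenLLM: the helper names `build_query` and `run_query` are hypothetical, and it assumes only what the quickstart shows, namely that the `openllm` CLI is on `PATH` and reads the server address from `OPENLLM_ENDPOINT`.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# Address used in the quickstart; change it if the server runs elsewhere.
DEFAULT_ENDPOINT = "http://localhost:3000"


def build_query(prompt: str, endpoint: str = DEFAULT_ENDPOINT) -> Tuple[List[str], Dict[str, str]]:
    """Return the argv list and environment for one `openllm query` call (hypothetical helper)."""
    # Copy the current environment and point the CLI at the running server.
    env = dict(os.environ, OPENLLM_ENDPOINT=endpoint)
    # Passing the prompt as its own argv element avoids shell quoting issues.
    cmd = ["openllm", "query", prompt]
    return cmd, env


def run_query(prompt: str, endpoint: str = DEFAULT_ENDPOINT) -> str:
    """Run `openllm query` and return whatever it printed to stdout (hypothetical helper)."""
    cmd, env = build_query(prompt, endpoint)
    result = subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling `run_query('What are large language models?')` requires a server already started with `openllm start`. `build_query` on its own has no side effects, so the command and environment can be inspected or logged before anything is executed.
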
> **Note:** Any Mixtral variant can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=mixtral) to see more Mixtral-compatible models.



### Supported models

You can specify any of the following Mixtral models via `openllm start`:


- [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
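
The default selection described above can be sketched as follows. This is an illustration only, not OpenLLM's actual implementation: the helper names `module_available` and `pick_backend` are hypothetical, and the sketch assumes that "vLLM is available" means the `vllm` package can be found by the import system.

```python
import importlib.util
from typing import Callable, Optional

# Backend names as they appear in the `--backend` examples of this README.
KNOWN_BACKENDS = ("vllm", "pt")


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_backend(
    explicit: Optional[str] = None,
    is_available: Callable[[str], bool] = module_available,
) -> str:
    """Choose a backend name (hypothetical helper, assumed behavior)."""
    # An explicit choice, as with `--backend`, always wins.
    if explicit is not None:
        if explicit not in KNOWN_BACKENDS:
            raise ValueError(f"unknown backend: {explicit!r}")
        return explicit
    # Otherwise prefer vLLM when it is installed, and fall back to PyTorch.
    return "vllm" if is_available("vllm") else "pt"
```

Passing the availability check in as a parameter keeps the fallback rule easy to exercise on a machine without vLLM installed, for example `pick_backend(is_available=lambda name: False)`.
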



> **Important:** We recommend explicitly specifying `--backend` to choose the backend that runs the model. If you have access to a GPU, always use `--backend vllm`.



- vLLM (Recommended):


To install vLLM, run `pip install "openllm[vllm]"`

```bash
openllm start mistralai/Mixtral-8x7B-Instruct-v0.1 --backend vllm
```


> **Important:** For the best serving performance, vLLM requires a GPU with an architecture newer than 8.0. We recommend vLLM for all production serving use cases.



> **Note:** Adapters are not yet supported with vLLM.


- PyTorch:


```bash
openllm start mistralai/Mixtral-8x7B-Instruct-v0.1 --backend pt
```

</details>

<details>

<summary>MPT</summary>

@@ -11,7 +11,7 @@ Fine-tune, serve, deploy, and monitor any LLMs with ease.
'''

# update-config-stubs.py: import stubs start
from openllm_core.config import CONFIG_MAPPING as CONFIG_MAPPING, CONFIG_MAPPING_NAMES as CONFIG_MAPPING_NAMES, AutoConfig as AutoConfig, BaichuanConfig as BaichuanConfig, ChatGLMConfig as ChatGLMConfig, DollyV2Config as DollyV2Config, FalconConfig as FalconConfig, FlanT5Config as FlanT5Config, GPTNeoXConfig as GPTNeoXConfig, LlamaConfig as LlamaConfig, MistralConfig as MistralConfig, MPTConfig as MPTConfig, OPTConfig as OPTConfig, PhiConfig as PhiConfig, QwenConfig as QwenConfig, StableLMConfig as StableLMConfig, StarCoderConfig as StarCoderConfig, YiConfig as YiConfig
from openllm_core.config import CONFIG_MAPPING as CONFIG_MAPPING, CONFIG_MAPPING_NAMES as CONFIG_MAPPING_NAMES, AutoConfig as AutoConfig, BaichuanConfig as BaichuanConfig, ChatGLMConfig as ChatGLMConfig, DollyV2Config as DollyV2Config, FalconConfig as FalconConfig, FlanT5Config as FlanT5Config, GPTNeoXConfig as GPTNeoXConfig, LlamaConfig as LlamaConfig, MistralConfig as MistralConfig, MixtralConfig as MixtralConfig, MPTConfig as MPTConfig, OPTConfig as OPTConfig, PhiConfig as PhiConfig, QwenConfig as QwenConfig, StableLMConfig as StableLMConfig, StarCoderConfig as StarCoderConfig, YiConfig as YiConfig
# update-config-stubs.py: import stubs stop

from openllm_cli._sdk import (
