diff --git a/README.md b/README.md index 6cb1d6dd..7650b738 100644 --- a/README.md +++ b/README.md @@ -209,6 +209,7 @@ You can specify any of the following Baichuan models via `openllm start`: - [fireballoon/baichuan-vicuna-chinese-7b](https://huggingface.co/fireballoon/baichuan-vicuna-chinese-7b) - [fireballoon/baichuan-vicuna-7b](https://huggingface.co/fireballoon/baichuan-vicuna-7b) - [hiyouga/baichuan-7b-sft](https://huggingface.co/hiyouga/baichuan-7b-sft) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -225,7 +226,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us To install vLLM, run `pip install "openllm[vllm]"` ```bash -openllm start baichuan-inc/baichuan-7b --backend vllm +TRUST_REMOTE_CODE=True openllm start baichuan-inc/baichuan-7b --backend vllm ``` @@ -240,7 +241,7 @@ openllm start baichuan-inc/baichuan-7b --backend vllm ```bash -openllm start baichuan-inc/baichuan-7b --backend pt +TRUST_REMOTE_CODE=True openllm start baichuan-inc/baichuan-7b --backend pt ``` @@ -287,6 +288,7 @@ You can specify any of the following ChatGLM models via `openllm start`: - [thudm/chatglm-6b-int4](https://huggingface.co/thudm/chatglm-6b-int4) - [thudm/chatglm2-6b](https://huggingface.co/thudm/chatglm2-6b) - [thudm/chatglm2-6b-int4](https://huggingface.co/thudm/chatglm2-6b-int4) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -301,7 +303,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us ```bash -openllm start thudm/chatglm-6b --backend pt +TRUST_REMOTE_CODE=True openllm start thudm/chatglm-6b --backend pt ``` @@ -338,6 +340,7 @@ You can specify any of the following DollyV2 models via `openllm start`: - [databricks/dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b) - [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b) - [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -415,6 +418,7 @@ You can specify any of the following Falcon models via `openllm start`: - [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) - [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) - [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -493,6 +497,7 @@ You can specify any of the following FlanT5 models via `openllm start`: - [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) - [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) - [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -542,6 +547,7 @@ You can specify any of the following GPTNeoX models via `openllm start`: - [eleutherai/gpt-neox-20b](https://huggingface.co/eleutherai/gpt-neox-20b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. 
By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -627,6 +633,7 @@ You can specify any of the following Llama models via `openllm start`: - [NousResearch/llama-2-70b-hf](https://huggingface.co/NousResearch/llama-2-70b-hf) - [NousResearch/llama-2-13b-hf](https://huggingface.co/NousResearch/llama-2-13b-hf) - [NousResearch/llama-2-7b-hf](https://huggingface.co/NousResearch/llama-2-7b-hf) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -696,6 +703,7 @@ You can specify any of the following Mistral models via `openllm start`: - [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) - [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) - [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -776,6 +784,7 @@ You can specify any of the following MPT models via `openllm start`: - [mosaicml/mpt-30b](https://huggingface.co/mosaicml/mpt-30b) - [mosaicml/mpt-30b-instruct](https://huggingface.co/mosaicml/mpt-30b-instruct) - [mosaicml/mpt-30b-chat](https://huggingface.co/mosaicml/mpt-30b-chat) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -792,7 +801,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us To install vLLM, run `pip install "openllm[vllm]"` ```bash -openllm start mosaicml/mpt-7b --backend vllm +TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b --backend vllm ``` @@ -807,7 +816,7 @@ openllm start mosaicml/mpt-7b --backend vllm ```bash -openllm start mosaicml/mpt-7b --backend pt +TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b --backend pt ``` @@ -855,6 +864,7 @@ You can specify any of the following OPT models via `openllm start`: - [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) - [facebook/opt-6.7b](https://huggingface.co/facebook/opt-6.7b) - [facebook/opt-66b](https://huggingface.co/facebook/opt-66b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -893,6 +903,73 @@ openllm start facebook/opt-125m --backend pt
+Phi + + +### Quickstart + +Run the following command to quickly spin up a Phi server: + +```bash +TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 +``` +In a different terminal, run the following command to interact with the server: + +```bash +export OPENLLM_ENDPOINT=http://localhost:3000 +openllm query 'What are large language models?' +``` + + +> **Note:** Any Phi variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=phi) to see more Phi-compatible models. + + + +### Supported models + +You can specify any of the following Phi models via `openllm start`: + + +- [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) + +### Supported backends + +OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. + + + +> **Important:** We recommend user to explicitly specify `--backend` to choose the desired backend to run the model. If you have access to a GPU, always use `--backend vllm`. + + + +- vLLM (Recommended): + + +To install vLLM, run `pip install "openllm[vllm]"` + +```bash +TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 --backend vllm +``` + + +> **Important:** Using vLLM requires a GPU that has architecture newer than 8.0 to get the best performance for serving. It is recommended that for all serving usecase in production, you should choose vLLM for serving. + + + +> **Note:** Currently, adapters are yet to be supported with vLLM. + + +- PyTorch: + + +```bash +TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 --backend pt +``` + +
+ +
+ StableLM @@ -924,6 +1001,7 @@ You can specify any of the following StableLM models via `openllm start`: - [stabilityai/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) - [stabilityai/stablelm-base-alpha-3b](https://huggingface.co/stabilityai/stablelm-base-alpha-3b) - [stabilityai/stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -999,6 +1077,7 @@ You can specify any of the following StarCoder models via `openllm start`: - [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) - [bigcode/starcoderbase](https://huggingface.co/bigcode/starcoderbase) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -1068,6 +1147,7 @@ You can specify any of the following Yi models via `openllm start`: - [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B) - [01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K) - [01-ai/Yi-34B-200K](https://huggingface.co/01-ai/Yi-34B-200K) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -1084,7 +1164,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us To install vLLM, run `pip install "openllm[vllm]"` ```bash -openllm start 01-ai/Yi-6B --backend vllm +TRUST_REMOTE_CODE=True openllm start 01-ai/Yi-6B --backend vllm ``` @@ -1099,7 +1179,7 @@ openllm start 01-ai/Yi-6B --backend vllm ```bash -openllm start 01-ai/Yi-6B --backend pt +TRUST_REMOTE_CODE=True openllm start 01-ai/Yi-6B --backend pt ```
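A side note on the `TRUST_REMOTE_CODE=True` prefix added throughout the commands above: it is an ordinary environment variable, so it can equally be exported once per shell session rather than repeated on every invocation. A minimal sketch, using the same Phi model introduced above:

```bash
# Equivalent to prefixing each individual command with TRUST_REMOTE_CODE=True
export TRUST_REMOTE_CODE=True
openllm start microsoft/phi-1_5 --backend pt
```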
diff --git a/openllm-core/src/openllm_core/config/configuration_auto.py b/openllm-core/src/openllm_core/config/configuration_auto.py index 680e8566..55bc5c79 100644 --- a/openllm-core/src/openllm_core/config/configuration_auto.py +++ b/openllm-core/src/openllm_core/config/configuration_auto.py @@ -38,6 +38,7 @@ CONFIG_MAPPING_NAMES = OrderedDict( ('llama', 'LlamaConfig'), ('mpt', 'MPTConfig'), ('opt', 'OPTConfig'), + ('phi', 'PhiConfig'), ('stablelm', 'StableLMConfig'), ('starcoder', 'StarCoderConfig'), ('mistral', 'MistralConfig'), @@ -151,6 +152,9 @@ class AutoConfig: def for_model(cls,model_name:t.Literal['opt'],**attrs:t.Any)->openllm_core.config.OPTConfig:... @t.overload @classmethod + def for_model(cls,model_name:t.Literal['phi'],**attrs:t.Any)->openllm_core.config.PhiConfig:... + @t.overload + @classmethod def for_model(cls,model_name:t.Literal['stablelm'],**attrs:t.Any)->openllm_core.config.StableLMConfig:... @t.overload @classmethod diff --git a/openllm-core/src/openllm_core/config/configuration_phi.py b/openllm-core/src/openllm_core/config/configuration_phi.py new file mode 100644 index 00000000..1f435795 --- /dev/null +++ b/openllm-core/src/openllm_core/config/configuration_phi.py @@ -0,0 +1,69 @@ +from __future__ import annotations +import typing as t + +import openllm_core +from openllm_core.prompts import PromptTemplate + +DEFAULT_PROMPT_TEMPLATE = """{instruction}""" + +DEFAULT_SYSTEM_MESSAGE = '' + + +class PhiConfig(openllm_core.LLMConfig): + """The language model phi-1.5 is a Transformer with 1.3 billion parameters. + + It was trained using the same data sources as [phi-1](https://huggingface.co/microsoft/phi-1), augmented with a new data source that consists of various + NLP synthetic texts. When assessed against benchmarks testing common sense, + language understanding, and logical reasoning, phi-1.5 demonstrates a nearly state-of-the-art performance among models with less + than 10 billion parameters. + + Refer to [Phi's HuggingFace repos](https://huggingface.co/microsoft/phi-1_5) + for more information. 
+ """ + + __config__ = { + 'name_type': 'lowercase', + 'url': 'https://arxiv.org/abs/2309.05463', + 'architecture': 'PhiForCausalLM', + 'requirements': ['einops'], + 'trust_remote_code': True, + 'default_id': 'microsoft/phi-1_5', + 'serialisation': 'safetensors', + 'model_ids': ['microsoft/phi-1_5'], + 'fine_tune_strategies': ( + {'adapter_type': 'lora', 'r': 64, 'lora_alpha': 16, 'lora_dropout': 0.1, 'bias': 'none'}, + ), + } + + class GenerationConfig: + max_new_tokens: int = 200 + + class SamplingParams: + best_of: int = 1 + + @property + def default_prompt_template(self) -> str: + return DEFAULT_PROMPT_TEMPLATE + + @property + def default_system_message(self) -> str: + return DEFAULT_SYSTEM_MESSAGE + + def sanitize_parameters( + self, + prompt: str, + prompt_template: PromptTemplate | str | None = None, + system_message: str | None = None, + max_new_tokens: int | None = None, + **attrs: t.Any, + ) -> tuple[str, dict[str, t.Any], dict[str, t.Any]]: + system_message = DEFAULT_SYSTEM_MESSAGE if system_message is None else system_message + if prompt_template is None: + prompt_template = PromptTemplate(template=self.default_prompt_template) + elif isinstance(prompt_template, str): + prompt_template = PromptTemplate(template=prompt_template) + return ( + prompt_template.with_options(system_message=system_message).format(instruction=prompt), + {'max_new_tokens': max_new_tokens}, + {}, + ) diff --git a/openllm-python/README.md b/openllm-python/README.md index d05a6e33..7650b738 100644 --- a/openllm-python/README.md +++ b/openllm-python/README.md @@ -209,6 +209,7 @@ You can specify any of the following Baichuan models via `openllm start`: - [fireballoon/baichuan-vicuna-chinese-7b](https://huggingface.co/fireballoon/baichuan-vicuna-chinese-7b) - [fireballoon/baichuan-vicuna-7b](https://huggingface.co/fireballoon/baichuan-vicuna-7b) - [hiyouga/baichuan-7b-sft](https://huggingface.co/hiyouga/baichuan-7b-sft) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -225,7 +226,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us To install vLLM, run `pip install "openllm[vllm]"` ```bash -openllm start baichuan-inc/baichuan-7b --backend vllm +TRUST_REMOTE_CODE=True openllm start baichuan-inc/baichuan-7b --backend vllm ``` @@ -240,7 +241,7 @@ openllm start baichuan-inc/baichuan-7b --backend vllm ```bash -openllm start baichuan-inc/baichuan-7b --backend pt +TRUST_REMOTE_CODE=True openllm start baichuan-inc/baichuan-7b --backend pt ``` @@ -287,6 +288,7 @@ You can specify any of the following ChatGLM models via `openllm start`: - [thudm/chatglm-6b-int4](https://huggingface.co/thudm/chatglm-6b-int4) - [thudm/chatglm2-6b](https://huggingface.co/thudm/chatglm2-6b) - [thudm/chatglm2-6b-int4](https://huggingface.co/thudm/chatglm2-6b-int4) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -301,7 +303,7 @@ OpenLLM will support vLLM and PyTorch as default backend. 
By default, it will us ```bash -openllm start thudm/chatglm-6b --backend pt +TRUST_REMOTE_CODE=True openllm start thudm/chatglm-6b --backend pt ``` @@ -338,6 +340,7 @@ You can specify any of the following DollyV2 models via `openllm start`: - [databricks/dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b) - [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b) - [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -415,6 +418,7 @@ You can specify any of the following Falcon models via `openllm start`: - [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) - [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) - [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -493,6 +497,7 @@ You can specify any of the following FlanT5 models via `openllm start`: - [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) - [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) - [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -542,6 +547,7 @@ You can specify any of the following GPTNeoX models via `openllm start`: - [eleutherai/gpt-neox-20b](https://huggingface.co/eleutherai/gpt-neox-20b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -627,6 +633,7 @@ You can specify any of the following Llama models via `openllm start`: - [NousResearch/llama-2-70b-hf](https://huggingface.co/NousResearch/llama-2-70b-hf) - [NousResearch/llama-2-13b-hf](https://huggingface.co/NousResearch/llama-2-13b-hf) - [NousResearch/llama-2-7b-hf](https://huggingface.co/NousResearch/llama-2-7b-hf) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -696,6 +703,7 @@ You can specify any of the following Mistral models via `openllm start`: - [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) - [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) - [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -776,6 +784,7 @@ You can specify any of the following MPT models via `openllm start`: - [mosaicml/mpt-30b](https://huggingface.co/mosaicml/mpt-30b) - [mosaicml/mpt-30b-instruct](https://huggingface.co/mosaicml/mpt-30b-instruct) - [mosaicml/mpt-30b-chat](https://huggingface.co/mosaicml/mpt-30b-chat) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -792,7 +801,7 @@ OpenLLM will support vLLM and PyTorch as default backend. 
By default, it will us To install vLLM, run `pip install "openllm[vllm]"` ```bash -openllm start mosaicml/mpt-7b --backend vllm +TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b --backend vllm ``` @@ -807,7 +816,7 @@ openllm start mosaicml/mpt-7b --backend vllm ```bash -openllm start mosaicml/mpt-7b --backend pt +TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b --backend pt ``` @@ -855,6 +864,7 @@ You can specify any of the following OPT models via `openllm start`: - [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) - [facebook/opt-6.7b](https://huggingface.co/facebook/opt-6.7b) - [facebook/opt-66b](https://huggingface.co/facebook/opt-66b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -893,6 +903,73 @@ openllm start facebook/opt-125m --backend pt
+Phi + + +### Quickstart + +Run the following command to quickly spin up a Phi server: + +```bash +TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 +``` +In a different terminal, run the following command to interact with the server: + +```bash +export OPENLLM_ENDPOINT=http://localhost:3000 +openllm query 'What are large language models?' +``` + + +> **Note:** Any Phi variants can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=phi) to see more Phi-compatible models. + + + +### Supported models + +You can specify any of the following Phi models via `openllm start`: + + +- [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) + +### Supported backends + +OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. + + + +> **Important:** We recommend user to explicitly specify `--backend` to choose the desired backend to run the model. If you have access to a GPU, always use `--backend vllm`. + + + +- vLLM (Recommended): + + +To install vLLM, run `pip install "openllm[vllm]"` + +```bash +TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 --backend vllm +``` + + +> **Important:** Using vLLM requires a GPU that has architecture newer than 8.0 to get the best performance for serving. It is recommended that for all serving usecase in production, you should choose vLLM for serving. + + + +> **Note:** Currently, adapters are yet to be supported with vLLM. + + +- PyTorch: + + +```bash +TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 --backend pt +``` + +
+ +
+ StableLM @@ -924,6 +1001,7 @@ You can specify any of the following StableLM models via `openllm start`: - [stabilityai/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) - [stabilityai/stablelm-base-alpha-3b](https://huggingface.co/stabilityai/stablelm-base-alpha-3b) - [stabilityai/stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -999,6 +1077,7 @@ You can specify any of the following StarCoder models via `openllm start`: - [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) - [bigcode/starcoderbase](https://huggingface.co/bigcode/starcoderbase) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -1068,6 +1147,7 @@ You can specify any of the following Yi models via `openllm start`: - [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B) - [01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K) - [01-ai/Yi-34B-200K](https://huggingface.co/01-ai/Yi-34B-200K) + ### Supported backends OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch. @@ -1084,7 +1164,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us To install vLLM, run `pip install "openllm[vllm]"` ```bash -openllm start 01-ai/Yi-6B --backend vllm +TRUST_REMOTE_CODE=True openllm start 01-ai/Yi-6B --backend vllm ``` @@ -1099,7 +1179,7 @@ openllm start 01-ai/Yi-6B --backend vllm ```bash -openllm start 01-ai/Yi-6B --backend pt +TRUST_REMOTE_CODE=True openllm start 01-ai/Yi-6B --backend pt ```
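Since the new `PhiConfig` declares a LoRA fine-tune strategy, a LoRA adapter could in principle be attached at serve time the same way as for other PyTorch-backed models (the notes above state adapters are not yet supported with vLLM). A sketch, assuming the existing `--adapter-id` flag and a hypothetical adapter repository:

```bash
# <user>/phi-1_5-lora is a placeholder; substitute a real LoRA adapter trained against microsoft/phi-1_5
TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 --backend pt --adapter-id <user>/phi-1_5-lora
```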
@@ -1296,6 +1376,7 @@ OpenLLM is not just a standalone product; it's a building block designed to integrate with other powerful tools easily. We currently offer integration with [BentoML](https://github.com/bentoml/BentoML), [OpenAI's Compatible Endpoints](https://platform.openai.com/docs/api-reference/completions/object), +[LlamaIndex](https://www.llamaindex.ai/), [LangChain](https://github.com/hwchase17/langchain), and [Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents). @@ -1340,6 +1421,33 @@ async def prompt(input_text: str) -> str: return generation.outputs[0].text ``` +### [LlamaIndex](https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules.html#openllm) + +To start a local LLM with `llama_index`, simply use `llama_index.llms.openllm.OpenLLM`: + +```python +import asyncio +from llama_index.llms.openllm import OpenLLM + +llm = OpenLLM('HuggingFaceH4/zephyr-7b-alpha') + +llm.complete("The meaning of life is") + +async def main(prompt, **kwargs): + async for it in llm.astream_chat(prompt, **kwargs): print(it) + +asyncio.run(main("The time at San Francisco is")) +``` + +If there is a remote LLM Server running elsewhere, then you can use `llama_index.llms.openllm.OpenLLMAPI`: + +```python +from llama_index.llms.openllm import OpenLLMAPI +``` + +> [!NOTE] +> All synchronous and asynchronous API from `llama_index.llms.LLM` are supported. + ### [LangChain](https://python.langchain.com/docs/ecosystem/integrations/openllm) To quickly start a local LLM with `langchain`, simply do the following: @@ -1372,7 +1480,6 @@ To integrate a LangChain agent with BentoML, you can do the following: ```python llm = OpenLLM( - model_name='flan-t5', model_id='google/flan-t5-large', embedded=False, serialisation="legacy" diff --git a/openllm-python/src/openllm/__init__.pyi b/openllm-python/src/openllm/__init__.pyi index 6bb92cb9..11047560 100644 --- a/openllm-python/src/openllm/__init__.pyi +++ b/openllm-python/src/openllm/__init__.pyi @@ -12,7 +12,7 @@ Fine-tune, serve, deploy, and monitor any LLMs with ease. # fmt: off # update-config-stubs.py: import stubs start -from openlm_core.config import CONFIG_MAPPING as CONFIG_MAPPING,CONFIG_MAPPING_NAMES as CONFIG_MAPPING_NAMES,AutoConfig as AutoConfig,BaichuanConfig as BaichuanConfig,ChatGLMConfig as ChatGLMConfig,DollyV2Config as DollyV2Config,FalconConfig as FalconConfig,FlanT5Config as FlanT5Config,GPTNeoXConfig as GPTNeoXConfig,LlamaConfig as LlamaConfig,MistralConfig as MistralConfig,MPTConfig as MPTConfig,OPTConfig as OPTConfig,StableLMConfig as StableLMConfig,StarCoderConfig as StarCoderConfig,YiConfig as YiConfig +from openlm_core.config import CONFIG_MAPPING as CONFIG_MAPPING,CONFIG_MAPPING_NAMES as CONFIG_MAPPING_NAMES,AutoConfig as AutoConfig,BaichuanConfig as BaichuanConfig,ChatGLMConfig as ChatGLMConfig,DollyV2Config as DollyV2Config,FalconConfig as FalconConfig,FlanT5Config as FlanT5Config,GPTNeoXConfig as GPTNeoXConfig,LlamaConfig as LlamaConfig,MistralConfig as MistralConfig,MPTConfig as MPTConfig,OPTConfig as OPTConfig,PhiConfig as PhiConfig,StableLMConfig as StableLMConfig,StarCoderConfig as StarCoderConfig,YiConfig as YiConfig # update-config-stubs.py: import stubs stop # fmt: on diff --git a/tools/update-readme.py b/tools/update-readme.py index e8ef044b..e8cb8ae6 100755 --- a/tools/update-readme.py +++ b/tools/update-readme.py @@ -62,7 +62,7 @@ openllm query 'What are large language models?' 
details_block.extend(list_ids) details_block.extend( [ - '### Supported backends\n', + '\n### Supported backends\n', 'OpenLLM will support vLLM and PyTorch as default backend. By default, it will use vLLM if vLLM is available, otherwise fallback to PyTorch.\n', *markdown_importantblock( 'We recommend user to explicitly specify `--backend` to choose the desired backend to run the model. If you have access to a GPU, always use `--backend vllm`.\n' @@ -76,7 +76,7 @@ openllm query 'What are large language models?' 'To install vLLM, run `pip install "openllm[vllm]"`\n', f"""\ ```bash -openllm start {it['model_ids'][0]} --backend vllm +{'' if not it['trust_remote_code'] else 'TRUST_REMOTE_CODE=True '}openllm start {it['model_ids'][0]} --backend vllm ```""", *markdown_importantblock( 'Using vLLM requires a GPU that has architecture newer than 8.0 to get the best performance for serving. It is recommended that for all serving usecase in production, you should choose vLLM for serving.' @@ -90,7 +90,7 @@ openllm start {it['model_ids'][0]} --backend vllm '\n- PyTorch:\n\n', f"""\ ```bash -openllm start {it['model_ids'][0]} --backend pt +{'' if not it['trust_remote_code'] else 'TRUST_REMOTE_CODE=True '}openllm start {it['model_ids'][0]} --backend pt ```""", ] )
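Taken together, the configuration changes above register Phi alongside the existing models, so it should resolve through `AutoConfig` like any other entry. A minimal sketch of that from Python, assuming `openllm_core` from this branch is importable; the values in the comments mirror the defaults declared in `configuration_phi.py`:

```python
from openllm_core.config import AutoConfig

# 'phi' is now listed in CONFIG_MAPPING_NAMES, so for_model should build a PhiConfig.
config = AutoConfig.for_model('phi')
print(type(config).__name__)  # expected: 'PhiConfig'

# sanitize_parameters() formats the prompt with the default '{instruction}' template
# and forwards max_new_tokens (declared default: 200) to generation.
prompt, generate_kwargs, _ = config.sanitize_parameters(
    'What are large language models?', max_new_tokens=128
)
print(prompt)           # should be the instruction itself, since the template is just '{instruction}'
print(generate_kwargs)  # {'max_new_tokens': 128}
```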