mirror of
https://github.com/bentoml/OpenLLM.git
synced 2026-03-04 23:26:16 -05:00
feat(models): Phi 1.5 (#672)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
123 openllm-python/README.md (generated)
@@ -209,6 +209,7 @@ You can specify any of the following Baichuan models via `openllm start`:
- [fireballoon/baichuan-vicuna-chinese-7b](https://huggingface.co/fireballoon/baichuan-vicuna-chinese-7b)
- [fireballoon/baichuan-vicuna-7b](https://huggingface.co/fireballoon/baichuan-vicuna-7b)
- [hiyouga/baichuan-7b-sft](https://huggingface.co/hiyouga/baichuan-7b-sft)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
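The fallback described above amounts to an import-availability check; a minimal sketch of that selection logic (the function name and details are illustrative, not OpenLLM's actual code):

```python
import importlib.util

def pick_backend(explicit=None):
    """Choose a backend as the README describes: honor an explicit
    --backend choice first, otherwise prefer vLLM when the package
    is importable, and fall back to PyTorch ("pt") when it is not."""
    if explicit is not None:
        return explicit
    return "vllm" if importlib.util.find_spec("vllm") is not None else "pt"
```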
@@ -225,7 +226,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us

To install vLLM, run `pip install "openllm[vllm]"`

```bash
-openllm start baichuan-inc/baichuan-7b --backend vllm
+TRUST_REMOTE_CODE=True openllm start baichuan-inc/baichuan-7b --backend vllm
```
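The recurring change in this commit is prefixing commands with `TRUST_REMOTE_CODE=True`, an environment variable consulted before loading custom model code from the Hub. A minimal sketch of reading such a truthy flag (the accepted spellings are assumptions, not OpenLLM's exact parser):

```python
import os

TRUTHY = {"1", "true", "yes", "on"}

def trust_remote_code(env=os.environ):
    # Treat common truthy spellings, case-insensitively, as enabling the flag;
    # anything else (including an unset variable) leaves it disabled.
    return env.get("TRUST_REMOTE_CODE", "").strip().lower() in TRUTHY
```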
@@ -240,7 +241,7 @@ openllm start baichuan-inc/baichuan-7b --backend vllm

```bash
-openllm start baichuan-inc/baichuan-7b --backend pt
+TRUST_REMOTE_CODE=True openllm start baichuan-inc/baichuan-7b --backend pt
```

</details>
@@ -287,6 +288,7 @@ You can specify any of the following ChatGLM models via `openllm start`:
- [thudm/chatglm-6b-int4](https://huggingface.co/thudm/chatglm-6b-int4)
- [thudm/chatglm2-6b](https://huggingface.co/thudm/chatglm2-6b)
- [thudm/chatglm2-6b-int4](https://huggingface.co/thudm/chatglm2-6b-int4)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -301,7 +303,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us

```bash
-openllm start thudm/chatglm-6b --backend pt
+TRUST_REMOTE_CODE=True openllm start thudm/chatglm-6b --backend pt
```

</details>
@@ -338,6 +340,7 @@ You can specify any of the following DollyV2 models via `openllm start`:
- [databricks/dolly-v2-3b](https://huggingface.co/databricks/dolly-v2-3b)
- [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b)
- [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -415,6 +418,7 @@ You can specify any of the following Falcon models via `openllm start`:
- [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
- [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct)
- [tiiuae/falcon-40b-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -493,6 +497,7 @@ You can specify any of the following FlanT5 models via `openllm start`:
- [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)
- [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl)
- [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -542,6 +547,7 @@ You can specify any of the following GPTNeoX models via `openllm start`:

- [eleutherai/gpt-neox-20b](https://huggingface.co/eleutherai/gpt-neox-20b)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -627,6 +633,7 @@ You can specify any of the following Llama models via `openllm start`:
- [NousResearch/llama-2-70b-hf](https://huggingface.co/NousResearch/llama-2-70b-hf)
- [NousResearch/llama-2-13b-hf](https://huggingface.co/NousResearch/llama-2-13b-hf)
- [NousResearch/llama-2-7b-hf](https://huggingface.co/NousResearch/llama-2-7b-hf)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -696,6 +703,7 @@ You can specify any of the following Mistral models via `openllm start`:
- [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
- [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -776,6 +784,7 @@ You can specify any of the following MPT models via `openllm start`:
- [mosaicml/mpt-30b](https://huggingface.co/mosaicml/mpt-30b)
- [mosaicml/mpt-30b-instruct](https://huggingface.co/mosaicml/mpt-30b-instruct)
- [mosaicml/mpt-30b-chat](https://huggingface.co/mosaicml/mpt-30b-chat)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -792,7 +801,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us

To install vLLM, run `pip install "openllm[vllm]"`

```bash
-openllm start mosaicml/mpt-7b --backend vllm
+TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b --backend vllm
```
@@ -807,7 +816,7 @@ openllm start mosaicml/mpt-7b --backend vllm

```bash
-openllm start mosaicml/mpt-7b --backend pt
+TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b --backend pt
```

</details>
@@ -855,6 +864,7 @@ You can specify any of the following OPT models via `openllm start`:
- [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b)
- [facebook/opt-6.7b](https://huggingface.co/facebook/opt-6.7b)
- [facebook/opt-66b](https://huggingface.co/facebook/opt-66b)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -893,6 +903,73 @@ openllm start facebook/opt-125m --backend pt

<details>

<summary>Phi</summary>

### Quickstart

Run the following command to quickly spin up a Phi server:

```bash
TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5
```

In a different terminal, run the following command to interact with the server:

```bash
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'What are large language models?'
```
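Beyond the `openllm query` CLI, the server can be called over HTTP. A hedged sketch using only the standard library; the `/v1/generate` path and the payload field names are assumptions about the server's API, so verify them against your OpenLLM version:

```python
import json
import urllib.request

ENDPOINT = "http://localhost:3000"  # matches OPENLLM_ENDPOINT above

def build_generate_request(prompt, endpoint=ENDPOINT):
    # Assemble the HTTP request; the route and JSON fields below are
    # assumptions, not a documented contract.
    payload = json.dumps({"prompt": prompt}).encode()
    return urllib.request.Request(
        f"{endpoint}/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# With a server running, you could then do:
#   with urllib.request.urlopen(build_generate_request("What are large language models?")) as resp:
#       print(resp.read().decode())
```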
> **Note:** Any Phi variant can be deployed with OpenLLM. Visit the [HuggingFace Model Hub](https://huggingface.co/models?sort=trending&search=phi) to see more Phi-compatible models.

### Supported models

You can specify any of the following Phi models via `openllm start`:

- [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.

> **Important:** We recommend users explicitly specify `--backend` to choose the desired backend to run the model. If you have access to a GPU, always use `--backend vllm`.
- vLLM (Recommended):

To install vLLM, run `pip install "openllm[vllm]"`

```bash
TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 --backend vllm
```

> **Important:** Using vLLM requires a GPU with compute architecture newer than 8.0 to get the best serving performance. For all production serving use cases, vLLM is the recommended backend.

> **Note:** Adapters are not yet supported with vLLM.
- PyTorch:

```bash
TRUST_REMOTE_CODE=True openllm start microsoft/phi-1_5 --backend pt
```

</details>

<details>

<summary>StableLM</summary>
@@ -924,6 +1001,7 @@ You can specify any of the following StableLM models via `openllm start`:
- [stabilityai/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
- [stabilityai/stablelm-base-alpha-3b](https://huggingface.co/stabilityai/stablelm-base-alpha-3b)
- [stabilityai/stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -999,6 +1077,7 @@ You can specify any of the following StarCoder models via `openllm start`:

- [bigcode/starcoder](https://huggingface.co/bigcode/starcoder)
- [bigcode/starcoderbase](https://huggingface.co/bigcode/starcoderbase)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -1068,6 +1147,7 @@ You can specify any of the following Yi models via `openllm start`:
- [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B)
- [01-ai/Yi-6B-200K](https://huggingface.co/01-ai/Yi-6B-200K)
- [01-ai/Yi-34B-200K](https://huggingface.co/01-ai/Yi-34B-200K)

### Supported backends

OpenLLM supports vLLM and PyTorch as backends. By default, it uses vLLM if vLLM is available and otherwise falls back to PyTorch.
@@ -1084,7 +1164,7 @@ OpenLLM will support vLLM and PyTorch as default backend. By default, it will us

To install vLLM, run `pip install "openllm[vllm]"`

```bash
-openllm start 01-ai/Yi-6B --backend vllm
+TRUST_REMOTE_CODE=True openllm start 01-ai/Yi-6B --backend vllm
```
@@ -1099,7 +1179,7 @@ openllm start 01-ai/Yi-6B --backend vllm

```bash
-openllm start 01-ai/Yi-6B --backend pt
+TRUST_REMOTE_CODE=True openllm start 01-ai/Yi-6B --backend pt
```

</details>
@@ -1296,6 +1376,7 @@ OpenLLM is not just a standalone product; it's a building block designed to
integrate with other powerful tools easily. We currently offer integration with
[BentoML](https://github.com/bentoml/BentoML),
[OpenAI's Compatible Endpoints](https://platform.openai.com/docs/api-reference/completions/object),
[LlamaIndex](https://www.llamaindex.ai/),
[LangChain](https://github.com/hwchase17/langchain), and
[Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents).
@@ -1340,6 +1421,33 @@ async def prompt(input_text: str) -> str:
  return generation.outputs[0].text
```

### [LlamaIndex](https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules.html#openllm)

To start a local LLM with `llama_index`, simply use `llama_index.llms.openllm.OpenLLM`:

```python
import asyncio

from llama_index.llms.openllm import OpenLLM

llm = OpenLLM('HuggingFaceH4/zephyr-7b-alpha')

llm.complete("The meaning of life is")

async def main(prompt, **kwargs):
    async for it in llm.astream_chat(prompt, **kwargs):
        print(it)

asyncio.run(main("The time at San Francisco is"))
```
If there is a remote LLM server running elsewhere, you can use `llama_index.llms.openllm.OpenLLMAPI`:

```python
from llama_index.llms.openllm import OpenLLMAPI

# Point the client at the running server (the address is illustrative).
remote_llm = OpenLLMAPI(address="http://localhost:3000")
```

> [!NOTE]
> All synchronous and asynchronous APIs from `llama_index.llms.LLM` are supported.

### [LangChain](https://python.langchain.com/docs/ecosystem/integrations/openllm)

To quickly start a local LLM with `langchain`, simply do the following:
@@ -1372,7 +1480,6 @@ To integrate a LangChain agent with BentoML, you can do the following:

```python
llm = OpenLLM(
    model_name='flan-t5',
    model_id='google/flan-t5-large',
    embedded=False,
    serialisation="legacy"
@@ -12,7 +12,7 @@ Fine-tune, serve, deploy, and monitor any LLMs with ease.

# fmt: off
# update-config-stubs.py: import stubs start
-from openllm_core.config import CONFIG_MAPPING as CONFIG_MAPPING,CONFIG_MAPPING_NAMES as CONFIG_MAPPING_NAMES,AutoConfig as AutoConfig,BaichuanConfig as BaichuanConfig,ChatGLMConfig as ChatGLMConfig,DollyV2Config as DollyV2Config,FalconConfig as FalconConfig,FlanT5Config as FlanT5Config,GPTNeoXConfig as GPTNeoXConfig,LlamaConfig as LlamaConfig,MistralConfig as MistralConfig,MPTConfig as MPTConfig,OPTConfig as OPTConfig,StableLMConfig as StableLMConfig,StarCoderConfig as StarCoderConfig,YiConfig as YiConfig
+from openllm_core.config import CONFIG_MAPPING as CONFIG_MAPPING,CONFIG_MAPPING_NAMES as CONFIG_MAPPING_NAMES,AutoConfig as AutoConfig,BaichuanConfig as BaichuanConfig,ChatGLMConfig as ChatGLMConfig,DollyV2Config as DollyV2Config,FalconConfig as FalconConfig,FlanT5Config as FlanT5Config,GPTNeoXConfig as GPTNeoXConfig,LlamaConfig as LlamaConfig,MistralConfig as MistralConfig,MPTConfig as MPTConfig,OPTConfig as OPTConfig,PhiConfig as PhiConfig,StableLMConfig as StableLMConfig,StarCoderConfig as StarCoderConfig,YiConfig as YiConfig
# update-config-stubs.py: import stubs stop
# fmt: on
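The `import stubs start`/`stop` comments above delimit a region that `update-config-stubs.py` regenerates whenever a config such as `PhiConfig` is added. Marker-based rewriting of that kind can be sketched as follows (the helper is illustrative, not the actual script):

```python
def replace_between_markers(text, start_marker, stop_marker, new_lines):
    """Replace the lines strictly between two marker lines, keeping the
    markers themselves so the region stays regenerable on the next run."""
    lines = text.splitlines()
    start = lines.index(start_marker)
    stop = lines.index(stop_marker)
    return "\n".join(lines[: start + 1] + new_lines + lines[stop:])
```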