
Examples with OpenLLM

The following examples show how to interact with OpenLLM's features.

Features

The following notebook demonstrates general OpenLLM features and shows how to start running any open-source model in production.

OpenAI-compatible endpoints

The openai_completion_client.py script demonstrates how to use the OpenAI-compatible /v1/completions endpoint to generate text.

export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_completion_client.py

# For streaming, set STREAM=True
STREAM=True python openai_completion_client.py
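
A minimal sketch of such a client, written against the openai Python package's v1 interface. The default endpoint, the prompt, and the model-discovery step below are illustrative assumptions, not the exact contents of openai_completion_client.py:

import os

from openai import OpenAI

# OpenLLM serves OpenAI-compatible routes under /v1; the API key is ignored.
client = OpenAI(
    base_url=os.getenv("OPENLLM_ENDPOINT", "http://localhost:3000") + "/v1",
    api_key="na",
)

# Assume the server hosts a single model and use whatever it reports.
model = client.models.list().data[0].id
prompt = "Write me a tagline for an ice cream shop: "

if os.getenv("STREAM", "False").lower() in ("true", "1"):
    # Stream tokens back as they are generated.
    for chunk in client.completions.create(model=model, prompt=prompt, max_tokens=64, stream=True):
        print(chunk.choices[0].text, end="", flush=True)
else:
    print(client.completions.create(model=model, prompt=prompt, max_tokens=64).choices[0].text)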

The openai_chat_completion_client.py script demonstrates how to use the OpenAI-compatible /v1/chat/completions endpoint to chat with a model.

export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_chat_completion_client.py

# For streaming, set STREAM=True
STREAM=True python openai_chat_completion_client.py
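
A minimal sketch along the same lines; the messages and the STREAM handling are assumptions, not a verbatim copy of openai_chat_completion_client.py:

import os

from openai import OpenAI

client = OpenAI(
    base_url=os.getenv("OPENLLM_ENDPOINT", "http://localhost:3000") + "/v1",
    api_key="na",  # ignored by OpenLLM
)
model = client.models.list().data[0].id

stream = os.getenv("STREAM", "False").lower() in ("true", "1")
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Tell me a short story about OpenLLM."}],
    stream=stream,
)

if stream:
    # Chat chunks carry incremental deltas rather than full messages.
    for chunk in response:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
else:
    print(response.choices[0].message.content)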

TinyLLM

The api_server.py script demonstrates how one can write a production-ready BentoML service with OpenLLM and vLLM (see the sketch below).
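
For orientation, here is a minimal sketch of what such a service can look like. The model id and the exact openllm/bentoml calls are assumptions based on the OpenLLM 0.4.x API, not a verbatim copy of api_server.py:

import bentoml
import openllm

# Build an LLM backed by the vLLM runtime (needs a GPU at serve time).
llm = openllm.LLM("TinyLlama/TinyLlama-1.1B-Chat-v0.3", backend="vllm")

svc = bentoml.Service(name="tinyllm", runners=[llm.runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def generate(prompt: str) -> str:
    # Run generation on the runner and return the first candidate's text.
    generation = await llm.generate(prompt)
    return generation.outputs[0].text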

Install requirements:

pip install -U "openllm[vllm]"

To serve the Bento (given you have access to a GPU):

bentoml serve api_server:svc

To build the Bento, run:

bentoml build -f bentofile.yaml .
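
Once the build finishes, the Bento can be packaged into an OCI image with bentoml containerize (passing the tag printed by the build step), ready to deploy on any GPU-equipped container runtime.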