Examples with OpenLLM
The following examples show how to interact with OpenLLM features.
OpenAI-compatible endpoints
The openai_completion_client.py script demonstrates how to use the OpenAI-compatible /v1/completions endpoint to generate text.
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_completion_client.py
# For streaming set STREAM=True
STREAM=True python openai_completion_client.py
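For reference, here is a minimal sketch of the kind of call the script makes, using the official `openai` Python client. The model id is a placeholder (any model the server is running works; you can list them with `client.models.list()`), and the actual script may differ:

```python
import os

from openai import OpenAI

# Point the client at the OpenLLM server set via OPENLLM_ENDPOINT above.
client = OpenAI(
    base_url=os.environ["OPENLLM_ENDPOINT"] + "/v1",
    api_key="na",  # placeholder key; the server may not validate it
)

# "facebook/opt-1.3b" is an illustrative model id, not a requirement.
completion = client.completions.create(
    model="facebook/opt-1.3b",
    prompt="Write a tagline for an ice cream shop:",
    max_tokens=64,
)
print(completion.choices[0].text)
```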
The openai_chat_completion_client.py script demonstrates how to use the OpenAI-compatible /v1/chat/completions endpoint to chat with a model.
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_chat_completion_client.py
# For streaming set STREAM=True
STREAM=True python openai_chat_completion_client.py
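A minimal sketch of the equivalent chat call, including the streaming path toggled by the STREAM environment variable (model id again a placeholder, and the real script may be structured differently):

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["OPENLLM_ENDPOINT"] + "/v1",
    api_key="na",  # placeholder key; the server may not validate it
)

stream = os.environ.get("STREAM", "False") == "True"

response = client.chat.completions.create(
    model="facebook/opt-1.3b",  # illustrative model id
    messages=[{"role": "user", "content": "What is deep learning?"}],
    stream=stream,
)

if stream:
    # Streaming responses arrive as incremental chunks with delta content.
    for chunk in response:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()
else:
    print(response.choices[0].message.content)
```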
TinyLLM
The api_server.py script demonstrates how to write a production-ready BentoML service with OpenLLM and vLLM.
Install requirements:
pip install -U "openllm[vllm]"
To serve the Bento (assuming you have access to a GPU):
bentoml serve api_server:svc
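To give a rough idea of the shape of such a service, here is a sketch using BentoML's legacy `bentoml.Service` API (which matches the `api_server:svc` target above) and vLLM's offline `LLM` engine. This is not the actual api_server.py: the model id, service name, and sampling settings are all illustrative assumptions.

```python
import bentoml
from bentoml.io import Text
from vllm import LLM, SamplingParams

# Illustrative model id; vLLM loads it onto the available GPU.
engine = LLM(model="facebook/opt-125m")

svc = bentoml.Service("tinyllm")

@svc.api(input=Text(), output=Text())
def generate(prompt: str) -> str:
    # Sampling settings are placeholders; tune for your use case.
    params = SamplingParams(max_tokens=128, temperature=0.7)
    outputs = engine.generate([prompt], params)
    return outputs[0].outputs[0].text
```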
To build the Bento, run:
bentoml build -f bentofile.yaml .