# OpenLLM
REST/gRPC API server for running any open large language model: StableLM, Llama, Alpaca, Dolly, Flan-T5, and more.
Powered by BentoML 🍱 + HuggingFace 🤗
To get started, simply install OpenLLM with pip:

```bash
pip install openllm
```
NOTE: OpenLLM is currently built with Pydantic v2, which is still in its alpha stage at the time of writing. To install the Pydantic v2 pre-release, run:

```bash
pip install -U --pre pydantic
```
To start an LLM server, use `openllm start`, which can launch any supported LLM
with a single command. For example, to start a dolly-v2 server:
```bash
openllm start dolly-v2

# Starting LLM Server for 'dolly_v2'
#
# 2023-05-27T04:55:36-0700 [INFO] [cli] Environ for worker 0: set CPU thread count to 10
# 2023-05-27T04:55:36-0700 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service.py:svc" can be accessed at http://localhost:3000/metrics.
# 2023-05-27T04:55:36-0700 [INFO] [cli] Starting production HTTP BentoServer from "_service.py:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
```
To see a list of supported LLMs, run `openllm start --help`.
In a different terminal window, open an IPython session and create a client to start interacting with the model:
```python
>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')
```
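Because the server is a standard BentoML HTTP service, you can also query it directly over REST. The snippet below is a minimal sketch: the `/v1/generate` route and JSON payload are assumptions about the generated service, so confirm the exact path and schema in the Swagger UI served at http://localhost:3000.

```bash
# Hypothetical route and payload shape; verify against the Swagger UI
# at http://localhost:3000 before relying on it.
curl -X POST http://localhost:3000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Explain the difference between \"further\" and \"farther\""}'
```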
To package the LLM into a Bento, simply use `openllm build`:

```bash
openllm build dolly-v2
```
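The result is a regular Bento, so the standard BentoML workflow applies from here: serve it locally with `bentoml serve` or build a container image with `bentoml containerize`. The tag below is a placeholder; use the actual tag printed at the end of `openllm build`.

```bash
# Placeholder tag; substitute the tag printed by `openllm build dolly-v2`.
bentoml serve dolly-v2-service:latest          # serve the built Bento locally
bentoml containerize dolly-v2-service:latest   # package it as an OCI image
```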
🎯 To streamline production deployment, you can use the following:
- ☁️ BentoML Cloud: the fastest way to deploy your bento, simple and at scale
- 🦄️ Yatai: Model Deployment at scale on Kubernetes
- 🚀 bentoctl: Fast model deployment on AWS SageMaker, Lambda, EC2, GCP, Azure, Heroku, and more!