OpenLLM


REST/gRPC API server for running any open large language model - StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
Powered by BentoML 🍱 + HuggingFace 🤗

To get started, simply install OpenLLM with pip:

pip install openllm
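
To verify the installation, print the CLI help, which also lists the available subcommands:

openllm --help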

NOTE: Currently, OpenLLM is built with pydantic v2. At the time of writing, pydantic v2 is still in its alpha stage; to install it, run pip install -U --pre pydantic

To start an LLM server, use openllm start, which launches any supported LLM with a single command. For example, to start a dolly-v2 server:

openllm start dolly-v2

# Starting LLM Server for 'dolly_v2'
#
# 2023-05-27T04:55:36-0700 [INFO] [cli] Environ for worker 0: set CPU thread count to 10
# 2023-05-27T04:55:36-0700 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service.py:svc" can be accessed at http://localhost:3000/metrics.
# 2023-05-27T04:55:36-0700 [INFO] [cli] Starting production HTTP BentoServer from "_service.py:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)

To see a list of supported LLMs, run openllm start --help.
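
openllm start also accepts flags for configuring the model to launch. As a sketch, the --model-id option is assumed here to point dolly-v2 at a specific HuggingFace checkpoint; run openllm start dolly-v2 --help to confirm the options available in your version:

openllm start dolly-v2 --model-id databricks/dolly-v2-7b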

In a different terminal window, open an IPython session and create a client to start interacting with the model:

>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')
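
The same client also works outside an interactive session. The script below is a minimal sketch using only the query call shown above; the exact shape of the returned value may vary between versions:

import openllm

# Connect to the server started by `openllm start dolly-v2`
client = openllm.client.HTTPClient('http://localhost:3000')

# Send a prompt and print whatever the server returns
response = client.query('What is the difference between a llama and an alpaca?')
print(response)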

To package the LLM into a Bento, BentoML's standardized deployment archive, simply use openllm build:

openllm build dolly-v2
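
The build step prints the tag of the resulting Bento, which can then be containerized with BentoML's standard tooling. The tag below is hypothetical; substitute the one reported by openllm build:

# Tag is illustrative - use the tag printed by `openllm build`
bentoml containerize dolly-v2-service:latest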

🎯 To streamline production deployment, you can use the following:

  • ☁️ BentoML Cloud: the fastest way to deploy your Bento, simple to use and built for scale
  • 🦄 Yatai: Model Deployment at scale on Kubernetes
  • 🚀 bentoctl: fast model deployment on AWS SageMaker, Lambda, ECS, GCP, Azure, Heroku, and more!