
OpenLLM


REST/gRPC API server for running any open large language model - StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
Powered by BentoML 🍱 + HuggingFace 🤗

To get started, simply install OpenLLM with pip:

pip install openllm

NOTE: OpenLLM is currently built against pydantic v2, which at the time of writing is still in alpha. To install pydantic v2, run pip install -U --pre pydantic
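
If you want to confirm what ended up in your environment, here is a quick sketch (the script name is illustrative; it only assumes a standard pip install of openllm and pydantic):

# verify_install.py -- print the installed openllm and pydantic versions
from importlib.metadata import version

import pydantic

print("openllm:", version("openllm"))
print("pydantic:", pydantic.VERSION)  # should report a 2.x pre-release per the note above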

To start an LLM server, use openllm start, which launches any supported LLM with a single command. For example, to start a dolly-v2 server:

openllm start dolly-v2

# Starting LLM Server for 'dolly_v2'
#
# 2023-05-27T04:55:36-0700 [INFO] [cli] Environ for worker 0: set CPU thread count to 10
# 2023-05-27T04:55:36-0700 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service.py:svc" can be accessed at http://localhost:3000/metrics.
# 2023-05-27T04:55:36-0700 [INFO] [cli] Starting production HTTP BentoServer from "_service.py:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
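
Once the server reports that it is listening, you can sanity-check it from Python before wiring up a client. A minimal sketch, assuming the default BentoML readiness endpoint (/readyz) on port 3000 and that the requests package is installed:

import requests

# Ask the BentoServer started above whether it is ready to accept traffic.
# /readyz is the usual BentoML readiness path; adjust if your setup differs.
resp = requests.get("http://localhost:3000/readyz", timeout=5)
print("ready" if resp.status_code == 200 else f"not ready (HTTP {resp.status_code})")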

To see a list of supported LLMs, run openllm start --help.

On a different terminal window, open an IPython session and create a client to start interacting with the model:

>>> import openllm
>>> client = openllm.client.HTTPClient('http://localhost:3000')
>>> client.query('Explain to me the difference between "further" and "farther"')
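
Since the server is a plain HTTP BentoServer, you can also call it without the Python client. A sketch, assuming the service exposes a /v1/generate endpoint that accepts a JSON body with a prompt field; the path and payload shape are assumptions, so check the OpenAPI page served at http://localhost:3000 for the real contract:

import requests

# Hypothetical request shape -- consult the server's OpenAPI docs for the
# actual generate endpoint and payload of your model.
payload = {"prompt": 'Explain to me the difference between "further" and "farther"'}
resp = requests.post("http://localhost:3000/v1/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())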

To package the LLM into a Bento, simply use openllm build:

openllm build dolly-v2
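
The build step writes its output to the local Bento store. A small sketch, assuming BentoML's Python API is available and that the resulting tag contains the model name (the filter below is illustrative):

import bentoml

# List everything in the local Bento store and print any dolly-v2 builds.
for bento in bentoml.list():
    if "dolly" in str(bento.tag):
        print(bento.tag)

From there the Bento can be served locally with bentoml serve <tag> or containerized with bentoml containerize <tag>, as described in the BentoML documentation.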

🎯 To streamline production deployment, you can use the following:

  • ☁️ BentoML Cloud: the fastest way to deploy your bento, simple and at scale
  • 🦄 Yatai: Model Deployment at scale on Kubernetes
  • 🚀 bentoctl: Fast model deployment on AWS SageMaker, Lambda, EC2, GCP, Azure, Heroku, and more!

🍇 Telemetry

OpenLLM collects usage data that helps the team improve the product. Only OpenLLM's internal API calls are reported. We strip out as much potentially sensitive information as possible, and we never collect user code, model data, or stack traces. The usage-tracking code lives in the OpenLLM source tree. You can opt out of usage tracking with the --do-not-track CLI option:

openllm [command] --do-not-track

Or by setting the environment variable OPENLLM_DO_NOT_TRACK=True:

export OPENLLM_DO_NOT_TRACK=True
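
If you drive OpenLLM from Python rather than the CLI, you can set the same variable in-process. A minimal sketch, assuming the flag is read when OpenLLM starts up, so it is set before the import:

import os

# Disable usage tracking for everything OpenLLM does in this process.
os.environ["OPENLLM_DO_NOT_TRACK"] = "True"

import openllm  # imported after the flag is set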