chore(docs): update README.md

See #12
Chaoyu
2023-06-09 17:13:14 -07:00
committed by Aaron
parent 1597d5d4bb
commit e2b26adf2f
6 changed files with 106 additions and 113 deletions

.gitignore

@@ -130,3 +130,6 @@ dmypy.json
bazel-*
package-lock.json
# PyCharm config
.idea


@@ -9,11 +9,14 @@ out to us if you have any question!
## Table of Contents
- [Setting Up Your Development Environment](#setting-up-your-development-environment)
- [Project Structure](#project-structure)
- [Development Workflow](#development-workflow)
- [Writing Tests](#writing-tests)
- [Releasing a New Version](#releasing-a-new-version)
- [Developer Guide](#developer-guide)
- [Table of Contents](#table-of-contents)
- [Setting Up Your Development Environment](#setting-up-your-development-environment)
- [Project Structure](#project-structure)
- [Development Workflow](#development-workflow)
- [Using a custom fork](#using-a-custom-fork)
- [Writing Tests](#writing-tests)
- [Releasing a New Version](#releasing-a-new-version)
## Setting Up Your Development Environment
@@ -121,6 +124,12 @@ After setting up your environment, here's how you can start contributing:
8. Submit a Pull Request on GitHub.
## Using a custom fork
If you wish to use a modified version of OpenLLM, install your fork from source
with `pip install -e .` and set `OPENLLM_DEV_BUILD=True`, so that any Bento you
build will include the generated OpenLLM wheels in the bundle.
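For example, a minimal sketch of that flow (the fork URL and model name are illustrative placeholders):
```bash
# Clone and install your fork in editable mode (URL is a placeholder)
git clone https://github.com/<your-username>/OpenLLM.git && cd OpenLLM
pip install -e .

# Include the locally generated OpenLLM wheels in any Bento you build
export OPENLLM_DEV_BUILD=True
openllm build dolly-v2
```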
## Writing Tests
Good tests are crucial for the stability of our codebase. Always write tests for

README.md

@@ -1,42 +1,41 @@
<div align="center">
<h1 align="center">OpenLLM</h1>
<h1 align="center">🦾 OpenLLM</h1>
<a href="https://pypi.org/project/openllm">
<img src="https://img.shields.io/pypi/v/openllm.svg" alt="pypi_status" />
</a><a href="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml">
<img src="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml/badge.svg?branch=main" alt="ci" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
<img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><a href="https://twitter.com/bentomlai">
<img src="https://badgen.net/badge/icon/@bentomlai/1DA1F2?icon=twitter&label=Follow%20Us" alt="Twitter" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
<img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><br>
<strong>Build, fine-tune, serve, and deploy Large-Language Models including popular ones like StableLM, Llama, Dolly, Flan-T5, Vicuna, or even your custom LLMs.<br></strong>
<i>Powered by BentoML 🍱</i>
<p>An open platform for operating large language models (LLMs) in production.<br/>
Fine-tune, serve, deploy, and monitor any LLMs with ease.</p>
<i></i>
</div>
<br/>
## 📖 Introduction
With OpenLLM, you can easily run inference with any open-source large-language
model (LLM) and build production-ready LLM apps, powered by BentoML. Here are
some key features:
With OpenLLM, you can run inference with any open-source large-language model (LLM),
deploy to the cloud or on-premises, and build powerful AI apps.
🚂 **SOTA LLMs**: With a single click, access support for state-of-the-art LLMs,
including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.
🚂 **SOTA LLMs**: built-in support for a wide range of open-source LLMs and model
runtimes, including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more.
🔥 **Easy-to-use APIs**: We provide intuitive interfaces by integrating with
popular tools like BentoML, HuggingFace, LangChain, and more.
🔥 **Flexible APIs**: serve LLMs over a RESTful API or gRPC with one command, and query
via the Web UI, CLI, our Python/JavaScript client, or any HTTP client.
📦 **Fine-tuning your own LLM**: Customize any LLM to suit your needs with
`LLM.tuning()`. (Work In Progress)
⛓️ **Freedom To Build**: First-class support for LangChain and BentoML allows you to
easily create your own AI apps by composing LLMs with other models and services.
⛓️ **Interoperability**: first-class support for LangChain and BentoML's runner
architecture allows easy chaining of LLMs across multiple GPUs/nodes. (Work In
Progress)
🎯 **Streamline Deployment**: build your LLM server Docker Images or deploy as
serverless endpoint via [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
🤖️ **Bring your own LLM**: Fine-tune any LLM to suit your needs with
`LLM.tuning()`. (Coming soon)
🎯 **Streamlined Production Deployment**: seamlessly package into a Bento with
`openllm build`, containerize into OCI images, and deploy with a single click
using [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
## 🏃‍ Getting Started
@@ -53,12 +52,8 @@ pip install openllm
To verify that it's installed correctly, run:
```
openllm -h
```
$ openllm -h
The correct output will be:
```
Usage: openllm [OPTIONS] COMMAND [ARGS]...
██████╗ ██████╗ ███████╗███╗ ██╗██╗ ██╗ ███╗ ███╗
@@ -68,11 +63,8 @@ Usage: openllm [OPTIONS] COMMAND [ARGS]...
╚██████╔╝██║ ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝╚═╝ ╚═╝
OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
- StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
- Powered by BentoML 🍱
An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
```
### Starting an LLM Server
@@ -84,8 +76,8 @@ server:
openllm start dolly-v2
```
Following this, a swagger UI will be accessible at http://0.0.0.0:3000 where you
can experiment with the endpoints and sample prompts.
Following this, a Web UI will be accessible at http://0.0.0.0:3000 where you
can experiment with the endpoints and sample input prompts.
OpenLLM provides a built-in Python client, allowing you to interact with the
model. In a different terminal window or a Jupyter notebook, create a client to
@@ -101,62 +93,33 @@ You can also use the `openllm query` command to query the model from the
terminal:
```bash
openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
```
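The built-in Python client mentioned above can be used the same way. A minimal sketch (assuming the client is exposed as `openllm.client.HTTPClient`; adjust if the actual client API differs):
```python
import openllm

# Connect to the LLM server started with `openllm start` (default port 3000)
client = openllm.client.HTTPClient("http://localhost:3000")
client.query('Explain to me the difference between "further" and "farther"')
```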
## 🚀 Deploying to Production
Visit `http://0.0.0.0:3000/docs.json` for OpenLLM's API specification.
To deploy your LLMs into production:
1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
specific model, like `dolly-v2`, using the `build` command:
## 🧩 Supported Models
```bash
openllm build dolly-v2
```
A
[Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
in BentoML, is the unit of distribution. It packages your program's source
code, models, files, artifacts, and dependencies.
> _NOTE_: If you wish to build OpenLLM from the git source, set
> `OPENLLM_DEV_BUILD=True` to include the generated wheels in the bundle.
2. **Containerize your Bento**
```
bentoml containerize <name:version>
```
BentoML offers a comprehensive set of options for deploying and hosting
online ML services in production. To learn more, check out the
[Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
guide.
## 🧩 Models and Dependencies
OpenLLM currently supports the following:
The following models are currently supported in OpenLLM. By default, OpenLLM doesn't
include dependencies to run all models. The extra model-specific dependencies can be
installed with the instructions below:
<!-- update-readme.py: start -->
| Model | CPU | GPU | Optional |
| --------------------------------------------------------------------- | --- | --- | -------------------------------- |
| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅ | ✅ | `pip install openllm[flan-t5]` |
| [dolly-v2](https://github.com/databrickslabs/dolly) | ✅ | ✅ | 👾 (not needed) |
| [chatglm](https://github.com/THUDM/ChatGLM-6B) | ❌ | ✅ | `pip install openllm[chatglm]` |
| [starcoder](https://github.com/bigcode-project/starcoder) | ❌ | ✅ | `pip install openllm[starcoder]` |
| [falcon](https://falconllm.tii.ae/) | ❌ | ✅ | `pip install openllm[falcon]` |
| [stablelm](https://github.com/Stability-AI/StableLM) | ❌ | ✅ | 👾 (not needed) |
> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't force you to
> install the dependencies for all models. If you wish to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.
| Model | CPU | GPU | Installation |
| --------------------------------------------------------------------- | --- | --- | ---------------------------------- |
| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅ | ✅ | `pip install "openllm[flan-t5]"` |
| [dolly-v2](https://github.com/databrickslabs/dolly) | ✅ | ✅ | `pip install openllm` |
| [chatglm](https://github.com/THUDM/ChatGLM-6B) | ❌ | ✅ | `pip install "openllm[chatglm]"` |
| [starcoder](https://github.com/bigcode-project/starcoder) | ❌ | ✅ | `pip install "openllm[starcoder]"` |
| [falcon](https://falconllm.tii.ae/) | ❌ | ✅ | `pip install "openllm[falcon]"` |
| [stablelm](https://github.com/Stability-AI/StableLM) | ❌ | ✅ | `pip install openllm` |
<!-- update-readme.py: stop -->
### Runtime Implementations
### Runtime Implementations (Experimental)
Different LLMs may have multiple runtime implementations. For instance, they
might use PyTorch (`pt`), TensorFlow (`tf`), or Flax (`flax`).
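For example, a sketch of picking a non-default runtime at startup, following the `OPENLLM_<MODEL>_FRAMEWORK` pattern shown below (flan-t5 and Flax are just illustrative choices):
```bash
# Start flan-t5 with the Flax implementation instead of the default runtime
OPENLLM_FLAN_T5_FRAMEWORK=flax openllm start flan-t5
```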
@@ -175,8 +138,7 @@ OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
### Integrating a New Model
OpenLLM encourages contributions by welcoming users to incorporate their custom
LLMs into the ecosystem. Checkout
[Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
LLMs into the ecosystem. Check out [Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
to see how you can do it yourself.
## ⚙️ Integrations
@@ -190,7 +152,7 @@ easily integrate with other powerful tools. We currently offer integration with
OpenLLM models can be integrated as a
[Runner](https://docs.bentoml.org/en/latest/concepts/runner.html) in your
BentoML service. These runners has a `generate` method that takes a string as a
BentoML service. These runners have a `generate` method that takes a string as a
prompt and returns a corresponding output string. This will allow you to plug
and play any OpenLLM models with your existing ML workflow.
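As a rough sketch of that workflow (the runner construction and the exact `generate` invocation below are assumptions based on the description above, not a verbatim API reference):
```python
import bentoml
import openllm
from bentoml.io import Text

# Assumed: create an OpenLLM runner for a supported model
llm_runner = openllm.Runner("dolly-v2")

svc = bentoml.Service(name="llm-dolly-service", runners=[llm_runner])

@svc.api(input=Text(), output=Text())
async def prompt(input_text: str) -> str:
    # Per the description above, `generate` takes a prompt string and returns
    # an output string; the exact call convention may differ by BentoML version.
    return await llm_runner.generate(input_text)
```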
@@ -233,6 +195,34 @@ llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")
```
## 🚀 Deploying to Production
To deploy your LLMs into production:
1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
specific model, like `dolly-v2`, using the `build` command:
```bash
openllm build dolly-v2
```
A [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
in BentoML, is the unit of distribution. It packages your program's source
code, models, files, artifacts, and dependencies.
2. **Containerize your Bento**
```
bentoml containerize <name:version>
```
BentoML offers a comprehensive set of options for deploying and hosting
online ML services in production. To learn more, check out the
[Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
guide.
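To smoke-test the containerized Bento from step 2 locally, something like the following should work (a sketch; the image tag placeholder and port 3000 are assumptions based on BentoML defaults):
```bash
# Run the OCI image produced by `bentoml containerize` and expose the HTTP port
docker run --rm -p 3000:3000 <name:version>
```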
## 🍇 Telemetry
OpenLLM collects usage data to enhance user experience and improve the product.


@@ -15,10 +15,13 @@
OpenLLM
=======
OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
An open platform for operating large language models in production. Fine-tune, serve,
deploy, and monitor any LLMs with ease.
- StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
- Powered by BentoML 🍱 + HuggingFace 🤗
* Built-in support for StableLM, Llama, Dolly, Flan-T5, Vicuna
* Option to bring your own fine-tuned LLMs
* Online Serving with HTTP, gRPC, SSE (coming soon) or custom API
* Native integration with BentoML and LangChain for custom LLM apps
"""
from __future__ import annotations


@@ -517,12 +517,8 @@ def cli_factory() -> click.Group:
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝╚═╝ ╚═╝
\b
OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
- StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
\b
- Powered by BentoML 🍱
An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
"""
@cli.group(cls=OpenLLMCommandGroup, context_settings=_CONTEXT_SETTINGS)
@@ -588,7 +584,7 @@ def cli_factory() -> click.Group:
_echo(bento.tag)
return bento
@cli.command(aliases=["list"])
@cli.command()
@output_option
def models(output: OutputLiteral):
"""List all supported models."""
@@ -649,7 +645,7 @@ def cli_factory() -> click.Group:
sys.exit(0)
@cli.command(aliases=["save"])
@cli.command()
@click.argument(
"model_name", type=click.Choice([inflection.dasherize(name) for name in openllm.CONFIG_MAPPING.keys()])
)
@@ -729,11 +725,12 @@ def cli_factory() -> click.Group:
model_store.delete(model.tag)
click.echo(f"{model} deleted.")
@cli.command(name="query", aliases=["run", "ask"])
@cli.command(name="query")
@click.option(
"--endpoint",
type=click.STRING,
help="LLM Server endpoint, i.e: http://12.323.2.1",
help="OpenLLM Server endpoint, i.e: http://0.0.0.0:3000",
envvar="OPENLLM_ENDPOINT",
default="http://0.0.0.0:3000",
)
@click.option("--timeout", type=click.INT, default=30, help="Default server timeout", show_default=True)


@@ -38,11 +38,11 @@ def main() -> int:
readme = f.readlines()
start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
formatted: dict[t.Literal["Model", "CPU", "GPU", "Installation"], list[str]] = {
"Model": [],
"CPU": [],
"GPU": [],
"Optional": [],
"Installation": [],
}
max_name_len_div = 0
max_install_len_div = 0
@@ -55,19 +55,19 @@ def main() -> int:
formatted["Model"].append(model_name)
formatted["GPU"].append("")
formatted["CPU"].append("" if not config.__openllm_requires_gpu__ else "")
instruction = "👾 (not needed)"
instruction = "`pip install openllm`"
if dashed in deps:
instruction = f"`pip install openllm[{dashed}]`"
instruction = f"""`pip install "openllm[{dashed}]"`"""
else:
does_not_need_custom_installation.append(model_name)
if len(instruction) > max_install_len_div:
max_install_len_div = len(instruction)
formatted["Optional"].append(instruction)
formatted["Installation"].append(instruction)
meta = ["\n"]
# NOTE: headers
meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n"
meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Installation {' ' * (max_install_len_div - 8)}|\n"
# NOTE: divs
meta += f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n"
# NOTE: rows
@@ -88,15 +88,6 @@ def main() -> int:
)
meta += "\n"
# NOTE: adding notes
meta += """\
> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
> install dependencies to run all models. If one wishes to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.
"""
readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]
with open(os.path.join(ROOT, "README.md"), "w") as f: