Mirror of https://github.com/bentoml/OpenLLM.git, synced 2026-01-22 14:31:26 -05:00
.gitignore
@@ -130,3 +130,6 @@ dmypy.json
bazel-*

package-lock.json

# PyCharm config
.idea

@@ -9,11 +9,14 @@ out to us if you have any question!

## Table of Contents

- [Setting Up Your Development Environment](#setting-up-your-development-environment)
- [Project Structure](#project-structure)
- [Development Workflow](#development-workflow)
- [Writing Tests](#writing-tests)
- [Releasing a New Version](#releasing-a-new-version)
- [Developer Guide](#developer-guide)
  - [Table of Contents](#table-of-contents)
  - [Setting Up Your Development Environment](#setting-up-your-development-environment)
  - [Project Structure](#project-structure)
  - [Development Workflow](#development-workflow)
  - [Using a custom fork](#using-a-custom-fork)
  - [Writing Tests](#writing-tests)
  - [Releasing a New Version](#releasing-a-new-version)

## Setting Up Your Development Environment

@@ -121,6 +124,12 @@ After setting up your environment, here's how you can start contributing:

8. Submit a Pull Request on GitHub.

## Using a custom fork

If you wish to use a modified version of OpenLLM, install your fork from source
with `pip install -e` and set `OPENLLM_DEV_BUILD=True`, so that Bentos built will
include the generated wheels for OpenLLM in the bundle.

## Writing Tests

Good tests are crucial for the stability of our codebase. Always write tests for

README.md
@@ -1,42 +1,41 @@
<div align="center">
<h1 align="center">OpenLLM</h1>
<h1 align="center">🦾 OpenLLM</h1>
<a href="https://pypi.org/project/openllm">
<img src="https://img.shields.io/pypi/v/openllm.svg" alt="pypi_status" />
</a><a href="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml">
<img src="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml/badge.svg?branch=main" alt="ci" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
<img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><a href="https://twitter.com/bentomlai">
<img src="https://badgen.net/badge/icon/@bentomlai/1DA1F2?icon=twitter&label=Follow%20Us" alt="Twitter" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
<img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><br>
<strong>Build, fine-tune, serve, and deploy Large-Language Models including popular ones like StableLM, Llama, Dolly, Flan-T5, Vicuna, or even your custom LLMs.<br></strong>
<i>Powered by BentoML 🍱</i>
<p>An open platform for operating large language models(LLMs) in production.</br>
Fine-tune, serve, deploy, and monitor any LLMs with ease.</p>
<i></i>
</div>

<br/>

## 📖 Introduction

With OpenLLM, you can easily run inference with any open-source large-language
models(LLMs) and build production-ready LLM apps, powered by BentoML. Here are
some key features:
With OpenLLM, you can run inference with any open-source large-language models(LLMs),
deploy to the cloud or on-premises, and build powerful AI apps.

🚂 **SOTA LLMs**: With a single click, access support for state-of-the-art LLMs,
including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.
🚂 **SOTA LLMs**: built-in supports a wide range of open-source LLMs and model runtime,
including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.

🔥 **Easy-to-use APIs**: We provide intuitive interfaces by integrating with
popular tools like BentoML, HuggingFace, LangChain, and more.
🔥 **Flexible APIs**: serve LLMs over RESTful API or gRPC with one command, query
via WebUI, CLI, our Python/Javascript client, or any HTTP client.

📦 **Fine-tuning your own LLM**: Customize any LLM to suit your needs with
`LLM.tuning()`. (Work In Progress)
⛓️ **Freedom To Build**: First-class support for LangChain and BentoML allows you to
easily create your own AI apps by composing LLMs with other models and services.

⛓️ **Interoperability**: First-class support for LangChain and BentoML’s runner
architecture, allows easy chaining of LLMs on multiple GPUs/Nodes. (Work In
Progress)
🎯 **Streamline Deployment**: build your LLM server Docker Images or deploy as
serverless endpoint via [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).

🤖️ **Bring your own LLM**: Fine-tune any LLM to suit your needs with
`LLM.tuning()`. (Coming soon)

🎯 **Streamline Production Deployment**: Seamlessly package into a Bento with
`openllm build`, containerized into OCI Images, and deploy with a single click
using [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).

## 🏃 Getting Started

@@ -53,12 +52,8 @@ pip install openllm
To verify if it's installed correctly, run:

```
openllm -h
```
$ openllm -h

The correct output will be:

```
Usage: openllm [OPTIONS] COMMAND [ARGS]...

██████╗ ██████╗ ███████╗███╗ ██╗██╗ ██╗ ███╗ ███╗
@@ -68,11 +63,8 @@ Usage: openllm [OPTIONS] COMMAND [ARGS]...
╚██████╔╝██║ ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝╚═╝ ╚═╝

OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model

- StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more

- Powered by BentoML 🍱
An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
```

### Starting an LLM Server
@@ -84,8 +76,8 @@ server:
openllm start dolly-v2
```

Following this, a swagger UI will be accessible at http://0.0.0.0:3000 where you
can experiment with the endpoints and sample prompts.
Following this, a Web UI will be accessible at http://0.0.0.0:3000 where you
can experiment with the endpoints and sample input prompts.

OpenLLM provides a built-in Python client, allowing you to interact with the
model. In a different terminal window or a Jupyter notebook, create a client to
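The client snippet itself falls outside this hunk; a minimal sketch of what such a call can look like is shown below, where the `openllm.client.HTTPClient` name and the `query` signature are assumptions about the built-in Python client rather than text taken from this diff.

```python
# Sketch only: querying the server started with `openllm start dolly-v2`.
# HTTPClient and query() are assumed names for the built-in Python client,
# not copied from this diff; check the OpenLLM docs for the exact API.
import openllm

client = openllm.client.HTTPClient("http://localhost:3000")  # same endpoint as the Web UI
print(client.query("Explain to me the difference between 'further' and 'farther'"))
```
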
@@ -101,62 +93,33 @@ You can also use the `openllm query` command to query the model from the
terminal:

```bash
openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
```

## 🚀 Deploying to Production
Visit `http://0.0.0.0:3000/docs.json` for OpenLLM's API specification.

To deploy your LLMs into production:

1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
specific model, like `dolly-v2`, using the `build` command.:
## 🧩 Supported Models

```bash
openllm build dolly-v2
```

A
[Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
in BentoML, is the unit of distribution. It packages your program's source
code, models, files, artifacts, and dependencies.

> _NOTE_: If you wish to build OpenLLM from the git source, set
> `OPENLLM_DEV_BUILD=True` to include the generated wheels in the bundle.

2. **Containerize your Bento**

```
bentoml containerize <name:version>
```

BentoML offers a comprehensive set of options for deploying and hosting
online ML services in production. To learn more, check out the
[Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
guide.

## 🧩 Models and Dependencies

OpenLLM currently supports the following:
The following models are currently supported in OpenLLM. By default, OpenLLM doesn't
include dependencies to run all models. The extra model-specific dependencies can be
installed with the instructions below:

<!-- update-readme.py: start -->

| Model | CPU | GPU | Optional |
| --------------------------------------------------------------------- | --- | --- | -------------------------------- |
| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅ | ✅ | `pip install openllm[flan-t5]` |
| [dolly-v2](https://github.com/databrickslabs/dolly) | ✅ | ✅ | 👾 (not needed) |
| [chatglm](https://github.com/THUDM/ChatGLM-6B) | ❌ | ✅ | `pip install openllm[chatglm]` |
| [starcoder](https://github.com/bigcode-project/starcoder) | ❌ | ✅ | `pip install openllm[starcoder]` |
| [falcon](https://falconllm.tii.ae/) | ❌ | ✅ | `pip install openllm[falcon]` |
| [stablelm](https://github.com/Stability-AI/StableLM) | ❌ | ✅ | 👾 (not needed) |

> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
> install dependencies to run all models. If one wishes to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.
| Model | CPU | GPU | Installation |
| --------------------------------------------------------------------- | --- | --- | ---------------------------------- |
| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅ | ✅ | `pip install "openllm[flan-t5]"` |
| [dolly-v2](https://github.com/databrickslabs/dolly) | ✅ | ✅ | `pip install openllm` |
| [chatglm](https://github.com/THUDM/ChatGLM-6B) | ❌ | ✅ | `pip install "openllm[chatglm]"` |
| [starcoder](https://github.com/bigcode-project/starcoder) | ❌ | ✅ | `pip install "openllm[starcoder]"` |
| [falcon](https://falconllm.tii.ae/) | ❌ | ✅ | `pip install "openllm[falcon]"` |
| [stablelm](https://github.com/Stability-AI/StableLM) | ❌ | ✅ | `pip install openllm` |

<!-- update-readme.py: stop -->

### Runtime Implementations
### Runtime Implementations (Experimental)

Different LLMs may have multiple runtime implementations. For instance, they
might use Pytorch (`pt`), Tensorflow (`tf`), or Flax (`flax`).
@@ -175,8 +138,7 @@ OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
### Integrating a New Model

OpenLLM encourages contributions by welcoming users to incorporate their custom
LLMs into the ecosystem. Checkout
[Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
LLMs into the ecosystem. Check out [Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
to see how you can do it yourself.

## ⚙️ Integrations
@@ -190,7 +152,7 @@ easily integrate with other powerful tools. We currently offer integration with

OpenLLM models can be integrated as a
[Runner](https://docs.bentoml.org/en/latest/concepts/runner.html) in your
BentoML service. These runners has a `generate` method that takes a string as a
BentoML service. These runners have a `generate` method that takes a string as a
prompt and returns a corresponding output string. This will allow you to plug
and play any OpenLLM models with your existing ML workflow.

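To make the runner wiring described above concrete, here is a minimal sketch of a BentoML service using an OpenLLM runner; the `openllm.Runner` factory, the service name, and the `generate.async_run` call are assumptions based on this README's description of runners, not code taken from the diff.

```python
# Sketch only: plugging an OpenLLM runner into a BentoML service.
# The openllm.Runner factory and generate.async_run call are assumed from the
# README's description of runners; consult the OpenLLM docs for the exact API.
import bentoml
import openllm

llm_runner = openllm.Runner("dolly-v2")  # any supported model name

svc = bentoml.Service(name="llm-service", runners=[llm_runner])


@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def prompt(text: str) -> str:
    # Runners expose a `generate` method: prompt string in, completion string out.
    return await llm_runner.generate.async_run(text)
```
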
@@ -233,6 +195,34 @@ llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")
```

## 🚀 Deploying to Production

To deploy your LLMs into production:

1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
specific model, like `dolly-v2`, using the `build` command.:

```bash
openllm build dolly-v2
```

A [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
in BentoML, is the unit of distribution. It packages your program's source
code, models, files, artifacts, and dependencies.

2. **Containerize your Bento**

```
bentoml containerize <name:version>
```

BentoML offers a comprehensive set of options for deploying and hosting
online ML services in production. To learn more, check out the
[Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
guide.

## 🍇 Telemetry

OpenLLM collects usage data to enhance user experience and improve the product.

@@ -15,10 +15,13 @@
OpenLLM
=======

OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
An open platform for operating large language models in production. Fine-tune, serve,
deploy, and monitor any LLMs with ease.

- StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
- Powered by BentoML 🍱 + HuggingFace 🤗
* Built-in support for StableLM, Llama, Dolly, Flan-T5, Vicuna
* Option to bring your own fine-tuned LLMs
* Online Serving with HTTP, gRPC, SSE(coming soon) or custom API
* Native integration with BentoML and LangChain for custom LLM apps
"""
from __future__ import annotations

@@ -517,12 +517,8 @@ def cli_factory() -> click.Group:
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝╚═╝ ╚═╝

\b
OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model

- StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more

\b
- Powered by BentoML 🍱
An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
"""

@cli.group(cls=OpenLLMCommandGroup, context_settings=_CONTEXT_SETTINGS)
@@ -588,7 +584,7 @@ def cli_factory() -> click.Group:
_echo(bento.tag)
return bento

@cli.command(aliases=["list"])
@cli.command()
@output_option
def models(output: OutputLiteral):
"""List all supported models."""
@@ -649,7 +645,7 @@ def cli_factory() -> click.Group:

sys.exit(0)

@cli.command(aliases=["save"])
@cli.command()
@click.argument(
"model_name", type=click.Choice([inflection.dasherize(name) for name in openllm.CONFIG_MAPPING.keys()])
)
@@ -729,11 +725,12 @@ def cli_factory() -> click.Group:
model_store.delete(model.tag)
click.echo(f"{model} deleted.")

@cli.command(name="query", aliases=["run", "ask"])
@cli.command(name="query")
@click.option(
"--endpoint",
type=click.STRING,
help="LLM Server endpoint, i.e: http://12.323.2.1",
help="OpenLLM Server endpoint, i.e: http://0.0.0.0:3000",
envvar="OPENLLM_ENDPOINT",
default="http://0.0.0.0:3000",
)
@click.option("--timeout", type=click.INT, default=30, help="Default server timeout", show_default=True)

@@ -38,11 +38,11 @@ def main() -> int:
readme = f.readlines()

start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
formatted: dict[t.Literal["Model", "CPU", "GPU", "Installation"], list[str]] = {
"Model": [],
"CPU": [],
"GPU": [],
"Optional": [],
"Installation": [],
}
max_name_len_div = 0
max_install_len_div = 0
@@ -55,19 +55,19 @@ def main() -> int:
formatted["Model"].append(model_name)
formatted["GPU"].append("✅")
formatted["CPU"].append("✅" if not config.__openllm_requires_gpu__ else "❌")
instruction = "👾 (not needed)"
instruction = "`pip install openllm`"
if dashed in deps:
instruction = f"`pip install openllm[{dashed}]`"
instruction = f"""`pip install "openllm[{dashed}]"`"""
else:
does_not_need_custom_installation.append(model_name)
if len(instruction) > max_install_len_div:
max_install_len_div = len(instruction)
formatted["Optional"].append(instruction)
formatted["Installation"].append(instruction)

meta = ["\n"]

# NOTE: headers
meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n"
meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Installation {' ' * (max_install_len_div - 8)}|\n"
# NOTE: divs
meta += f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n"
# NOTE: rows
@@ -88,15 +88,6 @@ def main() -> int:
)
meta += "\n"

# NOTE: adding notes
meta += """\
> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
> install dependencies to run all models. If one wishes to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.

"""

readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]

with open(os.path.join(ROOT, "README.md"), "w") as f: