diff --git a/.gitignore b/.gitignore
index 73883c09..688cea65 100644
--- a/.gitignore
+++ b/.gitignore
@@ -130,3 +130,6 @@ dmypy.json
bazel-*
package-lock.json
+
+# PyCharm config
+.idea
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 49b88a51..cec32e84 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -9,11 +9,14 @@ out to us if you have any question!
## Table of Contents
-- [Setting Up Your Development Environment](#setting-up-your-development-environment)
-- [Project Structure](#project-structure)
-- [Development Workflow](#development-workflow)
-- [Writing Tests](#writing-tests)
-- [Releasing a New Version](#releasing-a-new-version)
+- [Developer Guide](#developer-guide)
+ - [Table of Contents](#table-of-contents)
+ - [Setting Up Your Development Environment](#setting-up-your-development-environment)
+ - [Project Structure](#project-structure)
+ - [Development Workflow](#development-workflow)
+ - [Using a custom fork](#using-a-custom-fork)
+ - [Writing Tests](#writing-tests)
+ - [Releasing a New Version](#releasing-a-new-version)
## Setting Up Your Development Environment
@@ -121,6 +124,12 @@ After setting up your environment, here's how you can start contributing:
8. Submit a Pull Request on GitHub.
+## Using a custom fork
+
+If you wish to use a modified version of OpenLLM, install your fork from source
+with `pip install -e .` and set `OPENLLM_DEV_BUILD=True` so that any Bento you build
+will include the generated OpenLLM wheels in the bundle.
+
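A minimal sketch of that workflow, assuming your fork is already pushed to GitHub (the clone URL below is a placeholder):

```bash
# install your local checkout of OpenLLM in editable mode
git clone https://github.com/<your-username>/OpenLLM.git && cd OpenLLM
pip install -e .

# ask OpenLLM to bundle the locally generated wheels into any Bento it builds
export OPENLLM_DEV_BUILD=True
openllm build dolly-v2
```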
## Writing Tests
Good tests are crucial for the stability of our codebase. Always write tests for
diff --git a/README.md b/README.md
index a97c9e44..a3ebc64b 100644
--- a/README.md
+++ b/README.md
@@ -1,42 +1,41 @@
-
OpenLLM
+
🦾 OpenLLM
-
-
+
+
-
Build, fine-tune, serve, and deploy Large-Language Models including popular ones like StableLM, Llama, Dolly, Flan-T5, Vicuna, or even your custom LLMs.
-
Powered by BentoML 🌱
+
An open platform for operating large language models (LLMs) in production.
+ Fine-tune, serve, deploy, and monitor any LLMs with ease.
+
## 📖 Introduction
-With OpenLLM, you can easily run inference with any open-source large-language
-models(LLMs) and build production-ready LLM apps, powered by BentoML. Here are
-some key features:
+With OpenLLM, you can run inference with any open-source large language models (LLMs),
+deploy to the cloud or on-premises, and build powerful AI apps.
-🚂 **SOTA LLMs**: With a single click, access support for state-of-the-art LLMs,
-including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.
+🚂 **SOTA LLMs**: built-in support for a wide range of open-source LLMs and model runtimes,
+including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more.
-🔥 **Easy-to-use APIs**: We provide intuitive interfaces by integrating with
-popular tools like BentoML, HuggingFace, LangChain, and more.
+🔥 **Flexible APIs**: serve LLMs over a RESTful API or gRPC with one command, and query
+via the Web UI, CLI, our Python/JavaScript client, or any HTTP client.
-π¦ **Fine-tuning your own LLM**: Customize any LLM to suit your needs with
-`LLM.tuning()`. (Work In Progress)
+⛓️ **Freedom To Build**: First-class support for LangChain and BentoML allows you to
+easily create your own AI apps by composing LLMs with other models and services.
-⛓️ **Interoperability**: First-class support for LangChain and BentoML's runner
-architecture, allows easy chaining of LLMs on multiple GPUs/Nodes. (Work In
-Progress)
+🎯 **Streamline Deployment**: build your LLM server Docker images or deploy as a
+serverless endpoint via [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
+
+🤖️ **Bring your own LLM**: Fine-tune any LLM to suit your needs with
+`LLM.tuning()`. (Coming soon)
-🎯 **Streamline Production Deployment**: Seamlessly package into a Bento with
-`openllm build`, containerized into OCI Images, and deploy with a single click
-using [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
## 🏃 Getting Started
@@ -53,12 +52,8 @@ pip install openllm
To verify if it's installed correctly, run:
```
-openllm -h
-```
+$ openllm -h
-The correct output will be:
-
-```
Usage: openllm [OPTIONS] COMMAND [ARGS]...
 ██████╗ ██████╗ ███████╗███╗   ██╗██╗     ██╗     ███╗   ███╗
@@ -68,11 +63,8 @@ Usage: openllm [OPTIONS] COMMAND [ARGS]...
╚██████╔╝██║     ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝
- OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
-
- - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
-
- - Powered by BentoML π±
+ An open platform for operating large language models in production.
+ Fine-tune, serve, deploy, and monitor any LLMs with ease.
```
### Starting an LLM Server
@@ -84,8 +76,8 @@ server:
openllm start dolly-v2
```
-Following this, a swagger UI will be accessible at http://0.0.0.0:3000 where you
-can experiment with the endpoints and sample prompts.
+Following this, a Web UI will be accessible at http://0.0.0.0:3000 where you
+can experiment with the endpoints and sample input prompts.
OpenLLM provides a built-in Python client, allowing you to interact with the
model. In a different terminal window or a Jupyter notebook, create a client to
@@ -101,62 +93,33 @@ You can also use the `openllm query` command to query the model from the
terminal:
```bash
-openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'
+export OPENLLM_ENDPOINT=http://localhost:3000
+openllm query 'Explain to me the difference between "further" and "farther"'
```
-## 🚀 Deploying to Production
+Visit `http://0.0.0.0:3000/docs.json` for OpenLLM's API specification.
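As a quick sketch, assuming the server started above is still running locally and `curl` is available, the spec can be inspected directly:

```bash
# fetch the OpenAPI specification exposed by the running OpenLLM server
curl -s http://0.0.0.0:3000/docs.json | python -m json.tool | head -n 20
```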
-To deploy your LLMs into production:
-1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
- specific model, like `dolly-v2`, using the `build` command.:
+## 🧩 Supported Models
- ```bash
- openllm build dolly-v2
- ```
-
- A
- [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
- in BentoML, is the unit of distribution. It packages your program's source
- code, models, files, artifacts, and dependencies.
-
- > _NOTE_: If you wish to build OpenLLM from the git source, set
- > `OPENLLM_DEV_BUILD=True` to include the generated wheels in the bundle.
-
-2. **Containerize your Bento**
-
- ```
- bentoml containerize
- ```
-
- BentoML offers a comprehensive set of options for deploying and hosting
- online ML services in production. To learn more, check out the
- [Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
- guide.
-
-## 🧩 Models and Dependencies
-
-OpenLLM currently supports the following:
+The following models are currently supported in OpenLLM. By default, OpenLLM doesn't
+include dependencies to run all models. The extra model-specific dependencies can be
+installed with the instructions below:
-| Model | CPU | GPU | Optional |
-| --------------------------------------------------------------------- | --- | --- | -------------------------------- |
-| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅  | ✅  | `pip install openllm[flan-t5]`   |
-| [dolly-v2](https://github.com/databrickslabs/dolly)                   | ✅  | ✅  | πΎ (not needed)                  |
-| [chatglm](https://github.com/THUDM/ChatGLM-6B)                        | ❌  | ✅  | `pip install openllm[chatglm]`   |
-| [starcoder](https://github.com/bigcode-project/starcoder)             | ❌  | ✅  | `pip install openllm[starcoder]` |
-| [falcon](https://falconllm.tii.ae/)                                   | ❌  | ✅  | `pip install openllm[falcon]`    |
-| [stablelm](https://github.com/Stability-AI/StableLM)                  | ❌  | ✅  | πΎ (not needed)                  |
-
-> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
-> install dependencies to run all models. If one wishes to use any of the
-> aforementioned models, make sure to install the optional dependencies
-> mentioned above.
+| Model | CPU | GPU | Installation |
+| --------------------------------------------------------------------- | --- | --- | ---------------------------------- |
+| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅  | ✅  | `pip install "openllm[flan-t5]"`   |
+| [dolly-v2](https://github.com/databrickslabs/dolly)                   | ✅  | ✅  | `pip install openllm`              |
+| [chatglm](https://github.com/THUDM/ChatGLM-6B)                        | ❌  | ✅  | `pip install "openllm[chatglm]"`   |
+| [starcoder](https://github.com/bigcode-project/starcoder)             | ❌  | ✅  | `pip install "openllm[starcoder]"` |
+| [falcon](https://falconllm.tii.ae/)                                   | ❌  | ✅  | `pip install "openllm[falcon]"`    |
+| [stablelm](https://github.com/Stability-AI/StableLM)                  | ❌  | ✅  | `pip install openllm`              |
-### Runtime Implementations
+### Runtime Implementations (Experimental)
Different LLMs may have multiple runtime implementations. For instance, they
might use PyTorch (`pt`), TensorFlow (`tf`), or Flax (`flax`).
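The runtime is selected through an environment variable when starting the server, as the existing `OPENLLM_FLAN_T5_FRAMEWORK=tf` example in this README shows; the Flax variant below follows the same pattern and is shown as an assumption:

```bash
# start flan-t5 on the TensorFlow runtime instead of the default PyTorch one
OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5

# likewise for the Flax runtime (assuming the flax dependencies are installed)
OPENLLM_FLAN_T5_FRAMEWORK=flax openllm start flan-t5
```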
@@ -175,8 +138,7 @@ OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
### Integrating a New Model
OpenLLM encourages contributions by welcoming users to incorporate their custom
-LLMs into the ecosystem. Checkout
-[Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
+LLMs into the ecosystem. Check out [Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
to see how you can do it yourself.
## ⚙️ Integrations
@@ -190,7 +152,7 @@ easily integrate with other powerful tools. We currently offer integration with
OpenLLM models can be integrated as a
[Runner](https://docs.bentoml.org/en/latest/concepts/runner.html) in your
-BentoML service. These runners has a `generate` method that takes a string as a
+BentoML service. These runners have a `generate` method that takes a string as a
prompt and returns a corresponding output string. This will allow you to plug
and play any OpenLLM models with your existing ML workflow.
@@ -233,6 +195,34 @@ llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")
```
+## 🚀 Deploying to Production
+
+To deploy your LLMs into production:
+
+1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
+   specific model, like `dolly-v2`, using the `build` command:
+
+ ```bash
+ openllm build dolly-v2
+ ```
+
+ A [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
+ in BentoML, is the unit of distribution. It packages your program's source
+ code, models, files, artifacts, and dependencies.
+
+
+2. **Containerize your Bento**
+
+ ```
+ bentoml containerize
+ ```
+
+ BentoML offers a comprehensive set of options for deploying and hosting
+ online ML services in production. To learn more, check out the
+ [Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
+ guide.
+
+
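Putting the two steps together, a hedged sketch of the full flow; the Bento tag and image name below are placeholders, so substitute the tag that `openllm build` prints:

```bash
# 1. build a Bento for dolly-v2 (the resulting tag is printed on success)
openllm build dolly-v2

# 2. containerize it into an OCI image, using the tag printed above
bentoml containerize <bento_tag>

# 3. run the image locally, assuming the service listens on the default port 3000
docker run -p 3000:3000 <image_name>
```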
## π Telemetry
OpenLLM collects usage data to enhance user experience and improve the product.
diff --git a/src/openllm/__init__.py b/src/openllm/__init__.py
index 364ba2c6..39d49f5f 100644
--- a/src/openllm/__init__.py
+++ b/src/openllm/__init__.py
@@ -15,10 +15,13 @@
OpenLLM
=======
-OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
+An open platform for operating large language models in production. Fine-tune, serve,
+deploy, and monitor any LLMs with ease.
-- StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
-- Powered by BentoML 🌱 + HuggingFace 🤗
+* Built-in support for StableLM, Llama, Dolly, Flan-T5, Vicuna
+* Option to bring your own fine-tuned LLMs
+* Online serving with HTTP, gRPC, SSE (coming soon), or custom API
+* Native integration with BentoML and LangChain for custom LLM apps
"""
from __future__ import annotations
diff --git a/src/openllm/cli.py b/src/openllm/cli.py
index 0fddbf94..c2bd6b27 100644
--- a/src/openllm/cli.py
+++ b/src/openllm/cli.py
@@ -517,12 +517,8 @@ def cli_factory() -> click.Group:
 ╚═════╝ ╚═╝     ╚══════╝╚═╝  ╚═══╝╚══════╝╚══════╝╚═╝     ╚═╝
\b
- OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
-
- - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
-
- \b
- - Powered by BentoML π±
+ An open platform for operating large language models in production.
+ Fine-tune, serve, deploy, and monitor any LLMs with ease.
"""
@cli.group(cls=OpenLLMCommandGroup, context_settings=_CONTEXT_SETTINGS)
@@ -588,7 +584,7 @@ def cli_factory() -> click.Group:
_echo(bento.tag)
return bento
- @cli.command(aliases=["list"])
+ @cli.command()
@output_option
def models(output: OutputLiteral):
"""List all supported models."""
@@ -649,7 +645,7 @@ def cli_factory() -> click.Group:
sys.exit(0)
- @cli.command(aliases=["save"])
+ @cli.command()
@click.argument(
"model_name", type=click.Choice([inflection.dasherize(name) for name in openllm.CONFIG_MAPPING.keys()])
)
@@ -729,11 +725,12 @@ def cli_factory() -> click.Group:
model_store.delete(model.tag)
click.echo(f"{model} deleted.")
- @cli.command(name="query", aliases=["run", "ask"])
+ @cli.command(name="query")
@click.option(
"--endpoint",
type=click.STRING,
- help="LLM Server endpoint, i.e: http://12.323.2.1",
+ help="OpenLLM Server endpoint, i.e: http://0.0.0.0:3000",
+ envvar="OPENLLM_ENDPOINT",
default="http://0.0.0.0:3000",
)
@click.option("--timeout", type=click.INT, default=30, help="Default server timeout", show_default=True)
diff --git a/tools/update-readme.py b/tools/update-readme.py
index 2adbc87d..a9d48348 100755
--- a/tools/update-readme.py
+++ b/tools/update-readme.py
@@ -38,11 +38,11 @@ def main() -> int:
readme = f.readlines()
start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
- formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
+ formatted: dict[t.Literal["Model", "CPU", "GPU", "Installation"], list[str]] = {
"Model": [],
"CPU": [],
"GPU": [],
- "Optional": [],
+ "Installation": [],
}
max_name_len_div = 0
max_install_len_div = 0
@@ -55,19 +55,19 @@ def main() -> int:
formatted["Model"].append(model_name)
formatted["GPU"].append("β
")
formatted["CPU"].append("β
" if not config.__openllm_requires_gpu__ else "β")
- instruction = "πΎ (not needed)"
+ instruction = "`pip install openllm`"
if dashed in deps:
- instruction = f"`pip install openllm[{dashed}]`"
+ instruction = f"""`pip install "openllm[{dashed}]"`"""
else:
does_not_need_custom_installation.append(model_name)
if len(instruction) > max_install_len_div:
max_install_len_div = len(instruction)
- formatted["Optional"].append(instruction)
+ formatted["Installation"].append(instruction)
meta = ["\n"]
# NOTE: headers
- meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n"
+ meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Installation {' ' * (max_install_len_div - 8)}|\n"
# NOTE: divs
meta += f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n"
# NOTE: rows
@@ -88,15 +88,6 @@ def main() -> int:
)
meta += "\n"
- # NOTE: adding notes
- meta += """\
-> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
-> install dependencies to run all models. If one wishes to use any of the
-> aforementioned models, make sure to install the optional dependencies
-> mentioned above.
-
-"""
-
readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]
with open(os.path.join(ROOT, "README.md"), "w") as f: