Mirror of https://github.com/bentoml/OpenLLM.git, synced 2026-01-22 14:31:26 -05:00
.gitignore
@@ -130,3 +130,6 @@ dmypy.json
bazel-*

package-lock.json

# PyCharm config
.idea

@@ -9,11 +9,14 @@ out to us if you have any question!

## Table of Contents

- [Setting Up Your Development Environment](#setting-up-your-development-environment)
- [Project Structure](#project-structure)
- [Development Workflow](#development-workflow)
- [Writing Tests](#writing-tests)
- [Releasing a New Version](#releasing-a-new-version)
- [Developer Guide](#developer-guide)
  - [Table of Contents](#table-of-contents)
  - [Setting Up Your Development Environment](#setting-up-your-development-environment)
  - [Project Structure](#project-structure)
  - [Development Workflow](#development-workflow)
  - [Using a custom fork](#using-a-custom-fork)
  - [Writing Tests](#writing-tests)
  - [Releasing a New Version](#releasing-a-new-version)

## Setting Up Your Development Environment

@@ -121,6 +124,12 @@ After setting up your environment, here's how you can start contributing:

8. Submit a Pull Request on GitHub.

## Using a custom fork

If you wish to use a modified version of OpenLLM, install your fork from source
with `pip install -e` and set `OPENLLM_DEV_BUILD=True`, so that Bentos built will
include the generated wheels for OpenLLM in the bundle.

## Writing Tests

Good tests are crucial for the stability of our codebase. Always write tests for

README.md
@@ -1,42 +1,41 @@
<div align="center">
<h1 align="center">OpenLLM</h1>
<h1 align="center">🦾 OpenLLM</h1>
<a href="https://pypi.org/project/openllm">
<img src="https://img.shields.io/pypi/v/openllm.svg" alt="pypi_status" />
</a><a href="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml">
<img src="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml/badge.svg?branch=main" alt="ci" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
<img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><a href="https://twitter.com/bentomlai">
<img src="https://badgen.net/badge/icon/@bentomlai/1DA1F2?icon=twitter&label=Follow%20Us" alt="Twitter" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
<img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><br>
<strong>Build, fine-tune, serve, and deploy Large-Language Models including popular ones like StableLM, Llama, Dolly, Flan-T5, Vicuna, or even your custom LLMs.<br></strong>
<i>Powered by BentoML 🍱</i>
<p>An open platform for operating large language models(LLMs) in production.</br>
Fine-tune, serve, deploy, and monitor any LLMs with ease.</p>
<i></i>
</div>

<br/>

## 📖 Introduction

With OpenLLM, you can easily run inference with any open-source large-language
models(LLMs) and build production-ready LLM apps, powered by BentoML. Here are
some key features:
With OpenLLM, you can run inference with any open-source large-language models(LLMs),
deploy to the cloud or on-premises, and build powerful AI apps.

🚂 **SOTA LLMs**: With a single click, access support for state-of-the-art LLMs,
including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.
🚂 **SOTA LLMs**: built-in supports a wide range of open-source LLMs and model runtime,
including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder and more.

🔥 **Easy-to-use APIs**: We provide intuitive interfaces by integrating with
popular tools like BentoML, HuggingFace, LangChain, and more.
🔥 **Flexible APIs**: serve LLMs over RESTful API or gRPC with one command, query
via WebUI, CLI, our Python/Javascript client, or any HTTP client.

📦 **Fine-tuning your own LLM**: Customize any LLM to suit your needs with
`LLM.tuning()`. (Work In Progress)
⛓️ **Freedom To Build**: First-class support for LangChain and BentoML allows you to
easily create your own AI apps by composing LLMs with other models and services.

⛓️ **Interoperability**: First-class support for LangChain and BentoML’s runner
architecture, allows easy chaining of LLMs on multiple GPUs/Nodes. (Work In
Progress)
🎯 **Streamline Deployment**: build your LLM server Docker Images or deploy as
serverless endpoint via [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).

🤖️ **Bring your own LLM**: Fine-tune any LLM to suit your needs with
`LLM.tuning()`. (Coming soon)

🎯 **Streamline Production Deployment**: Seamlessly package into a Bento with
`openllm build`, containerized into OCI Images, and deploy with a single click
using [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).

## 🏃 Getting Started

@@ -53,12 +52,8 @@ pip install openllm
To verify if it's installed correctly, run:

```
openllm -h
```
$ openllm -h

The correct output will be:

```
Usage: openllm [OPTIONS] COMMAND [ARGS]...

██████╗ ██████╗ ███████╗███╗ ██╗██╗ ██╗ ███╗ ███╗
@@ -68,11 +63,8 @@ Usage: openllm [OPTIONS] COMMAND [ARGS]...
╚██████╔╝██║ ███████╗██║ ╚████║███████╗███████╗██║ ╚═╝ ██║
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝╚═╝ ╚═╝

OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model

- StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more

- Powered by BentoML 🍱
An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
```

### Starting an LLM Server
@@ -84,8 +76,8 @@ server:
openllm start dolly-v2
```

Following this, a swagger UI will be accessible at http://0.0.0.0:3000 where you
can experiment with the endpoints and sample prompts.
Following this, a Web UI will be accessible at http://0.0.0.0:3000 where you
can experiment with the endpoints and sample input prompts.

OpenLLM provides a built-in Python client, allowing you to interact with the
model. In a different terminal window or a Jupyter notebook, create a client to
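The client snippet itself falls outside this hunk; a minimal sketch of what such a call can look like is shown below, where the `openllm.client.HTTPClient` name and the `query` signature are assumptions about the built-in Python client rather than text taken from this diff.

```python
# Sketch only: querying the server started with `openllm start dolly-v2`.
# HTTPClient and query() are assumed names for the built-in Python client,
# not copied from this diff; check the OpenLLM docs for the exact API.
import openllm

client = openllm.client.HTTPClient("http://localhost:3000")  # same endpoint as the Web UI
print(client.query("Explain to me the difference between 'further' and 'farther'"))
```
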
@@ -101,62 +93,33 @@ You can also use the `openllm query` command to query the model from the
terminal:

```bash
openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'
export OPENLLM_ENDPOINT=http://localhost:3000
openllm query 'Explain to me the difference between "further" and "farther"'
```

## 🚀 Deploying to Production
Visit `http://0.0.0.0:3000/docs.json` for OpenLLM's API specification.

To deploy your LLMs into production:

1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
specific model, like `dolly-v2`, using the `build` command.:
## 🧩 Supported Models

```bash
openllm build dolly-v2
```

A
[Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
in BentoML, is the unit of distribution. It packages your program's source
code, models, files, artifacts, and dependencies.

> _NOTE_: If you wish to build OpenLLM from the git source, set
> `OPENLLM_DEV_BUILD=True` to include the generated wheels in the bundle.

2. **Containerize your Bento**

```
bentoml containerize <name:version>
```

BentoML offers a comprehensive set of options for deploying and hosting
online ML services in production. To learn more, check out the
[Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
guide.

## 🧩 Models and Dependencies

OpenLLM currently supports the following:
The following models are currently supported in OpenLLM. By default, OpenLLM doesn't
include dependencies to run all models. The extra model-specific dependencies can be
installed with the instructions below:

<!-- update-readme.py: start -->

| Model | CPU | GPU | Optional |
| --------------------------------------------------------------------- | --- | --- | -------------------------------- |
| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅ | ✅ | `pip install openllm[flan-t5]` |
| [dolly-v2](https://github.com/databrickslabs/dolly) | ✅ | ✅ | 👾 (not needed) |
| [chatglm](https://github.com/THUDM/ChatGLM-6B) | ❌ | ✅ | `pip install openllm[chatglm]` |
| [starcoder](https://github.com/bigcode-project/starcoder) | ❌ | ✅ | `pip install openllm[starcoder]` |
| [falcon](https://falconllm.tii.ae/) | ❌ | ✅ | `pip install openllm[falcon]` |
| [stablelm](https://github.com/Stability-AI/StableLM) | ❌ | ✅ | 👾 (not needed) |

> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
> install dependencies to run all models. If one wishes to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.
| Model | CPU | GPU | Installation |
| --------------------------------------------------------------------- | --- | --- | ---------------------------------- |
| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅ | ✅ | `pip install "openllm[flan-t5]"` |
| [dolly-v2](https://github.com/databrickslabs/dolly) | ✅ | ✅ | `pip install openllm` |
| [chatglm](https://github.com/THUDM/ChatGLM-6B) | ❌ | ✅ | `pip install "openllm[chatglm]"` |
| [starcoder](https://github.com/bigcode-project/starcoder) | ❌ | ✅ | `pip install "openllm[starcoder]"` |
| [falcon](https://falconllm.tii.ae/) | ❌ | ✅ | `pip install "openllm[falcon]"` |
| [stablelm](https://github.com/Stability-AI/StableLM) | ❌ | ✅ | `pip install openllm` |

<!-- update-readme.py: stop -->

### Runtime Implementations
### Runtime Implementations (Experimental)

Different LLMs may have multiple runtime implementations. For instance, they
might use Pytorch (`pt`), Tensorflow (`tf`), or Flax (`flax`).
@@ -175,8 +138,7 @@ OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
### Integrating a New Model

OpenLLM encourages contributions by welcoming users to incorporate their custom
LLMs into the ecosystem. Checkout
[Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
LLMs into the ecosystem. Check out [Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
to see how you can do it yourself.

## ⚙️ Integrations
@@ -190,7 +152,7 @@ easily integrate with other powerful tools. We currently offer integration with

OpenLLM models can be integrated as a
[Runner](https://docs.bentoml.org/en/latest/concepts/runner.html) in your
BentoML service. These runners has a `generate` method that takes a string as a
BentoML service. These runners have a `generate` method that takes a string as a
prompt and returns a corresponding output string. This will allow you to plug
and play any OpenLLM models with your existing ML workflow.

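To make the runner wiring described above concrete, here is a minimal sketch of a BentoML service using an OpenLLM runner; the `openllm.Runner` factory, the service name, and the `generate.async_run` call are assumptions based on this README's description of runners, not code taken from the diff.

```python
# Sketch only: plugging an OpenLLM runner into a BentoML service.
# The openllm.Runner factory and generate.async_run call are assumed from the
# README's description of runners; consult the OpenLLM docs for the exact API.
import bentoml
import openllm

llm_runner = openllm.Runner("dolly-v2")  # any supported model name

svc = bentoml.Service(name="llm-service", runners=[llm_runner])


@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def prompt(text: str) -> str:
    # Runners expose a `generate` method: prompt string in, completion string out.
    return await llm_runner.generate.async_run(text)
```
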
@@ -233,6 +195,34 @@ llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
llm("What is the difference between a duck and a goose?")
```

## 🚀 Deploying to Production

To deploy your LLMs into production:

1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
specific model, like `dolly-v2`, using the `build` command.:

```bash
openllm build dolly-v2
```

A [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
in BentoML, is the unit of distribution. It packages your program's source
code, models, files, artifacts, and dependencies.

2. **Containerize your Bento**

```
bentoml containerize <name:version>
```

BentoML offers a comprehensive set of options for deploying and hosting
online ML services in production. To learn more, check out the
[Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
guide.

## 🍇 Telemetry

OpenLLM collects usage data to enhance user experience and improve the product.

@@ -15,10 +15,13 @@
OpenLLM
=======

OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
An open platform for operating large language models in production. Fine-tune, serve,
deploy, and monitor any LLMs with ease.

- StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
- Powered by BentoML 🍱 + HuggingFace 🤗
* Built-in support for StableLM, Llama, Dolly, Flan-T5, Vicuna
* Option to bring your own fine-tuned LLMs
* Online Serving with HTTP, gRPC, SSE(coming soon) or custom API
* Native integration with BentoML and LangChain for custom LLM apps
"""
from __future__ import annotations

@@ -517,12 +517,8 @@ def cli_factory() -> click.Group:
╚═════╝ ╚═╝ ╚══════╝╚═╝ ╚═══╝╚══════╝╚══════╝╚═╝ ╚═╝

\b
OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model

- StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more

\b
- Powered by BentoML 🍱
An open platform for operating large language models in production.
Fine-tune, serve, deploy, and monitor any LLMs with ease.
"""

@cli.group(cls=OpenLLMCommandGroup, context_settings=_CONTEXT_SETTINGS)
@@ -588,7 +584,7 @@ def cli_factory() -> click.Group:
_echo(bento.tag)
return bento

@cli.command(aliases=["list"])
@cli.command()
@output_option
def models(output: OutputLiteral):
"""List all supported models."""
@@ -649,7 +645,7 @@ def cli_factory() -> click.Group:

sys.exit(0)

@cli.command(aliases=["save"])
@cli.command()
@click.argument(
"model_name", type=click.Choice([inflection.dasherize(name) for name in openllm.CONFIG_MAPPING.keys()])
)
@@ -729,11 +725,12 @@ def cli_factory() -> click.Group:
model_store.delete(model.tag)
click.echo(f"{model} deleted.")

@cli.command(name="query", aliases=["run", "ask"])
@cli.command(name="query")
@click.option(
"--endpoint",
type=click.STRING,
help="LLM Server endpoint, i.e: http://12.323.2.1",
help="OpenLLM Server endpoint, i.e: http://0.0.0.0:3000",
envvar="OPENLLM_ENDPOINT",
default="http://0.0.0.0:3000",
)
@click.option("--timeout", type=click.INT, default=30, help="Default server timeout", show_default=True)

@@ -38,11 +38,11 @@ def main() -> int:
readme = f.readlines()

start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
formatted: dict[t.Literal["Model", "CPU", "GPU", "Installation"], list[str]] = {
"Model": [],
"CPU": [],
"GPU": [],
"Optional": [],
"Installation": [],
}
max_name_len_div = 0
max_install_len_div = 0
@@ -55,19 +55,19 @@ def main() -> int:
formatted["Model"].append(model_name)
formatted["GPU"].append("✅")
formatted["CPU"].append("✅" if not config.__openllm_requires_gpu__ else "❌")
instruction = "👾 (not needed)"
instruction = "`pip install openllm`"
if dashed in deps:
instruction = f"`pip install openllm[{dashed}]`"
instruction = f"""`pip install "openllm[{dashed}]"`"""
else:
does_not_need_custom_installation.append(model_name)
if len(instruction) > max_install_len_div:
max_install_len_div = len(instruction)
formatted["Optional"].append(instruction)
formatted["Installation"].append(instruction)

meta = ["\n"]

# NOTE: headers
meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n"
meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Installation {' ' * (max_install_len_div - 8)}|\n"
# NOTE: divs
meta += f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n"
# NOTE: rows
@@ -88,15 +88,6 @@ def main() -> int:
)
meta += "\n"

# NOTE: adding notes
meta += """\
> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
> install dependencies to run all models. If one wishes to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.

"""

readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]

with open(os.path.join(ROOT, "README.md"), "w") as f: