diff --git a/.gitignore b/.gitignore
index 73883c09..688cea65 100644
--- a/.gitignore
+++ b/.gitignore
@@ -130,3 +130,6 @@ dmypy.json
 
 bazel-*
 package-lock.json
+
+# PyCharm config
+.idea
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 49b88a51..cec32e84 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -9,11 +9,14 @@ out to us if you have any question!
 
 ## Table of Contents
 
-- [Setting Up Your Development Environment](#setting-up-your-development-environment)
-- [Project Structure](#project-structure)
-- [Development Workflow](#development-workflow)
-- [Writing Tests](#writing-tests)
-- [Releasing a New Version](#releasing-a-new-version)
+- [Developer Guide](#developer-guide)
+  - [Table of Contents](#table-of-contents)
+  - [Setting Up Your Development Environment](#setting-up-your-development-environment)
+  - [Project Structure](#project-structure)
+  - [Development Workflow](#development-workflow)
+  - [Using a custom fork](#using-a-custom-fork)
+  - [Writing Tests](#writing-tests)
+  - [Releasing a New Version](#releasing-a-new-version)
 
 ## Setting Up Your Development Environment
@@ -121,6 +124,12 @@ After setting up your environment, here's how you can start contributing:
 
 8. Submit a Pull Request on GitHub.
 
+## Using a custom fork
+
+If you wish to use a modified version of OpenLLM, install your fork from source
+with `pip install -e` and set `OPENLLM_DEV_BUILD=True`, so that Bentos built will
+include the generated wheels for OpenLLM in the bundle.
+
 ## Writing Tests
 
 Good tests are crucial for the stability of our codebase. Always write tests for
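The new "Using a custom fork" section above describes installing a fork from source with `pip install -e` and setting `OPENLLM_DEV_BUILD=True` before building a Bento. A minimal sketch of that flow, assuming an editable install of a local checkout; the model name and the use of `subprocess` here are illustrative and not part of the diff:

```python
# Hypothetical sketch of the custom-fork build flow described in the DEVELOPMENT.md hunk above.
# Assumes the fork has already been installed from a local checkout with `pip install -e .`.
import os
import subprocess

# OPENLLM_DEV_BUILD=True asks `openllm build` to bundle wheels built from the local source.
os.environ["OPENLLM_DEV_BUILD"] = "True"
subprocess.run(["openllm", "build", "dolly-v2"], check=True)
```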
diff --git a/README.md b/README.md
index a97c9e44..a3ebc64b 100644
--- a/README.md
+++ b/README.md
@@ -1,42 +1,41 @@
 [Centered HTML banner: badge markup (pypi_status, ci, Discord, Twitter) omitted here; this hunk updates the Discord badge and drops the Twitter badge.]
-    OpenLLM
-    Build, fine-tune, serve, and deploy Large-Language Models including popular ones
-    like StableLM, Llama, Dolly, Flan-T5, Vicuna, or even your custom LLMs.
-    Powered by BentoML 🍱
+    🦾 OpenLLM
+    An open platform for operating large language models (LLMs) in production.
+    Fine-tune, serve, deploy, and monitor any LLMs with ease.
 
 ## πŸ“– Introduction
 
-With OpenLLM, you can easily run inference with any open-source large-language
-models(LLMs) and build production-ready LLM apps, powered by BentoML. Here are
-some key features:
+With OpenLLM, you can run inference with any open-source large-language models (LLMs),
+deploy to the cloud or on-premises, and build powerful AI apps.
 
-πŸš‚ **SOTA LLMs**: With a single click, access support for state-of-the-art LLMs,
-including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.
+πŸš‚ **SOTA LLMs**: built-in support for a wide range of open-source LLMs and model runtimes,
+including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more.
 
-πŸ”₯ **Easy-to-use APIs**: We provide intuitive interfaces by integrating with
-popular tools like BentoML, HuggingFace, LangChain, and more.
+πŸ”₯ **Flexible APIs**: serve LLMs over a RESTful API or gRPC with one command, and query
+via the Web UI, CLI, our Python/JavaScript client, or any HTTP client.
 
-πŸ“¦ **Fine-tuning your own LLM**: Customize any LLM to suit your needs with
-`LLM.tuning()`. (Work In Progress)
+⛓️ **Freedom To Build**: First-class support for LangChain and BentoML allows you to
+easily create your own AI apps by composing LLMs with other models and services.
 
-⛓️ **Interoperability**: First-class support for LangChain and BentoML’s runner
-architecture, allows easy chaining of LLMs on multiple GPUs/Nodes. (Work In
-Progress)
+🎯 **Streamlined Deployment**: build Docker images for your LLM server or deploy as a
+serverless endpoint via [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
+
+πŸ€–οΈ **Bring your own LLM**: Fine-tune any LLM to suit your needs with
+`LLM.tuning()`. (Coming soon)
 
-🎯 **Streamline Production Deployment**: Seamlessly package into a Bento with
-`openllm build`, containerized into OCI Images, and deploy with a single click
-using [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
 
 ## πŸƒβ€ Getting Started
 
@@ -53,12 +52,8 @@ pip install openllm
 
 To verify if it's installed correctly, run:
 
 ```
-openllm -h
-```
+$ openllm -h
 
-The correct output will be:
-
-```
 Usage: openllm [OPTIONS] COMMAND [ARGS]...
 
 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•—β–ˆβ–ˆβ•— β–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ•—
@@ -68,11 +63,8 @@ Usage: openllm [OPTIONS] COMMAND [ARGS]...
 β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘  β•šβ•β• β–ˆβ–ˆβ•‘
  β•šβ•β•β•β•β•β• β•šβ•β•     β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•     β•šβ•β•
 
-    OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
-
-    - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
-
-    - Powered by BentoML 🍱
+    An open platform for operating large language models in production.
+    Fine-tune, serve, deploy, and monitor any LLMs with ease.
 ```
 
 ### Starting an LLM Server
 
@@ -84,8 +76,8 @@ server:
 openllm start dolly-v2
 ```
 
-Following this, a swagger UI will be accessible at http://0.0.0.0:3000 where you
-can experiment with the endpoints and sample prompts.
+Following this, a Web UI will be accessible at http://0.0.0.0:3000 where you
+can experiment with the endpoints and sample input prompts.
 
 OpenLLM provides a built-in Python client, allowing you to interact with the model.
 In a different terminal window or a Jupyter notebook, create a client to
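(The Python client snippet the README shows at this point falls outside this hunk's context lines. As a hedged illustration only, a client interaction might look like the sketch below; the `HTTPClient` class and `query` method names are assumptions, not confirmed by this diff.)

```python
# Hypothetical sketch of the built-in Python client mentioned above; exact names may differ.
import openllm

client = openllm.client.HTTPClient("http://localhost:3000")
print(client.query('Explain to me the difference between "further" and "farther"'))
```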
@@ -101,62 +93,33 @@
 You can also use the `openllm query` command to query the model from the
 terminal:
 
 ```bash
-openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'
+export OPENLLM_ENDPOINT=http://localhost:3000
+openllm query 'Explain to me the difference between "further" and "farther"'
 ```
 
-## πŸš€ Deploying to Production
+Visit `http://0.0.0.0:3000/docs.json` for OpenLLM's API specification.
 
-To deploy your LLMs into production:
-
-1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
-   specific model, like `dolly-v2`, using the `build` command.:
+## 🧩 Supported Models
 
-   ```bash
-   openllm build dolly-v2
-   ```
-
-   A
-   [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
-   in BentoML, is the unit of distribution. It packages your program's source
-   code, models, files, artifacts, and dependencies.
-
-   > _NOTE_: If you wish to build OpenLLM from the git source, set
-   > `OPENLLM_DEV_BUILD=True` to include the generated wheels in the bundle.
-
-2. **Containerize your Bento**
-
-   ```
-   bentoml containerize
-   ```
-
-   BentoML offers a comprehensive set of options for deploying and hosting
-   online ML services in production. To learn more, check out the
-   [Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
-   guide.
-
-## 🧩 Models and Dependencies
-
-OpenLLM currently supports the following:
+The following models are currently supported in OpenLLM. By default, OpenLLM doesn't
+include dependencies to run all models. The extra model-specific dependencies can be
+installed with the instructions below:
 
-| Model                                                                  | CPU | GPU | Optional                         |
-| ---------------------------------------------------------------------- | --- | --- | -------------------------------- |
-| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5)  | βœ…  | βœ…  | `pip install openllm[flan-t5]`   |
-| [dolly-v2](https://github.com/databrickslabs/dolly)                    | βœ…  | βœ…  | πŸ‘Ύ (not needed)                  |
-| [chatglm](https://github.com/THUDM/ChatGLM-6B)                         | ❌  | βœ…  | `pip install openllm[chatglm]`   |
-| [starcoder](https://github.com/bigcode-project/starcoder)              | ❌  | βœ…  | `pip install openllm[starcoder]` |
-| [falcon](https://falconllm.tii.ae/)                                    | ❌  | βœ…  | `pip install openllm[falcon]`    |
-| [stablelm](https://github.com/Stability-AI/StableLM)                   | ❌  | βœ…  | πŸ‘Ύ (not needed)                  |
 
-> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
-> install dependencies to run all models. If one wishes to use any of the
-> aforementioned models, make sure to install the optional dependencies
-> mentioned above.
+| Model                                                                  | CPU | GPU | Installation                       |
+| ---------------------------------------------------------------------- | --- | --- | ---------------------------------- |
+| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5)  | βœ…  | βœ…  | `pip install "openllm[flan-t5]"`   |
+| [dolly-v2](https://github.com/databrickslabs/dolly)                    | βœ…  | βœ…  | `pip install openllm`              |
+| [chatglm](https://github.com/THUDM/ChatGLM-6B)                         | ❌  | βœ…  | `pip install "openllm[chatglm]"`   |
+| [starcoder](https://github.com/bigcode-project/starcoder)              | ❌  | βœ…  | `pip install "openllm[starcoder]"` |
+| [falcon](https://falconllm.tii.ae/)                                     | ❌  | βœ…  | `pip install "openllm[falcon]"`    |
+| [stablelm](https://github.com/Stability-AI/StableLM)                    | ❌  | βœ…  | `pip install openllm`              |
 
-### Runtime Implementations
+### Runtime Implementations (Experimental)
 
 Different LLMs may have multiple runtime implementations. For instance, they might
 use Pytorch (`pt`), Tensorflow (`tf`), or Flax (`flax`).
 
@@ -175,8 +138,7 @@ OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
 
 ### Integrating a New Model
 
 OpenLLM encourages contributions by welcoming users to incorporate their custom
-LLMs into the ecosystem. Checkout
-[Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
+LLMs into the ecosystem. Check out [Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
 to see how you can do it yourself.
 
 ## βš™οΈ Integrations
 
@@ -190,7 +152,7 @@ easily integrate with other powerful tools. We currently offer integration with
 
 OpenLLM models can be integrated as a
 [Runner](https://docs.bentoml.org/en/latest/concepts/runner.html) in your
-BentoML service. These runners has a `generate` method that takes a string as a
+BentoML service. These runners have a `generate` method that takes a string as a
 prompt and returns a corresponding output string. This will allow you to plug
 and play any OpenLLM models with your existing ML workflow.
 
@@ -233,6 +195,34 @@ llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
 llm("What is the difference between a duck and a goose?")
 ```
 
+## πŸš€ Deploying to Production
+
+To deploy your LLMs into production:
+
+1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
+   specific model, like `dolly-v2`, using the `build` command:
+
+   ```bash
+   openllm build dolly-v2
+   ```
+
+   A [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
+   in BentoML, is the unit of distribution. It packages your program's source
+   code, models, files, artifacts, and dependencies.
+
+
+2. **Containerize your Bento**
+
+   ```
+   bentoml containerize
+   ```
+
+   BentoML offers a comprehensive set of options for deploying and hosting
+   online ML services in production. To learn more, check out the
+   [Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
+   guide.
+
+
 ## πŸ‡ Telemetry
 
 OpenLLM collects usage data to enhance user experience and improve the product.
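The Integrations section in the README hunk above describes using OpenLLM models as BentoML runners that expose a `generate` method; the service code itself sits outside this diff's context lines. A hedged sketch of what such a service might look like, where the `openllm.Runner(...)` constructor and the async runner call are assumptions rather than APIs confirmed by this diff:

```python
# Hypothetical BentoML service wiring an OpenLLM runner, per the README text above.
import bentoml
import openllm
from bentoml.io import Text

llm_runner = openllm.Runner("dolly-v2")  # assumed constructor, not taken from this diff

svc = bentoml.Service("llm-dolly-service", runners=[llm_runner])

@svc.api(input=Text(), output=Text())
async def prompt(text: str) -> str:
    # The README above states that `generate` takes a prompt string and returns an output string.
    return await llm_runner.generate.async_run(text)
```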
diff --git a/src/openllm/__init__.py b/src/openllm/__init__.py
index 364ba2c6..39d49f5f 100644
--- a/src/openllm/__init__.py
+++ b/src/openllm/__init__.py
@@ -15,10 +15,13 @@
 OpenLLM
 =======
 
-OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
+An open platform for operating large language models in production. Fine-tune, serve,
+deploy, and monitor any LLMs with ease.
 
-- StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
-- Powered by BentoML 🍱 + HuggingFace πŸ€—
+* Built-in support for StableLM, Llama, Dolly, Flan-T5, Vicuna
+* Option to bring your own fine-tuned LLMs
+* Online Serving with HTTP, gRPC, SSE (coming soon) or custom API
+* Native integration with BentoML and LangChain for custom LLM apps
 """
 from __future__ import annotations
diff --git a/src/openllm/cli.py b/src/openllm/cli.py
index 0fddbf94..c2bd6b27 100644
--- a/src/openllm/cli.py
+++ b/src/openllm/cli.py
@@ -517,12 +517,8 @@ def cli_factory() -> click.Group:
     β•šβ•β•β•β•β•β• β•šβ•β•     β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•     β•šβ•β•
 
     \b
-    OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
-
-    - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
-
-    \b
-    - Powered by BentoML 🍱
+    An open platform for operating large language models in production.
+    Fine-tune, serve, deploy, and monitor any LLMs with ease.
     """
 
     @cli.group(cls=OpenLLMCommandGroup, context_settings=_CONTEXT_SETTINGS)
@@ -588,7 +584,7 @@ def cli_factory() -> click.Group:
         _echo(bento.tag)
         return bento
 
-    @cli.command(aliases=["list"])
+    @cli.command()
     @output_option
     def models(output: OutputLiteral):
         """List all supported models."""
@@ -649,7 +645,7 @@ def cli_factory() -> click.Group:
 
         sys.exit(0)
 
-    @cli.command(aliases=["save"])
+    @cli.command()
     @click.argument(
         "model_name", type=click.Choice([inflection.dasherize(name) for name in openllm.CONFIG_MAPPING.keys()])
     )
@@ -729,11 +725,12 @@ def cli_factory() -> click.Group:
             model_store.delete(model.tag)
             click.echo(f"{model} deleted.")
 
-    @cli.command(name="query", aliases=["run", "ask"])
+    @cli.command(name="query")
     @click.option(
         "--endpoint",
         type=click.STRING,
-        help="LLM Server endpoint, i.e: http://12.323.2.1",
+        help="OpenLLM Server endpoint, i.e: http://0.0.0.0:3000",
+        envvar="OPENLLM_ENDPOINT",
         default="http://0.0.0.0:3000",
     )
     @click.option("--timeout", type=click.INT, default=30, help="Default server timeout", show_default=True)
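The `cli.py` hunk above adds `envvar="OPENLLM_ENDPOINT"` to the `--endpoint` option, which is what lets `openllm query` fall back to the environment variable shown in the README changes. A minimal, standalone illustration of that click mechanism (this is not OpenLLM's actual CLI code):

```python
# Standalone demo of click's envvar fallback, the mechanism used by --endpoint above.
import click

@click.command()
@click.option(
    "--endpoint",
    type=click.STRING,
    envvar="OPENLLM_ENDPOINT",  # read from the environment when the flag is omitted
    default="http://0.0.0.0:3000",
    show_default=True,
    help="Server endpoint",
)
def query(endpoint: str) -> None:
    click.echo(f"Sending prompt to {endpoint}")

if __name__ == "__main__":
    query()
```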
diff --git a/tools/update-readme.py b/tools/update-readme.py
index 2adbc87d..a9d48348 100755
--- a/tools/update-readme.py
+++ b/tools/update-readme.py
@@ -38,11 +38,11 @@ def main() -> int:
         readme = f.readlines()
 
     start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
-    formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
+    formatted: dict[t.Literal["Model", "CPU", "GPU", "Installation"], list[str]] = {
         "Model": [],
         "CPU": [],
         "GPU": [],
-        "Optional": [],
+        "Installation": [],
     }
     max_name_len_div = 0
     max_install_len_div = 0
@@ -55,19 +55,19 @@ def main() -> int:
         formatted["Model"].append(model_name)
         formatted["GPU"].append("βœ…")
         formatted["CPU"].append("βœ…" if not config.__openllm_requires_gpu__ else "❌")
-        instruction = "πŸ‘Ύ (not needed)"
+        instruction = "`pip install openllm`"
         if dashed in deps:
-            instruction = f"`pip install openllm[{dashed}]`"
+            instruction = f"""`pip install "openllm[{dashed}]"`"""
         else:
            does_not_need_custom_installation.append(model_name)
        if len(instruction) > max_install_len_div:
            max_install_len_div = len(instruction)
-        formatted["Optional"].append(instruction)
+        formatted["Installation"].append(instruction)
 
     meta = ["\n"]
 
     # NOTE: headers
-    meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n"
+    meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Installation {' ' * (max_install_len_div - 8)}|\n"
    # NOTE: divs
    meta += f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n"
    # NOTE: rows
@@ -88,15 +88,6 @@ def main() -> int:
     )
     meta += "\n"
 
-    # NOTE: adding notes
-    meta += """\
-> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
-> install dependencies to run all models. If one wishes to use any of the
-> aforementioned models, make sure to install the optional dependencies
-> mentioned above.
-
-"""
-
     readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]
 
     with open(os.path.join(ROOT, "README.md"), "w") as f: