diff --git a/.gitignore b/.gitignore
index 73883c09..688cea65 100644
--- a/.gitignore
+++ b/.gitignore
@@ -130,3 +130,6 @@ dmypy.json
 
 bazel-*
 package-lock.json
+
+# PyCharm config
+.idea
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 49b88a51..cec32e84 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -9,11 +9,14 @@ out to us if you have any question!
 
 ## Table of Contents
 
-- [Setting Up Your Development Environment](#setting-up-your-development-environment)
-- [Project Structure](#project-structure)
-- [Development Workflow](#development-workflow)
-- [Writing Tests](#writing-tests)
-- [Releasing a New Version](#releasing-a-new-version)
+- [Developer Guide](#developer-guide)
+  - [Table of Contents](#table-of-contents)
+  - [Setting Up Your Development Environment](#setting-up-your-development-environment)
+  - [Project Structure](#project-structure)
+  - [Development Workflow](#development-workflow)
+  - [Using a custom fork](#using-a-custom-fork)
+  - [Writing Tests](#writing-tests)
+  - [Releasing a New Version](#releasing-a-new-version)
 
 ## Setting Up Your Development Environment
@@ -121,6 +124,12 @@ After setting up your environment, here's how you can start contributing:
 
 8. Submit a Pull Request on GitHub.
 
+## Using a custom fork
+
+If you wish to use a modified version of OpenLLM, install your fork from source
+with `pip install -e` and set `OPENLLM_DEV_BUILD=True`, so that Bentos built will
+include the generated wheels for OpenLLM in the bundle.
+
 ## Writing Tests
 
 Good tests are crucial for the stability of our codebase. Always write tests for
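The new "Using a custom fork" section above describes installing a fork from source with `pip install -e` and setting `OPENLLM_DEV_BUILD=True` before building a Bento. A minimal sketch of that flow, assuming an editable install of a local checkout; the model name and the use of `subprocess` here are illustrative and not part of the diff:

```python
# Hypothetical sketch of the custom-fork build flow described in the DEVELOPMENT.md hunk above.
# Assumes the fork has already been installed from a local checkout with `pip install -e .`.
import os
import subprocess

# OPENLLM_DEV_BUILD=True asks `openllm build` to bundle wheels built from the local source.
os.environ["OPENLLM_DEV_BUILD"] = "True"
subprocess.run(["openllm", "build", "dolly-v2"], check=True)
```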
diff --git a/README.md b/README.md
index a97c9e44..a3ebc64b 100644
--- a/README.md
+++ b/README.md
@@ -1,42 +1,41 @@
 [Centered HTML banner: badge markup (pypi_status, ci, Discord, Twitter) omitted here; this hunk updates the Discord badge and drops the Twitter badge.]
-    OpenLLM
-    Build, fine-tune, serve, and deploy Large-Language Models including popular ones
-    like StableLM, Llama, Dolly, Flan-T5, Vicuna, or even your custom LLMs.
-    Powered by BentoML 🍱
+    🦾 OpenLLM
+    An open platform for operating large language models (LLMs) in production.
+    Fine-tune, serve, deploy, and monitor any LLMs with ease.
 
 ## πŸ“– Introduction
 
-With OpenLLM, you can easily run inference with any open-source large-language
-models(LLMs) and build production-ready LLM apps, powered by BentoML. Here are
-some key features:
+With OpenLLM, you can run inference with any open-source large-language models (LLMs),
+deploy to the cloud or on-premises, and build powerful AI apps.
 
-πŸš‚ **SOTA LLMs**: With a single click, access support for state-of-the-art LLMs,
-including StableLM, Llama, Alpaca, Dolly, Flan-T5, ChatGLM, Falcon, and more.
+πŸš‚ **SOTA LLMs**: built-in support for a wide range of open-source LLMs and model runtimes,
+including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more.
 
-πŸ”₯ **Easy-to-use APIs**: We provide intuitive interfaces by integrating with
-popular tools like BentoML, HuggingFace, LangChain, and more.
+πŸ”₯ **Flexible APIs**: serve LLMs over a RESTful API or gRPC with one command, and query
+via the Web UI, CLI, our Python/JavaScript client, or any HTTP client.
 
-πŸ“¦ **Fine-tuning your own LLM**: Customize any LLM to suit your needs with
-`LLM.tuning()`. (Work In Progress)
+⛓️ **Freedom To Build**: First-class support for LangChain and BentoML allows you to
+easily create your own AI apps by composing LLMs with other models and services.
 
-⛓️ **Interoperability**: First-class support for LangChain and BentoML’s runner
-architecture, allows easy chaining of LLMs on multiple GPUs/Nodes. (Work In
-Progress)
+🎯 **Streamlined Deployment**: build Docker images for your LLM server or deploy as a
+serverless endpoint via [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
+
+πŸ€–οΈ **Bring your own LLM**: Fine-tune any LLM to suit your needs with
+`LLM.tuning()`. (Coming soon)
 
-🎯 **Streamline Production Deployment**: Seamlessly package into a Bento with
-`openllm build`, containerized into OCI Images, and deploy with a single click
-using [☁️ BentoCloud](https://l.bentoml.com/bento-cloud).
 
 ## πŸƒβ€ Getting Started
 
@@ -53,12 +52,8 @@ pip install openllm
 
 To verify if it's installed correctly, run:
 
 ```
-openllm -h
-```
+$ openllm -h
 
-The correct output will be:
-
-```
 Usage: openllm [OPTIONS] COMMAND [ARGS]...
 
 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•—β–ˆβ–ˆβ•— β–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ•—
@@ -68,11 +63,8 @@ Usage: openllm [OPTIONS] COMMAND [ARGS]...
 β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘     β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘  β•šβ•β• β–ˆβ–ˆβ•‘
  β•šβ•β•β•β•β•β• β•šβ•β•     β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•     β•šβ•β•
 
-    OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
-
-    - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
-
-    - Powered by BentoML 🍱
+    An open platform for operating large language models in production.
+    Fine-tune, serve, deploy, and monitor any LLMs with ease.
 ```
 
 ### Starting an LLM Server
 
@@ -84,8 +76,8 @@ server:
 openllm start dolly-v2
 ```
 
-Following this, a swagger UI will be accessible at http://0.0.0.0:3000 where you
-can experiment with the endpoints and sample prompts.
+Following this, a Web UI will be accessible at http://0.0.0.0:3000 where you
+can experiment with the endpoints and sample input prompts.
 
 OpenLLM provides a built-in Python client, allowing you to interact with the model.
 In a different terminal window or a Jupyter notebook, create a client to
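(The Python client snippet the README shows at this point falls outside this hunk's context lines. As a hedged illustration only, a client interaction might look like the sketch below; the `HTTPClient` class and `query` method names are assumptions, not confirmed by this diff.)

```python
# Hypothetical sketch of the built-in Python client mentioned above; exact names may differ.
import openllm

client = openllm.client.HTTPClient("http://localhost:3000")
print(client.query('Explain to me the difference between "further" and "farther"'))
```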
@@ -101,62 +93,33 @@
 You can also use the `openllm query` command to query the model from the
 terminal:
 
 ```bash
-openllm query --endpoint http://localhost:3000 'Explain to me the difference between "further" and "farther"'
+export OPENLLM_ENDPOINT=http://localhost:3000
+openllm query 'Explain to me the difference between "further" and "farther"'
 ```
 
-## πŸš€ Deploying to Production
+Visit `http://0.0.0.0:3000/docs.json` for OpenLLM's API specification.
 
-To deploy your LLMs into production:
-
-1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
-   specific model, like `dolly-v2`, using the `build` command.:
+## 🧩 Supported Models
 
-   ```bash
-   openllm build dolly-v2
-   ```
-
-   A
-   [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
-   in BentoML, is the unit of distribution. It packages your program's source
-   code, models, files, artifacts, and dependencies.
-
-   > _NOTE_: If you wish to build OpenLLM from the git source, set
-   > `OPENLLM_DEV_BUILD=True` to include the generated wheels in the bundle.
-
-2. **Containerize your Bento**
-
-   ```
-   bentoml containerize
-   ```
-
-   BentoML offers a comprehensive set of options for deploying and hosting
-   online ML services in production. To learn more, check out the
-   [Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
-   guide.
-
-## 🧩 Models and Dependencies
-
-OpenLLM currently supports the following:
+The following models are currently supported in OpenLLM. By default, OpenLLM doesn't
+include dependencies to run all models. The extra model-specific dependencies can be
+installed with the instructions below:
 
-| Model                                                                  | CPU | GPU | Optional                         |
-| ---------------------------------------------------------------------- | --- | --- | -------------------------------- |
-| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5)  | βœ…  | βœ…  | `pip install openllm[flan-t5]`   |
-| [dolly-v2](https://github.com/databrickslabs/dolly)                    | βœ…  | βœ…  | πŸ‘Ύ (not needed)                  |
-| [chatglm](https://github.com/THUDM/ChatGLM-6B)                         | ❌  | βœ…  | `pip install openllm[chatglm]`   |
-| [starcoder](https://github.com/bigcode-project/starcoder)              | ❌  | βœ…  | `pip install openllm[starcoder]` |
-| [falcon](https://falconllm.tii.ae/)                                    | ❌  | βœ…  | `pip install openllm[falcon]`    |
-| [stablelm](https://github.com/Stability-AI/StableLM)                   | ❌  | βœ…  | πŸ‘Ύ (not needed)                  |
 
-> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
-> install dependencies to run all models. If one wishes to use any of the
-> aforementioned models, make sure to install the optional dependencies
-> mentioned above.
+| Model                                                                  | CPU | GPU | Installation                       |
+| ---------------------------------------------------------------------- | --- | --- | ---------------------------------- |
+| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5)  | βœ…  | βœ…  | `pip install "openllm[flan-t5]"`   |
+| [dolly-v2](https://github.com/databrickslabs/dolly)                    | βœ…  | βœ…  | `pip install openllm`              |
+| [chatglm](https://github.com/THUDM/ChatGLM-6B)                         | ❌  | βœ…  | `pip install "openllm[chatglm]"`   |
+| [starcoder](https://github.com/bigcode-project/starcoder)              | ❌  | βœ…  | `pip install "openllm[starcoder]"` |
+| [falcon](https://falconllm.tii.ae/)                                     | ❌  | βœ…  | `pip install "openllm[falcon]"`    |
+| [stablelm](https://github.com/Stability-AI/StableLM)                    | ❌  | βœ…  | `pip install openllm`              |
 
-### Runtime Implementations
+### Runtime Implementations (Experimental)
 
 Different LLMs may have multiple runtime implementations. For instance, they might
 use Pytorch (`pt`), Tensorflow (`tf`), or Flax (`flax`).
 
@@ -175,8 +138,7 @@ OPENLLM_FLAN_T5_FRAMEWORK=tf openllm start flan-t5
 
 ### Integrating a New Model
 
 OpenLLM encourages contributions by welcoming users to incorporate their custom
-LLMs into the ecosystem. Checkout
-[Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
+LLMs into the ecosystem. Check out [Adding a New Model Guide](https://github.com/bentoml/OpenLLM/blob/main/ADDING_NEW_MODEL.md)
 to see how you can do it yourself.
 
 ## βš™οΈ Integrations
 
@@ -190,7 +152,7 @@ easily integrate with other powerful tools. We currently offer integration with
 
 OpenLLM models can be integrated as a
 [Runner](https://docs.bentoml.org/en/latest/concepts/runner.html) in your
-BentoML service. These runners has a `generate` method that takes a string as a
+BentoML service. These runners have a `generate` method that takes a string as a
 prompt and returns a corresponding output string. This will allow you to plug
 and play any OpenLLM models with your existing ML workflow.
 
@@ -233,6 +195,34 @@ llm = OpenLLM.for_model(server_url='http://localhost:8000', server_type='http')
 llm("What is the difference between a duck and a goose?")
 ```
 
+## πŸš€ Deploying to Production
+
+To deploy your LLMs into production:
+
+1. **Building a Bento**: With OpenLLM, you can easily build a Bento for a
+   specific model, like `dolly-v2`, using the `build` command:
+
+   ```bash
+   openllm build dolly-v2
+   ```
+
+   A [Bento](https://docs.bentoml.org/en/latest/concepts/bento.html#what-is-a-bento),
+   in BentoML, is the unit of distribution. It packages your program's source
+   code, models, files, artifacts, and dependencies.
+
+
+2. **Containerize your Bento**
+
+   ```
+   bentoml containerize
+   ```
+
+   BentoML offers a comprehensive set of options for deploying and hosting
+   online ML services in production. To learn more, check out the
+   [Deploying a Bento](https://docs.bentoml.org/en/latest/concepts/deploy.html)
+   guide.
+
+
 ## πŸ‡ Telemetry
 
 OpenLLM collects usage data to enhance user experience and improve the product.
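The Integrations section in the README hunk above describes using OpenLLM models as BentoML runners that expose a `generate` method; the service code itself sits outside this diff's context lines. A hedged sketch of what such a service might look like, where the `openllm.Runner(...)` constructor and the async runner call are assumptions rather than APIs confirmed by this diff:

```python
# Hypothetical BentoML service wiring an OpenLLM runner, per the README text above.
import bentoml
import openllm
from bentoml.io import Text

llm_runner = openllm.Runner("dolly-v2")  # assumed constructor, not taken from this diff

svc = bentoml.Service("llm-dolly-service", runners=[llm_runner])

@svc.api(input=Text(), output=Text())
async def prompt(text: str) -> str:
    # The README above states that `generate` takes a prompt string and returns an output string.
    return await llm_runner.generate.async_run(text)
```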
diff --git a/src/openllm/__init__.py b/src/openllm/__init__.py
index 364ba2c6..39d49f5f 100644
--- a/src/openllm/__init__.py
+++ b/src/openllm/__init__.py
@@ -15,10 +15,13 @@
 OpenLLM
 =======
 
-OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
+An open platform for operating large language models in production. Fine-tune, serve,
+deploy, and monitor any LLMs with ease.
 
-- StableLM, Llama, Alpaca, Dolly, Flan-T5, and more
-- Powered by BentoML 🍱 + HuggingFace πŸ€—
+* Built-in support for StableLM, Llama, Dolly, Flan-T5, Vicuna
+* Option to bring your own fine-tuned LLMs
+* Online Serving with HTTP, gRPC, SSE (coming soon) or custom API
+* Native integration with BentoML and LangChain for custom LLM apps
 """
 from __future__ import annotations
diff --git a/src/openllm/cli.py b/src/openllm/cli.py
index 0fddbf94..c2bd6b27 100644
--- a/src/openllm/cli.py
+++ b/src/openllm/cli.py
@@ -517,12 +517,8 @@ def cli_factory() -> click.Group:
     β•šβ•β•β•β•β•β• β•šβ•β•     β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•     β•šβ•β•
 
     \b
-    OpenLLM: Your one stop-and-go-solution for serving any Open Large-Language Model
-
-    - StableLM, Falcon, ChatGLM, Dolly, Flan-T5, and more
-
-    \b
-    - Powered by BentoML 🍱
+    An open platform for operating large language models in production.
+    Fine-tune, serve, deploy, and monitor any LLMs with ease.
     """
 
     @cli.group(cls=OpenLLMCommandGroup, context_settings=_CONTEXT_SETTINGS)
@@ -588,7 +584,7 @@ def cli_factory() -> click.Group:
         _echo(bento.tag)
         return bento
 
-    @cli.command(aliases=["list"])
+    @cli.command()
     @output_option
     def models(output: OutputLiteral):
         """List all supported models."""
@@ -649,7 +645,7 @@ def cli_factory() -> click.Group:
 
         sys.exit(0)
 
-    @cli.command(aliases=["save"])
+    @cli.command()
     @click.argument(
         "model_name", type=click.Choice([inflection.dasherize(name) for name in openllm.CONFIG_MAPPING.keys()])
     )
@@ -729,11 +725,12 @@ def cli_factory() -> click.Group:
             model_store.delete(model.tag)
             click.echo(f"{model} deleted.")
 
-    @cli.command(name="query", aliases=["run", "ask"])
+    @cli.command(name="query")
     @click.option(
         "--endpoint",
         type=click.STRING,
-        help="LLM Server endpoint, i.e: http://12.323.2.1",
+        help="OpenLLM Server endpoint, i.e: http://0.0.0.0:3000",
+        envvar="OPENLLM_ENDPOINT",
         default="http://0.0.0.0:3000",
     )
     @click.option("--timeout", type=click.INT, default=30, help="Default server timeout", show_default=True)
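The `cli.py` hunk above adds `envvar="OPENLLM_ENDPOINT"` to the `--endpoint` option, which is what lets `openllm query` fall back to the environment variable shown in the README changes. A minimal, standalone illustration of that click mechanism (this is not OpenLLM's actual CLI code):

```python
# Standalone demo of click's envvar fallback, the mechanism used by --endpoint above.
import click

@click.command()
@click.option(
    "--endpoint",
    type=click.STRING,
    envvar="OPENLLM_ENDPOINT",  # read from the environment when the flag is omitted
    default="http://0.0.0.0:3000",
    show_default=True,
    help="Server endpoint",
)
def query(endpoint: str) -> None:
    click.echo(f"Sending prompt to {endpoint}")

if __name__ == "__main__":
    query()
```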
diff --git a/tools/update-readme.py b/tools/update-readme.py
index 2adbc87d..a9d48348 100755
--- a/tools/update-readme.py
+++ b/tools/update-readme.py
@@ -38,11 +38,11 @@ def main() -> int:
         readme = f.readlines()
 
     start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
-    formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
+    formatted: dict[t.Literal["Model", "CPU", "GPU", "Installation"], list[str]] = {
         "Model": [],
         "CPU": [],
         "GPU": [],
-        "Optional": [],
+        "Installation": [],
     }
     max_name_len_div = 0
     max_install_len_div = 0
@@ -55,19 +55,19 @@ def main() -> int:
         formatted["Model"].append(model_name)
         formatted["GPU"].append("βœ…")
         formatted["CPU"].append("βœ…" if not config.__openllm_requires_gpu__ else "❌")
-        instruction = "πŸ‘Ύ (not needed)"
+        instruction = "`pip install openllm`"
         if dashed in deps:
-            instruction = f"`pip install openllm[{dashed}]`"
+            instruction = f"""`pip install "openllm[{dashed}]"`"""
         else:
            does_not_need_custom_installation.append(model_name)
        if len(instruction) > max_install_len_div:
            max_install_len_div = len(instruction)
-        formatted["Optional"].append(instruction)
+        formatted["Installation"].append(instruction)
 
     meta = ["\n"]
 
     # NOTE: headers
-    meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n"
+    meta += f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Installation {' ' * (max_install_len_div - 8)}|\n"
    # NOTE: divs
    meta += f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n"
    # NOTE: rows
@@ -88,15 +88,6 @@ def main() -> int:
     )
     meta += "\n"
 
-    # NOTE: adding notes
-    meta += """\
-> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't enforce to
-> install dependencies to run all models. If one wishes to use any of the
-> aforementioned models, make sure to install the optional dependencies
-> mentioned above.
-
-"""
-
     readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]
 
     with open(os.path.join(ROOT, "README.md"), "w") as f: