infra: update changelog and added readme badges [generated] (#162)

This commit is contained in:
Aaron Pham
2023-07-31 04:02:02 -04:00
committed by GitHub
parent fec68d732b
commit 4fbfb363bf
3 changed files with 85 additions and 37 deletions

View File

@@ -20,11 +20,6 @@ on:
- "main"
tags:
- "v*"
paths:
- ".github/workflows/build.yaml"
- "src/openllm/bundle/oci/Dockerfile"
- "src/openllm/**"
- "src/openllm_client/**"
pull_request:
branches:
- "main"

View File

@@ -18,6 +18,55 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->
## Changes for the Upcoming Release
> **Warning**: These changes reflect the current [development progress](https://github.com/bentoml/openllm/tree/main)
> and have **not** been part of an official PyPI release yet.
> To try out the latest changes, one can do: `pip install -U git+https://github.com/bentoml/openllm.git@main`
### Features
- Added support for a base container with OpenLLM. The base container contains all the necessary requirements
to run OpenLLM. It currently includes compiled versions of FlashAttention v2, vLLM, AutoGPTQ, and Triton.
This will now be the base image for all future BentoLLMs. The image will also be published to the public GHCR.
To extend and use this image in your Bento, simply specify ``base_image`` under ``bentofile.yaml``:
```yaml
docker:
base_image: ghcr.io/bentoml/openllm:<hash>
```
The release strategy includes:
- versioning of ``ghcr.io/bentoml/openllm:sha-<sha1>`` for every commit to main, and ``ghcr.io/bentoml/openllm:0.2.11`` for each specific release version
- the alias ``latest``, managed with docker/build-push-action (discouraged)
Note that all these images include compiled kernels that have been tested on Ampere GPUs with CUDA 11.8.
To quickly run the image, do the following:
```bash
docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
-e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
```
In conjunction with this, OpenLLM now also ships a set of small CLI utilities via ``openllm ext`` for ease of use
General fixes around the codebase and bytecode optimization
Fixed log output to filter the correct level based on ``--debug`` and ``--quiet``
``openllm build`` will now run the model check locally by default. To skip it, pass ``--fast`` (previously ``--fast`` was the default behaviour, but defaulting to ``--no-fast`` makes more sense here, as ``openllm build`` should also be able to run standalone); see the example after this entry
All ``LlaMA`` namespaces have been renamed to ``Llama`` (an internal change that shouldn't affect end users)
``openllm.AutoModel.for_model`` will now always return the model instance; runner kwargs are handled via ``create_runner``
[#142](https://github.com/bentoml/openllm/issues/142)
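A quick sketch of the CLI changes above; ``openllm ext`` is listed via ``--help`` since its exact subcommands may differ across versions, and the model id is reused from the container example purely for illustration:
```bash
# List the small helper utilities shipped under the `openllm ext` group
openllm ext --help

# Build a Bento while skipping the local model check via --fast
openllm build llama --model-id meta-llama/Llama-2-7b-chat-hf --fast
```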
## [0.2.11](https://github.com/bentoml/openllm/tree/v0.2.11)
### Features

View File

@@ -6,20 +6,25 @@
<h1 align="center">🦾 OpenLLM</h1>
<a href="https://pypi.org/project/openllm">
<img src="https://img.shields.io/pypi/v/openllm.svg?logo=pypi&label=PyPI&logoColor=gold" alt="pypi_status" />
</a><a href="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml">
<img src="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml/badge.svg?branch=main" alt="ci" />
</a><a href="https://twitter.com/bentomlai">
<img src="https://badgen.net/badge/icon/@bentomlai/1DA1F2?icon=twitter&label=Follow%20Us" alt="Twitter" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
<img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><a href="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml">
<img src="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml/badge.svg?branch=main" alt="ci" />
</a><a href="https://results.pre-commit.ci/latest/github/bentoml/OpenLLM/main">
<img src="https://results.pre-commit.ci/badge/github/bentoml/OpenLLM/main.svg" alt="pre-commit.ci status" />
</a><br>
</a><a href="https://pypi.org/project/openllm">
<a href="https://pypi.org/project/openllm">
<img src="https://img.shields.io/pypi/pyversions/openllm.svg?logo=python&label=Python&logoColor=gold" alt="python_version" />
</a><a href="https://github.com/pypa/hatch">
<img src="https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg" alt="Hatch" />
</a><br>
</a><a href="https://github.com/astral-sh/ruff">
<img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json" alt="Ruff" />
</a><a href="https://github.com/python/mypy">
<img src="https://img.shields.io/badge/types-mypy-blue.svg" alt="types - mypy" />
</a><a href="https://github.com/microsoft/pyright">
<img src="https://img.shields.io/badge/types-pyright-yellow.svg" alt="types - pyright" />
</a><br>
<p>An open platform for operating large language models (LLMs) in production.<br/>
Fine-tune, serve, deploy, and monitor any LLMs with ease.</p>
@@ -120,17 +125,19 @@ openllm query 'Explain to me the difference between "further" and "farther"'
Visit `http://localhost:3000/docs.json` for OpenLLM's API specification.
OpenLLM seamlessly supports many models and their variants.
Users can also specify different variants of the model to be served, by
providing the `--model-id` argument, e.g.:
OpenLLM seamlessly supports many models and their variants. Users can also
specify different variants of the model to be served, by providing the
`--model-id` argument, e.g.:
```bash
openllm start flan-t5 --model-id google/flan-t5-large
```
> **Note** that `openllm` also supports all variants of fine-tuning weights, custom model path
> as well as quantized weights for any of the supported models as long as it can be loaded with
> the model architecture. Refer to [supported models](https://github.com/bentoml/OpenLLM/tree/main#-supported-models) section for models' architecture.
> **Note** that `openllm` also supports all variants of fine-tuning weights,
> custom model path as well as quantized weights for any of the supported models
> as long as it can be loaded with the model architecture. Refer to
> [supported models](https://github.com/bentoml/OpenLLM/tree/main#-supported-models)
> section for models' architecture.
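As a rough sketch of the note above (the local checkpoint path is hypothetical, and the
``--quantize`` option is assumed from OpenLLM's CLI rather than shown in this diff):

```bash
# Serve a custom fine-tuned checkpoint from a local path (path is illustrative)
openllm start flan-t5 --model-id /path/to/my-finetuned-flan-t5

# Serve quantized weights, assuming --quantize accepts int8 for this model
openllm start opt --model-id facebook/opt-6.7b --quantize int8
```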
Use the `openllm models` command to see the list of models and their variants
supported in OpenLLM.
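For instance (the exact output format may differ between versions):

```bash
# Print the models and variants supported by the installed OpenLLM version
openllm models
```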
@@ -473,8 +480,8 @@ To include this into the Bento, one can also provide a `--adapter-id` into
openllm build opt --model-id facebook/opt-6.7b --adapter-id ...
```
> **Note**: We will gradually roll out support for fine-tuning all models.
> The following models currently have fine-tuning support: OPT, Falcon, LlaMA.
> **Note**: We will gradually roll out support for fine-tuning all models. The
> following models currently have fine-tuning support: OPT, Falcon, LlaMA.
### Integrating a New Model
@@ -485,10 +492,10 @@ to see how you can do it yourself.
### Embeddings
OpenLLM tentatively provides an embeddings endpoint for supported models.
This can be accessed via `/v1/embeddings`.
OpenLLM tentatively provides an embeddings endpoint for supported models. This can
be accessed via `/v1/embeddings`.
To use via CLI, simply call ``openllm embed``:
To use via CLI, simply call `openllm embed`:
```bash
openllm embed --endpoint http://localhost:3000 "I like to eat apples" -o json
@@ -508,7 +515,7 @@ openllm embed --endpoint http://localhost:3000 "I like to eat apples" -o json
}
```
To invoke this endpoint, use ``client.embed`` from the Python SDK:
To invoke this endpoint, use `client.embed` from the Python SDK:
```python
import openllm
@@ -518,15 +525,16 @@ client = openllm.client.HTTPClient("http://localhost:3000")
client.embed("I like to eat apples")
```
> **Note**: Currently, the following model families support embeddings: Llama, T5 (Flan-T5, FastChat, etc.), ChatGLM
> **Note**: Currently, the following model families support embeddings: Llama,
> T5 (Flan-T5, FastChat, etc.), ChatGLM
## ⚙️ Integrations
OpenLLM is not just a standalone product; it's a building block designed to
integrate with other powerful tools easily. We currently offer integration with
[BentoML](https://github.com/bentoml/BentoML),
[LangChain](https://github.com/hwchase17/langchain),
and [Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents).
[LangChain](https://github.com/hwchase17/langchain), and
[Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents).
### BentoML
@@ -555,7 +563,6 @@ async def prompt(input_text: str) -> str:
return answer
```
### [LangChain](https://python.langchain.com/docs/ecosystem/integrations/openllm)
To quickly start a local LLM with `langchain`, simply do the following:
@@ -600,15 +607,14 @@ def chat(input_text: str):
> **Note** You can find out more examples under the
> [examples](https://github.com/bentoml/OpenLLM/tree/main/examples) folder.
### Transformers Agents
OpenLLM seamlessly integrates with [Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents).
OpenLLM seamlessly integrates with
[Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents).
> **Warning** The Transformers Agent is still at an experimental stage. It is
> recommended to install OpenLLM with `pip install -r nightly-requirements.txt` to get
> the latest API update for HuggingFace agent.
> recommended to install OpenLLM with `pip install -r nightly-requirements.txt`
> to get the latest API update for HuggingFace agent.
```python
import transformers
@@ -665,15 +671,14 @@ There are several ways to deploy your LLMs:
```bash
bentoml containerize <name:version>
```
This generates an OCI-compatible Docker image that can be deployed anywhere Docker runs.
For best scalability and reliability of your LLM service in production, we recommend deploying
with BentoCloud.
This generates an OCI-compatible Docker image that can be deployed anywhere
Docker runs. For best scalability and reliability of your LLM service in
production, we recommend deploying with BentoCloud.
### ☁️ BentoCloud
Deploy OpenLLM with [BentoCloud](https://www.bentoml.com/bento-cloud/),
the serverless cloud for shipping and scaling AI applications.
Deploy OpenLLM with [BentoCloud](https://www.bentoml.com/bento-cloud/), the
serverless cloud for shipping and scaling AI applications.
1. **Create a BentoCloud account:** [sign up here](https://bentoml.com/cloud)
for early access
@@ -705,7 +710,6 @@ the serverless cloud for shipping and scaling AI applications.
`bentoml deployment create` command following the
[deployment instructions](https://docs.bentoml.com/en/latest/reference/cli.html#bentoml-deployment-create).
## 👥 Community
Engage with like-minded individuals passionate about LLMs, AI, and more on our