infra: prepare for release 0.2.12 [generated] [skip ci]

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
This commit is contained in:
Aaron Pham
2023-08-01 23:27:01 +00:00
parent af54ff299f
commit 57fdbda192
4 changed files with 46 additions and 39 deletions

View File

@@ -18,6 +18,51 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->
## [0.2.12](https://github.com/bentoml/openllm/tree/v0.2.12)
### Features
- Added support for base container with OpenLLM. The base container will contains all necessary requirements
to run OpenLLM. Currently it does included compiled version of FlashAttention v2, vLLM, AutoGPTQ and triton.
This will now be the base image for all future BentoLLM. The image will also be published to public GHCR.
To extend and use this image into your bento, simply specify ``base_image`` under ``bentofile.yaml``:
```yaml
docker:
base_image: ghcr.io/bentoml/openllm:<hash>
```
The release strategy would include:
- versioning of ``ghcr.io/bentoml/openllm:sha-<sha1>`` for every commit to main, ``ghcr.io/bentoml/openllm:0.2.11`` for specific release version
- alias ``latest`` will be managed with docker/build-push-action (discouraged)
Note that all these images include compiled kernels that has been tested on Ampere GPUs with CUDA 11.8.
To quickly run the image, do the following:
```bash
docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
-e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug`
```
In conjunction with this, OpenLLM now also have a set of small CLI utilities via ``openllm ext`` for ease-of-use
General fixes around codebase bytecode optimization
Fixes logs output to filter correct level based on ``--debug`` and ``--quiet``
``openllm build`` now will default run model check locally. To skip it pass in ``--fast`` (before this is the default behaviour, but ``--no-fast`` as default makes more sense here as ``openllm build`` should also be able to run standalone)
All ``LlaMA`` namespace has been renamed to ``Llama`` (internal change and shouldn't affect end users)
``openllm.AutoModel.for_model`` now will always return the instance. Runner kwargs will be handled via create_runner
[#142](https://github.com/bentoml/openllm/issues/142)
- All OpenLLM base container now are scanned for security vulnerabilities using
trivy (both SBOM mode and CVE)
[#169](https://github.com/bentoml/openllm/issues/169)
## [0.2.11](https://github.com/bentoml/openllm/tree/v0.2.11)

View File

@@ -1,36 +0,0 @@
Added support for base container with OpenLLM. The base container will contains all necessary requirements
to run OpenLLM. Currently it does included compiled version of FlashAttention v2, vLLM, AutoGPTQ and triton.
This will now be the base image for all future BentoLLM. The image will also be published to public GHCR.
To extend and use this image into your bento, simply specify ``base_image`` under ``bentofile.yaml``:
```yaml
docker:
base_image: ghcr.io/bentoml/openllm:<hash>
```
The release strategy would include:
- versioning of ``ghcr.io/bentoml/openllm:sha-<sha1>`` for every commit to main, ``ghcr.io/bentoml/openllm:0.2.11`` for specific release version
- alias ``latest`` will be managed with docker/build-push-action (discouraged)
Note that all these images include compiled kernels that has been tested on Ampere GPUs with CUDA 11.8.
To quickly run the image, do the following:
```bash
docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
-e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug`
```
In conjunction with this, OpenLLM now also have a set of small CLI utilities via ``openllm ext`` for ease-of-use
General fixes around codebase bytecode optimization
Fixes logs output to filter correct level based on ``--debug`` and ``--quiet``
``openllm build`` now will default run model check locally. To skip it pass in ``--fast`` (before this is the default behaviour, but ``--no-fast`` as default makes more sense here as ``openllm build`` should also be able to run standalone)
All ``LlaMA`` namespace has been renamed to ``Llama`` (internal change and shouldn't affect end users)
``openllm.AutoModel.for_model`` now will always return the instance. Runner kwargs will be handled via create_runner

View File

@@ -1,2 +0,0 @@
All OpenLLM base container now are scanned for security vulnerabilities using
trivy (both SBOM mode and CVE)

View File

@@ -1,6 +1,6 @@
{
"name": "openllm",
"version": "0.2.12.dev0",
"version": "0.2.12",
"description": "OpenLLM: Your one stop-and-go solution for serving Large Language Model",
"repository": "git@github.com:llmsys/OpenLLM.git",
"author": "Aaron Pham <29749331+aarnphm@users.noreply.github.com>",