mirror of
https://github.com/bentoml/OpenLLM.git
synced 2026-06-14 03:20:18 -04:00
infra: prepare for release 0.2.12 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
45
CHANGELOG.md
@@ -18,6 +18,51 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->

## [0.2.12](https://github.com/bentoml/openllm/tree/v0.2.12)

### Features

- Added support for a base container with OpenLLM. The base container contains all the requirements needed
to run OpenLLM. Currently it includes compiled versions of FlashAttention v2, vLLM, AutoGPTQ and triton.

This will now be the base image for all future BentoLLM. The image will also be published to the public GHCR.

To extend and use this image in your bento, specify ``base_image`` in ``bentofile.yaml``:

```yaml
docker:
  base_image: ghcr.io/bentoml/openllm:<hash>
```

The release strategy includes:

- versioned tags: ``ghcr.io/bentoml/openllm:sha-<sha1>`` for every commit to main, and ``ghcr.io/bentoml/openllm:0.2.11`` for a specific release version
- the ``latest`` alias, which will be managed with docker/build-push-action (discouraged)

Note that all these images include compiled kernels that have been tested on Ampere GPUs with CUDA 11.8.

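The tagging scheme above can be sketched in Python. This is only an illustration of the naming convention described in this entry: the helper name ``image_ref`` and its parameters are hypothetical and are not part of OpenLLM.

```python
from typing import Optional

# Image name taken from the changelog entry above.
REGISTRY_IMAGE = "ghcr.io/bentoml/openllm"


def image_ref(sha: Optional[str] = None, version: Optional[str] = None) -> str:
    """Build an image reference following the tagging scheme described above.

    Hypothetical helper for illustration only; not an OpenLLM API.
    """
    # Exactly one of the two identifiers must be given.
    if (sha is None) == (version is None):
        raise ValueError("pass exactly one of sha or version")
    # Commits to main are tagged sha-<sha1>; releases use the bare version.
    tag = f"sha-{sha}" if sha is not None else version
    return f"{REGISTRY_IMAGE}:{tag}"


if __name__ == "__main__":
    print(image_ref(version="0.2.11"))
    print(image_ref(sha="<sha1>"))
```

Pinning to a ``sha-`` or version tag rather than ``latest`` keeps a bento build reproducible, which is presumably why the alias is marked as discouraged.
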
To quickly run the image, do the following:
```bash
docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
  -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
```

In conjunction with this, OpenLLM now also has a set of small CLI utilities via ``openllm ext`` for ease of use.

General fixes around codebase bytecode optimization.

Fixed log output to filter at the correct level based on ``--debug`` and ``--quiet``.

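As a rough illustration of flag-based log filtering, the sketch below maps the two flags to standard ``logging`` levels. The mapping (``--debug`` to ``DEBUG``, ``--quiet`` to ``ERROR``, otherwise ``INFO``) and the name ``resolve_log_level`` are assumptions made for this example; the entry does not say which levels OpenLLM actually uses.

```python
import logging


def resolve_log_level(debug: bool = False, quiet: bool = False) -> int:
    """Map the two CLI flags to a logging level.

    Assumed mapping for illustration only; not OpenLLM's implementation.
    """
    if debug and quiet:
        # This sketch treats the two flags as mutually exclusive.
        raise ValueError("--debug and --quiet cannot be combined in this sketch")
    if debug:
        return logging.DEBUG
    if quiet:
        return logging.ERROR
    return logging.INFO


if __name__ == "__main__":
    logger = logging.getLogger("example")
    logger.setLevel(resolve_log_level(debug=True))
    print(logging.getLevelName(logger.level))
```
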
``openllm build`` now runs the model check locally by default. To skip it, pass ``--fast`` (previously this was the default behaviour, but ``--no-fast`` makes more sense as the default because ``openllm build`` should also be able to run standalone).

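The ``--fast`` / ``--no-fast`` pair with ``--no-fast`` as the default can be sketched with the standard library. This example uses ``argparse`` purely for illustration and makes no claim about how OpenLLM's own CLI is implemented; ``build_parser`` and ``should_run_model_check`` are hypothetical names.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with a paired --fast / --no-fast option."""
    parser = argparse.ArgumentParser(prog="build-sketch")
    # BooleanOptionalAction (Python 3.9+) generates both --fast and --no-fast.
    parser.add_argument(
        "--fast",
        action=argparse.BooleanOptionalAction,
        default=False,  # --no-fast is the default, so the check runs
        help="skip the local model check",
    )
    return parser


def should_run_model_check(argv: List[str]) -> bool:
    """Return True unless --fast was passed."""
    args = build_parser().parse_args(argv)
    return not args.fast


if __name__ == "__main__":
    print(should_run_model_check([]))
    print(should_run_model_check(["--fast"]))
```
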
All ``LlaMA`` namespaces have been renamed to ``Llama`` (an internal change that shouldn't affect end users).

``openllm.AutoModel.for_model`` now always returns the instance. Runner kwargs are handled via ``create_runner``.
[#142](https://github.com/bentoml/openllm/issues/142)
- All OpenLLM base containers are now scanned for security vulnerabilities using
trivy (both SBOM mode and CVE)
[#169](https://github.com/bentoml/openllm/issues/169)


## [0.2.11](https://github.com/bentoml/openllm/tree/v0.2.11)

@@ -1,36 +0,0 @@
Added support for a base container with OpenLLM. The base container contains all the requirements needed
to run OpenLLM. Currently it includes compiled versions of FlashAttention v2, vLLM, AutoGPTQ and triton.

This will now be the base image for all future BentoLLM. The image will also be published to the public GHCR.

To extend and use this image in your bento, specify ``base_image`` in ``bentofile.yaml``:

```yaml
docker:
  base_image: ghcr.io/bentoml/openllm:<hash>
```

The release strategy includes:

- versioned tags: ``ghcr.io/bentoml/openllm:sha-<sha1>`` for every commit to main, and ``ghcr.io/bentoml/openllm:0.2.11`` for a specific release version
- the ``latest`` alias, which will be managed with docker/build-push-action (discouraged)

Note that all these images include compiled kernels that have been tested on Ampere GPUs with CUDA 11.8.

To quickly run the image, do the following:

```bash
docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
  -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
```

In conjunction with this, OpenLLM now also has a set of small CLI utilities via ``openllm ext`` for ease of use.

General fixes around codebase bytecode optimization.

Fixed log output to filter at the correct level based on ``--debug`` and ``--quiet``.

``openllm build`` now runs the model check locally by default. To skip it, pass ``--fast`` (previously this was the default behaviour, but ``--no-fast`` makes more sense as the default because ``openllm build`` should also be able to run standalone).

All ``LlaMA`` namespaces have been renamed to ``Llama`` (an internal change that shouldn't affect end users).

``openllm.AutoModel.for_model`` now always returns the instance. Runner kwargs are handled via ``create_runner``.
@@ -1,2 +0,0 @@
All OpenLLM base containers are now scanned for security vulnerabilities using
trivy (both SBOM mode and CVE)
@@ -1,6 +1,6 @@
 {
   "name": "openllm",
-  "version": "0.2.12.dev0",
+  "version": "0.2.12",
   "description": "OpenLLM: Your one stop-and-go solution for serving Large Language Model",
   "repository": "git@github.com:llmsys/OpenLLM.git",
   "author": "Aaron Pham <29749331+aarnphm@users.noreply.github.com>",