diff --git a/CHANGELOG.md b/CHANGELOG.md index 56bd5f96..d3cb322d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,6 +18,51 @@ This changelog is managed by towncrier and is compiled at release time. +## [0.2.12](https://github.com/bentoml/openllm/tree/v0.2.12) + +### Features + +- Added support for base container with OpenLLM. The base container will contains all necessary requirements + to run OpenLLM. Currently it does included compiled version of FlashAttention v2, vLLM, AutoGPTQ and triton. + + This will now be the base image for all future BentoLLM. The image will also be published to public GHCR. + + To extend and use this image into your bento, simply specify ``base_image`` under ``bentofile.yaml``: + + ```yaml + docker: + base_image: ghcr.io/bentoml/openllm: + ``` + + The release strategy would include: + - versioning of ``ghcr.io/bentoml/openllm:sha-`` for every commit to main, ``ghcr.io/bentoml/openllm:0.2.11`` for specific release version + - alias ``latest`` will be managed with docker/build-push-action (discouraged) + + Note that all these images include compiled kernels that has been tested on Ampere GPUs with CUDA 11.8. + + To quickly run the image, do the following: + + ```bash + docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \ + -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug` + ``` + + In conjunction with this, OpenLLM now also have a set of small CLI utilities via ``openllm ext`` for ease-of-use + + General fixes around codebase bytecode optimization + + Fixes logs output to filter correct level based on ``--debug`` and ``--quiet`` + + ``openllm build`` now will default run model check locally. To skip it pass in ``--fast`` (before this is the default behaviour, but ``--no-fast`` as default makes more sense here as ``openllm build`` should also be able to run standalone) + + All ``LlaMA`` namespace has been renamed to ``Llama`` (internal change and shouldn't affect end users) + + ``openllm.AutoModel.for_model`` now will always return the instance. Runner kwargs will be handled via create_runner + [#142](https://github.com/bentoml/openllm/issues/142) +- All OpenLLM base container now are scanned for security vulnerabilities using + trivy (both SBOM mode and CVE) + [#169](https://github.com/bentoml/openllm/issues/169) + ## [0.2.11](https://github.com/bentoml/openllm/tree/v0.2.11) diff --git a/changelog.d/142.feature.md b/changelog.d/142.feature.md deleted file mode 100644 index 94b959f1..00000000 --- a/changelog.d/142.feature.md +++ /dev/null @@ -1,36 +0,0 @@ -Added support for base container with OpenLLM. The base container will contains all necessary requirements -to run OpenLLM. Currently it does included compiled version of FlashAttention v2, vLLM, AutoGPTQ and triton. - -This will now be the base image for all future BentoLLM. The image will also be published to public GHCR. - -To extend and use this image into your bento, simply specify ``base_image`` under ``bentofile.yaml``: - -```yaml -docker: - base_image: ghcr.io/bentoml/openllm: -``` - -The release strategy would include: -- versioning of ``ghcr.io/bentoml/openllm:sha-`` for every commit to main, ``ghcr.io/bentoml/openllm:0.2.11`` for specific release version -- alias ``latest`` will be managed with docker/build-push-action (discouraged) - -Note that all these images include compiled kernels that has been tested on Ampere GPUs with CUDA 11.8. - -To quickly run the image, do the following: - -```bash -docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \ - -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug` -``` - -In conjunction with this, OpenLLM now also have a set of small CLI utilities via ``openllm ext`` for ease-of-use - -General fixes around codebase bytecode optimization - -Fixes logs output to filter correct level based on ``--debug`` and ``--quiet`` - -``openllm build`` now will default run model check locally. To skip it pass in ``--fast`` (before this is the default behaviour, but ``--no-fast`` as default makes more sense here as ``openllm build`` should also be able to run standalone) - -All ``LlaMA`` namespace has been renamed to ``Llama`` (internal change and shouldn't affect end users) - -``openllm.AutoModel.for_model`` now will always return the instance. Runner kwargs will be handled via create_runner diff --git a/changelog.d/169.feature.md b/changelog.d/169.feature.md deleted file mode 100644 index fcaea7b2..00000000 --- a/changelog.d/169.feature.md +++ /dev/null @@ -1,2 +0,0 @@ -All OpenLLM base container now are scanned for security vulnerabilities using -trivy (both SBOM mode and CVE) diff --git a/package.json b/package.json index 58261cf3..9ee11015 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "openllm", - "version": "0.2.12.dev0", + "version": "0.2.12", "description": "OpenLLM: Your one stop-and-go solution for serving Large Language Model", "repository": "git@github.com:llmsys/OpenLLM.git", "author": "Aaron Pham <29749331+aarnphm@users.noreply.github.com>",