infra: prepare for release 0.2.12 [generated] [skip ci]

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2026-08-02 19:22:27 -04:00 · 2023-08-01 23:27:01 +00:00
parent af54ff299f
commit 57fdbda192
4 changed files with 46 additions and 39 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,6 +18,51 @@ This changelog is managed by towncrier and is compiled at release time.

 <!-- towncrier release notes start -->

+## [0.2.12](https://github.com/bentoml/openllm/tree/v0.2.12)
+
+### Features
+
+- Added support for a base container with OpenLLM. The base container contains all the requirements necessary
+  to run OpenLLM. Currently it includes compiled versions of FlashAttention v2, vLLM, AutoGPTQ and triton.
+
+  This will now be the base image for all future BentoLLM. The image will also be published to public GHCR.
+
+  To extend this image and use it in your bento, specify ``base_image`` in ``bentofile.yaml``:
+
+  ```yaml
+  docker:
+    base_image: ghcr.io/bentoml/openllm:<hash>
+  ```
+
+  The release strategy includes:
+  - a ``ghcr.io/bentoml/openllm:sha-<sha1>`` tag for every commit to main, and a version tag such as ``ghcr.io/bentoml/openllm:0.2.11`` for each release
+  - a ``latest`` alias managed with docker/build-push-action (discouraged)
+
+  Note that all of these images include compiled kernels that have been tested on Ampere GPUs with CUDA 11.8.
+
+  To quickly run the image, do the following:
+
+  ```bash
+  docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
+              -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
+  ```
+
+  In conjunction with this, OpenLLM now also has a set of small CLI utilities via ``openllm ext`` for ease of use
+
+  General fixes around codebase bytecode optimization
+
+  Fixes log output to filter at the correct level based on ``--debug`` and ``--quiet``
+
+  ``openllm build`` now runs the model check locally by default. To skip it, pass ``--fast`` (skipping was previously the default behaviour, but ``--no-fast`` as the default makes more sense here, as ``openllm build`` should also be able to run standalone)
+
+  All ``LlaMA`` namespaces have been renamed to ``Llama`` (an internal change that shouldn't affect end users)
+
+  ``openllm.AutoModel.for_model`` now always returns the instance. Runner kwargs are handled via ``create_runner``
+  [#142](https://github.com/bentoml/openllm/issues/142)
+- All OpenLLM base containers are now scanned for security vulnerabilities using
+  trivy (both SBOM mode and CVE)
+  [#169](https://github.com/bentoml/openllm/issues/169)
+

 ## [0.2.11](https://github.com/bentoml/openllm/tree/v0.2.11)

--- a/changelog.d/142.feature.md
+++ b/changelog.d/142.feature.md
@@ -1,36 +0,0 @@
-Added support for base container with OpenLLM. The base container will contains all necessary requirements
-to run OpenLLM. Currently it does included compiled version of FlashAttention v2, vLLM, AutoGPTQ and triton.
-
-This will now be the base image for all future BentoLLM. The image will also be published to public GHCR.
-
-To extend and use this image into your bento, simply specify ``base_image`` under ``bentofile.yaml``:
-
-```yaml
-docker:
-  base_image: ghcr.io/bentoml/openllm:<hash>
-```
-
-The release strategy would include:
- versioning of ``ghcr.io/bentoml/openllm:sha-<sha1>`` for every commit to main, ``ghcr.io/bentoml/openllm:0.2.11`` for specific release version
- alias ``latest`` will be managed with docker/build-push-action (discouraged)
-
-Note that all these images include compiled kernels that has been tested on Ampere GPUs with CUDA 11.8.
-
-To quickly run the image, do the following:
-
-```bash
-docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
-            -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug`
-```
-
-In conjunction with this, OpenLLM now also have a set of small CLI utilities via ``openllm ext`` for ease-of-use
-
-General fixes around codebase bytecode optimization
-
-Fixes logs output to filter correct level based on ``--debug`` and ``--quiet``
-
-``openllm build`` now will default run model check locally. To skip it pass in ``--fast`` (before this is the default behaviour, but ``--no-fast`` as default makes more sense here as ``openllm build`` should also be able to run standalone)
-
-All ``LlaMA`` namespace has been renamed to ``Llama`` (internal change and shouldn't affect end users)
-
-``openllm.AutoModel.for_model`` now will always return the instance. Runner kwargs will be handled via create_runner
--- a/changelog.d/169.feature.md
+++ b/changelog.d/169.feature.md
@@ -1,2 +0,0 @@
-All OpenLLM base container now are scanned for security vulnerabilities using
-trivy (both SBOM mode and CVE)
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "openllm",
-  "version": "0.2.12.dev0",
+  "version": "0.2.12",
  "description": "OpenLLM: Your one stop-and-go solution for serving Large Language Model",
  "repository": "git@github.com:llmsys/OpenLLM.git",
  "author": "Aaron Pham <29749331+aarnphm@users.noreply.github.com>",