infra: prepare for release 0.3.0 [generated] [skip ci]

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Aaron Pham
2023-09-04 19:03:41 +00:00
parent 5eea40a599
commit 06a68ade7d
8 changed files with 86 additions and 33 deletions


@@ -18,6 +18,47 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->
## [0.3.0](https://github.com/bentoml/openllm/tree/v0.3.0)
### Backwards-incompatible Changes
- All environment variables are now simplified and no longer require a model-specific prefix.
For example, `OPENLLM_LLAMA_GENERATION_MAX_NEW_TOKENS` now becomes `OPENLLM_GENERATION_MAX_NEW_TOKENS`, as sketched below.
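A minimal sketch of the rename (the value `512` is illustrative, not taken from this changelog):
```bash
# Old, model-prefixed form: OPENLLM_LLAMA_GENERATION_MAX_NEW_TOKENS=512
# New, unified form:
OPENLLM_GENERATION_MAX_NEW_TOKENS=512 openllm start llama
```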
Miscellaneous environment variables have also been unified. To switch between backends, pass `--backend` to both `start` and `build`:
```bash
openllm start llama --backend vllm
```
or the environment variable `OPENLLM_BACKEND`
```bash
OPENLLM_BACKEND=vllm openllm start llama
```
`openllm.Runner` will now download the model on first use if it is not already available, caching it in the local model store for subsequent runs.
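A minimal sketch of the new behaviour (the exact `Runner` signature is assumed from the project's README of this period):
```python
import openllm

# First construction downloads the weights if they are not yet in the
# local model store; subsequent runs reuse the cached copy.
runner = openllm.Runner("llama", model_id="meta-llama/Llama-2-7b-chat-hf")
```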
Model serialisation has been updated to a new API version with clearer naming. Users are kindly asked to run `openllm prune -y --include-bentos` and then update to the current version of openllm.
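A sketch of the migration flow (the `pip` upgrade step is an assumption; use your installer of choice):
```bash
# Remove models and bentos serialised with the previous API version,
# then install the new release.
openllm prune -y --include-bentos
pip install --upgrade openllm
```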
[#283](https://github.com/bentoml/openllm/issues/283)
### Refactor
- Refactored GPTQ to use the official implementation from `transformers>=4.32`.
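A hedged sketch, assuming GPTQ is selected through the existing `--quantize` flag (verify against the CLI help for your version):
```bash
# GPTQ-quantized serving via the transformers>=4.32 integration.
openllm start llama --quantize gptq
```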
[#297](https://github.com/bentoml/openllm/issues/297)
### Features
- Added support for vLLM streaming
It can be accessed via the `/v1/generate_stream` endpoint.
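A hedged sketch of consuming the stream (the port `3000` and the request body are assumptions, not taken from this changelog):
```bash
# -N turns off curl's buffering so tokens appear as they arrive.
curl -N -X POST http://localhost:3000/v1/generate_stream \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "What is the meaning of life?"}'
```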
[#260](https://github.com/bentoml/openllm/issues/260)
## [0.2.27](https://github.com/bentoml/openllm/tree/v0.2.27)
### Changes
@@ -230,7 +271,7 @@ No significant changes.
```bash
docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
-e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
-e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_BACKEND=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
```
In conjunction with this, OpenLLM now also has a set of small CLI utilities under `openllm ext` for ease of use.
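For example, to list the available utilities (a safe starting point; the exact subcommand names vary across versions):
```bash
openllm ext --help
```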
@@ -721,9 +762,6 @@ No significant changes.
`openllm start` now supports `--quantize int8` and `--quantize int4`. `GPTQ`
quantization support is on the roadmap and currently being worked on.
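For example (the model name is illustrative):
```bash
# Serve with 8-bit weight quantization; --quantize int4 works the same way.
openllm start llama --quantize int8
```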
`openllm start` now also supports `--bettertransformer` to use
`BetterTransformer` for serving.
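For example (model name again illustrative):
```bash
# Opt in to PyTorch's BetterTransformer fastpath when serving.
openllm start llama --bettertransformer
```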
Refactored `openllm.LLMConfig` to support `__getitem__` access, e.g.
`openllm.DollyV2Config()['requirements']`.
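In code, per the entry above:
```python
import openllm

config = openllm.DollyV2Config()
# Dict-style access enabled by the __getitem__ refactor.
print(config["requirements"])
```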
@@ -732,8 +770,6 @@ No significant changes.
Added `towncrier` workflow to easily generate changelog entries
Added `use_pipeline` and `bettertransformer` flags to `ModelSettings`
`LLMConfig` now supports the `__dataclass_transform__` protocol to help with
type-checking
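For background, a minimal OpenLLM-independent sketch of what the protocol signals to a type checker (illustrative only; this is not OpenLLM's actual implementation):
```python
from typing_extensions import dataclass_transform

@dataclass_transform()
class ConfigMeta(type):
    # Purely a static-typing signal: type checkers treat classes created
    # with this metaclass as dataclass-like, deriving a typed __init__
    # from annotated class attributes.
    pass

class ExampleConfig(metaclass=ConfigMeta):
    max_new_tokens: int = 256
```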