mirror of
https://github.com/bentoml/OpenLLM.git
synced 2026-05-04 13:52:46 -04:00
infra: prepare for release 0.2.0 [generated]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
This commit is contained in:
CHANGELOG.md (68 lines changed)
@@ -18,6 +18,74 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->
## [0.2.0](https://github.com/bentoml/openllm/tree/v0.2.0)
### Features
- Added support for GPTNeoX models. All variants of GPTNeoX, including Dolly-V2
  and StableLM, can now also use `openllm start gpt-neox`.

  `openllm models -o json` now returns CPU and GPU fields. `openllm models` now
  shows a table that mimics the one from README.md.

  Added scripts to automatically add model imports to `__init__.py`.

  `--workers-per-resource` now accepts the following strategies:

  - `round_robin`: Similar behaviour to setting `--workers-per-resource 1`. This
    is useful for smaller models.
  - `conserved`: Determines the number of available GPU resources and assigns
    only one worker for the LLMRunner with all available GPU resources. For
    example, if there are 4 GPUs available, then `conserved` is equivalent to
    `--workers-per-resource 0.25`.

  [#106](https://github.com/bentoml/openllm/issues/106)
- Added support for [Baichuan](https://github.com/baichuan-inc/Baichuan-7B) model
  generation, contributed by @hetaoBackend.

  Fixes how we handle the model loader auto class for `trust_remote_code` in
  transformers.

  [#115](https://github.com/bentoml/openllm/issues/115)
### Bug fixes
- Fixes relative `model_id` handling when running an LLM within the container.

  Added support for building a container directly with `openllm build`. Users
  can now do `openllm build --format=container`:

  ```bash
  openllm build flan-t5 --format=container
  ```

  This is equivalent to:

  ```bash
  openllm build flan-t5 && bentoml containerize google-flan-t5-large-service
  ```

  Added snapshot testing and more robust edge cases for model testing.

  General improvement in `openllm.LLM.import_model`, which now parses sanitised
  parameters automatically.

  Fixes `openllm start <bento>` to use the correct `model_id`, ignoring
  `--model-id` (the correct behaviour).

  Fixes `--workers-per-resource conserved` to respect `--device`.

  Added an initial interface for `LLM.embeddings`.

  [#107](https://github.com/bentoml/openllm/issues/107)
- Fixes resources to correctly follow the `CUDA_VISIBLE_DEVICES` spec.

  OpenLLM now contains a standalone parser that mimics the `torch.cuda` parser
  for setting GPU devices. This parser is used to parse both AMD and NVIDIA GPUs.

  `openllm` should now be able to parse `GPU-` and `MIG-` UUIDs from both
  configuration and spec.

  [#114](https://github.com/bentoml/openllm/issues/114)
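The device-string parsing behaviour described in the last entry can be sketched roughly as follows. This is an illustrative stand-in, not OpenLLM's actual parser; it assumes the CUDA runtime convention that `CUDA_VISIBLE_DEVICES` is a comma-separated list of integer indices or `GPU-`/`MIG-` UUIDs, and that parsing stops at the first invalid entry:

```python
# Hypothetical sketch of a CUDA_VISIBLE_DEVICES-style parser; not the
# real OpenLLM implementation.
def parse_visible_devices(value: str) -> list[str]:
    devices: list[str] = []
    for entry in value.split(","):
        entry = entry.strip()
        if entry.startswith(("GPU-", "MIG-")):
            # Device identified by (possibly truncated) UUID.
            devices.append(entry)
        elif entry.isdigit():
            # Device identified by integer index.
            devices.append(entry)
        else:
            # The CUDA runtime ignores everything after an invalid entry.
            break
    return devices
```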
## [0.1.20](https://github.com/bentoml/openllm/tree/v0.1.20)

### Features
@@ -1,6 +1,6 @@
 {
   "name": "openllm",
-  "version": "0.1.21.dev0",
+  "version": "0.2.0",
   "description": "OpenLLM: Your one stop-and-go solution for serving Large Language Model",
   "repository": "git@github.com:llmsys/OpenLLM.git",
   "author": "Aaron Pham <29749331+aarnphm@users.noreply.github.com>",
@@ -11,4 +11,4 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-__version__ = "0.1.21.dev0"
+__version__ = "0.2.0"