Mirror of https://github.com/bentoml/OpenLLM.git, synced 2026-03-06 08:08:03 -05:00
infra: prepare for release 0.3.5 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
This commit is contained in:
18
openllm-python/CHANGELOG.md
generated
@@ -18,6 +18,24 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->
## [0.3.5](https://github.com/bentoml/openllm/tree/v0.3.5)
### Features
- Added support for continuous batching via vLLM
  Current benchmarks show around 1218 TPS for 100 concurrent requests on a single A100 running meta-llama/Llama-2-13b-chat-hf.
[#349](https://github.com/bentoml/openllm/issues/349)
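A minimal sketch of trying the new continuous-batching path, assuming a local OpenLLM install with the vLLM extra; the exact model name and flag spellings are illustrative and may vary across versions:

```shell
# Hypothetical invocation: serve Llama 2 13B with the vLLM backend,
# which handles continuous batching of concurrent requests.
openllm start llama --model-id meta-llama/Llama-2-13b-chat-hf --backend vllm
```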
### Bug fix
- Set a default serialisation for all models.
  Currently, only Llama 2 uses safetensors as the default format. For any other model that provides a safetensors format, it can be opted into via `--serialisation safetensors`.
[#355](https://github.com/bentoml/openllm/issues/355)
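As a usage sketch of the opt-in described above (the model ID here is a hypothetical example; it assumes that model publishes safetensors weights):

```shell
# Hypothetical invocation: explicitly opt a non-Llama-2 model into
# safetensors serialisation instead of its default format.
openllm start opt --model-id facebook/opt-1.3b --serialisation safetensors
```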
## [0.3.4](https://github.com/bentoml/openllm/tree/v0.3.4)
### Bug fix