infra: prepare for release 0.3.5 [generated] [skip ci]

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
This commit is contained in:
Aaron Pham
2023-09-18 06:29:22 +00:00
parent 5a1fcc9cd5
commit 4662f7008a
7 changed files with 39 additions and 9 deletions


@@ -18,6 +18,24 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->
## [0.3.5](https://github.com/bentoml/openllm/tree/v0.3.5)
### Features
- Added support for continuous batching via vLLM
  Current benchmarks show around 1218 TPS with 100 concurrent requests on a single A100 running meta-llama/Llama-2-13b-chat-hf
[#349](https://github.com/bentoml/openllm/issues/349)
### Bug fix
- Set a default serialisation format for all models.
  Currently, only Llama 2 uses safetensors as the default format. For all other models that ship safetensors weights, the format can be opted into via `--serialisation safetensors`
[#355](https://github.com/bentoml/openllm/issues/355)
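The opt-in flag above is passed when starting a model server; a minimal sketch, assuming OpenLLM 0.3.x is installed and the chosen model repository ships safetensors weights (the model id is illustrative):

```shell
# Start an OpenLLM server, opting in to safetensors serialisation
# for a model where it is not the default.
openllm start llama \
  --model-id meta-llama/Llama-2-13b-chat-hf \
  --serialisation safetensors
```

For Llama 2 models the flag is redundant, since safetensors is already the default there per this release.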
## [0.3.4](https://github.com/bentoml/openllm/tree/v0.3.4)
### Bug fix