Mirror of https://github.com/bentoml/OpenLLM.git, synced 2026-03-06 08:08:03 -05:00
infra: prepare for release 0.3.5 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
This commit is contained in:
18
openllm-python/CHANGELOG.md
generated
@@ -18,6 +18,24 @@ This changelog is managed by towncrier and is compiled at release time.
<!-- towncrier release notes start -->
## [0.3.5](https://github.com/bentoml/openllm/tree/v0.3.5)
### Features
- Added support for continuous batching via vLLM
  Current benchmarks show around 1218 TPS for 100 concurrent requests on a single A100 running meta-llama/Llama-2-13b-chat-hf.
[#349](https://github.com/bentoml/openllm/issues/349)
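A minimal sketch of trying the new continuous-batching path, assuming a local OpenLLM install with the vLLM extra; the exact model name and flag spellings are illustrative and may vary across versions:

```shell
# Hypothetical invocation: serve Llama 2 13B with the vLLM backend,
# which handles continuous batching of concurrent requests.
openllm start llama --model-id meta-llama/Llama-2-13b-chat-hf --backend vllm
```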
### Bug fix
- Set a default serialisation for all models.
  Currently, only Llama 2 uses safetensors as the default format. For any other model that provides a safetensors format, it can be opted into via `--serialisation safetensors`.
[#355](https://github.com/bentoml/openllm/issues/355)
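As a usage sketch of the opt-in described above (the model ID here is a hypothetical example; it assumes that model publishes safetensors weights):

```shell
# Hypothetical invocation: explicitly opt a non-Llama-2 model into
# safetensors serialisation instead of its default format.
openllm start opt --model-id facebook/opt-1.3b --serialisation safetensors
```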
## [0.3.4](https://github.com/bentoml/openllm/tree/v0.3.4)
### Bug fix