mirror of
https://github.com/bentoml/OpenLLM.git
synced 2026-03-03 06:06:09 -05:00
infra: prepare for release 0.3.0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
This commit is contained in:
48
openllm-python/CHANGELOG.md
generated
@@ -18,6 +18,47 @@ This changelog is managed by towncrier and is compiled at release time.

<!-- towncrier release notes start -->

## [0.3.0](https://github.com/bentoml/openllm/tree/v0.3.0)

### Backwards-incompatible Changes

- All environment variables are now simplified and no longer require a model-specific prefix.

For example: `OPENLLM_LLAMA_GENERATION_MAX_NEW_TOKENS` now becomes `OPENLLM_GENERATION_MAX_NEW_TOKENS`.

Miscellaneous environment variables have also been unified. To switch between backends, use `--backend` with both `start` and `build`:
```bash
openllm start llama --backend vllm
```

or the environment variable `OPENLLM_BACKEND`:
```bash
OPENLLM_BACKEND=vllm openllm start llama
```

`openllm.Runner` now defaults to downloading the model on first use if it is not already available; the model is then cached in the model store for subsequent runs.

Model serialisation has been updated to a new API version with clearer naming. Users are kindly asked to run `openllm prune -y --include-bentos` and then update to the current version of openllm.
[#283](https://github.com/bentoml/openllm/issues/283)

### Refactor

- Refactor GPTQ to use the official implementation from transformers>=4.32
[#297](https://github.com/bentoml/openllm/issues/297)

### Features

- Added support for vLLM streaming.

This can now be accessed via `/v1/generate_stream`.
[#260](https://github.com/bentoml/openllm/issues/260)
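As an illustrative sketch only (not part of the release notes), a client can consume the `/v1/generate_stream` endpoint with a plain HTTP request. The JSON payload shape (`{"prompt": ...}`) and the default port below are assumptions; consult the running server's OpenAPI schema for the exact field names:

```python
import json
import urllib.request
from typing import Iterator


def stream_generate(prompt: str, base_url: str = "http://localhost:3000") -> Iterator[str]:
    """Yield raw chunks from the /v1/generate_stream endpoint as they arrive.

    NOTE: the payload shape and port are assumptions for illustration only.
    """
    req = urllib.request.Request(
        f"{base_url}/v1/generate_stream",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Iterating the response object reads lines as the server flushes them.
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line:
                yield line
```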

## [0.2.27](https://github.com/bentoml/openllm/tree/v0.2.27)

### Changes
@@ -230,7 +271,7 @@ No significant changes.

```bash
docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
-  -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
+  -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_BACKEND=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
```

In conjunction with this, OpenLLM now also has a set of small CLI utilities via ``openllm ext`` for ease of use
@@ -721,9 +762,6 @@ No significant changes.

`openllm start` now supports `--quantize int8` and `--quantize int4`. `GPTQ`
quantization support is on the roadmap and is currently being worked on.

`openllm start` now also supports `--bettertransformer` to use
`BetterTransformer` for serving.

Refactored `openllm.LLMConfig` to be usable with `__getitem__`:
`openllm.DollyV2Config()['requirements']`.
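A minimal sketch of the `__getitem__` access pattern described above, assuming nothing about OpenLLM's internals (the class and field names here are invented for illustration):

```python
class ConfigSketch:
    """Toy stand-in for an LLMConfig-style class with __getitem__ access."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getitem__(self, item: str):
        try:
            return self._fields[item]
        except KeyError:
            raise KeyError(f"no config item named {item!r}") from None


cfg = ConfigSketch(requirements=["einops"], max_new_tokens=256)
print(cfg["requirements"])  # → ['einops']
```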
@@ -732,8 +770,6 @@ No significant changes.

Added `towncrier` workflow to easily generate changelog entries

Added `use_pipeline`, `bettertransformer` flags into ModelSettings

`LLMConfig` now supports the `__dataclass_transform__` protocol to help with
type-checking
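The `__dataclass_transform__` protocol (PEP 681) tells type checkers to treat classes built by a custom metaclass or decorator as if they were dataclasses, with keyword constructor parameters derived from annotated fields. A minimal sketch of the idea, not OpenLLM's actual implementation; the field names are invented:

```python
try:
    from typing import dataclass_transform  # Python >= 3.11
except ImportError:
    # Runtime no-op fallback for older Pythons; only type checkers need it.
    def dataclass_transform(**_kwargs):
        def decorator(obj):
            return obj
        return decorator


@dataclass_transform(kw_only_default=True)
class ConfigMeta(type):
    """Metaclass that turns annotated class attributes into keyword-only
    constructor parameters, so type checkers see a dataclass-like __init__."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        annotations = namespace.get("__annotations__", {})
        defaults = {k: namespace[k] for k in annotations if k in namespace}

        def __init__(self, **kwargs):
            for field in annotations:
                if field in kwargs:
                    setattr(self, field, kwargs[field])
                elif field in defaults:
                    setattr(self, field, defaults[field])
                else:
                    raise TypeError(f"missing required field {field!r}")

        cls.__init__ = __init__
        return cls


class GenerationConfigSketch(metaclass=ConfigMeta):
    max_new_tokens: int = 256
    temperature: float = 0.9


cfg = GenerationConfigSketch(max_new_tokens=512)
```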