diff --git a/CHANGELOG.md b/CHANGELOG.md
index 06175b8d..e7f3e582 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,6 +18,74 @@
 This changelog is managed by towncrier and is compiled at release time.
 
+## [0.2.0](https://github.com/bentoml/openllm/tree/v0.2.0)
+
+### Features
+
+- Added support for GPTNeoX models. All variants of GPTNeoX, including Dolly-V2
+  and StableLM, can now use `openllm start gpt-neox`.
+
+  `openllm models -o json` now returns CPU and GPU fields. `openllm models` now
+  shows a table that mimics the one in README.md.
+
+  Added scripts to automatically add model imports to `__init__.py`.
+
+  `--workers-per-resource` now accepts the following strategies:
+
+  - `round_robin`: Same behaviour as setting `--workers-per-resource 1`. This
+    is useful for smaller models.
+  - `conserved`: Determines the number of available GPU resources and assigns
+    a single worker for the LLMRunner with all available GPU resources. For
+    example, if there are 4 GPUs available, then `conserved` is equivalent to
+    `--workers-per-resource 0.25`.
+  [#106](https://github.com/bentoml/openllm/issues/106)
+- Added support for [Baichuan](https://github.com/baichuan-inc/Baichuan-7B) model
+  generation, contributed by @hetaoBackend.
+
+  Fixes how we handle the model loader auto class for `trust_remote_code` in
+  transformers.
+  [#115](https://github.com/bentoml/openllm/issues/115)
+
+
+### Bug fix
+
+- Fixes relative `model_id` handling when running an LLM within the container.
+
+  Added support for building a container directly with `openllm build`. Users
+  can now run `openllm build --format=container`:
+
+  ```bash
+  openllm build flan-t5 --format=container
+  ```
+
+  This is equivalent to:
+
+  ```bash
+  openllm build flan-t5 && bentoml containerize google-flan-t5-large-service
+  ```
+
+  Added snapshot testing and more robust edge cases for model testing.
+
+  General improvements in `openllm.LLM.import_model`, which will now parse
+  sanitised parameters automatically.
+
+  Fixes `openllm start ` to use the correct `model_id`, ignoring `--model-id`
+  (the correct behaviour).
+
+  Fixes `--workers-per-resource conserved` to respect `--device`.
+
+  Added an initial interface for `LLM.embeddings`.
+  [#107](https://github.com/bentoml/openllm/issues/107)
+- Fixes resources to correctly follow the CUDA_VISIBLE_DEVICES spec.
+
+  OpenLLM now contains a standalone parser that mimics the `torch.cuda` parser
+  for setting GPU devices. This parser will be used to parse both AMD and
+  NVIDIA GPUs.
+
+  `openllm` should now be able to parse `GPU-` and `MIG-` UUIDs from either
+  configuration or spec.
+  [#114](https://github.com/bentoml/openllm/issues/114)
+
+
 ## [0.1.20](https://github.com/bentoml/openllm/tree/v0.1.20)
 
 ### Features
diff --git a/changelog.d/106.feature.md b/changelog.d/106.feature.md
deleted file mode 100644
index c27f5bd1..00000000
--- a/changelog.d/106.feature.md
+++ /dev/null
@@ -1,16 +0,0 @@
-Added support for GPTNeoX models. All variants of GPTNeoX, including Dolly-V2
-and StableLM can now also use `openllm start gpt-neox`
-
-`openllm models -o json` nows return CPU and GPU field. `openllm models` now
-show table that mimics the one from README.md
-
-Added scripts to automatically add models import to `__init__.py`
-
-`--workers-per-resource` now accepts the following strategies:
-
-- `round_robin`: Similar behaviour when setting `--workers-per-resource 1`. This
-  is useful for smaller models.
-- `conserved`: This will determine the number of available GPU resources, and
-  only assign one worker for the LLMRunner with all available GPU resources. For
-  example, if ther are 4 GPUs available, then `conserved` is equivalent to
-  `--workers-per-resource 0.25`.
diff --git a/changelog.d/107.fix.md b/changelog.d/107.fix.md
deleted file mode 100644
index a70a8c52..00000000
--- a/changelog.d/107.fix.md
+++ /dev/null
@@ -1,26 +0,0 @@
-Fixes relative model_id handling for running LLM within the container.
-
-Added support for building container directly with `openllm build`. Users now
-can do `openllm build --format=container`:
-
-```bash
-openllm build flan-t5 --format=container
-```
-
-This is equivalent to:
-
-```bash
-openllm build flan-t5 && bentoml containerize google-flan-t5-large-service
-```
-
-Added Snapshot testing and more robust edge cases for model testing
-
-General improvement in `openllm.LLM.import_model` where it will parse santised
-parameters automatically.
-
-Fixes `openllm start ` to use correct `model_id`, ignoring `--model-id`
-(The correct behaviour)
-
-Fixes `--workers-per-resource conserved` to respect `--device`
-
-Added initial interface for `LLM.embeddings`
diff --git a/changelog.d/114.fix.md b/changelog.d/114.fix.md
deleted file mode 100644
index 346670f3..00000000
--- a/changelog.d/114.fix.md
+++ /dev/null
@@ -1,7 +0,0 @@
-Fixes resources to correctly follows CUDA_VISIBLE_DEVICES spec
-
-OpenLLM now contains a standalone parser that mimic `torch.cuda` parser for set
-GPU devices. This parser will be used to parse both AMD and NVIDIA GPUs.
-
-`openllm` should now be able to parse `GPU-` and `MIG-` UUID from both
-configuration or spec.
diff --git a/changelog.d/115.feature.md b/changelog.d/115.feature.md
deleted file mode 100644
index 8c54d15c..00000000
--- a/changelog.d/115.feature.md
+++ /dev/null
@@ -1,5 +0,0 @@
-Added support for [Baichuan](https://github.com/baichuan-inc/Baichuan-7B) model
-generation, contributed by @hetaoBackend
-
-Fixes how we handle model loader auto class for trust_remote_code in
-transformers
diff --git a/package.json b/package.json
index d6f4e845..8d3c47db 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "openllm",
-  "version": "0.1.21.dev0",
+  "version": "0.2.0",
   "description": "OpenLLM: Your one stop-and-go solution for serving Large Language Model",
   "repository": "git@github.com:llmsys/OpenLLM.git",
   "author": "Aaron Pham <29749331+aarnphm@users.noreply.github.com>",
diff --git a/src/openllm/__about__.py b/src/openllm/__about__.py
index 7b634711..49add02f 100644
--- a/src/openllm/__about__.py
+++ b/src/openllm/__about__.py
@@ -11,4 +11,4 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-__version__ = "0.1.21.dev0"
+__version__ = "0.2.0"
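Note on the #114 entry above: the changelog describes a standalone parser that mimics `torch.cuda`'s handling of `CUDA_VISIBLE_DEVICES`, including `GPU-`/`MIG-` UUID entries. The sketch below is a hypothetical illustration of that behaviour, not the actual OpenLLM implementation; the function name `parse_visible_devices` and its exact rules (negative index hides everything, an invalid entry truncates the list) are assumptions based on how `torch.cuda` treats the variable.

```python
# Hypothetical sketch of a CUDA_VISIBLE_DEVICES-style parser (NOT OpenLLM's
# actual code). It accepts integer indices and "GPU-"/"MIG-" prefixed UUIDs.
def parse_visible_devices(spec: str) -> list[str]:
    devices: list[str] = []
    for item in spec.split(","):
        item = item.strip()
        if item.startswith(("GPU-", "MIG-")):
            # UUID-style entries are kept verbatim.
            devices.append(item)
        elif item.lstrip("-").isdigit():
            idx = int(item)
            if idx < 0:
                # A negative index hides all devices, mirroring torch.cuda.
                return []
            devices.append(str(idx))
        else:
            # An unparseable entry truncates the list from that point on.
            break
    return devices
```

For example, `parse_visible_devices("0,GPU-5ebe9f43")` keeps both entries, while `parse_visible_devices("0,oops,2")` yields only `["0"]`.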