Mirror of https://github.com/bentoml/OpenLLM.git (synced 2026-04-18 22:18:34 -04:00)
chore(deps): bump vllm to 0.2.7 (#837)
* chore(deps): bump vllm to 0.2.7

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Changed files:

2  openllm-python/README.md (generated)
2  openllm-python/pyproject.toml
openllm-python/README.md:

@@ -1445,7 +1445,7 @@ openllm start squeeze-ai-lab/sq-llama-2-7b-w4-s0 --quantize squeezellm --seriali
 ```
 
 > [!IMPORTANT]
-> Since both `squeezellm` and `awq` are weight-aware quantization methods, meaning the quantization is done during training, all pre-trained weights needs to get quantized before inference time. Make sure to fine compatible weights on HuggingFace Hub for your model of choice.
+> Since both `squeezellm` and `awq` are weight-aware quantization methods, meaning the quantization is done during training, all pre-trained weights needs to get quantized before inference time. Make sure to find compatible weights on HuggingFace Hub for your model of choice.
 
 ## 🛠️ Serving fine-tuning layers
 
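The note being corrected above is worth unpacking: with weight-aware methods such as `squeezellm` and `awq`, the `--quantize` flag only selects a loader; the checkpoint itself must already contain quantized weights. A minimal sketch of what that looks like in practice, using the pre-quantized SqueezeLLM checkpoint named in the hunk context above (the AWQ model ID below is an illustrative Hub ID, not taken from this commit):

```bash
# Serve the pre-quantized SqueezeLLM checkpoint shown in the hunk context above.
openllm start squeeze-ai-lab/sq-llama-2-7b-w4-s0 --quantize squeezellm

# Same idea for AWQ: point at weights that were already AWQ-quantized offline.
# TheBloke/Llama-2-7B-AWQ is an illustrative Hub ID, not part of this diff.
openllm start TheBloke/Llama-2-7B-AWQ --quantize awq
```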
openllm-python/pyproject.toml:

@@ -119,7 +119,7 @@ openai = ["openai[datalib]>=1", "tiktoken"]
 playground = ["jupyter", "notebook", "ipython", "jupytext", "nbformat"]
 qwen = ["cpm-kernels", "tiktoken"]
 starcoder = ["bitsandbytes"]
-vllm = ["vllm==0.2.6", "ray==2.6.0"]
+vllm = ["vllm==0.2.7", "ray==2.6.0"]
 
 [tool.hatch.version]
 fallback-version = "0.0.0"
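Because the `vllm` extra pins exact versions, downstream users pick up this bump by reinstalling the extra rather than upgrading vLLM directly. A minimal sketch, assuming a pip-based install of an OpenLLM release that carries this change:

```bash
# Reinstall the extra so pip resolves the new pins (vllm==0.2.7, ray==2.6.0).
pip install --upgrade "openllm[vllm]"

# Confirm the resolved version matches the pin.
python -c "import vllm; print(vllm.__version__)"
```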