mirror of
https://github.com/bentoml/OpenLLM.git
synced 2026-04-23 00:17:28 -04:00
chore(deps): bump vllm to 0.2.7 (#837)
* chore(deps): bump vllm to 0.2.7

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
This commit is contained in:
@@ -29,7 +29,7 @@ COPY hatch.toml README.md CHANGELOG.md openllm-python/pyproject.toml /openllm-py
 # below
 RUN --mount=type=cache,target=/root/.cache/pip \
     pip3 install -v --no-cache-dir \
-    "ray==2.6.0" "xformers==0.0.23" "vllm==0.2.6" && \
+    "ray==2.6.0" "xformers==0.0.23" "vllm==0.2.7" && \
     pip3 install --no-cache-dir -e /openllm-python/

 COPY openllm-core/src openllm-core/src
changelog.d/837.change.md (Normal file, 1 line changed)
@@ -0,0 +1 @@
+Bump vllm to 0.2.7 for a newly built bento
openllm-python/README.md (generated, 2 lines changed)
@@ -1445,7 +1445,7 @@ openllm start squeeze-ai-lab/sq-llama-2-7b-w4-s0 --quantize squeezellm --seriali
 ```

 > [!IMPORTANT]
-> Since both `squeezellm` and `awq` are weight-aware quantization methods, meaning the quantization is done during training, all pre-trained weights needs to get quantized before inference time. Make sure to fine compatible weights on HuggingFace Hub for your model of choice.
+> Since both `squeezellm` and `awq` are weight-aware quantization methods, meaning the quantization is done during training, all pre-trained weights need to be quantized before inference time. Make sure to find compatible weights on HuggingFace Hub for your model of choice.

 ## 🛠️ Serving fine-tuning layers
||||
@@ -119,7 +119,7 @@ openai = ["openai[datalib]>=1", "tiktoken"]
 playground = ["jupyter", "notebook", "ipython", "jupytext", "nbformat"]
 qwen = ["cpm-kernels", "tiktoken"]
 starcoder = ["bitsandbytes"]
-vllm = ["vllm==0.2.6", "ray==2.6.0"]
+vllm = ["vllm==0.2.7", "ray==2.6.0"]

 [tool.hatch.version]
 fallback-version = "0.0.0"
@@ -155,7 +155,7 @@ GGML_DEPS = ['ctransformers']
 CTRANSLATE_DEPS = ['ctranslate2>=3.22.0']
 AWQ_DEPS = ['autoawq']
 GPTQ_DEPS = ['auto-gptq[triton]>=0.4.2']
-VLLM_DEPS = ['vllm==0.2.6', 'ray==2.6.0']
+VLLM_DEPS = ['vllm==0.2.7', 'ray==2.6.0']

 _base_requirements: dict[str, t.Any] = {
     inflection.dasherize(name): config_cls.__openllm_requirements__
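Not part of the commit itself: this bump has to land in three places at once (the Dockerfile `pip3 install` line, the `vllm` extra in `openllm-python/pyproject.toml`, and `VLLM_DEPS` above), so a drift check is easy to motivate. Below is a minimal sketch of one; the three strings are inlined stand-ins for the relevant line of each file (in a real CI check they would be read from disk), and `vllm_pin` is a hypothetical helper, not an OpenLLM API.

```python
import re

# Inlined stand-ins for the one relevant line of each file touched by this commit.
dockerfile_line = '"ray==2.6.0" "xformers==0.0.23" "vllm==0.2.7" && \\'
pyproject_line = 'vllm = ["vllm==0.2.7", "ray==2.6.0"]'
requirements_line = "VLLM_DEPS = ['vllm==0.2.7', 'ray==2.6.0']"

def vllm_pin(text: str) -> str:
    """Extract the version that `vllm` is pinned to, e.g. '0.2.7'."""
    match = re.search(r"vllm==([\d.]+)", text)
    if match is None:
        raise ValueError("no vllm pin found")
    return match.group(1)

# All three files must agree on a single pinned version.
pins = {vllm_pin(s) for s in (dockerfile_line, pyproject_line, requirements_line)}
assert len(pins) == 1, f"inconsistent vllm pins: {pins}"
(version,) = pins
print(version)  # -> 0.2.7
```

The same pattern catches the classic failure mode of version bumps: a later PR updates one pin site and forgets the other two.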