Commit Graph

6 Commits

Author SHA1 Message Date
Aaron Pham
ad9107958d feat: continuous batching with vLLM (#349)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: continuous batching

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add one shot generation

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-14 03:09:36 -04:00
Alan Poulain
88d7ba7ca8 fix(vllm): Make sure to use max number of GPUs available (#326)
* fix(serving): vllm bad num_gpus

Signed-off-by: Alan Poulain <contact@alanpoulain.eu>

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

---------

Signed-off-by: Alan Poulain <contact@alanpoulain.eu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 12:45:00 -04:00
aarnphm-ec2-dev
7d893e6cd2 chore: ignore new lines split [skip ci]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 17:00:49 +00:00
Aaron Pham
608de0b667 fix(serving): vllm distributed size (#285)
* chore(weights): ignore gguf pattern for non GGML backend

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* chore: correct fix num_gpus to be divisible by 2

This depends on the attention_heads from given models

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 12:37:10 -04:00
Aaron Pham
b7af7765d4 fix(yapf): align weird new lines break [generated] [skip ci] (#284)
fix(yapf): align weird new lines break

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 05:34:22 -04:00
Aaron Pham
3e45530abd refactor(breaking): unify LLM API (#283)
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-01 05:15:19 -04:00