Zhao Shenyang
|
aff5dc8ff2
|
fix: proper SSE handling for vllm (#877)
fix: proper SSE handling
|
2024-02-02 17:25:58 +08:00 |
|
Aaron Pham
|
8d63afc9ce
|
feat(vllm): support GPTQ with 0.2.6 (#797)
* feat(vllm): GPTQ support passthrough
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: run scripts
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* fix(install): set order of xformers before vllm
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* feat: support GPTQ with vLLM
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-12-18 12:41:19 -05:00 |
|
Aaron Pham
|
c8c9663d06
|
fix(infra): conform ruff to 150 LL (#781)
Generally correctly format it with ruff format and manual style
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-12-14 17:27:32 -05:00 |
|
Aaron Pham
|
44383528b5
|
fix(logprobs): correct check logprobs (#779)
* fix(logprobs): correct check logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-12-14 14:19:01 -05:00 |
|
Aaron Pham
|
9706228956
|
chore(vllm): add arguments for gpu memory utilization
Probably not going to fix anything, just delaying the problem.
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-11-29 06:45:14 +00:00 |
|
Aaron Pham
|
d04309188b
|
chore(style): 2.7k
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-11-28 07:04:27 +00:00 |
|
Aaron Pham
|
52a44b1bfa
|
chore: cleanup loader (#729)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-11-22 21:51:51 -05:00 |
|
Aaron Pham
|
63d86faa32
|
fix(openai): correct stop tokens and finish_reason state (#722)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-11-22 04:21:13 -05:00 |
|
Aaron Pham
|
c33b071ee4
|
refactor: delete unused code (#716)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-21 04:39:48 -05:00 |
|
Aaron Pham
|
fde78a2c78
|
chore: cleanup unused prompt templates (#713)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-21 01:56:51 -05:00 |
|
Aaron Pham
|
ad4f388c98
|
refactor: update runner helpers and add max_model_len (#712)
* chore(runner): cleanup unnecessary checks for runnable backend
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: saving llm reference to runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: correct inject item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update support for max_seq_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* fix: correct max_model_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update and warning backward compatibility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: remove unused sets
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-20 20:37:15 -05:00 |
|
Aaron Pham
|
816c1ee80e
|
feat(engine): CTranslate2 (#698)
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-19 10:25:08 -05:00 |
|
Aaron Pham
|
1831d8f129
|
feat: heuristics logprobs (#692)
* fix(encoder): bring back T5 support on PyTorch
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* feat: support logprobs and prompt_logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-18 19:26:20 -05:00 |
|
Aaron Pham
|
4a6f13ddd2
|
feat(type): provide structured annotations stubs (#663)
* feat(type): provide client stubs
separation of concerns for a more concise code base
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-16 02:58:45 -05:00 |
|
Aaron Pham
|
a58d947bc8
|
perf: improve build logics and cleanup speed (#657)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-15 00:18:31 -05:00 |
|
Aaron Pham
|
22eaaf3ce1
|
feat(vllm): support passing specific dtype (#626)
* feat(vllm): support passing specific dtype
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* fix: correctly cached the item
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-13 05:08:33 -05:00 |
|
Aaron Pham
|
126e6c9d63
|
fix(ruff): correct consistency between isort and formatter (#624)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-12 21:12:50 -05:00 |
|
Aaron Pham
|
7e1fb35a71
|
chore(llm): expose quantise and lazy load heavy imports (#617)
* chore(llm): expose quantise and lazy load heavy imports
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: move transformers to TYPE_CHECKING block
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-12 14:55:37 -05:00 |
|
Aaron Pham
|
ac377fe490
|
infra: using ruff formatter (#594)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-09 12:44:05 -05:00 |
|
Aaron Pham
|
47107727b3
|
feat(vllm): squeezellm (#588)
* feat(vllm): squeezellm
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* fix: correct import_model with awq and gatekeep squeezellm for PyTorch
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-11-08 07:21:27 -05:00 |
|
Aaron Pham
|
85a7243ac3
|
fix: device imports using strategies (#584)
* fix: device imports using strategies
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* chore: support trust_remote_code for vLLM runners
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-11-08 05:10:50 -05:00 |
|
Aaron Pham
|
cfd09bfc47
|
chore(runner): yield the outputs directly (#573)
update openai client examples to >1
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-07 22:34:11 -05:00 |
|
Aaron Pham
|
e2029c934b
|
perf: unify LLM interface (#518)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-06 20:39:43 -05:00 |
|