Aaron Pham
|
cfd09bfc47
|
chore(runner): yield the outputs directly (#573)
update openai client examples to >1
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-07 22:34:11 -05:00 |
|
Aaron Pham
|
e2029c934b
|
perf: unify LLM interface (#518)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-06 20:39:43 -05:00 |
|
Aaron Pham
|
72c6005d3b
|
chore(inference): update vllm to 0.2.1.post1 and update config parsing (#554)
chore(dependencies): update vllm to 0.2.1.post1 and update config parsing
|
2023-11-04 04:01:56 -04:00 |
|
aarnphm-ec2-dev
|
65c76cace3
|
chore: update deps for transformers and vllm
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-10-11 04:28:46 +00:00 |
|
XunchaoZ
|
04bb29a264
|
feat: OpenAI-compatible API (#417)
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-10-07 00:50:03 -04:00 |
|
Aaron Pham
|
ad9107958d
|
feat: continuous batching with vLLM (#349)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* feat: continuous batching
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
* chore: add changelog
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
* chore: add one shot generation
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-14 03:09:36 -04:00 |
|
Aaron Pham
|
35e6945e86
|
fix(serialisation): vLLM safetensors support (#324)
* fix(serialisation): vllm support for safetensors
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* chore: running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: generalize one shot generation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: add changelog [skip ci]
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
|
2023-09-12 17:44:01 -04:00 |
|
Aaron
|
0d50aa00b9
|
chore: add openllm-core as meta dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-07 10:31:40 -04:00 |
|
aarnphm-ec2-dev
|
8173cb09a5
|
fix(quantize): dyn quant for int8 and int4
only set tokenizer when it is gptq
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-09-07 01:48:45 +00:00 |
|
Aaron
|
887ffa9aa0
|
chore: cleanup pre-commit jobs and update usage
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-05 10:06:36 -04:00 |
|
Aaron Pham
|
956b3a53bc
|
fix(gptq): use upstream integration (#297)
* wip
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* feat: GPTQ transformers integration
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* fix: only load if variable is available and add changelog
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* chore: remove boilerplate check
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-09-04 14:05:50 -04:00 |
|
Aaron Pham
|
2036d4e015
|
chore(build): use latest vllm pre-built kernel (#261)
|
2023-08-26 09:02:52 -04:00 |
|
aarnphm-ec2-dev
|
dae38cdba1
|
chore: update external dependencies [skip ci]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-25 09:27:26 +00:00 |
|
Aaron Pham
|
3ffb25a872
|
refactor: packages (#249)
|
2023-08-22 08:55:46 -04:00 |
|
Aaron Pham
|
4140d160b8
|
feat(embedding): Adding generic endpoint (#227)
|
2023-08-17 15:17:00 -04:00 |
|
Aaron Pham
|
ccca49af04
|
fix(ci): remove broken build hooks (#216)
|
2023-08-16 04:49:12 -04:00 |
|
Aaron
|
af8cb73832
|
fix: latest vllm build
sync changelog with monorepo for sdist installation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-16 04:03:34 -04:00 |
|
Aaron
|
78ae2b3843
|
fix(metadata): hooks for metadata pypi [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-15 06:15:51 -04:00 |
|
Aaron
|
43740aca8b
|
fix(metadata): include hatch-fancy-pypi-readme into subdir [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-15 05:06:48 -04:00 |
|
Aaron
|
accc8d0d15
|
fix: editable install
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-15 03:57:53 -04:00 |
|
Aaron Pham
|
cd872ef631
|
refactor: monorepo (#203)
|
2023-08-15 02:11:14 -04:00 |
|