Commit Graph

613 Commits

Author SHA1 Message Date
Aaron
aaa8ec433c chore(ci): running pyright last
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-08 22:19:08 -04:00
Aaron
21143fdfab fix(brew): set correct url for release
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-08 22:18:26 -04:00
Aaron Pham
b9dd54f634 feat: homebrew tap (#190)
2023-08-08 22:11:48 -04:00
aarnphm-ec2-dev
deaee67b47 fix(loading): make sure to cast the model to cuda if PyTorch
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-09 01:42:11 +00:00
aarnphm-ec2-dev
ae35ee8115 fix(build): set legacy serialisation for vllm on Bento
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-08 20:10:49 +00:00
Aaron Pham
2d47a54efd feat(strategy): spawn one runner instance (#189)
2023-08-08 05:47:11 -04:00
Aaron Pham
9c3019d236 infra: bump to dev version of 0.2.18.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-08-08 05:44:51 +00:00
Aaron Pham
126491f272 infra: prepare for release 0.2.17 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
v0.2.17
2023-08-08 05:34:43 +00:00
Aaron Pham
cb6f3aa48e feat: --force-push to allow force push to bentocloud (#188)
2023-08-08 01:06:59 -04:00
Aaron
371a7c896c fix: loading models within k8s API server
remove a logic where the API server tries to load the model when it is
not available locally

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-08 00:22:48 -04:00
pre-commit-ci[bot]
0139613f3c ci: pre-commit autoupdate [pre-commit.ci] [skip ci] (#187)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-07 18:02:26 -04:00
Aaron Pham
21ea7e493f feat(generation): initial work for generating tokens (#186)
2023-08-06 20:04:40 -04:00
Aaron Pham
2d5be909cd fix(models): setup xformers and loading PyTorch meta weights (#185)
2023-08-06 03:25:02 -04:00
Aaron
96b25842d1 chore(docs): update security notes obsidian style [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-06 02:02:08 -04:00
Aaron Pham
74a928f6f3 chore: CODE_OF_CONDUCT.md [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-08-06 01:56:08 -04:00
Aaron Pham
752de09626 fix(ci): update version correctly [skip ci] (#184)
2023-08-06 01:18:33 -04:00
Aaron Pham
4875c3a109 feat: optimize model saving and loading on single GPU (#183)
2023-08-06 01:00:49 -04:00
aarnphm-ec2-dev
8bba90f611 fix: add release for correct CI version [skip ci]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-05 07:41:50 +00:00
Aaron Pham
82d7ab67f3 infra: bump to dev version of ..1.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-08-04 16:56:48 +00:00
Aaron Pham
f68beb5ccb infra: prepare for release 0.2.16 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
v0.2.16
2023-08-04 16:43:29 +00:00
Aaron
90072ec5ee fix(regression): setting quantize only if it is not None
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 12:40:55 -04:00
Aaron
ba07205156 fix: disable building xformers from source
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 12:14:04 -04:00
Aaron
794719670e chore: update README [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 12:10:21 -04:00
Aaron Pham
cdc6bae0e9 infra: bump to dev version of ..1.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-08-04 15:47:20 +00:00
Aaron Pham
9d1476e360 infra: prepare for release 0.2.15 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
v0.2.15
2023-08-04 15:32:47 +00:00
Aaron
287b7f9ab2 fix: releases issue when building new container [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 11:31:02 -04:00
Aaron
20deb3354d infra: bump to dev version of 0.2.15.dev0 [generated] [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 11:11:14 -04:00
Aaron Pham
cb05446760 infra: prepare for release 0.2.14 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
v0.2.14
2023-08-04 14:50:59 +00:00
Aaron
975a1d0349 fix: remove tokens for release [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 10:49:06 -04:00
Aaron
1e74e967d1 fix(container): correct cache directory
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 10:31:06 -04:00
Aaron Pham
2541a0f8dc infra: initial work on compiling mypyc wheels (#182)
2023-08-04 10:20:03 -04:00
Aaron Pham
2cc264aa72 fix(vllm): correctly load given model id from envvar (#181)
2023-08-03 16:34:35 -04:00
Aaron
db8e47bc5b fix(build): correct module type for stubs and strip assert [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-03 04:15:55 -04:00
Aaron
8f74e24c2f fix: clone all for nightly strategy
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-03 03:17:18 -04:00
Aaron
b949106daf fix(ci): rename runner name [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-03 02:24:45 -04:00
Aaron Pham
e9eff70978 infra: bump to dev version of 0.2.14.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-08-03 06:18:57 +00:00
Aaron Pham
8428692d45 infra: prepare for release 0.2.13 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
v0.2.13
2023-08-03 06:06:09 +00:00
Aaron
cac7a19be9 fix(build): to run on tags [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-03 02:00:13 -04:00
aarnphm-ec2-dev
29ca9f398f fix: add arch_list for cross compiling
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 04:33:48 +00:00
Aaron
f5eb21ede0 revert: "chore(aws): use g4dn for more availability"
This reverts commit a06464bdc7.
2023-08-02 23:55:29 -04:00
aarnphm-ec2-dev
a01d867bc7 chore(base): add auto-gptq CUDA kernel
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 02:40:06 +00:00
aarnphm-ec2-dev
820b4991fa chore(stubs): add generated for auto-gptq and vllm [skip ci]
This is to help with working on CPU machine

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 02:28:24 +00:00
aarnphm-ec2-dev
a06464bdc7 chore(aws): use g4dn for more availability
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 02:17:37 +00:00
Aaron
af64a6dfd5 chore(docs): update to obsidian README format
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-02 21:49:33 -04:00
aarnphm-ec2-dev
b349820429 fix(build): add `--device` into envvar
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 00:44:40 +00:00
Aaron Pham
cfc7f3888d chore(vllm): add all supported models (#179)
2023-08-02 17:42:02 -04:00
Aaron Pham
72337410cf fix: nightly resolver for correct tag (#177)
2023-08-02 13:10:50 -04:00
Aaron
d4fbfa5e5c fix: custom release strategy for correct naming
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-02 03:03:21 -04:00
Aaron Pham
acb81a6e1a fix(build): dispatch container via workflow calls (#174)
add OPENLLM_USE_LOCAL_LATEST as default behaviour within container
2023-08-02 01:54:10 -04:00
Aaron
f989ebd4b9 infra: bump to dev version of 0.2.13.dev0 [generated] [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 19:52:56 -04:00