Commit Graph

421 Commits

Author SHA1 Message Date
Aaron Pham
c74f3de6c7 chore: update typing to hijack compliant
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-07-10 20:19:41 -04:00
Aaron Pham
e1675652d1 chore: add repo default utils
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-07-10 18:01:15 -04:00
Aaron Pham
3fbb75f7e9 chore: add instruction to access chat URL
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-07-09 22:56:27 -04:00
Aaron Pham
f4d822125e chore: ready for 0.6 releases
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-07-09 22:05:43 -04:00
Aaron Pham
cd872ef631 refactor: monorepo (#203)
2023-08-15 02:11:14 -04:00
pre-commit-ci[bot]
2d33100d72 ci: pre-commit autoupdate [pre-commit.ci] (#207)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-14 19:20:05 -04:00
Aaron Pham
f6317d8003 infra: enable compiled wheels for all supported Python (#201)
2023-08-12 04:54:50 -04:00
aarnphm-ec2-dev
dc776e9c5a chore(autogptq): update to latest commits
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-11 11:02:05 +00:00
Aaron
785c1db237 fix(client): include openllm.client into main module [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-11 06:19:56 -04:00
Aaron Pham
5329853b10 perf: compiled modules and enable lazyeval (#200)
2023-08-11 05:53:45 -04:00
Aaron Pham
c083990edd infra: migrate to initial openllm-node library (#199)
2023-08-10 18:54:00 -04:00
aarnphm-ec2-dev
034610b6b0 fix(embeddings): correct imports
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-10 22:47:38 +00:00
aarnphm-ec2-dev
689b83bbe3 fix(loading): make sure not to load to cuda with kbit quantisation
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-10 19:39:01 +00:00
Aaron
e0daea6e78 fix(compile): absolute import for compiled wheels
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-09 22:51:35 -04:00
aarnphm-ec2-dev
dfc4b489c5 feat(build): notes on compiled wheels for Bento
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-09 21:52:34 +00:00
Aaron Pham
b1445c6516 refactor(cli): compiled wheels and extension modules (#191)
2023-08-09 17:10:15 -04:00
Aaron Pham
b9dd54f634 feat: homebrew tap (#190)
2023-08-08 22:11:48 -04:00
aarnphm-ec2-dev
deaee67b47 fix(loading): make sure to cast the model to cuda if PyTorch
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-09 01:42:11 +00:00
aarnphm-ec2-dev
ae35ee8115 fix(build): set legacy serialisation for vllm on Bento
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-08 20:10:49 +00:00
Aaron Pham
2d47a54efd feat(strategy): spawn one runner instance (#189)
2023-08-08 05:47:11 -04:00
Aaron Pham
cb6f3aa48e feat: --force-push to allow force push to bentocloud (#188)
2023-08-08 01:06:59 -04:00
Aaron
371a7c896c fix: loading models within k8s API server
remove a logic where the API server tries to load the model when it is
not available locally

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-08 00:22:48 -04:00
Aaron Pham
21ea7e493f feat(generation): initial work for generating tokens (#186)
2023-08-06 20:04:40 -04:00
Aaron Pham
2d5be909cd fix(models): setup xformers and loading PyTorch meta weights (#185)
2023-08-06 03:25:02 -04:00
Aaron Pham
4875c3a109 feat: optimize model saving and loading on single GPU (#183)
2023-08-06 01:00:49 -04:00
Aaron
90072ec5ee fix(regression): setting quantize only if it is not None
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 12:40:55 -04:00
Aaron
ba07205156 fix: disable building xformers from source
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 12:14:04 -04:00
Aaron
287b7f9ab2 fix: releases issue when building new container [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 11:31:02 -04:00
Aaron
1e74e967d1 fix(container): correct cache directory
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-04 10:31:06 -04:00
Aaron Pham
2541a0f8dc infra: initial work on compiling mypyc wheels (#182)
2023-08-04 10:20:03 -04:00
Aaron Pham
2cc264aa72 fix(vllm): correctly load given model id from envvar (#181)
2023-08-03 16:34:35 -04:00
Aaron
db8e47bc5b fix(build): correct module type for stubs and strip assert [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-03 04:15:55 -04:00
Aaron
8f74e24c2f fix: clone all for nightly strategy
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-03 03:17:18 -04:00
aarnphm-ec2-dev
29ca9f398f fix: add arch_list for cross compiling
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 04:33:48 +00:00
aarnphm-ec2-dev
a01d867bc7 chore(base): add auto-gptq CUDA kernel
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 02:40:06 +00:00
Aaron
af64a6dfd5 chore(docs): update to obsidian README format
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-02 21:49:33 -04:00
aarnphm-ec2-dev
b349820429 fix(build): add `--device` into envvar
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 00:44:40 +00:00
Aaron Pham
cfc7f3888d chore(vllm): add all supported models (#179)
2023-08-02 17:42:02 -04:00
Aaron Pham
72337410cf fix: nightly resolver for correct tag (#177)
2023-08-02 13:10:50 -04:00
Aaron
d4fbfa5e5c fix: custom release strategy for correct naming
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-02 03:03:21 -04:00
Aaron Pham
acb81a6e1a fix(build): dispatch container via workflow calls (#174)
add OPENLLM_USE_LOCAL_LATEST as default behaviour within container
2023-08-02 01:54:10 -04:00
pre-commit-ci[bot]
c2ed1d56da chore(release): update base container restriction (#173)
Prepare for 0.2.12 release

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 15:25:17 -04:00
Aaron
6ba8899743 fix: remove invalid OPENLLMDEVDEBUG envvar
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 01:52:08 -04:00
Aaron
961455c762 fix(cli): always --force on --push
feat: add --bento-version for ``openllm build``

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 00:56:46 -04:00
Aaron
ca5e3c7ae5 fix: correct setup property for envvar instance
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-31 23:34:42 -04:00
Aaron Pham
729e423b17 chore(bnb): filter warnings message on CPU (#170)
2023-07-31 15:48:59 -04:00
Aaron Pham
8c2867d26d style: define experimental guidelines (#168)
2023-07-31 07:54:26 -04:00
Aaron Pham
ef94c6b98a feat(container): vLLM build and base image strategies (#142)
2023-07-31 02:44:52 -04:00
aarnphm-ec2-dev
fc66ff275b fix: make sure to add torch to dependencies
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-28 00:01:52 +00:00
Aaron Pham
15640a85cd feat: supports embeddings for T5 and ChatGLM family generation (#153)
2023-07-27 16:43:43 -04:00