Aaron Pham
|
c74f3de6c7
|
chore: update typing to hijack compliant
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-07-10 20:19:41 -04:00 |
|
Aaron Pham
|
e1675652d1
|
chore: add repo default utils
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-07-10 18:01:15 -04:00 |
|
Aaron Pham
|
3fbb75f7e9
|
chore: add instruction to access chat URL
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-07-09 22:56:27 -04:00 |
|
Aaron Pham
|
f4d822125e
|
chore: ready for 0.6 releases
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-07-09 22:05:43 -04:00 |
|
Aaron Pham
|
cd872ef631
|
refactor: monorepo (#203)
|
2023-08-15 02:11:14 -04:00 |
|
pre-commit-ci[bot]
|
2d33100d72
|
ci: pre-commit autoupdate [pre-commit.ci] (#207)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-08-14 19:20:05 -04:00 |
|
Aaron Pham
|
f6317d8003
|
infra: enable compiled wheels for all supported Python (#201)
|
2023-08-12 04:54:50 -04:00 |
|
aarnphm-ec2-dev
|
dc776e9c5a
|
chore(autogptq): update to latest commits
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-11 11:02:05 +00:00 |
|
Aaron
|
785c1db237
|
fix(client): include openllm.client into main module [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-11 06:19:56 -04:00 |
|
Aaron Pham
|
5329853b10
|
perf: compiled modules and enable lazyeval (#200)
|
2023-08-11 05:53:45 -04:00 |
|
Aaron Pham
|
c083990edd
|
infra: migrate to initial openllm-node library (#199)
|
2023-08-10 18:54:00 -04:00 |
|
aarnphm-ec2-dev
|
034610b6b0
|
fix(embeddings): correct imports
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-10 22:47:38 +00:00 |
|
aarnphm-ec2-dev
|
689b83bbe3
|
fix(loading): make sure not to load to cuda with kbit quantisation
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-10 19:39:01 +00:00 |
|
Aaron
|
e0daea6e78
|
fix(compile): absolute import for compiled wheels
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-09 22:51:35 -04:00 |
|
aarnphm-ec2-dev
|
dfc4b489c5
|
feat(build): notes on compiled wheels for Bento
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-09 21:52:34 +00:00 |
|
Aaron Pham
|
b1445c6516
|
refactor(cli): compiled wheels and extension modules (#191)
|
2023-08-09 17:10:15 -04:00 |
|
Aaron Pham
|
b9dd54f634
|
feat: homebrew tap (#190)
|
2023-08-08 22:11:48 -04:00 |
|
aarnphm-ec2-dev
|
deaee67b47
|
fix(loading): make sure to cast the model to cuda if PyTorch
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-09 01:42:11 +00:00 |
|
aarnphm-ec2-dev
|
ae35ee8115
|
fix(build): set legacy serialisation for vllm on Bento
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-08 20:10:49 +00:00 |
|
Aaron Pham
|
2d47a54efd
|
feat(strategy): spawn one runner instance (#189)
|
2023-08-08 05:47:11 -04:00 |
|
Aaron Pham
|
cb6f3aa48e
|
feat: --force-push to allow force push to bentocloud (#188)
|
2023-08-08 01:06:59 -04:00 |
|
Aaron
|
371a7c896c
|
fix: loading models within k8s API server
Remove the logic where the API server tries to load the model when it is
not available locally.
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-08 00:22:48 -04:00 |
|
Aaron Pham
|
21ea7e493f
|
feat(generation): initial work for generating tokens (#186)
|
2023-08-06 20:04:40 -04:00 |
|
Aaron Pham
|
2d5be909cd
|
fix(models): setup xformers and loading PyTorch meta weights (#185)
|
2023-08-06 03:25:02 -04:00 |
|
Aaron Pham
|
4875c3a109
|
feat: optimize model saving and loading on single GPU (#183)
|
2023-08-06 01:00:49 -04:00 |
|
Aaron
|
90072ec5ee
|
fix(regression): setting quantize only if it is not None
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-04 12:40:55 -04:00 |
|
Aaron
|
ba07205156
|
fix: disable building xformers from source
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-04 12:14:04 -04:00 |
|
Aaron
|
287b7f9ab2
|
fix: releases issue when building new container [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-04 11:31:02 -04:00 |
|
Aaron
|
1e74e967d1
|
fix(container): correct cache directory
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-04 10:31:06 -04:00 |
|
Aaron Pham
|
2541a0f8dc
|
infra: initial work on compiling mypyc wheels (#182)
|
2023-08-04 10:20:03 -04:00 |
|
Aaron Pham
|
2cc264aa72
|
fix(vllm): correctly load given model id from envvar (#181)
|
2023-08-03 16:34:35 -04:00 |
|
Aaron
|
db8e47bc5b
|
fix(build): correct module type for stubs and strip assert [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-03 04:15:55 -04:00 |
|
Aaron
|
8f74e24c2f
|
fix: clone all for nightly strategy
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-03 03:17:18 -04:00 |
|
aarnphm-ec2-dev
|
29ca9f398f
|
fix: add arch_list for cross compiling
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-03 04:33:48 +00:00 |
|
aarnphm-ec2-dev
|
a01d867bc7
|
chore(base): add auto-gptq CUDA kernel
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-03 02:40:06 +00:00 |
|
Aaron
|
af64a6dfd5
|
chore(docs): update to obsidian README format
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-02 21:49:33 -04:00 |
|
aarnphm-ec2-dev
|
b349820429
|
fix(build): add `--device` into envvar
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-08-03 00:44:40 +00:00 |
|
Aaron Pham
|
cfc7f3888d
|
chore(vllm): add all supported models (#179)
|
2023-08-02 17:42:02 -04:00 |
|
Aaron Pham
|
72337410cf
|
fix: nightly resolver for correct tag (#177)
|
2023-08-02 13:10:50 -04:00 |
|
Aaron
|
d4fbfa5e5c
|
fix: custom release strategy for correct naming
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-02 03:03:21 -04:00 |
|
Aaron Pham
|
acb81a6e1a
|
fix(build): dispatch container via workflow calls (#174)
add OPENLLM_USE_LOCAL_LATEST as default behaviour within container
|
2023-08-02 01:54:10 -04:00 |
|
pre-commit-ci[bot]
|
c2ed1d56da
|
chore(release): update base container restriction (#173)
Prepare for 0.2.12 release
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-01 15:25:17 -04:00 |
|
Aaron
|
6ba8899743
|
fix: remove invalid OPENLLMDEVDEBUG envvar
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-01 01:52:08 -04:00 |
|
Aaron
|
961455c762
|
fix(cli): always --force on --push
feat: add --bento-version for ``openllm build``
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-08-01 00:56:46 -04:00 |
|
Aaron
|
ca5e3c7ae5
|
fix: correct setup property for envvar instance
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-07-31 23:34:42 -04:00 |
|
Aaron Pham
|
729e423b17
|
chore(bnb): filter warnings message on CPU (#170)
|
2023-07-31 15:48:59 -04:00 |
|
Aaron Pham
|
8c2867d26d
|
style: define experimental guidelines (#168)
|
2023-07-31 07:54:26 -04:00 |
|
Aaron Pham
|
ef94c6b98a
|
feat(container): vLLM build and base image strategies (#142)
|
2023-07-31 02:44:52 -04:00 |
|
aarnphm-ec2-dev
|
fc66ff275b
|
fix: make sure to add torch to dependencies
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-07-28 00:01:52 +00:00 |
|
Aaron Pham
|
15640a85cd
|
feat: supports embeddings for T5 and ChatGLM family generation (#153)
|
2023-07-27 16:43:43 -04:00 |
|