Commit Graph

391 Commits

Author SHA1 Message Date
aarnphm-ec2-dev
a01d867bc7 chore(base): add auto-gptq CUDA kernel
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 02:40:06 +00:00
Aaron
af64a6dfd5 chore(docs): update to obsidian README format
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-02 21:49:33 -04:00
aarnphm-ec2-dev
b349820429 fix(build): add `--device` into envvar
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-03 00:44:40 +00:00
Aaron Pham
cfc7f3888d chore(vllm): add all supported models (#179)
2023-08-02 17:42:02 -04:00
Aaron Pham
72337410cf fix: nightly resolver for correct tag (#177)
2023-08-02 13:10:50 -04:00
Aaron
d4fbfa5e5c fix: custom release strategy for correct naming
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-02 03:03:21 -04:00
Aaron Pham
acb81a6e1a fix(build): dispatch container via workflow calls (#174)
add OPENLLM_USE_LOCAL_LATEST as default behaviour within container
2023-08-02 01:54:10 -04:00
pre-commit-ci[bot]
c2ed1d56da chore(release): update base container restriction (#173)
Prepare for 0.2.12 release

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 15:25:17 -04:00
Aaron
6ba8899743 fix: remove invalid OPENLLMDEVDEBUG envvar
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 01:52:08 -04:00
Aaron
961455c762 fix(cli): always --force on --push
feat: add --bento-version for ``openllm build``

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 00:56:46 -04:00
Aaron
ca5e3c7ae5 fix: correct setup property for envvar instance
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-31 23:34:42 -04:00
Aaron Pham
729e423b17 chore(bnb): filter warnings message on CPU (#170)
2023-07-31 15:48:59 -04:00
Aaron Pham
8c2867d26d style: define experimental guidelines (#168)
2023-07-31 07:54:26 -04:00
Aaron Pham
ef94c6b98a feat(container): vLLM build and base image strategies (#142)
2023-07-31 02:44:52 -04:00
aarnphm-ec2-dev
fc66ff275b fix: make sure to add torch to dependencies
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-28 00:01:52 +00:00
Aaron Pham
15640a85cd feat: supports embeddings for T5 and ChatGLM family generation (#153)
2023-07-27 16:43:43 -04:00
Aaron Pham
e075bd25ea chore: add NousResearch's as non-gated Llama (#152)
2023-07-27 15:30:56 -04:00
aarnphm-ec2-dev
6dc0bf0b12 fix: remove breakpoint on CLI
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-25 16:30:16 +00:00
aarnphm-ec2-dev
b23b59e1c9 fix(embeddings): correctly set JSON data via CLI client
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-25 16:26:01 +00:00
Aaron Pham
1940086bec feat(client): embeddings (#146)
2023-07-25 05:44:21 -04:00
Aaron Pham
dcd34bd381 fix(build): running bento inside container (#141)
Behaviour of `docker run` should be the same with `openllm start`
2023-07-25 04:24:28 -04:00
Aaron Pham
c391717226 feat(ci): automatic release semver + git archival installation (#143)
2023-07-25 04:18:49 -04:00
Aaron Pham
5635ce8d87 infra: bump to dev version of 0.2.10.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-24 23:35:04 +00:00
Aaron Pham
fb656164e1 infra: prepare for release 0.2.9 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-24 23:24:09 +00:00
aarnphm-ec2-dev
084786c898 fix(cli): `openllm models` for showing available
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-24 23:00:03 +00:00
Aaron Pham
e72f0d55f4 infra: bump to dev version of 0.2.9.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-24 19:58:13 +00:00
Aaron Pham
23a8ae44ed infra: prepare for release 0.2.8 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-24 19:44:11 +00:00
Aaron Pham
7eabcd4355 feat: vLLM integration for PagedAttention (#134)
2023-07-24 15:42:17 -04:00
aarnphm-ec2-dev
4cd0784ee2 chore: export generation items for lazy loading
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-23 08:01:55 +00:00
aarnphm-ec2-dev
e2cdd767ef chore(cli): simplify table for `openllm models`
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-23 06:29:58 +00:00
Aaron Pham
693631958a feat(service): provisional API (#133)
2023-07-23 02:15:39 -04:00
Aaron Pham
d88b069160 infra: bump to dev version of 0.2.8.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-23 01:21:32 +00:00
Aaron Pham
b74bea36a7 infra: prepare for release 0.2.7 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-23 01:10:45 +00:00
aarnphm-ec2-dev
99bb0e4446 fix(serialisation): using save_pretrained with import_model
Fix llm_post_init correct wrapper behaviour

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-23 01:07:39 +00:00
aarnphm-ec2-dev
d4f3cf8b75 fix(llm): ignore quantization config when --quantize int4 is passed
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-22 22:45:46 +00:00
aarnphm-ec2-dev
6f4c58175d chore(llm): add envvar for making tag
the envvar is OPENLLM_USE_LOCAL_LATEST

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-22 21:37:19 +00:00
Aaron Pham
57a0fec247 infra: bump to dev version of 0.2.7.dev0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-22 21:31:19 +00:00
Aaron Pham
71689e506d infra: prepare for release 0.2.6 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-22 21:19:04 +00:00
Aaron Pham
19f20c7dad perf(serialisation): implement wrapper to reduce callstack (#132)
2023-07-22 17:15:03 -04:00
Aaron
ecf31e90b7 chore(configuration): remove unused call
to remove one call in the call stack

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-22 15:25:44 -04:00
aarnphm-ec2-dev
beb8c2bb08 fix(ft): set report_to none to avoid wandb setup
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-21 18:33:38 +00:00
Aaron
5d2dd470d0 infra: bump to dev version of 0.2.6.dev0 [generated] [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-21 14:16:48 -04:00
Aaron Pham
d49ff95f7f infra: prepare for release 0.2.5 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-21 17:59:21 +00:00
Aaron Pham
81b0451685 feat(cli): query with per request instruction (#130)
2023-07-21 13:57:21 -04:00
aarnphm-ec2-dev
aa32bfcc4d infra: bump to dev version of 0.2.5.dev0 [generated]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-21 15:45:13 +00:00
Aaron Pham
6b61217523 infra: prepare for release 0.2.4 [generated]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-21 08:21:36 +00:00
aarnphm-ec2-dev
e4ac0ed8b7 fix(cuda): support loading in single GPU
add available_devices for getting # of available GPUs

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-21 08:10:01 +00:00
aarnphm-ec2-dev
033358a991 infra: bump to dev version of 0.2.4.dev0 [generated]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-21 06:31:49 +00:00
Aaron Pham
e5cada218a infra: prepare for release 0.2.3 [generated]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-07-21 03:39:34 +00:00
Aaron
9ccbd60584 revert: include configuration to labels
This is used for starting up the bento

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-20 23:37:25 -04:00