aarnphm-ec2-dev
2e453fb005
refactor(configuration): __config__ and perf
...
move model_ids and default_id to config class declaration,
cleanup dependencies between config and LLM implementation
lazy load module during LLM creation to llm_post_init
fix post_init hooks to run load_in_mha.
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-11 12:53:15 +00:00
aarnphm-ec2-dev
17241292da
feat(cli): show runtime implementation
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-11 05:29:40 +00:00
Aaron
06c90c0ba3
docs: update matrix [generated]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-11 00:48:09 -04:00
Aaron Pham [bot]
3177781e50
infra: bump to dev version of 0.0.35.dev0 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
2023-06-11 04:45:24 +00:00
Aaron Pham [bot]
0552b32456
infra: prepare for release 0.0.34 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
v0.0.34
2023-06-11 04:35:30 +00:00
aarnphm-ec2-dev
a5efb7fcb1
fix(stablelm): running on GPU
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-11 04:28:22 +00:00
aarnphm-ec2-dev
8762a56093
revert: broken KeyboardInterrupt change
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-11 04:20:07 +00:00
aarnphm-ec2-dev
512cd0715c
feat(service): implementing with lifecycle hooks
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-11 04:14:18 +00:00
aarnphm-ec2-dev
5a7942574f
chore(docs): update docs for to_runner
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-11 03:38:56 +00:00
Aaron
6a937d8b51
feat(scheduling): custom GPU offload strategy
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 22:57:54 -04:00
Aaron
b22468e8c4
feat(cli): openllm models --show-available
...
show available models locally
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 20:46:11 -04:00
Aaron
7d71246322
fix(stablelm): load with BetterTransformers on CPU only
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 20:46:11 -04:00
aarnphm-ec2-dev
204a7ab7c9
revert(starcoder): quant 8
...
revert 2348946ada
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-10 23:17:42 +00:00
aarnphm-ec2-dev
bb37f7e238
feat(utils): lazy load modules and fix typo
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-10 22:18:37 +00:00
Aaron
05fa34f9e6
refactor: pretrained => model_id
...
I think model_id makes more sense than calling it pretrained
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 17:36:02 -04:00
Aaron
4841051fc5
feat(stablelm): CPU inference
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 07:53:29 -04:00
aarnphm-ec2-dev
53296111d0
fix(gpu): enable device_map 'auto' for multi-GPU setups only
...
The device_map value 'auto' is a special value that spreads the model
across all available GPUs. It should usually be set only when multiple
GPUs are available.
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 07:41:30 -04:00
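A minimal sketch of the rule described in the fix(gpu) entry above, not the project's actual code. The helper name `pick_device_map` and the `kwargs` dictionary are hypothetical; the GPU count is passed in as a plain integer so the example runs without any GPU library installed (in practice it would typically come from something like `torch.cuda.device_count()`).

```python
from typing import Optional


def pick_device_map(gpu_count: int) -> Optional[str]:
    """Hypothetical helper: return "auto" only when more than one GPU is visible."""
    # With zero or one GPU, return None so the caller can place the model itself.
    return "auto" if gpu_count > 1 else None


# Assumed usage: only forward device_map to the model loader when it is set.
kwargs = {}
device_map = pick_device_map(gpu_count=2)
if device_map is not None:
    kwargs["device_map"] = device_map
```

Returning None rather than a default string keeps the single-GPU and CPU paths free of the key entirely, which is one way to read "multi-GPU setups only".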
Aaron Pham [bot]
66a87ef0b7
infra: bump to dev version of 0.0.34.dev0 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
2023-06-10 10:19:02 +00:00
Aaron Pham [bot]
56f50deab6
infra: prepare for release 0.0.33 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
v0.0.33
2023-06-10 10:09:12 +00:00
aarnphm-ec2-dev
2348946ada
fix(starcoder): disable quant 8
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-10 10:01:43 +00:00
aarnphm-ec2-dev
4db141c649
feat(gpu): support passing GPU per LLM
...
respect CUDA_VISIBLE_DEVICES and optionally --device
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-10 09:47:16 +00:00
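A rough sketch of how "respect CUDA_VISIBLE_DEVICES and optionally --device" from the feat(gpu) entry above could be read, under stated assumptions. The function `resolve_devices` is hypothetical, and the precedence (an explicit device list wins over the environment variable) is an assumption, not something the commit message confirms.

```python
import os
from typing import List, Mapping, Optional, Sequence


def resolve_devices(
    cli_devices: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: pick GPU ids from an explicit list or the environment."""
    env = os.environ if env is None else env
    # Assumption: ids given explicitly (e.g. via --device) take precedence.
    if cli_devices:
        return [d.strip() for d in cli_devices if d.strip()]
    # Otherwise fall back to the comma-separated CUDA_VISIBLE_DEVICES value.
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [d.strip() for d in raw.split(",") if d.strip()]
```

For example, `resolve_devices(None, {"CUDA_VISIBLE_DEVICES": "0,1"})` yields the two ids from the environment, while passing `["2"]` as the first argument ignores the environment under the precedence assumed here.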
aarnphm-ec2-dev
ebfed3c116
fix(chatglm): generation tokens not concatenated correctly
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-10 09:46:33 +00:00
Aaron
d70530cb0e
chore: add stubs for deepmerge [generated]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 03:04:56 -04:00
aarnphm-ec2-dev
8fbf352ec6
docs: add more information about pretrained weights
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-10 06:58:36 +00:00
aarnphm-ec2-dev
c669d38dea
fix(flan-t5): casting model to CUDA
...
Add a note about GPU support for Flax
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 02:55:55 -04:00
Aaron
afddaed08c
fix(perf): respect per request information
...
remove use_default_prompt_template options
add pretrained to list of start help docstring
fix flax generation config
improve flax and tensorflow implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 02:14:13 -04:00
Aaron
e90d90e9a0
feat(docs): copy button from table list
...
The script now generates an HTML table, which allows us to use the
copy button in README.md
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 01:23:56 -04:00
Aaron
7d382ced4f
chore(docs): update notes about flan-t5
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 00:22:12 -04:00
Chaoyu
9ffe1f40bf
chore: rename LICENSE to LICENSE.md
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 00:21:54 -04:00
Chaoyu
e2b26adf2f
chore(docs): update README.md
...
See #12
2023-06-10 00:21:21 -04:00
Aaron
1597d5d4bb
chore(readme): update stablelm [generated]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 00:21:21 -04:00
Aaron
bca133f389
revert: update metadata for Python 3.8 and 3.9
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 00:21:20 -04:00
Aaron Pham [bot]
11cedce974
infra: bump to dev version of 0.0.33.dev0 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-10 00:21:20 -04:00
Aaron Pham [bot]
03ac525949
infra: prepare for release 0.0.32 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
v0.0.32
2023-06-09 19:05:09 +00:00
Aaron
9bbe1ff4bf
chore(stablelm): make stablelm run explicitly with GPU
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-09 14:57:12 -04:00
Aaron
c51e944cb2
chore(version): remove support for 3.8 and 3.9 for now
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 22:47:57 -04:00
Aaron
b72317db67
fix(import): lazy load torch
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 22:05:41 -04:00
Aaron
16df0f4393
chore(infra): increase timeout to 60m
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 18:18:51 -04:00
Aaron Pham [bot]
d005760c68
infra: bump to dev version of 0.0.32.dev0 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
2023-06-08 22:15:29 +00:00
Aaron Pham [bot]
e2813f843e
infra: prepare for release 0.0.31 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
v0.0.31
2023-06-08 22:04:19 +00:00
Aaron
ebe5ae797e
fix(script): avoid using private variable
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 17:59:06 -04:00
Aaron
f5edd4fcf4
feat(script): add easy script to release
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 17:52:39 -04:00
Aaron
f284c64370
docs: update release-notes run with ref for tags
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 17:18:23 -04:00
aarnphm-ec2-dev
acf78ce731
fix(saving): make sure to cleanup cuda cache after using default import
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-08 21:11:07 +00:00
Aaron Pham [bot]
a451b03a0a
infra: bump to dev version of 0.0.31.dev0 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
2023-06-08 21:10:01 +00:00
Aaron Pham [bot]
55d584a986
infra: prepare for release 0.0.30 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
v0.0.30
2023-06-08 20:55:39 +00:00
aarnphm-ec2-dev
2f9bd2f6fe
fix(packaging): make sure to add BENTOML_CONFIG_OPTIONS into Dockerfile
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-06-08 20:33:20 +00:00
Aaron
71198b66cc
revert: move release-notes to separate actions
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 16:03:41 -04:00
Aaron
1902954463
infra: bump to dev version of 0.0.30.dev0 [generated]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-06-08 16:03:36 -04:00
Aaron Pham [bot]
2db7663ba5
infra: prepare for release 0.0.29 [generated]
...
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com >
v0.0.29
2023-06-08 19:56:51 +00:00