Commit Graph

278 Commits

Author SHA1 Message Date
Aaron
ebcedc35de fix(exception): handle notfound explicitly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-13 20:15:38 -04:00
Aaron
0ab7450e90 chore(types): add hints for LLMRunner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-13 20:13:33 -04:00
Aaron
03c90c2a13 fix(llm): ensure we don't bleed runner options
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-13 20:05:33 -04:00
Aaron
e3ccf766d7 chore: expose LLMRunner for type
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-13 19:47:36 -04:00
aarnphm-ec2-dev
1194684658 fix(llm): cached load
Ensure we only load the LLM once.

Fix Falcon load with offloading.

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-13 14:22:09 +00:00
Aaron
74c8323e42 docs: update generated with href
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-13 07:30:43 -04:00
Aaron Pham [bot]
ece2b377c0 infra: bump to dev version of 0.1.3.dev0 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
2023-06-13 11:24:14 +00:00
Aaron Pham [bot]
398ed85b9b infra: prepare for release 0.1.2 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
v0.1.2
2023-06-13 11:14:25 +00:00
Aaron
cb76a894cf feat(metadata): add configuration to metadata endpoint
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-13 07:09:31 -04:00
Aaron Pham
dd20941050 chore: metadata (#19) 2023-06-13 04:09:33 -04:00
Aaron
764d86289c chore(readme): update table with model_ids matrix
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 16:57:40 -04:00
Aaron Pham [bot]
b5547bbc97 infra: bump to dev version of 0.1.2.dev0 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
2023-06-12 20:30:48 +00:00
Aaron Pham [bot]
f85bbec147 infra: prepare for release 0.1.1 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
v0.1.1
2023-06-12 20:19:34 +00:00
Aaron
71070b90b4 chore(metadata): fix model_id to be respected on service.py
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 16:04:52 -04:00
Aaron
4717989384 fix(tokenizers): allow forking by default
Addresses the warning message about forking emitted by the tokenizers library.

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 15:47:51 -04:00
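The commit above addresses the Hugging Face tokenizers warning that fires when a process forks after the Rust-backed tokenizer has used its thread pool. A minimal sketch of the usual remedy, setting the real `TOKENIZERS_PARALLELISM` environment variable before workers fork (the helper name here is hypothetical, not OpenLLM's actual code):

```python
import os

def allow_forking_by_default() -> str:
    # The tokenizers library warns when a process forks after its internal
    # thread pool has started. Setting this env var before forking silences
    # the warning by disabling tokenizer parallelism.
    # setdefault keeps any value the user has already chosen.
    return os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")

parallelism = allow_forking_by_default()
```

Using `setdefault` rather than a plain assignment means a user's explicit choice still wins.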
Aaron
aa8812cf90 fix(build): empty model_id
Set the envvar after we initialize the LLM.

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 14:31:34 -04:00
Aaron
30a8c32a53 infra: bump to dev version of 0.1.1.dev0 [generated]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 14:31:34 -04:00
Aaron
53a63dbe78 infra: prepare for release 0.1.0
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
v0.1.0
2023-06-12 14:23:26 -04:00
Aaron Pham
f8ebb36e15 tests: fastpath (#17)
Added fastpath cases for configuration and Flan-T5.

Fixed model_id to be respected in lifecycle hooks.

Updated the CLI to clean up models info.
2023-06-12 14:18:26 -04:00
Chaoyu
187a5f834f docs: add --model-id command (#18)
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-06-12 14:03:36 -04:00
Jian Shen
d3bbb727ea doc: add gif to readme 2023-06-12 15:51:08 +08:00
Aaron
0fc209da72 chore: bump up dependencies for BentoML
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 01:26:25 -04:00
Aaron
f8e99dd8f5 chore(configuration): clean house implementation
Using Attrs implementation

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-11 18:46:15 -04:00
aarnphm-ec2-dev
1847209489 feat(cli): --workers
Provide workers-per-resource configuration on the CLI for build and start.

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 16:22:13 +00:00
aarnphm-ec2-dev
81d46ca211 feat(type): support annotations
openllm.LLM is now fully type-strict.

openllm.LLM[ModelType, TokenizerType] -> self.model, self.tokenizer

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 14:58:17 +00:00
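The commit above makes `openllm.LLM` generic over its model and tokenizer types. A minimal sketch of how such a subscriptable class can be declared with `typing.Generic` (this is an illustration of the pattern, not the real openllm implementation):

```python
from typing import Generic, TypeVar

ModelType = TypeVar("ModelType")
TokenizerType = TypeVar("TokenizerType")

class LLM(Generic[ModelType, TokenizerType]):
    """Subscripting with concrete model and tokenizer types lets a type
    checker infer the types of self.model and self.tokenizer."""

    def __init__(self, model: ModelType, tokenizer: TokenizerType) -> None:
        self.model = model
        self.tokenizer = tokenizer

# With real transformers types one would write e.g.
# LLM[FlaxT5ForConditionalGeneration, T5Tokenizer]; plain strings keep the
# sketch self-contained.
llm: "LLM[str, str]" = LLM("a-model", "a-tokenizer")
```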
aarnphm-ec2-dev
2e453fb005 refactor(configuration): __config__ and perf
Move model_ids and default_id to the config class declaration, and
clean up dependencies between the config and the LLM implementation.

Lazily load modules during LLM creation via llm_post_init.

Fix post_init hooks to run load_in_mha.

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 12:53:15 +00:00
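The refactor above defers module loading to LLM creation. One common way to defer a heavy import until first use is a lazy-module proxy; the `LazyModule` class below is a hypothetical illustration of that pattern, not OpenLLM's code:

```python
import importlib
from types import ModuleType
from typing import Any, Optional

class LazyModule:
    """Defer an import until first attribute access, keeping heavy
    frameworks (torch, flax, ...) out of process startup."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def __getattr__(self, attr: str) -> Any:
        # Only runs for attributes not found on the proxy itself;
        # the real module is imported on the first such access.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# 'json' stands in for a heavy framework module.
lazy_json = LazyModule("json")
```

Until `lazy_json.dumps` (or any other attribute) is touched, no import work happens.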
aarnphm-ec2-dev
17241292da feat(cli): show runtime implementation
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 05:29:40 +00:00
Aaron
06c90c0ba3 docs: update matrix [generated]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-11 00:48:09 -04:00
Aaron Pham [bot]
3177781e50 infra: bump to dev version of 0.0.35.dev0 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:45:24 +00:00
Aaron Pham [bot]
0552b32456 infra: prepare for release 0.0.34 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
v0.0.34
2023-06-11 04:35:30 +00:00
aarnphm-ec2-dev
a5efb7fcb1 fix(stablelm): running on GPU
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:28:22 +00:00
aarnphm-ec2-dev
8762a56093 revert: broken KeyboardInterrupt change
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:20:07 +00:00
aarnphm-ec2-dev
512cd0715c feat(service): implementing with lifecycle hooks
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:14:18 +00:00
aarnphm-ec2-dev
5a7942574f chore(docs): update docs for to_runner
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 03:38:56 +00:00
Aaron
6a937d8b51 feat(scheduling): custom GPU offload strategy
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 22:57:54 -04:00
Aaron
b22468e8c4 feat(cli): openllm models --show-available
show available models locally

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 20:46:11 -04:00
Aaron
7d71246322 fix(stablelm): load with BetterTransformers on CPU only
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 20:46:11 -04:00
aarnphm-ec2-dev
204a7ab7c9 revert(starcoder): quant 8
revert 2348946ada

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 23:17:42 +00:00
aarnphm-ec2-dev
bb37f7e238 feat(utils): lazy load modules and fix typo
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 22:18:37 +00:00
Aaron
05fa34f9e6 refactor: pretrained => model_id
I think model_id makes more sense than calling it pretrained

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 17:36:02 -04:00
Aaron
4841051fc5 feat(stablelm): CPU inference
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 07:53:29 -04:00
aarnphm-ec2-dev
53296111d0 fix(gpu): enable device_map 'auto' to multi-gpu setup only
The 'auto' device_map is a magic value that spreads the model across
all available GPUs, so it should usually only be set when multiple
GPUs are available.

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 07:41:30 -04:00
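The commit above restricts `device_map="auto"` to multi-GPU setups. The decision it describes can be sketched as a small pure function (the name `resolve_device_map` is hypothetical; `"auto"` is the real value understood by transformers/accelerate):

```python
from typing import Optional

def resolve_device_map(available_gpus: int) -> Optional[str]:
    # "auto" tells accelerate/transformers to shard the model across every
    # visible GPU, which only makes sense with more than one device. With a
    # single GPU (or CPU only), leave placement explicit by returning None.
    return "auto" if available_gpus > 1 else None
```

In practice the GPU count would come from something like `torch.cuda.device_count()`; it is passed in here to keep the sketch dependency-free.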
Aaron Pham [bot]
66a87ef0b7 infra: bump to dev version of 0.0.34.dev0 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
2023-06-10 10:19:02 +00:00
Aaron Pham [bot]
56f50deab6 infra: prepare for release 0.0.33 [generated]
Signed-off-by: Aaron Pham [bot] <29749331+aarnphm@users.noreply.github.com>
v0.0.33
2023-06-10 10:09:12 +00:00
aarnphm-ec2-dev
2348946ada fix(starcoder): disable quant 8
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 10:01:43 +00:00
aarnphm-ec2-dev
4db141c649 feat(gpu): support passing GPU per LLM
respect CUDA_VISIBLE_DEVICES and optionally --device

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 09:47:16 +00:00
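The commit above lets each LLM receive its own GPUs, with an explicit `--device` flag taking precedence over `CUDA_VISIBLE_DEVICES`. A sketch of that precedence, assuming a hypothetical helper name (`CUDA_VISIBLE_DEVICES` itself is the standard CUDA env var, a comma-separated list of device indices):

```python
import os
from typing import List, Optional

def requested_gpus(device_flag: Optional[str] = None) -> List[str]:
    """Resolve which GPUs an LLM should use: an explicit --device value
    wins, otherwise fall back to CUDA_VISIBLE_DEVICES, otherwise none."""
    spec = device_flag or os.environ.get("CUDA_VISIBLE_DEVICES", "")
    # Split the comma-separated device list, dropping empty entries.
    return [d.strip() for d in spec.split(",") if d.strip()]
```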
aarnphm-ec2-dev
ebfed3c116 fix(chatglm): generation tokens not concatenated correctly
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 09:46:33 +00:00
Aaron
d70530cb0e chore: add stubs for deepmerge [generated]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 03:04:56 -04:00
aarnphm-ec2-dev
8fbf352ec6 docs: add more information about pretrained weights
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 06:58:36 +00:00
aarnphm-ec2-dev
c669d38dea fix(flan-t5): casting model to CUDA
Add a note about GPU support for Flax.

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 02:55:55 -04:00