Commit Graph

75 Commits

Author SHA1 Message Date
Aaron Pham
ded8a9f809 feat: quantization (#27)
2023-06-16 18:10:50 -04:00
Aaron Pham
19bc7e3116 feat: fine-tuning [part 1] (#23)
2023-06-16 00:19:01 -04:00
Aaron Pham
5e1445218b refactor: toplevel CLI (#26)
Move up CLI outside of the factory function to simplify workflow
2023-06-15 02:32:46 -04:00
aarnphm-ec2-dev
dfe71d7867 chore(cli): redirect download models into subcontext
utilise click subcontext for nicer CLI interaction

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-14 11:44:39 +00:00
Aaron
d7e92ae525 feat(cli): --device all --workers-per-resource
synonymous to the configuration arguments

add support for --device all

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 06:36:54 -04:00
Aaron
111d205f63 perf: faster LLM loading
using attrs for faster class creation opposed to metaclass

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 01:36:42 -04:00
Aaron Pham
dd20941050 chore: metadata (#19)
2023-06-13 04:09:33 -04:00
Aaron
71070b90b4 chore(metadata): fix model_id to be respected on service.py
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 16:04:52 -04:00
Aaron
4717989384 fix(tokenizers): allow forking by default
address message about forking in tokenizers

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 15:47:51 -04:00
Aaron Pham
f8ebb36e15 tests: fastpath (#17)
added fastpath cases for configuration and Flan-T5

fixes respecting model_id into lifecycle hooks.

update CLI to cleanup models info
2023-06-12 14:18:26 -04:00
Aaron
f8e99dd8f5 chore(configuration): clean house implementation
Using Attrs implementation

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-11 18:46:15 -04:00
aarnphm-ec2-dev
1847209489 feat(cli): --workers
provide workers-per-resource configuration on CLI for build and start

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 16:22:13 +00:00
aarnphm-ec2-dev
81d46ca211 feat(type): support annotations
openllm.LLM now supports fully typed-strict

openllm.LLM[ModelType, TokenizerType] -> self.model, self.tokenizer

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 14:58:17 +00:00
aarnphm-ec2-dev
2e453fb005 refactor(configuration): __config__ and perf
move model_ids and default_id to config class declaration,
cleanup dependencies between config and LLM implementation

lazy load module during LLM creation to llm_post_init

fix post_init hooks to run load_in_mha.

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 12:53:15 +00:00
aarnphm-ec2-dev
17241292da feat(cli): show runtime implementation
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 05:29:40 +00:00
aarnphm-ec2-dev
8762a56093 revert: broken KeyboardInterrupt change
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:20:07 +00:00
aarnphm-ec2-dev
512cd0715c feat(service): implementing with lifecycle hooks
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:14:18 +00:00
Aaron
6a937d8b51 feat(scheduling): custom GPU offload strategy
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 22:57:54 -04:00
Aaron
b22468e8c4 feat(cli): openllm models --show-available
show available models locally

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 20:46:11 -04:00
aarnphm-ec2-dev
bb37f7e238 feat(utils): lazy load modules and fix typo
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 22:18:37 +00:00
Aaron
05fa34f9e6 refactor: pretrained => model_id
I think model_id makes more sense than calling it pretrained

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 17:36:02 -04:00
aarnphm-ec2-dev
4db141c649 feat(gpu): support passing GPU per LLM
respect CUDA_VISIBLE_DEVICES and optionally --device

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 09:47:16 +00:00
aarnphm-ec2-dev
8fbf352ec6 docs: add more information about pretrained weights
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 06:58:36 +00:00
Aaron
afddaed08c fix(perf): respect per request information
remove use_default_prompt_template options

add pretrained to list of start help docstring

fix flax generation config

improve flax and tensorflow implementation

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 02:14:13 -04:00
Chaoyu
e2b26adf2f chore(docs): update README.md
See #12
2023-06-10 00:21:21 -04:00
aarnphm-ec2-dev
0f7840626d fix(cli): make sure to allow user to pass endpoint
--endpoint http://0.0.0.0:3000

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-08 19:23:04 +00:00
Aaron
a84661142c chore(cli): remove --local for query
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-08 14:53:11 -04:00
Aaron
c0418b76ec feat(infra): add tools for managing optional-dependencies
based on llm config

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-08 08:57:19 -04:00
aarnphm-ec2-dev
e9e12a66a8 fix(falcon): custom load
This is because pipeline loading in transformers is pretty magical
and broken

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-08 09:03:34 +00:00
Aaron
f2771bfe49 chore(cli): move back --version
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-07 03:41:50 -04:00
aarnphm-ec2-dev
170be0ebc8 fix(cli): make sure make_tag to respect config trust_remote_code
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-07 04:35:15 +00:00
Aaron
d6d2de6748 feat(cli): prune
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 23:24:50 -04:00
Aaron
aa50b5279e fix(falcon): loading based on model registration
remove duplicate events

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 22:42:28 -04:00
Aaron
8823c70e5a chore: rename variants to pretrained for consistency
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 18:45:45 -04:00
Aaron
f78d55f0fd fix(cli): type handling for specific container types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 17:18:25 -04:00
Aaron
b446b65642 chore(cli): remove alias and use build to be consistent with BentoML
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 15:51:13 -04:00
Aaron
a0749d0a80 chore: update version message
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 08:31:40 -04:00
Aaron
1707beb7aa feat(cli): openllm query
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 08:05:13 -04:00
Aaron
64d783107d chore(cli): update namespace and show better traceback
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-03 06:39:01 -07:00
Aaron
49cb02d2f2 perf(cli): improve printing speed that respect terminal_size
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-02 06:58:11 -07:00
aarnphm-ec2-dev
c3aeb43997 fix: generation serde
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-02 07:06:04 +00:00
aarnphm-ec2-dev
07d42daaec fix: make sure we evolve the attribute from CLI
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-02 05:52:04 +00:00
aarnphm-ec2-dev
a94294bc65 fix: generate attrs class internally to conform with interface
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-01 19:06:06 +00:00
Aaron
84358b28cd chore: handle KeyboardInterrupt correctly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-01 01:23:44 -07:00
Aaron
e86dc35ec5 chore: migrate service to use JSON
until we have attrs io descriptor, this should do it

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-01 00:28:43 -07:00
Aaron
4e2d5e330c refactor(cli): move CLI to address anti-pattern
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-31 13:53:40 -07:00
Aaron
33e7004e66 format: consistent CLI outputs
vendored type-related module from bentoml._internal.types

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-30 14:56:11 -07:00
Aaron
fa16c67131 fix(cli): remove debug print
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-30 12:02:00 -07:00
Aaron Pham
01517e37c6 migration: attrs (#7)
Move configuration to attrs

Depends on https://github.com/bentoml/BentoML/pull/3906
2023-05-30 11:59:21 -07:00
Aaron
ac710dfd54 revert(perf): remove group alias
There is no need for this feature

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-28 10:04:33 -07:00