Aaron Pham
ded8a9f809
feat: quantization (#27)
2023-06-16 18:10:50 -04:00
Aaron Pham
19bc7e3116
feat: fine-tuning [part 1] (#23)
2023-06-16 00:19:01 -04:00
Aaron Pham
5e1445218b
refactor: toplevel CLI (#26)
...
Move up CLI outside of the factory function to simplify workflow
2023-06-15 02:32:46 -04:00
aarnphm-ec2-dev
dfe71d7867
chore(cli): redirect download models into subcontext
...
utilise click subcontext for nicer CLI interaction
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-14 11:44:39 +00:00
Aaron
d7e92ae525
feat(cli): --device all --workers-per-resource
...
synonymous to the configuration arguments
add support for --device all
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 06:36:54 -04:00
Aaron
111d205f63
perf: faster LLM loading
...
use attrs for faster class creation, as opposed to a metaclass
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 01:36:42 -04:00
Aaron Pham
dd20941050
chore: metadata (#19)
2023-06-13 04:09:33 -04:00
Aaron
71070b90b4
chore(metadata): fix model_id to be respected on service.py
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 16:04:52 -04:00
Aaron
4717989384
fix(tokenizers): allow forking by default
...
address message about forking in tokenizers
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 15:47:51 -04:00
Aaron Pham
f8ebb36e15
tests: fastpath (#17)
...
added fastpath cases for configuration and Flan-T5
fix model_id not being respected in lifecycle hooks
update CLI to clean up models info
2023-06-12 14:18:26 -04:00
Aaron
f8e99dd8f5
chore(configuration): clean house implementation
...
Using Attrs implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-11 18:46:15 -04:00
aarnphm-ec2-dev
1847209489
feat(cli): --workers
...
provide workers-per-resource configuration on CLI for build and start
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 16:22:13 +00:00
aarnphm-ec2-dev
81d46ca211
feat(type): support annotations
...
openllm.LLM now fully supports strict typing
openllm.LLM[ModelType, TokenizerType] -> self.model, self.tokenizer
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 14:58:17 +00:00
aarnphm-ec2-dev
2e453fb005
refactor(configuration): __config__ and perf
...
move model_ids and default_id to config class declaration,
cleanup dependencies between config and LLM implementation
lazy load module during LLM creation to llm_post_init
fix post_init hooks to run load_in_mha.
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 12:53:15 +00:00
aarnphm-ec2-dev
17241292da
feat(cli): show runtime implementation
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 05:29:40 +00:00
aarnphm-ec2-dev
8762a56093
revert: broken KeyboardInterrupt change
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:20:07 +00:00
aarnphm-ec2-dev
512cd0715c
feat(service): implementing with lifecycle hooks
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 04:14:18 +00:00
Aaron
6a937d8b51
feat(scheduling): custom GPU offload strategy
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 22:57:54 -04:00
Aaron
b22468e8c4
feat(cli): openllm models --show-available
...
show available models locally
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 20:46:11 -04:00
aarnphm-ec2-dev
bb37f7e238
feat(utils): lazy load modules and fix typo
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 22:18:37 +00:00
Aaron
05fa34f9e6
refactor: pretrained => model_id
...
I think model_id makes more sense than calling it pretrained
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 17:36:02 -04:00
aarnphm-ec2-dev
4db141c649
feat(gpu): support passing GPU per LLM
...
respect CUDA_VISIBLE_DEVICES and optionally --device
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 09:47:16 +00:00
aarnphm-ec2-dev
8fbf352ec6
docs: add more information about pretrained weights
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 06:58:36 +00:00
Aaron
afddaed08c
fix(perf): respect per request information
...
remove use_default_prompt_template options
add pretrained to list of start help docstring
fix flax generation config
improve flax and tensorflow implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 02:14:13 -04:00
Chaoyu
e2b26adf2f
chore(docs): update README.md
...
See #12
2023-06-10 00:21:21 -04:00
aarnphm-ec2-dev
0f7840626d
fix(cli): make sure to allow user to pass endpoint
...
--endpoint http://0.0.0.0:3000
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-08 19:23:04 +00:00
Aaron
a84661142c
chore(cli): remove --local for query
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-08 14:53:11 -04:00
Aaron
c0418b76ec
feat(infra): add tools for managing optional-dependencies
...
based on llm config
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-08 08:57:19 -04:00
aarnphm-ec2-dev
e9e12a66a8
fix(falcon): custom load
...
This is because pipeline loading in transformers is pretty magical
and broken
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-08 09:03:34 +00:00
Aaron
f2771bfe49
chore(cli): move back --version
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-07 03:41:50 -04:00
aarnphm-ec2-dev
170be0ebc8
fix(cli): make sure make_tag respects config trust_remote_code
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-07 04:35:15 +00:00
Aaron
d6d2de6748
feat(cli): prune
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 23:24:50 -04:00
Aaron
aa50b5279e
fix(falcon): loading based on model registration
...
remove duplicate events
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 22:42:28 -04:00
Aaron
8823c70e5a
chore: rename variants to pretrained for consistency
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 18:45:45 -04:00
Aaron
f78d55f0fd
fix(cli): type handling for specific container types
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 17:18:25 -04:00
Aaron
b446b65642
chore(cli): remove alias and use build to be consistent with BentoML
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 15:51:13 -04:00
Aaron
a0749d0a80
chore: update version message
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 08:31:40 -04:00
Aaron
1707beb7aa
feat(cli): openllm query
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-06 08:05:13 -04:00
Aaron
64d783107d
chore(cli): update namespace and show better traceback
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-03 06:39:01 -07:00
Aaron
49cb02d2f2
perf(cli): improve printing speed while respecting terminal_size
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-02 06:58:11 -07:00
aarnphm-ec2-dev
c3aeb43997
fix: generation serde
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-02 07:06:04 +00:00
aarnphm-ec2-dev
07d42daaec
fix: make sure we evolve the attribute from CLI
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-02 05:52:04 +00:00
aarnphm-ec2-dev
a94294bc65
fix: generate attrs class internally to conform with interface
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-01 19:06:06 +00:00
Aaron
84358b28cd
chore: handle KeyboardInterrupt correctly
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-01 01:23:44 -07:00
Aaron
e86dc35ec5
chore: migrate service to use JSON
...
until we have attrs io descriptor, this should do it
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-01 00:28:43 -07:00
Aaron
4e2d5e330c
refactor(cli): move CLI to address anti-pattern
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-31 13:53:40 -07:00
Aaron
33e7004e66
format: consistent CLI outputs
...
vendored type-related module from bentoml._internal.types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-30 14:56:11 -07:00
Aaron
fa16c67131
fix(cli): remove debug print
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-30 12:02:00 -07:00
Aaron Pham
01517e37c6
migration: attrs (#7)
...
Move configuration to attrs
Depends on https://github.com/bentoml/BentoML/pull/3906
2023-05-30 11:59:21 -07:00
Aaron
ac710dfd54
revert(perf): remove group alias
...
There is no need for this feature
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-28 10:04:33 -07:00