Aaron Pham
2cc264aa72
fix(vllm): correctly load given model id from envvar (#181)
2023-08-03 16:34:35 -04:00
Aaron Pham
cfc7f3888d
chore(vllm): add all supported models (#179)
2023-08-02 17:42:02 -04:00
pre-commit-ci[bot]
c2ed1d56da
chore(release): update base container restriction (#173)
Prepare for 0.2.12 release
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-01 15:25:17 -04:00
Aaron
ca5e3c7ae5
fix: correct setup property for envvar instance
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-31 23:34:42 -04:00
Aaron Pham
8c2867d26d
style: define experimental guidelines (#168)
2023-07-31 07:54:26 -04:00
Aaron Pham
ef94c6b98a
feat(container): vLLM build and base image strategies (#142)
2023-07-31 02:44:52 -04:00
Aaron Pham
c391717226
feat(ci): automatic release semver + git archival installation (#143)
2023-07-25 04:18:49 -04:00
Aaron Pham
7eabcd4355
feat: vLLM integration for PagedAttention (#134)
2023-07-24 15:42:17 -04:00
Aaron Pham
81b0451685
feat(cli): query with per-request instruction (#130)
2023-07-21 13:57:21 -04:00
aarnphm-ec2-dev
e4ac0ed8b7
fix(cuda): support loading on a single GPU
add available_devices for getting the number of available GPUs
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-21 08:10:01 +00:00
Aaron Pham
f56f8ee782
feat: fine-tuning script for LlaMA 2 (#128)
2023-07-20 20:44:51 -04:00
Aaron Pham
858c2007c3
feat: revision parsed via model_id (#126)
2023-07-20 14:36:53 -04:00
Aaron Pham
c1ddb9ed7c
feat: GPTQ + vLLM and LlaMA (#113)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-19 18:12:12 -04:00
HeTaoPKU
fd9ae56812
fix(baichuan): add "cpm-kernel" as an additional requirement (#117)
This is to support the 13b variant of baichuan
Co-authored-by: the <tao.he@hulu.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-15 23:16:05 -04:00
HeTaoPKU
09b0787306
feat(models): Baichuan (#115)
Co-authored-by: the <tao.he@hulu.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-15 22:01:37 -04:00
aarnphm-ec2-dev
d37d14e52b
fix(tests): mark package on CI to xfail
XXX: @aarnphm to solve build isolation when there is bandwidth. Currently
this is not a problem when running locally:
`openllm build` just works, whereas `openllm.build` won't work
sequentially.
Address some type stubs for jupytext
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-15 12:48:28 +00:00
Aaron Pham
b2dba6143f
fix(resource): correctly parse CUDA_VISIBLE_DEVICES (#114)
2023-07-15 07:19:35 -04:00
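The CUDA_VISIBLE_DEVICES fix above follows the driver's convention: a comma-separated list of device ids, where the driver ignores everything from the first invalid or negative entry onwards. A minimal sketch of such a parser (the function name and exact edge-case handling are illustrative assumptions, not OpenLLM's actual implementation):

```python
from typing import List, Optional

def parse_visible_devices(value: Optional[str]) -> List[str]:
    # CUDA_VISIBLE_DEVICES is a comma-separated list of device ids.
    # Note: an *unset* variable means "no restriction" and would be
    # handled by the caller; here None and "" both yield no devices.
    if not value:
        return []
    ids: List[str] = []
    for part in value.split(","):
        part = part.strip()
        if not part or part.startswith("-"):
            break  # the driver stops at the first invalid/negative entry
        ids.append(part)
    return ids
```

For example, `parse_visible_devices("0,-1,2")` yields only `["0"]`, mirroring how the driver masks out everything after `-1`.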
aarnphm-ec2-dev
c2bb29b4f3
fix: building mpt dependencies
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-11 00:21:23 +00:00
Aaron Pham
c7f4dc7bb2
feat(test): snapshot testing (#107)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-10 17:23:19 -04:00
Aaron Pham
fb849a384e
feat: GPTNeoX (#106)
2023-07-07 03:05:40 -04:00
Aaron Pham
d6303d306a
perf: fix custom import paths and clean up serialisation (#102)
2023-07-04 12:49:14 -04:00
Aaron Pham
8ac2755de4
feat(llm): fine-tuning Falcon (#98)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-30 21:25:16 -04:00
Aaron Pham
e52045eda6
fix: running MPT on CPU (#92)
2023-06-29 10:54:12 -04:00
Aaron Pham
01db504e7d
feat: MPT (#91)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-28 23:12:15 -04:00
Aaron Pham
db1494a6ae
feat(start): starting bento and fix load (#80)
2023-06-27 12:45:17 -04:00
Aaron Pham
74fdd5e259
feat: release binary distribution (#66)
2023-06-25 10:38:03 -04:00
Aaron Pham
3593c764f0
fix(test): robustness (#64)
2023-06-24 11:10:07 -04:00
Aaron Pham
98328be394
peft(models): improve implementation (#60)
If you have a local Dolly-V2 version, please do `openllm prune`
2023-06-24 05:22:18 -04:00
Aaron Pham
1435478f6c
fix(cli): ensure we parse tag for download (#58)
2023-06-23 21:24:53 -04:00
Aaron Pham
dfca956fad
feat: serve adapter layers (#52)
2023-06-23 10:07:15 -04:00
Aaron
1ed0ae7787
fix(log): make sure to configure OpenLLM logs correctly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-19 06:19:06 -04:00
Aaron Pham
03758a5487
fix(tools): adhere to style guidelines (#31)
2023-06-18 20:03:17 -04:00
Aaron Pham
4fcd7c8ac9
integration: HuggingFace Agent (#29)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-18 00:13:53 -04:00
Aaron Pham
6f724416c0
perf: build quantization and better transformer behaviour (#28)
Fixes quantization_config and low_cpu_mem_usage to be available on the PyTorch implementation only
See changelog for more details on #28
2023-06-17 08:56:14 -04:00
Aaron
233d4697b5
chore: update __all__ to take _extra_objects into account
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-16 18:13:35 -04:00
Aaron Pham
ded8a9f809
feat: quantization (#27)
2023-06-16 18:10:50 -04:00
Aaron Pham
19bc7e3116
feat: fine-tuning [part 1] (#23)
2023-06-16 00:19:01 -04:00
Aaron
528f76e1d0
fix(client): using httpx for running calls within async context
This is so that client.query works within an async context
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-15 01:58:49 -04:00
Aaron
be41c23c10
codegen: remove black as a dependency
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 03:22:05 -04:00
Aaron
111d205f63
perf: faster LLM loading
using attrs for faster class creation as opposed to a metaclass
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 01:36:42 -04:00
Aaron
cb76a894cf
feat(metadata): add configuration to metadata endpoint
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-13 07:09:31 -04:00
Aaron
71070b90b4
chore(metadata): fix model_id to be respected on service.py
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-12 16:04:52 -04:00
Aaron Pham
f8ebb36e15
tests: fastpath (#17)
added fastpath cases for configuration and Flan-T5
fixes respecting model_id in lifecycle hooks.
update CLI to clean up models info
2023-06-12 14:18:26 -04:00
Aaron
f8e99dd8f5
chore(configuration): clean house implementation
Using the attrs implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-11 18:46:15 -04:00
aarnphm-ec2-dev
81d46ca211
feat(type): support annotations
openllm.LLM now supports strict typing
openllm.LLM[ModelType, TokenizerType] -> self.model, self.tokenizer
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 14:58:17 +00:00
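The `openllm.LLM[ModelType, TokenizerType]` annotation described in this commit is the standard generic-class pattern from `typing`. A minimal sketch of how such parametrized typing can be wired up (class and attribute names are illustrative, not the actual OpenLLM source):

```python
from typing import Generic, TypeVar

M = TypeVar("M")  # stands in for ModelType
T = TypeVar("T")  # stands in for TokenizerType

class LLM(Generic[M, T]):
    def __init__(self, model: M, tokenizer: T) -> None:
        # A type checker sees these as M and T respectively,
        # so LLM[str, int]().model is known to be a str.
        self.model = model
        self.tokenizer = tokenizer

# Subscripting the class binds the type parameters for checkers:
llm: "LLM[str, int]" = LLM("dummy-model", 0)
```

With this pattern, `mypy` or `pyright` can flag `llm.tokenizer.upper()` as an error, since the tokenizer slot was declared as `int`.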
aarnphm-ec2-dev
2e453fb005
refactor(configuration): __config__ and perf
move model_ids and default_id to the config class declaration,
clean up dependencies between config and LLM implementation
lazily load modules during LLM creation in llm_post_init
fix post_init hooks to run load_in_mha.
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 12:53:15 +00:00
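Moving `model_ids` and `default_id` into the config class declaration suggests the common declare-metadata-as-class-attributes pattern. A hedged sketch of that shape (only the two attribute names come from the commit message; the class bodies and example model ids are illustrative):

```python
from typing import List

class LLMConfig:
    # Subclasses declare their model metadata directly on the class,
    # so no instance needs to be built just to read it.
    model_ids: List[str] = []
    default_id: str = ""

class FlanT5Config(LLMConfig):
    model_ids = ["google/flan-t5-small", "google/flan-t5-large"]
    default_id = "google/flan-t5-large"
```

The advantage over passing these through `__init__` is that tooling (CLI listings, docs generation) can enumerate supported models by inspecting the class alone.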
aarnphm-ec2-dev
17241292da
feat(cli): show runtime implementation
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-11 05:29:40 +00:00
aarnphm-ec2-dev
bb37f7e238
feat(utils): lazy load modules and fix typo
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-10 22:18:37 +00:00
Aaron
05fa34f9e6
refactor: pretrained => model_id
I think model_id makes more sense than calling it pretrained
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 17:36:02 -04:00
Aaron
afddaed08c
fix(perf): respect per request information
remove use_default_prompt_template options
add pretrained to the start help docstring
fix flax generation config
improve flax and tensorflow implementations
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-10 02:14:13 -04:00