Aaron Pham
8c2867d26d
style: define experimental guidelines ( #168 )
2023-07-31 07:54:26 -04:00
Aaron Pham
ef94c6b98a
feat(container): vLLM build and base image strategies ( #142 )
2023-07-31 02:44:52 -04:00
Aaron Pham
60c725a21f
ci: release PyPI before building binary ( #138 )
2023-07-24 16:39:51 -04:00
Aaron Pham
7eabcd4355
feat: vLLM integration for PagedAttention ( #134 )
2023-07-24 15:42:17 -04:00
aarnphm-ec2-dev
e4ac0ed8b7
fix(cuda): support loading in single GPU
...
add available_devices for getting # of available GPUs
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-21 08:10:01 +00:00
Aaron Pham
1b3508619e
feat(llama): add default prompt for LLaMA-2 ( #122 )
2023-07-20 07:46:33 -04:00
aarnphm-ec2-dev
8b340559aa
fix(tests): skip running models tests on CI
...
The runners don't have enough space to run all tests
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-19 22:40:40 +00:00
Aaron Pham
c1ddb9ed7c
feat: GPTQ + vLLM and LLaMA ( #113 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-19 18:12:12 -04:00
aarnphm-ec2-dev
5bb95652db
chore(ci): skip large models
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-16 05:53:53 +00:00
Aaron Pham
fc963c42ce
fix: build isolation ( #116 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-16 01:52:21 -04:00
HeTaoPKU
09b0787306
feat(models): Baichuan ( #115 )
...
Co-authored-by: the <tao.he@hulu.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-15 22:01:37 -04:00
aarnphm-ec2-dev
d37d14e52b
fix(tests): mark package on CI to xfail
...
XXX: @aarnphm to solve build isolation when bandwidth allows. Currently
this is not a problem when running locally.
`openllm build` just works, whereas `openllm.build` won't work
sequentially.
Address some type stubs for jupytext
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-15 12:48:28 +00:00
Aaron Pham
b2dba6143f
fix(resource): correctly parse CUDA_VISIBLE_DEVICES ( #114 )
2023-07-15 07:19:35 -04:00
aarnphm-ec2-dev
e2ae24b74c
fix(tests): building not being isolated
...
We will need to fix this from BentoML
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-11 17:28:00 +00:00
Aaron Pham
c7f4dc7bb2
feat(test): snapshot testing ( #107 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-10 17:23:19 -04:00
Aaron
775c8c15a5
fix(tests): make sure the model is available on runner
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-04 15:11:58 -04:00
aarnphm-ec2-dev
4c5b27495c
fix: bettertransformer check to bool already
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-04 17:58:02 +00:00
Aaron Pham
d6303d306a
perf: fixing import custom paths and cleanup serialisation ( #102 )
2023-07-04 12:49:14 -04:00
Aaron Pham
8ac2755de4
feat(llm): fine-tuning Falcon ( #98 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-30 21:25:16 -04:00
Aaron Pham
59b1d89971
feat: custom dockerfile templates ( #95 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-30 13:44:11 -04:00
Aaron Pham
f2457fcdaf
tests: add sanity check for openllm.client ( #93 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-29 12:07:00 -04:00
Aaron Pham
bd4cc9b3ff
fix: loading local ( #87 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-28 11:25:54 -04:00
aarnphm-ec2-dev
698d929522
tests: add dict protocol cases
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-28 04:25:09 +00:00
Aaron Pham
db1494a6ae
feat(start): starting bento and fix load ( #80 )
2023-06-27 12:45:17 -04:00
Aaron Pham
d544764386
feat: cascading resource strategies ( #72 )
2023-06-26 17:38:49 -04:00
Aaron Pham
74fdd5e259
feat: release binary distribution ( #66 )
2023-06-25 10:38:03 -04:00
Aaron Pham
3593c764f0
fix(test): robustness ( #64 )
2023-06-24 11:10:07 -04:00
Aaron Pham
98328be394
peft(models): improve implementation ( #60 )
...
If you have a local Dolly-V2 version, please run `openllm prune`.
2023-06-24 05:22:18 -04:00
Aaron Pham
1435478f6c
fix(cli): ensure we parse tag for download ( #58 )
2023-06-23 21:24:53 -04:00
Aaron Pham
a30eebd56f
feat(config): new class generation ( #51 )
...
Allow setting up a new class derived from the base class with `model_derivate`.
2023-06-23 01:15:38 -04:00
Aaron Pham
03758a5487
fix(tools): adhere to style guidelines ( #31 )
2023-06-18 20:03:17 -04:00
aarnphm-ec2-dev
fe8da4e8a9
fix(tests): ensure_available on tests
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-17 15:12:28 +00:00
Aaron Pham
6f724416c0
perf: build quantization and better transformer behaviour ( #28 )
...
Fixes quantization_config and low_cpu_mem_usage so they are available on the PyTorch implementation only.
See the changelog for more details on #28.
2023-06-17 08:56:14 -04:00
Aaron Pham
ded8a9f809
feat: quantization ( #27 )
2023-06-16 18:10:50 -04:00
Aaron Pham
19bc7e3116
feat: fine-tuning [part 1] ( #23 )
2023-06-16 00:19:01 -04:00
Aaron Pham
f8ebb36e15
tests: fastpath ( #17 )
...
Add fastpath cases for configuration and Flan-T5.
Fix lifecycle hooks so that they respect model_id.
Update the CLI to clean up models info.
2023-06-12 14:18:26 -04:00
Aaron
ec941c95d5
chore: add license header
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-04 16:22:37 -07:00
Chaoyu
dd8b6050b2
feat: FLAN-T5 support
...
- add infrastructure, to be implemented: cache, chat history
- Base Runnable Implementation, that fits LangChain API
- Added a Prompt descriptor and utils.
feat: license headers and auto factory impl and CLI
Auto construct args from pydantic config
Add auto factory for ease of use
only provide `/generate` to streamline the UX
CLI > envvar > input contract for configuration
fix: serve from a thread
fix CLI args
chore: cleanup names and refactor imports
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-03 17:50:14 -07:00