Commit Graph

89 Commits

Author SHA1 Message Date
Aaron Pham
8c2867d26d style: define experimental guidelines (#168) 2023-07-31 07:54:26 -04:00
Aaron Pham
ef94c6b98a feat(container): vLLM build and base image strategies (#142) 2023-07-31 02:44:52 -04:00
aarnphm-ec2-dev
56bf84a760 fix(ci): make sure to exclude generated _version.py
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-25 09:55:24 +00:00
Aaron Pham
dcd34bd381 fix(build): running bento inside container (#141)
Behaviour of `docker run` should be the same as `openllm start`
2023-07-25 04:24:28 -04:00
Aaron Pham
c391717226 feat(ci): automatic release semver + git archival installation (#143) 2023-07-25 04:18:49 -04:00
Aaron Pham
7eabcd4355 feat: vLLM integration for PagedAttention (#134) 2023-07-24 15:42:17 -04:00
dependabot[bot]
9afbdc5198 chore(deps): update bitsandbytes requirement from <0.40 to <0.42 (#137)
Updates the requirements on [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) to permit the latest version.
- [Release notes](https://github.com/TimDettmers/bitsandbytes/releases)
- [Changelog](https://github.com/TimDettmers/bitsandbytes/blob/main/CHANGELOG.md)
- [Commits](https://github.com/TimDettmers/bitsandbytes/compare/0.32.0...0.41.0)

---
updated-dependencies:
- dependency-name: bitsandbytes
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-24 07:59:50 +00:00
Aaron Pham
693631958a feat(service): provisional API (#133) 2023-07-23 02:15:39 -04:00
aarnphm-ec2-dev
e4ac0ed8b7 fix(cuda): support loading in single GPU
add available_devices for getting # of available GPUs

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-21 08:10:01 +00:00
Aaron Pham
f56f8ee782 feat: fine-tuning script for LlaMA 2 (#128) 2023-07-20 20:44:51 -04:00
aarnphm-ec2-dev
3e50f0a851 fix(cli): implement latest bentoml 1.0.25 features
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-20 20:51:27 +00:00
Aaron Pham
c1ddb9ed7c feat: GPTQ + vLLM and LlaMA (#113)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-19 18:12:12 -04:00
Aaron Pham
fc963c42ce fix: build isolation (#116)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-16 01:52:21 -04:00
HeTaoPKU
fd9ae56812 fix(baichuan): add "cpm-kernel" as additional requirements (#117)
This is to support the 13B variant of Baichuan

Co-authored-by: the <tao.he@hulu.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-15 23:16:05 -04:00
HeTaoPKU
09b0787306 feat(models): Baichuan (#115)
Co-authored-by: the <tao.he@hulu.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-15 22:01:37 -04:00
aarnphm-ec2-dev
d37d14e52b fix(tests): mark package on CI to xfail
XXX: @aarnphm to solve build isolation when bandwidth allows. Currently
this is not a problem when running locally.

`openllm build` just works, whereas `openllm.build` won't work
sequentially.

Address some type stubs for jupytext

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-15 12:48:28 +00:00
Aaron Pham
b2dba6143f fix(resource): correctly parse CUDA_VISIBLE_DEVICES (#114) 2023-07-15 07:19:35 -04:00
aarnphm-ec2-dev
e2ae24b74c fix(tests): building not being isolated
We will need to fix this from BentoML

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-11 17:28:00 +00:00
aarnphm-ec2-dev
cea082e7bd fix(cli): correct prune based on metadata
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-11 00:34:22 +00:00
aarnphm-ec2-dev
c2bb29b4f3 fix: building mpt dependencies
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-11 00:21:23 +00:00
Aaron Pham
c7f4dc7bb2 feat(test): snapshot testing (#107)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-10 17:23:19 -04:00
Aaron Pham
fb849a384e feat: GPTNeoX (#106) 2023-07-07 03:05:40 -04:00
aarnphm-ec2-dev
4c5b27495c fix: bettertransformer check to bool already
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-04 17:58:02 +00:00
Aaron Pham
d6303d306a perf: fixing import custom paths and cleanup serialisation (#102) 2023-07-04 12:49:14 -04:00
Aaron Pham
8ac2755de4 feat(llm): fine-tuning Falcon (#98)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-30 21:25:16 -04:00
Aaron Pham
59b1d89971 feat: custom dockerfile templates (#95)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-30 13:44:11 -04:00
Aaron Pham
f2457fcdaf tests: add sanity check for openllm.client (#93)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-29 12:07:00 -04:00
Aaron
7e8ca79c2d chore: style [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-29 10:54:58 -04:00
Aaron Pham
e52045eda6 fix: running MPT on CPU (#92) 2023-06-29 10:54:12 -04:00
Aaron Pham
01db504e7d feat: MPT (#91)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-28 23:12:15 -04:00
Aaron Pham
5a4df53490 fix(load): tokenizer and adapter within a BentoLLM (#88) 2023-06-28 15:45:25 -04:00
Aaron Pham
bd4cc9b3ff fix: loading local (#87)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-28 11:25:54 -04:00
Aaron Pham
db1494a6ae feat(start): starting bento and fix load (#80) 2023-06-27 12:45:17 -04:00
Aaron Pham
74fdd5e259 feat: release binary distribution (#66) 2023-06-25 10:38:03 -04:00
Aaron Pham
3593c764f0 fix(test): robustness (#64) 2023-06-24 11:10:07 -04:00
Aaron Pham
1435478f6c fix(cli): ensure we parse tag for download (#58) 2023-06-23 21:24:53 -04:00
Aaron Pham
dfca956fad feat: serve adapter layers (#52) 2023-06-23 10:07:15 -04:00
Aaron
752c2e60a5 fix: remove direct url reference
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-19 13:25:29 -04:00
Aaron
1ed0ae7787 fix(log): make sure to configure OpenLLM logs correctly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-19 06:19:06 -04:00
Aaron
622a2fb37d fix: separate hatch config
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-19 03:39:05 -04:00
Aaron
e3fad40f21 fix(env): make tests with extra-dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-18 23:58:03 -04:00
Aaron Pham
03758a5487 fix(tools): adhere to style guidelines (#31) 2023-06-18 20:03:17 -04:00
Aaron Pham
4fcd7c8ac9 integration: HuggingFace Agent (#29)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-18 00:13:53 -04:00
Aaron Pham
ded8a9f809 feat: quantization (#27) 2023-06-16 18:10:50 -04:00
Aaron Pham
19bc7e3116 feat: fine-tuning [part 1] (#23) 2023-06-16 00:19:01 -04:00
Aaron
528f76e1d0 fix(client): using httpx for running calls within async context
This is so that client.query works within an async context

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-15 01:58:49 -04:00
Aaron
d07cc95ea0 ci: add hatch to dev envs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 03:48:05 -04:00
Aaron
be41c23c10 codegen: remove black as dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-14 03:22:05 -04:00
Aaron Pham
dd20941050 chore: metadata (#19) 2023-06-13 04:09:33 -04:00
Aaron Pham
f8ebb36e15 tests: fastpath (#17)
added fastpath cases for configuration and Flan-T5

fixes respecting model_id into lifecycle hooks.

update CLI to cleanup models info
2023-06-12 14:18:26 -04:00