Commit Graph

60 Commits

Author SHA1 Message Date
paperspace
526a770a06 chore: update base requirements to 0.4.2
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-08 18:46:13 +00:00
Aaron Pham
5c0d2787c0 feat: add dbrx support
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-04-02 04:10:19 +00:00
Aaron Pham
072b3e97ec feat: 1.2 APIs (#821)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-03-15 03:49:19 -04:00
Zhao Shenyang
4dc4c45c4a Bump BentoML version in tools (#884)
2024-02-05 05:24:02 +08:00
Aaron Pham
79da419d87 chore(deps): bump vllm to 0.2.7 (#837)
* chore(deps): bump vllm to 0.2.7

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-01-08 14:41:58 -05:00
Aaron Pham
8d63afc9ce feat(vllm): support GPTQ with 0.2.6 (#797)
* feat(vllm): GPTQ support passthrough

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: run scripts

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix(install): set order of xformers before vllm

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: support GPTQ with vLLM

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-12-18 12:41:19 -05:00
Aaron Pham
88b6d3d6de perf: upgrade mixtral to use expert parallelism (#783)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-12-15 11:45:08 -05:00
Aaron Pham
d3328343d7 feat: mixtral support (#770)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-12-12 01:33:13 -05:00
Aaron
59e8ef93dc chore(deps): lock vLLM to 0.2.4
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-12-12 00:17:18 -05:00
Aaron Pham
5442d9cd10 fix(trust_remote_code): handle args correctly (#727)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-22 17:03:13 -05:00
Aaron Pham
6505abdb44 chore: update lower bound version of bentoml to avoid breakage (#703)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-19 23:09:14 -05:00
Aaron
cb4386b013 fix(release): remove unnecessary check for client dependencies [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-19 10:39:38 -05:00
Aaron Pham
539f250c0f feat(vllm): bump to 0.2.2 (#695)
* feat(vllm): bump to 0.2.2

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: move up to CUDA 12.1

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: remove auto-gptq installation

since the builder image doesn't have access to GPU

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: update containerization warning

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-19 02:52:32 -05:00
Aaron Pham
206521e02d feat(ctranslate): initial infrastructure support (#694)
* perf: compact and improve speed and agility

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* --wip--

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: cleanup infrastructure

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update styles notes and autogen mypy configuration

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-19 01:48:33 -05:00
Aaron Pham
c03e3bebb3 fix(infra): prepare correct dependencies for release [skip ci] (#687)
fix(infra): prepare correct dependencies for release

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-17 16:05:46 -05:00
Aaron Pham
80ed400646 fix(build): lock lower version based on each release and update infra (#686)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-17 15:57:31 -05:00
Aaron Pham
6102a67a83 infra: makes huggingface-hub requirements on fine-tune (#665)
infra: makes huggingface-hub core deps

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-16 03:12:52 -05:00
Aaron Pham
4a6f13ddd2 feat(type): provide structured annotations stubs (#663)
* feat(type): provide client stubs

separation of concern for more brevity code base

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* docs: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-16 02:58:45 -05:00
Aaron Pham
0bf6ec7537 fix(dependencies): lock build < 1 for now (#643)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-14 00:36:08 -05:00
Aaron Pham
e0632a85ed refactor(cli): move out to its own packages (#619)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-12 18:25:44 -05:00
Aaron Pham
ac377fe490 infra: using ruff formatter (#594)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-09 12:44:05 -05:00
Aaron Pham
b8a2e8cf91 refactor(cli): cleanup API (#592)
* chore: remove unused imports

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* refactor(cli): update to only need model_id

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: `openllm start model-id`

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update changelog notice

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update correct config and running tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update backward compat options and treat JSON outputs
correspondingly

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-09 11:40:17 -05:00
Aaron Pham
ff8b6377c8 fix(awq): correct awq detection for support (#586)
* fix(awq): correct detection for awq

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: update base docker to work

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: disable awq on pytorch for now

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-08 06:57:11 -05:00
Aaron Pham
cfd09bfc47 chore(runner): yield the outputs directly (#573)
update openai client examples to >1

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-07 22:34:11 -05:00
Aaron Pham
e2029c934b perf: unify LLM interface (#518)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-06 20:39:43 -05:00
Aaron Pham
72c6005d3b chore(inference): update vllm to 0.2.1.post1 and update config parsing (#554)
chore(dependencies): update vllm to 0.2.1.post1 and update config
parsing
2023-11-04 04:01:56 -04:00
aarnphm-ec2-dev
65c76cace3 chore: update deps for transformers and vllm
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-10-11 04:28:46 +00:00
Aaron
625b82a0fc fix(style): remove weird break on split item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-07 02:21:31 -04:00
XunchaoZ
04bb29a264 feat: OpenAI-compatible API (#417)
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-07 00:50:03 -04:00
Aaron Pham
ad9107958d feat: continuous batching with vLLM (#349)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: continuous batching

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add one shot generation

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-14 03:09:36 -04:00
Aaron Pham
35e6945e86 fix(serialisation): vLLM safetensors support (#324)
* fix(serialisation): vllm support for safetensors

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* chore: running tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: generalize one shot generation

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog [skip ci]

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2023-09-12 17:44:01 -04:00
aarnphm-ec2-dev
c7f915fa71 chore: update documentation wrt envvar correctness
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-08 17:43:03 +00:00
Aaron
0d50aa00b9 chore: add openllm-core as meta dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-07 10:31:40 -04:00
Aaron Pham
956b3a53bc fix(gptq): use upstream integration (#297)
* wip

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* feat: GPTQ transformers integration

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* fix: only load if variable is available and add changelog

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* chore: remove boilerplate check

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-04 14:05:50 -04:00
aarnphm-ec2-dev
7d893e6cd2 chore: ignore new lines split [skip ci]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 17:00:49 +00:00
Aaron Pham
b7af7765d4 fix(yapf): align weird new lines break [generated] [skip ci] (#284)
fix(yapf): align weird new lines break

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 05:34:22 -04:00
Aaron
b545ad2ad1 style: google
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-30 13:52:35 -04:00
Aaron Pham
c9cef1d773 fix: persistent styling between ruff and yapf (#279)
2023-08-30 11:37:41 -04:00
Aaron Pham
2036d4e015 chore(build): use latest vllm pre-built kernel (#261)
2023-08-26 09:02:52 -04:00
aarnphm-ec2-dev
806a663e4a chore(style): add one blank line
to conform with Google style

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-26 11:36:57 +00:00
aarnphm-ec2-dev
dae38cdba1 chore: update external dependencies [skip ci]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-25 09:27:26 +00:00
Aaron Pham
3ffb25a872 refactor: packages (#249)
2023-08-22 08:55:46 -04:00
Aaron Pham
cd872ef631 refactor: monorepo (#203)
2023-08-15 02:11:14 -04:00
Aaron Pham
f6317d8003 infra: enable compiled wheels for all supported Python (#201)
2023-08-12 04:54:50 -04:00
Aaron Pham
5329853b10 perf: compiled modules and enable lazyeval (#200)
2023-08-11 05:53:45 -04:00
aarnphm-ec2-dev
dfc4b489c5 feat(build): notes on compiled wheels for Bento
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-09 21:52:34 +00:00
Aaron Pham
b1445c6516 refactor(cli): compiled wheels and extension modules (#191)
2023-08-09 17:10:15 -04:00
Aaron Pham
b9dd54f634 feat: homebrew tap (#190)
2023-08-08 22:11:48 -04:00
Aaron Pham
2541a0f8dc infra: initial work on compiling mypyc wheels (#182)
2023-08-04 10:20:03 -04:00
Aaron Pham
8c2867d26d style: define experimental guidelines (#168)
2023-07-31 07:54:26 -04:00