Commit Graph

100 Commits

Author SHA1 Message Date
Aaron Pham
b8a2e8cf91 refactor(cli): cleanup API (#592)
* chore: remove unused imports

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* refactor(cli): update to only need model_id

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: `openllm start model-id`

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update changelog notice

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update correct config and running tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update backward compat options and treat JSON outputs
correspondingly

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-09 11:40:17 -05:00
Aaron Pham
0d88370127 infra: prepare for release 0.4.1 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-08 13:24:46 +00:00
Aaron Pham
e87830ef0a container: update tracing dependencies (#591)
* chore: update build message

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add tracing dependencies to container

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-08 08:08:40 -05:00
Aaron Pham
0ea025da5a fix(cli): append model-id instruction to build (#590)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-08 07:44:36 -05:00
Aaron Pham
47107727b3 feat(vllm): squeezellm (#588)
* feat(vllm): squeezellm

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: correct import_model with awq and gatekeep squeezellm for PyTorch

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-08 07:21:27 -05:00
Aaron Pham
ff8b6377c8 fix(awq): correct awq detection for support (#586)
* fix(awq): correct detection for awq

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: update base docker to work

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: disable awq on pytorch for now

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-08 06:57:11 -05:00
Aaron Pham
387637405d fix(gptq): update config fields (#585)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-08 05:20:30 -05:00
Aaron Pham
85a7243ac3 fix: device imports using strategies (#584)
* fix: device imports using strategies

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: support trust_remote_code for vLLM runners

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-08 05:10:50 -05:00
Aaron Pham
ea42108e45 chore(service): cleanup API (#579)
* chore(service): cleanup API

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: running tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: tests import

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-08 02:53:08 -05:00
Aaron Pham
7398ae0486 refactor(strategies): move logics into openllm-python (#578)
fix(strategies): move to openllm

Strategies shouldn't be a part of openllm-core

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-08 02:23:08 -05:00
Aaron Pham
97d7c38fea refactor: cleanup typing to expose correct API (#576)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-08 01:24:03 -05:00
Aaron Pham
cfd09bfc47 chore(runner): yield the outputs directly (#573)
update openai client examples to >1

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-07 22:34:11 -05:00
Aaron Pham
8ffab93d39 infra: prepare for release 0.4.0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-07 22:42:22 +00:00
Aaron Pham
4d356f4b72 feat: Mistral support (#571)
* feat: Mistral support

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

* chore: fix style

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update README docs about mistral

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-07 17:28:02 -05:00
Aaron Pham
dc27b0e727 fix: update build dependencies and format chat prompt (#569)
chore: update correct check and format prompt

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-07 16:42:20 -05:00
Aaron Pham
8fade070f3 infra: update docs on serving fine-tuning layers (#567)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-06 21:34:44 -05:00
Aaron Pham
e2029c934b perf: unify LLM interface (#518)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-06 20:39:43 -05:00
Aaron Pham
729d47a86c infra: prepare for release 0.3.14 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-04 09:05:08 +00:00
Aaron Pham
72c6005d3b chore(inference): update vllm to 0.2.1.post1 and update config parsing (#554)
chore(dependencies): update vllm to 0.2.1.post1 and update config
parsing
2023-11-04 04:01:56 -04:00
XunchaoZ
440e3d646f fix: Max new tokens (#550)
Bug fix for retrieving user input max_new_tokens
2023-11-03 13:44:25 -04:00
Aaron Pham
e33cd77ee3 infra: prepare for release 0.3.13 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-31 05:24:40 +00:00
Aaron Pham
cb451f6309 infra: prepare for release 0.3.12 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-30 21:55:48 +00:00
XunchaoZ
392c7a8139 Fix chat template and message list bug (#549)
2023-10-30 14:28:42 -07:00
Aaron Pham
b66a3d34b3 infra: prepare for release 0.3.10 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-30 07:23:35 +00:00
XunchaoZ
022130d0ac fix(openai): Chat templates (#519)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-30 03:20:43 -04:00
Aaron Pham
ae664d3b49 infra: prepare for release 0.3.9 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-17 06:01:32 +00:00
Aaron
aedb1e4843 fix: correct classes for regression
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-17 02:00:11 -04:00
Aaron Pham
607d7f5f12 infra: prepare for release 0.3.8 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-16 21:36:10 +00:00
Aaron Pham
d59a8860df fix(build): check for parity (#508)
2023-10-16 17:33:47 -04:00
XunchaoZ
d9183267dc feat: openai.Model.list() (#499)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-14 16:33:49 -04:00
Aaron Pham
c1ca7ccd3b fix(breaking): remove embeddings and update client implementation (#500)
2023-10-14 16:04:35 -04:00
Aaron Pham
62e23f78ac infra: prepare for release 0.3.7 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-12 21:24:02 +00:00
Aaron Pham
1539c3f7dc feat(client): simple implementation and streaming (#256)
2023-10-12 17:21:54 -04:00
Aaron
60bc0bd4a0 infra: make github recognize this as a Pip packages [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-12 06:32:07 -04:00
aarnphm-ec2-dev
65c76cace3 chore: update deps for transformers and vllm
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-10-11 04:28:46 +00:00
Zhao Shenyang
bf96570eab fix: do not rely on env var for built bento/docker (#477)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-10 12:29:20 -04:00
Aaron
625b82a0fc fix(style): remove weird break on split item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-07 02:21:31 -04:00
XunchaoZ
04bb29a264 feat: OpenAI-compatible API (#417)
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-07 00:50:03 -04:00
Aaron
b43fabfff8 fix(playground): eager import jupytext
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-04 19:24:03 -04:00
Aaron
d2a2af3ee2 fix: import nbformat for playground
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-04 19:21:14 -04:00
MingLiangDai
a0e0f81306 feat: PromptTemplate and system prompt support (#407)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-03 09:53:37 -04:00
Aaron Pham
398e6b3856 infra: prepare for release 0.3.6 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-19 07:09:13 +00:00
Aaron Pham
3b2ac1cd59 feat: support continuous batching on generate (#375)
* feat: support continuous batching on `generate`

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-19 03:04:59 -04:00
Aaron Pham
4662f7008a infra: prepare for release 0.3.5 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-18 06:29:22 +00:00
Aaron Pham
5a1fcc9cd5 fix: set default serialisation methods (#355)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-18 02:26:53 -04:00
Aaron Pham
52adaeeb18 infra: prepare for release 0.3.4 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-14 07:47:15 +00:00
Aaron Pham
a32cf324d8 fix(prompt): correct export extra objects items (#351)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-14 03:42:28 -04:00
Aaron Pham
ad9107958d feat: continuous batching with vLLM (#349)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: continuous batching

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add one shot generation

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-14 03:09:36 -04:00
Aaron Pham
35e6945e86 fix(serialisation): vLLM safetensors support (#324)
* fix(serialisation): vllm support for safetensors

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* chore: running tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: generalize one shot generation

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog [skip ci]

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2023-09-12 17:44:01 -04:00
Alan Poulain
88d7ba7ca8 fix(vllm): Make sure to use max number of GPUs available (#326)
* fix(serving): vllm bad num_gpus

Signed-off-by: Alan Poulain <contact@alanpoulain.eu>

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

---------

Signed-off-by: Alan Poulain <contact@alanpoulain.eu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 12:45:00 -04:00