Commit Graph

72 Commits

Author SHA1 Message Date
Aaron Pham
d59a8860df fix(build): check for parity (#508) 2023-10-16 17:33:47 -04:00
XunchaoZ
d9183267dc feat: openai.Model.list() (#499)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-14 16:33:49 -04:00
Aaron Pham
c1ca7ccd3b fix(breaking): remove embeddings and update client implementation (#500) 2023-10-14 16:04:35 -04:00
Aaron Pham
62e23f78ac infra: prepare for release 0.3.7 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-12 21:24:02 +00:00
Aaron Pham
1539c3f7dc feat(client): simple implementation and streaming (#256) 2023-10-12 17:21:54 -04:00
Aaron
60bc0bd4a0 infra: make github recognize this as a Pip packages [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-12 06:32:07 -04:00
aarnphm-ec2-dev
65c76cace3 chore: update deps for transformers and vllm
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-10-11 04:28:46 +00:00
Zhao Shenyang
bf96570eab fix: do not rely on env var for built bento/docker (#477)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-10 12:29:20 -04:00
Aaron
625b82a0fc fix(style): remove weird break on split item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-07 02:21:31 -04:00
XunchaoZ
04bb29a264 feat: OpenAI-compatible API (#417)
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-07 00:50:03 -04:00
Aaron
b43fabfff8 fix(playground): eager import jupytext
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-04 19:24:03 -04:00
Aaron
d2a2af3ee2 fix: import nbformat for playground
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-10-04 19:21:14 -04:00
MingLiangDai
a0e0f81306 feat: PromptTemplate and system prompt support (#407)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-10-03 09:53:37 -04:00
Aaron Pham
398e6b3856 infra: prepare for release 0.3.6 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-19 07:09:13 +00:00
Aaron Pham
3b2ac1cd59 feat: support continuous batching on generate (#375)
* feat: support continuous batching on `generate`

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-19 03:04:59 -04:00
Aaron Pham
4662f7008a infra: prepare for release 0.3.5 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-18 06:29:22 +00:00
Aaron Pham
5a1fcc9cd5 fix: set default serialisation methods (#355)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-18 02:26:53 -04:00
Aaron Pham
52adaeeb18 infra: prepare for release 0.3.4 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-14 07:47:15 +00:00
Aaron Pham
a32cf324d8 fix(prompt): correct export extra objects items (#351)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-14 03:42:28 -04:00
Aaron Pham
ad9107958d feat: continuous batching with vLLM (#349)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: continuous batching

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: add one shot generation

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-14 03:09:36 -04:00
Aaron Pham
35e6945e86 fix(serialisation): vLLM safetensors support (#324)
* fix(serialisation): vllm support for safetensors

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* chore: running tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: generalize one shot generation

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: add changelog [skip ci]

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2023-09-12 17:44:01 -04:00
Alan Poulain
88d7ba7ca8 fix(vllm): Make sure to use max number of GPUs available (#326)
* fix(serving): vllm bad num_gpus

Signed-off-by: Alan Poulain <contact@alanpoulain.eu>

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

---------

Signed-off-by: Alan Poulain <contact@alanpoulain.eu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 12:45:00 -04:00
Aaron Pham
fddd0bf95e feat: bootstrap documentation site (#252)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: GutZuFusss <leon.ikinger@googlemail.com>
Co-authored-by: GutZuFusss <leon.ikinger@googlemail.com>
2023-09-12 12:28:29 -04:00
aarnphm-ec2-dev
8530a067ea chore(serialisation): dump quantization_config.json to conform with
optimum load

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-07 16:50:50 +00:00
Aaron
0d50aa00b9 chore: add openllm-core as meta dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-07 10:31:40 -04:00
Aaron Pham
7e2c8428bb infra: prepare for release 0.3.3 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-07 01:51:18 +00:00
aarnphm-ec2-dev
8173cb09a5 fix(quantize): dyn quant for int8 and int4
only set tokenizer when it is gptq

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-07 01:48:45 +00:00
Aaron Pham
fd18c8be01 infra: prepare for release 0.3.2 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-06 19:03:40 +00:00
Aaron
675b372981 fix: synchronize device for inference
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-06 14:08:52 -04:00
Aaron Pham
b61005424b infra: prepare for release 0.3.1 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-06 17:49:58 +00:00
Aaron
887ffa9aa0 chore: cleanup pre-commit jobs and update usage
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-05 10:06:36 -04:00
aarnphm-ec2-dev
f43c721579 chore: only add bentomodel branch during generated service with
OpenLLM

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-05 01:08:23 +00:00
Aaron Pham
06a68ade7d infra: prepare for release 0.3.0 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-09-04 19:03:41 +00:00
Aaron
5eea40a599 chore(readme): update README for release [generated] [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-04 15:02:01 -04:00
Aaron Pham
956b3a53bc fix(gptq): use upstream integration (#297)
* wip

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* feat: GPTQ transformers integration

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* fix: only load if variable is available and add changelog

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* chore: remove boilerplate check

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-04 14:05:50 -04:00
aarnphm-ec2-dev
7d893e6cd2 chore: ignore new lines split [skip ci]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 17:00:49 +00:00
Aaron Pham
608de0b667 fix(serving): vllm distributed size (#285)
* chore(weights): ignore gguf pattern for non GGML backend

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

* chore: correct fix num_gpus to be divisible by 2

This depends on the attention_heads from given models

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 12:37:10 -04:00
Aaron Pham
b7af7765d4 fix(yapf): align weird new lines break [generated] [skip ci] (#284)
fix(yapf): align weird new lines break

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-09-01 05:34:22 -04:00
Aaron Pham
3e45530abd refactor(breaking): unify LLM API (#283)
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-09-01 05:15:19 -04:00
Aaron
b545ad2ad1 style: google
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-30 13:52:35 -04:00
Aaron Pham
c9cef1d773 fix: persistent styling between ruff and yapf (#279) 2023-08-30 11:37:41 -04:00
Aaron Pham
2036d4e015 chore(build): use latest vllm pre-built kernel (#261) 2023-08-26 09:02:52 -04:00
aarnphm-ec2-dev
806a663e4a chore(style): add one blank line
to conform with Google style

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-26 11:36:57 +00:00
Aaron Pham
938fd362bb feat(vllm): streaming (#260) 2023-08-26 07:27:32 -04:00
Aaron Pham
46c8904806 cron(style): run formatter [generated] [skip ci] (#257) 2023-08-25 06:38:59 -04:00
Aaron Pham
816bfdcc19 infra: prepare for release 0.2.27 [generated] [skip ci]
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-08-25 09:28:49 +00:00
aarnphm-ec2-dev
dae38cdba1 chore: update external dependencies [skip ci]
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-08-25 09:27:26 +00:00
Aaron Pham
08dc6ed2ba chore: ignore peft and fix adapter loading issue (#255)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-25 04:36:35 -04:00
Aaron
787ce1b3b6 chore(style): synchronized style across packages [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-08-23 08:46:22 -04:00
Aaron Pham
bbd9aa7646 refactor(contrib): similar namespace [clojure-ui build] (#251) 2023-08-23 00:21:59 -04:00