Aaron Pham
8fade070f3
infra: update docs on serving fine-tuning layers ( #567 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-06 21:34:44 -05:00
Aaron Pham
e2029c934b
perf: unify LLM interface ( #518 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-06 20:39:43 -05:00
Aaron Pham
729d47a86c
infra: prepare for release 0.3.14 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-04 09:05:08 +00:00
Aaron Pham
72c6005d3b
chore(inference): update vllm to 0.2.1.post1 and update config parsing ( #554 )
...
chore(dependencies): update vllm to 0.2.1.post1 and update config
parsing
2023-11-04 04:01:56 -04:00
XunchaoZ
440e3d646f
fix: Max new tokens ( #550 )
...
Bug fix for retrieving user input max_new_tokens
2023-11-03 13:44:25 -04:00
Aaron Pham
e33cd77ee3
infra: prepare for release 0.3.13 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-31 05:24:40 +00:00
Aaron Pham
cb451f6309
infra: prepare for release 0.3.12 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-30 21:55:48 +00:00
XunchaoZ
392c7a8139
Fix chat template and message list bug ( #549 )
2023-10-30 14:28:42 -07:00
Aaron Pham
b66a3d34b3
infra: prepare for release 0.3.10 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-30 07:23:35 +00:00
XunchaoZ
022130d0ac
fix(openai): Chat templates ( #519 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-30 03:20:43 -04:00
Aaron Pham
ae664d3b49
infra: prepare for release 0.3.9 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-17 06:01:32 +00:00
Aaron
aedb1e4843
fix: correct classes for regression
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-10-17 02:00:11 -04:00
Aaron Pham
607d7f5f12
infra: prepare for release 0.3.8 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-16 21:36:10 +00:00
Aaron Pham
d59a8860df
fix(build): check for parity ( #508 )
2023-10-16 17:33:47 -04:00
XunchaoZ
d9183267dc
feat: openai.Model.list() ( #499 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-14 16:33:49 -04:00
Aaron Pham
c1ca7ccd3b
fix(breaking): remove embeddings and update client implementation ( #500 )
2023-10-14 16:04:35 -04:00
Aaron Pham
62e23f78ac
infra: prepare for release 0.3.7 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-12 21:24:02 +00:00
Aaron Pham
1539c3f7dc
feat(client): simple implementation and streaming ( #256 )
2023-10-12 17:21:54 -04:00
Aaron
60bc0bd4a0
infra: make GitHub recognize this as a pip package [skip ci]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-10-12 06:32:07 -04:00
aarnphm-ec2-dev
65c76cace3
chore: update deps for transformers and vllm
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-10-11 04:28:46 +00:00
Zhao Shenyang
bf96570eab
fix: do not rely on env var for built bento/docker ( #477 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-10 12:29:20 -04:00
Aaron
625b82a0fc
fix(style): remove weird break on split item
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-10-07 02:21:31 -04:00
XunchaoZ
04bb29a264
feat: OpenAI-compatible API ( #417 )
...
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-07 00:50:03 -04:00
Aaron
b43fabfff8
fix(playground): eager import jupytext
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-10-04 19:24:03 -04:00
Aaron
d2a2af3ee2
fix: import nbformat for playground
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-10-04 19:21:14 -04:00
MingLiangDai
a0e0f81306
feat: PromptTemplate and system prompt support ( #407 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-03 09:53:37 -04:00
Aaron Pham
398e6b3856
infra: prepare for release 0.3.6 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-09-19 07:09:13 +00:00
Aaron Pham
3b2ac1cd59
feat: support continuous batching on generate ( #375 )
...
* feat: support continuous batching on `generate`
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-19 03:04:59 -04:00
Aaron Pham
4662f7008a
infra: prepare for release 0.3.5 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-09-18 06:29:22 +00:00
Aaron Pham
5a1fcc9cd5
fix: set default serialisation methods ( #355 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-18 02:26:53 -04:00
Aaron Pham
52adaeeb18
infra: prepare for release 0.3.4 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-09-14 07:47:15 +00:00
Aaron Pham
a32cf324d8
fix(prompt): correct export extra objects items ( #351 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-14 03:42:28 -04:00
Aaron Pham
ad9107958d
feat: continuous batching with vLLM ( #349 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: continuous batching
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
* chore: add one shot generation
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-14 03:09:36 -04:00
Aaron Pham
35e6945e86
fix(serialisation): vLLM safetensors support ( #324 )
...
* fix(serialisation): vllm support for safetensors
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* chore: running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: generalize one shot generation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog [skip ci]
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
2023-09-12 17:44:01 -04:00
Alan Poulain
88d7ba7ca8
fix(vllm): Make sure to use max number of GPUs available ( #326 )
...
* fix(serving): vllm bad num_gpus
Signed-off-by: Alan Poulain <contact@alanpoulain.eu >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Alan Poulain <contact@alanpoulain.eu >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 12:45:00 -04:00
Aaron Pham
fddd0bf95e
feat: bootstrap documentation site ( #252 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: GutZuFusss <leon.ikinger@googlemail.com >
Co-authored-by: GutZuFusss <leon.ikinger@googlemail.com >
2023-09-12 12:28:29 -04:00
aarnphm-ec2-dev
8530a067ea
chore(serialisation): dump quantization_config.json to conform with optimum load
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-07 16:50:50 +00:00
Aaron
0d50aa00b9
chore: add openllm-core as meta dependencies
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-07 10:31:40 -04:00
Aaron Pham
7e2c8428bb
infra: prepare for release 0.3.3 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-09-07 01:51:18 +00:00
aarnphm-ec2-dev
8173cb09a5
fix(quantize): dyn quant for int8 and int4
...
only set tokenizer when it is gptq
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-07 01:48:45 +00:00
Aaron Pham
fd18c8be01
infra: prepare for release 0.3.2 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-09-06 19:03:40 +00:00
Aaron
675b372981
fix: synchronize device for inference
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-06 14:08:52 -04:00
Aaron Pham
b61005424b
infra: prepare for release 0.3.1 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-09-06 17:49:58 +00:00
Aaron
887ffa9aa0
chore: cleanup pre-commit jobs and update usage
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-05 10:06:36 -04:00
aarnphm-ec2-dev
f43c721579
chore: only add bentomodel branch during generated service with OpenLLM
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-05 01:08:23 +00:00
Aaron Pham
06a68ade7d
infra: prepare for release 0.3.0 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-09-04 19:03:41 +00:00
Aaron
5eea40a599
chore(readme): update README for release [generated] [skip ci]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-04 15:02:01 -04:00
Aaron Pham
956b3a53bc
fix(gptq): use upstream integration ( #297 )
...
* wip
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* feat: GPTQ transformers integration
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* fix: only load if variable is available and add changelog
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* chore: remove boilerplate check
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-04 14:05:50 -04:00
aarnphm-ec2-dev
7d893e6cd2
chore: ignore new lines split [skip ci]
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-01 17:00:49 +00:00
Aaron Pham
608de0b667
fix(serving): vllm distributed size ( #285 )
...
* chore(weights): ignore gguf pattern for non GGML backend
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* chore: correct fix num_gpus to be divisible by 2
This depends on the attention_heads from given models
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-01 12:37:10 -04:00