Aaron
d2a2af3ee2
fix: import nbformat for playground
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-10-04 19:21:14 -04:00
MingLiangDai
a0e0f81306
feat: PromptTemplate and system prompt support ( #407 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-03 09:53:37 -04:00
Aaron Pham
3b2ac1cd59
feat: support continuous batching on generate ( #375 )
...
* feat: support continuous batching on `generate`
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-19 03:04:59 -04:00
Aaron Pham
5a1fcc9cd5
fix: set default serialisation methods ( #355 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-18 02:26:53 -04:00
Aaron Pham
a32cf324d8
fix(prompt): correct export extra objects items ( #351 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-14 03:42:28 -04:00
Aaron Pham
ad9107958d
feat: continuous batching with vLLM ( #349 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: continuous batching
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
* chore: add one shot generation
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-14 03:09:36 -04:00
Aaron Pham
35e6945e86
fix(serialisation): vLLM safetensors support ( #324 )
...
* fix(serialisation): vllm support for safetensors
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* chore: running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: generalize one shot generation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog [skip ci]
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
2023-09-12 17:44:01 -04:00
Alan Poulain
88d7ba7ca8
fix(vllm): Make sure to use max number of GPUs available ( #326 )
...
* fix(serving): vllm bad num_gpus
Signed-off-by: Alan Poulain <contact@alanpoulain.eu >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Alan Poulain <contact@alanpoulain.eu >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-09-12 12:45:00 -04:00
Aaron Pham
fddd0bf95e
feat: bootstrap documentation site ( #252 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: GutZuFusss <leon.ikinger@googlemail.com >
Co-authored-by: GutZuFusss <leon.ikinger@googlemail.com >
2023-09-12 12:28:29 -04:00
aarnphm-ec2-dev
8530a067ea
chore(serialisation): dump quantization_config.json to conform with optimum load
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-07 16:50:50 +00:00
aarnphm-ec2-dev
8173cb09a5
fix(quantize): dyn quant for int8 and int4
...
only set tokenizer when it is gptq
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-07 01:48:45 +00:00
Aaron
675b372981
fix: synchronize device for inference
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-06 14:08:52 -04:00
aarnphm-ec2-dev
f43c721579
chore: only add bentomodel branch during generated service with OpenLLM
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-05 01:08:23 +00:00
Aaron Pham
956b3a53bc
fix(gptq): use upstream integration ( #297 )
...
* wip
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* feat: GPTQ transformers integration
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* fix: only load if variable is available and add changelog
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* chore: remove boilerplate check
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-04 14:05:50 -04:00
aarnphm-ec2-dev
7d893e6cd2
chore: ignore new lines split [skip ci]
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-01 17:00:49 +00:00
Aaron Pham
608de0b667
fix(serving): vllm distributed size ( #285 )
...
* chore(weights): ignore gguf pattern for non GGML backend
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
* chore: correct fix num_gpus to be divisible by 2
This depends on the attention_heads from given models
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-01 12:37:10 -04:00
Aaron Pham
b7af7765d4
fix(yapf): align weird new lines break [generated] [skip ci] ( #284 )
...
fix(yapf): align weird new lines break
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-09-01 05:34:22 -04:00
Aaron Pham
3e45530abd
refactor(breaking): unify LLM API ( #283 )
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-01 05:15:19 -04:00
Aaron
b545ad2ad1
style: google
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-08-30 13:52:35 -04:00
Aaron Pham
c9cef1d773
fix: persistent styling between ruff and yapf ( #279 )
2023-08-30 11:37:41 -04:00
Aaron Pham
2036d4e015
chore(build): use latest vllm pre-built kernel ( #261 )
2023-08-26 09:02:52 -04:00
aarnphm-ec2-dev
806a663e4a
chore(style): add one blank line
...
to conform with Google style
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-08-26 11:36:57 +00:00
Aaron Pham
938fd362bb
feat(vllm): streaming ( #260 )
2023-08-26 07:27:32 -04:00
Aaron Pham
46c8904806
cron(style): run formatter [generated] [skip ci] ( #257 )
2023-08-25 06:38:59 -04:00
Aaron Pham
08dc6ed2ba
chore: ignore peft and fix adapter loading issue ( #255 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-08-25 04:36:35 -04:00
Aaron
787ce1b3b6
chore(style): synchronized style across packages [skip ci]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-08-23 08:46:22 -04:00
aarnphm-ec2-dev
eddbc06374
chore(style): reduce line length and truncate compression
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-08-22 17:02:00 +00:00
aarnphm-ec2-dev
1488fbb167
chore(style): enable yapf to match with style guidelines
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-08-22 14:03:06 +00:00
Aaron Pham
3ffb25a872
refactor: packages ( #249 )
2023-08-22 08:55:46 -04:00
aarnphm-ec2-dev
9e371d2ead
fix(generate): Correct set batch output for generate from iterator
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-08-21 12:02:35 +00:00
Aaron Pham
9e205b4963
feat: token streaming and SSE support ( #240 )
2023-08-20 07:32:49 -04:00
Aaron Pham
4140d160b8
feat(embedding): Adding generic endpoint ( #227 )
2023-08-17 15:17:00 -04:00
aarnphm-ec2-dev
d5c4066ff4
fix(generation): unnecessary casting [ec2 build] [wheel build]
...
This breaks when compiled to cfunc
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-08-17 16:34:59 +00:00
aarnphm-ec2-dev
3363ee158b
fix(container): set correct PyTorch version not to override cuda wheels
...
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-08-16 10:46:49 +00:00
Aaron Pham
8796d0d63d
feat(models): add vLLM support for Falcon ( #223 )
2023-08-16 05:57:42 -04:00
Aaron Pham
3a73aacb01
chore(ci): add dependabot and fix vllm release container ( #217 )
2023-08-16 05:43:41 -04:00
Aaron
af8cb73832
fix: latest vllm build
...
sync changelog with monorepo for sdist installation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-08-16 04:03:34 -04:00
GutZuFusss
4cad367ab5
feat(contrib): ClojureScript UI ( #89 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com >
2023-08-16 03:30:44 -04:00
Aaron
6b0ab17018
chore: remove unnecessary headers
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-08-15 18:15:54 -04:00
Aaron Pham
cd872ef631
refactor: monorepo ( #203 )
2023-08-15 02:11:14 -04:00