Commit Graph

232 Commits

Author SHA1 Message Date
Aaron Pham
d7e99c2827 fix: correctly set quantise for non quantise options
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-06-14 02:20:19 +00:00
paperspace
15cada079a fix(models): make sure to use private-tag name for the generated service
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-06-03 20:45:17 +00:00
Aaron Pham
c60398c45b chore: add more info to metadata
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-06-02 17:57:51 -04:00
Aaron Pham
3193190b94 chore: update configuration to yield objects instead
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-06-02 17:48:03 -04:00
paperspace
a93da12084 chore: upgrade to new vLLM schema
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-06-02 15:52:45 +00:00
paperspace
8fea50dfdb feat: update ROCm check for syspath
See #950 for more information

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-06-02 14:20:23 +00:00
Aaron Pham
bf28f977bc feat(models): command-r (#1005)
* feat(models): add support for command-r

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* feat(models): support command-r and remove deadcode and extensions

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: update local.sh script

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-06-02 10:16:08 -04:00
Aaron Pham
45aceb172f feat(API): add light support for batch inference (#1004)
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-31 20:36:12 -04:00
paperspace
02010d3499 fix: synchronize into llm_config dict
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-29 04:31:34 +00:00
paperspace
ef11e54a6d chore: update docs and base instruction [skip ci]
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-29 03:19:47 +00:00
paperspace
c820cececb fix(generate): make sure to only pass prompt_token_ids if it is a valid mutable

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-29 02:42:13 +00:00
paperspace
9da0b4134c chore(qol): make envvar private
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-27 18:07:38 +00:00
paperspace
07655c9ba8 chore(build): remove vllm_version envvar and lock into templates
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-27 17:49:58 +00:00
paperspace
ba5a5da720 chore: update docstring
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-27 17:02:26 +00:00
paperspace
0f32290606 chore(packages): ready for 0.5 releases
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-27 16:54:53 +00:00
Aaron Pham (mbp16)
f4f7f16e81 chore(releases): remove deadcode
Signed-off-by: Aaron Pham (mbp16) <29749331+aarnphm@users.noreply.github.com>
2024-05-27 12:37:50 -04:00
Aaron Pham
3f048d8a5b chore(qol): update CLI options and performance upgrade for build cache (#997)
* chore(qol): update CLI options and performance upgrade for build cache

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: update default python version for dev

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* fix: install custom tar.gz models

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-26 04:17:23 -04:00
paperspace
cec0aa5487 fix(memory): correctly recommend instance types for cloud
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-23 14:42:39 +00:00
Aaron Pham
97d76eec85 tests: add additional basic testing (#982)
* chore: update rebase tests

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: update partial clients before removing

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* fix: update clients parsing logics to work with 0.5

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: ignore ci runs as to run locally

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: update async client tests

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* chore: update pre-commit

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-23 10:02:23 -04:00
paperspace
b7193511e6 fix: correct update default value for dict unpacking
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-22 15:15:23 +00:00
paperspace
e9246e7772 fix: make sure to only update fields when correct type is parse
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-14 06:42:46 +00:00
paperspace
806308eed6 fix: one-shot generation not to concatenate duplicates
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-12 04:21:06 +00:00
paperspace
1d2e554a94 chore: disable progressbar for cleaner log trace
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-10 03:11:47 +00:00
paperspace
9a961d9070 perf(build): improve preheating layers for caching dependencies
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-09 22:34:39 +00:00
paperspace
8e82bd9600 chore: update streaming logics to respect cursor
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-09 21:13:28 +00:00
paperspace
dd79779e0e fix: update generate_stream to yield delta based on given index generation

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-09 20:18:00 +00:00
paperspace
c9f8dbc767 feat: set options for 'gpu' for building recommendation
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-09 01:37:29 +00:00
paperspace
852b82d25b fix: make sure to export correct json config
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-09 01:14:12 +00:00
paperspace
6726f6ae3e fix: make sure to add cpu to number
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-09 00:06:10 +00:00
paperspace
526a770a06 chore: update base requirements to 0.4.2
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-05-08 18:46:13 +00:00
Aaron Pham
42417dbdbf fix: make sure to respect additional parameters parse (#981)
2024-05-08 13:53:56 -04:00
Aaron Pham
43b635fbfd fix: update correct CompletionOutput object (#973)
* fix: update correct CompletionOutput object

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

* fix: revert to correct version

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
2024-04-30 15:06:46 -04:00
Aaron
66de54eae7 chore: update default params as pydantic fields
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-04-03 15:50:37 -04:00
Aaron Pham
32f4dff83b fix: explicitly pass only non-null value
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-04-02 04:35:47 +00:00
Aaron Pham
1d817a7e01 fix: add support for min_tokens
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-04-02 04:22:14 +00:00
Aaron Pham
5c0d2787c0 feat: add dbrx support
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-04-02 04:10:19 +00:00
Aaron Pham
4661838964 chore: move out the template to separate files
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-04-02 03:24:26 +00:00
Aaron Pham
67ab9b5762 fix: swagger showing for subpath
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-22 02:24:22 +00:00
Aaron Pham
3ef93fe371 chore: update support development_mode as DEBUG and support for RELOAD envvar

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-22 01:19:32 +00:00
Aaron Pham
80b35f0d72 revert: correct type for openapi schema generation
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-21 07:51:00 +00:00
Aaron Pham
51bec78ee9 fix(load): make sure to respect MAX_MODEL_LEN from env
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-21 07:44:49 +00:00
Aaron
295a3b1061 chore(codegen): update generated var to read from envvar
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-03-20 21:51:39 -04:00
Aaron Pham
f0ab6d44fa fix: make sure to include new implementation in bundle build
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-20 22:11:53 +00:00
Aaron
5c8c30a70b fix: uses --pre for alpha releases for now
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-03-20 13:38:10 -04:00
Aaron
2ddbe4eb22 fix(service): remove mounting ASGI app
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-03-20 11:51:09 -04:00
Aaron Pham
824ff68818 chore: update local script and update service
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-15 20:29:49 +00:00
Aaron
727361ced7 chore: running updated ruff formatter [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-03-15 05:35:24 -04:00
Aaron
c34db550a6 fix(build): explicit set to use alpha version
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-03-15 05:33:18 -04:00
Aaron
0274fb4c11 fix: don't lock openllm to support alpha release
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2024-03-15 05:29:35 -04:00
Aaron Pham
072b3e97ec feat: 1.2 APIs (#821)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-03-15 03:49:19 -04:00