Aaron Pham
|
88b6d3d6de
|
perf: upgrade mixtral to use expert parallelism (#783)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-12-15 11:45:08 -05:00 |
|
Aaron Pham
|
3ab78cd105
|
feat(mixtral): correct support for mixtral (#772)
feat(mixtral): support inference with pt
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-12-13 09:03:56 -05:00 |
|
Aaron Pham
|
d3328343d7
|
feat: mixtral support (#770)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-12-12 01:33:13 -05:00 |
|
Aaron
|
59e8ef93dc
|
chore(deps): lock vLLM to 0.2.4
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-12-12 00:17:18 -05:00 |
|
Aaron
|
9d1b16395e
|
infra: remove redundant mypy config
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-30 09:33:52 -05:00 |
|
yansheng
|
3cb7f14fc1
|
feat(models): Support qwen (#742)
* support qwen
* support qwen
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
* Update openllm-core/src/openllm_core/config/configuration_qwen.py
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* chore: update correct readme and supports qwen models
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: root <yansheng105@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
|
2023-11-30 06:54:17 -05:00 |
|
Aaron
|
39ecc73a50
|
infra: bump to dev version of 0.4.28.dev0 [generated] [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-24 01:54:46 -05:00 |
|
Aaron Pham
|
aab173cd99
|
refactor: focus (#730)
* perf: remove based images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: move dockerfile to run on release only
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: cleanup unused types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-24 01:11:31 -05:00 |
|
Aaron Pham
|
5442d9cd10
|
fix(trust_remote_code): handle args correctly (#727)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-22 17:03:13 -05:00 |
|
Aaron Pham
|
79c9608735
|
infra: reduce wait time to around 7 mins (#726)
Seems like the release process for PyPI usually takes from 4-7 minutes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-22 07:28:36 -05:00 |
|
Aaron Pham
|
f83f64ffd7
|
fix(infra): setup higher timer for building container images (#723)
* fix(infra): setup higher timer for building container images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: remove invalid tests
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-22 05:00:33 -05:00 |
|
Aaron Pham
|
38b7c44df0
|
fix(base-image): update base image to include cuda for now (#720)
* fix(base-image): update base image to include cuda for now
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* fix: build core and client on release images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: cleanup style changes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-22 01:15:19 -05:00 |
|
Aaron Pham
|
6505abdb44
|
chore: update lower bound version of bentoml to avoid breakage (#703)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-19 23:09:14 -05:00 |
|
Aaron Pham
|
44f05da845
|
infra: update generate notes and better local handle (#701)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-19 17:50:23 -05:00 |
|
Aaron
|
cb4386b013
|
fix(release): remove unnecessary check for client dependencies [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-19 10:39:38 -05:00 |
|
Aaron Pham
|
816c1ee80e
|
feat(engine): CTranslate2 (#698)
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-19 10:25:08 -05:00 |
|
Aaron Pham
|
539f250c0f
|
feat(vllm): bump to 0.2.2 (#695)
* feat(vllm): bump to 0.2.2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: move up to CUDA 12.1
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* fix: remove auto-gptq installation
since the builder image doesn't have access to GPU
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* fix: update containerization warning
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-19 02:52:32 -05:00 |
|
Aaron Pham
|
206521e02d
|
feat(ctranslate): initial infrastructure support (#694)
* perf: compact and improve speed and agility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: cleanup infrastructure
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update styles notes and autogen mypy configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-19 01:48:33 -05:00 |
|
Aaron Pham
|
099cc22a94
|
chore: update documentation (#693)
* chore: update documentation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update readme
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update documentations for configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-18 19:44:52 -05:00 |
|
Aaron Pham
|
c03e3bebb3
|
fix(infra): prepare correct dependencies for release [skip ci] (#687)
fix(infra): prepare correct dependencies for release
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-17 16:05:46 -05:00 |
|
Aaron Pham
|
80ed400646
|
fix(build): lock lower version based on each release and update infra (#686)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-17 15:57:31 -05:00 |
|
Aaron Pham
|
21a308538e
|
fix: correct set item for attrs >23.1 (#678)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-17 09:16:52 -05:00 |
|
Aaron Pham
|
c850d76ccd
|
feat(models): Phi 1.5 (#672)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-16 17:48:10 -05:00 |
|
Aaron Pham
|
6102a67a83
|
infra: makes huggingface-hub requirements on fine-tune (#665)
infra: makes huggingface-hub core deps
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-16 03:12:52 -05:00 |
|
Aaron Pham
|
4a6f13ddd2
|
feat(type): provide structured annotations stubs (#663)
* feat(type): provide client stubs
separation of concerns for a more concise code base
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-16 02:58:45 -05:00 |
|
Aaron Pham
|
9e6df0df89
|
chore: update requirements in README.md (#659)
chore: update requirements
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-15 02:32:36 -05:00 |
|
Aaron Pham
|
034e08cf08
|
infra: update scripts to run update readme automatically (#658)
* infra: update scripts to run update readme automatically
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: cleanup mirror
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore(dropdown): correctly format noteblock and important block
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* fix: whitespace aware
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-15 02:22:49 -05:00 |
|
Aaron Pham
|
0bf6ec7537
|
fix(dependencies): lock build < 1 for now (#643)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-14 00:36:08 -05:00 |
|
Aaron Pham
|
a6387d1d15
|
chore: cleanup unused code path (#633)
we now rely on tokenizer.chat_templates to format prompts correctly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-13 17:23:07 -05:00 |
|
Aaron Pham
|
0924c0b34d
|
infra: removing clojure frontend from infra cycle (#630)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-13 13:29:34 -05:00 |
|
Aaron Pham
|
e0632a85ed
|
refactor(cli): move out to its own packages (#619)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-12 18:25:44 -05:00 |
|
Aaron Pham
|
ac377fe490
|
infra: using ruff formatter (#594)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-09 12:44:05 -05:00 |
|
Aaron Pham
|
b8a2e8cf91
|
refactor(cli): cleanup API (#592)
* chore: remove unused imports
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* refactor(cli): update to only need model_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* feat: `openllm start model-id`
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: add changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update changelog notice
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update correct config and running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: update backward compat options and treat JSON outputs
correspondingly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-11-09 11:40:17 -05:00 |
|
Aaron Pham
|
ff8b6377c8
|
fix(awq): correct awq detection for support (#586)
* fix(awq): correct detection for awq
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* chore: update base docker to work
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* chore: disable awq on pytorch for now
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-08 06:57:11 -05:00 |
|
Aaron Pham
|
cfd09bfc47
|
chore(runner): yield the outputs directly (#573)
update openai client examples to >1
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-07 22:34:11 -05:00 |
|
Aaron Pham
|
e2029c934b
|
perf: unify LLM interface (#518)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-11-06 20:39:43 -05:00 |
|
Aaron Pham
|
72c6005d3b
|
chore(inference): update vllm to 0.2.1.post1 and update config parsing (#554)
chore(dependencies): update vllm to 0.2.1.post1 and update config
parsing
|
2023-11-04 04:01:56 -04:00 |
|
Aaron Pham
|
c1ca7ccd3b
|
fix(breaking): remove embeddings and update client implementation (#500)
|
2023-10-14 16:04:35 -04:00 |
|
aarnphm-ec2-dev
|
65c76cace3
|
chore: update deps for transformers and vllm
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-10-11 04:28:46 +00:00 |
|
Aaron
|
625b82a0fc
|
fix(style): remove weird break on split item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-10-07 02:21:31 -04:00 |
|
XunchaoZ
|
04bb29a264
|
feat: OpenAI-compatible API (#417)
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
2023-10-07 00:50:03 -04:00 |
|
Aaron Pham
|
5a1fcc9cd5
|
fix: set default serialisation methods (#355)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-18 02:26:53 -04:00 |
|
Aaron Pham
|
ad9107958d
|
feat: continuous batching with vLLM (#349)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* feat: continuous batching
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
* chore: add changelog
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
* chore: add one shot generation
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-14 03:09:36 -04:00 |
|
Aaron Pham
|
35e6945e86
|
fix(serialisation): vLLM safetensors support (#324)
* fix(serialisation): vllm support for safetensors
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* chore: running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: generalize one shot generation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
* chore: add changelog [skip ci]
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com>
|
2023-09-12 17:44:01 -04:00 |
|
Aaron
|
c6c23bc959
|
fix(actions): hermetic dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-12 13:44:18 -04:00 |
|
Aaron
|
70f4ccfae6
|
fix(ratchet): lock correctly on cron job
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-11 15:09:49 -04:00 |
|
aarnphm-ec2-dev
|
c7f915fa71
|
chore: update documentation wrt envvar correctness
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-09-08 17:43:03 +00:00 |
|
Aaron
|
0d50aa00b9
|
chore: add openllm-core as meta dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-07 10:31:40 -04:00 |
|
Aaron
|
887ffa9aa0
|
chore: cleanup pre-commit jobs and update usage
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
|
2023-09-05 10:06:36 -04:00 |
|
Aaron Pham
|
956b3a53bc
|
fix(gptq): use upstream integration (#297)
* wip
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* feat: GPTQ transformers integration
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* fix: only load if variable is available and add changelog
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
* chore: remove boilerplate check
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
---------
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
|
2023-09-04 14:05:50 -04:00 |
|