Commit Graph

109 Commits

Author SHA1 Message Date
Aaron Pham
1d817a7e01 fix: add support for min_tokens
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-04-02 04:22:14 +00:00
Aaron Pham
5c0d2787c0 feat: add dbrx support
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-04-02 04:10:19 +00:00
Aaron Pham
3ef93fe371 chore: update support development_mode as DEBUG and support for RELOAD envvar

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-22 01:19:32 +00:00
Aaron Pham
80b35f0d72 revert: correct type for openapi schema generation
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-21 07:51:00 +00:00
Aaron Pham
51bec78ee9 fix(load): make sure to respect MAX_MODEL_LEN from env
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-03-21 07:44:49 +00:00
Aaron Pham
072b3e97ec feat: 1.2 APIs (#821)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-03-15 03:49:19 -04:00
Aaron
e3392476be revert: "ci: pre-commit autoupdate [pre-commit.ci] (#931)"
This reverts commit 7b00c84c2a.
2024-03-15 03:47:23 -04:00
pre-commit-ci[bot]
7b00c84c2a ci: pre-commit autoupdate [pre-commit.ci] (#931)
* ci: pre-commit autoupdate [pre-commit.ci]

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.2.2 → v0.3.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.2.2...v0.3.2)
- [github.com/pre-commit/mirrors-eslint: v9.0.0-beta.0 → v9.0.0-beta.2](https://github.com/pre-commit/mirrors-eslint/compare/v9.0.0-beta.0...v9.0.0-beta.2)

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-03-15 03:46:28 -04:00
Zhao Shenyang
16d8caf2ee chore: bump up bentoml version to 1.1.11 (#883)
2024-02-04 21:31:14 +08:00
Zhao Shenyang
9d0e292076 fix: limit BentoML version range (#881)
2024-02-04 16:59:21 +08:00
Zhao Shenyang
6c909aabdb chore: set stop to empty list by default (#878)
2024-02-02 19:28:49 +08:00
Aaron Pham
2bb97f8ba2 chore: update discord link (#838)
* Update pyproject.toml

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* Update pyproject.toml

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* Update pyproject.toml

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* Update pyproject.toml

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2024-01-08 19:09:51 -05:00
Aaron Pham
88b6d3d6de perf: upgrade mixtral to use expert parallelism (#783)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-12-15 11:45:08 -05:00
Aaron Pham
c8c9663d06 fix(infra): conform ruff to 150 LL (#781)
Generally correctly format it with ruff format and manual style

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-12-14 17:27:32 -05:00
Aaron Pham
10f508d051 fix(mixtral): correct chat templates to remove additional spacing (#774)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-12-13 12:34:06 -05:00
Aaron Pham
3ab78cd105 feat(mixtral): correct support for mixtral (#772)
feat(mixtral): support inference with pt

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-12-13 09:03:56 -05:00
Aaron Pham
6a185bc88e fix(logprobs): explicitly set logprobs=None (#757)
logprobs=0 still outputs logprobs for one generation

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-12-07 03:36:39 -05:00
Aaron
9a7e0cecf0 fix(types): make sure mypy is running strict
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-30 09:42:24 -05:00
Aaron
9d1b16395e infra: remove redundant mypy config
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-30 09:33:52 -05:00
yansheng
3cb7f14fc1 feat(models): Support qwen (#742)
* support qwen

* support qwen

* ci: auto fixes from pre-commit.ci

For more information, see https://pre-commit.ci

* Update openllm-core/src/openllm_core/config/configuration_qwen.py

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: update correct readme and supports qwen models

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: root <yansheng105@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-30 06:54:17 -05:00
Aaron Pham
d04309188b chore(style): 2.7k
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-28 07:04:27 +00:00
Aaron
96318b65ee fix(sdk): remove broken sdk
codespace now around 2.8k lines

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-26 04:53:36 -05:00
Aaron
69aae34cf4 fix(style): reduce boilerplate and format to custom logics
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-26 01:44:59 -05:00
MingLiangDai
7b8d9024c4 fix(baichuan): supported from baichuan 2 from now on. (#728)
* config support multiple architectures

* chore: only support baichuan2 from now on

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update notes

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: run script [skip ci]

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-24 02:07:06 -05:00
Aaron Pham
aab173cd99 refactor: focus (#730)
* perf: remove based images

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: move Dockerfile to run on release only

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: cleanup unused types

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-24 01:11:31 -05:00
Aaron Pham
52a44b1bfa chore: cleanup loader (#729)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-22 21:51:51 -05:00
Aaron Pham
5442d9cd10 fix(trust_remote_code): handle args correctly (#727)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-22 17:03:13 -05:00
Aaron Pham
b28b5269b5 feat(openai): chat templates and complete control of prompt generation (#725)
* feat(openai): chat templates and complete control of prompt generation

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: correctly use base chat templates

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: remove symlink

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-22 06:49:14 -05:00
Aaron Pham
7aa0918a6f fix(client): correct schemas parser from correct response output (#724)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-22 05:01:35 -05:00
Aaron Pham
63d86faa32 fix(openai): correct stop tokens and finish_reason state (#722)
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-22 04:21:13 -05:00
Aaron Pham
38b7c44df0 fix(base-image): update base image to include cuda for now (#720)
* fix(base-image): update base image to include cuda for now

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: build core and client on release images

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: cleanup style changes

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-22 01:15:19 -05:00
Aaron Pham
8bb2742a9a chore(types): append additional types change (#719)
* chore(types): append additional types change

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: add arguments for parsing dir

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-21 22:38:20 -05:00
Aaron Pham
77bd6f090a chore(logger): fix warnings and streamline style (#717)
Sorry, but there is too much wasted spacing in `_llm.py`, and I'm unhappy and unproductive any time I look at it or want to do anything with it

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-21 18:54:51 -05:00
Aaron Pham
c33b071ee4 refactor: delete unused code (#716)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-21 04:39:48 -05:00
Aaron Pham
a8a9f154ce fix(ci): tests (#715)
* fix: tests

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: remove broken tests

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
2023-11-21 03:05:22 -05:00
Aaron Pham
fde78a2c78 chore: cleanup unused prompt templates (#713)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-21 01:56:51 -05:00
Aaron Pham
ad4f388c98 refactor: update runner helpers and add max_model_len (#712)
* chore(runner): cleanup unnecessary checks for runnable backend

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: saving llm reference to runner

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: correct inject item

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update support for max_seq_len

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: correct max_model_len

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update and warning backward compatibility

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: remove unused sets

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-20 20:37:15 -05:00
Aaron Pham
816c1ee80e feat(engine): CTranslate2 (#698)
* chore: update instruction for dependencies

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat(experimental): CTranslate2

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-19 10:25:08 -05:00
Aaron Pham
206521e02d feat(ctranslate): initial infrastructure support (#694)
* perf: compact and improve speed and agility

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* --wip--

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: cleanup infrastructure

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update styles notes and autogen mypy configuration

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-19 01:48:33 -05:00
Aaron Pham
099cc22a94 chore: update documentation (#693)
* chore: update documentation

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update readme

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: update documentations for configuration

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-18 19:44:52 -05:00
Aaron Pham
1831d8f129 feat: heuristics logprobs (#692)
* fix(encoder): bring back T5 support on PyTorch

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* feat: support logprobs and prompt_logprobs

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* docs: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-18 19:26:20 -05:00
Aaron Pham
e9a89b7a7e fix(cattrs): strictly lock <23.2 until we upgrade validation logic (#690)
fix(cattrs): strictly lock <23.2 until we move converter to upper version

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-17 17:11:15 -05:00
Aaron Pham
0891cde0b6 fix(dependencies): ignore broken cattrs release (#689)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-17 16:52:58 -05:00
Aaron Pham
80ed400646 fix(build): lock lower version based on each release and update infra (#686)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-17 15:57:31 -05:00
Aaron Pham
21a308538e fix: correct set item for attrs >23.1 (#678)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-17 09:16:52 -05:00
Aaron Pham
1a38de9b1f fix(docs): chatglm support on vLLM (#673)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-16 17:54:06 -05:00
Aaron Pham
c850d76ccd feat(models): Phi 1.5 (#672)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-16 17:48:10 -05:00
Aaron Pham
4a6f13ddd2 feat(type): provide structured annotations stubs (#663)
* feat(type): provide client stubs

separation of concerns for a more concise code base

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* docs: update changelog

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-16 02:58:45 -05:00
Aaron Pham
876586a30e fix(falcon): remove early_stopping default arguments (#660)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-15 02:49:54 -05:00
Aaron Pham
034e08cf08 infra: update scripts to run update readme automatically (#658)
* infra: update scripts to run update readme automatically

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: cleanup mirror

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore(dropdown): correctly format noteblock and important block

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: whitespace aware

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-11-15 02:22:49 -05:00