Aaron Pham
88b6d3d6de
perf: upgrade mixtral to use expert parallelism ( #783 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-15 11:45:08 -05:00
Aaron Pham
c8c9663d06
fix(infra): conform ruff to 150 LL ( #781 )
...
Generally correctly format it with ruff format and manual style
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-14 17:27:32 -05:00
Aaron Pham
10f508d051
fix(mixtral): correct chat templates to remove additional spacing ( #774 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-12-13 12:34:06 -05:00
Aaron Pham
3ab78cd105
feat(mixtral): correct support for mixtral ( #772 )
...
feat(mixtral): support inference with pt
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-13 09:03:56 -05:00
Aaron Pham
6a185bc88e
fix(logprobs): explicitly set logprobs=None ( #757 )
...
logprobs=0 still outputs logprobs for one generation
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-12-07 03:36:39 -05:00
Aaron
9a7e0cecf0
fix(types): make sure mypy is running strict
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-30 09:42:24 -05:00
Aaron
9d1b16395e
infra: remove redundant mypy config
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-30 09:33:52 -05:00
yansheng
3cb7f14fc1
feat(models): Support qwen ( #742 )
...
* support qwen
* support qwen
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
* Update openllm-core/src/openllm_core/config/configuration_qwen.py
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: update correct readme and supports qwen models
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: root <yansheng105@gmail.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-30 06:54:17 -05:00
Aaron Pham
d04309188b
chore(style): 2.7k
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-28 07:04:27 +00:00
Aaron
96318b65ee
fix(sdk): remove broken sdk
...
codebase now around 2.8k lines
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 04:53:36 -05:00
Aaron
69aae34cf4
fix(style): reduce boilerplate and format to custom logics
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 01:44:59 -05:00
MingLiangDai
7b8d9024c4
fix(baichuan): only support baichuan 2 from now on ( #728 )
...
* config support multiple architectures
* chore: only support baichuan2 from now on
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update notes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: run script [skip ci]
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 02:07:06 -05:00
Aaron Pham
aab173cd99
refactor: focus ( #730 )
...
* perf: remove based images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move dockerfile to run on release only
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup unused types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 01:11:31 -05:00
Aaron Pham
52a44b1bfa
chore: cleanup loader ( #729 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 21:51:51 -05:00
Aaron Pham
5442d9cd10
fix(trust_remote_code): handle args correctly ( #727 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 17:03:13 -05:00
Aaron Pham
b28b5269b5
feat(openai): chat templates and complete control of prompt generation ( #725 )
...
* feat(openai): chat templates and complete control of prompt generation
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: correctly use base chat templates
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: remove symlink
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 06:49:14 -05:00
Aaron Pham
7aa0918a6f
fix(client): correct schemas parser from correct response output ( #724 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 05:01:35 -05:00
Aaron Pham
63d86faa32
fix(openai): correct stop tokens and finish_reason state ( #722 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 04:21:13 -05:00
Aaron Pham
38b7c44df0
fix(base-image): update base image to include cuda for now ( #720 )
...
* fix(base-image): update base image to include cuda for now
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: build core and client on release images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup style changes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 01:15:19 -05:00
Aaron Pham
8bb2742a9a
chore(types): append additional types change ( #719 )
...
* chore(types): append additional types change
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: add arguments for parsing dir
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 22:38:20 -05:00
Aaron Pham
77bd6f090a
chore(logger): fix warnings and streamline style ( #717 )
...
Sorry, but there is too much wasted spacing in `_llm.py`, and I'm unhappy and unproductive any time I look at it or want to do anything with it
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-21 18:54:51 -05:00
Aaron Pham
c33b071ee4
refactor: delete unused code ( #716 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 04:39:48 -05:00
Aaron Pham
a8a9f154ce
fix(ci): tests ( #715 )
...
* fix: tests
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: remove broken tests
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 03:05:22 -05:00
Aaron Pham
fde78a2c78
chore: cleanup unused prompt templates ( #713 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 01:56:51 -05:00
Aaron Pham
ad4f388c98
refactor: update runner helpers and add max_model_len ( #712 )
...
* chore(runner): cleanup unnecessary checks for runnable backend
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: saving llm reference to runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: correct inject item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update support for max_seq_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: correct max_model_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update and warn about backward compatibility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove unused sets
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 20:37:15 -05:00
Aaron Pham
816c1ee80e
feat(engine): CTranslate2 ( #698 )
...
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:25:08 -05:00
Aaron Pham
206521e02d
feat(ctranslate): initial infrastructure support ( #694 )
...
* perf: compact and improve speed and agility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup infrastructure
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update styles notes and autogen mypy configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 01:48:33 -05:00
Aaron Pham
099cc22a94
chore: update documentation ( #693 )
...
* chore: update documentation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update readme
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update documentations for configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 19:44:52 -05:00
Aaron Pham
1831d8f129
feat: heuristics logprobs ( #692 )
...
* fix(encoder): bring back T5 support on PyTorch
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: support logprobs and prompt_logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 19:26:20 -05:00
Aaron Pham
e9a89b7a7e
fix(cattrs): strictly lock <23.2 until we upgrade validation logic ( #690 )
...
fix(cattrs): strictly lock <23.2 until we move converter to upper version
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 17:11:15 -05:00
Aaron Pham
0891cde0b6
fix(dependencies): ignore broken cattrs release ( #689 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 16:52:58 -05:00
Aaron Pham
80ed400646
fix(build): lock lower version based on each release and update infra ( #686 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 15:57:31 -05:00
Aaron Pham
21a308538e
fix: correct set item for attrs >23.1 ( #678 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 09:16:52 -05:00
Aaron Pham
1a38de9b1f
fix(docs): chatglm support on vLLM ( #673 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 17:54:06 -05:00
Aaron Pham
c850d76ccd
feat(models): Phi 1.5 ( #672 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 17:48:10 -05:00
Aaron Pham
4a6f13ddd2
feat(type): provide structured annotations stubs ( #663 )
...
* feat(type): provide client stubs
separation of concerns for a more concise codebase
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 02:58:45 -05:00
Aaron Pham
876586a30e
fix(falcon): remove early_stopping default arguments ( #660 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-15 02:49:54 -05:00
Aaron Pham
034e08cf08
infra: update scripts to run update readme automatically ( #658 )
...
* infra: update scripts to run update readme automatically
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup mirror
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore(dropdown): correctly format noteblock and important block
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: whitespace aware
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-15 02:22:49 -05:00
Aaron Pham
c5f8602d4c
docs: update instruction adding new models and remove command docstring ( #654 )
...
docs: update instruction adding new models and remove command docstring
as start will just support model_id directly, there is no need to support custom docstring anymore
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-14 23:11:16 -05:00
Aaron Pham
6a6d689a77
feat: Yi models ( #651 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 21:55:24 -05:00
Aaron Pham
31a799ff61
refactor: use DEBUG env-var instead of OPENLLMDEVDEBUG ( #647 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 01:39:58 -05:00
Aaron Pham
b0ab8ccdf6
experimental: Cohere compatible endpoints. ( #644 )
...
* feat: add generate endpoint
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update generation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix(cohere): generate endpoints
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: update testing clients and chat implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: disable schemas for easter eggs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 01:07:43 -05:00
Aaron Pham
af84462f27
infra: remove unused postprocess_generate ( #634 )
...
This is currently a noop
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 17:35:39 -05:00
Zhao Shenyang
f202fddce8
perf(model): update mistral inference parameters and prompt format ( #632 )
...
* feat(model): add initial mistral support
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
* chore: update with recent refactor
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-13 17:32:16 -05:00
Aaron Pham
a6387d1d15
chore: cleanup unused code path ( #633 )
...
we now rely on tokenizer.chat_templates to format prompts correctly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 17:23:07 -05:00
Aaron Pham
22eaaf3ce1
feat(vllm): support passing specific dtype ( #626 )
...
* feat(vllm): support passing specific dtype
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: correctly cache the item
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-13 05:08:33 -05:00
Aaron Pham
126e6c9d63
fix(ruff): correct consistency between isort and formatter ( #624 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 21:12:50 -05:00
Aaron Pham
c3416c0afd
feat(llm): update warning envvar and add embedded mode ( #618 )
...
* chore: unify warning envvar and update type inference
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update documentation about embedded
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 17:39:06 -05:00
Aaron Pham
106e8617c1
chore(config): no need compat workaround for setting cell_contents ( #616 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 04:29:27 -05:00
Aaron Pham
fad4186dbc
feat(server): helpers endpoints for conversation format ( #613 )
...
* feat: add support for helpers conversation conversion endpoint
also correct schema generation for openllm client
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update clients to reuse `openllm-core` logics
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 01:02:27 -05:00