Aaron Pham
072b3e97ec
feat: 1.2 APIs ( #821 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-03-15 03:49:19 -04:00
Aaron
e3392476be
revert: "ci: pre-commit autoupdate [pre-commit.ci] ( #931 )"
...
This reverts commit 7b00c84c2a .
2024-03-15 03:47:23 -04:00
pre-commit-ci[bot]
7b00c84c2a
ci: pre-commit autoupdate [pre-commit.ci] ( #931 )
...
* ci: pre-commit autoupdate [pre-commit.ci]
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.2.2 → v0.3.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.2.2...v0.3.2 )
- [github.com/pre-commit/mirrors-eslint: v9.0.0-beta.0 → v9.0.0-beta.2](https://github.com/pre-commit/mirrors-eslint/compare/v9.0.0-beta.0...v9.0.0-beta.2 )
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2024-03-15 03:46:28 -04:00
Zhao Shenyang
aff5dc8ff2
fix: proper SSE handling for vllm ( #877 )
...
fix: proper SSE handling
2024-02-02 17:25:58 +08:00
Aaron Pham
8d63afc9ce
feat(vllm): support GPTQ with 0.2.6 ( #797 )
...
* feat(vllm): GPTQ support passthrough
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: run scripts
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix(install): set order of xformers before vllm
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: support GPTQ with vLLM
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-12-18 12:41:19 -05:00
Aaron Pham
c8c9663d06
fix(infra): conform ruff to 150 LL ( #781 )
...
Format the codebase with ruff format, plus manual style fixes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-14 17:27:32 -05:00
Aaron
55a0b2f825
fix(style): setup correct block format
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-30 07:58:35 -05:00
Aaron
b53559de6f
fix(setter): correct item with the same kwargs with stubs
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-30 07:36:34 -05:00
Aaron Pham
0909e08e3c
fix(llm): remove unnecessary parsing
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-29 18:00:08 +00:00
Aaron Pham
9706228956
chore(vllm): add arguments for gpu memory utilization
...
Probably not going to fix anything, just delaying the problem.
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-29 06:45:14 +00:00
Aaron
96318b65ee
fix(sdk): remove broken sdk
...
codebase now around 2.8k lines
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 04:53:36 -05:00
Aaron
43a96dab2c
fix(gpus): disable slots for now to enable cached_property
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 02:49:48 -05:00
Aaron
69aae34cf4
fix(style): reduce boilerplate and format to custom logics
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 01:44:59 -05:00
Aaron Pham
aab173cd99
refactor: focus ( #730 )
...
* perf: remove based images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move Dockerfile to run on release only
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup unused types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 01:11:31 -05:00
Aaron Pham
b28b5269b5
feat(openai): chat templates and complete control of prompt generation ( #725 )
...
* feat(openai): chat templates and complete control of prompt generation
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: correctly use base chat templates
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: remove symlink
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 06:49:14 -05:00
Aaron Pham
63d86faa32
fix(openai): correct stop tokens and finish_reason state ( #722 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 04:21:13 -05:00
Aaron Pham
38b7c44df0
fix(base-image): update base image to include cuda for now ( #720 )
...
* fix(base-image): update base image to include cuda for now
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: build core and client on release images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup style changes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 01:15:19 -05:00
Aaron Pham
77bd6f090a
chore(logger): fix warnings and streamline style ( #717 )
...
Sorry, but there is too much wasted spacing in `_llm.py`, and I'm unhappy and unproductive any time I look at it or want to do anything with it
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-21 18:54:51 -05:00
Aaron Pham
c33b071ee4
refactor: delete unused code ( #716 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 04:39:48 -05:00
Aaron Pham
e70246ca5d
feat(generation): add support for eos_token_id ( #714 )
...
chore: add support for custom eos_token_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 02:01:36 -05:00
Aaron Pham
fde78a2c78
chore: cleanup unused prompt templates ( #713 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 01:56:51 -05:00
Aaron Pham
ad4f388c98
refactor: update runner helpers and add max_model_len ( #712 )
...
* chore(runner): cleanup unnecessary checks for runnable backend
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: saving llm reference to runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: correct inject item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update support for max_seq_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: correct max_model_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update and warn about backward compatibility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove unused sets
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 20:37:15 -05:00
Aaron
f753662ae6
fix(build): only load model when eager is True
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:06:25 -05:00
Aaron Pham
4491aa54d0
fix(backend): use the correct variable for backend during initialisation ( #702 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 22:42:25 -05:00
Aaron Pham
816c1ee80e
feat(engine): CTranslate2 ( #698 )
...
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:25:08 -05:00
Aaron Pham
539f250c0f
feat(vllm): bump to 0.2.2 ( #695 )
...
* feat(vllm): bump to 0.2.2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move up to CUDA 12.1
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: remove auto-gptq installation
since the builder image doesn't have access to GPU
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: update containerization warning
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 02:52:32 -05:00
Aaron Pham
206521e02d
feat(ctranslate): initial infrastructure support ( #694 )
...
* perf: compact and improve speed and agility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup infrastructure
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update styles notes and autogen mypy configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 01:48:33 -05:00
Aaron Pham
1831d8f129
feat: heuristics logprobs ( #692 )
...
* fix(encoder): bring back T5 support on PyTorch
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: support logprobs and prompt_logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 19:26:20 -05:00
Aaron Pham
4499469efb
fix(annotations): check library through find_spec ( #691 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 02:02:16 -05:00
Aaron Pham
381d740a7a
fix(llm): remove unnecessary check ( #683 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 11:23:22 -05:00
Aaron Pham
14b3ceb436
fix(torch_dtype): correctly infer based on options ( #682 )
...
Users should be able to set the dtype during build, as it doesn't affect start time
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 10:52:05 -05:00
Aaron Pham
fce8f223f3
perf: reduce footprint ( #668 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 04:45:49 -05:00
Aaron Pham
6102a67a83
infra: makes huggingface-hub requirements on fine-tune ( #665 )
...
infra: makes huggingface-hub core deps
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 03:12:52 -05:00
Aaron Pham
86d23fd6f5
feat(llm): respect warnings environment for dtype warning ( #664 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 03:05:58 -05:00
Aaron Pham
4a6f13ddd2
feat(type): provide structured annotations stubs ( #663 )
...
* feat(type): provide client stubs
separation of concerns for a more concise codebase
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 02:58:45 -05:00
Aaron Pham
a58d947bc8
perf: improve build logics and cleanup speed ( #657 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-15 00:18:31 -05:00
Aaron Pham
6a6d689a77
feat: Yi models ( #651 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 21:55:24 -05:00
Aaron Pham
2d428f12da
fix(cpu): more verbose definition for dtype casting ( #639 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 20:40:50 -05:00
Aaron Pham
b20c7d1c1d
fix(generation): compatibility dtype with CPU ( #638 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 20:32:07 -05:00
Aaron Pham
d358e68539
fix(torch_dtype): load eagerly ( #631 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 13:48:04 -05:00
Aaron Pham
099c0dc31b
feat(cli): --dtype arguments ( #627 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 05:25:50 -05:00
Aaron Pham
22eaaf3ce1
feat(vllm): support passing specific dtype ( #626 )
...
* feat(vllm): support passing specific dtype
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: correctly cache the item
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-13 05:08:33 -05:00
Aaron Pham
126e6c9d63
fix(ruff): correct consistency between isort and formatter ( #624 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 21:12:50 -05:00
Aaron Pham
c3416c0afd
feat(llm): update warning envvar and add embedded mode ( #618 )
...
* chore: unify warning envvar and update type inference
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update documentation about embedded
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 17:39:06 -05:00
Aaron Pham
7e1fb35a71
chore(llm): expose quantise and lazy load heavy imports ( #617 )
...
* chore(llm): expose quantise and lazy load heavy imports
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move transformers to TYPE_CHECKING block
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 14:55:37 -05:00
Aaron Pham
7438005c04
refactor(config): simplify configuration and update start CLI output ( #611 )
...
* chore(config): simplify configuration and update start CLI output handling
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove state and message sent after server lifecycle
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update color stream and refactor reusable logic
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update documentations and mypy
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-11 22:36:10 -05:00
Aaron Pham
fa2038f4e2
fix: loading correct local models ( #599 )
...
* fix(model): loading local correctly
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: update repr and correct bentomodel processor
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
* chore: cleanup transformers implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: ruff to ignore I001 on all stubs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-10 02:36:12 -05:00
Aaron Pham
ac377fe490
infra: using ruff formatter ( #594 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-09 12:44:05 -05:00
Aaron Pham
b8a2e8cf91
refactor(cli): cleanup API ( #592 )
...
* chore: remove unused imports
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* refactor(cli): update to only need model_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: `openllm start model-id`
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog notice
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update correct config and running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update backward compat options and treat JSON outputs correspondingly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-09 11:40:17 -05:00
Aaron Pham
ff8b6377c8
fix(awq): correct awq detection for support ( #586 )
...
* fix(awq): correct detection for awq
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: update base docker to work
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: disable awq on pytorch for now
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-08 06:57:11 -05:00