Aaron Pham
8d63afc9ce
feat(vllm): support GPTQ with 0.2.6 ( #797 )
...
* feat(vllm): GPTQ support passthrough
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: run scripts
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix(install): set order of xformers before vllm
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: support GPTQ with vLLM
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-12-18 12:41:19 -05:00
Aaron Pham
5d27337e82
fix(cli): avoid runtime __origin__ check for older Python ( #798 )
...
fix(cli): avoid runtime __origin__ on older Python
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-18 12:33:36 -05:00
Aaron Pham
88b6d3d6de
perf: upgrade mixtral to use expert parallelism ( #783 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-15 11:45:08 -05:00
Aaron Pham
c8c9663d06
fix(infra): conform ruff to 150 LL ( #781 )
...
Format it correctly with ruff format and manual style fixes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-14 17:27:32 -05:00
Aaron Pham
44383528b5
fix(logprobs): correct check logprobs ( #779 )
...
* fix(logprobs): correct check logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-14 14:19:01 -05:00
Aaron Pham
0d83cefcb6
fix(mixtral): setup hack atm to load weights from pt specifically instead of safetensors ( #776 )
...
fix(mixtral): setup hack atm to load weights from pt specifically
instead of safetensors
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-13 18:18:51 -05:00
Aaron Pham
2dbcfa8a0c
fix(cli): correctly set arguments for openllm import and openllm build ( #775 )
...
* fix(cli): correctly set arguments for `openllm import` and `openllm build`
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-13 15:52:59 -05:00
Aaron Pham
3ab78cd105
feat(mixtral): correct support for mixtral ( #772 )
...
feat(mixtral): support inference with pt
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-13 09:03:56 -05:00
Aaron Pham
d3328343d7
feat: mixtral support ( #770 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-12-12 01:33:13 -05:00
Aaron
59e8ef93dc
chore(deps): lock vLLM to 0.2.4
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-12 00:17:18 -05:00
Aaron Pham
08114410bc
fix(openai): logprobs when echo is enabled ( #761 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-10 18:09:25 -05:00
Aaron Pham
c3a0b5c39f
feat(openai): supports echo ( #760 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-10 13:19:40 -05:00
Aaron
bb4ed8b53c
fix(llm): correct annotations definitions
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-12-09 09:59:02 -05:00
Aaron
9a7e0cecf0
fix(types): make sure mypy is running strict
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-30 09:42:24 -05:00
Aaron
55a0b2f825
fix(style): setup correct block format
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-30 07:58:35 -05:00
Aaron
b53559de6f
fix(setter): correct item with the same kwargs with stubs
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-30 07:36:34 -05:00
yansheng
3cb7f14fc1
feat(models): Support qwen ( #742 )
...
* support qwen
* support qwen
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
* Update openllm-core/src/openllm_core/config/configuration_qwen.py
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: update correct readme and supports qwen models
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: root <yansheng105@gmail.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-30 06:54:17 -05:00
Aaron Pham
0909e08e3c
fix(llm): remove unnecessary parsing
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-29 18:00:08 +00:00
Aaron Pham
9706228956
chore(vllm): add arguments for gpu memory utilization
...
Probably not going to fix anything, just delaying the problem.
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-29 06:45:14 +00:00
Aaron Pham
f0fa06004b
chore: revert back previous backend support PyTorch ( #739 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-29 01:44:41 -05:00
Aaron Pham
d04309188b
chore(style): 2.7k
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-28 07:04:27 +00:00
Aaron
ce6efc2a9e
chore(style): cleanup bytes
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-28 01:27:27 -05:00
Aaron
96318b65ee
fix(sdk): remove broken sdk
...
codebase now around 2.8k lines
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 04:53:36 -05:00
Aaron
43a96dab2c
fix(gpus): disable slots for now to enable cached_property
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 02:49:48 -05:00
Aaron
69aae34cf4
fix(style): reduce boilerplate and format to custom logics
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-26 01:44:59 -05:00
Aaron
b4c9971678
fix(build): explicitly not lock packages
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 01:21:29 -05:00
Aaron
7dd4e3ac4b
fix(build): don't lock packages for now, but do lock base requirements
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 01:17:45 -05:00
Aaron
7beaa92c2b
fix(types): using correct refactored literal
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 01:14:29 -05:00
Aaron Pham
aab173cd99
refactor: focus ( #730 )
...
* perf: remove based images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move dockerfile to run on release only
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup unused types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 01:11:31 -05:00
Aaron Pham
52a44b1bfa
chore: cleanup loader ( #729 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 21:51:51 -05:00
Aaron Pham
5442d9cd10
fix(trust_remote_code): handle args correctly ( #727 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 17:03:13 -05:00
Aaron Pham
b28b5269b5
feat(openai): chat templates and complete control of prompt generation ( #725 )
...
* feat(openai): chat templates and complete control of prompt generation
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: correctly use base chat templates
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: remove symlink
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 06:49:14 -05:00
Aaron Pham
63d86faa32
fix(openai): correct stop tokens and finish_reason state ( #722 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 04:21:13 -05:00
Aaron
d697ea3903
fix(image): setup correct installation
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 01:33:26 -05:00
Aaron Pham
38b7c44df0
fix(base-image): update base image to include cuda for now ( #720 )
...
* fix(base-image): update base image to include cuda for now
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: build core and client on release images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup style changes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 01:15:19 -05:00
Aaron Pham
8bb2742a9a
chore(types): append additional types change ( #719 )
...
* chore(types): append additional types change
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: add arguments for parsing dir
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 22:38:20 -05:00
Aaron Pham
04ef08a7f8
chore(strategy): compact and add stubs ( #718 )
...
generate service_vars automatically inline without reading from files
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 21:49:28 -05:00
Aaron Pham
909db8c3bf
refactor: reduce compiled cacheline
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 02:27:42 +00:00
Aaron Pham
77bd6f090a
chore(logger): fix warnings and streamline style ( #717 )
...
Sorry, but there is too much wasted spacing in `_llm.py`, and I'm unhappy and unproductive any time I look at it or want to do anything with it
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-21 18:54:51 -05:00
Aaron
14242a7ab8
fix(utils): correct import
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 05:03:20 -05:00
Aaron Pham
c33b071ee4
refactor: delete unused code ( #716 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 04:39:48 -05:00
Aaron Pham
e70246ca5d
feat(generation): add support for eos_token_id ( #714 )
...
chore: add support for custom eos_token_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 02:01:36 -05:00
Aaron Pham
fde78a2c78
chore: cleanup unused prompt templates ( #713 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 01:56:51 -05:00
Aaron Pham
ad4f388c98
refactor: update runner helpers and add max_model_len ( #712 )
...
* chore(runner): cleanup unnecessary checks for runnable backend
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: saving llm reference to runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: correct inject item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update support for max_seq_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: correct max_model_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update and warn about backward compatibility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove unused sets
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 20:37:15 -05:00
Aaron
00e2666e48
fix(build): constrain packages for bentoml >1.1.10
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:30:38 -05:00
Aaron
f753662ae6
fix(build): only load model when eager is True
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:06:25 -05:00
Aaron
5b92e848e2
fix: raises error if backend is not supported
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:03:30 -05:00
Aaron Pham
513c08ccda
feat(openai): dynamic model_type registration ( #704 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 00:13:45 -05:00
Aaron Pham
4491aa54d0
fix(backend): correctly use variable for backend during initialisation ( #702 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 22:42:25 -05:00
Aaron Pham
816c1ee80e
feat(engine): CTranslate2 ( #698 )
...
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:25:08 -05:00