Aaron Pham
77bd6f090a
chore(logger): fix warnings and streamline style ( #717 )
...
Sorry, but there is too much wasted spacing in `_llm.py`, and I'm unhappy and unproductive any time I look at it or want to do anything with it
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-21 18:54:51 -05:00
Aaron Pham
c33b071ee4
refactor: delete unused code ( #716 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 04:39:48 -05:00
Aaron Pham
e70246ca5d
feat(generation): add support for eos_token_id ( #714 )
...
chore: add support for custom eos_token_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 02:01:36 -05:00
Aaron Pham
fde78a2c78
chore: cleanup unused prompt templates ( #713 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 01:56:51 -05:00
Aaron Pham
ad4f388c98
refactor: update runner helpers and add max_model_len ( #712 )
...
* chore(runner): cleanup unnecessary checks for runnable backend
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: saving llm reference to runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: correct inject item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update support for max_seq_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: correct max_model_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update and warning backward compatibility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove unused sets
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 20:37:15 -05:00
Aaron
f753662ae6
fix(build): only load model when eager is True
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:06:25 -05:00
Aaron Pham
4491aa54d0
fix(backend): correct use variable for backend when initialisation ( #702 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 22:42:25 -05:00
Aaron Pham
816c1ee80e
feat(engine): CTranslate2 ( #698 )
...
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:25:08 -05:00
Aaron Pham
539f250c0f
feat(vllm): bump to 0.2.2 ( #695 )
...
* feat(vllm): bump to 0.2.2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move up to CUDA 12.1
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: remove auto-gptq installation
since the builder image doesn't have access to GPU
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: update containerization warning
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 02:52:32 -05:00
Aaron Pham
206521e02d
feat(ctranslate): initial infrastructure support ( #694 )
...
* perf: compact and improve speed and agility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup infrastructure
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update styles notes and autogen mypy configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 01:48:33 -05:00
Aaron Pham
1831d8f129
feat: heuristics logprobs ( #692 )
...
* fix(encoder): bring back T5 support on PyTorch
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: support logprobs and prompt_logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 19:26:20 -05:00
Aaron Pham
4499469efb
fix(annotations): check library through find_spec ( #691 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 02:02:16 -05:00
Aaron Pham
381d740a7a
fix(llm): remove unnecessary check ( #683 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 11:23:22 -05:00
Aaron Pham
14b3ceb436
fix(torch_dtype): correctly infer based on options ( #682 )
...
Users should be able to set the dtype during build, as it doesn't affect start time
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 10:52:05 -05:00
Aaron Pham
fce8f223f3
perf: reduce footprint ( #668 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 04:45:49 -05:00
Aaron Pham
6102a67a83
infra: makes huggingface-hub requirements on fine-tune ( #665 )
...
infra: makes huggingface-hub core deps
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 03:12:52 -05:00
Aaron Pham
86d23fd6f5
feat(llm): respect warnings environment for dtype warning ( #664 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 03:05:58 -05:00
Aaron Pham
4a6f13ddd2
feat(type): provide structured annotations stubs ( #663 )
...
* feat(type): provide client stubs
separation of concerns for a more concise code base
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 02:58:45 -05:00
Aaron Pham
a58d947bc8
perf: improve build logics and cleanup speed ( #657 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-15 00:18:31 -05:00
Aaron Pham
6a6d689a77
feat: Yi models ( #651 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 21:55:24 -05:00
Aaron Pham
2d428f12da
fix(cpu): more verbose definition for dtype casting ( #639 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 20:40:50 -05:00
Aaron Pham
b20c7d1c1d
fix(generation): compatibility dtype with CPU ( #638 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 20:32:07 -05:00
Aaron Pham
d358e68539
fix(torch_dtype): load eagerly ( #631 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 13:48:04 -05:00
Aaron Pham
099c0dc31b
feat(cli): --dtype arguments ( #627 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 05:25:50 -05:00
Aaron Pham
22eaaf3ce1
feat(vllm): support passing specific dtype ( #626 )
...
* feat(vllm): support passing specific dtype
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: correctly cached the item
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-13 05:08:33 -05:00
Aaron Pham
126e6c9d63
fix(ruff): correct consistency between isort and formatter ( #624 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 21:12:50 -05:00
Aaron Pham
c3416c0afd
feat(llm): update warning envvar and add embedded mode ( #618 )
...
* chore: unify warning envvar and update type inference
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update documentation about embedded
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 17:39:06 -05:00
Aaron Pham
7e1fb35a71
chore(llm): expose quantise and lazy load heavy imports ( #617 )
...
* chore(llm): expose quantise and lazy load heavy imports
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move transformers to TYPE_CHECKING block
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-12 14:55:37 -05:00
Aaron Pham
7438005c04
refactor(config): simplify configuration and update start CLI output ( #611 )
...
* chore(config): simplify configuration and update start CLI output
handling
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove state and message sent after server lifecycle
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update color stream and refactor reusable logic
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update documentations and mypy
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-11 22:36:10 -05:00
Aaron Pham
fa2038f4e2
fix: loading correct local models ( #599 )
...
* fix(model): loading local correctly
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: update repr and correct bentomodel processor
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
* chore: cleanup transformers implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: ruff to ignore I001 on all stubs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-10 02:36:12 -05:00
Aaron Pham
ac377fe490
infra: using ruff formatter ( #594 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-09 12:44:05 -05:00
Aaron Pham
b8a2e8cf91
refactor(cli): cleanup API ( #592 )
...
* chore: remove unused imports
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* refactor(cli): update to only need model_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: `openllm start model-id`
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog notice
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update correct config and running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update backward compat options and treat JSON outputs
correspondingly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-09 11:40:17 -05:00
Aaron Pham
ff8b6377c8
fix(awq): correct awq detection for support ( #586 )
...
* fix(awq): correct detection for awq
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: update base docker to work
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: disable awq on pytorch for now
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-08 06:57:11 -05:00
Aaron Pham
ea42108e45
chore(service): cleanup API ( #579 )
...
* chore(service): cleanup API
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: running tools
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: tests import
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-08 02:53:08 -05:00
Aaron Pham
7398ae0486
refactor(strategies): move logics into openllm-python ( #578 )
...
fix(strategies): move to openllm
Strategies shouldn't be a part of openllm-core
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-08 02:23:08 -05:00
Aaron Pham
97d7c38fea
refactor: cleanup typing to expose correct API ( #576 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-08 01:24:03 -05:00
Aaron Pham
cfd09bfc47
chore(runner): yield the outputs directly ( #573 )
...
update openai client examples to >1
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-07 22:34:11 -05:00
Aaron Pham
dc27b0e727
fix: update build dependencies and format chat prompt ( #569 )
...
chore: update correct check and format prompt
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-07 16:42:20 -05:00
Aaron Pham
8fade070f3
infra: update docs on serving fine-tuning layers ( #567 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-06 21:34:44 -05:00
Aaron Pham
e2029c934b
perf: unify LLM interface ( #518 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: paperspace <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-06 20:39:43 -05:00
XunchaoZ
440e3d646f
fix: Max new tokens ( #550 )
...
Bug fix for retrieving user input max_new_tokens
2023-11-03 13:44:25 -04:00
XunchaoZ
022130d0ac
fix(openai): Chat templates ( #519 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-30 03:20:43 -04:00
Aaron Pham
c1ca7ccd3b
fix(breaking): remove embeddings and update client implementation ( #500 )
2023-10-14 16:04:35 -04:00
Aaron Pham
1539c3f7dc
feat(client): simple implementation and streaming ( #256 )
2023-10-12 17:21:54 -04:00
Zhao Shenyang
bf96570eab
fix: do not rely on env var for built bento/docker ( #477 )
...
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-10 12:29:20 -04:00
Aaron
625b82a0fc
fix(style): remove weird break on split item
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-10-07 02:21:31 -04:00
XunchaoZ
04bb29a264
feat: OpenAI-compatible API ( #417 )
...
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-10-07 00:50:03 -04:00
MingLiangDai
a0e0f81306
feat: PromptTemplate and system prompt support ( #407 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-10-03 09:53:37 -04:00
Aaron Pham
3b2ac1cd59
feat: support continuous batching on generate ( #375 )
...
* feat: support continuous batching on `generate`
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: add changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-19 03:04:59 -04:00
Aaron Pham
5a1fcc9cd5
fix: set default serialisation methods ( #355 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-09-18 02:26:53 -04:00