Aaron Pham
909db8c3bf
refactor: reduce compiled cacheline
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 02:27:42 +00:00
Aaron Pham
77bd6f090a
chore(logger): fix warnings and streamline style ( #717 )
...
Sorry, but there is too much wasted spacing in `_llm.py`, and I'm unhappy and unproductive any time I look at it or want to do anything with it
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-21 18:54:51 -05:00
Aaron
14242a7ab8
fix(utils): correct import
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 05:03:20 -05:00
Aaron Pham
c33b071ee4
refactor: delete unused code ( #716 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 04:39:48 -05:00
Aaron Pham
e70246ca5d
feat(generation): add support for eos_token_id ( #714 )
...
chore: add support for custom eos_token_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 02:01:36 -05:00
Aaron Pham
fde78a2c78
chore: cleanup unused prompt templates ( #713 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 01:56:51 -05:00
Aaron Pham
ad4f388c98
refactor: update runner helpers and add max_model_len ( #712 )
...
* chore(runner): cleanup unnecessary checks for runnable backend
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: saving llm reference to runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: correct inject item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update support for max_seq_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: correct max_model_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update and warning backward compatibility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove unused sets
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 20:37:15 -05:00
Aaron
00e2666e48
fix(build): constrain packages for bentoml >1.1.10
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:30:38 -05:00
Aaron
f753662ae6
fix(build): only load model when eager is True
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:06:25 -05:00
Aaron
5b92e848e2
fix: raises error if backend is not supported
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:03:30 -05:00
Aaron Pham
513c08ccda
feat(openai): dynamic model_type registration ( #704 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 00:13:45 -05:00
Aaron Pham
4491aa54d0
fix(backend): use the correct variable for backend during initialisation ( #702 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 22:42:25 -05:00
Aaron Pham
816c1ee80e
feat(engine): CTranslate2 ( #698 )
...
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:25:08 -05:00
Aaron Pham
539f250c0f
feat(vllm): bump to 0.2.2 ( #695 )
...
* feat(vllm): bump to 0.2.2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move up to CUDA 12.1
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: remove auto-gptq installation
since the builder image doesn't have access to GPU
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: update containerization warning
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 02:52:32 -05:00
Aaron Pham
206521e02d
feat(ctranslate): initial infrastructure support ( #694 )
...
* perf: compact and improve speed and agility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup infrastructure
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update styles notes and autogen mypy configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 01:48:33 -05:00
Aaron Pham
1831d8f129
feat: heuristics logprobs ( #692 )
...
* fix(encoder): bring back T5 support on PyTorch
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: support logprobs and prompt_logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 19:26:20 -05:00
Aaron Pham
4499469efb
fix(annotations): check library through find_spec ( #691 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 02:02:16 -05:00
Aaron Pham
80ed400646
fix(build): lock lower version based on each release and update infra ( #686 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 15:57:31 -05:00
Aaron Pham
381d740a7a
fix(llm): remove unnecessary check ( #683 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 11:23:22 -05:00
Aaron Pham
14b3ceb436
fix(torch_dtype): correctly infer based on options ( #682 )
...
Users should be able to set the dtype during build, as it doesn't affect start time
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 10:52:05 -05:00
Aaron Pham
7402408c5f
fix(envvar): explicitly set NVIDIA_DRIVER_CAPABILITIES ( #681 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 10:40:45 -05:00
Aaron Pham
bce273ad47
fix(env): correct format environment on docker ( #680 )
...
* fix(env): correct format environment on docker
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 09:51:17 -05:00
Aaron Pham
c1e0e3eae7
fix(build): correctly parse default env for container ( #679 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 09:35:26 -05:00
Aaron Pham
d60ca49d2f
perf: potentially reduce image size ( #675 )
...
* perf: potentially reduce image size
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* perf: use base python packages only
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: typo
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* perf: Shave off 2GB
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 01:15:56 -05:00
Aaron Pham
09cc84a56c
chore(loading): include verbose warning about trust_remote_code ( #674 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 20:09:50 -05:00
Aaron Pham
c850d76ccd
feat(models): Phi 1.5 ( #672 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 17:48:10 -05:00
Aaron Pham
8fdfd0491f
perf(build): locking and improve build speed ( #669 )
...
* revert(build): not locking packages
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* perf: improve svars generation and unifying envvar parsing
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: update stubs check for mypy
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-16 06:27:45 -05:00
Aaron Pham
fce8f223f3
perf: reduce footprint ( #668 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 04:45:49 -05:00
Aaron Pham
9e3f0fea15
types: update stubs for remaining entrypoints ( #667 )
...
* perf(type): static OpenAI types definition
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: add hf types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* types: update remaining missing stubs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 04:26:13 -05:00
Aaron Pham
6102a67a83
infra: makes huggingface-hub requirements on fine-tune ( #665 )
...
infra: makes huggingface-hub core deps
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 03:12:52 -05:00
Aaron Pham
86d23fd6f5
feat(llm): respect warnings environment for dtype warning ( #664 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 03:05:58 -05:00
Aaron Pham
4a6f13ddd2
feat(type): provide structured annotations stubs ( #663 )
...
* feat(type): provide client stubs
separation of concerns for a more concise code base
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-16 02:58:45 -05:00
Kuan-Chun Wang
af88b9b077
fix(runner): remove keyword args for attrs.get() ( #661 )
2023-11-15 04:59:01 -05:00
Aaron Pham
a58d947bc8
perf: improve build logic and cleanup speed ( #657 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-15 00:18:31 -05:00
Aaron Pham
103156cd71
chore(cli): move playground to CLI components ( #655 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-14 23:20:50 -05:00
xianxian.zhang
ea02aaaa23
fix: correct OPENLLM_DEV_BUILD check ( #653 )
2023-11-14 22:21:37 -05:00
Aaron Pham
6a6d689a77
feat: Yi models ( #651 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 21:55:24 -05:00
Aaron Pham
b4b70e2f20
fix(cli): update context name parsing correctly ( #652 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 21:53:56 -05:00
Aaron
9eddae83a6
infra: update cohere client
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-14 01:52:38 -05:00
Aaron Pham
31a799ff61
refactor: use DEBUG env-var instead of OPENLLMDEVDEBUG ( #647 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 01:39:58 -05:00
Aaron Pham
00d6016bcb
chore(openapi): unify inject param ( #645 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-14 01:16:20 -05:00
Aaron Pham
b0ab8ccdf6
experimental: Cohere compatible endpoints. ( #644 )
...
* feat: add generate endpoint
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update generation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix(cohere): generate endpoints
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: update testing clients and chat implementation
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: disable schemas for easter eggs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-14 01:07:43 -05:00
Aaron Pham
b30a412398
fix(cli): set default dtype to auto infer ( #642 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-13 23:05:27 -05:00
Aaron Pham
99a5d26527
fix(service): to yield out correct JSON objects ( #640 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 22:41:52 -05:00
Aaron Pham
2d428f12da
fix(cpu): more verbose definition for dtype casting ( #639 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 20:40:50 -05:00
Aaron Pham
b20c7d1c1d
fix(generation): compatibility dtype with CPU ( #638 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 20:32:07 -05:00
Aaron Pham
a6387d1d15
chore: cleanup unused code path ( #633 )
...
we now rely on tokenizer.chat_templates to format prompts correctly
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 17:23:07 -05:00
Aaron Pham
d358e68539
fix(torch_dtype): load eagerly ( #631 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-13 13:48:04 -05:00
pre-commit-ci[bot]
52367d1e8b
ci: pre-commit autoupdate [pre-commit.ci] ( #629 )
...
* ci: pre-commit autoupdate [pre-commit.ci]
updates:
- [github.com/pre-commit/mirrors-prettier: v3.0.3 → v3.1.0](https://github.com/pre-commit/mirrors-prettier/compare/v3.0.3...v3.1.0 )
* ci: auto fixes from pre-commit.ci
For more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-13 13:07:53 -05:00
Aaron Pham
852cd863a9
fix(cli): make sure to pass the dtype to subprocess service ( #628 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-13 05:32:17 -05:00