Aaron Pham
aab173cd99
refactor: focus ( #730 )
...
* perf: remove based images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move dockerfile to run on release only
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup unused types
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-24 01:11:31 -05:00
Aaron Pham
52a44b1bfa
chore: cleanup loader ( #729 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 21:51:51 -05:00
Aaron Pham
5442d9cd10
fix(trust_remote_code): handle args correctly ( #727 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 17:03:13 -05:00
Aaron Pham
7eae50377d
infra: prepare for release 0.4.26 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 11:50:50 +00:00
Aaron Pham
b28b5269b5
feat(openai): chat templates and complete control of prompt generation ( #725 )
...
* feat(openai): chat templates and complete control of prompt generation
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: correctly use base chat templates
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* fix: remove symlink
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 06:49:14 -05:00
Aaron Pham
f83f64ffd7
fix(infra): setup higher timer for building container images ( #723 )
...
* fix(infra): setup higher timer for building container images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove invalid tests
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 05:00:33 -05:00
Aaron Pham
0189342730
infra: prepare for release 0.4.25 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 09:22:45 +00:00
Aaron Pham
63d86faa32
fix(openai): correct stop tokens and finish_reason state ( #722 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 04:21:13 -05:00
Aaron Pham
7f09f9daf2
infra: prepare for release 0.4.24 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 06:34:30 +00:00
Aaron
d697ea3903
fix(image): setup correct installation
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 01:33:26 -05:00
Aaron Pham
85e03a4b92
infra: prepare for release 0.4.23 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 06:16:49 +00:00
Aaron Pham
38b7c44df0
fix(base-image): update base image to include cuda for now ( #720 )
...
* fix(base-image): update base image to include cuda for now
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: build core and client on release images
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup style changes
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-22 01:15:19 -05:00
Aaron Pham
8bb2742a9a
chore(types): append additional types change ( #719 )
...
* chore(types): append additional types change
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: add arguments for parsing dir
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 22:38:20 -05:00
Aaron Pham
04ef08a7f8
chore(strategy): compact and add stubs ( #718 )
...
generate service_vars automatically inline without reading from files
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 21:49:28 -05:00
Aaron Pham
909db8c3bf
refactor: reduce compiled cacheline
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-22 02:27:42 +00:00
Aaron Pham
77bd6f090a
chore(logger): fix warnings and streamline style ( #717 )
...
Sorry, but there is too much wasted spacing in `_llm.py`, and I'm unhappy and unproductive any time I look at it or want to do anything with it
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-11-21 18:54:51 -05:00
Aaron
14242a7ab8
fix(utils): correct import
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 05:03:20 -05:00
Aaron Pham
c33b071ee4
refactor: delete unused code ( #716 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 04:39:48 -05:00
Aaron Pham
a8a9f154ce
fix(ci): tests ( #715 )
...
* fix: tests
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
* chore: remove broken tests
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 03:05:22 -05:00
Aaron Pham
e70246ca5d
feat(generation): add support for eos_token_id ( #714 )
...
chore: add support for custom eos_token_id
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 02:01:36 -05:00
Aaron Pham
fde78a2c78
chore: cleanup unused prompt templates ( #713 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-21 01:56:51 -05:00
Aaron Pham
f3fd32d596
infra: prepare for release 0.4.22 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-21 01:38:46 +00:00
Aaron Pham
ad4f388c98
refactor: update runner helpers and add max_model_len ( #712 )
...
* chore(runner): cleanup unnecessary checks for runnable backend
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: saving llm reference to runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: correct inject item
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update support for max_seq_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: correct max_model_len
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update and warn about backward compatibility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: remove unused sets
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 20:37:15 -05:00
Aaron Pham
4c4bc82a47
infra: prepare for release 0.4.21 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 22:32:44 +00:00
Aaron
00e2666e48
fix(build): constrain packages for bentoml >1.1.10
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:30:38 -05:00
Aaron Pham
204cbd43d2
infra: prepare for release 0.4.20 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 22:09:47 +00:00
Aaron
f753662ae6
fix(build): only load model when eager is True
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:06:25 -05:00
Aaron
5b92e848e2
fix: raises error if backend is not supported
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-20 17:03:30 -05:00
Aaron Pham
46d6fcca98
infra: prepare for release 0.4.19 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 08:06:53 +00:00
Aaron Pham
c1f86bda16
infra: prepare for release 0.4.18 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 05:15:14 +00:00
Aaron Pham
513c08ccda
feat(openai): dynamic model_type registration ( #704 )
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 00:13:45 -05:00
Aaron Pham
6505abdb44
chore: update lower bound version of bentoml to avoid breakage ( #703 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 23:09:14 -05:00
Aaron Pham
d1915d7a9e
infra: prepare for release 0.4.17 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-20 03:43:21 +00:00
Aaron Pham
4491aa54d0
fix(backend): use correct variable for backend during initialisation ( #702 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 22:42:25 -05:00
Aaron Pham
e9207ff683
infra: prepare for release 0.4.16 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-19 15:41:03 +00:00
Aaron
cb4386b013
fix(release): remove unnecessary check for client dependencies [skip ci]
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:39:38 -05:00
Aaron Pham
d80c392661
chore: update documentation about runtime ( #699 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:27:07 -05:00
Aaron Pham
816c1ee80e
feat(engine): CTranslate2 ( #698 )
...
* chore: update instruction for dependencies
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat(experimental): CTranslate2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 10:25:08 -05:00
Aaron Pham
539f250c0f
feat(vllm): bump to 0.2.2 ( #695 )
...
* feat(vllm): bump to 0.2.2
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: move up to CUDA 12.1
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: remove auto-gptq installation
since the builder image doesn't have access to GPU
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* fix: update containerization warning
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 02:52:32 -05:00
Aaron Pham
206521e02d
feat(ctranslate): initial infrastructure support ( #694 )
...
* perf: compact and improve speed and agility
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* --wip--
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: cleanup infrastructure
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* chore: update styles notes and autogen mypy configuration
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-19 01:48:33 -05:00
Aaron Pham
c19654adf3
infra: prepare for release 0.4.15 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-19 00:47:18 +00:00
Aaron Pham
1831d8f129
feat: heuristics logprobs ( #692 )
...
* fix(encoder): bring back T5 support on PyTorch
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* feat: support logprobs and prompt_logprobs
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
* docs: update changelog
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
---------
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 19:26:20 -05:00
Aaron Pham
4499469efb
fix(annotations): check library through find_spec ( #691 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-18 02:02:16 -05:00
Aaron Pham
5402db1e61
infra: prepare for release 0.4.14 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-17 21:54:10 +00:00
Aaron Pham
e14f3ffed5
infra: prepare for release 0.4.13 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-17 21:06:56 +00:00
Aaron Pham
80ed400646
fix(build): lock lower version based on each release and update infra ( #686 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 15:57:31 -05:00
Aaron Pham
381d740a7a
fix(llm): remove unnecessary check ( #683 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 11:23:22 -05:00
Aaron Pham
65370f6919
infra: prepare for release 0.4.12 [generated] [skip ci]
...
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com >
2023-11-17 15:54:41 +00:00
Aaron Pham
14b3ceb436
fix(torch_dtype): correctly infer based on options ( #682 )
...
Users should be able to set the dtype during build, as it doesn't affect start time
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 10:52:05 -05:00
Aaron Pham
7402408c5f
fix(envvar): explicitly set NVIDIA_DRIVER_CAPABILITIES ( #681 )
...
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com >
2023-11-17 10:40:45 -05:00