OpenLLM

mirror of https://github.com/bentoml/OpenLLM.git synced 2026-02-06 13:52:21 -05:00

Author	SHA1	Message	Date
Aaron Pham	072b3e97ec	feat: 1.2 APIs (#821 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-03-15 03:49:19 -04:00
Aaron	e3392476be	revert: "ci: pre-commit autoupdate [pre-commit.ci] (#931 )" This reverts commit `7b00c84c2a`.	2024-03-15 03:47:23 -04:00
pre-commit-ci[bot]	7b00c84c2a	ci: pre-commit autoupdate [pre-commit.ci] (#931 ) * ci: pre-commit autoupdate [pre-commit.ci] updates: - [github.com/astral-sh/ruff-pre-commit: v0.2.2 → v0.3.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.2.2...v0.3.2) - [github.com/pre-commit/mirrors-eslint: v9.0.0-beta.0 → v9.0.0-beta.2](https://github.com/pre-commit/mirrors-eslint/compare/v9.0.0-beta.0...v9.0.0-beta.2) * ci: auto fixes from pre-commit.ci For more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-03-15 03:46:28 -04:00
Zhao Shenyang	aff5dc8ff2	fix: proper SSE handling for vllm (#877 ) fix: proper SSE handling	2024-02-02 17:25:58 +08:00
Aaron Pham	8d63afc9ce	feat(vllm): support GPTQ with 0.2.6 (#797 ) * feat(vllm): GPTQ support passthrough Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: run scripts Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix(install): set order of xformers before vllm Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * feat: support GPTQ with vLLM Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-12-18 12:41:19 -05:00
Aaron Pham	c8c9663d06	fix(infra): conform ruff to 150 LL (#781 ) Generally correctly format it with ruff format and manual style Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-14 17:27:32 -05:00
Aaron	55a0b2f825	fix(style): setup correct block format Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-30 07:58:35 -05:00
Aaron	b53559de6f	fix(setter): correct item with the same kwargs with stubs Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-30 07:36:34 -05:00
Aaron Pham	0909e08e3c	fix(llm): remove unnecessary parsing Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-29 18:00:08 +00:00
Aaron Pham	9706228956	chore(vllm): add arguments for gpu memory utilization Probably not going to fix anything, just delaying the problem. Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-29 06:45:14 +00:00
Aaron	96318b65ee	fix(sdk): remove broken sdk codespace now around 2.8k lines Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-26 04:53:36 -05:00
Aaron	43a96dab2c	fix(gpus): disable slots for now to enable cached_property Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-26 02:49:48 -05:00
Aaron	69aae34cf4	fix(style): reduce boilerplate and format to custom logics Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-26 01:44:59 -05:00
Aaron Pham	aab173cd99	refactor: focus (#730 ) * perf: remove based images Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update changelog Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: move dockerifle to run on release only Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: cleanup unused types Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-24 01:11:31 -05:00
Aaron Pham	b28b5269b5	feat(openai): chat templates and complete control of prompt generation (#725 ) * feat(openai): chat templates and complete control of prompt generation Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: correctly use base chat templates Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: remove symlink Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-22 06:49:14 -05:00
Aaron Pham	63d86faa32	fix(openai): correct stop tokens and finish_reason state (#722 ) Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-22 04:21:13 -05:00
Aaron Pham	38b7c44df0	fix(base-image): update base image to include cuda for now (#720 ) * fix(base-image): update base image to include cuda for now Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * fix: build core and client on release images Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: cleanup style changes Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-22 01:15:19 -05:00
Aaron Pham	77bd6f090a	chore(logger): fix warnings and streamline style (#717 ) Sorry but there are too much wasted spacing in `_llm.py`, and I'm unhappy and not productive anytime I look or want to do anything with it --------- Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-11-21 18:54:51 -05:00
Aaron Pham	c33b071ee4	refactor: delete unused code (#716 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-21 04:39:48 -05:00
Aaron Pham	e70246ca5d	feat(generation): add support for eos_token_id (#714 ) chore: add support for custom eos_token_id Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-21 02:01:36 -05:00
Aaron Pham	fde78a2c78	chore: cleanup unused prompt templates (#713 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-21 01:56:51 -05:00
Aaron Pham	ad4f388c98	refactor: update runner helpers and add max_model_len (#712 ) * chore(runner): cleanup unecessary checks for runnable backend Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: saving llm reference to runner Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: correct inject item Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update support for max_seq_len Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * fix: correct max_model_len Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update and warning backward compatibility Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: remove unused sets Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-20 20:37:15 -05:00
Aaron	f753662ae6	fix(build): only load model when eager is True Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-20 17:06:25 -05:00
Aaron Pham	4491aa54d0	fix(backend): correct use variable for backend when initialisation (#702 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-19 22:42:25 -05:00
Aaron Pham	816c1ee80e	feat(engine): CTranslate2 (#698 ) * chore: update instruction for dependencies Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * feat(experimental): CTranslate2 Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-19 10:25:08 -05:00
Aaron Pham	539f250c0f	feat(vllm): bump to 0.2.2 (#695 ) * feat(vllm): bump to 0.2.2 Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update changelog Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: move up to CUDA 12.1 Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * fix: remove auto-gptq installation since the builder image doesn't have access to GPU Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * fix: update containerization warning Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-19 02:52:32 -05:00
Aaron Pham	206521e02d	feat(ctranslate): initial infrastructure support (#694 ) * perf: compact and improve speed and agility Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * --wip-- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: cleanup infrastructure Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update styles notes and autogen mypy configuration Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-19 01:48:33 -05:00
Aaron Pham	1831d8f129	feat: heuristics logprobs (#692 ) * fix(encoder): bring back T5 support on PyTorch Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * feat: support logprobs and prompt_logprobs Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * docs: update changelog Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-18 19:26:20 -05:00
Aaron Pham	4499469efb	fix(annotations): check library through find_spec (#691 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-18 02:02:16 -05:00
Aaron Pham	381d740a7a	fix(llm): remove unnecessary check (#683 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-17 11:23:22 -05:00
Aaron Pham	14b3ceb436	fix(torch_dtype): correctly infer based on options (#682 ) Users should be able to set the dtype during build, as it doesn't affect start time Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-17 10:52:05 -05:00
Aaron Pham	fce8f223f3	perf: reduce footprint (#668 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-16 04:45:49 -05:00
Aaron Pham	6102a67a83	infra: makes huggingface-hub requirements on fine-tune (#665 ) infra: makes huggingface-hub core deps Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-16 03:12:52 -05:00
Aaron Pham	86d23fd6f5	feat(llm): respect warnings environment for dtype warning (#664 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-16 03:05:58 -05:00
Aaron Pham	4a6f13ddd2	feat(type): provide structured annotations stubs (#663 ) * feat(type): provide client stubs separation of concern for more brevity code base Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * docs: update changelog Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-16 02:58:45 -05:00
Aaron Pham	a58d947bc8	perf: improve build logics and cleanup speed (#657 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-15 00:18:31 -05:00
Aaron Pham	6a6d689a77	feat: Yi models (#651 ) Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-14 21:55:24 -05:00
Aaron Pham	2d428f12da	fix(cpu): more verbose definition for dtype casting (#639 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-13 20:40:50 -05:00
Aaron Pham	b20c7d1c1d	fix(generation): compatibility dtype with CPU (#638 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-13 20:32:07 -05:00
Aaron Pham	d358e68539	fix(torch_dtype): load eagerly (#631 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-13 13:48:04 -05:00
Aaron Pham	099c0dc31b	feat(cli): `--dtype` arguments (#627 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-13 05:25:50 -05:00
Aaron Pham	22eaaf3ce1	feat(vllm): support passing specific dtype (#626 ) * feat(vllm): support passing specific dtype Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: correctly cached the item Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * ci: auto fixes from pre-commit.ci For more information, see https://pre-commit.ci --------- Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-11-13 05:08:33 -05:00
Aaron Pham	126e6c9d63	fix(ruff): correct consistency between isort and formatter (#624 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-12 21:12:50 -05:00
Aaron Pham	c3416c0afd	feat(llm): update warning envvar and add embedded mode (#618 ) * chore: unify warning envvar and update type inference Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore; update documentation about embedded Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-12 17:39:06 -05:00
Aaron Pham	7e1fb35a71	chore(llm): expose quantise and lazy load heavy imports (#617 ) * chore(llm): expose quantise and lazy load heavy imports Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: move transformers to TYPE_CHECKING block Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-12 14:55:37 -05:00
Aaron Pham	7438005c04	refactor(config): simplify configuration and update start CLI output (#611 ) * chore(config): simplify configuration and update start CLI output handling Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: remove state and message sent after server lifecycle Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update color stream and refactor reusable logic Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update documentations and mypy Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-11 22:36:10 -05:00
Aaron Pham	fa2038f4e2	fix: loading correct local models (#599 ) * fix(model): loading local correctly Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * chore: update repr and correct bentomodel processor Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * ci: auto fixes from pre-commit.ci For more information, see https://pre-commit.ci * chore: cleanup transformers implementation Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * fix: ruff to ignore I001 on all stubs Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-11-10 02:36:12 -05:00
Aaron Pham	ac377fe490	infra: using ruff formatter (#594 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-09 12:44:05 -05:00
Aaron Pham	b8a2e8cf91	refactor(cli): cleanup API (#592 ) * chore: remove unused imports Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * refactor(cli): update to only need model_id Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * feat: `openllm start model-id` Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: add changelog Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update changelog notice Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update correct config and running tools Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update backward compat options and treat JSON outputs corespondingly Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-09 11:40:17 -05:00
Aaron Pham	ff8b6377c8	fix(awq): correct awq detection for support (#586 ) * fix(awq): correct detection for awq Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * chore: update base docker to work Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * chore: disable awq on pytorch for now Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * ci: auto fixes from pre-commit.ci For more information, see https://pre-commit.ci --------- Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-11-08 06:57:11 -05:00


90 Commits