OpenLLM

mirror of https://github.com/bentoml/OpenLLM.git synced 2026-03-07 08:38:20 -05:00

Author	SHA1	Message	Date
Aaron Pham	4661838964	chore: move out the template to separate files Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2024-04-02 03:24:26 +00:00
Aaron Pham	67ab9b5762	fix: swagger showing for subpath Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2024-03-22 02:24:22 +00:00
Aaron Pham	3ef93fe371	chore: update support development_mode as DEBUG and support for RELOAD envvar Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2024-03-22 01:19:32 +00:00
Aaron Pham	80b35f0d72	revert: correct type for openapi schema generation Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2024-03-21 07:51:00 +00:00
Aaron Pham	51bec78ee9	fix(load): make sure to respect MAX_MODEL_LEN from env Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2024-03-21 07:44:49 +00:00
Aaron	295a3b1061	chore(codegen): update generated var to read from envvar Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2024-03-20 21:51:39 -04:00
Aaron Pham	f0ab6d44fa	fix: make sure to include new implementation in bundle build Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2024-03-20 22:11:53 +00:00
Aaron	5c8c30a70b	fix: uses --pre for alpha releases for now Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2024-03-20 13:38:10 -04:00
Aaron	2ddbe4eb22	fix(service): remove mounting ASGI app Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2024-03-20 11:51:09 -04:00
Aaron Pham	824ff68818	chore: update local script and update service Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2024-03-15 20:29:49 +00:00
Aaron	727361ced7	chore: running updated ruff formatter [skip ci] Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2024-03-15 05:35:24 -04:00
Aaron	c34db550a6	fix(build): explicit set to use alpha version Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2024-03-15 05:33:18 -04:00
Aaron	0274fb4c11	fix: don't lock openllm to support alpha release Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2024-03-15 05:29:35 -04:00
Aaron Pham	072b3e97ec	feat: 1.2 APIs (#821 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-03-15 03:49:19 -04:00
Aaron	e3392476be	revert: "ci: pre-commit autoupdate [pre-commit.ci] (#931 )" This reverts commit `7b00c84c2a`.	2024-03-15 03:47:23 -04:00
pre-commit-ci[bot]	7b00c84c2a	ci: pre-commit autoupdate [pre-commit.ci] (#931 ) * ci: pre-commit autoupdate [pre-commit.ci] updates: - [github.com/astral-sh/ruff-pre-commit: v0.2.2 → v0.3.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.2.2...v0.3.2) - [github.com/pre-commit/mirrors-eslint: v9.0.0-beta.0 → v9.0.0-beta.2](https://github.com/pre-commit/mirrors-eslint/compare/v9.0.0-beta.0...v9.0.0-beta.2) * ci: auto fixes from pre-commit.ci For more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2024-03-15 03:46:28 -04:00
Zhao Shenyang	3299f463a6	fix: remove vllm dependency for pytorch bento (#893 )	2024-02-05 18:36:14 -05:00
Zhao Shenyang	16d8caf2ee	chore: bump up bentoml version to 1.1.11 (#883 )	2024-02-04 21:31:14 +08:00
Zhao Shenyang	9d0e292076	fix: limit BentoML version range (#881 )	2024-02-04 16:59:21 +08:00
Zhao Shenyang	9f9195f74b	fix: all runners sse output (#880 )	2024-02-02 20:08:31 +08:00
Zhao Shenyang	6c909aabdb	chore: set stop to empty list by default (#878 )	2024-02-02 19:28:49 +08:00
Zhao Shenyang	aff5dc8ff2	fix: proper SSE handling for vllm (#877 ) fix: proper SSE handling	2024-02-02 17:25:58 +08:00
Fazli Sapuan	6b3a1bd708	chore: fix typo in list_models pydoc (#847 )	2024-01-15 08:26:48 -05:00
Zhao Shenyang	8baaf122ae	improv(package): use python slim base image and let pytorch install cuda (#807 )	2024-01-11 23:23:03 -05:00
Aaron Pham	8d63afc9ce	feat(vllm): support GPTQ with 0.2.6 (#797 ) * feat(vllm): GPTQ support passthrough Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: run scripts Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix(install): set order of xformers before vllm Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * feat: support GPTQ with vLLM Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-12-18 12:41:19 -05:00
Aaron Pham	5d27337e82	fix(cli): avoid runtime `__origin__` check for older Python (#798 ) fix(cli): avoid runtime __origin__ on older Python Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-18 12:33:36 -05:00
Aaron Pham	88b6d3d6de	perf: upgrade mixtral to use expert parallelism (#783 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-15 11:45:08 -05:00
Aaron Pham	c8c9663d06	fix(infra): conform ruff to 150 LL (#781 ) Generally correctly format it with ruff format and manual style Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-14 17:27:32 -05:00
Aaron Pham	44383528b5	fix(logprobs): correct check logprobs (#779 ) * fix(logprobs): correct check logprobs Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update changelog Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-14 14:19:01 -05:00
Aaron Pham	0d83cefcb6	fix(mixtral): setup hack atm to load weights from pt specifically instead of safetensors (#776 ) fix(mixtral): setup hack atm to load weights from pt specifically instead of safetensors Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-13 18:18:51 -05:00
Aaron Pham	2dbcfa8a0c	fix(cli): correct set arguments for `openllm import` and `openllm build` (#775 ) * fix(cli): correct set arguments for `openllm import` and `openllm build` Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> * chore: update changelog Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-13 15:52:59 -05:00
Aaron Pham	3ab78cd105	feat(mixtral): correct support for mixtral (#772 ) feat(mixtral): support inference with pt Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-13 09:03:56 -05:00
Aaron Pham	d3328343d7	feat: mixtral support (#770 ) Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-12-12 01:33:13 -05:00
Aaron	59e8ef93dc	chore(deps): lock vLLM to 0.2.4 Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-12 00:17:18 -05:00
Aaron Pham	08114410bc	fix(openai): logprobs when echo is enabled (#761 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-10 18:09:25 -05:00
Aaron Pham	c3a0b5c39f	feat(openai): supports echo (#760 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-10 13:19:40 -05:00
Aaron	bb4ed8b53c	fix(llm): correct annotations definitions Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-12-09 09:59:02 -05:00
Aaron	9a7e0cecf0	fix(types): make sure mypy is running strict Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-30 09:42:24 -05:00
Aaron	55a0b2f825	fix(style): setup correct block format Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-30 07:58:35 -05:00
Aaron	b53559de6f	fix(setter): correct item with the same kwargs with stubs Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-30 07:36:34 -05:00
yansheng	3cb7f14fc1	feat(models): Support qwen (#742 ) * support qwen * support qwen * ci: auto fixes from pre-commit.ci For more information, see https://pre-commit.ci * Update openllm-core/src/openllm_core/config/configuration_qwen.py Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> * chore: update correct readme and supports qwen models Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> --------- Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: root <yansheng105@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-30 06:54:17 -05:00
Aaron Pham	0909e08e3c	fix(llm): remove unnecessary parsing Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-29 18:00:08 +00:00
Aaron Pham	9706228956	chore(vllm): add arguments for gpu memory utilization Probably not going to fix anything, just delaying the problem. Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-29 06:45:14 +00:00
Aaron Pham	f0fa06004b	chore: revert back previous backend support PyTorch (#739 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-29 01:44:41 -05:00
Aaron Pham	d04309188b	chore(style): 2.7k Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>	2023-11-28 07:04:27 +00:00
Aaron	ce6efc2a9e	chore(style): cleanup bytes Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-28 01:27:27 -05:00
Aaron	96318b65ee	fix(sdk): remove broken sdk codespace now around 2.8k lines Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-26 04:53:36 -05:00
Aaron	43a96dab2c	fix(gpus): disable slots for now to enable cached_property Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-26 02:49:48 -05:00
Aaron	69aae34cf4	fix(style): reduce boilerplate and format to custom logics Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-26 01:44:59 -05:00
Aaron	b4c9971678	fix(build): explicitly not lock packages Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-11-24 01:21:29 -05:00


196 Commits