OpenLLM

mirror of https://github.com/bentoml/OpenLLM.git synced 2026-04-22 07:57:20 -04:00

Author	SHA1	Message	Date
Aaron Pham	c74f3de6c7	chore: update typing to hijack compliant Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-07-10 20:19:41 -04:00
Aaron Pham	e1675652d1	chore: add repo default utils Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-07-10 18:01:15 -04:00
Aaron Pham	3fbb75f7e9	chore: add instruction to access chat URL Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-07-09 22:56:27 -04:00
Aaron Pham	f4d822125e	chore: ready for 0.6 releases Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-07-09 22:05:43 -04:00
Aaron Pham	cd872ef631	refactor: monorepo (#203)	2023-08-15 02:11:14 -04:00
pre-commit-ci[bot]	2d33100d72	ci: pre-commit autoupdate [pre-commit.ci] (#207) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-08-14 19:20:05 -04:00
Aaron Pham	f6317d8003	infra: enable compiled wheels for all supported Python (#201)	2023-08-12 04:54:50 -04:00
aarnphm-ec2-dev	dc776e9c5a	chore(autogptq): update to latest commits Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-11 11:02:05 +00:00
Aaron	785c1db237	fix(client): include openllm.client into main module [skip ci] Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-11 06:19:56 -04:00
Aaron Pham	5329853b10	perf: compiled modules and enable lazyeval (#200)	2023-08-11 05:53:45 -04:00
Aaron Pham	c083990edd	infra: migrate to initial `openllm-node` library (#199)	2023-08-10 18:54:00 -04:00
aarnphm-ec2-dev	034610b6b0	fix(embeddings): correct imports Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-10 22:47:38 +00:00
aarnphm-ec2-dev	689b83bbe3	fix(loading): make sure not to load to cuda with kbit quantisation Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-10 19:39:01 +00:00
Aaron	e0daea6e78	fix(compile): absolute import for compiled wheels Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-09 22:51:35 -04:00
aarnphm-ec2-dev	dfc4b489c5	feat(build): notes on compiled wheels for Bento Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-09 21:52:34 +00:00
Aaron Pham	b1445c6516	refactor(cli): compiled wheels and extension modules (#191)	2023-08-09 17:10:15 -04:00
Aaron Pham	b9dd54f634	feat: homebrew tap (#190)	2023-08-08 22:11:48 -04:00
aarnphm-ec2-dev	deaee67b47	fix(loading): make sure to cast the model to cuda if PyTorch Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-09 01:42:11 +00:00
aarnphm-ec2-dev	ae35ee8115	fix(build): set legacy serialisation for vllm on Bento Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-08 20:10:49 +00:00
Aaron Pham	2d47a54efd	feat(strategy): spawn one runner instance (#189)	2023-08-08 05:47:11 -04:00
Aaron Pham	cb6f3aa48e	feat: --force-push to allow force push to bentocloud (#188)	2023-08-08 01:06:59 -04:00
Aaron	371a7c896c	fix: loading models within k8s API server remove a logic where the API server tries to load the model when it is not available locally Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-08 00:22:48 -04:00
Aaron Pham	21ea7e493f	feat(generation): initial work for generating tokens (#186)	2023-08-06 20:04:40 -04:00
Aaron Pham	2d5be909cd	fix(models): setup xformers and loading PyTorch meta weights (#185)	2023-08-06 03:25:02 -04:00
Aaron Pham	4875c3a109	feat: optimize model saving and loading on single GPU (#183)	2023-08-06 01:00:49 -04:00
Aaron	90072ec5ee	fix(regression): setting quantize only if it is not None Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-04 12:40:55 -04:00
Aaron	ba07205156	fix: disable building xformers from source Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-04 12:14:04 -04:00
Aaron	287b7f9ab2	fix: releases issue when building new container [skip ci] Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-04 11:31:02 -04:00
Aaron	1e74e967d1	fix(container): correct cache directory Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-04 10:31:06 -04:00
Aaron Pham	2541a0f8dc	infra: initial work on compiling mypyc wheels (#182)	2023-08-04 10:20:03 -04:00
Aaron Pham	2cc264aa72	fix(vllm): correctly load given model id from envvar (#181)	2023-08-03 16:34:35 -04:00
Aaron	db8e47bc5b	fix(build): correct module type for stubs and strip assert [skip ci] Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-03 04:15:55 -04:00
Aaron	8f74e24c2f	fix: clone all for nightly strategy Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-03 03:17:18 -04:00
aarnphm-ec2-dev	29ca9f398f	fix: add arch_list for cross compiling Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-03 04:33:48 +00:00
aarnphm-ec2-dev	a01d867bc7	chore(base): add auto-gptq CUDA kernel Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-03 02:40:06 +00:00
Aaron	af64a6dfd5	chore(docs): update to obsidian README format Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-02 21:49:33 -04:00
aarnphm-ec2-dev	b349820429	fix(build): add ``--device`` into envvar Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-08-03 00:44:40 +00:00
Aaron Pham	cfc7f3888d	chore(vllm): add all supported models (#179)	2023-08-02 17:42:02 -04:00
Aaron Pham	72337410cf	fix: nightly resolver for correct tag (#177)	2023-08-02 13:10:50 -04:00
Aaron	d4fbfa5e5c	fix: custom release strategy for correct naming Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-02 03:03:21 -04:00
Aaron Pham	acb81a6e1a	fix(build): dispatch container via workflow calls (#174) add OPENLLM_USE_LOCAL_LATEST as default behaviour within container	2023-08-02 01:54:10 -04:00
pre-commit-ci[bot]	c2ed1d56da	chore(release): update base container restriction (#173) Prepare for 0.2.12 release Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-01 15:25:17 -04:00
Aaron	6ba8899743	fix: remove invalid OPENLLMDEVDEBUG envvar Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-01 01:52:08 -04:00
Aaron	961455c762	fix(cli): always --force on `--push` feat: add --bento-version for ``openllm build`` Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-01 00:56:46 -04:00
Aaron	ca5e3c7ae5	fix: correct setup property for envvar instance Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-07-31 23:34:42 -04:00
Aaron Pham	729e423b17	chore(bnb): filter warnings message on CPU (#170)	2023-07-31 15:48:59 -04:00
Aaron Pham	8c2867d26d	style: define experimental guidelines (#168)	2023-07-31 07:54:26 -04:00
Aaron Pham	ef94c6b98a	feat(container): vLLM build and base image strategies (#142)	2023-07-31 02:44:52 -04:00
aarnphm-ec2-dev	fc66ff275b	fix: make sure to add torch to dependencies Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-07-28 00:01:52 +00:00
Aaron Pham	15640a85cd	feat: supports embeddings for T5 and ChatGLM family generation (#153)	2023-07-27 16:43:43 -04:00

421 Commits