Commit Graph

32 Commits

Author SHA1 Message Date
aarnphm-ec2-dev
8b340559aa fix(tests): skip running models tests on CI
The runners don't have enough space to run all tests

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-19 22:40:40 +00:00
Aaron Pham
c1ddb9ed7c feat: GPTQ + vLLM and LLaMA (#113)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-19 18:12:12 -04:00
aarnphm-ec2-dev
5bb95652db chore(ci): skip large models
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-16 05:53:53 +00:00
Aaron Pham
fc963c42ce fix: build isolation (#116)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-16 01:52:21 -04:00
HeTaoPKU
09b0787306 feat(models): Baichuan (#115)
Co-authored-by: the <tao.he@hulu.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-15 22:01:37 -04:00
aarnphm-ec2-dev
d37d14e52b fix(tests): mark package tests on CI as xfail
XXX: @aarnphm to solve build isolation when bandwidth allows. Currently
this is not a problem when running locally.

`openllm build` just works, whereas `openllm.build` won't work when run
sequentially (see the sketch after this entry).

Also addresses some type stubs for jupytext.

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-15 12:48:28 +00:00
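
The build-isolation note above is a CI-only failure mode; a minimal sketch of how such a test might be marked as an expected failure on CI with pytest follows. The test name, the `CI` environment-variable check, and the `openllm.build("flan-t5")` call shape are assumptions for illustration, not the repository's actual test code.

```python
# Hypothetical sketch: mark a packaging test as xfail only when running on CI,
# where build isolation is known to break `openllm.build`.
import os

import pytest


@pytest.mark.xfail(
    condition=os.environ.get("CI") == "true",
    reason="build isolation breaks openllm.build when run sequentially on CI",
    strict=False,
)
def test_package_build():
    openllm = pytest.importorskip("openllm")
    bento = openllm.build("flan-t5")  # assumed call shape, for illustration only
    assert bento is not None
```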
Aaron Pham
b2dba6143f fix(resource): correctly parse CUDA_VISIBLE_DEVICES (#114) 2023-07-15 07:19:35 -04:00
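
For context on what a `CUDA_VISIBLE_DEVICES` fix typically has to handle (empty values, `-1`, comma-separated IDs or UUIDs, truncation at the first invalid entry), here is a hypothetical parsing sketch; the function name and exact behaviour are illustrative, not OpenLLM's implementation from #114.

```python
# Hypothetical illustration of the parsing concerns behind such a fix;
# not OpenLLM's actual resource-strategy code.
import os
from typing import List, Optional


def parse_cuda_visible_devices(value: Optional[str] = None) -> List[str]:
    """Return the list of visible GPU IDs/UUIDs from CUDA_VISIBLE_DEVICES."""
    if value is None:
        value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    value = value.strip()
    # An empty value or "-1" hides all GPUs.
    if not value or value == "-1":
        return []
    devices = []
    for item in value.split(","):
        item = item.strip()
        # CUDA stops reading at the first invalid entry such as "-1".
        if not item or item == "-1":
            break
        devices.append(item)
    return devices


assert parse_cuda_visible_devices("0,2") == ["0", "2"]
assert parse_cuda_visible_devices("-1") == []
```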
aarnphm-ec2-dev
e2ae24b74c fix(tests): building not being isolated
We will need to fix this on the BentoML side

Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-11 17:28:00 +00:00
Aaron Pham
c7f4dc7bb2 feat(test): snapshot testing (#107)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-07-10 17:23:19 -04:00
Aaron
775c8c15a5 fix(tests): make sure the model is available on the runner
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-04 15:11:58 -04:00
aarnphm-ec2-dev
4c5b27495c fix: bettertransformer check already coerces to bool
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-07-04 17:58:02 +00:00
Aaron Pham
d6303d306a perf: fix importing custom paths and clean up serialisation (#102) 2023-07-04 12:49:14 -04:00
Aaron Pham
8ac2755de4 feat(llm): fine-tuning Falcon (#98)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-30 21:25:16 -04:00
Aaron Pham
59b1d89971 feat: custom dockerfile templates (#95)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-30 13:44:11 -04:00
Aaron Pham
f2457fcdaf tests: add sanity check for openllm.client (#93)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-29 12:07:00 -04:00
Aaron Pham
bd4cc9b3ff fix: loading local (#87)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2023-06-28 11:25:54 -04:00
aarnphm-ec2-dev
698d929522 tests: add dict protocol cases
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-28 04:25:09 +00:00
Aaron Pham
db1494a6ae feat(start): starting bento and fix load (#80) 2023-06-27 12:45:17 -04:00
Aaron Pham
d544764386 feat: cascading resource strategies (#72) 2023-06-26 17:38:49 -04:00
Aaron Pham
74fdd5e259 feat: release binary distribution (#66) 2023-06-25 10:38:03 -04:00
Aaron Pham
3593c764f0 fix(test): robustness (#64) 2023-06-24 11:10:07 -04:00
Aaron Pham
98328be394 peft(models): improve implementation (#60)
If you have a local Dolly-V2 version, please run `openllm prune`
2023-06-24 05:22:18 -04:00
Aaron Pham
1435478f6c fix(cli): ensure we parse tag for download (#58) 2023-06-23 21:24:53 -04:00
Aaron Pham
a30eebd56f feat(config): new class generation (#51)
Allows setting up a new class derived from the base class with `model_derivate`.
2023-06-23 01:15:38 -04:00
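
A generic sketch of the "new class derived from a base class" idea follows; the `BaseConfig` and `derive_config` names are hypothetical and this is not the actual `model_derivate` API from #51, only the underlying pattern.

```python
# Generic sketch of deriving a new config class from a base class at runtime;
# names are hypothetical and this is not OpenLLM's `model_derivate` API.
class BaseConfig:
    model_id: str = "google/flan-t5-large"
    temperature: float = 0.9


def derive_config(name: str, **overrides) -> type:
    """Create a subclass of BaseConfig with class-level attribute overrides."""
    return type(name, (BaseConfig,), overrides)


# Usage: a derived config that only changes the defaults it needs to.
FlanT5XLConfig = derive_config("FlanT5XLConfig", model_id="google/flan-t5-xl")
assert issubclass(FlanT5XLConfig, BaseConfig)
assert FlanT5XLConfig.model_id == "google/flan-t5-xl"
```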
Aaron Pham
03758a5487 fix(tools): adhere to style guidelines (#31) 2023-06-18 20:03:17 -04:00
aarnphm-ec2-dev
fe8da4e8a9 fix(tests): ensure_available on tests
Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>
2023-06-17 15:12:28 +00:00
Aaron Pham
6f724416c0 perf: build quantization and better transformer behaviour (#28)
Makes quantization_config and low_cpu_mem_usage available on the PyTorch implementation only (see the sketch after this entry)

See the changelog for more details on #28
2023-06-17 08:56:14 -04:00
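
The options named in this commit are standard `transformers.from_pretrained` arguments; the sketch below shows how they are typically used together on the PyTorch backend. The model id and exact settings are examples, not OpenLLM's internal loading code.

```python
# Sketch of the underlying transformers options referenced above; this is not
# OpenLLM's loading code, and the model id is just an example.
import torch
import transformers

quantization_config = transformers.BitsAndBytesConfig(
    load_in_8bit=True,  # requires the bitsandbytes package and a CUDA device
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=quantization_config,  # PyTorch backend only
    low_cpu_mem_usage=True,                   # PyTorch backend only
    torch_dtype=torch.float16,
    device_map="auto",                        # requires the accelerate package
)
```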
Aaron Pham
ded8a9f809 feat: quantization (#27) 2023-06-16 18:10:50 -04:00
Aaron Pham
19bc7e3116 feat: fine-tuning [part 1] (#23) 2023-06-16 00:19:01 -04:00
Aaron Pham
f8ebb36e15 tests: fastpath (#17)
Added fastpath cases for configuration and Flan-T5

Fixes respecting model_id in lifecycle hooks.

Updates the CLI to clean up model info
2023-06-12 14:18:26 -04:00
Aaron
ec941c95d5 chore: add license header
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-04 16:22:37 -07:00
Chaoyu
dd8b6050b2 feat: FLAN-T5 support
- Add infrastructure; still to be implemented: cache, chat history

- Base Runnable implementation that fits the LangChain API

- Add a Prompt descriptor and utils

feat: license headers, auto factory implementation, and CLI

Auto-construct args from the pydantic config

Add an auto factory for ease of use

Only provide `/generate` to streamline the UX

Configuration precedence: CLI > envvar > input contract (see the sketch after this entry)

fix: serve from a thread

fix CLI args

chore: clean up names and refactor imports

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-05-03 17:50:14 -07:00
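
The "CLI > envvar > input contract" line above describes a configuration precedence; the sketch below illustrates that resolution order in isolation. The function and environment-variable names are hypothetical, not OpenLLM's code.

```python
# Minimal sketch of the "CLI > envvar > input contract" precedence described
# above; function and variable names are hypothetical.
import os
from typing import Optional


def resolve_option(cli_value: Optional[str], env_var: str, contract_default: str) -> str:
    """Resolve a configuration value: CLI flag wins, then envvar, then the contract default."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return env_value
    return contract_default


# e.g. a --temperature flag overrides the environment variable, which overrides
# the default declared in the model's configuration (env var name is illustrative).
temperature = resolve_option(None, "OPENLLM_FLAN_T5_TEMPERATURE", "0.9")
```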