OpenLLM

mirror of https://github.com/bentoml/OpenLLM.git synced 2026-04-23 00:17:28 -04:00

Author	SHA1	Message	Date
Aaron Pham	2cc264aa72	fix(vllm): correctly load given model id from envvar (#181)	2023-08-03 16:34:35 -04:00
Aaron Pham	cfc7f3888d	chore(vllm): add all supported models (#179)	2023-08-02 17:42:02 -04:00
pre-commit-ci[bot]	c2ed1d56da	chore(release): update base container restriction (#173) Prepare for 0.2.12 release Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-08-01 15:25:17 -04:00
Aaron	ca5e3c7ae5	fix: correct setup property for envvar instance Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-07-31 23:34:42 -04:00
Aaron Pham	8c2867d26d	style: define experimental guidelines (#168)	2023-07-31 07:54:26 -04:00
Aaron Pham	ef94c6b98a	feat(container): vLLM build and base image strategies (#142)	2023-07-31 02:44:52 -04:00
Aaron Pham	c391717226	feat(ci): automatic release semver + git archival installation (#143)	2023-07-25 04:18:49 -04:00
Aaron Pham	7eabcd4355	feat: vLLM integration for PagedAttention (#134)	2023-07-24 15:42:17 -04:00
Aaron Pham	81b0451685	feat(cli): query with per request instruction (#130)	2023-07-21 13:57:21 -04:00
aarnphm-ec2-dev	e4ac0ed8b7	fix(cuda): support loading in single GPU add available_devices for getting # of available GPUs Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-07-21 08:10:01 +00:00
Aaron Pham	f56f8ee782	feat: fine-tuning script for LlaMA 2 (#128)	2023-07-20 20:44:51 -04:00
Aaron Pham	858c2007c3	feat: revision parsed via model_id (#126)	2023-07-20 14:36:53 -04:00
Aaron Pham	c1ddb9ed7c	feat: GPTQ + vLLM and LlaMA (#113) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-07-19 18:12:12 -04:00
HeTaoPKU	fd9ae56812	fix(baichuan): add "cpm-kernel" as additional requirements (#117) This is to support the 13b variant of baichuan Co-authored-by: the <tao.he@hulu.com> Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-07-15 23:16:05 -04:00
HeTaoPKU	09b0787306	feat(models): Baichuan (#115) Co-authored-by: the <tao.he@hulu.com> Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-07-15 22:01:37 -04:00
aarnphm-ec2-dev	d37d14e52b	fix(tests): mark package on CI to xfail XXX: @aarnphm to solve build isolation when have bandwidth. Currently this is not a problem when running locally. `openllm build` just works, where as `openllm.build` won't work sequentially. Address some type stubs for jupytext Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-07-15 12:48:28 +00:00
Aaron Pham	b2dba6143f	fix(resource): correctly parse CUDA_VISIBLE_DEVICES (#114)	2023-07-15 07:19:35 -04:00
aarnphm-ec2-dev	c2bb29b4f3	fix: building mpt dependencies Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-07-11 00:21:23 +00:00
Aaron Pham	c7f4dc7bb2	feat(test): snapshot testing (#107) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-07-10 17:23:19 -04:00
Aaron Pham	fb849a384e	feat: GPTNeoX (#106)	2023-07-07 03:05:40 -04:00
Aaron Pham	d6303d306a	perf: fixing import custom paths and cleanup serialisation (#102)	2023-07-04 12:49:14 -04:00
Aaron Pham	8ac2755de4	feat(llm): fine-tuning Falcon (#98) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-06-30 21:25:16 -04:00
Aaron Pham	e52045eda6	fix: running MPT on CPU (#92)	2023-06-29 10:54:12 -04:00
Aaron Pham	01db504e7d	feat: MPT (#91) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-06-28 23:12:15 -04:00
Aaron Pham	db1494a6ae	feat(start): starting bento and fix load (#80)	2023-06-27 12:45:17 -04:00
Aaron Pham	74fdd5e259	feat: release binary distribution (#66)	2023-06-25 10:38:03 -04:00
Aaron Pham	3593c764f0	fix(test): robustness (#64)	2023-06-24 11:10:07 -04:00
Aaron Pham	98328be394	peft(models): improve implementation (#60) If you have a local Dolly-V2 version, please do `openllm prune`	2023-06-24 05:22:18 -04:00
Aaron Pham	1435478f6c	fix(cli): ensure we parse tag for download (#58)	2023-06-23 21:24:53 -04:00
Aaron Pham	dfca956fad	feat: serve adapter layers (#52)	2023-06-23 10:07:15 -04:00
Aaron	1ed0ae7787	fix(log): make sure to configure OpenLLM logs correctly Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-19 06:19:06 -04:00
Aaron Pham	03758a5487	fix(tools): adhere to style guidelines (#31)	2023-06-18 20:03:17 -04:00
Aaron Pham	4fcd7c8ac9	integration: HuggingFace Agent (#29) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2023-06-18 00:13:53 -04:00
Aaron Pham	6f724416c0	perf: build quantization and better transformer behaviour (#28) Fixes quantization_config and low_cpu_mem_usage to be available on PyTorch implementation only See changelog for more details on #28	2023-06-17 08:56:14 -04:00
Aaron	233d4697b5	chore: update __all__ to take into _extra_objects Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-16 18:13:35 -04:00
Aaron Pham	ded8a9f809	feat: quantization (#27)	2023-06-16 18:10:50 -04:00
Aaron Pham	19bc7e3116	feat: fine-tuning [part 1] (#23)	2023-06-16 00:19:01 -04:00
Aaron	528f76e1d0	fix(client): using httpx for running calls within async context This is so that client.query works within a async context Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-15 01:58:49 -04:00
Aaron	be41c23c10	codegen: remove black as dependencies Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-14 03:22:05 -04:00
Aaron	111d205f63	perf: faster LLM loading using attrs for faster class creation opposed to metaclass Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-14 01:36:42 -04:00
Aaron	cb76a894cf	feat(metadata): add configuration to metadata endpoint Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-13 07:09:31 -04:00
Aaron	71070b90b4	chore(metadata): fix model_id to be respected on service.py Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-12 16:04:52 -04:00
Aaron Pham	f8ebb36e15	tests: fastpath (#17) added fastpath cases for configuration and Flan-T5 fixes respecting model_id into lifecycle hooks. update CLI to cleanup models info	2023-06-12 14:18:26 -04:00
Aaron	f8e99dd8f5	chore(configuration): clean house implementation Using Attrs implementation Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-11 18:46:15 -04:00
aarnphm-ec2-dev	81d46ca211	feat(type): support annotations openllm.LLM now supports fully typed-strict openllm.LLM[ModelType, TokenizerType] -> self.model, self.tokenizer Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-06-11 14:58:17 +00:00
aarnphm-ec2-dev	2e453fb005	refactor(configuration): __config__ and perf move model_ids and default_id to config class declaration, cleanup dependencies between config and LLM implementation lazy load module during LLM creation to llm_post_init fix post_init hooks to run load_in_mha. Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-06-11 12:53:15 +00:00
aarnphm-ec2-dev	17241292da	feat(cli): show runtime implementation Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-06-11 05:29:40 +00:00
aarnphm-ec2-dev	bb37f7e238	feat(utils): lazy load modules and fix typo Signed-off-by: aarnphm-ec2-dev <29749331+aarnphm@users.noreply.github.com>	2023-06-10 22:18:37 +00:00
Aaron	05fa34f9e6	refactor: pretrained => model_id I think model_id makes more sense than calling it pretrained Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-10 17:36:02 -04:00
Aaron	afddaed08c	fix(perf): respect per request information remove use_default_prompt_template options add pretrained to list of start help docstring fix flax generation config improve flax and tensorflow implementation Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>	2023-06-10 02:14:13 -04:00

83 Commits