diff --git a/ADDING_NEW_MODEL.md b/ADDING_NEW_MODEL.md
index da062c49..b128d084 100644
--- a/ADDING_NEW_MODEL.md
+++ b/ADDING_NEW_MODEL.md
@@ -1,44 +1,76 @@
 # Adding a New Model
-OpenLLM encourages contributions by welcoming users to incorporate their custom Large Language Models (LLMs) into the ecosystem. You can set up your development environment by referring to our [Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md).
+OpenLLM encourages contributions by welcoming users to incorporate their custom
+Large Language Models (LLMs) into the ecosystem. You can set up your development
+environment by referring to our
+[Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md).
 ## Procedure
-All the relevant code for incorporating a new model resides within `src/openllm/models`. Start by creating a new folder named after your `model_name` in snake_case. Here's your roadmap:
+All the relevant code for incorporating a new model resides within
+`src/openllm/models`. Start by creating a new folder named after your
+`model_name` in snake_case. Here's your roadmap:
-- [ ] Generate model configuration file: `src/openllm/models/{model_name}/configuration_{model_name}.py`
-- [ ] Establish model implementation files: `src/openllm/models/{model_name}/modeling_{runtime}_{model_name}.py`
-- [ ] Create module's `__init__.py`: `src/openllm/models/{model_name}/__init__.py`
+- [ ] Generate model configuration file:
+  `src/openllm/models/{model_name}/configuration_{model_name}.py`
+- [ ] Establish model implementation files:
+  `src/openllm/models/{model_name}/modeling_{runtime}_{model_name}.py`
+- [ ] Create module's `__init__.py`:
+  `src/openllm/models/{model_name}/__init__.py`
 - [ ] Adjust the entrypoints for files at `src/openllm/models/auto/*`
 - [ ] Modify the main `__init__.py`: `src/openllm/models/__init__.py`
-- [ ] Develop or adjust dummy objects for dependencies, a task exclusive to the `utils` directory: `src/openllm/utils/*`
+- [ ] Develop or adjust dummy objects for dependencies, a task exclusive to the
+  `utils` directory: `src/openllm/utils/*`
 For a working example, check out any pre-implemented model.
-> We are developing a CLI command and helper script to generate these files, which would further streamline the process. Until then, manual creation is necessary.
+> We are developing a CLI command and helper script to generate these files,
+> which would further streamline the process. Until then, manual creation is
+> necessary.
 ### Model Configuration
+
 File Name: `configuration_{model_name}.py`
-This file is dedicated to specifying docstrings, default prompt templates, default parameters, as well as additional fields for the models.
+This file is dedicated to specifying docstrings, default prompt templates,
+default parameters, as well as additional fields for the models.
 ### Model Implementation
+
 File Name: `modeling_{runtime}_{model_name}.py`
-For each runtime, i.e., torch (default with no prefix), TensorFlow - `tf`, Flax - `flax`, it is necessary to implement a class that adheres to the `openllm.LLM` interface. The conventional class name follows the `RuntimeModelName` pattern, e.g., `FlaxFlanT5`.
+For each runtime, i.e., PyTorch (the default, with no file name prefix),
+TensorFlow (`tf`), and Flax (`flax`), it is necessary to implement a class that
+adheres to the `openllm.LLM` interface. The conventional class name follows the
+`RuntimeModelName` pattern, e.g., `FlaxFlanT5`.
-### Initialization Files
-The `__init__.py` files facilitate intelligent imports, type checking, and auto-completions for the OpenLLM codebase and CLIs.
+### Initialization Files
+
+The `__init__.py` files facilitate intelligent imports, type checking, and
+auto-completions for the OpenLLM codebase and CLIs.
 ### Entrypoint
-After establishing the model config and implementation class, register them in the `auto` folder files. There are four entrypoint files:
-* `configuration_auto.py`: Registers `ModelConfig` classes
-* `modeling_auto.py`: Registers a model's PyTorch implementation
-* `modeling_tf_auto.py`: Registers a model's TensorFlow implementation
-* `modeling_flax_auto.py`: Registers a model's Flax implementation
+
+After establishing the model config and implementation class, register them in
+the `auto` folder files. There are four entrypoint files:
+
+- `configuration_auto.py`: Registers `ModelConfig` classes
+- `modeling_auto.py`: Registers a model's PyTorch implementation
+- `modeling_tf_auto.py`: Registers a model's TensorFlow implementation
+- `modeling_flax_auto.py`: Registers a model's Flax implementation
 ### Dummy Objects
-In the `src/openllm/utils` directory, dummy objects are created for each model and runtime implementation. These specify the dependencies required for each model.
+
+In the `src/openllm/utils` directory, dummy objects are created for each model
+and runtime implementation. These specify the dependencies required for each
+model.
+
+### Updating README.md
+
+Run `./tools/update-readme.py` to update the README.md file with the new model.
 ## Raise a Pull Request
-Once you have completed the checklist above, raise a PR and the OpenLLMs maintainer will review it ASAP. Once the PR is merged, you should be able to see your model in the next release! πŸŽ‰ 🎊
\ No newline at end of file
+
+Once you have completed the checklist above, raise a PR and an OpenLLM
+maintainer will review it as soon as possible. Once the PR is merged, you should
+be able to see your model in the next release! πŸŽ‰ 🎊
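For orientation, below is a minimal sketch of the configuration file the guide above asks for, written for a hypothetical model. `MyModelConfig`, the URL, and the `GenerationConfig` fields are illustrative placeholders, not part of OpenLLM; the class keyword arguments (`requires_gpu`, `default_timeout`, `trust_remote_code`, `url`) are the ones exercised by the configuration changes later in this patch.

```python
# Sketch of src/openllm/models/my_model/configuration_my_model.py (hypothetical model).
from __future__ import annotations

import openllm

# Default prompt template, as in configuration_flan_t5.py elsewhere in this patch.
DEFAULT_PROMPT_TEMPLATE = """Answer the following question:\nQuestion: {instruction}\nAnswer:"""


class MyModelConfig(
    openllm.LLMConfig,
    requires_gpu=False,
    default_timeout=3600000,
    trust_remote_code=False,
    url="https://example.com/my-model",  # stored as __openllm_url__ and surfaced in the README table
):
    """One-paragraph description of the model, used for docs and CLI help."""

    class GenerationConfig:
        # Assumed default generation fields; adjust to whatever the model needs.
        max_new_tokens: int = 128
        temperature: float = 0.75
```

The matching `modeling_{runtime}_{model_name}.py` files then provide the runtime-specific classes (e.g., `MyModel`, `TFMyModel`, `FlaxMyModel`) implementing the `openllm.LLM` interface.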
diff --git a/README.md b/README.md
index ffba36b5..f7f39c78 100644
--- a/README.md
+++ b/README.md
@@ -97,6 +97,13 @@ start interacting with the model:
 >>> client.query('Explain to me the difference between "further" and "farther"')
 ```
+You can also use the `openllm query` command to query the model from the
+terminal:
+
+```bash
+openllm query --local 'Explain to me the difference between "further" and "farther"'
+```
+
 ## πŸš€ Deploying to Production
 To deploy your LLMs into production:
@@ -131,27 +138,23 @@ To deploy your LLMs into production:
 OpenLLM currently supports the following:
-- [dolly-v2](https://github.com/databrickslabs/dolly)
-- [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5)
-- [chatglm](https://github.com/THUDM/ChatGLM-6B)
-- [falcon](https://falconllm.tii.ae/)
-- [starcoder](https://github.com/bigcode-project/starcoder)
+<!-- update-readme.py: start -->
-### Model-specific Dependencies
+| Model                                                                  | CPU | GPU | Optional                         |
+| ---------------------------------------------------------------------- | --- | --- | -------------------------------- |
+| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5)  | βœ… | βœ… | `pip install openllm[flan-t5]`   |
+| [dolly-v2](https://github.com/databrickslabs/dolly)                    | βœ… | βœ… | πŸ‘Ύ (not needed)                  |
+| [chatglm](https://github.com/THUDM/ChatGLM-6B)                         | ❌ | βœ… | `pip install openllm[chatglm]`   |
+| [starcoder](https://github.com/bigcode-project/starcoder)              | ❌ | βœ… | `pip install openllm[starcoder]` |
+| [falcon](https://falconllm.tii.ae/)                                    | ❌ | βœ… | `pip install openllm[falcon]`    |
+| [stablelm](https://github.com/Stability-AI/StableLM)                   | βœ… | βœ… | πŸ‘Ύ (not needed)                  |
-We respect your system's space and efficiency. That's why we don't force users
-to install dependencies for all models. By default, you can run `dolly-v2` and
-`flan-t5` without installing any additional packages.
+> NOTE: To respect your disk space, OpenLLM does not install the dependencies
+> for every model by default. To run one of the models above, install its
+> optional dependencies as listed in the table.
-To enable support for a specific model, you'll need to install its corresponding
-dependencies. You can do this by using `pip install "openllm[model_name]"`. For
-example, to use **chatglm**:
-
-```bash
-pip install "openllm[chatglm]"
-```
-
-This will install `cpm_kernels` and `sentencepiece` additionally
+<!-- update-readme.py: stop -->
 ### Runtime Implementations
diff --git a/pyproject.toml b/pyproject.toml
index 1282f7c4..dd7048e2 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -98,6 +98,7 @@ dependencies = [
   "pytest-randomly",
   "pytest-rerunfailures",
   "pre-commit",
+  "tomlkit",
 ]
 [tool.hatch.envs.default.scripts]
 cov = ["test-cov", "cov-report"]
diff --git a/src/openllm/_configuration.py b/src/openllm/_configuration.py
index 5d1fa75e..355fcfec 100644
--- a/src/openllm/_configuration.py
+++ b/src/openllm/_configuration.py
@@ -649,6 +649,9 @@ class LLMConfig:
     __openllm_hints__: dict[str, t.Any] = Field(None, init=False)
     """An internal cache of resolved types for this LLMConfig."""
+    __openllm_url__: str = Field(None, init=False)
+    """The resolved url for this LLMConfig."""
+
     GenerationConfig: type = type
     """Users can override this subclass of any given LLMConfig to provide GenerationConfig default value.
 For example:
@@ -678,6 +681,7 @@ class LLMConfig:
         default_timeout: int | None = None,
         trust_remote_code: bool = False,
         requires_gpu: bool = False,
+        url: str | None = None,
     ):
         if name_type == "dasherize":
             model_name = inflection.underscore(cls.__name__.replace("Config", ""))
@@ -694,6 +698,7 @@ class LLMConfig:
         cls.__openllm_model_name__ = model_name
         cls.__openllm_start_name__ = start_name
         cls.__openllm_env__ = openllm.utils.ModelEnv(model_name)
+        cls.__openllm_url__ = url or "(not set)"
         # NOTE: Since we want to enable a pydantic-like experience
         # this means we will have to hide the attr abstraction, and generate
diff --git a/src/openllm/models/chatglm/configuration_chatglm.py b/src/openllm/models/chatglm/configuration_chatglm.py
index 2e06afdb..8c742729 100644
--- a/src/openllm/models/chatglm/configuration_chatglm.py
+++ b/src/openllm/models/chatglm/configuration_chatglm.py
@@ -22,6 +22,7 @@ class ChatGLMConfig(
     trust_remote_code=True,
     default_timeout=3600000,
     requires_gpu=True,
+    url="https://github.com/THUDM/ChatGLM-6B",
 ):
     """
     ChatGLM is an open bilingual language model based on
diff --git a/src/openllm/models/dolly_v2/configuration_dolly_v2.py b/src/openllm/models/dolly_v2/configuration_dolly_v2.py
index 5faaa599..38aa9881 100644
--- a/src/openllm/models/dolly_v2/configuration_dolly_v2.py
+++ b/src/openllm/models/dolly_v2/configuration_dolly_v2.py
@@ -20,7 +20,12 @@ from __future__ import annotations
 import openllm
-class DollyV2Config(openllm.LLMConfig, default_timeout=3600000, trust_remote_code=True):
+class DollyV2Config(
+    openllm.LLMConfig,
+    default_timeout=3600000,
+    trust_remote_code=True,
+    url="https://github.com/databrickslabs/dolly",
+):
     """Databricks’ Dolly is an instruction-following large language model trained on the
     Databricks machine learning platform that is licensed for commercial use.
diff --git a/src/openllm/models/falcon/configuration_falcon.py b/src/openllm/models/falcon/configuration_falcon.py
index 3d5a63b2..36818860 100644
--- a/src/openllm/models/falcon/configuration_falcon.py
+++ b/src/openllm/models/falcon/configuration_falcon.py
@@ -22,6 +22,7 @@ class FalconConfig(
     trust_remote_code=True,
     requires_gpu=True,
     default_timeout=3600000,
+    url="https://falconllm.tii.ae/",
 ):
     """Falcon-7B is a 7B parameters causal decoder-only model built by TII and trained on
     1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
diff --git a/src/openllm/models/flan_t5/configuration_flan_t5.py b/src/openllm/models/flan_t5/configuration_flan_t5.py
index ca5354ea..9f0584e8 100644
--- a/src/openllm/models/flan_t5/configuration_flan_t5.py
+++ b/src/openllm/models/flan_t5/configuration_flan_t5.py
@@ -40,7 +40,7 @@ saved pretrained, or a fine-tune FLAN-T5, provide ``OPENLLM_FLAN_T5_PRETRAINED='
 DEFAULT_PROMPT_TEMPLATE = """Answer the following question:\nQuestion: {instruction}\nAnswer:"""
-class FlanT5Config(openllm.LLMConfig):
+class FlanT5Config(openllm.LLMConfig, url="https://huggingface.co/docs/transformers/model_doc/flan-t5"):
     """FLAN-T5 was released in the paper [Scaling Instruction-Finetuned Language Models](https://arxiv.org/pdf/2210.11416.pdf)
     - it is an enhanced version of T5 that has been finetuned in a mixture of tasks.
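The `url` keyword being added across these configuration files flows through `LLMConfig.__init_subclass__` into the `__openllm_url__` class attribute, which the new README tool reads. A small, hypothetical sanity check of that wiring (not part of the patch) could iterate the same registry the tool uses:

```python
# Quick check that each registered config now carries a URL.
import openllm

for name, config in openllm.CONFIG_MAPPING.items():
    # Configs defined without `url=` fall back to the "(not set)" sentinel.
    print(f"{name}: {config.__openllm_url__}")
```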
diff --git a/src/openllm/models/stablelm/configuration_stablelm.py b/src/openllm/models/stablelm/configuration_stablelm.py
index fbab48a0..05089715 100644
--- a/src/openllm/models/stablelm/configuration_stablelm.py
+++ b/src/openllm/models/stablelm/configuration_stablelm.py
@@ -16,7 +16,7 @@ from __future__ import annotations
 import openllm
-class StableLMConfig(openllm.LLMConfig, name_type="lowercase"):
+class StableLMConfig(openllm.LLMConfig, name_type="lowercase", url="https://github.com/Stability-AI/StableLM"):
     """StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models
     pre-trained on a diverse collection of English datasets with a sequence length of 4096
     to push beyond the context window limitations of existing open-source language models.
diff --git a/src/openllm/models/starcoder/configuration_starcoder.py b/src/openllm/models/starcoder/configuration_starcoder.py
index 259146b4..c210392f 100644
--- a/src/openllm/models/starcoder/configuration_starcoder.py
+++ b/src/openllm/models/starcoder/configuration_starcoder.py
@@ -16,7 +16,12 @@ from __future__ import annotations
 import openllm
-class StarCoderConfig(openllm.LLMConfig, name_type="lowercase", requires_gpu=True):
+class StarCoderConfig(
+    openllm.LLMConfig,
+    name_type="lowercase",
+    requires_gpu=True,
+    url="https://github.com/bigcode-project/starcoder",
+):
     """The StarCoder models are 15.5B parameter models trained on 80+ programming languages from
     [The Stack (v1.2)](https://huggingface.co/datasets/bigcode/the-stack), with opt-out requests excluded.
diff --git a/tools/update-readme.py b/tools/update-readme.py
new file mode 100755
index 00000000..7c75daf2
--- /dev/null
+++ b/tools/update-readme.py
@@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+
+from __future__ import annotations
+
+import os
+import typing as t
+
+import inflection
+import tomlkit
+
+import openllm
+
+START_COMMENT = "<!-- update-readme.py: start -->\n"
+END_COMMENT = "<!-- update-readme.py: stop -->\n"
+
+ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+
+
+def main() -> int:
+    with open(os.path.join(ROOT, "pyproject.toml"), "r") as f:
+        deps = tomlkit.parse(f.read()).value["project"]["optional-dependencies"]
+
+    with open(os.path.join(ROOT, "README.md"), "r") as f:
+        readme = f.readlines()
+
+    start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
+    formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
+        "Model": [],
+        "CPU": [],
+        "GPU": [],
+        "Optional": [],
+    }
+    max_name_len_div = 0
+    max_install_len_div = 0
+    does_not_need_custom_installation: list[str] = []
+    for name, config in openllm.CONFIG_MAPPING.items():
+        dashed = inflection.dasherize(name)
+        model_name = f"[{dashed}]({config.__openllm_url__})"
+        if len(model_name) > max_name_len_div:
+            max_name_len_div = len(model_name)
+        formatted["Model"].append(model_name)
+        formatted["GPU"].append("βœ…")
+        formatted["CPU"].append("βœ…" if not config.__openllm_requires_gpu__ else "❌")
+        instruction = "πŸ‘Ύ (not needed)"
+        if dashed in deps:
+            instruction = f"`pip install openllm[{dashed}]`"
+        else:
+            does_not_need_custom_installation.append(model_name)
+        if len(instruction) > max_install_len_div:
+            max_install_len_div = len(instruction)
+        formatted["Optional"].append(instruction)
+
+    meta = ["\n"]
+
+    # NOTE: headers (use .append so each table line stays a single list element)
+    meta.append(f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n")
+    # NOTE: divs
+    meta.append(f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n")
+    # NOTE: rows
+    for links, cpu, gpu, custom_installation in t.cast("tuple[str, str, str, str]", zip(*formatted.values())):
+        meta.append(
+            "| "
+            + links
+            + " " * (max_name_len_div - len(links))
+            + f" | {cpu} | {gpu} | "
+            + custom_installation
+            + " "
+            * (
+                max_install_len_div
+                - len(custom_installation)
+                - (0 if links not in does_not_need_custom_installation else 1)
+            )
+            + " |\n"
+        )
+    meta.append("\n")
+
+    # NOTE: adding notes
+    meta.append(
+        """\
+> NOTE: To respect your disk space, OpenLLM does not install the dependencies
+> for every model by default. To run one of the models above, install its
+> optional dependencies as listed in the table.
+
+"""
+    )
+
+    readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]
+
+    with open(os.path.join(ROOT, "README.md"), "w") as f:
+        f.writelines(readme)
+
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
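To make the row-building arithmetic above concrete, here is a standalone rendering of one row with made-up column widths; the actual script derives `max_name_len_div` and `max_install_len_div` from the widest model link and install instruction it encounters, and trims one extra space on the πŸ‘Ύ rows to offset the double-width emoji.

```python
# Illustration only: pad each cell to its (assumed) column width, as the script does per row.
links = "[chatglm](https://github.com/THUDM/ChatGLM-6B)"
custom_installation = "`pip install openllm[chatglm]`"
max_name_len_div = 69     # assumed width of the widest model link
max_install_len_div = 32  # assumed width of the widest install instruction

row = (
    "| "
    + links + " " * (max_name_len_div - len(links))
    + " | ❌ | βœ… | "
    + custom_installation + " " * (max_install_len_div - len(custom_installation))
    + " |"
)
print(row)
```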