Mirror of https://github.com/bentoml/OpenLLM.git, synced 2026-01-25 07:47:49 -05:00
feat(tooling): add script to auto update readme table of supported models

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
@@ -1,44 +1,76 @@

# Adding a New Model

OpenLLM encourages contributions by welcoming users to incorporate their custom Large Language Models (LLMs) into the ecosystem. You can set up your development environment by referring to our [Developer Guide](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md).

## Procedure

All the relevant code for incorporating a new model resides within `src/openllm/models`. Start by creating a new folder named after your `model_name` in snake_case. Here's your roadmap:

- [ ] Generate model configuration file: `src/openllm/models/{model_name}/configuration_{model_name}.py`
- [ ] Establish model implementation files: `src/openllm/models/{model_name}/modeling_{runtime}_{model_name}.py`
- [ ] Create module's `__init__.py`: `src/openllm/models/{model_name}/__init__.py`
- [ ] Adjust the entrypoints for files at `src/openllm/models/auto/*`
- [ ] Modify the main `__init__.py`: `src/openllm/models/__init__.py`
- [ ] Develop or adjust dummy objects for dependencies, a task exclusive to the `utils` directory: `src/openllm/utils/*`

For a working example, check out any pre-implemented model.

> We are developing a CLI command and helper script to generate these files, which would further streamline the process. Until then, manual creation is necessary.

### Model Configuration

File Name: `configuration_{model_name}.py`

This file is dedicated to specifying docstrings, default prompt templates, default parameters, as well as additional fields for the models.
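
As an illustration, here is a minimal sketch of such a file, modeled on the configuration classes touched in this commit. The `MyModelConfig` name, its URL, and the prompt template are hypothetical placeholders:

```python
# configuration_my_model.py -- hypothetical sketch; copy a real config
# (e.g. DollyV2Config) for the authoritative pattern.
from __future__ import annotations

import openllm

# Default prompt template, following the flan-t5 config's convention.
DEFAULT_PROMPT_TEMPLATE = """Answer the following question:\nQuestion: {instruction}\nAnswer:"""


class MyModelConfig(
    openllm.LLMConfig,
    default_timeout=3600000,  # generous timeout, as used by the configs in this commit
    trust_remote_code=True,  # needed when the checkpoint ships custom modeling code
    url="https://example.com/my-model",  # surfaced in the README table by tools/update-readme.py
):
    """MyModel is ... (the class docstring documents the model)."""
```
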

### Model Implementation

File Name: `modeling_{runtime}_{model_name}.py`

For each runtime, i.e., torch (default, no prefix), TensorFlow (`tf`), and Flax (`flax`), it is necessary to implement a class that adheres to the `openllm.LLM` interface. The conventional class name follows the `RuntimeModelName` pattern, e.g., `FlaxFlanT5`.
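
The exact method set required by `openllm.LLM` is defined in the codebase; the skeleton below is a rough sketch in which the `generate` method and its signature are assumptions, so consult a pre-implemented model such as flan-t5 for the authoritative shape:

```python
# modeling_my_model.py -- hypothetical skeleton for the default torch runtime.
# A TensorFlow variant would live in modeling_tf_my_model.py and a Flax variant
# in modeling_flax_my_model.py, each with a runtime-prefixed class name.
from __future__ import annotations

import openllm


class MyModel(openllm.LLM):
    """Torch (default) runtime implementation of MyModel."""

    def generate(self, prompt: str, **generation_kwargs) -> list[str]:
        # Run the underlying model on the prompt and return decoded outputs.
        ...
```
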

### Initialization Files

The `__init__.py` files facilitate intelligent imports, type checking, and auto-completions for the OpenLLM codebase and CLIs.

### Entrypoint

After establishing the model config and implementation class, register them in the `auto` folder files. There are four entrypoint files:

- `configuration_auto.py`: Registers `ModelConfig` classes
- `modeling_auto.py`: Registers a model's PyTorch implementation
- `modeling_tf_auto.py`: Registers a model's TensorFlow implementation
- `modeling_flax_auto.py`: Registers a model's Flax implementation

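The registration shape is easiest to copy from an existing entry. Hypothetically, it resembles the Transformers-style name mapping sketched below (the mapping name and entries are assumptions; what is certain from this commit is that `tools/update-readme.py` iterates `openllm.CONFIG_MAPPING`, so a new model must end up registered there):

```python
# configuration_auto.py -- hypothetical excerpt; copy the real structure
# from an existing entry in src/openllm/models/auto/configuration_auto.py.
from collections import OrderedDict

CONFIG_MAPPING_NAMES = OrderedDict(
    [
        ("flan_t5", "FlanT5Config"),
        ("my_model", "MyModelConfig"),  # hypothetical new entry
    ]
)
```
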
### Dummy Objects

In the `src/openllm/utils` directory, dummy objects are created for each model and runtime implementation. These specify the dependencies required for each model.

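A common shape for such dummies, borrowed from Transformers-style codebases, is a placeholder class that raises when the backend dependencies are missing. The file and class names below are assumptions; mirror an existing dummy module in `src/openllm/utils`:

```python
# src/openllm/utils/dummy_flax_objects.py (hypothetical name): a stand-in
# exported when the Flax backend is unavailable.
class FlaxMyModel:
    """Placeholder raising a helpful error when Flax dependencies are missing."""

    def __init__(self, *args, **kwargs):
        raise RuntimeError(
            "FlaxMyModel requires additional dependencies; install the model's optional extras."
        )
```
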
### Updating README.md

Run `./tools/update-readme.py` to update the README.md file with the new model.

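Conceptually, the script regenerates the support table from `openllm.CONFIG_MAPPING` and splices it between two marker comments in README.md. A condensed sketch of that splice (the full script appears at the end of this commit):

```python
# Condensed sketch of the marker-splicing logic in tools/update-readme.py.
START = "<!-- update-readme.py: start -->\n"
STOP = "<!-- update-readme.py: stop -->\n"

# In the real script this table is regenerated from openllm.CONFIG_MAPPING;
# a stub row stands in here.
table_lines = ["| Model | CPU | GPU | Optional |\n"]

with open("README.md", "r") as f:
    lines = f.readlines()

start, stop = lines.index(START), lines.index(STOP)
lines = lines[:start] + [START] + table_lines + [STOP] + lines[stop + 1 :]

with open("README.md", "w") as f:
    f.writelines(lines)
```
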
## Raise a Pull Request

Once you have completed the checklist above, raise a PR and the OpenLLM maintainers will review it ASAP. Once the PR is merged, you should be able to see your model in the next release! 🎉 🎊

39 README.md
@@ -97,6 +97,13 @@ start interacting with the model:

>>> client.query('Explain to me the difference between "further" and "farther"')
```

You can also use the `openllm query` command to query the model from the terminal:

```bash
openllm query --local 'Explain to me the difference between "further" and "farther"'
```

## 🚀 Deploying to Production

To deploy your LLMs into production:
@@ -131,27 +138,23 @@ To deploy your LLMs into production:

OpenLLM currently supports the following:

<!-- update-readme.py: start -->

### Model-specific Dependencies

| Model                                                                  | CPU | GPU | Optional                         |
| ---------------------------------------------------------------------- | --- | --- | -------------------------------- |
| [flan-t5](https://huggingface.co/docs/transformers/model_doc/flan-t5) | ✅  | ✅  | `pip install openllm[flan-t5]`   |
| [dolly-v2](https://github.com/databrickslabs/dolly)                   | ✅  | ✅  | 👾 (not needed)                  |
| [chatglm](https://github.com/THUDM/ChatGLM-6B)                        | ❌  | ✅  | `pip install openllm[chatglm]`   |
| [starcoder](https://github.com/bigcode-project/starcoder)             | ❌  | ✅  | `pip install openllm[starcoder]` |
| [falcon](https://falconllm.tii.ae/)                                   | ❌  | ✅  | `pip install openllm[falcon]`    |
| [stablelm](https://github.com/Stability-AI/StableLM)                  | ✅  | ✅  | 👾 (not needed)                  |

> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't require
> installing the dependencies for all models. If you wish to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.

To enable support for a specific model, you'll need to install its corresponding dependencies. You can do this by using `pip install "openllm[model_name]"`. For example, to use **chatglm**:

```bash
pip install "openllm[chatglm]"
```

This will additionally install `cpm_kernels` and `sentencepiece`.

<!-- update-readme.py: stop -->

### Runtime Implementations
@@ -98,6 +98,7 @@ dependencies = [
  "pytest-randomly",
  "pytest-rerunfailures",
  "pre-commit",
  "tomlkit",
]
[tool.hatch.envs.default.scripts]
cov = ["test-cov", "cov-report"]
@@ -649,6 +649,9 @@ class LLMConfig:
    __openllm_hints__: dict[str, t.Any] = Field(None, init=False)
    """An internal cache of resolved types for this LLMConfig."""

    __openllm_url__: str = Field(None, init=False)
    """The resolved url for this LLMConfig."""

    GenerationConfig: type = type
    """Users can override this subclass of any given LLMConfig to provide GenerationConfig
    default value. For example:
@@ -678,6 +681,7 @@ class LLMConfig:
        default_timeout: int | None = None,
        trust_remote_code: bool = False,
        requires_gpu: bool = False,
        url: str | None = None,
    ):
        if name_type == "dasherize":
            model_name = inflection.underscore(cls.__name__.replace("Config", ""))
@@ -694,6 +698,7 @@ class LLMConfig:
        cls.__openllm_model_name__ = model_name
        cls.__openllm_start_name__ = start_name
        cls.__openllm_env__ = openllm.utils.ModelEnv(model_name)
        cls.__openllm_url__ = url or "(not set)"

        # NOTE: Since we want to enable a pydantic-like experience
        # this means we will have to hide the attr abstraction, and generate
@@ -22,6 +22,7 @@ class ChatGLMConfig(
    trust_remote_code=True,
    default_timeout=3600000,
    requires_gpu=True,
    url="https://github.com/THUDM/ChatGLM-6B",
):
    """
    ChatGLM is an open bilingual language model based on
@@ -20,7 +20,12 @@ from __future__ import annotations
import openllm


class DollyV2Config(openllm.LLMConfig, default_timeout=3600000, trust_remote_code=True):
class DollyV2Config(
    openllm.LLMConfig,
    default_timeout=3600000,
    trust_remote_code=True,
    url="https://github.com/databrickslabs/dolly",
):
    """Databricks’ Dolly is an instruction-following large language model trained on the Databricks
    machine learning platform that is licensed for commercial use.
@@ -22,6 +22,7 @@ class FalconConfig(
    trust_remote_code=True,
    requires_gpu=True,
    default_timeout=3600000,
    url="https://falconllm.tii.ae/",
):
    """Falcon-7B is a 7B parameters causal decoder-only model built by
    TII and trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
@@ -40,7 +40,7 @@ saved pretrained, or a fine-tune FLAN-T5, provide ``OPENLLM_FLAN_T5_PRETRAINED='
DEFAULT_PROMPT_TEMPLATE = """Answer the following question:\nQuestion: {instruction}\nAnswer:"""


class FlanT5Config(openllm.LLMConfig):
class FlanT5Config(openllm.LLMConfig, url="https://huggingface.co/docs/transformers/model_doc/flan-t5"):
    """FLAN-T5 was released in the paper [Scaling Instruction-Finetuned Language Models](https://arxiv.org/pdf/2210.11416.pdf)
    - it is an enhanced version of T5 that has been finetuned in a mixture of tasks.
@@ -16,7 +16,7 @@ from __future__ import annotations
import openllm


class StableLMConfig(openllm.LLMConfig, name_type="lowercase"):
class StableLMConfig(openllm.LLMConfig, name_type="lowercase", url="https://github.com/Stability-AI/StableLM"):
    """StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models
    pre-trained on a diverse collection of English datasets with a sequence
    length of 4096 to push beyond the context window limitations of existing open-source language models.
@@ -16,7 +16,12 @@ from __future__ import annotations
import openllm


class StarCoderConfig(openllm.LLMConfig, name_type="lowercase", requires_gpu=True):
class StarCoderConfig(
    openllm.LLMConfig,
    name_type="lowercase",
    requires_gpu=True,
    url="https://github.com/bigcode-project/starcoder",
):
    """The StarCoder models are 15.5B parameter models trained on 80+ programming languages from
    [The Stack (v1.2)](https://huggingface.co/datasets/bigcode/the-stack), with opt-out requests excluded.
95 tools/update-readme.py (new executable file)
@@ -0,0 +1,95 @@
#!/usr/bin/env python3

from __future__ import annotations

import os
import typing as t

import inflection
import tomlkit

import openllm

START_COMMENT = f"<!-- {os.path.basename(__file__)}: start -->\n"
END_COMMENT = f"<!-- {os.path.basename(__file__)}: stop -->\n"

ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


def main() -> int:
    # Models listed under [project.optional-dependencies] need a
    # `pip install openllm[...]` instruction in the table.
    with open(os.path.join(ROOT, "pyproject.toml"), "r") as f:
        deps = tomlkit.parse(f.read()).value["project"]["optional-dependencies"]

    with open(os.path.join(ROOT, "README.md"), "r") as f:
        readme = f.readlines()

    start_index, stop_index = readme.index(START_COMMENT), readme.index(END_COMMENT)
    formatted: dict[t.Literal["Model", "CPU", "GPU", "Optional"], list[str]] = {
        "Model": [],
        "CPU": [],
        "GPU": [],
        "Optional": [],
    }
    max_name_len_div = 0
    max_install_len_div = 0
    does_not_need_custom_installation: list[str] = []
    for name, config in openllm.CONFIG_MAPPING.items():
        dashed = inflection.dasherize(name)
        model_name = f"[{dashed}]({config.__openllm_url__})"
        if len(model_name) > max_name_len_div:
            max_name_len_div = len(model_name)
        formatted["Model"].append(model_name)
        formatted["GPU"].append("✅")
        formatted["CPU"].append("✅" if not config.__openllm_requires_gpu__ else "❌")
        instruction = "👾 (not needed)"
        if dashed in deps:
            instruction = f"`pip install openllm[{dashed}]`"
        else:
            does_not_need_custom_installation.append(model_name)
        if len(instruction) > max_install_len_div:
            max_install_len_div = len(instruction)
        formatted["Optional"].append(instruction)

    meta: list[str] = ["\n"]

    # Header and divider rows, padded to the widest cell in each column.
    meta.append(f"| Model {' ' * (max_name_len_div - 6)} | CPU | GPU | Optional {' ' * (max_install_len_div - 8)}|\n")
    meta.append(f"| {'-' * max_name_len_div}" + " | --- | --- | " + f"{'-' * max_install_len_div} |\n")
    # One row per model. The 👾 emoji renders wider than len() reports,
    # hence the off-by-one padding for models without extra dependencies.
    for links, cpu, gpu, custom_installation in t.cast("tuple[str, str, str, str]", zip(*formatted.values())):
        meta.append(
            "| "
            + links
            + " " * (max_name_len_div - len(links))
            + f" | {cpu} | {gpu} | "
            + custom_installation
            + " "
            * (
                max_install_len_div
                - len(custom_installation)
                - (0 if links not in does_not_need_custom_installation else 1)
            )
            + " |\n"
        )
    meta.append("\n")

    # Note rendered below the generated table.
    meta.append(
        """\
> NOTE: We respect users' system disk space. Hence, OpenLLM doesn't require
> installing the dependencies for all models. If you wish to use any of the
> aforementioned models, make sure to install the optional dependencies
> mentioned above.

"""
    )

    # Splice the regenerated block between the two marker comments.
    readme = readme[:start_index] + [START_COMMENT] + meta + [END_COMMENT] + readme[stop_index + 1 :]

    with open(os.path.join(ROOT, "README.md"), "w") as f:
        f.writelines(readme)

    return 0


if __name__ == "__main__":
    raise SystemExit(main())