Developer Guide
This Developer Guide is designed to help you contribute to the OpenLLM project. Follow these steps to set up your development environment and learn the process of contributing to our open-source project.
Join our Discord Channel and reach out to us if you have any questions!
Setting Up Your Development Environment
Before you can start developing, you'll need to set up your environment:
-
Ensure you have Git and Python 3.8+ installed.
-
Fork the OpenLLM repository from GitHub.
-
Clone the forked repository from GitHub:
git clone git@github.com:username/OpenLLM.git && cd OpenLLM
-
Add the OpenLLM upstream remote to your local OpenLLM clone:
git remote add upstream git@github.com:bentoml/OpenLLM.git
-
Configure git to pull from the upstream remote:
git switch main  # ensure you're on the main branch
git fetch upstream --tags
git branch --set-upstream-to=upstream/main
-
(Optional) Link .python-version-default to .python-version:
ln .python-version-default .python-version
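The prerequisites from the first step can be verified before cloning. The following is a minimal sketch using only the Python standard library; it is not part of OpenLLM, and the function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `version_info` and `which` are injectable so the check can be exercised
    without depending on the machine it runs on.
    """
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # Compare only (major, minor) against the required minimum.
    if tuple(version_info[:2]) < tuple(min_python):
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (min_python[0], min_python[1], version_info[0], version_info[1])
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("git") is None:
        problems.append("git was not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and PATH; an empty list means both requirements are met.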
Development Workflow
There are a few ways to contribute to the repository structure for OpenLLM:
Adding new models
- recipe.yaml contains all related metadata for generating new LLM-based bentos. To add a new LLM, adhere to the following structure:
"<model_name>:<model_tag>":
  project: vllm-chat
  service_config:
    name: phi3
    traffic:
      timeout: 300
    resources:
      gpu: 1
      gpu_type: nvidia-tesla-l4
  engine_config:
    model: microsoft/Phi-3-mini-4k-instruct
    max_model_len: 4096
    dtype: half
  chat_template: phi-3
-
<model_name> represents the type of model to be supported. Currently supported: phi3, llama2, llama3, gemma.
-
<model_tag> describes the model variant and its related metadata. The convention is <model_size>-<model_type>-<precision>[-<quantization>]. For example, microsoft/Phi-3-mini-4k-instruct should be represented as 3.8b-instruct-fp16, and TheBloke/Llama-2-7B-Chat-AWQ would be 7b-chat-awq-4bit.
-
project is used as the basis for the generated bento. Currently, most models should use vllm-chat as the default.
-
service_config contains all BentoML-related configuration needed to run this bento.
Note
We recommend including the following fields in service_config:
- name should be the same as <model_name>.
- resources lists the accelerators that can run this model. See more here
-
engine_config contains the fields passed to the vLLM engine. See AsyncEngineArgs for more supported arguments. We recommend always including model, max_model_len, dtype and trust_remote_code.
-
If the model is a chat model, chat_template should be used. In that case, add the appropriate chat_template under the chat_template directory.
-
You can then run BENTOML_HOME=$(openllm repo default)/bentoml/bentos python make.py <model_name>:<model_tag> to generate the required bentos.
-
You can then submit a pull request to openllm with the recipe changes.
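Before running make.py, a recipe entry can be checked against the conventions listed above. The sketch below is illustrative only and is not part of OpenLLM: `check_recipe_entry`, `TAG_PATTERN` and the warning messages are hypothetical names, the character set allowed in each tag component is an assumption (the guide does not define it), and the entry is a plain dict shaped the way the recipe is assumed to nest (name, traffic and resources under service_config; model, max_model_len and dtype under engine_config).

```python
import re

# Model names the guide lists as currently supported.
SUPPORTED_MODEL_NAMES = ("phi3", "llama2", "llama3", "gemma")

# Fields the guide recommends always including in engine_config.
RECOMMENDED_ENGINE_FIELDS = ("model", "max_model_len", "dtype", "trust_remote_code")

# <model_size>-<model_type>-<precision>[-<quantization>]
# Assumption: components are lowercase alphanumerics; the size may contain
# dots (for example "3.8b").
TAG_PATTERN = re.compile(
    r"^(?P<size>[0-9][0-9.]*[a-z]+)"
    r"-(?P<type>[a-z0-9]+)"
    r"-(?P<precision>[a-z0-9]+)"
    r"(?:-(?P<quantization>[a-z0-9]+))?$"
)


def check_recipe_entry(key, entry):
    """Return a list of warnings for one recipe entry (empty if none)."""
    model_name, sep, model_tag = key.partition(":")
    if not sep or not model_name or not model_tag:
        return ["key must look like '<model_name>:<model_tag>', got %r" % key]

    warnings = []
    if model_name not in SUPPORTED_MODEL_NAMES:
        warnings.append("unknown model name %r" % model_name)
    if TAG_PATTERN.match(model_tag) is None:
        warnings.append("tag %r does not follow the naming convention" % model_tag)

    service_config = entry.get("service_config", {})
    if service_config.get("name") != model_name:
        warnings.append("service_config.name should be %r" % model_name)
    if "resources" not in service_config:
        warnings.append("service_config.resources is missing")

    engine_config = entry.get("engine_config", {})
    for field in RECOMMENDED_ENGINE_FIELDS:
        if field not in engine_config:
            warnings.append("engine_config.%s is missing" % field)
    return warnings


# The phi3 example from this guide, written as a Python dict.
EXAMPLE_ENTRY = {
    "project": "vllm-chat",
    "service_config": {
        "name": "phi3",
        "traffic": {"timeout": 300},
        "resources": {"gpu": 1, "gpu_type": "nvidia-tesla-l4"},
    },
    "engine_config": {
        "model": "microsoft/Phi-3-mini-4k-instruct",
        "max_model_len": 4096,
        "dtype": "half",
    },
    "chat_template": "phi-3",
}

print(check_recipe_entry("phi3:3.8b-instruct-fp16", EXAMPLE_ENTRY))
```

For the phi3 example, the only warning is the missing trust_remote_code field, which the guide recommends but the sample entry does not set.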
Adding bentos
OpenLLM now also manages a generated bento repository. If you update or modify any generated bentos, make sure to update the recipe and add the generated bentos under bentoml/bentos.
Adding repos
If you wish to create your own managed git repo, follow the structure of bentoml/openllm-models.
To add your custom repo, run openllm repo add <repo_alias> <git_url>