| Model         | Parameters | Required GPU | Start a Server                           |
| ------------- | ---------- | ------------ | ---------------------------------------- |
| deepseek      | r1-671b    | 80Gx16       | `openllm serve deepseek:r1-671b`         |
| gemma2        | 2b         | 12G          | `openllm serve gemma2:2b`                |
| gemma3        | 3b         | 12G          | `openllm serve gemma3:3b`                |
| jamba1.5      | mini-ff0a  | 80Gx2        | `openllm serve jamba1.5:mini-ff0a`       |
| llama3.1      | 8b         | 24G          | `openllm serve llama3.1:8b`              |
| llama3.2      | 1b         | 24G          | `openllm serve llama3.2:1b`              |
| llama3.3      | 70b        | 80Gx2        | `openllm serve llama3.3:70b`             |
| llama4        | 17b16e     | 80Gx8        | `openllm serve llama4:17b16e`            |
| mistral       | 8b-2410    | 24G          | `openllm serve mistral:8b-2410`          |
| mistral-large | 123b-2407  | 80Gx4        | `openllm serve mistral-large:123b-2407`  |
| phi4          | 14b        | 80G          | `openllm serve phi4:14b`                 |
| pixtral       | 12b-2409   | 80G          | `openllm serve pixtral:12b-2409`         |
| qwen2.5       | 7b         | 24G          | `openllm serve qwen2.5:7b`               |
| qwen2.5-coder | 3b         | 24G          | `openllm serve qwen2.5-coder:3b`         |
| qwq           | 32b        | 80G          | `openllm serve qwq:32b`                  |
For the full model list, see the [OpenLLM models repository](https://github.com/bentoml/openllm-models).
## Start an LLM server
To start an LLM server locally, use the `openllm serve` command and specify the model version.
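For example, the following starts a Llama 3.2 1B server. On first run the model weights are downloaded from Hugging Face; by default the server should then be reachable at `http://localhost:3000` (adjust if your setup uses a different port):

```bash
# Serve Llama 3.2 1B locally; weights are pulled from Hugging Face on first run
openllm serve llama3.2:1b
```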
> [!NOTE]
> OpenLLM does not store model weights. A Hugging Face token (HF_TOKEN) is required for gated models.
>
> 1. Create your Hugging Face token [here](https://huggingface.co/settings/tokens).
> 2. Request access to the gated model, such as [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).
> 3. Set your token as an environment variable by running:
> ```bash
> export HF_TOKEN=<your token>
> ```
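
Once the server is running, you can verify it through its OpenAI-compatible API. Below is a minimal sketch assuming the default address `http://localhost:3000` and the `llama3.2:1b` model from the table above; the model id in the request body is an assumption, so check `/v1/models` first and substitute the id your server reports:

```bash
# List the model ids the server exposes (OpenAI-compatible endpoint)
curl http://localhost:3000/v1/models

# Send a chat completion request; the "model" value below is an assumption --
# replace it with an id returned by /v1/models
curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```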