mirror of https://github.com/bentoml/OpenLLM.git · synced 2026-01-22 06:19:35 -05:00

infra: update changelog and added readme badges [generated] (#162)

.github/workflows/build.yml (vendored) · 5 lines changed

@@ -20,11 +20,6 @@ on:

```yaml
      - "main"
  tags:
      - "v*"
  paths:
      - ".github/workflows/build.yaml"
      - "src/openllm/bundle/oci/Dockerfile"
      - "src/openllm/**"
      - "src/openllm_client/**"
  pull_request:
    branches:
      - "main"
```

CHANGELOG.md · 49 lines changed

@@ -18,6 +18,55 @@ This changelog is managed by towncrier and is compiled at release time.

<!-- towncrier release notes start -->
## Changes for the Upcoming Release
> **Warning**: These changes reflect the current [development progress](https://github.com/bentoml/openllm/tree/main)
> and have **not** been part of an official PyPI release yet.
> To try out the latest changes, one can do: `pip install -U git+https://github.com/bentoml/openllm.git@main`
### Features
- Added support for a base container with OpenLLM. The base container contains all necessary
  requirements to run OpenLLM. Currently it includes compiled versions of FlashAttention v2,
  vLLM, AutoGPTQ and Triton.

  This will now be the base image for all future BentoLLMs. The image will also be published
  to the public GHCR.

  To extend and use this image in your Bento, simply specify ``base_image`` under ``bentofile.yaml``:

  ```yaml
  docker:
    base_image: ghcr.io/bentoml/openllm:<hash>
  ```

  The release strategy includes:

  - versioning of ``ghcr.io/bentoml/openllm:sha-<sha1>`` for every commit to main, and ``ghcr.io/bentoml/openllm:0.2.11`` for a specific release version
  - the alias ``latest`` will be managed with docker/build-push-action (discouraged)
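  For illustration, pulling by either tag form would look like the following (the ``<sha1>`` value is a placeholder for an actual commit hash):

  ```bash
  # pin to the image built from a specific commit on main
  docker pull ghcr.io/bentoml/openllm:sha-<sha1>
  # or pin to a tagged release
  docker pull ghcr.io/bentoml/openllm:0.2.11
  ```
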
  Note that all these images include compiled kernels that have been tested on Ampere GPUs with CUDA 11.8.

  To quickly run the image, do the following:

  ```bash
  docker run --rm --gpus all -it -v /home/ubuntu/.local/share/bentoml:/tmp/bentoml -e BENTOML_HOME=/tmp/bentoml \
    -e OPENLLM_USE_LOCAL_LATEST=True -e OPENLLM_LLAMA_FRAMEWORK=vllm ghcr.io/bentoml/openllm:2b5e96f90ad314f54e07b5b31e386e7d688d9bb2 start llama --model-id meta-llama/Llama-2-7b-chat-hf --workers-per-resource conserved --debug
  ```

  In conjunction with this, OpenLLM now also has a set of small CLI utilities available via ``openllm ext`` for ease of use.
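
  To see what utilities are available, the standard help flag applies (output not shown here):

  ```bash
  openllm ext --help
  ```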

  General fixes around the codebase for bytecode optimization.

  Fixes log output to filter the correct level based on ``--debug`` and ``--quiet``.

  ``openllm build`` will now run the model check locally by default. To skip it, pass ``--fast`` (previously skipping was the default behaviour, but ``--no-fast`` as the default makes more sense here, since ``openllm build`` should also be able to run standalone).
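
  As a sketch (the model and ``--model-id`` value here are illustrative):

  ```bash
  # default: runs the model check locally before building
  openllm build opt --model-id facebook/opt-6.7b
  # skip the local model check
  openllm build opt --model-id facebook/opt-6.7b --fast
  ```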
  All ``LlaMA`` namespaces have been renamed to ``Llama`` (an internal change that shouldn't affect end users).

  ``openllm.AutoModel.for_model`` will now always return the model instance; runner kwargs are handled via ``create_runner``.
  [#142](https://github.com/bentoml/openllm/issues/142)
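
  A minimal sketch of the new split (the names come from the entry above; the exact call signatures are assumptions):

  ```python
  import openllm

  # for_model now always returns the model instance itself
  model = openllm.AutoModel.for_model("flan-t5", model_id="google/flan-t5-large")

  # runner-specific kwargs are handled at runner creation instead
  runner = openllm.Runner("flan-t5", model_id="google/flan-t5-large")
  ```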
## [0.2.11](https://github.com/bentoml/openllm/tree/v0.2.11)
### Features
README.md · 68 lines changed

@@ -6,20 +6,25 @@

<h1 align="center">🦾 OpenLLM</h1>
<a href="https://pypi.org/project/openllm">
  <img src="https://img.shields.io/pypi/v/openllm.svg?logo=pypi&label=PyPI&logoColor=gold" alt="pypi_status" />
</a><a href="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml">
  <img src="https://github.com/bentoml/OpenLLM/actions/workflows/ci.yml/badge.svg?branch=main" alt="ci" />
</a><a href="https://twitter.com/bentomlai">
  <img src="https://badgen.net/badge/icon/@bentomlai/1DA1F2?icon=twitter&label=Follow%20Us" alt="Twitter" />
</a><a href="https://l.bentoml.com/join-openllm-discord">
  <img src="https://badgen.net/badge/icon/OpenLLM/7289da?icon=discord&label=Join%20Us" alt="Discord" />
</a><a href="https://results.pre-commit.ci/latest/github/bentoml/OpenLLM/main">
  <img src="https://results.pre-commit.ci/badge/github/bentoml/OpenLLM/main.svg" alt="pre-commit.ci status" />
</a><br>
<a href="https://pypi.org/project/openllm">
  <img src="https://img.shields.io/pypi/pyversions/openllm.svg?logo=python&label=Python&logoColor=gold" alt="python_version" />
</a><a href="https://github.com/pypa/hatch">
  <img src="https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg" alt="Hatch" />
</a><br>
<a href="https://github.com/astral-sh/ruff">
  <img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json" alt="Ruff" />
</a><a href="https://github.com/python/mypy">
  <img src="https://img.shields.io/badge/types-mypy-blue.svg" alt="types - mypy" />
</a><a href="https://github.com/microsoft/pyright">
  <img src="https://img.shields.io/badge/types-pyright-yellow.svg" alt="types - pyright" />
</a><br>
<p>An open platform for operating large language models (LLMs) in production.<br>
Fine-tune, serve, deploy, and monitor any LLMs with ease.</p>

@@ -120,17 +125,19 @@ openllm query 'Explain to me the difference between "further" and "farther"'

Visit `http://localhost:3000/docs.json` for OpenLLM's API specification.

OpenLLM seamlessly supports many models and their variants. Users can also
specify different variants of the model to be served by providing the
`--model-id` argument, e.g.:

```bash
openllm start flan-t5 --model-id google/flan-t5-large
```

> **Note** that `openllm` also supports all variants of fine-tuned weights,
> custom model paths, as well as quantized weights for any of the supported
> models, as long as they can be loaded with the model architecture. Refer to
> the [supported models](https://github.com/bentoml/OpenLLM/tree/main#-supported-models)
> section for each model's architecture.

Use the `openllm models` command to see the list of models and their variants
supported in OpenLLM.

@@ -473,8 +480,8 @@ To include this into the Bento, one can also provide a `--adapter-id` into

```bash
openllm build opt --model-id facebook/opt-6.7b --adapter-id ...
```

> **Note**: We will gradually roll out support for fine-tuning all models. The
> following models contain fine-tuning support: OPT, Falcon, LlaMA.

### Integrating a New Model

@@ -485,10 +492,10 @@ to see how you can do it yourself.

### Embeddings

OpenLLM tentatively provides an embeddings endpoint for supported models. This
can be accessed via `/v1/embeddings`.

To use via CLI, simply call `openllm embed`:

```bash
openllm embed --endpoint http://localhost:3000 "I like to eat apples" -o json
```

@@ -508,7 +515,7 @@ openllm embed --endpoint http://localhost:3000 "I like to eat apples" -o json

To invoke this endpoint, use `client.embed` from the Python SDK:

@@ -518,15 +525,16 @@ client = openllm.client.HTTPClient("http://localhost:3000")

```python
import openllm

client = openllm.client.HTTPClient("http://localhost:3000")
client.embed("I like to eat apples")
```

> **Note**: Currently, the following model families support embeddings: Llama,
> T5 (Flan-T5, FastChat, etc.), ChatGLM

## ⚙️ Integrations
OpenLLM is not just a standalone product; it's a building block designed to
integrate with other powerful tools easily. We currently offer integration with
[BentoML](https://github.com/bentoml/BentoML),
[LangChain](https://github.com/hwchase17/langchain), and
[Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents).

### BentoML

@@ -555,7 +563,6 @@ async def prompt(input_text: str) -> str:
    return answer

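Most of the example sits outside this hunk; a sketch consistent with the `async def prompt` signature shown above, assuming the `openllm.Runner` plus `bentoml.Service` wiring of this era (service name and model choice are illustrative):

```python
import bentoml
import openllm

# create a runner for the chosen model (model name illustrative)
llm_runner = openllm.Runner("opt")

svc = bentoml.Service(name="llm-opt-service", runners=[llm_runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def prompt(input_text: str) -> str:
    # delegate generation to the runner; matches the signature in the hunk above
    answer = await llm_runner.generate.async_run(input_text)
    return answer
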
### [LangChain](https://python.langchain.com/docs/ecosystem/integrations/openllm)

To quickly start a local LLM with `langchain`, simply do the following:

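A minimal sketch, assuming the `langchain` OpenLLM wrapper of this era (model choice illustrative):

```python
from langchain.llms import OpenLLM

# spin up a local model through the OpenLLM wrapper
llm = OpenLLM(model_name="dolly-v2", model_id="databricks/dolly-v2-3b")
print(llm("What is the difference between a duck and a goose?"))
```
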
@@ -600,15 +607,14 @@ def chat(input_text: str):

> **Note** You can find out more examples under the
> [examples](https://github.com/bentoml/OpenLLM/tree/main/examples) folder.

### Transformers Agents

OpenLLM seamlessly integrates with
[Transformers Agents](https://huggingface.co/docs/transformers/transformers_agents).

> **Warning** The Transformers Agent is still at an experimental stage. It is
> recommended to install OpenLLM with `pip install -r nightly-requirements.txt`
> to get the latest API update for the HuggingFace agent.

```python
import transformers

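# The remainder of this snippet falls outside the hunk shown here; a plausible
# continuation, assuming OpenLLM's HuggingFace agent endpoint (the URL and the
# prompt are illustrative):
agent = transformers.HfAgent("http://localhost:3000/hf/agent")
agent.run("Is the following `text` positive or negative?", text="I love this movie.")
```
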
@@ -665,15 +671,14 @@ There are several ways to deploy your LLMs:

```bash
bentoml containerize <name:version>
```

This generates an OCI-compatible Docker image that can be deployed anywhere
Docker runs. For best scalability and reliability of your LLM service in
production, we recommend deploying with BentoCloud.

### ☁️ BentoCloud
Deploy OpenLLM with [BentoCloud](https://www.bentoml.com/bento-cloud/), the
serverless cloud for shipping and scaling AI applications.

1. **Create a BentoCloud account:** [sign up here](https://bentoml.com/cloud)
   for early access

@@ -705,7 +710,6 @@ the serverless cloud for shipping and scaling AI applications.

   `bentoml deployment create` command following the
   [deployment instructions](https://docs.bentoml.com/en/latest/reference/cli.html#bentoml-deployment-create).

## 👥 Community
Engage with like-minded individuals passionate about LLMs, AI, and more on our
[Discord server](https://l.bentoml.com/join-openllm-discord)!