* wip
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* get rid of panics
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* expose it properly from the config
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* forgot to commit
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove focus on test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor: break down json grammar parser in different files
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: patch to `refactor_grammars` - propagate errors (#3006)
propagate errors around
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave Lee <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* fix(cuda): downgrade to 12.0 to increase compatibility range
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* improve messaging
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(functions): enhance parsing with broken JSON when we parse the raw results
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* breaking: make function name by default
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(grammar): dynamically generate grammars with mutating keys
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor: simplify condition
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Fixes:
```
panic: invalid argument to IntN
goroutine 401 [running]:
math/rand/v2.(*Rand).IntN(...)
/home/mudler/_git/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/src/math/rand/v2/rand.go:190
math/rand/v2.IntN(...)
/home/mudler/_git/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.4.linux-amd64/src/math/rand/v2/rand.go:307
github.com/mudler/LocalAI/core/cli.Proxy.func2()
/home/mudler/_git/LocalAI/core/cli/federated.go:104 +0x76e
created by github.com/mudler/LocalAI/core/cli.Proxy in goroutine 1
/home/mudler/_git/LocalAI/core/cli/federated.go:91 +0x3c5
```
This happens when no nodes are found and something tries to hit the federated
endpoint (and no tunnels are ready yet).
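A minimal sketch of the kind of guard that avoids this class of panic (`rand.IntN` panics when its argument is not positive); the function and variable names below are illustrative, not the actual LocalAI code:
```go
package main

import (
	"errors"
	"fmt"
	"math/rand/v2"
)

// selectNode picks a random worker address from the discovered tunnels.
// rand.IntN panics when n <= 0, so an empty list must be handled explicitly
// instead of indexing with a random value.
func selectNode(tunnels []string) (string, error) {
	if len(tunnels) == 0 {
		return "", errors.New("no federated nodes available yet")
	}
	return tunnels[rand.IntN(len(tunnels))], nil
}

func main() {
	if _, err := selectNode(nil); err != nil {
		fmt.Println("handled gracefully:", err)
	}
}
```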
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(swagger): finish covering gallery section
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: add section to explain how to install models with local-ai run
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Minor docs adjustments
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(swagger): cover more localai/openai endpoints
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix swagger descriptions for backend_monitor.go
Signed-off-by: Dave <dave@gray101.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* minor cleanup to go.mod - importing ourself?
Signed-off-by: Dave Lee <dave@gray101.com>
* figured out why we were importing ourself and fixed it
Signed-off-by: Dave Lee <dave@gray101.com>
* set pull_request_target
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* ci: add workflow to comment new Opened PRs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update comment-pr.yaml
eliminate a stray ' character that was terminating the shell script by slightly rewriting the prompt
Signed-off-by: Dave <dave@gray101.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* feat(llama.cpp): add embeddings
Also enable embeddings by default for llama.cpp models
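As a rough usage sketch (not part of this change set), the OpenAI-compatible embeddings endpoint can then be queried; the model name and port below are placeholders:
```bash
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "input": "A long time ago in a galaxy far, far away"}'
```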
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(Makefile): prepare llama.cpp sources only once
Otherwise we keep cloning llama.cpp for each of the variants
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do not set embeddings to false
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: add embeddings to the YAML config reference
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* models(gallery): add deepseek-v2-lite
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Update deepseek.yaml
The trailing space here is presumably part of the template string - try using a chomp keep to get yaml lint to accept it?
Signed-off-by: Dave <dave@gray101.com>
* Update deepseek.yaml
chomp didn't fix it; erase the space and see what happens.
Signed-off-by: Dave <dave@gray101.com>
* Update deepseek.yaml
Signed-off-by: Dave <dave@gray101.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Dave <dave@gray101.com>
Co-authored-by: Dave <dave@gray101.com>
* feat(install.sh): support federated install
This adds support for federation by exposing:
- FEDERATED: true/false to share the instance
- FEDERATED_SERVER: true/false to start the federated load balancer (it
forwards requests to the federation)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: update installer parameters
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Dave <dave@gray101.com>
* Git fetch specific branch instead of full tree during build
* Recursively create directories for all sources
---------
Signed-off-by: Dave <dave@gray101.com>
Signed-off-by: Dave Lee <dave@gray101.com>
Co-authored-by: Shane <dev@null.com>
Co-authored-by: Dave <dave@gray101.com>
* ci: Do not test the full matrix on PR
Hipblas and sycl take a long time to build from scratch for now. Until
we find a way to speed up image building, we are going to test these only
on master, and not for every open PR.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: do not run release workflow twice
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(models): pull models from urls
When using `run`, we can now point directly to Hugging Face models via URL, for instance:
```bash
local-ai run huggingface://TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/tinyllama-1.1b-chat-v0.3.Q2_K.gguf
```
This will pull the gguf model and place it in the models folder - of course,
this depends on the gguf file being automatically detected by our guesser
mechanism for this to be effective.
Similarly, galleries can now refer to single files in API requests.
This also changes the download code: `yaml` files are now treated in the
same way, so config files are saved with the appropriate name
(and not hashed anymore).
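As a hedged illustration (the endpoint and payload are assumed from the existing model-gallery apply API, not spelled out in this commit), pointing the apply endpoint at a single file could look like:
```bash
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"url": "huggingface://TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF/tinyllama-1.1b-chat-v0.3.Q2_K.gguf"}'
```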
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* use a sed hack to jam a missing line in place for grpc's abseil version.
Signed-off-by: Dave Lee <dave@gray101.com>
---------
Signed-off-by: Dave Lee <dave@gray101.com>
@@ -62,6 +66,8 @@ from diffusers.schedulers import (
PNDMScheduler,
UniPCMultistepScheduler,
)
# The scheduler list mapping was taken from here: https://github.com/neggles/animatediff-cli/blob/6f336f5f4b5e38e85d7f06f1744ef42d0a45f2a7/src/animatediff/schedulers.py#L39
# Credits to https://github.com/neggles
# See https://github.com/huggingface/diffusers/issues/4167 for more details on sched mapping from A1111
// ListAssistantsEndpoint is the OpenAI Assistant API endpoint to list assistants https://platform.openai.com/docs/api-reference/assistants/listAssistants
// @Summary List available assistants
// @Param limit query int false "Limit the number of assistants returned"
// @Param order query string false "Order of assistants returned"
// @Param after query string false "Return assistants created after the given ID"
// @Param before query string false "Return assistants created before the given ID"
// Because we're altering the existing assistants list we should just duplicate it for now.
@@ -230,13 +239,11 @@ func modelExists(cl *config.BackendConfigLoader, ml *model.ModelLoader, modelNam
return
}
// DeleteAssistantEndpoint is the OpenAI Assistant API endpoint to delete assistants https://platform.openai.com/docs/api-reference/assistants/deleteAssistant
<h2 class="text-center text-3xl font-semibold text-gray-100"><i class="text-yellow-200 ml-2 fa-solid fa-triangle-exclamation animate-pulse"></i> Ouch! seems you don't have any models installed!</h2>
<h2 class="text-center text-3xl font-semibold text-gray-100"><i class="text-yellow-200 ml-2 fa-solid fa-triangle-exclamation animate-pulse"></i> Ouch! seems you don't have any models installed from the LocalAI gallery!</h2>
<p class="text-center mt-4 text-xl">..install something from the <a class="text-gray-400 hover:text-white ml-1 px-3 py-2 rounded" href="/browse">🖼️ Gallery</a> or check the <a href="https://localai.io/basics/getting_started/" class="text-gray-400 hover:text-white ml-1 px-3 py-2 rounded"><i class="fa-solid fa-book"></i> Getting started documentation </a></p>
data-twe-ripple-color="light" data-twe-ripple-init="" hx-confirm="Are you sure you wish to delete the model?" hx-post="/browse/delete/model/{{.Name}}" hx-swap="outerHTML"><i class="fa-solid fa-cancel pr-2"></i>Delete</button>
<h5 class="mb-4 text-justify">LocalAI uses P2P technologies to enable distribution of work between peers. It is possible to share an instance with Federation and/or split the weights of a model across peers (only available with llama.cpp models). You can now share computational resources between your devices or your friends!</h5>
<!-- Warning box if p2p token is empty and p2p is enabled -->
<pclass="text-xl font-semibold text-white"><iclass="fa-solid fa-exclamation-triangle"></i> Warning: P2P mode is disabled or no token was specified</p>
<pclass="mb-4">You have to enable P2P mode by starting LocalAI with <code>--p2p</code>. Please restart the server with <code>--p2p</code> to generate a new token automatically that can be used to automatically discover other nodes. If you already have a token specify it with <code>export TOKEN=".."</code><ahref="https://localai.io/features/distribute/"target="_blank">
@@ -112,6 +112,8 @@ name: "" # Model name, used to identify the model in API calls.
# Precision settings for the model, reducing precision can enhance performance on some hardware.
f16: null # Whether to use 16-bit floating-point precision.
embeddings: true # Enable embeddings for the model.
# Concurrency settings for the application.
threads: null # Number of threads to use for processing.
@@ -150,7 +152,8 @@ function:
replace_function_results: [] # Placeholder to replace function call results with arbitrary strings or patterns.
replace_llm_results: [] # Replace language model results with arbitrary strings or patterns.
capture_llm_results: [] # Capture language model results as a text result, alongside the JSON, in function calls. For instance, if a model returns a block for "thinking" and a block for "response", this allows you to capture the thinking block.
return_name_in_function_response: false # Some models might prefer to use "name" rather than "function" when returning JSON data. This allows using "name" as a key in the JSON response.
function_name_key: "name"
function_arguments_key: "arguments"
# Feature gating flags to enable experimental or optional features.
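For illustration, these options might be combined under the `function:` section of a model configuration as below; the values simply restate the defaults shown above:
```yaml
function:
  replace_function_results: []
  replace_llm_results: []
  capture_llm_results: []
  return_name_in_function_response: false
  function_name_key: "name"
  function_arguments_key: "arguments"
```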
@@ -29,5 +29,9 @@ List of the Environment Variables:
| **THREADS** | Number of processor threads the application should use. Defaults to the number of logical cores minus one. |
| **VERSION** | Specifies the version of LocalAI to install. Defaults to the latest available version. |
| **MODELS_PATH** | Directory path where LocalAI models are stored (default is /usr/share/local-ai/models). |
| **P2P_TOKEN** | Token to use for the federation or for starting workers; see the [documentation]({{%relref "docs/features/distributed_inferencing" %}}) |
| **WORKER** | Set to "true" to make the instance a worker (a p2p token is required; see the [documentation]({{%relref "docs/features/distributed_inferencing" %}})) |
| **FEDERATED** | Set to "true" to share the instance with the federation (a p2p token is required; see the [documentation]({{%relref "docs/features/distributed_inferencing" %}})) |
| **FEDERATED_SERVER** | Set to "true" to run the instance as a federation server which forwards requests to the federation (a p2p token is required; see the [documentation]({{%relref "docs/features/distributed_inferencing" %}})) |
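As an illustrative sketch (the installer URL is taken from the LocalAI docs; combining the variables this way is an assumption):
```bash
# Share this instance with the federation:
curl https://localai.io/install.sh | P2P_TOKEN="<token>" FEDERATED=true sh

# Run this instance as the federation load balancer instead:
curl https://localai.io/install.sh | P2P_TOKEN="<token>" FEDERATED_SERVER=true sh
```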
We are looking into improving the installer, and as this is a first iteration any feedback is welcome! Open up an [issue](https://github.com/mudler/LocalAI/issues/new/choose) if something doesn't work for you!
> _Do you already have a model file? Skip to [Run models manually]({{%relref "docs/getting-started/models" %}})_.
To load models into LocalAI, you can either [use models manually]({{%relref "docs/getting-started/models" %}}) or configure LocalAI to pull the models from external sources, like Hugging Face, and configure the model.
To do that, you can point LocalAI to a URL of a YAML configuration file - however, LocalAI also has some popular model configurations embedded in the binary. Below you can find a list of the model configurations that LocalAI has pre-built; see [Model customization]({{%relref "docs/getting-started/customize-model" %}}) on how to configure models from URLs.
@@ -16,6 +16,10 @@ Here are answers to some of the most common questions.
Most gguf-based models should work, but newer models may require additions to the API. If a model doesn't work, please feel free to open an issue. However, be cautious about downloading models from the internet directly onto your machine, as there may be security vulnerabilities in llama.cpp or ggml that could be maliciously exploited. Some models can be found on Hugging Face: https://huggingface.co/models?search=gguf, and models from gpt4all are compatible too: https://github.com/nomic-ai/gpt4all.
### Benchmarking LocalAI and llama.cpp shows different results!
LocalAI applies a set of defaults when loading models with the llama.cpp backend; one of these is mirostat sampling - while it achieves better results, it slows down inference. You can disable this by setting `mirostat: 0` in the model config file. See also the advanced section ({{%relref "docs/advanced/advanced-usage" %}}) for more information and [this issue](https://github.com/mudler/LocalAI/issues/2780).
### What's the difference with Serge, or XXX?
LocalAI is a multi-model solution that doesn't focus on a specific model type (e.g., llama.cpp or alpaca.cpp); it handles all of these internally for faster inference and is easy to set up locally and deploy to Kubernetes.
This functionality enables LocalAI to distribute inference requests across multiple worker nodes, improving efficiency and performance. Nodes are automatically discovered and connect via p2p using a shared token, which ensures that communication between the nodes of the network is secure and private.
LocalAI supports two modes of distributed inferencing via p2p:
- **Federated Mode**: Requests are shared between the cluster and routed to a single worker node in the network based on the load balancer's decision.
- **Worker Mode** (aka "model sharding" or "splitting weights"): Requests are processed by all the workers, each of which contributes to the final inference result (by sharing the model weights).
## Usage
Starting LocalAI with `--p2p` generates a shared token for connecting multiple instances, and that's all you need to create AI clusters, eliminating the need for intricate network setups.
Simply navigate to the "Swarm" section in the WebUI and follow the on-screen instructions.
For fully shared instances, initiate LocalAI with `--p2p --federated` and adhere to the Swarm section's guidance. This feature, while still experimental, offers a tech preview quality experience.
### Federated mode
Federated mode allows you to launch multiple LocalAI instances and connect them together in a federated network. This mode is useful when you want to distribute the load of inference across multiple nodes but keep a single point of entry for the API. In the Swarm section of the WebUI, you can see the instructions to connect multiple instances together.
This will generate a token that you can use to connect other LocalAI instances to the network or others can use to join the network. If you already have a token, you can specify it using the `TOKEN` environment variable.
To start a load-balanced server that routes requests to the network, run it with the `TOKEN` set:
```bash
local-ai federated
```
To see all the available options, run `local-ai federated --help`.
The instructions are displayed in the "Swarm" section of the WebUI, guiding you through the process of connecting multiple instances.
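Putting this together, a minimal sketch (assuming the same shared token is exported on every machine) could look like:
```bash
# Single API entry point that load-balances requests across the federation:
TOKEN="<shared token>" local-ai federated

# Each instance that serves requests for the federation:
TOKEN="<shared token>" local-ai run --p2p --federated
```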
### Workers mode
{{% alert note %}}
This feature is available exclusively with llama-cpp compatible models.
This feature was introduced in [LocalAI pull request #2324](https://github.com/mudler/LocalAI/pull/2324) and is based on the upstream work in [llama.cpp pull request #6829](https://github.com/ggerganov/llama.cpp/pull/6829).
{{% /alert %}}
This functionality enables LocalAI to distribute inference requests across multiple worker nodes, improving efficiency and performance.
To connect multiple workers to a single LocalAI instance, first start a server in p2p mode:
```bash
local-ai run --p2p
```
### Starting Workers
Then navigate to the "Swarm" section of the WebUI to see the instructions for connecting multiple workers to the network.
Alternatively, you can build the RPC server following the llama.cpp [README](https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md), which is compatible with LocalAI.
### Starting LocalAI
To start the LocalAI server, which handles API requests, specify the worker addresses using the `LLAMACPP_GRPC_SERVERS` environment variable:
```bash
LLAMACPP_GRPC_SERVERS="address1:port,address2:port" local-ai run
```
The workload on the LocalAI server will then be distributed across the specified nodes.
## Peer-to-Peer Networking
Alternatively, you can build the RPC workers/server following the llama.cpp [README](https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md), which is compatible with LocalAI.
Workers can also connect to each other in a peer-to-peer network, distributing the workload in a decentralized manner.
A shared token between the server and the workers is required for communication within the peer-to-peer network. This feature supports both local network (using mDNS discovery) and DHT for communication across different networks.
The token is automatically generated when starting the server with the `--p2p` flag. Workers can be started with the token using `local-ai worker p2p-llama-cpp-rpc` and specifying the token via the environment variable `TOKEN` or with the `--token` argument.
A network is established between the server and workers using DHT and mDNS discovery protocols. The llama.cpp RPC server is automatically started and exposed to the peer-to-peer network, allowing the API server to connect.
When the HTTP server starts, it discovers workers in the network and creates port forwards to the local service. Llama.cpp is configured to use these services. For more details on the implementation, refer to [LocalAI pull request #2343](https://github.com/mudler/LocalAI/pull/2343).
### Usage
Use the WebUI to guide you in the process of starting new workers. This example shows the manual steps to highlight the process.
1. Start the server with `--p2p`:
```bash
./local-ai run --p2p
# 1:02AM INF loading environment variables from file envFile=.env
# 1:02AM INF Setting logging to info
# 1:02AM INF P2P mode enabled
# 1:02AM INF No token provided, generating one
# 1:02AM INF Generated Token:
# XXXXXXXXXXX
# 1:02AM INF Press a button to proceed
# Get the token in the Swarm section of the WebUI
```
Copy the token from the WebUI or via API call (e.g., `curl http://localhost:8000/p2p/token`) and save it for later use.
To reuse the same token later, restart the server with `--p2ptoken` or `P2P_TOKEN`.
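The next (elided) step starts the workers on the other machines; a hedged sketch of the manual command, based on the `local-ai worker p2p-llama-cpp-rpc` invocation described above (the exact flags may differ):
```bash
# On each worker machine, reuse the token generated by the server in step 1.
TOKEN="<token from step 1>" local-ai worker p2p-llama-cpp-rpc
```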
@@ -93,11 +120,7 @@ The server logs should indicate that new workers are being discovered.
3. Start inference as usual on the server initiated in step 1.
@@ -109,3 +132,20 @@ There are options that can be tweaked or parameters that can be set using enviro
| **LOCALAI_P2P_DISABLE_DHT** | Set to "true" to disable DHT and enable p2p layer to be local only (mDNS) |
| **LOCALAI_P2P_DISABLE_LIMITS** | Set to "true" to disable connection limits and resources management |
| **LOCALAI_P2P_TOKEN** | Set the token for the p2p network |
## Architecture
LocalAI uses https://github.com/libp2p/go-libp2p under the hood, the same project powering IPFS. Unlike other frameworks, LocalAI uses peer-to-peer networking without a single master server; instead, it relies on sub/gossip and ledger functionalities to achieve consensus across different peers.
[EdgeVPN](https://github.com/mudler/edgevpn) is used as a library to establish the network and expose the ledger functionality under a shared token, to ease automatic discovery and keep peer-to-peer networks separate and private.
When running in worker mode, the weights are split proportionally to the available memory; in federated mode, requests are distributed across the nodes, and each node has to load the model fully.
## Notes
- If running in p2p mode with container images, make sure you start the container with `--net host` or `network_mode: host` in the docker-compose file.
- Only a single model is supported currently.
- Ensure the server detects new workers before starting inference. Currently, additional workers cannot be added once inference has begun.
- For more details on the implementation, refer to [LocalAI pull request #2343](https://github.com/mudler/LocalAI/pull/2343)
After you have Go installed and working, you can install the required binaries for compiling the Go protobuf components via the following commands:
```bash
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
```
- Browse the Model Gallery from the Web Interface and install models with a couple of clicks. For more details, refer to the [Gallery Documentation]({{% relref "docs/features/model-gallery" %}}).
- Specify a model from the LocalAI gallery during startup, e.g., `local-ai run <model_gallery_name>`.
- Use a URI to specify a model file (e.g., `huggingface://...`, `oci://`, or `ollama://`) when starting LocalAI, e.g., `local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf`.
- Specify a URL to a model configuration file when starting LocalAI, e.g., `local-ai run https://gist.githubusercontent.com/.../phi-2.yaml`.
- Manually install the models by copying the files into the models directory (`--models`).
# Run Models Manually
## Run and Install Models via the Gallery
To run models available in the LocalAI gallery, you can use the WebUI or specify the model name when starting LocalAI. Models can be found in the gallery via the Web interface, the [model gallery](https://models.localai.io), or the CLI with: `local-ai models list`.
To install a model from the gallery, use the model name as the URI. For example, to run LocalAI with the Hermes model, execute:
```bash
local-ai run hermes-2-theta-llama-3-8b
```
To install only the model, use:
```bash
local-ai models install hermes-2-theta-llama-3-8b
```
Note: The galleries available in LocalAI can be customized to point to a different URL or a local directory. For more information on how to set up your own gallery, see the [Gallery Documentation]({{% relref "docs/features/model-gallery" %}}).
## Run Models via URI
To run models via URI, specify a URI to a model file or a configuration file when starting LocalAI. Valid syntax includes:
- From OCIs: `oci://container_image:tag`, `ollama://model_id:tag`
- From configuration files: `https://gist.githubusercontent.com/.../phi-2.yaml`
Configuration files can be used to customize the model defaults and settings. For advanced configurations, refer to the [Customize Models section]({{% relref "docs/getting-started/customize-model" %}}).
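As a hedged sketch of such a configuration file: the `parameters`/`context_size` keys and all values below are assumptions for illustration, while `name`, `f16`, `embeddings`, and `threads` mirror the YAML reference quoted earlier on this page:
```yaml
name: phi-2                # Model name used in API calls
parameters:
  model: phi-2.Q8_0.gguf   # Model file inside the models directory (assumed layout)
context_size: 2048         # Example value
f16: true
embeddings: false
threads: 4
```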
### Examples
```bash
# Start LocalAI with the phi-2 model
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# Install and run a model from the Ollama OCI registry
local-ai run ollama://gemma:2b
# Run a model from a configuration file
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# Install and run a model from a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
```
## Run Models Manually
Follow these steps to manually run models using LocalAI:
1. **Prepare Your Model and Configuration Files**:
Ensure you have a model file and, if necessary, a configuration YAML file. Customize model defaults and settings with a configuration file. For advanced configurations, refer to the [Advanced Documentation]({{% relref "docs/advanced" %}}).
2. **GPU Acceleration**:
For instructions on GPU acceleration, visit the [GPU Acceleration]({{% relref "docs/features/gpu-acceleration" %}}) page.
3. **Run LocalAI**:
Choose one of the following methods to run LocalAI:
@@ -160,5 +208,3 @@ For instructions on building LocalAI from source, see the [Build Section]({{% re
{{< /tabs >}}
For more model configurations, visit the [Examples Section](https://github.com/mudler/LocalAI/tree/master/examples/configurations).
@@ -38,13 +38,13 @@ For detailed instructions, see [Using container images]({{% relref "docs/getting
## Running LocalAI with All-in-One (AIO) Images
> _Already have a model file? Skip to [Run models manually]({{% relref "docs/getting-started/models" %}})_.
LocalAI's All-in-One (AIO) images are pre-configured with a set of models and backends to fully leverage almost all the features of LocalAI. If pre-configured models are not required, you can use the standard [images]({{% relref "docs/getting-started/container-images" %}}).
These images are available for both CPU and GPU environments. AIO images are designed for ease of use and require no additional configuration.
It is recommended to use AIO images if you prefer not to configure the models manually or via the web interface. For running specific models, refer to the [manual method]({{% relref "docs/getting-started/models" %}}).
The AIO images come pre-configured with the following features:
- Text to Speech (TTS)
@@ -66,5 +66,5 @@ Explore additional resources and community contributions:
- [Run from Container images]({{% relref "docs/getting-started/container-images" %}})
- [Examples to try from the CLI]({{% relref "docs/getting-started/try-it-out" %}})
- [Build LocalAI and the container image]({{% relref "docs/getting-started/build" %}})
@@ -17,10 +17,10 @@ After installation, install new models by navigating the model gallery, or by us
To install models with the WebUI, see the [Models section]({{%relref "docs/features/model-gallery" %}}).
With the CLI you can list the models with `local-ai models list` and install them with `local-ai models install <model-name>`.
You can also [run models manually]({{%relref "docs/getting-started/models" %}}) by copying files into the `models` directory.
{{% /alert %}}
You can test out the API endpoints using `curl`; a few examples are listed below. The models we are referring to here (`gpt-4`, `gpt-4-vision-preview`, `tts-1`, `whisper-1`) are the default models that come with the AIO images - you can also use any other model you have installed.
### Text Generation
@@ -193,4 +193,4 @@ Don't use the model file as `model` in the request unless you want to handle the
Use the model names as you would with OpenAI, as in the examples below. For instance, `gpt-4-vision-preview` or `gpt-4`.
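For example, a chat completion request against the default `gpt-4` alias might look like the following (host and port assume a default local setup):
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "How are you doing?"}]
  }'
```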