+++ disableToc = false title = "Sound Classification" weight = 18 url = "/features/audio-classification/" +++ Sound-event classification (audio tagging) answers the question **"what am I hearing?"** - given an audio clip, it returns a list of scored [AudioSet](https://research.google.com/audioset/) labels (e.g. *Baby cry, infant cry*, *Glass breaking*, *Dog bark*, *Alarm*). LocalAI exposes this through the `/v1/audio/classification` endpoint, modelled after `/v1/audio/transcriptions`. The reference backend is **[ced.cpp](https://github.com/mudler/ced.cpp)** (CED, a 527-class AudioSet tagger), a small ViT over a log-mel spectrogram ported to ggml with full PyTorch parity. Apache-2.0 weights are redistributable as GGUF. Because classification is exposed as a regular OpenAI-style endpoint, any HTTP client works - there is no Python dependency on the consumer side. ## Endpoint ``` POST /v1/audio/classification Content-Type: multipart/form-data ``` | Field | Type | Description | |-------|------|-------------| | `file` | file (required) | audio file in any format `ffmpeg` accepts | | `model` | string (required) | name of the sound-classification-capable model (e.g. `ced-base`) | | `top_k` | int | number of top tags to return (0 = backend default) | | `threshold` | float | drop tags scoring below this value | ### Response ```json { "model": "ced-base", "detections": [ {"index": 23, "label": "Baby cry, infant cry", "score": 0.87}, {"index": 22, "label": "Crying, sobbing", "score": 0.41} ] } ``` Detections are returned in score-descending order. Scores are per-class probabilities (multi-label, independent), so they do not sum to 1. ## Example ```bash curl http://localhost:8080/v1/audio/classification \ -H "Content-Type: multipart/form-data" \ -F file="@/path/to/clip.wav" \ -F model="ced-base" \ -F top_k=10 ``` ## See also - [Audio to Text]({{% relref "audio-to-text" %}}) - speech transcription - [Speaker Diarization]({{% relref "audio-diarization" %}}) - who spoke when