
Parser Service

Source: src/services/parser/ (C# microservice), src/lib/server/utils/arr/parser/ (TypeScript client)

The parser is an optional C# microservice that extracts structured metadata from release titles: resolution, source, languages, release group, episode info, and more. It runs as a separate process because Radarr and Sonarr evaluate custom format regexes with the .NET regex engine. Using that same engine ensures custom format conditions evaluate identically to how the Arr apps themselves would evaluate them.

When the parser is offline, features that depend on it degrade gracefully: testing pages show warnings, evaluation endpoints skip CF matching, and regex validation falls back to permissive behavior.

Service

The microservice is a .NET 8 minimal API (Parser.csproj). It is stateless and designed to be called from Profilarr's backend for each release title that needs parsing or pattern matching.

Endpoints

| Method | Path | Input | Output |
|---|---|---|---|
| POST | /parse | { title, type } | Full ParseResponse (all fields) |
| POST | /match | { text, patterns[] } | { results: { pattern: bool } } |
| POST | /match/batch | { texts[], patterns[] } | { results: { text: { pattern: bool } } } |
| POST | /validate/regex | { pattern } | { valid: bool, error?: string } |
| GET | /health | none | { status: "healthy", version } |

The /match/batch endpoint pre-compiles all patterns once with RegexOptions.Compiled, then evaluates them against each text using Parallel.ForEach for throughput. All regex evaluation uses a 100ms timeout to prevent ReDoS.
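The nested result shape of /match/batch is easiest to see in code. The sketch below mirrors the "compile once, evaluate per text" structure in TypeScript. Treat it as an illustration of the data flow only. JavaScript's RegExp is not the .NET engine and has no match timeout, which is exactly why the real service exists. The function name and the case-insensitive flag are assumptions.

```typescript
// Request/response shapes taken from the endpoint table.
interface MatchBatchRequest {
  texts: string[];
  patterns: string[];
}

interface MatchBatchResponse {
  // results[text][pattern] is true when the pattern matches the text
  results: Record<string, Record<string, boolean>>;
}

// Hypothetical illustration: compiles every pattern once, then evaluates each
// text against the precompiled set. Uses the JavaScript engine (no timeout),
// so results can differ from the .NET service. The "i" flag is an assumption.
function matchBatchSketch(req: MatchBatchRequest): MatchBatchResponse {
  const compiled = req.patterns.map((p) => [p, new RegExp(p, "i")] as const);
  const results: Record<string, Record<string, boolean>> = {};
  for (const text of req.texts) {
    const row: Record<string, boolean> = {};
    for (const [source, re] of compiled) {
      row[source] = re.test(text); // non-global regex, so test() is stateless
    }
    results[text] = row;
  }
  return { results };
}
```

Compiling outside the text loop is the point. With T texts and P patterns, the sketch builds P regex objects rather than T × P.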

Parsers

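Five independent parsers back the /parse endpoint. They do not depend on each other's output; each extracts a different facet from the raw title string. QualityParser, LanguageParser, and ReleaseGroupParser run on every call, while TitleParser runs only for movies and EpisodeParser only for series (see Parse Pipeline).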

| Parser | Extracts |
|---|---|
| QualityParser | Source (Bluray, WebDL, HDTV, etc.), resolution, modifier (Remux, Screener, etc.), revision (version, repack flag) |
| TitleParser | Movie title(s), year, edition, IMDB/TMDB IDs, hardcoded subs, release hash |
| EpisodeParser | Series title, season/episode numbers, air dates, release type (single, multi, season pack) |
| LanguageParser | Up to 59 languages via word matches, ISO codes, regex patterns, and dubbed markers |
| ReleaseGroupParser | Scene-style (-GROUP) and anime-style ([SubGroup]) release group names |

Each parser uses pre-compiled regexes tried in priority order. The first match wins, except in ReleaseGroupParser, which takes the last match. If a parser throws, it returns null and the endpoint uses defaults (empty lists, zeros, nulls).
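The three rules in the paragraph above (priority-ordered first match, last match for release groups, and exception-to-null) can be sketched as small helpers. This is a TypeScript illustration, not the C# code. The helper names are hypothetical. Reading "last match" as "last occurrence in the title" is also an assumption, based on the scene convention of putting the group tag at the end.

```typescript
// Tries patterns in priority order and returns the first match found.
// Patterns should be non-global so exec() carries no state between calls.
function firstMatch(title: string, patterns: RegExp[]): RegExpExecArray | null {
  for (const re of patterns) {
    const m = re.exec(title);
    if (m) return m;
  }
  return null;
}

// Returns the last occurrence of a pattern in the title (assumed reading of
// "takes the last match"): a trailing "-GROUP" beats earlier hyphenated words.
function lastMatch(title: string, pattern: RegExp): RegExpExecArray | null {
  const re = new RegExp(pattern.source, pattern.flags.replace("g", "") + "g");
  let last: RegExpExecArray | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(title)) !== null) {
    last = m;
    if (m[0] === "") re.lastIndex++; // step past empty matches to avoid looping
  }
  return last;
}

// Converts a thrown error into null so the caller can substitute defaults.
function safeParse<T>(parse: () => T): T | null {
  try {
    return parse();
  } catch {
    return null;
  }
}
```

For `Movie.Name-Extended.2020.1080p-GRP`, `firstMatch(title, [/-(\w+)/])` captures `Extended`, while `lastMatch(title, /-(\w+)/)` captures `GRP`. That is why first-match is the wrong rule for release groups.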

QualityParser handles edge cases like anime patterns (bd720, bd1080), MPEG2 detection for RawHD, and Remux fallback logic (Remux without explicit source assumes Bluray).

TitleParser tries 8+ movie regex patterns in sequence, covering anime with subgroups, German/French tracker formats, special editions, and PassThePopcorn-style releases. It guards against obfuscated titles (MD5 hashes, other suspicious patterns) and normalizes input by removing file extensions and torrent site suffixes.

EpisodeParser handles 30+ regex patterns covering standard (S01E05), multi-episode (S01E05-06), absolute numbering (anime #123), daily shows (YYYY-MM-DD with ambiguous date handling), mini-series (Part 01), and season packs (Season 1-2).
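The standard and multi-episode forms are the simplest cases to illustrate. The sketch below uses one deliberately simplified JavaScript regex and is not one of the service's 30+ .NET patterns. Treating `S01E05-07` as an inclusive range is an assumption of the sketch.

```typescript
// Simplified stand-in covering S01E05, S01E05-06, and S01E05E06 only.
const STANDARD_EPISODE = /S(\d{1,2})E(\d{2,3})(?:-?E?(\d{2,3}))?/i;

interface EpisodeInfo {
  season: number;
  episodes: number[];
}

function parseStandardEpisode(title: string): EpisodeInfo | null {
  const m = STANDARD_EPISODE.exec(title);
  if (!m) return null;
  const season = Number(m[1]);
  const first = Number(m[2]);
  // A second number is treated as the inclusive end of a range (assumption).
  const last = m[3] !== undefined ? Number(m[3]) : first;
  if (last < first) return null; // reject reversed ranges rather than guess
  const episodes: number[] = [];
  for (let e = first; e <= last; e++) episodes.push(e);
  return { season, episodes };
}
```

Each additional form in the list above (absolute, daily, mini-series, season pack) would be another pattern tried in priority order. That is how the real parser reaches 30+ patterns.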

LanguageParser detects languages through four methods in order: full word matches, case-sensitive codes (with safeguards against false positives for codes like ES), regex patterns for abbreviations and dubbed markers, and special logic for German DL/ML (dual/multi language) tags.

Parse Pipeline

```mermaid
flowchart TD
    INPUT["Input: title + type (movie/series)"]
    QP[QualityParser]
    LP[LanguageParser]
    RGP[ReleaseGroupParser]
    TP["TitleParser (movies)"]
    EP["EpisodeParser (series)"]
    MERGE[Merge into ParseResponse]

    INPUT --> QP & LP & RGP
    INPUT -->|movie| TP
    INPUT -->|series| EP
    QP & LP & RGP & TP & EP --> MERGE
```

The merged ParseResponse includes: title, type, source, resolution, modifier, revision, languages, release group, and either movie-specific fields (titles, year, edition, IDs) or series-specific fields (series title, season, episodes, air date, release type).
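The merge step can be modeled as a discriminated union on `type`. In the TypeScript sketch below, the field names paraphrase the list above and are not the service's actual JSON keys. The `Parsers` interface is a hypothetical stand-in for the five C# parsers. Null results fall back to the defaults mentioned earlier (empty lists, zeros, nulls).

```typescript
// Hypothetical field names paraphrasing the ParseResponse description.
interface CommonFields {
  source: string | null;
  resolution: number;
  languages: string[];
  releaseGroup: string | null;
}
interface MovieFields { titles: string[]; year: number }
interface SeriesFields { seriesTitle: string | null; season: number; episodes: number[] }

type ParseResponse =
  | ({ type: "movie"; title: string } & CommonFields & MovieFields)
  | ({ type: "series"; title: string } & CommonFields & SeriesFields);

// Stand-ins for the five parsers; each returns null on failure.
interface Parsers {
  quality(title: string): { source: string; resolution: number } | null;
  language(title: string): string[] | null;
  releaseGroup(title: string): string | null;
  movie(title: string): MovieFields | null;
  series(title: string): SeriesFields | null;
}

function parseTitle(title: string, type: "movie" | "series", p: Parsers): ParseResponse {
  // Shared parsers run for both types; missing results become defaults.
  const q = p.quality(title);
  const common: CommonFields = {
    source: q?.source ?? null,
    resolution: q?.resolution ?? 0,
    languages: p.language(title) ?? [],
    releaseGroup: p.releaseGroup(title),
  };
  // Only the type-specific parser for this title's type is called.
  if (type === "movie") {
    const m = p.movie(title) ?? { titles: [], year: 0 };
    return { type, title, ...common, ...m };
  }
  const s = p.series(title) ?? { seriesTitle: null, season: 0, episodes: [] };
  return { type, title, ...common, ...s };
}
```

Because the union is discriminated on `type`, a caller that checks `response.type === "movie"` gets the movie-only fields type-checked without casts.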

Client

Source: src/lib/server/utils/arr/parser/client.ts

The TypeScript client is a singleton built on BaseHttpClient. It is configured with connection pooling, a 30-second request timeout, and 2 retries with a 500ms delay between attempts. It wraps every parser endpoint and adds persistent caching for parse and match results.
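"2 retries with 500ms delay" means up to three attempts in total. The generic wrapper below is a hypothetical stand-in, because BaseHttpClient's actual implementation is not shown in this document. It illustrates those semantics.

```typescript
// Hypothetical retry wrapper: one initial attempt plus `retries` more,
// waiting `delayMs` between attempts. Rethrows the last error if all fail.
async function withRetries<T>(
  attempt: () => Promise<T>,
  retries = 2,
  delayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i < retries) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

In a real client, each attempt would also carry the 30-second timeout, for example by passing `AbortSignal.timeout(30_000)` as the `signal` option to `fetch`.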

Operations

| Function | Cached | Batch | Use case |
|---|---|---|---|
| parse() | no | no | Direct parse, CF testing page |
| parseWithCache() | yes | no | Single cached parse |
| parseWithCacheBatch() | yes | yes | Entity testing (many releases) |
| matchPatterns() | no | no | CF testing (one text, N patterns) |
| matchPatternsBatch() | yes | yes | Entity testing (many texts) |
| validateRegex() | no | no | Regex editor validation |
| isParserHealthy() | in-memory | no | Layout-level health check |
| getParserVersion() | in-memory | no | Cache invalidation key |

Batch operations separate cached from uncached items, fetch only the uncached ones in parallel via Promise.all(), store new results, and return the combined map.
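The cached-batch pattern described above generalizes to a small helper. In this sketch a `Map` stands in for the database tables and `fetchOne` stands in for a parser call. Both are assumptions, as is the choice to leave null (offline or failed) results uncached.

```typescript
// Hypothetical cached-batch helper: split keys into hits and misses, fetch
// misses in parallel, persist non-null results, and return the merged map.
async function batchWithCache<V>(
  keys: string[],
  cache: Map<string, V>,
  fetchOne: (key: string) => Promise<V | null>,
): Promise<Map<string, V>> {
  const result = new Map<string, V>();
  const misses: string[] = [];
  for (const key of new Set(keys)) { // dedupe repeated keys
    const hit = cache.get(key);
    if (hit !== undefined) result.set(key, hit);
    else misses.push(key);
  }
  // Only uncached keys reach the parser, all requested concurrently.
  const fetched = await Promise.all(
    misses.map(async (key) => [key, await fetchOne(key)] as const),
  );
  for (const [key, value] of fetched) {
    if (value === null) continue; // parser offline or failed: don't cache
    cache.set(key, value);
    result.set(key, value);
  }
  return result;
}
```

Skipping nulls keeps a temporary outage from being cached as a permanent miss. The next call retries those keys.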

Caching

Two database tables provide persistent caching across server restarts:

| Table | Key | Invalidation |
|---|---|---|
| parsed_release_cache | {title}:{type} | Parser version mismatch |
| pattern_match_cache | {title} + patterns hash | Pattern set changes (SHA-256) |

When the parser version changes (detected via getParserVersion()), all entries from the old version are purged. Pattern match entries are keyed by a SHA-256 hash of the sorted pattern list, so any change to the pattern set invalidates all affected cache rows.
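The two key schemes can be sketched with Node's built-in `crypto` module. The `{title}:{type}` format comes from the table above. Serializing the sorted list with `JSON.stringify` before hashing is an assumption, since the exact serialization is not documented. It is chosen here because it cannot confuse `["a,b"]` with `["a", "b"]`.

```typescript
import { createHash } from "node:crypto";

// parsed_release_cache key: {title}:{type}
function parseCacheKey(title: string, type: "movie" | "series"): string {
  return `${title}:${type}`;
}

// SHA-256 over the sorted pattern list. Sorting makes the hash independent of
// the order the patterns arrive in; the JSON serialization is an assumption.
function patternsHash(patterns: string[]): string {
  const sorted = [...patterns].sort();
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}
```

Adding, removing, or editing any pattern changes the hash, so lookups simply stop finding the old rows. Reordering the same patterns does not change it.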

Two in-memory caches avoid repeated network calls:

| Cache | TTL | Purpose |
|---|---|---|
| Health | 30 seconds | Prevent blocking on every page load |
| Version | Session lifetime | Single fetch per app start |

Health Checks

isParserHealthy() uses a direct fetch() call with a 3-second timeout (shorter than the client's default 30s) to avoid stalling page loads. The result is cached in memory for 30 seconds. The global layout (+layout.server.ts) calls this on every load and passes parserAvailable to the frontend.
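The sketch below shows a health check in this style: a 3-second `AbortSignal.timeout` on a direct `fetch`, plus a 30-second in-memory TTL. Only the /health path, its `status: "healthy"` response, and the two durations come from this page. The function names, the base URL parameter, and the injectable clock (added for testability) are hypothetical.

```typescript
const HEALTH_TTL_MS = 30_000;
const HEALTH_TIMEOUT_MS = 3_000;

let healthCache: { healthy: boolean; checkedAt: number } | null = null;

// One probe of GET /health, aborted after 3 seconds.
async function probeHealth(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/health`, {
      signal: AbortSignal.timeout(HEALTH_TIMEOUT_MS),
    });
    if (!res.ok) return false;
    const body = (await res.json()) as { status?: string };
    return body.status === "healthy";
  } catch {
    return false; // timeout, refused connection, or unparseable body
  }
}

// Returns the cached result while it is younger than the TTL, else re-probes.
async function isHealthyCached(
  probe: () => Promise<boolean>,
  now: () => number = Date.now,
): Promise<boolean> {
  const t = now();
  if (healthCache !== null && t - healthCache.checkedAt < HEALTH_TTL_MS) {
    return healthCache.healthy;
  }
  const healthy = await probe();
  healthCache = { healthy, checkedAt: t };
  return healthy;
}
```

A layout load would call `isHealthyCached(() => probeHealth(baseUrl))`. One simplification: concurrent callers that arrive while a probe is in flight each start their own probe. Caching the in-flight promise instead of the boolean would collapse those into one request.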

Integration

Custom Format Testing

Route: src/routes/custom-formats/[databaseId]/[id]/testing/+page.server.ts

When a user views a custom format's test cases, the page server:

  1. Checks parser health (early exit if unavailable).
  2. For each test case: calls parse() to get the full parsed result.
  3. Calls matchPatterns() to evaluate the format's pattern-based conditions against the parsed title, edition, and release group.
  4. Runs evaluateCustomFormat() with the parsed data and match maps.
  5. Returns per-condition results and an overall match boolean.

This path uses single (non-batch) calls because it evaluates one CF at a time with a small number of test cases.

Entity Testing

Route: src/routes/api/v1/entity-testing/evaluate/+server.ts

Entity testing evaluates quality profile scoring against many releases across all custom formats. The evaluation flow uses batch operations for performance:

```mermaid
flowchart TD
    REQ[Releases from UI] --> PARSE["parseWithCacheBatch()"]
    PARSE --> EXTRACT["extractPatternsByType() from all CFs"]
    EXTRACT --> MATCH["matchPatternsBatch() per type"]
    MATCH --> EVAL["evaluateCustomFormat() per release per CF"]
    EVAL --> RESP["Return: parsed info + CF match results"]
```

The evaluator (customFormats/evaluator.ts) groups patterns by condition type (title, edition, release group) before batch matching, ensuring each unique text is only matched once against each pattern set.
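The grouping step amounts to building, per condition type, one deduplicated pattern list and one deduplicated text list. The condition shape and helper names below are hypothetical. They are not the actual signatures of extractPatternsByType() or the evaluator's types.

```typescript
// Hypothetical condition shape; the evaluator's real types are not shown here.
type ConditionType = "title" | "edition" | "releaseGroup";
interface PatternCondition {
  type: ConditionType;
  pattern: string;
}

// Collects the unique patterns per condition type across all custom formats,
// so each type needs only one batch request.
function groupPatternsByType(formats: PatternCondition[][]): Map<ConditionType, string[]> {
  const groups = new Map<ConditionType, Set<string>>();
  for (const conditions of formats) {
    for (const c of conditions) {
      let set = groups.get(c.type);
      if (!set) {
        set = new Set<string>();
        groups.set(c.type, set);
      }
      set.add(c.pattern);
    }
  }
  const out = new Map<ConditionType, string[]>();
  groups.forEach((set, type) => out.set(type, Array.from(set)));
  return out;
}

// Deduplicates the texts for one condition type (e.g. every parsed release
// group), dropping nulls, so each unique text appears once in the batch.
function uniqueTexts(values: Array<string | null>): string[] {
  return Array.from(new Set(values.filter((v): v is string => v !== null)));
}
```

The batch call per type is then `uniqueTexts(...)` × `groupPatternsByType(...).get(type)`. A pattern shared by many formats is sent once, and a release group shared by many releases is matched once.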

Regex Validation

Routes: regex editor page, regex creation page

validateRegex() sends the pattern to the C# service for compilation against the .NET regex engine. Because the check runs on .NET, it flags patterns that JavaScript accepts but .NET rejects, and it accepts .NET-only syntax that a JavaScript check would wrongly reject. It returns { valid, error? }, or null if the parser is offline, in which case the save is not blocked.
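A concrete case where the engines disagree is the inline `(?i)` option. .NET compiles `(?i)remux`, but JavaScript's `new RegExp("(?i)remux")` throws a SyntaxError, so validating in the browser would wrongly reject it. The save-side decision can be sketched as below. This page only states that null does not block saves. The assumption that an explicit `valid: false` does block saves, along with the function name and messages, is hypothetical.

```typescript
// Result shape from the endpoint table; null means the parser was unreachable.
type RegexValidation = { valid: boolean; error?: string } | null;

// Hypothetical save-time decision. Offline: allow with a warning (documented).
// Explicitly invalid: block (assumption, not stated on this page).
function canSavePattern(result: RegexValidation): { allowed: boolean; warning?: string } {
  if (result === null) {
    return { allowed: true, warning: "Parser offline: pattern not validated against .NET" };
  }
  if (!result.valid) {
    return { allowed: false, warning: result.error ?? "Invalid regex" };
  }
  return { allowed: true };
}
```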

Offline Behavior

| Feature | Behavior when parser is offline |
|---|---|
| CF testing page | Returns unknown results, shows parser warning |
| Entity testing API | Returns parsed info only, skips CF evaluation |
| Regex validation | Returns null, does not block saves |
| Health check | Returns false, UI shows warning banner |
| Parse/match calls | Return null, callers handle gracefully |

Configuration

The service reads appsettings.json for logging and version settings:

```json
{
	"ParserLogging": {
		"Enabled": true,
		"ConsoleLogging": true,
		"FileLogging": false,
		"MinLevel": "INFO"
	},
	"Parser": { "Version": "1.0.0" }
}
```

Environment variable overrides: PARSER_LOG_ENABLED, PARSER_LOG_FILE, PARSER_LOG_CONSOLE, PARSER_LOG_LEVEL, PARSER_LOGS_DIR (default /tmp/parser-logs). The service detects Docker containers (via /.dockerenv or /proc/1/cgroup) and logs container-specific environment variables at startup (PUID, PGID, UMASK, TZ).
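The precedence (environment variable over appsettings.json value) can be sketched as follows. The service itself is C#, so this is only a TypeScript model of the rule. Mapping each variable to a setting is inferred from the names. Parsing booleans as case-insensitive "true"/"false" and ignoring other values are assumptions. `LogsDir` is a hypothetical field, since the directory setting does not appear in the JSON above.

```typescript
interface ParserLogging {
  Enabled: boolean;
  ConsoleLogging: boolean;
  FileLogging: boolean;
  MinLevel: string;
  LogsDir: string; // hypothetical field for PARSER_LOGS_DIR
}

// Assumed boolean parsing: "true"/"false" (any case); anything else keeps fallback.
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const v = value.trim().toLowerCase();
  if (v === "true") return true;
  if (v === "false") return false;
  return fallback;
}

// Environment variables win over file values when present.
function resolveLogging(
  file: Omit<ParserLogging, "LogsDir">,
  env: Record<string, string | undefined>,
): ParserLogging {
  return {
    Enabled: parseBool(env.PARSER_LOG_ENABLED, file.Enabled),
    ConsoleLogging: parseBool(env.PARSER_LOG_CONSOLE, file.ConsoleLogging),
    FileLogging: parseBool(env.PARSER_LOG_FILE, file.FileLogging),
    MinLevel: env.PARSER_LOG_LEVEL ?? file.MinLevel,
    LogsDir: env.PARSER_LOGS_DIR ?? "/tmp/parser-logs",
  };
}
```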

Console output uses colored ANSI formatting. File output uses JSON-lines with daily rotation, matching the main Profilarr logger conventions.