Parser Service
Source: src/services/parser/ (C# microservice),
src/lib/server/utils/arr/parser/ (TypeScript client)
The parser is an optional C# microservice that extracts structured metadata from release titles: resolution, source, languages, release group, episode info, and more. It exists as a separate process because Radarr and Sonarr use .NET regex internally, and matching against the same engine ensures custom format conditions evaluate identically to how the Arr apps themselves would evaluate them.
When the parser is offline, features that depend on it degrade gracefully: testing pages show warnings, evaluation endpoints skip CF matching, and regex validation falls back to permissive behavior.
Service
The microservice is a .NET 8 minimal API (Parser.csproj). It is stateless
and designed to be called from Profilarr's backend for each release title that
needs parsing or pattern matching.
Endpoints
| Method | Path | Input | Output |
|---|---|---|---|
| POST | /parse | { title, type } | Full ParseResponse (all fields) |
| POST | /match | { text, patterns[] } | { results: { pattern: bool } } |
| POST | /match/batch | { texts[], patterns[] } | { results: { text: { pattern: bool } } } |
| POST | /validate/regex | { pattern } | { valid: bool, error?: string } |
| GET | /health | -- | { status: "healthy", version } |
The /match/batch endpoint pre-compiles all patterns once with
RegexOptions.Compiled, then evaluates them against each text using
Parallel.ForEach for throughput. All regex evaluation uses a 100ms timeout
to prevent ReDoS.
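To make the nested result shape of /match/batch concrete, here is a minimal TypeScript sketch of reading a response. The type names, the helper matchedPatterns(), and the sample titles and patterns are illustrative assumptions, not part of the service or the client.

```typescript
// Response shape of POST /match/batch, as listed in the endpoint table:
// results[text][pattern] is true when that pattern matched that text.
type BatchMatchResults = Record<string, Record<string, boolean>>;

interface BatchMatchResponse {
  results: BatchMatchResults;
}

// Hypothetical helper (not part of the client): list the patterns that
// matched a given text. Texts missing from the response yield an empty list.
function matchedPatterns(response: BatchMatchResponse, text: string): string[] {
  const perPattern = response.results[text] ?? {};
  return Object.keys(perPattern).filter((pattern) => perPattern[pattern]);
}

// Illustrative payload; titles and patterns are made up for the example.
const exampleResponse: BatchMatchResponse = {
  results: {
    "Movie.2020.1080p.BluRay-GRP": { "\\b1080p\\b": true, "\\bWEB\\b": false },
    "Show.S01E05.720p.WEB-GRP": { "\\b1080p\\b": false, "\\bWEB\\b": true },
  },
};

console.log(matchedPatterns(exampleResponse, "Show.S01E05.720p.WEB-GRP"));
```

Because every text is keyed against the same pattern list, a caller can look up any (text, pattern) pair without issuing a second request.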
Parsers
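Five independent parsers back the /parse endpoint. Per the Parse Pipeline below, QualityParser, LanguageParser, and ReleaseGroupParser run on every call, while TitleParser runs for movies and EpisodeParser for series. They do not depend on each
other's output -- each extracts a different facet from the raw title string.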
| Parser | Extracts |
|---|---|
| QualityParser | Source (Bluray, WebDL, HDTV, etc.), resolution, modifier (Remux, Screener, etc.), revision (version, repack flag) |
| TitleParser | Movie title(s), year, edition, IMDB/TMDB IDs, hardcoded subs, release hash |
| EpisodeParser | Series title, season/episode numbers, air dates, release type (single, multi, season pack) |
| LanguageParser | Up to 59 languages via word matches, ISO codes, regex patterns, and dubbed markers |
| ReleaseGroupParser | Scene-style (-GROUP) and anime-style ([SubGroup]) release group names |
Each parser tries its pre-compiled regexes in priority order and the first match wins, except in ReleaseGroupParser, which takes the last match. If a parser throws, its result is treated as null and the endpoint falls back to defaults (empty lists, zeros, nulls).
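The matching strategy described above can be sketched in a few lines. This is an illustration only: it uses JavaScript RegExp rather than the .NET engine the service relies on, and the helper names (firstMatch, lastCapture, safely) and the sample patterns are assumptions, not the service's real rules.

```typescript
interface PriorityRule<T> {
  regex: RegExp; // use non-global regexes so test() stays stateless
  value: T;
}

// First match wins: rules are tried in the order given.
function firstMatch<T>(title: string, rules: PriorityRule<T>[]): T | null {
  for (const rule of rules) {
    if (rule.regex.test(title)) return rule.value;
  }
  return null;
}

// Last match wins: scan every occurrence and keep the final capture.
function lastCapture(title: string, pattern: RegExp): string | null {
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  let last: string | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(title)) !== null) {
    last = m[1] ?? m[0];
    if (m[0] === "") re.lastIndex++; // avoid looping forever on empty matches
  }
  return last;
}

// A parser that throws falls back to a default instead of failing the call.
function safely<T>(parser: () => T, fallback: T): T {
  try {
    return parser();
  } catch {
    return fallback;
  }
}

// Made-up rules for the example.
const resolutionRules: PriorityRule<number>[] = [
  { regex: /2160p/i, value: 2160 },
  { regex: /1080p/i, value: 1080 },
  { regex: /720p/i, value: 720 },
];

console.log(firstMatch("Movie.2020.1080p.BluRay.x264-GRP", resolutionRules)); // 1080
console.log(lastCapture("Some-Show.S01E01.720p.WEB-DL.x264-GRP", /-([A-Za-z0-9]+)/)); // "GRP"
```

Taking the last match for the release group matters in the second title: "-Show" and "-DL" also fit the scene-style shape, but only the trailing "-GRP" is the group.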
QualityParser handles edge cases like anime patterns (bd720, bd1080),
MPEG2 detection for RawHD, and Remux fallback logic (Remux without explicit
source assumes Bluray).
TitleParser tries 8+ regex patterns sequentially for movies, handling anime with subgroups, German/French tracker formats, special editions, and PassThePopcorn-style releases. It validates against obfuscated titles (MD5 hashes, suspicious patterns) and normalizes by removing file extensions and torrent site suffixes.
EpisodeParser uses 30+ regex patterns covering standard (S01E05),
multi-episode (S01E05-06), absolute numbering (anime #123), daily shows
(YYYY-MM-DD with ambiguous date handling), mini-series (Part 01), and season
packs (Season 1-2).
LanguageParser detects languages through four methods in order: full word
matches, case-sensitive codes (with anti-false-positive safety for codes like
ES), regex patterns for abbreviations and dubbed markers, and special logic
for German DL/ML (dual/multi language) tags.
Parse Pipeline
flowchart TD
INPUT["Input: title + type (movie/series)"]
QP[QualityParser]
LP[LanguageParser]
RGP[ReleaseGroupParser]
TP["TitleParser (movies)"]
EP["EpisodeParser (series)"]
MERGE[Merge into ParseResponse]
INPUT --> QP & LP & RGP
INPUT -->|movie| TP
INPUT -->|series| EP
QP & LP & RGP & TP & EP --> MERGE
The merged ParseResponse includes: title, type, source, resolution, modifier,
revision, languages, release group, and either movie-specific fields (titles,
year, edition, IDs) or series-specific fields (series title, season, episodes,
air date, release type).
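One way to picture the merged response is as a discriminated union on type. The sketch below is an assumption about shape only: every field name and type is hypothetical and chosen to mirror the prose above, so the real ParseResponse may differ.

```typescript
// Fields shared by movie and series results (names are hypothetical).
interface CommonParseFields {
  title: string;
  source: string;
  resolution: number;
  modifier: string;
  revision: { version: number; isRepack: boolean };
  languages: string[];
  releaseGroup: string | null;
}

interface MovieParseSketch extends CommonParseFields {
  type: "movie";
  movieTitles: string[];
  year: number;
  edition: string | null;
  imdbId: string | null;
  tmdbId: number | null;
}

interface SeriesParseSketch extends CommonParseFields {
  type: "series";
  seriesTitle: string;
  seasonNumber: number;
  episodeNumbers: number[];
  airDate: string | null;
  releaseType: "single" | "multi" | "seasonPack";
}

type ParseResponseSketch = MovieParseSketch | SeriesParseSketch;

// Narrowing on `type` gives access to the movie- or series-specific fields.
function summarize(parsed: ParseResponseSketch): string {
  if (parsed.type === "movie") {
    return `${parsed.movieTitles[0] ?? parsed.title} (${parsed.year})`;
  }
  const season = String(parsed.seasonNumber).padStart(2, "0");
  const episodes = parsed.episodeNumbers
    .map((n) => "E" + String(n).padStart(2, "0"))
    .join("");
  return `${parsed.seriesTitle} S${season}${episodes}`;
}
```

Modelling the two variants this way lets the compiler reject code that reads, say, year on a series result.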
Client
Source: src/lib/server/utils/arr/parser/client.ts
The TypeScript client is a singleton that extends BaseHttpClient with
connection pooling, 30-second timeout, and 2 retries with 500ms delay. It wraps
every parser endpoint and adds persistent caching for parse and match results.
Operations
| Function | Cached | Batch | Use case |
|---|---|---|---|
| parse() | no | no | Direct parse, CF testing page |
| parseWithCache() | yes | no | Single cached parse |
| parseWithCacheBatch() | yes | yes | Entity testing (many releases) |
| matchPatterns() | no | no | CF testing (one text, N patterns) |
| matchPatternsBatch() | yes | yes | Entity testing (many texts) |
| validateRegex() | no | no | Regex editor validation |
| isParserHealthy() | memory | no | Layout-level health check |
| getParserVersion() | memory | no | Cache invalidation key |
Batch operations separate cached from uncached items, fetch only the uncached
ones in parallel via Promise.all(), store new results, and return the combined
map.
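The batch flow described above can be sketched as a generic helper. This is a simplified illustration, assuming an in-memory Map in place of the database tables; batchWithCache and fetchOne are hypothetical names, not the client's actual functions.

```typescript
// Split keys into cache hits and misses, fetch the misses in parallel,
// store the new results, and return the combined map.
async function batchWithCache<T>(
  keys: string[],
  cache: Map<string, T>, // stand-in for the persistent cache table
  fetchOne: (key: string) => Promise<T | null> // null means "parser unavailable"
): Promise<Map<string, T>> {
  const combined = new Map<string, T>();
  const uncached: string[] = [];

  // De-duplicate so each key is looked up and fetched at most once.
  for (const key of Array.from(new Set(keys))) {
    const hit = cache.get(key);
    if (hit !== undefined) combined.set(key, hit);
    else uncached.push(key);
  }

  // Only the misses go over the network, all at once.
  const fetched = await Promise.all(
    uncached.map(async (key) => [key, await fetchOne(key)] as const)
  );

  for (const [key, result] of fetched) {
    if (result === null) continue; // leave failed lookups out of the cache
    cache.set(key, result);
    combined.set(key, result);
  }
  return combined;
}
```

Skipping null results keeps a temporary parser outage from being written into the cache as if it were a real answer.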
Caching
Two database tables provide persistent caching across server restarts:
| Table | Key | Invalidation |
|---|---|---|
| parsed_release_cache | {title}:{type} | Parser version mismatch |
| pattern_match_cache | {title} + patterns hash | Pattern set changes (SHA-256) |
When the parser version changes (detected via getParserVersion()), all
entries from the old version are purged. Pattern match entries are keyed by a
SHA-256 hash of the sorted pattern list, so any change to the pattern set
invalidates all affected cache rows.
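A minimal sketch of how such keys could be derived, using Node's built-in crypto module. The function names are hypothetical, and serializing the sorted list with JSON.stringify before hashing is an assumption; the actual client may join the patterns differently.

```typescript
import { createHash } from "node:crypto";

// Key format for parsed_release_cache, as given in the table: {title}:{type}
function parsedCacheKey(title: string, type: "movie" | "series"): string {
  return `${title}:${type}`;
}

// Order-independent hash of a pattern set: sort a copy, serialize, SHA-256.
function patternSetHash(patterns: string[]): string {
  const sorted = [...patterns].sort(); // copy so the caller's array is untouched
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}

// The same patterns in a different order produce the same hash,
// while adding or removing a pattern produces a different one.
console.log(patternSetHash(["\\bWEB\\b", "\\b1080p\\b"]) === patternSetHash(["\\b1080p\\b", "\\bWEB\\b"]));
```

Sorting before hashing means reordering conditions in a custom format does not throw away cached match results.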
Two in-memory caches avoid repeated network calls:
| Cache | TTL | Purpose |
|---|---|---|
| Health | 30 seconds | Prevent blocking on every page load |
| Version | Session lifetime | Single fetch per app start |
Health Checks
isParserHealthy() uses a direct fetch() call with a 3-second timeout
(shorter than the client's default 30s) to avoid stalling page loads. The result
is cached in memory for 30 seconds. The global layout (+layout.server.ts)
calls this on every load and passes parserAvailable to the frontend.
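The combination of a short probe timeout and a 30-second memory cache might look like the sketch below. It is not the client's implementation: createCachedHealthCheck and probeHealth are hypothetical names, and the injectable clock exists only to make the TTL logic easy to test.

```typescript
// Wrap a health probe with a time-based in-memory cache.
function createCachedHealthCheck(
  probe: () => Promise<boolean>,
  ttlMs = 30_000,
  now: () => number = Date.now
): () => Promise<boolean> {
  let cached: { value: boolean; at: number } | null = null;

  return async function isHealthy(): Promise<boolean> {
    if (cached && now() - cached.at < ttlMs) return cached.value;
    let value: boolean;
    try {
      value = await probe();
    } catch {
      value = false; // timeouts and network errors count as unhealthy
    }
    cached = { value, at: now() };
    return value;
  };
}

// Probe GET /health directly with fetch(), aborting after timeoutMs.
async function probeHealth(baseUrl: string, timeoutMs = 3_000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`${baseUrl}/health`, { signal: controller.signal });
    return res.ok;
  } finally {
    clearTimeout(timer);
  }
}

// Usage (parserUrl is whatever address the parser is configured at):
// const isHealthy = createCachedHealthCheck(() => probeHealth(parserUrl));
```

Caching a negative result as well means an offline parser costs at most one 3-second wait per 30 seconds rather than one per page load.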
Integration
Custom Format Testing
Route: src/routes/custom-formats/[databaseId]/[id]/testing/+page.server.ts
When a user views a custom format's test cases, the page server:
- Checks parser health (early exit if unavailable).
- For each test case: calls parse() to get the full parsed result.
- Calls matchPatterns() to evaluate the format's pattern-based conditions against the parsed title, edition, and release group.
- Runs evaluateCustomFormat() with the parsed data and match maps.
- Returns per-condition results and an overall match boolean.
This path uses single (non-batch) calls because it evaluates one CF at a time with a small number of test cases.
Entity Testing
Route: src/routes/api/v1/entity-testing/evaluate/+server.ts
Entity testing evaluates quality profile scoring against many releases across all custom formats. The evaluation flow uses batch operations for performance:
flowchart TD
REQ[Releases from UI] --> PARSE["parseWithCacheBatch()"]
PARSE --> EXTRACT["extractPatternsByType() from all CFs"]
EXTRACT --> MATCH["matchPatternsBatch() per type"]
MATCH --> EVAL["evaluateCustomFormat() per release per CF"]
EVAL --> RESP["Return: parsed info + CF match results"]
The evaluator (customFormats/evaluator.ts) groups patterns by condition type
(title, edition, release group) before batch matching, ensuring each unique
text is only matched once against each pattern set.
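The grouping step can be illustrated with a small sketch. The types and the function name groupPatternsByType are hypothetical stand-ins (the real evaluator exposes extractPatternsByType(), whose signature is not shown here), and the condition shape is an assumption.

```typescript
type ConditionType = "title" | "edition" | "releaseGroup";

// Hypothetical minimal shapes for a custom format and its conditions.
interface PatternCondition {
  type: ConditionType;
  pattern: string;
}

interface CustomFormatSketch {
  name: string;
  conditions: PatternCondition[];
}

// Collect the unique patterns per condition type across all custom formats,
// so each type needs only one batch match call.
function groupPatternsByType(formats: CustomFormatSketch[]): Record<ConditionType, string[]> {
  const sets: Record<ConditionType, Set<string>> = {
    title: new Set<string>(),
    edition: new Set<string>(),
    releaseGroup: new Set<string>(),
  };
  for (const format of formats) {
    for (const condition of format.conditions) {
      sets[condition.type].add(condition.pattern);
    }
  }
  return {
    title: Array.from(sets.title),
    edition: Array.from(sets.edition),
    releaseGroup: Array.from(sets.releaseGroup),
  };
}
```

With many custom formats sharing the same patterns, de-duplicating first keeps the batch request proportional to the number of distinct patterns rather than the number of conditions.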
Regex Validation
Routes: regex editor page, regex creation page
validateRegex() sends the pattern to the C# service for compilation against
the .NET regex engine. This catches patterns that JavaScript accepts but .NET
rejects, and avoids rejecting .NET-only syntax that a JavaScript check would
flag. It returns { valid, error? }, or null if the parser is offline, in which
case the save is not blocked.
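A caller might turn the three possible outcomes into a save decision along these lines. The function saveDecision and its return shape are hypothetical, and blocking the save on an invalid pattern is an assumption; the source only states that an offline parser does not block saves.

```typescript
// Result of validateRegex(): a verdict, or null when the parser is offline.
type RegexValidation = { valid: boolean; error?: string } | null;

interface SaveDecision {
  allowed: boolean;
  warning?: string;
}

// Hypothetical policy: permissive when offline, strict when the parser answers.
function saveDecision(result: RegexValidation): SaveDecision {
  if (result === null) {
    return {
      allowed: true,
      warning: "Parser offline: pattern was not checked against the .NET engine",
    };
  }
  if (!result.valid) {
    return { allowed: false, warning: result.error ?? "Invalid pattern" };
  }
  return { allowed: true };
}
```

Keeping null distinct from { valid: false } is what lets the UI tell "could not check" apart from "checked and rejected".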
Offline Behavior
| Feature | Behavior when parser is offline |
|---|---|
| CF testing page | Returns unknown results, shows parser warning |
| Entity testing API | Returns parsed info only, skips CF evaluation |
| Regex validation | Returns null, does not block saves |
| Health check | Returns false, UI shows warning banner |
| Parse/match calls | Return null, callers handle gracefully |
Configuration
The service reads appsettings.json for logging and version settings:
{
"ParserLogging": {
"Enabled": true,
"ConsoleLogging": true,
"FileLogging": false,
"MinLevel": "INFO"
},
"Parser": { "Version": "1.0.0" }
}
Environment variable overrides: PARSER_LOG_ENABLED, PARSER_LOG_FILE,
PARSER_LOG_CONSOLE, PARSER_LOG_LEVEL, PARSER_LOGS_DIR (default
/tmp/parser-logs). The service detects Docker containers (via /.dockerenv
or /proc/1/cgroup) and logs container-specific environment variables at
startup (PUID, PGID, UMASK, TZ).
Console output uses colored ANSI formatting. File output uses JSON-lines with daily rotation, matching the main Profilarr logger conventions.