wizarr/app/activity/__init__.py
Matthieu B 2283c4de68 fix: prevent startup race condition during migrations
This fixes a critical issue where Gunicorn workers would fail to start
after upgrading to v2025.11.0, causing containers to show as unhealthy
with only the uv wrapper process running and no actual workers.

Root Cause:
-----------
In v2025.11.0, library scanning and session recovery were added to the
create_app() function, which runs during EVERY app creation including:
1. During 'flask db upgrade' (migrations)
2. During the Gunicorn master's when_ready() hook
3. During each Gunicorn worker spawn

The migration 20251103_properly_fix_foreign_keys recreates 4 database
tables with CASCADE foreign keys using raw SQL. This holds exclusive
database locks during table recreation.

When library scanning and session recovery try to query these tables
while the migration is running, they hit those database locks. The
resulting race condition causes workers to time out and crash during
startup.
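
The contention described above can be reproduced in isolation. The
following is a minimal sketch, assuming a SQLite backend (the commit
message does not name the database engine); the table name and the
helper demo_lock_contention() are hypothetical and are not part of
Wizarr.

```python
from __future__ import annotations

import os
import sqlite3
import tempfile


def demo_lock_contention() -> tuple[bool, int]:
    """Return (reader_was_blocked, rows_seen_after_commit)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")

        # "Migration" connection: autocommit mode, so BEGIN/COMMIT are explicit.
        migrator = sqlite3.connect(path, isolation_level=None)
        migrator.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY)")
        migrator.execute("INSERT INTO sessions DEFAULT VALUES")

        # Hold an exclusive lock, as a table-recreating migration would.
        migrator.execute("BEGIN EXCLUSIVE")

        # "Startup task" connection with a short busy timeout (assumed value).
        reader = sqlite3.connect(path, timeout=0.1)
        blocked = False
        try:
            reader.execute("SELECT COUNT(*) FROM sessions").fetchone()
        except sqlite3.OperationalError:
            # SQLite reports "database is locked" once the busy timeout expires.
            blocked = True

        # Once the migration commits, the same query succeeds.
        migrator.execute("COMMIT")
        rows = reader.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]

        reader.close()
        migrator.close()
        return blocked, rows


if __name__ == "__main__":
    print(demo_lock_contention())
```

In the sketch the reader fails only because it runs while the exclusive
transaction is open, which mirrors startup tasks running during
'flask db upgrade'.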

Fix:
----
- Skip library scanning during migrations (FLASK_SKIP_SCHEDULER=true)
- Skip activity monitoring/session recovery during migrations
- Make Gunicorn log level configurable (GUNICORN_LOG_LEVEL env var)
- Add worker lifecycle hooks for better crash debugging
- Increase healthcheck start period from 10s to 60s
- Increase Gunicorn worker timeout from 30s to 120s
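
For illustration, the Gunicorn-related items above could look roughly
like the following gunicorn.conf.py. This is a sketch, not the
project's actual configuration: the "info" default and the log messages
are assumptions, while loglevel, timeout, post_fork, worker_abort and
worker_exit are standard Gunicorn settings and server hooks.

```python
# gunicorn.conf.py (hypothetical sketch; the real Wizarr config may differ)
import os

# Log level taken from the environment; the "info" fallback is an assumption.
loglevel = os.environ.get("GUNICORN_LOG_LEVEL", "info").lower()

# Worker timeout in seconds, raised from Gunicorn's default of 30.
timeout = 120


def post_fork(server, worker):
    """Called in the worker process just after it has been forked."""
    server.log.info("Worker spawned (pid: %s)", worker.pid)


def worker_abort(worker):
    """Called when a worker receives SIGABRT, typically on timeout."""
    worker.log.warning("Worker aborted (pid: %s)", worker.pid)


def worker_exit(server, worker):
    """Called in the worker process just after it has exited."""
    server.log.info("Worker exited (pid: %s)", worker.pid)
```

Logging in these hooks makes it visible whether a worker was never
spawned, timed out, or exited on its own, which is the distinction that
matters when a container reports unhealthy.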

Testing:
--------
- Verified app starts successfully with FLASK_SKIP_SCHEDULER=true
- Verified library scanning runs normally without the flag
- Confirmed 0.38s startup during migrations vs 1.61s normal startup

Closes #976
2025-11-03 20:41:52 +01:00


"""
Activity monitoring module for Wizarr.
Provides real-time activity monitoring and historical tracking of media playback
sessions across all configured media servers.
"""
from __future__ import annotations

import os
import threading

import structlog
from flask import Flask

from app.models import ActivitySession, ActivitySnapshot
from app.services.activity import ActivityService

from .monitoring.monitor import WebSocketMonitor


def init_app(app: Flask) -> None:
    """Initialise activity monitoring features with the Flask application."""
    logger = structlog.get_logger(__name__)

    # Skip activity monitoring during tests
    if app.config.get("TESTING"):
        logger.debug("Skipping activity monitoring in test mode")
        return

    # Skip during migrations to avoid database locking and race conditions
    # This prevents session recovery from running during 'flask db upgrade'
    if os.environ.get("FLASK_SKIP_SCHEDULER") == "true":
        logger.debug("Skipping activity monitoring during migrations")
        return

    # Skip only in Werkzeug's reloader parent process (development mode)
    # WERKZEUG_RUN_MAIN is only set when using Flask's development server with reloader
    # In production (Gunicorn/uWSGI), this env var won't be set, so we should proceed
    if os.environ.get("WERKZEUG_RUN_MAIN") == "false":
        logger.debug("Skipping activity monitoring in reloader parent process")
        return

    app.extensions = getattr(app, "extensions", {})
    if "activity_monitor" in app.extensions:
        logger.debug("Activity monitoring already initialized, skipping")
        return

    logger.info("Initializing activity monitoring")

    monitor = WebSocketMonitor(app)
    app.extensions["activity_monitor"] = monitor

    def delayed_start():
        import time

        time.sleep(2)

        try:
            from app.tasks.activity import recover_sessions_on_startup_task

            recovered_count = recover_sessions_on_startup_task(app)
            logger.info(
                "Session recovery completed on startup: %s orphaned sessions cleaned up",
                recovered_count,
            )
        except Exception as exc:
            logger.error("Session recovery failed on startup: %s", exc, exc_info=True)

        monitor.start_monitoring()

    threading.Thread(target=delayed_start, daemon=True).start()
    logger.info("Activity monitoring initialized")


__all__ = ["ActivityService", "ActivitySession", "ActivitySnapshot", "init_app"]