The WS-Security timestamp had only a 10-second validity window, which
caused intermittent "not authorized" errors when there was clock drift
between ZoneMinder and the camera.
Changes:
- Increase default timestamp validity from 10 to 60 seconds
- Add configurable timestamp_validity option (range 10-600 seconds)
- Add better error diagnostics for auth failures showing timestamp_validity
and camera_clock_offset values
- Update parse_onvif_options documentation
Users experiencing auth errors due to clock drift can now set
onvif_options=timestamp_validity=120 (or higher) as a workaround.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two issues fixed:
1. Null soap context on retry: After a failed subscription attempt,
soap_free() was called but the context was never recreated. Added
InitSoapContext() to reinitialize the soap context, proxy, and
endpoint when needed.
2. Use-after-free in Renew/PullMessages: soap_end() frees all gSOAP
allocated memory including response.SubscriptionReference.Address.
Subsequent calls to Renew() used the freed pointer, causing DNS
errors on valid IP addresses. Added subscription_address_ member
to cache the address before soap_end() is called.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add soap_destroy() and soap_end() calls after SOAP operations in:
- WaitForMessage(): after PullMessages processing
- Renew(): after Renew response processing (all return paths)
- Subscribe(): after initial PullMessages call
Also clear initialized_count map and warned_initialized_repeat flag
when creating new subscription to prevent unbounded map growth.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Wrap all GSOAP-dependent code in a single #ifdef WITH_GSOAP block
- Move GSOAP-specific members and methods inside the conditional
- Provide minimal stubs (constructor, destructor, start) for non-GSOAP builds
- Remove unnecessary stub methods for private members that can't be called
- Code now compiles cleanly with or without GSOAP support
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cameras like Reolink send alarm=true but never send the corresponding
false, causing alarms to stick indefinitely. Use the TerminationTime
from PullMessagesResponse to auto-expire stale per-topic alarms.
- Add AlarmEntry struct with value and termination_time fields
- Extract TerminationTime from each PullMessagesResponse and attach
it to alarm entries; refresh on re-trigger so active alarms persist
- Sweep expired alarms after processing messages and on poll timeout
- Add expire_alarms option (default: true) to disable via onvif_options
- Fix TOCTOU race: remove unsynchronized alarms.empty() check before
acquiring mutex in the timeout sweep path
- Simplify SetNoteSet with C++17 structured bindings
- Add Catch2 tests for alarm expiry logic (mirrored struct to avoid
gSOAP header dependency)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, when plain UsernameToken authentication succeeded after
digest auth failed, the code would skip the subscription setup code
(initial PullMessages, renewal tracking, initial Renew). This was
because the success path only executed inside an else block that
plain auth success didn't reach.
Restructure attempt_subscription() so both success paths (first attempt
and plain auth retry) fall through to common success handling code:
- Initial PullMessages to clear queued messages
- Update renewal tracking times
- Perform initial subscription renewal
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add has_valid_subscription_ member to track whether we have an active
ONVIF subscription that needs to be unsubscribed:
- Initialize to false in constructor
- Set to true after successful CreatePullPointSubscription
- Set to false after cleanup or unsubscribe
- Check flag in cleanup_subscription() and destructor before attempting
unsubscribe
This avoids unnecessary network traffic and warning logs when attempting
to unsubscribe with stale response.SubscriptionReference.Address data
from failed subscription attempts.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactor ONVIF subscription logic to guarantee the polling thread
starts regardless of initial connection success:
- Extract subscription logic into attempt_subscription() function
- Move thread start code after attempt_subscription() in start()
- Replace goto pattern with return values
- Move Run() inside #ifdef WITH_GSOAP to fix compile without gSOAP
- Add stub Run() for non-gSOAP builds to prevent linker errors
- Remove unused last_retry_time member variable
- Increase default max_retries from 5 to 10
Previously, if the initial ONVIF subscription failed, start() would
return early before the thread-starting code, leaving no thread to
attempt reconnection. Now the thread always starts and handles
reconnection with exponential backoff.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ONVIF::start() function contained early return statements in error
handling blocks. This prevented the execution flow from reaching the
thread-starting logic at the end of the function if the initial SOAP
subscription failed.
Without the polling thread (Run()), the monitor would never attempt
to reconnect or retry, effectively disabling ONVIF events until a
manual restart or object recreation occurred.
Changes:
- Replaced early returns with jumps to the thread-starting label.
- Moved 'rc' variable declaration to avoid 'crosses initialization' errors.
- Ensures Run() is always spawned to handle exponential backoff and retries.
- Fix dangling pointer: store endpoint URL in member variable
(event_endpoint_url_) so proxyEvent.soap_endpoint remains valid
- Remove debug log that accessed alarms.size() outside mutex lock
- Use isHealthy() accessor instead of direct healthy_.load()
- Use setAlarmed()/isAlarmed() consistently throughout WaitForMessage()
instead of direct atomic access
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ONVIF polling thread was failing to reconnect after connection loss
because:
1. Exponential backoff delay was calculated but never used - the thread
only slept 500ms between attempts regardless of retry count
2. After max_retries was exceeded, retry_count was never reset, causing
permanent failure state
3. Monitor::connect() created new ONVIF objects without deleting old
ones, causing duplicate threads and memory leaks
Changes:
- Use get_retry_delay() for actual exponential backoff (1s, 2s, 4s... up to 300s)
- Add 5-minute cool-down period after max_retries, then reset for fresh attempts
- Sleep in 1-second increments to remain responsive to termination signals
- Clean up existing ONVIF object in Monitor::connect() before creating new one
- Add setHealthy() accessor for consistency with setAlarmed() pattern
- Replace direct healthy_.load/store calls with isHealthy()/setHealthy()
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move the 500ms delay to before start() call when unhealthy, preventing
rapid reconnection attempts when the subscription fails.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change pull_timeout and subscription_timeout from std::string (ISO 8601)
to int (seconds) for simpler configuration and cleaner code
- Add deprecation warning when PT prefix is detected in option values
- Add FormatDurationSeconds() helper to construct ISO 8601 strings for SOAP
- Use zm_utils.h functions: Split(), PairSplit(), StartsWith()
- Remove unused parse_iso8601_duration_seconds() function
New option format: pull_timeout=5,subscription_timeout=300
Old format (deprecated): pull_timeout=PT5S,subscription_timeout=PT300S
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Increase ONVIF_RENEWAL_ADVANCE_SECONDS from 20s to 60s to give more
buffer time before subscription expiration
- Reduce default pull_timeout from PT10S to PT5S for more responsive
event polling
- Fix pull_timeout adjustment logic to dynamically construct the timeout
string based on ONVIF_RENEWAL_ADVANCE_SECONDS instead of hardcoding
- Update fallback timeout from PT8S to PT5S for consistency
- Clean up SOAP_STRINGS array formatting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Convert alarmed and healthy from plain bool to std::atomic<bool> to
fix data races between the ONVIF polling thread and the analysis thread.
Changes:
- Rename alarmed -> alarmed_, healthy -> healthy_ (atomic naming convention)
- Use memory_order_acquire for loads, memory_order_release for stores
- Remove unnecessary mutex lock from isAlarmed() (atomic is sufficient)
- Add const qualifier to isAlarmed() and isHealthy() accessors
The alarms_mutex is retained for protecting the alarms map operations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move ONVIF polling from shared Poll() loop (10s cadence) to dedicated
thread with 500ms cadence. This reduces worst-case ONVIF event latency
from ~10 seconds to ~500ms.
Changes:
- Add thread management to ONVIF class (thread_, terminate_, Run())
- Launch polling thread in start() after successful subscription
- Stop thread in destructor before SOAP cleanup
- Remove ONVIF handling from Monitor::Poll()
- Call onvif->start() immediately after creation in Monitor::connect()
- Increase default pull_timeout from PT5S to PT10S for better listening window
The ONVIF thread handles its own reconnection when unhealthy. Other
managers (Amcrest, RTSP2Web, Go2RTC, Janus) remain on 10s Poll() cadence.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move "Is" prefix check to the beginning of is_data_simple_item to
catch IsMotion, IsTamper, etc. early without explicit string comparisons.
Add case-insensitive matching using StringToLower for broader camera
compatibility. Remove redundant Is* entries now caught by prefix check.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace inline ASCII lowercase conversion with existing StringToLower
utility function for consistency with the rest of the codebase.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace 21 case-variant string comparisons with single lowercase
conversion followed by 7 comparisons. Remove redundant empty check
in all_zeros condition since early return already handles empty values.
Remove explicit true/active checks since default return handles them.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous fix incorrectly assumed msg->Message.__any.elts pointed to
tt:Message, but in gSOAP's DOM structure:
- msg->Message.__any IS tt:Message
- msg->Message.__any.atts contains tt:Message attributes (PropertyOperation)
- msg->Message.__any.elts is the FIRST CHILD (tt:Source), not tt:Message
This caused the code to look for tt:Data inside tt:Source instead of
as a sibling of tt:Source, falling back to legacy parsing.
The fix correctly:
1. Gets PropertyOperation from msg->Message.__any.atts
2. Iterates through msg->Message.__any.elts and siblings to find tt:Data
Tested with Reolink RLC-811A and RLC-822A example messages.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for additional camera manufacturers and event types
based on ONVIF Event Service Specification review.
Data field names added:
- LogicalState (Hikvision/Dahua digital I/O)
- active (Axis cameras)
- Motion, Alarm, IsActive (various cameras)
- Analytics: Count, Level, Direction, ObjectId, Speed, Region, Percentage
- Pattern matching for fields ending in "State" or "Alarm"
Value formats added:
- active/inactive (Hikvision/Dahua)
- on/off, yes/no, open/closed
- alarm/triggered/idle/normal
Cameras supported: Reolink, Hikvision, Dahua, Axis, Amcrest, Hanwha/Samsung
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous DOM traversal would descend into tt:Source element and
get stuck there, never reaching the sibling tt:Data element where
actual alarm values are stored.
Changes:
- Add helper functions for cleaner namespace handling and attribute extraction
- Navigate directly to tt:Data element instead of depth-first into tt:Source
- Extract PropertyOperation from tt:Message attributes
- Support data fields: State, IsMotion, IsTamper, IsInside, IsCrossing, Value
- Add iterative depth-first fallback for non-standard camera structures
- Preserve legacy fallback for backward compatibility
Tested with Reolink RLC-811A and RLC-822A event examples.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add use_absolute_time_for_renewal flag to ONVIF class
- Initialize flag to false in constructor
- Set flag when update_renewal_times() detects past termination time
- Modify Renew() to use absolute ISO 8601 time when flag is set
- Add format_absolute_time_iso8601() helper function
- Backward compatible: only uses absolute time after detecting camera issue
Co-authored-by: connortechnology <925519+connortechnology@users.noreply.github.com>
Add comments explaining that soap error is from soap_wsa_request
and clarify that failing to unsubscribe when WS-Addressing setup
fails is a known limitation (subscription may remain on camera).
Address code review feedback.
Co-authored-by: connortechnology <925519+connortechnology@users.noreply.github.com>
Use actual soap error from do_wsa_request failure instead of
hardcoding SOAP_ERR. Return early if WS-Addressing setup fails
to avoid attempting unsubscribe with invalid state.
Address code review feedback.
Co-authored-by: connortechnology <925519+connortechnology@users.noreply.github.com>
Add cleanup_subscription() helper method that properly unsubscribes
from ONVIF cameras with error handling. This prevents subscription
leaks when renewals fail or when restarting subscriptions.
Changes:
- Add cleanup_subscription() method to header and implementation
- Call cleanup on renewal failures in Renew()
- Call cleanup when start() detects existing soap context
- Improve destructor logging for unsubscribe failures
- Add test documentation for cleanup behavior
Refs: ONVIF subscription leak issue
Co-authored-by: connortechnology <925519+connortechnology@users.noreply.github.com>