moonfire-nvr/design/time.md
Scott Lamb cb97ccdfeb start splitting wall and media duration for #34
This splits the schema and playback path. The recording path still
adjusts the frame durations and always says the wall and media durations
are the same. I expect to change that in a following commit. I wouldn't
be surprised if that shakes out some bugs in this portion.
2020-08-04 21:44:01 -07:00

Moonfire NVR Time Handling

Status: in flux. The approach below works well for video, but audio frames' durations can't be adjusted as easily. As part of implementing audio support, the implementation is changing to instead decouple "wall time" and "media time", as described in this comment.

A man with a watch knows what time it is. A man with two watches is never sure.

— Segal's law

Objective

Maximize the likelihood Moonfire NVR's timestamps are useful.

The timestamp corresponding to a video frame should roughly match timestamps from other sources:

  • another video stream from the same camera. Given a video frame from the "main" stream, a video frame from the "sub" stream with a similar timestamp should have been recorded near the same time, and vice versa. This minimizes confusion when switching between views of these streams, and when viewing the "main" stream timestamps corresponding to a motion event gathered from the less CPU-intensive "sub" stream.
  • on-camera motion events from the same camera. If the video frame reflects the motion event, its timestamp should be roughly within the event's timespan.
  • streams from other cameras. Recorded views from two cameras of the same event should have similar timestamps.
  • events noted by the owner of the system, neighbors, police, etc., for the purpose of determining chronology, to the extent those persons use accurate clocks.

Two segments of video recorded from the same stream of the same camera should not overlap. This would make it impossible for a user interface to present a simple timeline for accessing all recorded video.

Durations should be useful over short timescales:

  • If an object's motion is recorded, distance travelled divided by the duration of the frames over which this motion occurred should reflect the object's average speed.
  • Motion should appear smooth. There shouldn't be excessive frame-to-frame jitter due to such factors as differences in encoding time or network transmission.

This document describes an approach to achieving these goals when the following statements are true:

  • the NVR's system clock is within a second of correct on startup. (True when NTP is functioning or when the system has a real-time clock battery to preserve a previous correct time.)
  • the NVR's system time does not experience forward or backward "step" corrections (as opposed to frequency correction) during operation.
  • the NVR's system time advances at roughly the correct frequency. (NTP achieves this through frequency correction when operating correctly.)
  • the cameras' clock frequencies are off by no more than 500 parts per million (roughly 43 seconds per day).
  • the cameras are geographically close to the NVR, so in most cases network transmission time is under 50 ms. (Occasional delays are to be expected, however.)

When one or more of those statements are false, the system should degrade gracefully: preserve what properties it can, gather video anyway, and when possible include sufficient metadata to assess trustworthiness.

Additionally, the system should not require manual configuration of camera frequency corrections.

Background

Time in a distributed system is notoriously tricky. The articles "Falsehoods programmers believe about time" and "More falsehoods programmers believe about time; 'wisdom of the crowd' edition" give a taste of the problems encountered. These problems are found even in datacenters with expensive, well-tested hardware and relatively reliable network connections. Moonfire NVR is meant to run on an inexpensive single-board computer and record video from budget, closed-source cameras, so such problems are to be expected.

Moonfire NVR typically has access to the following sources of time information:

  • the local CLOCK_REALTIME. Ideally this is maintained by ntpd: synchronized on startup, and frequency-corrected during operation. A hardware real-time clock and battery keep accurate time across restarts if the network is unavailable on startup. In the worst case, the system has no real-time clock or no battery and a network connection is unavailable. The time is far in the past on startup and is never corrected or is corrected via a step while Moonfire NVR is running.
  • the local CLOCK_MONOTONIC. This should be frequency-corrected by ntpd and guaranteed to never experience "steps", though its reference point is unspecified.
  • the local ntpd, which can be used to determine if the system is synchronized to NTP and quantify the precision of synchronization.
  • each camera's clock. The ONVIF specification mandates cameras must support synchronizing clocks via NTP, but in practice cameras appear to use SNTP clients which simply step time periodically and provide no interface to determine if the clock is currently synchronized. This document's author owns several cameras with clocks that run roughly 20 ppm fast (2 seconds per day) and are adjusted via steps.
  • the RTP timestamps from each of a camera's streams. As described in RFC 3550 section 5.1, these are monotonically increasing with an unspecified reference point. They can't be directly compared to other cameras or other streams from the same camera. Empirically, budget cameras don't appear to do any frequency correction on these timestamps.
  • in some cases, RTCP sender reports, as described in RFC 3550 section 6.4. These correlate RTP timestamps with the camera's real time clock. However, these are only sent periodically, not necessarily at the beginning of the session. Some cameras omit them entirely depending on firmware version, as noted in this forum post. Additionally, Moonfire NVR currently uses ffmpeg's libavformat for RTSP protocol handling; this library exposes these reports in a limited fashion.

The camera records video frames as in the diagram below:

[Figure: video frame timeline]

Each frame has an associated RTP timestamp. It's unclear from skimming RFC 3550 exactly what time this represents, but it must be some time after the last frame and before the next frame. At a typical rate of 30 frames per second, this timespan is short enough that this uncertainty won't be the largest source of time error in the system. We'll assume arbitrarily that the timestamp refers to the start of exposure.

RTP doesn't transmit the duration of each video frame; it must be calculated from the timestamp of the following frame. This means that if a stream is terminated, the final frame has unknown duration.
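As a minimal sketch of that calculation (the function name and sample timestamps are illustrative, not taken from Moonfire NVR), durations can be derived by differencing consecutive RTP timestamps. RTP timestamps are 32-bit values per RFC 3550, so the subtraction is done modulo 2^32 to survive wraparound. The 90 kHz clock in the example is an assumption, chosen because it is the usual rate for video payloads.

```python
RTP_TIMESTAMP_MODULUS = 1 << 32  # RTP timestamps are 32-bit unsigned values


def frame_durations(rtp_timestamps):
    """Return one duration per frame except the last, in RTP clock units.

    The final frame's duration is unknown because no later timestamp exists.
    """
    return [
        (following - current) % RTP_TIMESTAMP_MODULUS
        for current, following in zip(rtp_timestamps, rtp_timestamps[1:])
    ]


# Hypothetical 30 fps stream on a 90 kHz clock (3000 units per frame) whose
# timestamps wrap past 2**32 between the second and third frames.
example = [2**32 - 6000, 2**32 - 3000, 0, 3000]
durations = frame_durations(example)  # three durations for four frames
```

Note that four frames yield only three durations; a real recorder would have to decide what to store for the last frame of a terminated stream.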

As described in schema.md, Moonfire NVR saves RTSP video streams into roughly one-minute "recordings", with a fixed rotation offset after the minute in the NVR's wall time.

Overview

Moonfire NVR will use the RTP timestamps to calculate video frames' durations. For the first segment of video, it will trust these completely. It will use them and the NVR's wall clock time to establish the start time of the recording. For following segments, it will slightly adjust durations to compensate for the difference between the frequencies of the camera and NVR clocks, trusting the latter to be accurate.

Detailed design

On every frame of video, Moonfire NVR will get a timestamp from CLOCK_MONOTONIC. On the first frame, it will additionally get a timestamp from CLOCK_REALTIME and compute the difference between the two clocks. It will use these to compute a monotonically increasing real time of receipt for every frame, called the local frame time. Assuming the local clock is accurate, this time is an upper bound on when the frame was generated. The delay between generation and receipt is the sum of the following items:

  • H.264 encoding
  • buffering on the camera (particularly when starting the stream—some cameras apparently send frames that were captured before the RTSP session was established)
  • network transmission time
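The local frame time computation might look roughly like the sketch below. The class and method names are hypothetical. Python's `time.monotonic` and `time.time` stand in for CLOCK_MONOTONIC and CLOCK_REALTIME, and the clocks are injectable so the behavior can be checked without waiting on real time.

```python
import time


class LocalFrameClock:
    """Derives a monotonically increasing wall time for each received frame."""

    def __init__(self, realtime=time.time, monotonic=time.monotonic):
        self._realtime = realtime
        self._monotonic = monotonic
        self._offset = None  # realtime minus monotonic, fixed at the first frame

    def on_frame(self):
        """Return the local frame time (seconds since the epoch) for a frame."""
        mono = self._monotonic()
        if self._offset is None:
            # Only the first frame consults the wall clock; later steps of the
            # wall clock therefore cannot disturb the local frame times.
            self._offset = self._realtime() - mono
        return mono + self._offset


clock = LocalFrameClock()
first = clock.on_frame()
second = clock.on_frame()  # never earlier than `first`
```

Because the wall clock is read only once, a later step correction leaves frame times untouched, which is also why the time discontinuity caveat described later in this document arises.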

These values may produce some jitter, so the local frame time is not directly used to calculate frame durations. Instead, they are primarily taken from differences in RTP timestamps from one frame to the next. During the first segment of video, these RTP timestamp differences are used directly, without correcting for incorrect camera frequency. At the design limit of 500 ppm camera frequency error, and an upper bound of two minutes of recording for the initial segment (explained below), this causes a maximum of 60 milliseconds of error.
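The error bound is plain arithmetic on the stated design limits. The helper below (a hypothetical name, used only for illustration) reproduces the 60 ms figure and, for comparison, the drift the same 500 ppm error would accumulate over a full day.

```python
def clock_drift_seconds(error_ppm, elapsed_seconds):
    """Worst-case accumulated drift for a clock that is off by error_ppm."""
    return error_ppm * elapsed_seconds / 1_000_000


initial_segment_error = clock_drift_seconds(500, 2 * 60)  # two-minute upper bound
daily_error = clock_drift_seconds(500, 24 * 60 * 60)      # a full day
```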

The local start time of a segment is calculated when ending it. It's defined as the minimum, over all frames in the segment, of the frame's local frame time minus the total duration of all previous frames in the segment. If there are many frames, this means neither initial buffering nor spikes of delay in H.264 encoding or network transmission cause the local start time to become inaccurate. The least delayed frame wins.
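A sketch of the local start time calculation follows. The function name and the sample data are assumptions for illustration, and all values are in 90 kHz units so the arithmetic stays in integers.

```python
def local_start_time(frames):
    """Estimate a segment's start time from (local_frame_time, duration) pairs.

    Each frame votes for a start time: its time of receipt minus the media
    duration of everything before it. The smallest vote comes from the least
    delayed frame, so it is the best available estimate.
    """
    elapsed = 0  # total duration of all previous frames
    best = None
    for local_frame_time, duration in frames:
        candidate = local_frame_time - elapsed
        if best is None or candidate < best:
            best = candidate
        elapsed += duration
    return best


# Hypothetical segment that truly started at 1_000_000, with 3000-unit frames
# received after delays of 900, 200, 4000, and 250 units respectively.
frames = [(1_000_900, 3000), (1_003_200, 3000), (1_010_000, 3000), (1_009_250, 3000)]
estimate = local_start_time(frames)  # the second frame (200-unit delay) wins
```

The estimate can never be earlier than the true start under the stated assumptions, since every frame's delay is non-negative; more frames simply give more chances to see a small delay.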

The first segment either ends with the RTSP session (due to error/shutdown) or on rotation. In the former case, there may not be many samples to use in calculating the local start time; accuracy may suffer but the system degrades gracefully. Rotation doesn't happen until the second time the rotation offset is passed, so rotation happens after 1 to 2 minutes rather than 0 to 1 minutes to maximize accuracy.
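One way to picture the rotation rule is the sketch below. It assumes rotation boundaries fall at a fixed offset past each whole minute and that "passed" means strictly after the session start; the names and the exact boundary handling are illustrative guesses rather than a description of the real code.

```python
ROTATE_INTERVAL_SECONDS = 60  # "roughly one-minute" recordings


def first_rotation_time(session_start, rotate_offset=0):
    """Return the time at which the first segment is rotated.

    Times are whole seconds since the epoch. Rotation boundaries are assumed
    to fall at rotate_offset seconds past each minute.
    """
    # Find the first boundary strictly after the session start...
    periods = (session_start - rotate_offset) // ROTATE_INTERVAL_SECONDS + 1
    next_boundary = periods * ROTATE_INTERVAL_SECONDS + rotate_offset
    # ...and skip it: the segment rotates at the boundary after that one.
    return next_boundary + ROTATE_INTERVAL_SECONDS


first_segment_length = first_rotation_time(125) - 125  # 115 seconds
```

Under these assumptions the first segment is always longer than one minute and at most two minutes, which gives the local start time calculation more frames to work with.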

The start time of the first segment is its local start time. The start time of following segments is the end time of the previous segment.

The duration of following segments is adjusted to compensate for camera frequency error, assuming the NVR clock's frequency is more trustworthy. This is done as follows. The local duration of segment i is calculated as the local start time of segment i+1 minus the local start time of segment i. The cumulative error as of segment i is defined as the local duration of all previous segments minus the duration of all previous segments. The duration of segment i should be adjusted by up to 500 ppm to eliminate cumulative error. (For a one-minute segment, this is 30 ms, or 2,700 90 kHz units.) This correction should be spread out across the segment to minimize jitter.
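The adjustment could be sketched as follows, working in 90 kHz units. The function name and the particular way of spreading the correction (proportional to elapsed media time) are assumptions; the text only requires that the correction be capped at 500 ppm and spread across the segment.

```python
MAX_ADJUST_PPM = 500  # design limit on camera frequency error


def adjusted_frame_durations(rtp_durations, cumulative_error):
    """Stretch or shrink a segment's frame durations to cancel cumulative error.

    cumulative_error is local duration minus recorded duration of all previous
    segments; a positive value means recorded durations have been too short.
    """
    total = sum(rtp_durations)
    if total == 0:
        return list(rtp_durations)
    limit = total * MAX_ADJUST_PPM // 1_000_000
    correction = max(-limit, min(limit, cumulative_error))

    adjusted = []
    elapsed = 0  # unadjusted media time through the current frame
    applied = 0  # portion of the correction already handed out
    for duration in rtp_durations:
        elapsed += duration
        target = correction * elapsed // total  # correction due by this point
        adjusted.append(duration + target - applied)
        applied = target
    return adjusted


# Hypothetical one-minute segment: 1800 frames of 3000 units (30 fps).
segment = [3000] * 1800
lengthened = adjusted_frame_durations(segment, cumulative_error=5000)
```

In this example the requested 5000 units exceed the cap, so only part of the error is removed and the remainder carries over to later segments. No individual frame changes by more than a couple of units, which keeps jitter negligible.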

Each segment's local start time is also stored in the database as a delta to the segment's start time. These stored values aren't for normal system operation but may be handy in understanding and correcting errors.

Caveats

Stream mismatches

There's no particular reason to believe this will produce perfectly matched streams between cameras or even of main and sub streams within a camera. If this is insufficient, there's an alternate calculation of start time that could be used in some circumstances: the camera start time. The first RTCP sender report could be used to correlate an RTP timestamp with the camera's wall clock, and thus calculate the camera's time as of the first frame.

The start time of the first segment could be either its local start time or its camera start time, determined via the following rules:

  1. if there is no camera start time (due to the lack of an RTCP sender report), the local start time wins by default.
  2. if the camera start time is before 2016-01-01 00:00:00 UTC, the local start time wins.
  3. if the local start time is before 2016-01-01 00:00:00 UTC, the camera start time wins.
  4. if the times differ by more than 5 seconds, the local start time wins.
  5. otherwise, the camera start time wins.
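The five rules translate directly into a short decision function. This is an illustrative sketch with hypothetical names; times are seconds since the Unix epoch, and a missing sender report is represented by `None`.

```python
from datetime import datetime, timezone

# Times before this are treated as an unset clock rather than a real reading.
MIN_PLAUSIBLE = datetime(2016, 1, 1, tzinfo=timezone.utc).timestamp()
MAX_SKEW_SECONDS = 5.0


def choose_start_time(local_start, camera_start=None):
    """Pick the first segment's start time from the two candidates."""
    if camera_start is None:  # rule 1: no RTCP sender report
        return local_start
    if camera_start < MIN_PLAUSIBLE:  # rule 2: camera clock implausibly old
        return local_start
    if local_start < MIN_PLAUSIBLE:  # rule 3: NVR clock implausibly old
        return camera_start
    if abs(local_start - camera_start) > MAX_SKEW_SECONDS:  # rule 4
        return local_start
    return camera_start  # rule 5: clocks roughly agree


nvr = MIN_PLAUSIBLE + 1_000_000.0
chosen = choose_start_time(nvr, camera_start=nvr - 2.0)  # camera time wins
```

The order of the checks matters: rule 2 is tested before rule 3, so when both clocks are implausibly old the local start time is kept.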

These rules are a compromise. When a system starts up without NTP or a clock battery, it typically reverts to a time in the distant past. Therefore times before Moonfire NVR was written should be checked for and avoided. When both systems have a believably recent timestamp, the local time is typically more accurate, but the camera time allows a closer match between two streams of the same camera.

This still doesn't completely solve the problem, and it's unclear it is even better. When using camera start times, different cameras' streams may be mismatched by up to twice the 5-second threshold described above. This could even happen for two streams within the same camera if a significant step happens between their establishment. More frequent SNTP adjustments may help, so that individual steps are smaller. Or Moonfire NVR could attempt to address this with more complexity: use sender reports of established RTSP sessions to detect and compensate for these clock splits.

It's unclear if these additional mechanisms are desirable or worthwhile. The simplest approach will be adopted initially and adapted as necessary.

Time discontinuities

If the local system's wall clock time jumps during a recording (as has happened), Moonfire NVR will continue to use the initial wall clock time for as long as the recording lasts. This can result in some unfortunate behaviors:

  • a recording that lasts for months might have an incorrect time all the way through because ntpd took a few minutes on startup.
  • two recordings that were in fact simultaneous might be recorded with very different times because a time jump happened between their starts.

It might be better to use the new time (assuming that ntpd has made a correction) retroactively. This is unimplemented, but the recording_integrity database table has a wall_time_delta_90k field which could be used for this purpose, either automatically or interactively.

It would also be possible to split a recording in two if a "significant" time jump is noted, or to allow manually restarting a recording without restarting the entire program.

Leap seconds

UTC time is defined as the seconds since epoch excluding leap seconds. Thus, timestamps during the leap second are ambiguous, and durations across the leap second should be adjusted.

In POSIX, the system clock (as returned by clock_gettime(CLOCK_REALTIME, ...)) is defined as representing UTC. Note that some systems may instead be following a leap smear policy in which instead of one second happening twice, the clock runs slower. For a 24-hour period, the clock runs slower by a factor of 1/86,400 (an extra ~11.6 μs/s).

In Moonfire NVR, all wall times in the database are based on UTC as reported by the system, and it's assumed that start + duration = end. Thus, a leap second is similar to a one-second time jump (see "Time discontinuities" above).

Here are some options for improvement:

Use clock_gettime(CLOCK_TAI, ...) timestamps

Timestamps in the TAI clock system don't skip leap seconds. There's a system interface intended to provide timestamps in this clock system, and Moonfire NVR could use it. Unfortunately this has several problems:

  • CLOCK_TAI is only available on Linux. It'd be preferable to handle timestamps in a consistent way on other platforms. (At least on macOS, Moonfire NVR's current primary development platform.)
  • CLOCK_TAI is wrong on startup and possibly adjusted later. The offset between TAI and UTC is initially assumed to be 0. It's corrected when/if a sufficiently new ntpd starts.
  • We'd need a leap second table to translate this into calendar time. One would have to be downloaded from the Internet periodically, and we'd need to consider the case in which the available table is expired.
  • CLOCK_TAI likely doesn't work properly with leap smear systems. Where the leap smear prevents a time jump for CLOCK_REALTIME, it likely introduces one for CLOCK_TAI.

Use a leap second table when calculating differences

Moonfire NVR could retrieve UTC timestamps from the system then translate them to TAI via a leap second table, either before writing them to the database or whenever doing math on timestamps.

As with CLOCK_TAI, this would require downloading a leap second table from the Internet periodically.

This would mostly solve the problem at the cost of complexity. Timestamps obtained from the system for a two-second period starting with each leap second would still be ambiguous.

Use smeared time

Moonfire NVR could make no code changes and ask the system administrator to use smeared time. This is the simplest option. On a leap smear system, there are no time jumps. The ~11.6 ppm frequency error and the maximum introduced absolute error of 0.5 sec can be considered acceptable.

Alternatively, Moonfire NVR could assume a specific leap smear policy (such as 24-hour linear smear from 12:00 the day before to 12:00 the day after) and attempt to correct the time into TAI with a leap second table. This behavior would work well on a system with the expected configuration and produce surprising results on other systems. It's unfortunate that there's no standard way to determine if a system is using a leap smear and with what policy.
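For the assumed noon-to-noon linear policy, the amount of the leap second already absorbed at a given smeared clock reading can be sketched as below. The function name is hypothetical, only positive leap seconds are considered, and a full correction into TAI would additionally need a leap second table, which is not shown.

```python
from datetime import datetime, timedelta, timezone

SMEAR_WINDOW = timedelta(hours=24)
# Frequency offset of the smeared clock: one second spread over the window.
SMEAR_RATE_PPM = 1_000_000 / SMEAR_WINDOW.total_seconds()


def smear_absorbed(now, leap_midnight):
    """Fraction (0.0 to 1.0) of a positive leap second absorbed at `now`.

    leap_midnight is the 00:00:00 UTC immediately following the leap second;
    the smear is assumed to run from 12 hours before it to 12 hours after.
    """
    smear_start = leap_midnight - SMEAR_WINDOW / 2
    fraction = (now - smear_start) / SMEAR_WINDOW
    return min(max(fraction, 0.0), 1.0)


# The leap second at the end of 2016 took effect at 2017-01-01 00:00:00 UTC.
midnight = datetime(2017, 1, 1, tzinfo=timezone.utc)
halfway = smear_absorbed(midnight, midnight)  # half a second absorbed
```

The halfway point is where the smeared clock differs most from unsmeared UTC, matching the 0.5 second maximum error mentioned earlier. On a system using a different smear window or shape, this calculation would silently give wrong answers.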