start splitting wall and media duration for #34

This splits the schema and playback path. The recording path still
adjusts the frame durations and always says the wall and media durations
are the same. I expect to change that in a following commit. I wouldn't
be surprised if that shakes out some bugs in this portion.
This commit is contained in:
Scott Lamb
2020-08-04 21:44:01 -07:00
parent 476bd86b12
commit cb97ccdfeb
12 changed files with 437 additions and 241 deletions


@@ -13,14 +13,10 @@ In the future, this is likely to be expanded:
(at least for bootstrapping web authentication)
* mobile interface
## Terminology
*signal:* a timeseries with an enum value. Signals might represent a camera's
motion detection or day/night status. They could also represent an external
input such as a burglar alarm system's zone status.
## Detailed design
*Note:* italicized terms in this document are defined in the [glossary](glossary.md).
All requests for JSON data should be sent with the header
`Accept: application/json` (exactly).
@@ -112,7 +108,7 @@ The `application/json` response will have a dict as follows:
* `config`: (only included if request parameter `cameraConfigs` is
true) a dictionary describing the configuration of the stream:
* `rtsp_url`
* `signals`: a list of all signals known to the server. Each is a dictionary
* `signals`: a list of all *signals* known to the server. Each is a dictionary
with the following properties:
* `id`: an integer identifier.
* `shortName`: a unique, human-readable description of the signal
@@ -254,13 +250,12 @@ Example response:
### `GET /api/cameras/<uuid>/<stream>/recordings`
Returns information about recordings.
Valid request parameters:
Returns information about *recordings*. Valid request parameters:
* `startTime90k` and `endTime90k` limit the data returned to only
recordings which overlap with the given half-open interval. Either or both
may be absent; they default to the beginning and end of time, respectively.
recordings with wall times overlapping with the given half-open interval.
Either or both may be absent; they default to the beginning and end of time,
respectively.
* `split90k` causes long runs of recordings to be split at the next
convenient boundary after the given duration.
* TODO(slamb): `continue` to support paging. (If data is too large, the
  server should return a `continue` key which is expected to be returned on
  following requests.)
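As a sketch of the parameters above, a caller might convert UNIX-epoch
seconds to the API's 90 kHz units when building a `/recordings` request. The
helper name and base URL are hypothetical; the parameter names are as
documented:

```python
from urllib.parse import urlencode

def recordings_url(base_url, camera_uuid, stream,
                   start_sec=None, end_sec=None, split_sec=None):
    """Build a /recordings URL; *_sec arguments are in seconds and are
    converted to the API's 90 kHz units (90,000 ticks per second)."""
    params = {}
    if start_sec is not None:
        params["startTime90k"] = int(start_sec * 90000)
    if end_sec is not None:
        params["endTime90k"] = int(end_sec * 90000)
    if split_sec is not None:
        params["split90k"] = int(split_sec * 90000)
    return (f"{base_url}/api/cameras/{camera_uuid}/{stream}/recordings?"
            f"{urlencode(params)}")
```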
@@ -291,12 +286,12 @@ arbitrary order. Each recording object has the following properties:
an increasing "open id". This field is the open id as of when these
recordings were written. This can be used to disambiguate ids referring to
uncommitted recordings.
* `startTime90k`: the start time of the given recording. Note this may be
less than the requested `startTime90k` if this recording was ongoing
at the requested time.
* `endTime90k`: the end time of the given recording. Note this may be
greater than the requested `endTime90k` if this recording was ongoing at
the requested time.
* `startTime90k`: the start time of the given recording, in the wall time
scale. Note this may be less than the requested `startTime90k` if this
recording was ongoing at the requested time.
* `endTime90k`: the end time of the given recording, in the wall time scale.
Note this may be greater than the requested `endTime90k` if this recording
was ongoing at the requested time.
* `videoSampleEntryId`: a reference to an entry in the `videoSampleEntries`
  map.
* `videoSamples`: the number of samples (aka frames) of video in this
  recording.
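Since both timestamps above are on the wall time scale and in 90 kHz units, a
recording's wall duration can be derived directly. A minimal sketch (field
names as documented; the helper itself is hypothetical):

```python
# Wall duration of a recording object returned by /recordings.
def wall_duration_sec(recording: dict) -> float:
    """Difference of the two wall-scale timestamps, converted to seconds."""
    return (recording["endTime90k"] - recording["startTime90k"]) / 90000
```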
@@ -362,18 +357,19 @@ Expected query parameters:
* `s` (one or more): a string of the form
`START_ID[-END_ID][@OPEN_ID][.[REL_START_TIME]-[REL_END_TIME]]`. This
specifies recording segments to include. The produced `.mp4` file will be a
concatenation of the segments indicated by all `s` parameters. The ids to
retrieve are as returned by the `/recordings` URL. The open id is optional
and will be enforced if present; it's recommended for disambiguation when
the requested range includes uncommitted recordings. The optional start and
end times are in 90k units and relative to the start of the first specified
id. These can be used to clip the returned segments. Note they can be used
to skip over some ids entirely; this is allowed so that the caller doesn't
need to know the start time of each interior id. If there is no key frame
at the desired relative start time, frames back to the last key frame will
be included in the returned data, and an edit list will instruct the
viewer to skip to the desired start time.
specifies *segments* to include. The produced `.mp4` file will be a
concatenation of the segments indicated by all `s` parameters. The ids to
retrieve are as returned by the `/recordings` URL. The *open id* (see
[glossary](glossary.md)) is optional and will be enforced if present; it's
recommended for disambiguation when the requested range includes uncommitted
recordings. The optional start and end times are in 90k units of wall time
and relative to the start of the first specified id. These can be used to
clip the returned segments. Note they can be used to skip over some ids
entirely; this is allowed so that the caller doesn't need to know the start
time of each interior id. If there is no key frame at the desired relative
start time, frames back to the last key frame will be included in the
returned data, and an edit list will instruct the viewer to skip to the
desired start time.
* `ts` (optional): should be set to `true` to request a subtitle track be
added with human-readable recording timestamps.
@@ -397,6 +393,11 @@ Example request URI to retrieve recording id 1, skipping its first 26
90,000ths of a second:
```
/api/cameras/fd20f7a2-9d69-4cb3-94ed-d51a20c3edfe/main/view.mp4?s=1.26
```
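The `s` grammar above can be composed mechanically. A hedged sketch (the
function name is hypothetical; the grammar is as documented):

```python
# Compose the documented `s` parameter:
# START_ID[-END_ID][@OPEN_ID][.[REL_START_TIME]-[REL_END_TIME]]
def s_param(start_id, end_id=None, open_id=None, rel_start=None, rel_end=None):
    s = str(start_id)
    if end_id is not None:
        s += f"-{end_id}"
    if open_id is not None:
        s += f"@{open_id}"
    if rel_start is not None or rel_end is not None:
        # Relative times are in 90 kHz units of wall time; either side may be
        # omitted, but the dot and dash appear together.
        s += f".{'' if rel_start is None else rel_start}-" \
             f"{'' if rel_end is None else rel_end}"
    return s
```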
Note carefully the distinction between *wall duration* and *media duration*.
It's normal for `/view.mp4` to return a media presentation with a length
slightly different from the *wall duration* of the backing recording or
portion that was requested.
TODO: error behavior on missing segment. It should be a 404, likely with an
`application/json` body describing what portion if any (still) exists.
@@ -415,20 +416,20 @@ trim undesired leading portions.
This response will include the following additional headers:
* `X-Prev-Duration`: the total duration (in 90 kHz units) of all recordings
before the first requested recording in the `s` parameter. Browser-based
callers may use this to place this at the correct position in the source
buffer via `SourceBuffer.timestampOffset`.
* `X-Prev-Media-Duration`: the total *media duration* (in 90 kHz units) of all
*recordings* before the first requested recording in the `s` parameter.
Browser-based callers may use this to place this at the correct position in
the source buffer via `SourceBuffer.timestampOffset`.
* `X-Runs`: the cumulative number of "runs" of recordings. If this recording
starts a new run, it is included in the count. Browser-based callers may
use this to force gaps in the source buffer timeline by adjusting the
timestamp offset if desired.
* `X-Leading-Duration`: if present, the total duration (in 90 kHz units) of
additional leading video included before the caller's first requested
timestamp. This happens when the caller's requested timestamp does not
fall exactly on a key frame. Media segments can't include edit lists, so
unlike with the `/api/.../view.mp4` endpoint the caller is responsible for
trimming this portion. Browser-based callers may use
* `X-Leading-Media-Duration`: if present, the total duration (in 90 kHz
units) of additional leading video included before the caller's first
requested timestamp. This happens when the caller's requested timestamp
does not fall exactly on a key frame. Media segments can't include edit
lists, so unlike with the `/api/.../view.mp4` endpoint the caller is
responsible for trimming this portion. Browser-based callers may use
`SourceBuffer.appendWindowStart`.
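Browser-based callers typically feed these headers into Media Source
Extensions properties, which are in seconds rather than 90 kHz units. A
sketch of the conversion (header names as documented above; the function and
return shape are hypothetical):

```python
# Convert the 90 kHz header values to seconds-based MSE properties.
def source_buffer_offsets(headers: dict) -> dict:
    offsets = {"timestampOffset": int(headers["X-Prev-Media-Duration"]) / 90000}
    leading = headers.get("X-Leading-Media-Duration")
    if leading is not None:
        # The leading video occupies [timestampOffset, timestampOffset +
        # leading); setting appendWindowStart past it trims those frames.
        offsets["appendWindowStart"] = (offsets["timestampOffset"]
                                        + int(leading) / 90000)
    return offsets
```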
Expected query parameters:
@@ -448,8 +449,12 @@ this fundamental reason Moonfire NVR makes no effort to make multiple-segment
* There's currently no way to generate an initialization segment for more
than one video sample entry, so a `.m4s` that uses more than one video
sample entry can't be used.
* The `X-Prev-Duration` and `X-Leading-Duration` headers only describe the
first segment.
* The `X-Prev-Media-Duration` and `X-Leading-Media-Duration` headers only
  describe the first segment.
Timestamp tracks (see the `ts` parameter to `.mp4` URIs) aren't supported
today. Most likely browser clients will implement timestamp subtitles via
WebVTT API calls anyway.
### `GET /api/cameras/<uuid>/<stream>/view.m4s.txt`

design/glossary.md Normal file

@@ -0,0 +1,66 @@
# Moonfire NVR Glossary
*media duration:* the total duration of the actual samples in a recording. These
durations are based on the camera's clock. Camera clocks can be quite
inaccurate, so this may not match the *wall duration*. See [time.md](time.md)
for details.
*open id:* a sequence number representing a time the database was opened in
write mode. One reason for using open ids is to disambiguate unflushed
recordings. Recordings' ids are assigned immediately, without any kind of
database transaction or reservation. Thus if a recording is never flushed
successfully, a following *open* may assign the same id to a new recording.
The open id disambiguates this and should be used whenever referring to a
recording that may be unflushed.
*recording:* the video from a (typically 1-minute) portion of an RTSP session.
RTSP sessions are divided into recordings as a detail of the
storage schema. See [schema.md](schema.md) for details. This concept is exposed
to the frontend code through the API; see [api.md](api.md). It's not exposed in
the user interface; videos are reconstructed from segments automatically.
*run:* all the recordings from a single RTSP session. These are all from the
same *stream* and could be reassembled into a single video with no gaps. If the
camera is lost and re-established, one run ends and another starts.
*sample:* data associated with a single timestamp within a recording, e.g. a video
frame or a set of audio samples.
*sample file:* a file on disk that holds all the samples from a single recording.
*sample file directory:* a directory in the local filesystem that holds all
sample files for one or more streams. Typically there is one directory per disk.
*segment:* part or all of a recording. An API request might ask for a video of
recordings 1–4 starting 80 seconds in. If each recording is exactly 60 seconds,
this would correspond to three segments: recording 2 from 20 seconds in to
the end, all of recording 3, and all of recording 4. See [api.md](api.md).
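The arithmetic in this example can be sketched as follows. This assumes
fixed-length recordings; the function and the `(id, start_sec, end_sec)`
tuple representation are hypothetical:

```python
# Split a request spanning several recordings into segments, skipping
# `skip_sec` seconds from the start of the first recording.
def segments(first_id, last_id, skip_sec, rec_len_sec=60):
    segs = []
    for rid in range(first_id, last_id + 1):
        if skip_sec >= rec_len_sec:
            skip_sec -= rec_len_sec  # this id is skipped entirely
            continue
        segs.append((rid, skip_sec, rec_len_sec))
        skip_sec = 0
    return segs
```

For the example above, `segments(1, 4, 80)` skips all of recording 1 and
yields three segments starting 20 seconds into recording 2.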
*session:* a set of authenticated Moonfire NVR requests defined by the use of a
given credential (`s` cookie). Each user may have many credentials and thus
many sessions. Note that in Moonfire NVR the term "session" by itself has
nothing to do with RTSP sessions; those more closely match a *run*.
*signal:* a timeseries with an enum value. Signals might represent a camera's
motion detection or day/night status. They could also represent an external
input such as a burglar alarm system's zone status. See [api.md](api.md).
Note signals are still under development and not yet exposed in Moonfire NVR's
UI. See [#28](https://github.com/scottlamb/moonfire-nvr/issues/28) for more
information.
*stream:* the "main" or "sub" stream from a given camera. Moonfire NVR expects
cameras to support configuring and simultaneously viewing two streams encoded from
the same underlying video and audio source. The difference between the two is
that the "main" stream's video is typically higher quality in terms of frame
rate, resolution, and bitrate. Likewise it may have higher quality audio.
A stream corresponds to an ONVIF "media profile". Each stream has a distinct
RTSP URL that yields a different RTSP "presentation".
*track:* one of the video, audio, or subtitles associated with a single
*stream*. This is consistent with the definition in ISO/IEC 14496-12 section
3.1.19. Note that RTSP RFC 2326 uses the word "stream" in the same way
Moonfire NVR uses the word "track".
*wall duration:* the total duration of a recording for the purpose of matching
with the NVR's wall clock time. This may not match the same recording's media
duration. See [time.md](time.md) for details.


@@ -1,6 +1,10 @@
# Moonfire NVR Time Handling
Status: **current**
Status: **in flux**. The approach below works well for video, but audio frames'
durations can't be adjusted as easily. As part of implementing audio support,
the implementation is changing to instead decouple "wall time" and "media time",
as described in
[this comment](https://github.com/scottlamb/moonfire-nvr/issues/34#issuecomment-651548468).
> A man with a watch knows what time it is. A man with two watches is never
> sure.