mass markdown reformatting

Add tables of contents (using the VS Code Markdown All-In-One extension) and reformat lists to consistently use 4-space indents. No content changes.
2025-11-20 01:50:24 -05:00 · 2021-04-01 12:10:43 -07:00
parent 74b13a0fbf
commit 4d4d78ba64
10 changed files with 539 additions and 470 deletions
--- a/design/api.md
+++ b/design/api.md
@@ -1,7 +1,27 @@
-# Moonfire NVR API
+# Moonfire NVR API <!-- omit in toc -->

 Status: **current**.

+* [Objective](#objective)
+* [Detailed design](#detailed-design)
+    * [`POST /api/login`](#post-apilogin)
+    * [`POST /api/logout`](#post-apilogout)
+    * [`GET /api/`](#get-api)
+    * [`GET /api/cameras/<uuid>/`](#get-apicamerasuuid)
+    * [`GET /api/cameras/<uuid>/<stream>/recordings`](#get-apicamerasuuidstreamrecordings)
+    * [`GET /api/cameras/<uuid>/<stream>/view.mp4`](#get-apicamerasuuidstreamviewmp4)
+    * [`GET /api/cameras/<uuid>/<stream>/view.mp4.txt`](#get-apicamerasuuidstreamviewmp4txt)
+    * [`GET /api/cameras/<uuid>/<stream>/view.m4s`](#get-apicamerasuuidstreamviewm4s)
+    * [`GET /api/cameras/<uuid>/<stream>/view.m4s.txt`](#get-apicamerasuuidstreamviewm4stxt)
+    * [`GET /api/cameras/<uuid>/<stream>/live.m4s`](#get-apicamerasuuidstreamlivem4s)
+    * [`GET /api/init/<id>.mp4`](#get-apiinitidmp4)
+    * [`GET /api/init/<id>.mp4.txt`](#get-apiinitidmp4txt)
+    * [`GET /api/signals`](#get-apisignals)
+    * [`POST /api/signals`](#post-apisignals)
+        * [Request 1](#request-1)
+        * [Request 2](#request-2)
+        * [Request 3](#request-3)
+
 ## Objective

 Allow a JavaScript-based web interface to list cameras and view recordings.
@@ -704,7 +724,7 @@ Response:
 }
 ```

-### Request 3
+#### Request 3

 5 seconds later, the client observes motion has ended. It leaves the prior
 data alone and predicts no more motion.
--- a/design/schema.md
+++ b/design/schema.md
@@ -1,4 +1,4 @@
-# Moonfire NVR Storage Schema
+# Moonfire NVR Storage Schema <!-- omit in toc -->

 Status: **current**.

@@ -6,42 +6,56 @@ This is the initial design for the most fundamental parts of the Moonfire NVR
 storage schema. See also [guide/schema.md](../guide/schema.md) for more
 administrator-focused documentation.

+* [Objective](#objective)
+    * [Cameras](#cameras)
+    * [Hard drives](#hard-drives)
+* [Overview](#overview)
+* [Detailed design](#detailed-design)
+    * [SQLite3](#sqlite3)
+    * [Duration of recordings](#duration-of-recordings)
+    * [Lifecycle of a sample file directory](#lifecycle-of-a-sample-file-directory)
+    * [Lifecycle of a recording](#lifecycle-of-a-recording)
+    * [Verifying invariants](#verifying-invariants)
+    * [Recording table](#recording-table)
+        * [`video_index`](#video_index)
+    * [On-demand `.mp4` construction](#on-demand-mp4-construction)
+
 ## Objective

 Goals:

-* record streams from modern ONVIF/PSIA IP security cameras
-* support several cameras
-* maintain full fidelity of incoming compressed video streams
-* record continuously
-* support on-demand serving in different file formats / protocols
-  (such as standard .mp4 files for arbitrary timespans, fragmented .mp4 files
-  for MPEG-DASH or HTML5 Video Source Extensions, MPEG-TS files for HTTP Live
-  Streaming, and "trick play" RTSP)
-* annotate camera timelines with metadata
-  (such as motion detection, security alarm events, etc)
-* retain video segments with ~1-minute granularity based on metadata
-  (e.g., extend retention of motion events)
-* take advantage of compact, inexpensive, low-power, commonly-available
-  hardware such as the $35 [Raspberry Pi 2 Model B][pi2]
-* support high- and low-bandwidth playback
-* support near-live playback (~second old), including "trick play"
-* allow verifying database consistency with an `fsck` tool
+*   record streams from modern ONVIF/PSIA IP security cameras
+*   support several cameras
+*   maintain full fidelity of incoming compressed video streams
+*   record continuously
+*   support on-demand serving in different file formats / protocols
+    (such as standard .mp4 files for arbitrary timespans, fragmented .mp4 files
+    for MPEG-DASH or HTML5 Video Source Extensions, MPEG-TS files for HTTP Live
+    Streaming, and "trick play" RTSP)
+*   annotate camera timelines with metadata
+    (such as motion detection, security alarm events, etc.)
+*   retain video segments with ~1-minute granularity based on metadata
+    (e.g., extend retention of motion events)
+*   take advantage of compact, inexpensive, low-power, commonly-available
+    hardware such as the $35 [Raspberry Pi 2 Model B][pi2]
+*   support high- and low-bandwidth playback
+*   support near-live playback (~second old), including "trick play"
+*   allow verifying database consistency with an `fsck` tool

 Non-goals:

-* record streams from older cameras: JPEG/MJPEG USB "webcams" and analog
-  security cameras/capture cards
-* allow users to directly access or manipulate the stored data with standard
-  video or filesystem tools
-* support H.264 features not used by common IP camera encoders, such as
-  B-frames and Periodic Infra Refresh.
-* support recovering the last ~minute of video after a crash or power loss
+*   record streams from older cameras: JPEG/MJPEG USB "webcams" and analog
+    security cameras/capture cards
+*   allow users to directly access or manipulate the stored data with standard
+    video or filesystem tools
+*   support H.264 features not used by common IP camera encoders, such as
+    B-frames and Periodic Intra Refresh.
+*   support recovering the last ~minute of video after a crash or power loss

 Possible future goals:

-* record audio and/or other types of timestamped samples (such as
-  [Xandem][xandem] tomography data).
+*   record audio and/or other types of timestamped samples (such as
+    [Xandem][xandem] tomography data).

 ### Cameras

@@ -51,12 +65,12 @@ streams. They have many customizable settings, such as resolution, frame rate,
 compression quality, maximum bitrate, I-frame interval. A typical setup might be
 as follows:

-* the high-quality "main" stream as 1080p/30fps, 3000 kbps.
-  This stream is well-suited to local viewing or forensics.
-* the low-bandwidth "sub" stream as 704x480/10fps, 100 kbps.
-  This stream may be preferred for mobile/remote viewing, when viewing several
-  streams side-by-side, and for real-time computer vision (such as salient
-  motion detection).
+*   the high-quality "main" stream as 1080p/30fps, 3000 kbps.
+    This stream is well-suited to local viewing or forensics.
+*   the low-bandwidth "sub" stream as 704x480/10fps, 100 kbps.
+    This stream may be preferred for mobile/remote viewing, when viewing several
+    streams side-by-side, and for real-time computer vision (such as salient
+    motion detection).

 The dual pre-encoded H.264 video streams provide a tremendous advantage over
 older camera models (which provided raw video or JPEG-encoded frames) because
@@ -73,13 +87,17 @@ different quality settings as well.

 Decode:

-    $ time ffmpeg -y -threads 1 -i input.mp4 \
-                  -f null /dev/null
+```
+$ time ffmpeg -y -threads 1 -i input.mp4 \
+              -f null /dev/null
+```

 Combo (Decode + encode with libx264):

-    $ time ffmpeg -y -threads 1 -i input.mp4 \
-                  -c:v libx264 -preset ultrafast -threads 1 -f mp4 /dev/null
+```
+$ time ffmpeg -y -threads 1 -i input.mp4 \
+              -c:v libx264 -preset ultrafast -threads 1 -f mp4 /dev/null
+```


 | Processor                     | 1080p30 decode | 1080p30 combo | 704x480p10 decode | 704x480p10 combo |
@@ -115,8 +133,10 @@ only capable of 50 random accesses per second, and each one takes time that
 otherwise could be used to transfer 2+ MB. The constrained resource, *disk
 time fraction*, can be bounded as follows:

-    disk time fraction <= (seek rate) / (50 seeks/sec) +
-                          (bandwidth) / (100 MB/sec)
+```
+disk time fraction <= (seek rate) / (50 seeks/sec) +
+                      (bandwidth) / (100 MB/sec)
+```

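As a rough illustration, the bound can be evaluated in a few lines of Python. This is a sketch only: `disk_time_fraction` is a hypothetical helper, and its 50 seeks/sec and 100 MB/sec defaults are the typical-drive figures quoted above, not measurements of any particular disk. The example workload (eight cameras, each writing a 3000 kbps main stream and a 100 kbps sub stream, with every stream starting a new file once per minute) is likewise an assumption.

```python
def disk_time_fraction(seeks_per_sec: float, bytes_per_sec: float,
                       max_seeks_per_sec: float = 50.0,
                       max_bytes_per_sec: float = 100e6) -> float:
    """Upper bound on the fraction of disk time a workload consumes.

    Each term is the share of the disk's capacity used by one resource:
    random accesses (seeks) and sequential transfer (bandwidth).
    """
    return seeks_per_sec / max_seeks_per_sec + bytes_per_sec / max_bytes_per_sec


# Assumed workload: 8 cameras x (3000 + 100) kbps; 16 streams, each starting a
# new file once per minute.
cameras = 8
bandwidth = cameras * (3000e3 + 100e3) / 8  # bits/sec -> bytes/sec
seek_rate = 2 * cameras / 60                # file switches per second
print(f"disk time fraction <= {disk_time_fraction(seek_rate, bandwidth):.1%}")
```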
 ## Overview

@@ -127,19 +147,20 @@ together.

 Each recording is stored in two places:

-* a sample file directory, intended to be stored on spinning disk.
-  Each file in this directory is simply a concatenation of the compressed,
-  timestamped video samples (also called "packets" or encoded frames), as
-  received from the camera. In MPEG-4 terminology (see [ISO
-  14496-12][iso-14496-12]), this is the contents of a `mdat` box for a `.mp4`
-  file representing the segment. These files do not contain framing data (start
-  and end byte offsets of samples) and thus are not meant to be decoded on
-  their own.
-* the `recording` table in a [SQLite3][sqlite3] database, intended to be
-  stored on flash if possible. A row in this table contains all the metadata
-  associated with the segment, including the sample-by-sample contents of the
-  MPEG-4 `stbl` box. At 30 fps, a row is expected to require roughly 4 KB of
-  storage (2 bytes per sample, plus some fixed overhead).
+*   a sample file directory, intended to be stored on spinning disk.
+    Each file in this directory is simply a concatenation of the compressed,
+    timestamped video samples (also called "packets" or encoded frames), as
+    received from the camera. In MPEG-4 terminology (see [ISO
+    14496-12][iso-14496-12]), this is the contents of a `mdat` box for a
+    `.mp4` file representing the segment. These files do not contain framing
+    data (start and end byte offsets of samples) and thus are not meant to be
+    decoded on their own.
+*   the `recording` table in a [SQLite3][sqlite3] database, intended to be
+    stored on flash if possible. A row in this table contains all the
+    metadata associated with the segment, including the sample-by-sample
+    contents of the MPEG-4 `stbl` box. At 30 fps, a row is expected to
+    require roughly 4 KB of storage (2 bytes per sample, plus some fixed
+    overhead).

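The per-row estimate above is simple arithmetic; the sketch below makes it explicit. The two-bytes-per-sample figure comes from the text, but the fixed per-row overhead is not specified here, so the (hypothetical) helper leaves it as a parameter. Scaling to camera-months assumes one 30 fps row per camera per minute and an average month of 365.25 / 12 days.

```python
SECONDS_PER_RECORDING = 60
MINUTES_PER_MONTH = 365.25 / 12 * 24 * 60  # average month length (assumption)


def recording_row_bytes(fps: float, bytes_per_sample: int = 2,
                        fixed_overhead: int = 0) -> float:
    """Approximate size of one `recording` row holding a one-minute index."""
    samples = fps * SECONDS_PER_RECORDING
    return samples * bytes_per_sample + fixed_overhead


def database_bytes(camera_months: float, row_bytes: float) -> float:
    """Approximate database size, assuming one row per camera per minute."""
    return camera_months * MINUTES_PER_MONTH * row_bytes


# 30 fps * 60 s * 2 bytes = 3,600 bytes before overhead, i.e. roughly 4 KB.
row = recording_row_bytes(30)
print(row)
print(f"{database_bytes(6, 4096) / 1e9:.2f} GB for six camera-months")
```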
 Putting the metadata on flash means metadata operations can be fast
 (sub-millisecond random access, with parallelism) and do not take precious
@@ -167,17 +188,18 @@ All metadata, including the `recording` table and others, will be stored in
 the SQLite3 database using [write-ahead logging][sqlite3-wal]. There are
 several reasons for this decision:

-* No user administration required. SQLite3, unlike its heavier-weight friends
-  MySQL and PostgreSQL, can be completely internal to the application. In many
-  applications, end users are unaware of the existence of a RDBMS, and
-  Moonfire NVR should be no exception.
-* Correctness. It's relatively easy to make guarantees about the state of an
-  ACID database, and SQLite3 in particular has a robust implementation. (See
-  [Files Are Hard][file-consistency].)
-* Developer ease and familiarity. SQL-based RDBMSs are quite common and
-  provide a lot of high-level constructs that ease development. SQLite3 in
-  particular is ubiquitous. Contributors are likely to come with some
-  understanding of the database, and there are many resources to learn more.
+*   No user administration required. SQLite3, unlike its heavier-weight friends
+    MySQL and PostgreSQL, can be completely internal to the application. In
+    many applications, end users are unaware of the existence of a RDBMS, and
+    Moonfire NVR should be no exception.
+*   Correctness. It's relatively easy to make guarantees about the state of an
+    ACID database, and SQLite3 in particular has a robust implementation.
+    (See [Files Are Hard][file-consistency].)
+*   Developer ease and familiarity. SQL-based RDBMSs are quite common and
+    provide a lot of high-level constructs that ease development. SQLite3 in
+    particular is ubiquitous. Contributors are likely to come with some
+    understanding of the database, and there are many resources to learn
+    more.

 Total database size is expected to be roughly 4 KB per minute at 30 fps, or
 1 GB for six camera-months of video. This will easily fit on a modest flash
@@ -189,40 +211,42 @@ to be a performance bottleneck.
 There are many constraints that influenced the choice of 1 minute as the
 duration of recordings.

-* Per-recording metadata size. There is a fixed component to the size of each
-  row, including the starting/ending timestamps, sample file UUID, etc. This
-  should not cause the database to be too large to fit on low-cost flash
-  devices. As described in the previous section, with 1 minute recordings the
-  size is quite modest.
-* Disk seeks. Sample files should be large enough that even during
-  simultaneous recording and playback of several streams, the disk seeks
-  incurred when switching from one file to another should not be significant.
-  At the extreme, a sample file per frame could cause an unacceptable 240
-  seeks per second just to record 8 30 fps streams. At one minute recording
-  time, 16 recording streams (2 per each of 8 cameras) and 4 playback streams
-  would cause on average 20 seeks per minute, or under 1% disk time.
-* Internal fragmentation. Common Linux filesystems have a block size of 4 KiB
-  (see `statvfs.f_frsize`). Up to this much space per file will be wasted at
-  the end of each file. At the bitrates described in "Background", this is an
-  insignicant .02% waste for main streams and .5% waste for sub streams.
-* Number of "slices" in .mp4 files. As described [below](#on-demand),
-  `.mp4` files will be constructed on-demand for export. It should be
-  possible to export an hours-long segment without too much overhead. In
-  particular, it must be possible to iterate through all the recordings,
-  assemble the list of slices, and calculate offsets and total size. One
-  minute seems acceptable; though we will watch this as work proceeds.
-* Crashes. On program crash or power loss, ideally it's acceptable to simply
-  discard any recordings in progress rather than add a checkpointing scheme.
-* Granularity of retention. It should be possible to extend retention time
-  around motion events without forcing retention of too much additional data
-  or copying bytes around on disk.
+*   Per-recording metadata size. There is a fixed component to the size of each
+    row, including the starting/ending timestamps, sample file UUID, etc.
+    This should not cause the database to be too large to fit on low-cost
+    flash devices. As described in the previous section, with 1 minute
+    recordings the size is quite modest.
+*   Disk seeks. Sample files should be large enough that even during
+    simultaneous recording and playback of several streams, the disk seeks
+    incurred when switching from one file to another should not be
+    significant. At the extreme, a sample file per frame could cause an
+    unacceptable 240 seeks per second just to record eight 30 fps streams. With
+    one-minute recordings, 16 recording streams (2 for each of 8 cameras) and
+    4 playback streams would cause on average 20 seeks per minute, or under
+    1% disk time.
+*   Internal fragmentation. Common Linux filesystems have a block size of 4 KiB
+    (see `statvfs.f_frsize`). Up to this much space per file will be wasted
+    at the end of each file. At the bitrates described in [Cameras](#cameras),
+    this is an insignificant 0.02% waste for main streams and 0.5% waste for
+    sub streams.
+*   Number of "slices" in .mp4 files. As described [below](#on-demand),
+    `.mp4` files will be constructed on-demand for export. It should be
+    possible to export an hours-long segment without too much overhead. In
+    particular, it must be possible to iterate through all the recordings,
+    assemble the list of slices, and calculate offsets and total size. One
+    minute seems acceptable, though we will watch this as work proceeds.
+*   Crashes. On program crash or power loss, ideally it's acceptable to simply
+    discard any recordings in progress rather than add a checkpointing scheme.
+*   Granularity of retention. It should be possible to extend retention time
+    around motion events without forcing retention of too much additional
+    data or copying bytes around on disk.

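The seek and fragmentation figures in the list above can be re-derived as follows. This is a sketch under the same assumptions as the text: a 4 KiB block size, one-minute files, the 3000 kbps and 100 kbps stream bitrates from the cameras section, and a disk limited to 50 seeks/sec. The helper names are hypothetical.

```python
BLOCK_SIZE = 4096          # bytes; a common Linux statvfs.f_frsize
MAX_SEEKS_PER_SEC = 50.0   # typical spinning disk, per the hard drive section


def worst_case_waste(bitrate_bps: float, seconds_per_file: float = 60.0) -> float:
    """Fraction of a sample file lost to a nearly-empty final block."""
    file_bytes = bitrate_bps / 8 * seconds_per_file
    return BLOCK_SIZE / file_bytes


def seek_time_fraction(streams: int, seconds_per_file: float = 60.0) -> float:
    """Disk time spent seeking if each stream switches files once per file."""
    seeks_per_sec = streams / seconds_per_file
    return seeks_per_sec / MAX_SEEKS_PER_SEC


print(f"main stream waste: {worst_case_waste(3000e3):.3%}")
print(f"sub stream waste:  {worst_case_waste(100e3):.2%}")
# 16 recording + 4 playback streams with one-minute files:
print(f"one-minute files:  {seek_time_fraction(20):.2%}")
# One file per frame for eight 30 fps streams needs 240 seeks/sec (> 100%):
print(f"file per frame:    {seek_time_fraction(8, 1 / 30):.0%}")
```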
 The design avoids the need for the following constraints:

-* Dealing with events crossing segment boundaries. This is meant to be
-  invisible.
-* Serving close to live. It's possible to serve a recording as it is being
-  written.
+*   Dealing with events crossing segment boundaries. This is meant to be
+    invisible.
+*   Serving close to live. It's possible to serve a recording as it is being
+    written.

 ### Lifecycle of a sample file directory

@@ -230,19 +254,20 @@ One major disadvantage to splitting the state in two (the SQLite3 database in
 flash and the sample file directories on spinning disk) is the possibility of
 inconsistency. There are many ways this could arise:

-* a sample file directory's disk is unexpectedly not mounted due to hardware
-  failure or misconfiguration.
-* the administrator mixing up the mount points of two filesystems holding
-  different sample file directories.
-* the administrator renaming a sample file directory without updating the database.
-* the administrator restoring the database from backup but not the sample file
-  directory, or vice versa.
-* the administrator providing two sample file directory paths pointed at the
-  same inode via symlinks or non-canonical paths. (Note that flock(2) has a
-  design flaw in which multiple file descriptors can share a lock, so the current
-  locking scheme is not sufficient to detect this otherwise.)
-* database and sample file directories forked from the same version, opened
-  the same number of times, then crossed.
+*   a sample file directory's disk is unexpectedly not mounted due to hardware
+    failure or misconfiguration.
+*   the administrator mixing up the mount points of two filesystems holding
+    different sample file directories.
+*   the administrator renaming a sample file directory without updating the
+    database.
+*   the administrator restoring the database from backup but not the sample file
+    directory, or vice versa.
+*   the administrator providing two sample file directory paths pointed at the
+    same inode via symlinks or non-canonical paths. (Note that flock(2) has a
+    design flaw in which multiple file descriptors can share a lock, so the
+    current locking scheme is not sufficient to detect this otherwise.)
+*   database and sample file directories forked from the same version, opened
+    the same number of times, then crossed.

 To combat this, each sample file directory has some metadata in its database
 row and in a file called `meta`. These track uuids associated with the database
@@ -323,74 +348,74 @@ This is a sub-procedure used in several places below.
 Precondition: the directory's lock is held with `LOCK_EX` (exclusive) and
 there is an existing metadata file.

-  1. Open the metadata file.
-  2. Rewrite the fixed-length data atomically.
-  3. `fdatasync` the file.
+1.  Open the metadata file.
+2.  Rewrite the fixed-length data atomically.
+3.  `fdatasync` the file.

 *Open the database as read-only*

-  1. Lock the database directory with `LOCK_SH` (shared).
-  2. Open the SQLite database with `SQLITE_OPEN_READ_ONLY`.
+1.  Lock the database directory with `LOCK_SH` (shared).
+2.  Open the SQLite database with `SQLITE_OPEN_READ_ONLY`.

 *Open the database as read-write*

-  1. Lock the database directory with `LOCK_EX` (exclusive).
-  2. Open the SQLite database with `SQLITE_OPEN_READ_WRITE`.
-  3. Insert a new `open` table row with the new sequence number and uuid.
+1.  Lock the database directory with `LOCK_EX` (exclusive).
+2.  Open the SQLite database with `SQLITE_OPEN_READ_WRITE`.
+3.  Insert a new `open` table row with the new sequence number and uuid.

 *Create a sample file directory*

 Precondition: database open read-write.

-  1. Lock the sample file directory with `LOCK_EX` (exclusive).
-  2. Verify there is no metadata file or `last_complete_open` is unset.
-  3. Write new metadata file with a fresh `dir_uuid` and a `in_progress_open`
-     matching the database's current open.
-  4. Add a matching row to the database with `last_complete_open_id` matching
-     the current open.
-  5. Update the metadata file to move `in_progress_open` to
-     `last_complete_open`.
+1.  Lock the sample file directory with `LOCK_EX` (exclusive).
+2.  Verify there is no metadata file or `last_complete_open` is unset.
+3.  Write a new metadata file with a fresh `dir_uuid` and an `in_progress_open`
+    matching the database's current open.
+4.  Add a matching row to the database with `last_complete_open_id` matching
+    the current open.
+5.  Update the metadata file to move `in_progress_open` to
+    `last_complete_open`.

 *Open a sample file directory read-only*

 Precondition: database open (read-only or read-write).

-  1. Lock the sample file directory with `LOCK_SH` (shared).
-  2. Verify the metadata file matches the database:
-        * database uuid matches.
-        * dir uuid matches.
-        * if the database's `last_complete_open` is set, it must match the
-          directory's `last_complete_open` or `in_progress_open`.
-        * if the database's `last_complete_open` is absent, the directory's
-          must be as well.
+1.  Lock the sample file directory with `LOCK_SH` (shared).
+2.  Verify the metadata file matches the database:
+    *   database uuid matches.
+    *   dir uuid matches.
+    *   if the database's `last_complete_open` is set, it must match the
+        directory's `last_complete_open` or `in_progress_open`.
+    *   if the database's `last_complete_open` is absent, the directory's
+        must be as well.

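The read-only verification rules above can be expressed as a single predicate. A minimal Python sketch, assuming hypothetical in-memory views of the directory's database row and its `meta` file, with each open identified by its uuid (`None` when unset); the real field layout is not shown in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirRow:
    """Hypothetical view of a sample file directory's database row."""
    db_uuid: str
    dir_uuid: str
    last_complete_open: Optional[str]


@dataclass
class DirMeta:
    """Hypothetical view of the directory's `meta` file."""
    db_uuid: str
    dir_uuid: str
    last_complete_open: Optional[str]
    in_progress_open: Optional[str]


def meta_matches(row: DirRow, meta: DirMeta) -> bool:
    """Apply the checks for opening a sample file directory."""
    if meta.db_uuid != row.db_uuid or meta.dir_uuid != row.dir_uuid:
        return False
    if row.last_complete_open is not None:
        # Matching `in_progress_open` covers a crash after the database row
        # was updated but before the `meta` file was finalized.
        return row.last_complete_open in (meta.last_complete_open,
                                          meta.in_progress_open)
    return meta.last_complete_open is None
```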
 *Open a sample file directory read-write*

 Precondition: database open read-write.

-  1. Lock the sample file directory with `LOCK_EX` (exclusive).
-  2. Verify the metadata file matches the database (as above).
-  3. Update the metadata file with `in_progress_open` matching the current
-     open.
-  3. Update the database row with `last_complete_open_id` matching the current
-     open.
-  4. Update the metadata file with `last_complete_open` rather than
-     `in_progress_open`.
-  5. Run the recording startup procedure for this directory.
+1.  Lock the sample file directory with `LOCK_EX` (exclusive).
+2.  Verify the metadata file matches the database (as above).
+3.  Update the metadata file with `in_progress_open` matching the current
+    open.
+4.  Update the database row with `last_complete_open_id` matching the current
+    open.
+5.  Update the metadata file with `last_complete_open` rather than
+    `in_progress_open`.
+6.  Run the recording startup procedure for this directory.

 *Close a sample file directory*

-  1. Drop the sample file directory lock.
+1.  Drop the sample file directory lock.

 *Delete a sample file directory*

-  1. Remove all sample files (of all three categories described below:
-     `recording` table rows, `garbage` table rows, and files with recording
-     ids >= their stream's `cum_recordings`); see "delete a recording"
-     procedure below.
-  2. Rewrite the directory metadata with `in_progress_open` set to the current open,
-     `last_complete_open` cleared.
-  3. Delete the directory's row from the database.
+1.  Remove all sample files (of all three categories described below:
+    `recording` table rows, `garbage` table rows, and files with recording
+    ids >= their stream's `cum_recordings`); see "delete a recording"
+    procedure below.
+2.  Rewrite the directory metadata with `in_progress_open` set to the current
+    open and `last_complete_open` cleared.
+3.  Delete the directory's row from the database.

 ### Lifecycle of a recording

@@ -398,15 +423,15 @@ Because a major part of the recording state is outside the SQL database, care
 must be taken to guarantee consistency and durability. Moonfire NVR maintains
 three invariants about sample files:

-  1. `recording` table rows have sample files on disk with the indicated size
-     and SHA-1 hash.
-  2. Exactly one of the following statements is true for every sample file:
-        * It has a `recording` table row.
-        * It has a `garbage` table row.
-        * Its recording id is greater than or equal to the `cum_recordings`
-          for its stream.
-  3. After an orderly shutdown of Moonfire NVR, there is a `recording` table row
-     for every sample file, even if there have been previous crashes.
+1.  `recording` table rows have sample files on disk with the indicated size
+    and SHA-1 hash.
+2.  Exactly one of the following statements is true for every sample file:
+    *   It has a `recording` table row.
+    *   It has a `garbage` table row.
+    *   Its recording id is greater than or equal to the `cum_recordings`
+        for its stream.
+3.  After an orderly shutdown of Moonfire NVR, there is a `recording` table row
+    for every sample file, even if there have been previous crashes.

 The first invariant provides certainty that a recording is properly stored. It
 would be prohibitively expensive to verify hashes on demand (when listing or
@@ -423,31 +448,31 @@ These invariants are updated through the following procedure:

 *Create a recording:*

-1. Write the sample file, aborting if `open(..., O\_WRONLY|O\_CREATE|O\_EXCL)`
-   fails with `EEXIST`.
-3. `fsync()` the sample file.
-4. `fsync()` the sample file directory.
-5. Insert the `recording` row, marking its size and SHA-1 hash in the process.
+1.  Write the sample file, aborting if `open(..., O_WRONLY|O_CREAT|O_EXCL)`
+    fails with `EEXIST`.
+2.  `fsync()` the sample file.
+3.  `fsync()` the sample file directory.
+4.  Insert the `recording` row, marking its size and SHA-1 hash in the process.

 *Delete a recording:*

-1. Replace the `recording` row with a `garbage` row.
-2. `unlink()` the sample file, warning on `ENOENT`. (This would indicate
-   invariant #2 is false.)
-3. `fsync()` the sample file directory.
-4. Delete the `garbage` row.
+1.  Replace the `recording` row with a `garbage` row.
+2.  `unlink()` the sample file, warning on `ENOENT`. (This would indicate
+    invariant #2 is false.)
+3.  `fsync()` the sample file directory.
+4.  Delete the `garbage` row.

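The create and delete procedures can be sketched in Python using `os.open` with `O_EXCL`, `os.fsync`, and the standard `sqlite3` module. This is illustrative only: the two-table schema is a drastic simplification (see `schema.sql` for the real tables), sample files are assumed to be named by their decimal recording id, and fsyncing a directory through a read-only descriptor is assumed to work as it does on Linux.

```python
import hashlib
import os
import sqlite3

# Hypothetical, heavily simplified tables for this sketch only.
SCHEMA = """
create table recording (id integer primary key, sample_file_size integer,
                        sample_file_sha1 blob);
create table garbage (id integer primary key);
"""


def fsync_dir(path: str) -> None:
    """Make creations/removals of directory entries durable."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def create_recording(conn: sqlite3.Connection, dir_path: str,
                     recording_id: int, data: bytes) -> None:
    path = os.path.join(dir_path, str(recording_id))
    # O_EXCL makes os.open raise FileExistsError (EEXIST) instead of
    # silently overwriting an existing sample file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:
            view = view[os.write(fd, view):]
        os.fsync(fd)  # the sample file's contents are durable
    finally:
        os.close(fd)
    fsync_dir(dir_path)  # its directory entry is durable
    with conn:  # only now insert the row; commits on success
        conn.execute("insert into recording values (?, ?, ?)",
                     (recording_id, len(data), hashlib.sha1(data).digest()))


def delete_recording(conn: sqlite3.Connection, dir_path: str,
                     recording_id: int) -> None:
    with conn:  # replace the recording row with a garbage row in one transaction
        conn.execute("delete from recording where id = ?", (recording_id,))
        conn.execute("insert into garbage values (?)", (recording_id,))
    try:
        os.unlink(os.path.join(dir_path, str(recording_id)))
    except FileNotFoundError:
        print(f"warning: sample file {recording_id} missing; invariant #2 violated")
    fsync_dir(dir_path)  # the unlink is durable
    with conn:  # the garbage row is no longer needed
        conn.execute("delete from garbage where id = ?", (recording_id,))
```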
 *Startup (crash recovery):*

-1. Acquire a lock to guarantee this is the only Moonfire NVR process running
-   against the given database. This lock is not released until program shutdown.
-2. Query `garbage` table and `cum_recordings` field in the `stream` table.
-3. `unlink()` all the sample files associated with garbage rows, ignoring
-   `ENOENT`.
-4. For each stream, `unlink()` all the existing files with recording ids >=
-   `cum_recordings`.
-4. `fsync()` the sample file directory.
-5. Delete all rows from the `garbage` table.
+1.  Acquire a lock to guarantee this is the only Moonfire NVR process running
+    against the given database. This lock is not released until program
+    shutdown.
+2.  Query the `garbage` table and the `cum_recordings` field in the `stream`
+    table.
+3.  `unlink()` all the sample files associated with garbage rows, ignoring
+    `ENOENT`.
+4.  For each stream, `unlink()` all the existing files with recording ids >=
+    `cum_recordings`.
+5.  `fsync()` the sample file directory.
+6.  Delete all rows from the `garbage` table.

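Everything after acquiring the lock might look like the Python sketch below. It assumes the caller already holds that lock, and it uses hypothetical simplified tables (`garbage(stream_id, recording_id)` and `stream(id, cum_recordings)`) plus an illustrative `<stream id>-<recording id>` file naming scheme; none of these is Moonfire NVR's actual layout.

```python
import os
import sqlite3

# Hypothetical, heavily simplified tables for this sketch only.
SCHEMA = """
create table stream (id integer primary key, cum_recordings integer not null);
create table garbage (stream_id integer, recording_id integer);
"""


def sample_file_name(stream_id: int, recording_id: int) -> str:
    return f"{stream_id}-{recording_id}"  # illustrative naming scheme


def fsync_dir(path: str) -> None:
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def recover(conn: sqlite3.Connection, dir_path: str) -> None:
    # Query the garbage rows and each stream's cum_recordings.
    garbage = conn.execute("select stream_id, recording_id from garbage").fetchall()
    cum_recordings = dict(conn.execute("select id, cum_recordings from stream"))

    # Remove files whose deletion was interrupted, ignoring ENOENT.
    for stream_id, recording_id in garbage:
        try:
            os.unlink(os.path.join(dir_path, sample_file_name(stream_id, recording_id)))
        except FileNotFoundError:
            pass

    # Remove files that were being written but never got a recording row.
    for name in os.listdir(dir_path):
        try:
            stream_id, recording_id = (int(part) for part in name.split("-"))
        except ValueError:
            continue  # not a sample file, e.g. the `meta` file
        limit = cum_recordings.get(stream_id)
        if limit is not None and recording_id >= limit:
            os.unlink(os.path.join(dir_path, name))

    fsync_dir(dir_path)  # make all the unlinks durable
    with conn:  # only then forget the garbage rows
        conn.execute("delete from garbage")
```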
 The procedures can be batched: while for a given recording, the steps must be
 strictly ordered, multiple recordings can be proceeding through the steps
@@ -471,9 +496,9 @@ problem.
 There should be a means to verify the invariants above. There are three
 possible levels of verification:

-1. Compare presence of sample files.
-2. Compare size of sample files.
-3. Compare hashes of sample files.
+1.  Compare presence of sample files.
+2.  Compare size of sample files.
+3.  Compare hashes of sample files.

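The three levels map naturally onto a single checker that does progressively more I/O. In this Python sketch, `expected` is a hypothetical mapping from sample file name to the size and SHA-1 digest recorded in its `recording` row; how those rows are loaded and how files are named are left out. Level 1 needs only a directory listing, level 2 adds a `stat()` per file, and level 3 reads every byte.

```python
import hashlib
import os


def verify(expected: dict[str, tuple[int, bytes]], dir_path: str,
           level: int) -> list[str]:
    """Check `recording` rows against sample files.

    expected: {file name: (size in bytes, SHA-1 digest)}
    level: 1 = presence, 2 = presence + size, 3 = presence + size + hash
    Returns human-readable problems (empty if every check passes).
    """
    with os.scandir(dir_path) as it:
        on_disk = {e.name: e for e in it if e.is_file()}
    problems = []
    for name, (size, digest) in sorted(expected.items()):
        entry = on_disk.get(name)
        if entry is None:
            problems.append(f"{name}: missing")
            continue
        if level >= 2:
            actual_size = entry.stat().st_size
            if actual_size != size:
                problems.append(f"{name}: size {actual_size} != {size}")
                continue
        if level >= 3:
            h = hashlib.sha1()
            with open(entry.path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            if h.digest() != digest:
                problems.append(f"{name}: SHA-1 mismatch")
    return problems
```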
 Consider a database with a 6 camera-months of recordings at 3.1 Mbps (for
 both main and sub streams). There would be 0.5 million files, taking 5.9 TB.
@@ -487,15 +512,17 @@ The times are roughly:

 The `readdir()` and `fstat()` times can be tested simply:

-    $ mkdir testdir
-    $ cd testdir
-    $ seq 1 $[60*24*365*6/12*2] | xargs touch
-    $ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
-    $ time ls -1 -f | wc -l
-    $ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
-    $ time ls -1 -f --size | wc -l
+```
+$ mkdir testdir
+$ cd testdir
+$ seq 1 $[60*24*365*6/12*2] | xargs touch
+$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
+$ time ls -1 -f | wc -l
+$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
+$ time ls -1 -f --size | wc -l
+```

-    (The system calls used by `ls` can be verified through strace.)
+(The system calls used by `ls` can be verified through `strace`.)

 The hash verification time is easiest to calculate: reading 5.9 TB at 100
 MB/sec takes about 16 hours. On some systems, it will be even slower. On
@@ -515,42 +542,44 @@ the background at low priority.
 The snippet below is an illustrative excerpt of the SQLite schema; see
 `schema.sql` for the authoritative, up-to-date version.

-    -- A single, typically 60-second, recorded segment of video.
-    create table recording (
-      id integer primary key,
-      open_id integer references open (id),
-      camera_id integer references camera (id) not null,
+```sql
+-- A single, typically 60-second, recorded segment of video.
+create table recording (
+    id integer primary key,
+    open_id integer references open (id),
+    camera_id integer references camera (id) not null,

-      sample_file_uuid blob unique not null,
-      sample_file_blake3 blob,
-      sample_file_size integer,
+    sample_file_uuid blob unique not null,
+    sample_file_blake3 blob,
+    sample_file_size integer,

-      -- The starting time and duration of the recording, in 90 kHz units since
-      -- 1970-01-01 00:00:00 UTC.
-      start_time_90k integer not null,
-      duration_90k integer,
+    -- The starting time and duration of the recording, in 90 kHz units since
+    -- 1970-01-01 00:00:00 UTC.
+    start_time_90k integer not null,
+    duration_90k integer,

-      video_samples integer,
+    video_sample_entry_id integer references visual_sample_entry (id),
-      video_index blob,
+    video_samples integer,
+    video_sample_entry_id blob references visual_sample_entry (id),
+    video_index blob,

-      ...
-    );
+    ...
+);

-    -- A concrete box derived from a ISO/IEC 14496-12 section 8.5.2
-    -- VisualSampleEntry box. Describes the codec, width, height, etc.
-    create table visual_sample_entry (
-      id integerprimary key,
+-- A concrete box derived from an ISO/IEC 14496-12 section 8.5.2
+-- VisualSampleEntry box. Describes the codec, width, height, etc.
+create table visual_sample_entry (
+    id integer primary key,

-      -- The width and height in pixels; must match values within
-      -- `sample_entry_bytes`.
-      width integer,
-      height integer,
+    -- The width and height in pixels; must match values within
+    -- `data`.
+    width integer,
+    height integer,

-      -- A serialized SampleEntry box, including the leading length and box
-      -- type (avcC in the case of H.264).
-      data blob
-    );
+    -- A serialized SampleEntry box, including the leading length and box
+    -- type (avc1 in the case of H.264).
+    data blob
+);
+```

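Converting between wall-clock times and the 90 kHz units used by `start_time_90k` and `duration_90k` is straightforward. A small Python sketch (the helper names are hypothetical); it works from the integer fields of `timedelta` so that no precision is lost to floating point.

```python
from datetime import datetime, timedelta, timezone

UNITS_PER_SEC = 90_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_90k(t: datetime) -> int:
    """Timezone-aware datetime -> 90 kHz units since the Unix epoch."""
    delta = t - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 9 // 100  # 90,000 units/sec = 0.09 units/microsecond


def from_90k(units: int) -> datetime:
    """90 kHz units since the Unix epoch -> UTC datetime (microsecond precision)."""
    secs, rem = divmod(units, UNITS_PER_SEC)
    return EPOCH + timedelta(seconds=secs, microseconds=rem * 100 // 9)


start = datetime(2021, 1, 1, tzinfo=timezone.utc)
start_90k = to_90k(start)
duration_90k = 60 * UNITS_PER_SEC  # a typical one-minute recording
print(start_90k, from_90k(start_90k + duration_90k))
```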
 As mentioned by the `start_time_90k` field above, recordings use a 90 kHz time
 base. This matches the RTP timestamp frequency used for H.264 and other video
@@ -579,21 +608,21 @@ only with certain firmware versions (see [thread][hikvision-sr]). Most likely
 it will be useful to have any available clock/timing information for
 diagnosing problems, such as the following:

-* the NVR's wall clock time
-* the NVR's NTP server sync status
-* the NVR's uptime
-* the camera's time as of the RTP play response
-* the camera's time as of any RTCP Sender Reports, and the corresponding RTP
-  timestamps
+*   the NVR's wall clock time
+*   the NVR's NTP server sync status
+*   the NVR's uptime
+*   the camera's time as of the RTSP PLAY response
+*   the camera's time as of any RTCP Sender Reports, and the corresponding RTP
+    timestamps

 #### `video_index`

 The `video_index` field conceptually holds three pieces of information about
 the samples:

-1. the duration (in 90kHz units) of each sample
-2. the byte size of each sample
-3. which samples are "sync samples" (aka key frames or I-frames)
+1.   the duration (in 90 kHz units) of each sample
+2.   the byte size of each sample
+3.   which samples are "sync samples" (aka key frames or I-frames)

 These correspond to [ISO/IEC 14496-12][iso-14496-12] `stts` (TimeToSampleBox,
 section 8.6.1.2), `stsz` (SampleSizeBox, section 8.7.3), and `stss`
@@ -614,16 +643,18 @@ This encoding is chosen so that values will be near zero, and thus the varints
 will be at their most compact possible form. An index might be written by the
 following pseudocode:

-    prev_duration = 0
-    prev_bytes_key = 0
-    prev_bytes_nonkey = 0
-    for each frame:
-      duration_delta = duration - prev_duration
-      bytes_delta = bytes - (is_key ? prev_bytes_key : prev_bytes_nonkey)
-      prev_duration_ms = duration_ms
-      if key: prev_bytes_key = bytes else: prev_bytes_nonkey = bytes
-      PutVarint((Zigzag(duration_delta) << 1) | is_key)
-      PutVarint(Zigzag(bytes_delta)
+```
+prev_duration = 0
+prev_bytes_key = 0
+prev_bytes_nonkey = 0
+for each frame:
+  duration_delta = duration - prev_duration
+  bytes_delta = bytes - (is_key ? prev_bytes_key : prev_bytes_nonkey)
+  prev_duration = duration
+  if is_key: prev_bytes_key = bytes else: prev_bytes_nonkey = bytes
+  PutVarint((Zigzag(duration_delta) << 1) | is_key)
+  PutVarint(Zigzag(bytes_delta))
+```
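
The pseudocode above can be made concrete. The Rust sketch below assumes protocol-buffer-style base-128 varints (7 data bits per byte, least-significant group first, high bit marking continuation) and 32-bit sample fields; the `Sample` type and all function names are hypothetical, not Moonfire NVR's actual API. A matching decoder is included to show that the encoding round-trips.

```rust
/// Hypothetical per-sample metadata: the three facts `video_index` records.
#[derive(Clone, Debug, PartialEq)]
struct Sample {
    duration_90k: i32,
    bytes: i32,
    is_key: bool,
}

/// Zigzag encoding maps small signed values to small unsigned ones:
/// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag(n: i32) -> u32 {
    ((n << 1) ^ (n >> 31)) as u32
}

/// Inverse of `zigzag`.
fn unzigzag(v: u32) -> i32 {
    ((v >> 1) as i32) ^ -((v & 1) as i32)
}

/// Appends `v` as a base-128 varint (assumed protobuf-style).
fn put_varint(out: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        out.push((v as u8) | 0x80); // low 7 bits, continuation bit set
        v >>= 7;
    }
    out.push(v as u8);
}

/// Reads one varint starting at `*pos`, advancing `*pos`; `None` if truncated.
fn get_varint(data: &[u8], pos: &mut usize) -> Option<u64> {
    let mut v = 0u64;
    let mut shift = 0;
    while *pos < data.len() && shift < 64 {
        let b = data[*pos];
        *pos += 1;
        v |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Some(v);
        }
        shift += 7;
    }
    None
}

fn encode_index(samples: &[Sample]) -> Vec<u8> {
    let mut out = Vec::new();
    let (mut prev_duration, mut prev_bytes_key, mut prev_bytes_nonkey) = (0, 0, 0);
    for s in samples {
        let duration_delta = s.duration_90k - prev_duration;
        // Key and non-key frames are delta-coded against separate predecessors.
        let prev_bytes = if s.is_key { prev_bytes_key } else { prev_bytes_nonkey };
        let bytes_delta = s.bytes - prev_bytes;
        prev_duration = s.duration_90k;
        if s.is_key { prev_bytes_key = s.bytes } else { prev_bytes_nonkey = s.bytes }
        // The low bit of the first varint is the key-frame flag.
        put_varint(&mut out, (u64::from(zigzag(duration_delta)) << 1) | u64::from(s.is_key));
        put_varint(&mut out, u64::from(zigzag(bytes_delta)));
    }
    out
}

fn decode_index(data: &[u8]) -> Option<Vec<Sample>> {
    let mut pos = 0;
    let mut samples = Vec::new();
    let (mut prev_duration, mut prev_bytes_key, mut prev_bytes_nonkey) = (0, 0, 0);
    while pos < data.len() {
        let first = get_varint(data, &mut pos)?;
        let second = get_varint(data, &mut pos)?;
        let is_key = first & 1 == 1;
        let duration_90k = prev_duration + unzigzag((first >> 1) as u32);
        let prev_bytes = if is_key { prev_bytes_key } else { prev_bytes_nonkey };
        let bytes = prev_bytes + unzigzag(second as u32);
        prev_duration = duration_90k;
        if is_key { prev_bytes_key = bytes } else { prev_bytes_nonkey = bytes }
        samples.push(Sample { duration_90k, bytes, is_key });
    }
    Some(samples)
}

fn main() {
    // A key frame followed by two non-key frames, each 1/30 s (3000 ticks).
    let samples = vec![
        Sample { duration_90k: 3000, bytes: 1000, is_key: true },
        Sample { duration_90k: 3000, bytes: 100, is_key: false },
        Sample { duration_90k: 3000, bytes: 110, is_key: false },
    ];
    let index = encode_index(&samples);
    println!("{:02x?}", index); // [e1, 5d, d0, 0f, 00, c8, 01, 00, 14]
    assert_eq!(decode_index(&index), Some(samples));
}
```

Because consecutive frames of a steady stream tend to have equal durations and similar sizes, most deltas are zero or small, so most samples in this sketch take only two or three bytes.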

 See also the example below:

@@ -643,10 +674,10 @@ See also the example below:
 A major goal of this format is to support on-demand serving in various formats,
 including two types of `.mp4` files:

-* unfragmented `.mp4` files, for traditional video players.
-* fragmented `.mp4` files for MPEG-DASH or HTML5 Media Source Extensions
-  (see [Media Source ISO BMFF Byte Stream Format][media-bmff]), for
-  a browser-based user interface.
+*   unfragmented `.mp4` files, for traditional video players.
+*   fragmented `.mp4` files for MPEG-DASH or HTML5 Media Source Extensions
+    (see [Media Source ISO BMFF Byte Stream Format][media-bmff]), for
+    a browser-based user interface.

 This does not require writing new `.mp4` files to disk. In fact, HTTP range
 requests (for "pseudo-streaming") can be satisfied on `.mp4` files aggregated
@@ -654,38 +685,6 @@ from several segments. The implementation details are outside the scope of this
 document, but this is possible in part due to the use of an on-flash database
 to store metadata and the simple, consistent format of sample indexes.

-### Copyright
-
-This file is part of Moonfire NVR, a security camera network video recorder.
-Copyright (C) 2016 The Moonfire NVR Authors
-
-This program is free software: you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation, either version 3 of the License, or
-(at your option) any later version.
-
-In addition, as a special exception, the copyright holders give
-permission to link the code of portions of this program with the
-OpenSSL library under certain conditions as described in each
-individual source file, and distribute linked combinations including
-the two.
-
-You must obey the GNU General Public License in all respects for all
-of the code used other than OpenSSL. If you modify file(s) with this
-exception, you may extend this exception to your version of the
-file(s), but you are not obligated to do so. If you do not wish to do
-so, delete this exception statement from your version. If you delete
-this exception statement from all source files in the program, then
-also delete it here.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program.  If not, see <http://www.gnu.org/licenses/>.
-
 [pi2]: https://www.raspberrypi.org/products/raspberry-pi-2-model-b/
 [xandem]: http://www.xandemhome.com/
 [hikcam]: http://overseas.hikvision.com/us/Products_accessries_10533_i7696.html
--- a/design/time.md
+++ b/design/time.md
@@ -1,4 +1,4 @@
-# Moonfire NVR Time Handling
+# Moonfire NVR Time Handling <!-- omit in toc -->

 Status: **current**.

@@ -7,6 +7,19 @@ Status: **current**.
 >
 > — Segal's law

+* [Objective](#objective)
+* [Background](#background)
+* [Overview](#overview)
+* [Detailed design](#detailed-design)
+* [Caveats](#caveats)
+    * [Stream mismatches](#stream-mismatches)
+    * [Time discontinuities](#time-discontinuities)
+    * [Leap seconds](#leap-seconds)
+        * [Use `clock_gettime(CLOCK_TAI, ...)` timestamps](#use-clock_gettimeclock_tai--timestamps)
+        * [Use a leap second table when calculating differences](#use-a-leap-second-table-when-calculating-differences)
+        * [Use smeared time](#use-smeared-time)
+* [Alternatives considered](#alternatives-considered)
+
 ## Objective

 Maximize the likelihood Moonfire NVR's timestamps are useful.
@@ -14,20 +27,20 @@ Maximize the likelihood Moonfire NVR's timestamps are useful.
 The timestamp corresponding to a video frame should roughly match timestamps
 from other sources:

-   * another video stream from the same camera. Given a video frame from the
-     "main" stream, a video frame from the "sub" stream with a similar
-     timestamp should have been recorded near the same time, and vice versa.
-     This minimizes confusion when switching between views of these streams,
-     and when viewing the "main" stream timestamps corresponding to a motion
-     event gathered from the less CPU-intensive "sub" stream.
-   * on-camera motion events from the same camera. If the video frame reflects
-     the motion event, its timestamp should be roughly within the event's
-     timespan.
-   * streams from other cameras. Recorded views from two cameras of the same
-     event should have similar timestamps.
-   * events noted by the owner of the system, neighbors, police, etc., for the
-     purpose of determining chronology, to the extent those persons use
-     accurate clocks.
+*   another video stream from the same camera. Given a video frame from the
+    "main" stream, a video frame from the "sub" stream with a similar
+    timestamp should have been recorded near the same time, and vice versa.
+    This minimizes confusion when switching between views of these streams,
+    and when viewing the "main" stream timestamps corresponding to a motion
+    event gathered from the less CPU-intensive "sub" stream.
+*   on-camera motion events from the same camera. If the video frame reflects
+    the motion event, its timestamp should be roughly within the event's
+    timespan.
+*   streams from other cameras. Recorded views from two cameras of the same
+    event should have similar timestamps.
+*   events noted by the owner of the system, neighbors, police, etc., for the
+    purpose of determining chronology, to the extent those persons use
+    accurate clocks.

 Two recordings from the same stream should not overlap. This would make it
 impossible for a user interface to present a simple timeline for accessing all
@@ -35,28 +48,28 @@ recorded video.

 Durations should be useful over short timescales:

-   * If an object's motion is recorded, distance travelled divided by the
-     duration of the frames over which this motion occurred should reflect the
-     object's average speed.
-   * Motion should appear smooth. There shouldn't be excessive frame-to-frame
-     jitter due to such factors as differences in encoding time or network
-     transmission.
+*   If an object's motion is recorded, distance travelled divided by the
+    duration of the frames over which this motion occurred should reflect the
+    object's average speed.
+*   Motion should appear smooth. There shouldn't be excessive frame-to-frame
+    jitter due to such factors as differences in encoding time or network
+    transmission.

 This document describes an approach to achieving these goals when the
 following statements are true:

-   * the NVR's system clock is within a second of correct on startup. (True
-     when NTP is functioning or when the system has a real-time clock battery
-     to preserve a previous correct time.)
-   * the NVR's system time does not experience forward or backward "step"
-     corrections (as opposed to frequency correction) during operation.
-   * the NVR's system time advances at roughly the correct frequency. (NTP
-     achieves this through frequency correction when operating correctly.)
-   * the cameras' clock frequencies are off by no more than 500 parts per
-     million (roughly 43 seconds per day).
-   * the cameras are geographically close to the NVR, so in most cases network
-     transmission time is under 50 ms. (Occasional delays are to be expected,
-     however.)
+*   the NVR's system clock is within a second of correct on startup. (True
+    when NTP is functioning or when the system has a real-time clock battery
+    to preserve a previous correct time.)
+*   the NVR's system time does not experience forward or backward "step"
+    corrections (as opposed to frequency correction) during operation.
+*   the NVR's system time advances at roughly the correct frequency. (NTP
+    achieves this through frequency correction when operating correctly.)
+*   the cameras' clock frequencies are off by no more than 500 parts per
+    million (roughly 43 seconds per day).
+*   the cameras are geographically close to the NVR, so in most cases network
+    transmission time is under 50 ms. (Occasional delays are to be expected,
+    however.)
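
The parts-per-million figures translate into daily drift with simple arithmetic: the frequency error times 86,400 seconds per day. A small sketch (the function name is hypothetical) confirms the "roughly 43 seconds per day" figure above, and the "roughly 2 seconds per day" figure for the 20 ppm cameras described below:

```rust
/// Seconds of drift per day for a clock whose frequency is off by `ppm`
/// parts per million.
fn drift_secs_per_day(ppm: f64) -> f64 {
    const SECS_PER_DAY: f64 = 86_400.0;
    ppm / 1_000_000.0 * SECS_PER_DAY
}

fn main() {
    // The 500 ppm bound: about 43 seconds per day.
    println!("{:.1}", drift_secs_per_day(500.0)); // 43.2
    // A camera running 20 ppm fast: under 2 seconds per day.
    println!("{:.1}", drift_secs_per_day(20.0)); // 1.7
}
```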

 When one or more of those statements are false, the system should degrade
 gracefully: preserve what properties it can, gather video anyway, and when
@@ -81,40 +94,40 @@ so such problems are to be expected.
 Moonfire NVR typically has access to the following sources of time
 information:

-   * the local `CLOCK_REALTIME`. Ideally this is maintained by `ntpd`:
-     synchronized on startup, and frequency-corrected during operation. A
-     hardware real-time clock and battery keep accurate time across restarts
-     if the network is unavailable on startup. In the worst case, the system
-     has no real-time clock or no battery and a network connection is
-     unavailable. The time is far in the past on startup and is never
-     corrected or is corrected via a step while Moonfire NVR is running.
-   * the local `CLOCK_MONOTONIC`. This should be frequency-corrected by `ntpd`
-     and guaranteed to never experience "steps", though its reference point is
-     unspecified.
-   * the local `ntpd`, which can be used to determine if the system is
-     synchronized to NTP and quantify the precision of synchronization.
-   * each camera's clock. The ONVIF specification mandates cameras must
-     support synchronizing clocks via NTP, but in practice cameras appear to
-     use SNTP clients which simply step time periodically and provide no
-     interface to determine if the clock is currently synchronized. This
-     document's author owns several cameras with clocks that run roughly 20
-     ppm fast (2 seconds per day) and are adjusted via steps.
-   * the RTP timestamps from each of a camera's streams. As described in
-     [RFC 3550 section 5.1](https://tools.ietf.org/html/rfc3550#section-5.1),
-     these are monotonically increasing with an unspecified reference point.
-     They can't be directly compared to other cameras or other streams from
-     the same camera. Emperically, budget cameras don't appear to do any
-     frequency correction on these timestamps.
-   * in some cases, RTCP sender reports, as described in
-     [RFC 3550 section 6.4](https://tools.ietf.org/html/rfc3550#section-6.4).
-     These correlate RTP timestamps with the camera's real time clock.
-     However, these are only sent periodically, not necessarily at the
-     beginning of the session.  Some cameras omit them entirely depending on
-     firmware version, as noted in
-     [this forum post](https://www.cctvforum.com/topic/40914-video-sync-with-hikvision-ipcams-tech-query-about-rtcp/).
-     Additionally, Moonfire NVR currently uses ffmpeg's libavformat for RTSP
-     protocol handling; this library exposes these reports in a limited
-     fashion.
+*   the local `CLOCK_REALTIME`. Ideally this is maintained by `ntpd`:
+    synchronized on startup, and frequency-corrected during operation. A
+    hardware real-time clock and battery keep accurate time across restarts
+    if the network is unavailable on startup. In the worst case, the system
+    has no real-time clock or no battery and a network connection is
+    unavailable. The time is far in the past on startup and is never
+    corrected or is corrected via a step while Moonfire NVR is running.
+*   the local `CLOCK_MONOTONIC`. This should be frequency-corrected by `ntpd`
+    and guaranteed to never experience "steps", though its reference point is
+    unspecified.
+*   the local `ntpd`, which can be used to determine if the system is
+    synchronized to NTP and quantify the precision of synchronization.
+*   each camera's clock. The ONVIF specification mandates cameras must
+    support synchronizing clocks via NTP, but in practice cameras appear to
+    use SNTP clients which simply step time periodically and provide no
+    interface to determine if the clock is currently synchronized. This
+    document's author owns several cameras with clocks that run roughly 20
+    ppm fast (2 seconds per day) and are adjusted via steps.
+*   the RTP timestamps from each of a camera's streams. As described in
+    [RFC 3550 section 5.1](https://tools.ietf.org/html/rfc3550#section-5.1),
+    these are monotonically increasing with an unspecified reference point.
+    They can't be directly compared to other cameras or other streams from
+    the same camera. Empirically, budget cameras don't appear to do any
+    frequency correction on these timestamps.
+*   in some cases, RTCP sender reports, as described in
+    [RFC 3550 section 6.4](https://tools.ietf.org/html/rfc3550#section-6.4).
+    These correlate RTP timestamps with the camera's real time clock.
+    However, these are only sent periodically, not necessarily at the
+    beginning of the session.  Some cameras omit them entirely depending on
+    firmware version, as noted in
+    [this forum post](https://www.cctvforum.com/topic/40914-video-sync-with-hikvision-ipcams-tech-query-about-rtcp/).
+    Additionally, Moonfire NVR currently uses ffmpeg's libavformat for RTSP
+    protocol handling; this library exposes these reports in a limited
+    fashion.

 The camera records video frames as in the diagram below:

@@ -224,14 +237,14 @@ wall clock, and thus calculate the camera's time as of the first frame.
 The _start time_ of the first recording could be either its local start time
 or its camera start time, determined via the following rules:

-   1. if there is no camera start time (due to the lack of a RTCP sender
-      report), the local start time wins by default.
-   2. if the camera start time is before 2016-01-01 00:00:00 UTC, the local
-      start time wins.
-   3. if the local start time is before 2016-01-01 00:00:00 UTC, the camera
-      start time wins.
-   4. if the times differ by more than 5 seconds, the local start time wins.
-   5. otherwise, the camera start time wins.
+1.  if there is no camera start time (due to the lack of an RTCP sender
+    report), the local start time wins by default.
+2.  if the camera start time is before 2016-01-01 00:00:00 UTC, the local
+    start time wins.
+3.  if the local start time is before 2016-01-01 00:00:00 UTC, the camera
+    start time wins.
+4.  if the times differ by more than 5 seconds, the local start time wins.
+5.  otherwise, the camera start time wins.
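
The five rules above can be sketched as a single function. This is an illustration rather than Moonfire NVR's code: it assumes both times are already expressed as Unix seconds for readability (the real system presumably works in finer units), and the name `choose_start_time` is hypothetical.

```rust
/// 2016-01-01 00:00:00 UTC in Unix seconds; earlier times are presumed bogus.
const CUTOFF: i64 = 1_451_606_400;
/// Maximum disagreement, in seconds, before the local clock is preferred.
const MAX_DIFF: i64 = 5;

/// Picks a recording's start time from the local start time and the
/// optional camera start time (`None` if no RTCP sender report arrived).
fn choose_start_time(local: i64, camera: Option<i64>) -> i64 {
    let camera = match camera {
        None => return local, // rule 1: no camera time available
        Some(c) => c,
    };
    if camera < CUTOFF {
        return local; // rule 2: camera clock looks unset
    }
    if local < CUTOFF {
        return camera; // rule 3: local clock looks unset
    }
    if (local - camera).abs() > MAX_DIFF {
        return local; // rule 4: the clocks disagree too much
    }
    camera // rule 5: camera time is plausible and close to local time
}

fn main() {
    let local = CUTOFF + 1_000;
    println!("{}", choose_start_time(local, Some(local + 3)) - local); // 3: camera wins
    println!("{}", choose_start_time(local, Some(local + 10)) - local); // 0: local wins
}
```

Checking the rules in order matters: a camera time before the cutoff is rejected (rule 2) before the local time is examined, so when both clocks look unset the local time still wins.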

 These rules are a compromise. When a system starts up without NTP or a clock
 battery, it typically reverts to a time in the distant past. Therefore times
@@ -259,10 +272,10 @@ happened](https://github.com/scottlamb/moonfire-nvr/issues/9#issuecomment-322663
 Moonfire NVR will continue to use the initial wall clock time for as long as
 the recording lasts. This can result in some unfortunate behaviors:

-   * a recording that lasts for months might have an incorrect time all the
-     way through because `ntpd` took a few minutes on startup.
-   * two recordings that were in fact simultaneous might be recorded with very
-     different times because a time jump happened between their starts.
+*   a recording that lasts for months might have an incorrect time all the
+    way through because `ntpd` took a few minutes on startup.
+*   two recordings that were in fact simultaneous might be recorded with very
+    different times because a time jump happened between their starts.

 It might be better to use the new time (assuming that ntpd has made a
 correction) retroactively. This is unimplemented, but the
@@ -299,18 +312,18 @@ Timestamps in the TAI clock system don't skip leap seconds. There's a system
 interface intended to provide timestamps in this clock system, and Moonfire
 NVR could use it. Unfortunately this has several problems:

-   * `CLOCK_TAI` is only available on Linux. It'd be preferable to handle
-     timestamps in a consistent way on other platforms. (At least on macOS,
-     Moonfire NVR's current primary development platform.)
-   * `CLOCK_TAI` is wrong on startup and possibly adjusted later. The offset
-     between TAI and UTC is initially assumed to be 0. It's corrected when/if
-     a sufficiently new `ntpd` starts.
-   * We'd need a leap second table to translate this into calendar time. One
-     would have to be downloaded from the Internet periodically, and we'd need
-     to consider the case in which the available table is expired.
-   * `CLOCK_TAI` likely doesn't work properly with leap smear systems. Where
-     the leap smear prevents a time jump for `CLOCK_REALTIME`, it likely
-     introduces one for `CLOCK_TAI`.
+*   `CLOCK_TAI` is only available on Linux. It'd be preferable to handle
+    timestamps in a consistent way on other platforms. (At least on macOS,
+    Moonfire NVR's current primary development platform.)
+*   `CLOCK_TAI` is wrong on startup and possibly adjusted later. The offset
+    between TAI and UTC is initially assumed to be 0. It's corrected when/if
+    a sufficiently new `ntpd` starts.
+*   We'd need a leap second table to translate this into calendar time. One
+    would have to be downloaded from the Internet periodically, and we'd need
+    to consider the case in which the available table is expired.
+*   `CLOCK_TAI` likely doesn't work properly with leap smear systems. Where
+    the leap smear prevents a time jump for `CLOCK_REALTIME`, it likely
+    introduces one for `CLOCK_TAI`.

 #### Use a leap second table when calculating differences

@@ -345,4 +358,4 @@ Schema versions prior to 6 used a simpler database schema which didn't
 distinguish between "wall" and "media" time. Instead, the durations of video
 samples were adjusted for clock correction. This approach worked well for
 video. It couldn't be extended to audio without decoding and re-encoding to
-adjust same lengths and pitch.
+adjust sample lengths and pitch.