# Moonfire NVR Storage Schema <!-- omit in toc -->

Status: **current**.

This is the initial design for the most fundamental parts of the Moonfire NVR
storage schema. See also [guide/schema.md](../guide/schema.md) for more
administrator-focused documentation.

* [Objective](#objective)
  * [Cameras](#cameras)
  * [Hard drives](#hard-drives)
* [Overview](#overview)
* [Detailed design](#detailed-design)
  * [SQLite3](#sqlite3)
  * [Duration of recordings](#duration-of-recordings)
  * [Lifecycle of a sample file directory](#lifecycle-of-a-sample-file-directory)
  * [Lifecycle of a recording](#lifecycle-of-a-recording)
  * [Verifying invariants](#verifying-invariants)
  * [Recording table](#recording-table)
    * [`video_index`](#video_index)
  * [On-demand `.mp4` construction](#on-demand-mp4-construction)

## Objective

Goals:

* record streams from modern ONVIF/PSIA IP security cameras
* support several cameras
* maintain full fidelity of incoming compressed video streams
* record continuously
* support on-demand serving in different file formats / protocols
  (such as standard .mp4 files for arbitrary timespans, fragmented .mp4 files
  for MPEG-DASH or HTML5 Media Source Extensions, MPEG-TS files for HTTP Live
  Streaming, and "trick play" RTSP)
* annotate camera timelines with metadata
  (such as motion detection, security alarm events, etc)
* retain video segments with ~1-minute granularity based on metadata
  (e.g., extend retention of motion events)
* take advantage of compact, inexpensive, low-power, commonly-available
  hardware such as the $35 [Raspberry Pi 2 Model B][pi2]
* support high- and low-bandwidth playback
* support near-live playback (about one second old), including "trick play"
* allow verifying database consistency with an `fsck` tool

Non-goals:

* record streams from older cameras: JPEG/MJPEG USB "webcams" and analog
  security cameras/capture cards
* allow users to directly access or manipulate the stored data with standard
  video or filesystem tools
* support H.264 features not used by common IP camera encoders, such as
  B-frames and Periodic Intra Refresh
* support recovering the last ~minute of video after a crash or power loss

Possible future goals:

* record audio and/or other types of timestamped samples (such as
  [Xandem][xandem] tomography data).

### Cameras

Inexpensive modern ONVIF/PSIA IP security cameras, such as the $100
[Hikvision DS-2CD2032-I][hikcam], support two H.264-encoded RTSP
streams. They have many customizable settings, such as resolution, frame rate,
compression quality, maximum bitrate, and I-frame interval. A typical setup
might be as follows:

* the high-quality "main" stream as 1080p/30fps, 3000 kbps.
  This stream is well-suited to local viewing or forensics.
* the low-bandwidth "sub" stream as 704x480/10fps, 100 kbps.
  This stream may be preferred for mobile/remote viewing, when viewing several
  streams side-by-side, and for real-time computer vision (such as salient
  motion detection).

The dual pre-encoded H.264 video streams provide a tremendous advantage over
older camera models (which provided raw video or JPEG-encoded frames) because
encoding video on the NVR is prohibitively expensive in multi-camera setups.
[libx264][libx264] supports "encoding 4 or more 1080p streams in realtime on a
single consumer-level computer", but this does not apply to the low-cost devices
Moonfire NVR targets. In fact, even decoding can be expensive on the
full-quality streams, enough to challenge the feasibility of on-NVR motion
detection. It's valuable to have the "sub" stream for this purpose.

The table below shows the cost of processing a single stream, as a percentage
of the whole processor ((user+sys) time / video duration / CPU cores).
**TODO:** try different quality settings as well.

Decode:

```
$ time ffmpeg -y -threads 1 -i input.mp4 \
    -f null /dev/null
```

Combo (Decode + encode with libx264):

```
$ time ffmpeg -y -threads 1 -i input.mp4 \
    -c:v libx264 -preset ultrafast -threads 1 -f mp4 /dev/null
```

| Processor                     | 1080p30 decode | 1080p30 combo | 704x480p10 decode | 704x480p10 combo |
| :---------------------------- | -------------: | ------------: | ----------------: | ---------------: |
| [Intel i7-2635QM][2635QM]     |           6.0% |         23.7% |              0.2% |             1.0% |
| [Intel Atom C2538][C2538]     |          16.7% |         58.1% |              0.7% |             3.0% |
| [Raspberry Pi 2 Model B][pi2] |          68.4% |    **230.1%** |              2.9% |            11.7% |

Hardware-accelerated decoding/encoding is possible in some cases (VAAPI on the
Intel processors, or OpenMAX on the Raspberry Pi), but even with acceleration
it would not be possible to have several high-quality streams without using
the camera's encoding. **TODO:** get numbers.

### Hard drives

With current hard drive prices (see [WD Purple][wdpurple] prices below), it's
cost-effective to store a month or more of high-quality video, at roughly 1
camera-month per TB.

| Capacity | Price |
| -------: | ----: |
|     1 TB |   $61 |
|     2 TB |   $82 |
|     3 TB |  $107 |
|     4 TB |  $157 |
|     6 TB |  $240 |

Typical sequential bandwidth is >100 MB/sec, more than that required by over a
hundred streams at 3 Mbps. The concern is seek times: a [WD20EURS][wd20eurs]
appears to require 20 ms per random access when accesses are issued one at a
time (across the full range of the disk), as measured with [seeker][seeker].
Put another way, the drive is only capable of 50 random accesses per second,
and each one takes time that otherwise could be used to transfer 2+ MB. The
constrained resource, *disk time fraction*, can be bounded as follows:

```
disk time fraction <= (seek rate) / (50 seeks/sec) +
                      (bandwidth) / (100 MB/sec)
```

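As a rough illustration of the disk time fraction bound, the sketch below evaluates it for a workload. The function name and the example workload are hypothetical; the constants of 50 seeks/sec and 100 MB/sec are the figures quoted in this section.

```python
def disk_time_fraction(seeks_per_sec, bandwidth_mb_per_sec,
                       max_seeks_per_sec=50.0, max_bandwidth_mb_per_sec=100.0):
    """Upper bound on the fraction of disk time a workload consumes."""
    return (seeks_per_sec / max_seeks_per_sec
            + bandwidth_mb_per_sec / max_bandwidth_mb_per_sec)


# Hypothetical workload: sixteen 3 Mbps streams (6 MB/sec in total)
# plus one seek per second.
total_mb_per_sec = 16 * 3 / 8
print(f"{disk_time_fraction(1.0, total_mb_per_sec):.1%}")
```

Under these assumptions the workload stays well below the drive's capacity; the seek term dominates only when the seek rate approaches tens of seeks per second.
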
## Overview

Moonfire NVR divides video streams into 1-minute recordings. These boundaries
are invisible to the user. On playback, the UI moves from one recording to
another seamlessly. When exporting video, recordings are automatically spliced
together.

Each recording is stored in two places:

* a sample file directory, intended to be stored on spinning disk.
  Each file in this directory is simply a concatenation of the compressed,
  timestamped video samples (also called "packets" or encoded frames), as
  received from the camera. In MPEG-4 terminology (see
  [ISO 14496-12][iso-14496-12]), this is the contents of a `mdat` box for a
  `.mp4` file representing the segment. These files do not contain framing
  data (start and end byte offsets of samples) and thus are not meant to be
  decoded on their own.
* the `recording` table in a [SQLite3][sqlite3] database, intended to be
  stored on flash if possible. A row in this table contains all the
  metadata associated with the segment, including the sample-by-sample
  contents of the MPEG-4 `stbl` box. At 30 fps, a row is expected to
  require roughly 4 KB of storage (2 bytes per sample, plus some fixed
  overhead).

Putting the metadata on flash means metadata operations can be fast
(sub-millisecond random access, with parallelism) and do not take precious
disk time fraction away from accessing sample data. Disk time can be saved for
long sequential accesses. Assuming filesystem metadata is cached, Moonfire NVR
can seek directly to the correct sample.

To avoid a burst of seeks every minute, rotation times will be staggered. For
example, if there are two cameras (A and B), camera A's main stream might
switch to a new recording at :00 seconds past the minute, B's main stream at
:15 seconds past the minute, and likewise the sub streams, as shown below.

| camera | stream | switchover |
| :----- | :----- | ---------: |
| A      | main   |   xx:xx:00 |
| B      | main   |   xx:xx:15 |
| A      | sub    |   xx:xx:30 |
| B      | sub    |   xx:xx:45 |

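One simple way to produce staggered switchover times is to space the streams evenly across the minute. This is only a sketch: even spacing and the function name `staggered_switchovers` are assumptions that happen to reproduce the two-camera example above, not a description of the actual scheduling logic.

```python
def staggered_switchovers(streams, period_sec=60):
    """Assign each stream an offset (seconds past the minute), evenly spaced."""
    step = period_sec / len(streams)
    return {stream: i * step for i, stream in enumerate(streams)}


# Main streams first, then sub streams, matching the order in the table.
offsets = staggered_switchovers(
    [("A", "main"), ("B", "main"), ("A", "sub"), ("B", "sub")])
for (camera, stream), offset in offsets.items():
    print(f"{camera} {stream}: xx:xx:{int(offset):02d}")
```
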
## Detailed design

### SQLite3

All metadata, including the `recording` table and others, will be stored in
the SQLite3 database using [write-ahead logging][sqlite3-wal]. There are
several reasons for this decision:

* No user administration required. SQLite3, unlike its heavier-weight friends
  MySQL and PostgreSQL, can be completely internal to the application. In
  many applications, end users are unaware of the existence of an RDBMS, and
  Moonfire NVR should be no exception.
* Correctness. It's relatively easy to make guarantees about the state of an
  ACID database, and SQLite3 in particular has a robust implementation.
  (See [Files Are Hard][file-consistency].)
* Developer ease and familiarity. SQL-based RDBMSs are quite common and
  provide a lot of high-level constructs that ease development. SQLite3 in
  particular is ubiquitous. Contributors are likely to come with some
  understanding of the database, and there are many resources to learn
  more.

Total database size is expected to be roughly 4 KB per minute at 30 fps, or
1 GB for six camera-months of video. This will easily fit on a modest flash
device. Given the fast storage and modest size, the database is not expected
to be a performance bottleneck.

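The size estimate can be reproduced with a few lines of arithmetic. The sketch below assumes a month of 365/12 days, decimal units (1 KB = 1000 bytes), and a hypothetical 400 bytes of fixed overhead per row; only the 2 bytes per sample and the 4 KB per row come from this document.

```python
MINUTES_PER_CAMERA_MONTH = 365 / 12 * 24 * 60  # assumes an average-length month


def row_bytes(fps, recording_sec=60, bytes_per_sample=2, fixed_overhead=400):
    """Approximate size of one `recording` row (fixed_overhead is a guess)."""
    return fps * recording_sec * bytes_per_sample + fixed_overhead


def estimated_db_bytes(camera_months, fps=30):
    """Approximate database size for the given amount of 30 fps video."""
    return camera_months * MINUTES_PER_CAMERA_MONTH * row_bytes(fps)


print(row_bytes(30))                          # bytes per one-minute row
print(f"{estimated_db_bytes(6) / 1e9:.2f} GB")  # six camera-months
```
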
### Duration of recordings

There are many constraints that influenced the choice of 1 minute as the
duration of recordings.

* Per-recording metadata size. There is a fixed component to the size of each
  row, including the starting/ending timestamps, sample file UUID, etc.
  This should not cause the database to be too large to fit on low-cost
  flash devices. As described in the previous section, with 1 minute
  recordings the size is quite modest.
* Disk seeks. Sample files should be large enough that even during
  simultaneous recording and playback of several streams, the disk seeks
  incurred when switching from one file to another should not be
  significant. At the extreme, a sample file per frame could cause an
  unacceptable 240 seeks per second just to record eight 30 fps streams. At
  one minute recording time, 16 recording streams (2 for each of 8 cameras)
  and 4 playback streams would cause on average 20 seeks per minute, or under
  1% disk time.
* Internal fragmentation. Common Linux filesystems have a block size of 4 KiB
  (see `statvfs.f_frsize`). Up to this much space per file will be wasted
  at the end of each file. At the bitrates described in [Cameras](#cameras),
  this is an insignificant .02% waste for main streams and .5% waste for sub
  streams.
* Number of "slices" in .mp4 files. As described
  [below](#on-demand-mp4-construction), `.mp4` files will be constructed
  on-demand for export. It should be possible to export an hours-long segment
  without too much overhead. In particular, it must be possible to iterate
  through all the recordings, assemble the list of slices, and calculate
  offsets and total size. One minute seems acceptable, though we will watch
  this as work proceeds.
* Crashes. On program crash or power loss, ideally it's acceptable to simply
  discard any recordings in progress rather than add a checkpointing scheme.
* Granularity of retention. It should be possible to extend retention time
  around motion events without forcing retention of too much additional
  data or copying bytes around on disk.

The design avoids the need for the following constraints:

* Dealing with events crossing segment boundaries. This is meant to be
  invisible.
* Serving close to live. It's possible to serve a recording as it is being
  written.

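The seek and fragmentation figures in the list above follow from simple arithmetic, sketched below. The function names are hypothetical; the 20 ms seek time, 4 KiB block size, and stream bitrates are the values quoted elsewhere in this document, and the fragmentation figure is the worst case of one full block wasted per file.

```python
def seeks_per_sec_with_file_per_frame(streams, fps):
    """Seek rate if every frame lived in its own sample file."""
    return streams * fps


def rotation_disk_time(streams, seek_sec=0.020, recording_sec=60):
    """Fraction of disk time spent seeking when each stream switches files
    once per recording."""
    return streams * seek_sec / recording_sec


def fragmentation_waste(bitrate_kbps, block_bytes=4096, recording_sec=60):
    """Worst-case fraction of a sample file lost to a partly used last block."""
    file_bytes = bitrate_kbps * 1000 / 8 * recording_sec
    return block_bytes / file_bytes


print(seeks_per_sec_with_file_per_frame(8, 30))    # eight 30 fps streams
print(f"{rotation_disk_time(16 + 4):.2%}")         # 16 recording + 4 playback
print(f"{fragmentation_waste(3000):.3%}")          # main stream
print(f"{fragmentation_waste(100):.2%}")           # sub stream
```
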
### Lifecycle of a sample file directory

One major disadvantage to splitting the state in two (the SQLite3 database in
flash and the sample file directories on spinning disk) is the possibility of
inconsistency. There are many ways this could arise:

* a sample file directory's disk is unexpectedly not mounted due to hardware
  failure or misconfiguration.
* the administrator mixes up the mount points of two filesystems holding
  different sample file directories.
* the administrator renames a sample file directory without updating the
  database.
* the administrator restores the database from backup but not the sample file
  directory, or vice versa.
* the administrator provides two sample file directory paths pointed at the
  same inode via symlinks or non-canonical paths. (Note that `flock(2)` has a
  design flaw in which multiple file descriptors can share a lock, so the
  current locking scheme is not sufficient to detect this otherwise.)
* database and sample file directories forked from the same version, opened
  the same number of times, then crossed.

To combat this, each sample file directory has some metadata, stored both in
its database row and in a file called `meta` within the directory. These track
uuids associated with the database and directory to avoid mixups. They also
track sequence numbers and uuids associated with "opens": an open is recorded
each time the database is opened in read/write mode.

```sql
create table open (
  id integer primary key,
  uuid blob unique not null check (length(uuid) = 16)
);

create table sample_file_dir (
  id integer primary key,
  path text unique not null,
  uuid blob unique not null check (length(uuid) = 16),

  -- The last (read/write) open of this directory which fully completed.
  -- See schema.proto:DirMeta for a more complete description.
  last_complete_open_id integer references open (id)
);
```

```proto
// Metadata stored in sample file dirs as "<dir>/meta". This is checked
// against the metadata stored within the database to detect inconsistencies
// between the directory and database, such as those described in
// design/schema.md.
//
// As of schema version 4, the overall file format is as follows: a
// varint-encoded length, followed by a serialized DirMeta message, followed
// by NUL bytes padding to a total length of 512 bytes. This message never
// exceeds that length.
//
// The goal of this format is to allow atomically rewriting a meta file
// in-place. I hope that on modern OSs and hardware, a single-sector
// rewrite is atomic, though POSIX frustratingly doesn't seem to guarantee
// this. There's some discussion of that here:
// <https://stackoverflow.com/a/2068608/23584>. At worst, there's a short
// window during which the meta file can be corrupted. As the file's purpose
// is to check for inconsistencies, it can be reconstructed if you assume no
// inconsistency exists.
message DirMeta {
  // A uuid associated with the database, in binary form. dir_uuid is strictly
  // more powerful, but it improves diagnostics to know if the directory
  // belongs to the expected database at all or not.
  bytes db_uuid = 1;

  // A uuid associated with the directory itself.
  bytes dir_uuid = 2;

  // Corresponds to an entry in the `open` database table.
  message Open {
    uint32 id = 1;
    bytes uuid = 2;
  }

  // The last open that was known to be recorded in the database as completed.
  // Absent if this has never happened. Note this can backtrack in exactly one
  // scenario: when deleting the directory, after all associated files have
  // been deleted, last_complete_open can be moved to in_progress_open.
  Open last_complete_open = 3;

  // The last run which is in progress, if different from last_complete_open.
  // This may or may not have been recorded in the database, but it's
  // guaranteed that no data has yet been written by this open.
  Open in_progress_open = 4;
}
```

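The fixed-length framing described in the `DirMeta` comment (varint length, serialized message, NUL padding to 512 bytes) can be sketched as follows. This is an illustration only: the serialized message is treated as opaque bytes rather than produced by a protobuf library, and the names `frame_meta` and `unframe_meta` are hypothetical.

```python
META_LEN = 512  # total size of the meta file, per the format description


def encode_varint(n):
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def frame_meta(serialized):
    """Length-prefix the serialized message and pad with NULs to META_LEN."""
    framed = encode_varint(len(serialized)) + serialized
    if len(framed) > META_LEN:
        raise ValueError("serialized DirMeta does not fit in the meta file")
    return framed.ljust(META_LEN, b"\0")


def unframe_meta(data):
    """Recover the serialized message from a META_LEN-byte meta file."""
    if len(data) != META_LEN:
        raise ValueError("meta file has unexpected length")
    length = shift = pos = 0
    while True:
        if pos >= META_LEN:
            raise ValueError("unterminated length varint")
        byte = data[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if byte < 0x80:
            break
        shift += 7
    end = pos + length
    if end > META_LEN or any(data[end:]):
        raise ValueError("bad length or non-NUL padding")
    return data[pos:end]
```

Because the result is always exactly 512 bytes, a rewrite touches a single traditional disk sector, which is what the atomicity argument in the comment relies on.
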
These are updated through the procedures below:

*Write the metadata file*

This is a sub-procedure used in several places below.

Precondition: the directory's lock is held with `LOCK_EX` (exclusive) and
there is an existing metadata file.

1. Open the metadata file.
2. Rewrite the fixed-length data atomically.
3. `fdatasync` the file.

*Open the database as read-only*

1. Lock the database directory with `LOCK_SH` (shared).
2. Open the SQLite database with `SQLITE_OPEN_READONLY`.

*Open the database as read-write*

1. Lock the database directory with `LOCK_EX` (exclusive).
2. Open the SQLite database with `SQLITE_OPEN_READWRITE`.
3. Insert a new `open` table row with the new sequence number and uuid.

*Create a sample file directory*

Precondition: database open read-write.

1. Lock the sample file directory with `LOCK_EX` (exclusive).
2. Verify there is no metadata file or `last_complete_open` is unset.
3. Write new metadata file with a fresh `dir_uuid` and an `in_progress_open`
   matching the database's current open.
4. Add a matching row to the database with `last_complete_open_id` matching
   the current open.
5. Update the metadata file to move `in_progress_open` to
   `last_complete_open`.

*Open a sample file directory read-only*

Precondition: database open (read-only or read-write).

1. Lock the sample file directory with `LOCK_SH` (shared).
2. Verify the metadata file matches the database:
    * database uuid matches.
    * dir uuid matches.
    * if the database's `last_complete_open` is set, it must match the
      directory's `last_complete_open` or `in_progress_open`.
    * if the database's `last_complete_open` is absent, the directory's
      must be as well.

*Open a sample file directory read-write*

Precondition: database open read-write.

1. Lock the sample file directory with `LOCK_EX` (exclusive).
2. Verify the metadata file matches the database (as above).
3. Update the metadata file with `in_progress_open` matching the current
   open.
4. Update the database row with `last_complete_open_id` matching the current
   open.
5. Update the metadata file with `last_complete_open` rather than
   `in_progress_open`.
6. Run the recording startup procedure for this directory.

*Close a sample file directory*

1. Drop the sample file directory lock.

*Delete a sample file directory*

1. Remove all sample files (of all three categories described below:
   `recording` table rows, `garbage` table rows, and files with recording
   ids >= their stream's `cum_recordings`); see "delete a recording"
   procedure below.
2. Rewrite the directory metadata with `in_progress_open` set to the current
   open, `last_complete_open` cleared.
3. Delete the directory's row from the database.

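The verification step in "Open a sample file directory read-only" might look roughly like the sketch below. The dataclasses are hypothetical stand-ins for the parsed `meta` file and the joined `sample_file_dir`/`open` database rows; only the four rules themselves come from the procedure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Open = Tuple[int, bytes]  # (id, uuid) of an `open` table row


@dataclass
class DirMeta:
    """Hypothetical in-memory form of the directory's `meta` file."""
    db_uuid: bytes
    dir_uuid: bytes
    last_complete_open: Optional[Open] = None
    in_progress_open: Optional[Open] = None


@dataclass
class DbDirRow:
    """Hypothetical in-memory form of what the database says about the dir."""
    db_uuid: bytes
    dir_uuid: bytes
    last_complete_open: Optional[Open] = None


def check_dir_meta(meta: DirMeta, row: DbDirRow) -> List[str]:
    """Return a list of inconsistencies; an empty list means the dir matches."""
    problems = []
    if meta.db_uuid != row.db_uuid:
        problems.append("database uuid mismatch")
    if meta.dir_uuid != row.dir_uuid:
        problems.append("directory uuid mismatch")
    if row.last_complete_open is not None:
        if row.last_complete_open not in (meta.last_complete_open,
                                          meta.in_progress_open):
            problems.append("database open matches neither directory open")
    elif meta.last_complete_open is not None:
        problems.append("directory has a completed open unknown to database")
    return problems
```

Returning every problem rather than stopping at the first is a design choice made here for diagnostics; it mirrors the remark in `DirMeta` that `db_uuid` exists mainly to improve error messages.
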
### Lifecycle of a recording

Because a major part of the recording state is outside the SQL database, care
must be taken to guarantee consistency and durability. Moonfire NVR maintains
three invariants about sample files:

1. `recording` table rows have sample files on disk with the indicated size
   and SHA-1 hash.
2. Exactly one of the following statements is true for every sample file:
   * It has a `recording` table row.
   * It has a `garbage` table row.
   * Its recording id is greater than or equal to the `cum_recordings`
     for its stream.
3. After an orderly shutdown of Moonfire NVR, there is a `recording` table row
   for every sample file, even if there have been previous crashes.

The first invariant provides certainty that a recording is properly stored. It
would be prohibitively expensive to verify hashes on demand (when listing or
serving recordings), or in some cases even to verify the size of the files via
`stat()` calls.

The second invariant improves auditability of the database and sample file
directory.

The third invariant prevents accumulation of garbage files which could fill the
drive and stop recording.

These invariants are maintained through the following procedures:

*Create a recording:*

1. Write the sample file, aborting if `open(..., O_WRONLY|O_CREAT|O_EXCL)`
   fails with `EEXIST`.
2. `fsync()` the sample file.
3. `fsync()` the sample file directory.
4. Insert the `recording` row, marking its size and SHA-1 hash in the process.

*Delete a recording:*

1. Replace the `recording` row with a `garbage` row.
2. `unlink()` the sample file, warning on `ENOENT`. (This would indicate
   invariant #2 is false.)
3. `fsync()` the sample file directory.
4. Delete the `garbage` row.

*Startup (crash recovery):*

1. Acquire a lock to guarantee this is the only Moonfire NVR process running
   against the given database. This lock is not released until program shutdown.
2. Query `garbage` table and `cum_recordings` field in the `stream` table.
3. `unlink()` all the sample files associated with garbage rows, ignoring
   `ENOENT`.
4. For each stream, `unlink()` all the existing files with recording ids >=
   `cum_recordings`.
5. `fsync()` the sample file directory.
6. Delete all rows from the `garbage` table.

The procedures can be batched: while for a given recording, the steps must be
strictly ordered, multiple recordings can be proceeding through the steps
simultaneously. In particular, there is no need to hurry syncing deletions to
disk, so deletion steps #3 and #4 can be done opportunistically if it's
desirable to avoid extra disk seeks or flash write cycles.

It'd also be possible to conserve some partial recordings. Moonfire NVR could,
as a recording is written, record the latest sample tables,
size, and hash fields without marking the recording as fully written. On
startup, the file would be truncated to match and then the recording marked
as fully written. The file would either have to be synced prior to each update
(to guarantee it is at least as new as the row) or multiple checkpoints would
be kept, using the last one with a correct hash (if any) on a best-effort
basis. However, this may not be worth the complexity; it's simpler to just
keep recording time short enough that losing partial recordings is not a
problem.

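The file-side steps of "Create a recording" can be sketched in Python on a POSIX system as follows. The function name is hypothetical, the whole recording is assumed to be in memory for brevity, and SHA-1 is used because the invariants above name it; the database insert (the last step) is left to the caller.

```python
import hashlib
import os


def write_sample_file(dir_path, name, data):
    """Durably create a sample file; return (size, sha1 digest) for the
    caller to store in the `recording` row afterward."""
    path = os.path.join(dir_path, name)
    # O_EXCL makes this fail with FileExistsError (EEXIST) rather than
    # overwrite an existing sample file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush the file's contents
    finally:
        os.close(fd)
    # Sync the directory so the new directory entry itself is durable.
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return len(data), hashlib.sha1(data).digest()
```

Only after this function returns would the `recording` row be inserted, so a crash at any earlier point leaves a file with a recording id >= `cum_recordings`, which the startup procedure removes.
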
### Verifying invariants

There should be a means to verify the invariants above. There are three
possible levels of verification:

1. Compare presence of sample files.
2. Compare size of sample files.
3. Compare hashes of sample files.

Consider a database with 6 camera-months of recordings at 3.1 Mbps (for
both main and sub streams). There would be 0.5 million files, taking 5.9 TB.
The times are roughly:

| level    | operation   |      time |
| :------- | :---------- | --------: |
| presence | `readdir()` |   ~19 sec |
| size     | `fstat()`   |  ~120 sec |
| hash     | `read()`    | ~16 hours |

The `readdir()` and `fstat()` times can be tested simply:

```
$ mkdir testdir
$ cd testdir
$ seq 1 $((60*24*365*6/12*2)) | xargs touch
$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
$ time ls -1 -f | wc -l
$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
$ time ls -1 -f --size | wc -l
```

(The system calls used by `ls` can be verified through `strace`.)

The hash verification time is easiest to calculate: reading 5.9 TB at 100
MB/sec takes 59,000 seconds, or about 16 hours. On some systems, it will be
even slower. On the Raspberry Pi 2, flash, network, and disk are all on the
same USB 2.0 bus (see [Raspberry Pi 2 NAS Experiment HOWTO][pi-2-nas]). Disk
throughput seems to be about 25 MB/sec on an idle system (~40% of the
theoretical 480 Mbit/sec). Therefore the process will take more than two days.

The presence check is fast enough that it seems reasonable to simply always
perform it on startup. Size could be checked with a verification command used
for more extensive verification, such as before and after schema upgrades.
Hash checks could be performed in a rare offline data recovery mechanism or in
the background at low priority.

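The cheapest level, the presence check, amounts to comparing one directory listing with the set of file names the database expects. A minimal sketch follows; the function name and the `ignore` parameter are hypothetical, and the expected names are assumed to have been derived from the `recording` table beforehand.

```python
import os


def compare_presence(dir_path, expected_names, ignore=("meta",)):
    """Return (missing, unexpected) sample file names for one directory.

    `expected_names` would come from the database; entries listed in
    `ignore` (such as the directory's metadata file) are skipped.
    """
    with os.scandir(dir_path) as entries:
        on_disk = {entry.name for entry in entries if entry.name not in ignore}
    expected = set(expected_names)
    return sorted(expected - on_disk), sorted(on_disk - expected)
```

This uses only the names returned by the directory scan, so it does not `stat()` each file; checking sizes would require the more expensive second level.
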
### Recording table

The snippet below is an illustrative excerpt of the SQLite schema; see
`schema.sql` for the authoritative, up-to-date version.

```sql
-- A single, typically 60-second, recorded segment of video.
create table recording (
  id integer primary key,
  open_id integer references open (id),
  camera_id integer references camera (id) not null,

  sample_file_uuid blob unique not null,
  sample_file_blake3 blob,
  sample_file_size integer,

  -- The starting time and duration of the recording, in 90 kHz units since
  -- 1970-01-01 00:00:00 UTC.
  start_time_90k integer not null,
  duration_90k integer,

  video_samples integer,
  video_sample_entry_id blob references visual_sample_entry (id),
  video_index blob,

  ...
);

-- A concrete box derived from an ISO/IEC 14496-12 section 8.5.2
-- VisualSampleEntry box. Describes the codec, width, height, etc.
create table visual_sample_entry (
  id integer primary key,

  -- The width and height in pixels; must match values within `data`.
  width integer,
  height integer,

  -- A serialized SampleEntry box, including the leading length and box
  -- type (avcC in the case of H.264).
  data blob
);
```

As indicated by the `start_time_90k` field above, recordings use a 90 kHz time
base. This matches the RTP timestamp frequency used for H.264 and other video
encodings. See [RFC 3551][rfc-3551] section 5 for an explanation of this
choice.

It's tempting to downscale to a coarser timebase, rounding as necessary, in
the name of a more compact encoding of `video_index`. (By having timestamp
deltas near zero and borrowing some of the timestamp varint to represent
additional bits of the size deltas, it's possible to use barely more than 2
bytes per frame on a typical recording. **TODO:** recalculate database size
estimates above, which were made using this technique.) But matching the input
timebase is the most understandable approach and leaves the most flexibility
available for handling timestamps encoded in RTCP Sender Report messages. In
practice, a database size of two gigabytes rather than one is unlikely to cause
problems.

One likely point of difficulty is reliably mapping recordings to wall clock
time. (This may be the subject of a separate design doc later.) In an ideal
world, the NVR and cameras would each be closely synced to a reliable NTP time
reference, time would advance at a consistent rate, time would never jump
forward or backward, each transmission would take bounded time, and cameras
would reliably send RTCP Sender Reports. In reality, none of that is likely to
be consistently true. For example, Hikvision cameras send RTCP Sender Reports
only with certain firmware versions (see [thread][hikvision-sr]). Most likely
it will be useful to have any available clock/timing information for
diagnosing problems, such as the following:

* the NVR's wall clock time
* the NVR's NTP server sync status
* the NVR's uptime
* the camera's time as of the RTP play response
* the camera's time as of any RTCP Sender Reports, and the corresponding RTP
  timestamps

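For illustration, converting the 90 kHz values in `start_time_90k` and `duration_90k` to wall clock time might look like the sketch below. The helper names are hypothetical, and the conversion assumes the stored value is already a trustworthy count of 90 kHz units since the Unix epoch, which the paragraph above cautions may not hold in practice.

```python
from datetime import datetime, timedelta, timezone

TIME_UNITS_PER_SEC = 90_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_90k_to_datetime(time_90k):
    """Convert 90 kHz units since the epoch to a UTC datetime
    (truncated to whole microseconds)."""
    micros = time_90k * 1_000_000 // TIME_UNITS_PER_SEC
    return EPOCH + timedelta(microseconds=micros)


def recording_end_90k(start_time_90k, duration_90k):
    """End time of a recording, in the same 90 kHz units."""
    return start_time_90k + duration_90k


start = 60 * TIME_UNITS_PER_SEC            # one minute after the epoch
end = recording_end_90k(start, 60 * TIME_UNITS_PER_SEC)
print(time_90k_to_datetime(start), time_90k_to_datetime(end))
```
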
#### `video_index`

The `video_index` field conceptually holds three pieces of information about
the samples:

1. the duration (in 90kHz units) of each sample
2. the byte size of each sample
3. which samples are "sync samples" (aka key frames or I-frames)

These correspond to [ISO/IEC 14496-12][iso-14496-12] `stts` (TimeToSampleBox,
section 8.6.1.2), `stsz` (SampleSizeBox, section 8.7.3), and `stss`
(SyncSampleBox, section 8.6.2) boxes, respectively.

Currently the `stsc` (SampleToChunkBox, section 8.7.4) information is implied:
all samples are in a single chunk from the beginning of the file to the end.
If in the future support for interleaved audio is added, there will be a new
blob field with chunk information. **TODO:** can audio data really be sliced
to fit the visual samples like this?

The index is structured as two [varints][varints] per sample. The first varint
represents the delta between this frame's duration and the previous frame's,
in [zigzag][zigzag] form. The low bit is borrowed to indicate if this frame
is a key frame. The second varint represents the delta between this frame's
byte size and the byte size of the last frame of the same type (key or
non-key), also in zigzag form.
This encoding is chosen so that values will be near zero, and thus the varints
will be at their most compact possible form. An index might be written by the
following pseudocode:

```
prev_duration = 0
prev_bytes_key = 0
prev_bytes_nonkey = 0
for each frame:
    duration_delta = duration - prev_duration
    bytes_delta = bytes - (is_key ? prev_bytes_key : prev_bytes_nonkey)
    prev_duration = duration
    if is_key: prev_bytes_key = bytes else: prev_bytes_nonkey = bytes
    PutVarint((Zigzag(duration_delta) << 1) | is_key)
    PutVarint(Zigzag(bytes_delta))
```

See also the example below:

|                 |    frame 1 | frame 2 | frame 3 | frame 4 | frame 5 |
| :-------------- | ---------: | ------: | ------: | ------: | ------: |
| duration        |         10 |       9 |      11 |      10 |      10 |
| is\_key         |          1 |       0 |       0 |       0 |       1 |
| bytes           |       1000 |      10 |      15 |      12 |    1050 |
| duration\_delta |         10 |      -1 |       2 |      -1 |       0 |
| bytes\_delta    |       1000 |      10 |       5 |      -3 |      50 |
| varint1         |         41 |       2 |       8 |       2 |       1 |
| varint2         |       2000 |      20 |      10 |       5 |     100 |
| encoded         | `29 d0 0f` | `02 14` | `08 0a` | `02 05` | `01 64` |

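A runnable version of the index encoding, together with a matching decoder, is sketched below in Python. It follows the description in this section; the function names and the `(duration, is_key, size)` tuple layout are choices made for this example rather than anything taken from the implementation.

```python
def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def unzigzag(z):
    return (z >> 1) ^ -(z & 1)


def put_varint(out, n):
    """Append a non-negative integer as a base-128 varint."""
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def get_varint(data, pos):
    """Read a varint at `pos`; return (value, next position)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def encode_index(frames):
    """Encode an iterable of (duration, is_key, size) tuples."""
    out = bytearray()
    prev_duration = 0
    prev_bytes = {True: 0, False: 0}  # last size seen, per frame type
    for duration, is_key, size in frames:
        is_key = bool(is_key)
        put_varint(out, (zigzag(duration - prev_duration) << 1) | int(is_key))
        put_varint(out, zigzag(size - prev_bytes[is_key]))
        prev_duration = duration
        prev_bytes[is_key] = size
    return bytes(out)


def decode_index(data):
    """Inverse of encode_index."""
    frames = []
    pos = duration = 0
    prev_bytes = {True: 0, False: 0}
    while pos < len(data):
        v1, pos = get_varint(data, pos)
        v2, pos = get_varint(data, pos)
        is_key = bool(v1 & 1)
        duration += unzigzag(v1 >> 1)
        prev_bytes[is_key] += unzigzag(v2)
        frames.append((duration, is_key, prev_bytes[is_key]))
    return frames


example = [(10, True, 1000), (9, False, 10), (11, False, 15),
           (10, False, 12), (10, True, 1050)]
print(encode_index(example).hex(" "))  # 29 d0 0f 02 14 08 0a 02 05 01 64
```

The five frames are the ones from the example table, so the printed bytes can be compared against its `encoded` row.
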
### On-demand `.mp4` construction

A major goal of this format is to support on-demand serving in various formats,
including two types of `.mp4` files:

* unfragmented `.mp4` files, for traditional video players.
* fragmented `.mp4` files for MPEG-DASH or HTML5 Media Source Extensions
  (see [Media Source ISO BMFF Byte Stream Format][media-bmff]), for
  a browser-based user interface.

This does not require writing new `.mp4` files to disk. In fact, HTTP range
requests (for "pseudo-streaming") can be satisfied on `.mp4` files aggregated
from several segments. The implementation details are outside the scope of this
document, but this is possible in part due to the use of an on-flash database
to store metadata and the simple, consistent format of sample indexes.

[pi2]: https://www.raspberrypi.org/products/raspberry-pi-2-model-b/
[xandem]: http://www.xandemhome.com/
[hikcam]: http://overseas.hikvision.com/us/Products_accessries_10533_i7696.html
[libx264]: http://www.videolan.org/developers/x264.html
[2635QM]: http://ark.intel.com/products/53463/Intel-Core-i7-2635QM-Processor-6M-Cache-up-to-2_90-GHz
[C2538]: http://ark.intel.com/products/77981/Intel-Atom-Processor-C2538-2M-Cache-2_40-GHz
[wdpurple]: http://www.wdc.com/en/products/products.aspx?id=1210
[wd20eurs]: http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701250.pdf
[seeker]: http://www.linuxinsight.com/how_fast_is_your_disk.html
[rfc-3551]: https://tools.ietf.org/html/rfc3551
[hikvision-sr]: http://www.cctvforum.com/viewtopic.php?f=19&t=44534
[iso-14496-12]: http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=68960
[sqlite3]: https://www.sqlite.org/
[sqlite3-wal]: https://www.sqlite.org/wal.html
[file-consistency]: http://danluu.com/file-consistency/
[pi-2-nas]: http://www.mikronauts.com/raspberry-pi/raspberry-pi-2-nas-experiment-howto/
[varints]: https://developers.google.com/protocol-buffers/docs/encoding#varints
[zigzag]: https://developers.google.com/protocol-buffers/docs/encoding#types
[media-bmff]: https://w3c.github.io/media-source/isobmff-byte-stream-format.html