This page was noticeably slower than necessary because the recording_cover
index wasn't actually covering the query. Both the schema for new databases
and the upgrade query were broken (and not even in the same way).
There's no new schema version to correct this, at least for now. I'll probably have
another reason to change the schema soon anyway and can throw this in.
The advantages of the new schema are:
* overlapping recordings can be unambiguously described and viewed.
This is a significant problem right now; the clock on my cameras appears to
run faster than the (NTP-synchronized) clock on my NVR. Thus, if an
RTSP session drops and is quickly reconnected, there's likely to be
overlap.
* less I/O is required to view mp4s when there are multiple cameras.
This is a pretty dramatic difference in the number of database read
syscalls with pragma page_size = 1024 (605 -> 39 in one test),
although I'm not sure how much of that maps to actual I/O wait time.
That's probably as dramatic as it is due to overflow page chaining.
But even with larger page sizes, there's an improvement. It helps to
stop interleaving the video_index fields from different cameras.
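To make the last point concrete: a rough sketch (parameter names are
placeholders, and it assumes the selected columns are all part of the covering
index) of the per-camera listing that SQLite should be able to answer from
recording_cover alone, without ever reading the rows and their interleaved
video_index blobs:

select start_time_90k, duration_90k, video_sync_samples, video_sample_entry_id
from recording
where camera_id = :camera_id
  and start_time_90k >= :start_time_90k
  and start_time_90k < :end_time_90k
order by start_time_90k;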
There are changes to the JSON API to take advantage of this, described
in design/api.md.
There's an upgrade procedure, described in guide/schema.md.
I should have submitted/pushed more incrementally but just played with it on
my computer as I was learning the language. The new Rust version more or less
matches the functionality of the current C++ version, although there are many
caveats listed below.
Upgrade notes: when moving from the C++ version, I recommend dropping and
recreating the "recording_cover" index in SQLite3 to pick up the addition of
the "video_sync_samples" column:
$ sudo systemctl stop moonfire-nvr
$ sudo -u moonfire-nvr sqlite3 /var/lib/moonfire-nvr/db/db
sqlite> drop index recording_cover;
sqlite> create index ...rest of command as in schema.sql...;
sqlite> ^D
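For reference, the recreated index is a covering index of roughly this shape.
The authoritative column list is whatever schema.sql says; the columns shown
here beyond the ones named in this note are only illustrative:

create index recording_cover on recording (
  camera_id,
  start_time_90k,
  duration_90k,
  video_sync_samples,
  video_sample_entry_id
  -- plus whatever other summary columns schema.sql lists
);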
Some known visible differences from the C++ version:
* .mp4 generation queries SQLite3 differently. Before, it would just get all
video indexes in a single query. Now it leads with a query that should be
satisfiable by the covering index (assuming the index has been recreated as
noted above), then queries individual recordings' indexes as needed to fill
an LRU cache. I believe this is roughly as fast for the initial hit
(which generates the moov part of the file) and significantly faster when
seeking. I would have done this a while ago in the C++ version but didn't
want to track down an LRU cache library; one was easier to find with Rust.
* On startup, the Rust version cleans up old reserved files. This is as in the
design; the C++ version was just missing this code.
* The .html recording list output is a little different. It's in ascending
order, with the most current segment (rather than the oldest) being the one
shorter than an hour. This is less ergonomic, but it was easy. I could fix it
or just wait to obsolete it with some fancier JavaScript UI.
* Command-line argument parsing and logging have changed formats due to
different underlying libraries.
* The JSON output doesn't yet quite match the spec / C++ implementation.
Additional caveats:
* I haven't done any proof-reading of prep.sh + install instructions.
* There's a lot of code quality work to do: adding (back) comments and test
coverage, developing a good Rust style.
* The ffmpeg foreign function interface is particularly sketchy. I'd
eventually like to switch to something based on autogenerated bindings.
I'd also like to use pure Rust code where practical, but once I do on-NVR
motion detection I'll need to use existing C/C++ libraries for speed (H.264
decoding + OpenCL-based analysis).
There's a lot of work left to do on this:
* important latency optimization: the recording threads block
while fsync()ing sample files, which can take 250+ ms. This
should be moved to a separate thread to happen asynchronously.
* write cycle optimizations: currently there are several SQLite commits per
camera per minute.
* test coverage: this drops testing of the file rotation, and
there are several error paths worth testing.
* ffmpeg oddities to investigate:
* the out-of-order first frame's pts
* measurable delay before returning packets
* it sometimes returns an initial packet it calls a "key" frame that actually
has an SEI recovery point NAL but not an IDR-coded slice NAL, even though
in the input these always seem to come together. This makes playback
starting from this recording not work at all on Chrome. The symptom is
that it loads a player-looking thing with the proper dimensions but
playback never actually starts.
I imagine these are all related but haven't taken the time to dig through
ffmpeg code and understand them. The right thing anyway may be to ditch
ffmpeg for RTSP streaming (perhaps in favor of the live555 library), as
it seems to have other omissions like making it hard/impossible to take
advantage of Sender Reports. In the meantime, I attempted to mitigate
problems by decreasing ffmpeg's probesize.
* handling overlapping recordings: right now if there's too much time drift or
a time jump, you can end up with recordings that the UI won't play without
manual database changes. It's not obvious what the right thing to do is.
* easy camera setup: currently you have to manually insert rows in the SQLite
database and restart.
But I think it's best to get something in and iterate from there.
This deletes a lot of code, including:
* the ffmpeg video sink code (instead now using a bit of extra code in Stream
on top of the SampleFileWriter, SampleIndexEncoder, and MoonfireDatabase
code that's been around for a while)
* FileManager (in favor of new code using the database)
* the old UI
* RealFile and friends
* the dependency on protocol buffers, which was used for the config file
(though I'll likely have other reasons for using protocol buffers later)
* even some utilities like IsWord that were just for validating the config
* Schema revisions. The most dramatic is the addition of a covering index on
(camera_id, start_time_90k) that avoids the need to make sparse accesses
into the recording table (where the desired data is intermixed with both
the large blobs and rows from other cameras). A query over a year's data
previously took many seconds (6+ even in a form without the video_index)
and now is roughly 10X faster. Queries over a couple of weeks should now be
unnoticeably fast.
Other changes shrink the rows, such as storing duration_90k instead of
end_time_90k (more compact varint encoding) and video_sample_entry_id
(typically 1 byte) instead of video_sample_entry_sha1 (20 bytes).
There are also more CHECK constraints for good measure; a rough sketch of
these schema ideas follows this list.
* Caching of expensive computations and logic to keep them up to date.
The top-level web view previously went through the entire recording table,
which was even slower. Now it is served from a small map in RAM.
* Expanded the scope of operations to cover (hopefully) everything needed for
recording into the SQLite database.
* Added tests of MoonfireDatabase. These are basic tests that don't
exercise a lot of error cases, but at least they exist.
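Here's that sketch. The real definitions live in schema.sql; everything here
beyond the column names already mentioned is illustrative only:

create table recording (
  id integer primary key,
  camera_id integer not null references camera (id),
  start_time_90k integer not null check (start_time_90k > 0),
  -- storing duration rather than absolute end time keeps the integer small
  duration_90k integer not null check (duration_90k >= 0),
  -- a small integer id (typically 1 byte) rather than a 20-byte sha1
  video_sample_entry_id integer references video_sample_entry (id),
  video_sync_samples integer not null,
  video_index blob not null
);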
The main MoonfireDatabase functionality still missing is support for quickly
seeing what calendar days have data over the full timespan of a camera. This
is more data to compute and cache.
On my laptop, with a month's data, a test query would take 0.1 to 0.2 seconds
before. Now it takes 0.001 to 0.004 seconds.
I improved this by creating and taking advantage of an index on start time.
It's a little more complicated than that because the desired timespan is
specified in terms of a recording's start and end time, not start time alone.
I defined a maximum duration of a recording (5 minutes) and specified this
with an extra condition in the query so that the end time can be used to
narrow the valid range of start times.
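Roughly (the selected columns and parameter names here are placeholders), the
query looks something like this, with 27000000 being the 5-minute maximum
duration expressed in 90 kHz units:

select start_time_90k, duration_90k
from recording
where start_time_90k > :start_time_90k - 27000000
  and start_time_90k < :end_time_90k
  and start_time_90k + duration_90k > :start_time_90k
order by start_time_90k;

The last condition keeps the results exact; the subtraction in the first is
what lets the start-time index bound the scan from below.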
"explain query plan select ..." output confirms it's using the index with
both > and < comparisons:
0|0|0|SEARCH TABLE recording USING INDEX recording_start_time_90k (start_time_90k>? AND start_time_90k<?)
0|1|1|SEARCH TABLE video_sample_entry USING INDEX sqlite_autoindex_video_sample_entry_1 (sha1=?)
I also refactored ListMp4Recordings out of BuildMp4File to make the measurement
easier.
I wrote the old interface before playing much with SQLite. Now that I've
played around with it a bit, I've found many ways to make the interface more
pleasant and fool-proof:
* it opens the database in a mode that honors foreign keys and
returns extended result codes.
* it forces locking to avoid SQLITE_BUSY and
sqlite3_{changes,last_insert_rowid} race conditions.
* it supports named bind parameters.
* it defers some errors until Step() to reduce caller verbosity.
* it automatically handles calling reset, which was quite easy to forget.
* it remembers the Step() return value, which makes the row loop ever so
slightly more pleasant.
* it tracks transaction status.
This avoids iteration through the video index for the "interior" recordings of
a virtual file. This takes generating the size of a ~8-hour / 15 fps file from
about 60 ms to about 10 ms. I expect better savings on a Raspberry Pi 2, for
longer recordings, and for higher frame rates. The total time here can be
significant; on one ~day-long recording on the Pi, it was several seconds.
I'm optimistic this will help with that.
It'd also be possible to optimize DecodeVar32 (perhaps by unrolling the loop)
but better to remove a call than to optimize one.
To add the fast path, we need a new field "video_sync_samples" in the
recording table to calculate the length of the stss table. Storage cost should
be minimal; I think typically two bytes in SQLite's record format (serial type
1, value < 128), described here: <https://www.sqlite.org/fileformat2.html>.
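Whether it ends up as SQL or as arithmetic over already-listed rows, the fast
path is just an aggregate over the new column for the fully-included
recordings (parameter names are placeholders):

select sum(video_sync_samples)
from recording
where camera_id = :camera_id
  and start_time_90k >= :start_time_90k
  and start_time_90k < :end_time_90k;

That sum is those recordings' contribution to the stss entry count, so their
part of the table can be sized without decoding any video_index.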
This is still pretty rough. For example, there's no test coverage of virtual
files based on multiple recordings. The etag and last modified code are stubs.
And various other conditions aren't tested at all. But it does appear to work
in a test that does a round-trip from a .mp4 file, so it should be a decent
starting point.