SQLite3 has gotten some noticeable speed improvements in recent releases. This
is a modest improvement on reasonably new platforms, and a pretty significant
one on Raspbian Jessie (which has an incredibly old SQLite3).
This makes a huge difference in the reported time - 863 usec rather than 6
milliseconds on my laptop. Part of the difference is in reqwest client setup
(it apparently initializes a SSL_CTX that is never used in this test), part
fresh connections vs keepalive, part I don't know what. None of it seems
relevant to the logic I want to test.
serve_generated_bytes is >3X faster. One caveat is that the reactor thread may
stall when reading from the memory-mapped slice. Moonfire NVR is basically a
single-user program, so that may not be so bad, but we'll see.
It had an Arc which in hindsight isn't necessary; the actual video index
generation is fast anyway. This saves a couple pointers per cache entry and
the overhead of chasing them. LruCache itself also has some extra pointers on
it but that's something to address another day.
This reduces the working set by another 960 bytes for a typical one-hour recording, improving cache efficiency a bit more.
8 bytes from SampleIndexIterator:
* reduce the three "bytes" fields to two. Doing so as "bytes_key" vs
"bytes_nonkey" slowed it down a bit, perhaps because the "bytes" is
needed right away and requires a branch. But "bytes" vs "bytes_other"
seems fine. Looks like it can do this with cmovs in parallel with other
stuff.
* stuff "is_key" into the "i" field.
8 bytes from recording::Segment itself:
* make "frames" and "key_frame" u16s
* stuff "trailing_zero" into "video_sample_entry_id"
There were Nagle's algorithm delays in both the "fresh_client" and
"reuse_client" versions of the .mp4 serving benchmark. Now performance is much
more consistent.
* don't store sizes of mp4-format sample indexes; recalculate them.
* keep SampleIndexIterator position as a u32 rather than a usize.
This is 960 bytes for a 60-minute mp4; another small cache usage improvement.
For a one-hour recording, this is about 2 KiB, so a decent chunk of a
Raspberry Pi 2's L1 cache. Generating the Slices and searching/scanning
it should be a bit faster and pollute the cache less.
This is a pretty small optimization now when transferring a decent chunk
of the moov or mdat, but it's easy enough to do. It will be much more
noticeable if I end up interleaving the captions between each key frame.
This came up when I tried using the "bundled" feature of rusqlite. Its build
script passes -DSQLITE_DEFAULT_FOREIGN_KEYS=1, which caused a test to fail.
Fix the bug that this option revealed, and set the pragma so we'll catch
such problems in the future even when using a system library not compiled in
this way.
This was completely wrong: it overflowed on large filesystems and
double-counted the used bytes.
The new logic is still imperfect in that if there are a bunch of files in the
process of being deleted (moved from recording to reserved_sample_files but
not yet unlinked), they'll be taken out of the total capacity. Maybe it should
stat everything in the sample file directory instead of relying on the
recording table. It's definitely an improvement, though.
This page was noticeably slower than necessary because the recording_cover
index wasn't actually covering the query. Both the schema for new databases
and the upgrade query were broken (and not even in the same way).
No new schema version to correct this, at least for now. I'll probably have
another reason to change the schema soon anyway and can throw this in.
This makes it easier to understand which options are valid with each
command.
Additionally, there's more separation of implementations. The most
obvious consequence is that "moonfire-nvr ts ..." no longer uselessly
locks/opens a database.
* add a --ts subcommand to convert between numeric and human-readable
representations. This is handy when directly inspecting the SQLite database
or API output.
* also take the human-readable form in the web interface's camera view.
* to reduce confusion, when using trim=true on the web interface's camera
view, trim the displayed starting and ending times as well as the actual
.mp4 file links.
These are currently the only thing which require a nightly Rust. I haven't run
them since adding the feature gates. The feature gates were slightly broken,
and the actual benchmarks had bitrotted a bit. Fix these things. Also put them
into a separate submodule from the regular tests, so that not as many
feature gates (#[cfg(feature="nightly")]) are required.
This fixes a minor performance regression for recording lists introduced in
eee887b by ordering by the start_time_90k (the natural order of the
recording_cover index) rather than the composite_id (which requires a sort
pass).
"explain query plan" before:
0|0|0|SEARCH TABLE recording USING INDEX recording_cover (start_time_90k>? AND start_time_90k<?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
after:
0|0|0|SEARCH TABLE recording USING INDEX recording_cover (start_time_90k>? AND start_time_90k<?)
The list_aggregated_recordings algorithm is already designed to work in this
case; see the comments there. I must have forgotten to switch the order by
clause since writing that algorithm.
There's still a sort post-aggregation but that's over less data.
Dolf reported hitting this problem:
$ sudo -u moonfire-nvr RUST_LOG=info RUST_BACKTRACE=1 release/moonfire-nvr --upgrade
Jan 06 17:10:57.148 INFO Upgrading database from version 0 to version 1...
Jan 06 17:10:57.149 INFO ...database now in journal_mode delete (requested delete).
Jan 06 17:10:57.149 INFO ...from version 0 to version 1
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { description: "zero duration only allowed at end; have 3123 bytes left", cause: None }', /buildslave/rust-buildbot/slave/stable-dist-rustc-cross-host-linux/build/src/libcore/result.rs:837
The indexes were being scanned on upgrade to set the trailing zero flag which
is some sanity checking for /view.mp4 URLs. It's not a big problem to skip it
for some funny recordings to let the update proceed.
Separately, I'll add validation of the pts when writing a recording; it will
report error and end the recording (retrying a second later) rather than write
an unplayable database enty.
Probably also a good time to add a --check to spot database problems such as
this and recording rows without a matching sample file or vice versa.
* sort by newest recording first (even if time jumps backwards), which seems
more useful / less confusing.
* add a trim=true URL parameter to trim the .mp4s to not extend beyond the
range in question. Otherwise it's quite difficult to produce such a URL in
the new s= format: you'd have to manually inspect the database to find the
precise start time of the recording and do the math by hand.
As described in design/time.md:
* get the realtime-monotonic once at the start of a run and use the
monotonic clock afterward to avoid problems with local time steps
* on every recording, try to correct the latest local_time_delta at up
to 500 ppm
Let's see how this works...
The test ensures it solves the problem of the initial buffering throwing off
the start time of the first segment.
Along the way, I tested and fixed the new TrailingZero flag; it wasn't being
set.
This is as described in design/time.md. Other aspects of that design
(including using the monotonic clock and adjusting the durations to compensate
for camera clock frequency error) are not implemented yet. No new tests yet.
Just trying to get some flight miles on these ideas as soon as I can.