This is as described in design/time.md. Other aspects of that design
(including using the monotonic clock and adjusting the durations to compensate
for camera clock frequency error) are not implemented yet. No new tests yet.
Just trying to get some flight miles on these ideas as soon as I can.
The advantages of the new schema are:
* overlapping recordings can be unambiguously described and viewed.
This is a significant problem right now; the clock on my cameras appears to
run faster than the (NTP-synchronized) clock on my NVR. Thus, if an
RTSP session drops and is quickly reconnected, there's likely to be
overlap.
* less I/O is required to view mp4s when there are multiple cameras.
This is a pretty dramatic difference in the number of database read
syscalls with pragma page_size = 1024 (605 -> 39 in one test),
although I'm not sure how much of that maps to actual I/O wait time.
That's probably as dramatic as it is due to overflow page chaining.
But even with larger page sizes, there's an improvement. It helps to
stop interleaving the video_index fields from different cameras.
There are changes to the JSON API to take advantage of this, described
in design/api.md.
There's an upgrade procedure, described in guide/schema.md.
This crate is a slightly-more-polished and MIT-licensed version of
resource.rs. So far it has one advantage: running the tests doesn't
require RUST_TEST_THREADS=1.
The benchmarks now require "cargo bench --features=nightly". The
extra #[cfg(nightly)] switches in the code needed for it are a bit
annoying; I may move the benches to a separate directory to avoid this.
But for now, this works.
This is a significant milestone; now the Rust branch matches the C++ branch's
features.
In the process, I switched from using serde_derive (which requires nightly
Rust) to serde_codegen (which does not). It was easier than I thought it'd
be. I'm getting close to no longer requiring nightly Rust.
It would be nice to build on stable Rust. In particular, I'm hitting
compiler bugs in Rust nightly, such at this one:
https://github.com/rust-lang/rust/issues/38177
I imagine beta/stable compilers would be less problematic.
These two features were easy to get rid of:
* alloc was used to get a Box<[u8]> to uninitialized memory.
Looks like that's possible with Vec.
* box_syntax wasn't actually used at all. (Maybe a leftover from something.)
The remaining features are:
* plugin, for clippy.
https://github.com/rust-lang/rust/issues/29597
I could easily gate it with a "nightly" cargo feature.
* proc_macro, for serde_derive.
https://github.com/rust-lang/rust/issues/35900
serde does support stable rust, although it's annoying.
https://serde.rs/codegen-stable.html
I might just wait a bit; this feature looks like it's getting close to
stabilization.
Seeking in Chrome 55 wasn't working. It apparently sends If-Match requests
with the correct etag, which Moonfire NVR was incorrectly responding to with
"Precondition failed" responses. Fix and test that.
I hadn't noticed the problem in earlier versions of Chrome. I think they were
using If-Range instead, which is already tested and working.
This test is copied from the C++ implementation. It ensures the timestamps are
calculated accurately from the pts rather than using ffmpeg's estimated
duration. The Rust implementation was doing the easy-but-inaccurate thing, so
fix that to make the test pass.
Additionally, I did this with a code structure that should ensure the Rust
code never drops a Writer without indicating to the syncer that its uuid is
abandoned. Such a bug essentially leaks the partially-written file, although a
restart would cause it to be properly unlinked and marked as such. There are
no tests (yet) that exercise this scenario, though.
* new, more thorough tests based on a "BoxCursor" which navigates the
resulting .mp4. This tests everything the C++ code was testing on
Mp4SamplePieces. And it goes beyond: it tests the actual resulting .mp4
file, not some internal logic.
* fix recording::Segment::foreach to properly handle a truncated ending.
Before this was causing a panic.
* get rid of the separate recording::Segment::init method. This was some of
the first Rust I ever wrote, and I must have thought I couldn't loan it my
locked database. I can, and that's more clean. Now Segments are never
half-initialized. Less to test, less to go wrong.
* fix recording::Segment::new to treat a trailing zero duration on a segment
with a non-zero start in the same way as it does with a zero start. I'm
still not sure what I'm doing makes sense, but at least it's not
surprisingly inconsistent.
* add separate, smaller tests of recording::Segment
* address a couple TODOs in the .mp4 code and add missing comments
* change a couple panics on database corruption into cleaner error returns
* increment the etag version given the .mp4 output has changed
I found this while bringing db.rs's test coverage up to the old
moonfire-db-test.cc. I mistakenly thought that in SQLite, an ungrouped
aggregate on a relation with no rows would return a row with a null result of
the aggregate. Instead, it returns no rows. In hindsight, this makes more
sense; it matches what grouped aggregates (have to) do.
I should have submitted/pushed more incrementally but just played with it on
my computer as I was learning the language. The new Rust version more or less
matches the functionality of the current C++ version, although there are many
caveats listed below.
Upgrade notes: when moving from the C++ version, I recommend dropping and
recreating the "recording_cover" index in SQLite3 to pick up the addition of
the "video_sync_samples" column:
$ sudo systemctl stop moonfire-nvr
$ sudo -u moonfire-nvr sqlite3 /var/lib/moonfire-nvr/db/db
sqlite> drop index recording_cover;
sqlite3> create index ...rest of command as in schema.sql...;
sqlite3> ^D
Some known visible differences from the C++ version:
* .mp4 generation queries SQLite3 differently. Before it would just get all
video indexes in a single query. Now it leads with a query that should be
satisfiable by the covering index (assuming the index has been recreated as
noted above), then queries individual recording's indexes as needed to fill
a LRU cache. I believe this is roughly similar speed for the initial hit
(which generates the moov part of the file) and significantly faster when
seeking. I would have done it a while ago with the C++ version but didn't
want to track down a lru cache library. It was easier to find with Rust.
* On startup, the Rust version cleans up old reserved files. This is as in the
design; the C++ version was just missing this code.
* The .html recording list output is a little different. It's in ascending
order, with the most current segment shorten than an hour rather than the
oldest. This is less ergonomic, but it was easy. I could fix it or just wait
to obsolete it with some fancier JavaScript UI.
* commandline argument parsing and logging have changed formats due to
different underlying libraries.
* The JSON output isn't quite right (matching the spec / C++ implementation)
yet.
Additional caveats:
* I haven't done any proof-reading of prep.sh + install instructions.
* There's a lot of code quality work to do: adding (back) comments and test
coverage, developing a good Rust style.
* The ffmpeg foreign function interface is particularly sketchy. I'd
eventually like to switch to something based on autogenerated bindings.
I'd also like to use pure Rust code where practical, but once I do on-NVR
motion detection I'll need to existing C/C++ libraries for speed (H.264
decoding + OpenCL-based analysis).
* typo: the subtitle should use its own mdhd, not alias the video one
* use 64-bit ints for the edit lists; the 32-bit values overflow at 13.25 hours
* use etags that reflect the edit list
I'm seeing what is possible performance-wise in the current C++ before
trying out Go and Rust implementations.
* use the google benchmark framework and some real data.
* use release builds - I hadn't done this in a while, and there were a
few compile errors that manifested only in release mode. Update the
readme to suggest using a release build.
* optimize the varint decoder and SampleIndexIterator to branch less.
* enable link-time optimization for release builds.
* add some support for feedback-directed optimization. Ideally "make"
would automatically produce the "generate" build outputs with a
different object/library/executable suffix, run the generate
benchmark, and then produce the "use" builds. This is not that fancy;
you have to run an arcane command:
alias cmake='cmake -DCMAKE_BUILD_TYPE=Release'
cmake -DPROFILE_GENERATE=true -DPROFILE_USE=false .. && \
make recording-bench && \
src/recording-bench && \
cmake -DPROFILE_GENERATE=false -DPROFILE_USE=true .. && \
make recording-bench && \
perf stat -e cycles,instructions,branches,branch-misses \
src/recording-bench --benchmark_repetitions=5
That said, the results are dramatic - at least 50% improvement. (The
results weren't stable before as small tweaks to the code caused a
huge shift in performance, presumably something something branch
alignment something something.)
Now it's possible to quickly determine what calendar days have data and then
query recordings for just the day(s) of interest with their returned
{start,end}_time_usec.
The helper isn't used yet. The goal is to export this on /camera/<uuid>/ as
described in a TODO in design/api.md.
The next step is to keep MoonfireDatabase::CameraData::days up-to-date:
* Init: call on every recording (replacing the current aggregated query with
a recording-by-recording query)
* InsertRecording, DeleteRecordings: call for added/removed recordings
then return it from GetCamera and pass it along to the client in
WebInterface::HandleJsonCameraDetail.
* If the end of a segment is between samples, the last included sample will
have a shortened duration.
* If the beginning of a segment not on a key frame (aka sync sample), the
prefix will be included but trimmed using an edit list. (It seems like a
ctts box might be able to accomplish the same thing, fwiw.)
* gcc (Raspbian 4.9.2-10) 4.9.2 complains about -1 in const char[]s.
gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010 was fine with this.
Use '\xff' instead.
* libjsoncpp-dev 0.6.0~rc2-3.1 doesn't have Json::writeValue.
Use an older interface instead.
* libre2-dev 20140304+dfsg-2 has a bug in which custom RE2 parsers don't
compile because the relevant constructor is only declared, not defined as
trivial. (This is fixed on my Ubuntu's libre2-dev 20150701+dfsg-2.)
Avoid using this.
I tested these in VLC and QuickTime. Both players appear to ignore the
as the track dimensions, track transformation matrix, box dimensions, and box
justification. I just left them at default values then.
Automated testing is minimal. There's a new test that the resulting .mp4
parses, but I didn't actually ensure correctness of the subtitles in any way.
* Changed README.md commensurately
* Add cameras.sql to .gitignore to not commit personal camera data
* Change CMakeLists.txt to explicitly refer to hand-built libevent dirs
There's a lot of work left to do on this:
* important latency optimization: the recording threads block
while fsync()ing sample files, which can take 250+ ms. This
should be moved to a separate thread to happen asynchronously.
* write cycle optimizations: several SQLite commits per camera per minute.
* test coverage: this drops testing of the file rotation, and
there are several error paths worth testing.
* ffmpeg oddities to investigate:
* the out-of-order first frame's pts
* measurable delay before returning packets
* it sometimes returns an initial packet it calls a "key" frame that actually
has an SEI recovery point NAL but not an IDR-coded slice NAL, even though
in the input these always seem to come together. This makes playback
starting from this recording not work at all on Chrome. The symptom is
that it loads a player-looking thing with the proper dimensions but
playback never actually starts.
I imagine these are all related but haven't taken the time to dig through
ffmpeg code and understand them. The right thing anyway may be to ditch
ffmpeg for RTSP streaming (perhaps in favor of the live555 library), as
it seems to have other omissions like making it hard/impossible to take
advantage of Sender Reports. In the meantime, I attempted to mitigate
problems by decreasing ffmpeg's probesize.
* handling overlapping recordings: right now if there's too much time drift or
a time jump, you can end up with recordings that the UI won't play without
manual database changes. It's not obvious what the right thing to do is.
* easy camera setup: currently you have to manually insert rows in the SQLite
database and restart.
but I think it's best to get something in to iterate from.
This deletes a lot of code, including:
* the ffmpeg video sink code (instead now using a bit of extra code in Stream
on top of the SampleFileWriter, SampleIndexEncoder, and MoonfireDatabase
code that's been around for a while)
* FileManager (in favor of new code using the database)
* the old UI
* RealFile and friends
* the dependency on protocol buffers, which was used for the config file
(though I'll likely have other reasons for using protocol buffers later)
* even some utilities like IsWord that were just for validating the config
I discovered that the mp4 files I was writing were viewable in VLC and in
Chrome-on-desktop (ffmpeg-based) but not in Chrome-on-Android
(libstagefright-based). It turns out that I was writing Annex B sample data
rather than the correct AVCParameterSample format. ffmpeg gives both the
"extradata" and the actual frames in Annex B format when reading from rtsp.
This is still my simple, unoptimized implementation of the Annex B parser. My
Raspberry Pi 2 is still able to record my six streams using about 30% of 1
core, so it will do for the moment at least.
In particular, this returns all the extra configuration data that will be
necessary to actually instantiate streams from the database rather than the
soon-to-be-removed configuration file.
Before, I had a gross hardcoded path in moonfire-db.cc + a hacky
Recording::sample_file_path (which is StrCat(sample_file_dir, "/", uuid),
essentially). Now, things expect to take a File* to the sample file directory
and use openat(2). Several things had to change in the process:
* RealFileSlice now takes a File* dir.
* File has an Open that returns an fd (for RealFileSlice's benefit).
* BuildMp4 now is in WebInterface rather than MoonfireDatabase. The latter
only manages the SQLite database, so it shouldn't know anything about the
sample file directory.
This reverts commit ad4beac464.
That commit wasn't as advertised; I had several other changes mixed in my
working copy. I'd also copied a working copy from one path to another, and
it turns out the cmake build subdir was still referring to the original, so
I hadn't realized this commit didn't even build. :(
I didn't properly update the new duration calculation when switching from
ascending to descending order.
Also, on the Pi, 1-hour recordings are noticeably faster to load.
* Schema revisions. The most dramatic is the addition of a covering index on
(camera_id, start_time_90k) that avoids the need to make sparse accesses
into the recording table (where the desired data is intermixed with both
the large blobs and rows from other cameras). A query over a year's data
previously took many seconds (6+ even in a form without the video_index)
and now is roughly 10X faster. Queries for a couple weeks now should be
unnoticeably fast.
Other changes to shrink the rows, such as duration_90k instead of
end_time_90k (more compact varint encoding) and video_sample_entry_id
(typically 1 byte) instead of video_sample_entry_sha1 (20 bytes).
And more CHECK constraints for good measure.
* Caching of expensive computations and logic to keep them up to date.
The top-level web view previously went through the entire recording table,
which was even slower. Now it is served from a small map in RAM.
* Expanded the scope of operations to cover (hopefully) everything needed for
recording into the SQLite database.
* Added tests of MoonfireDatabase. These are basic tests that don't
exercise a lot of error cases, but at least they exist.
The main MoonfireDatabase functionality still missing is support for quickly
seeing what calendar days have data over the full timespan of a camera. This
is more data to compute and cache.
On my laptop, with a month's data, a test query would take 0.1 to 0.2 seconds
before. Now it takes 0.001 to 0.004 seconds.
I improved this by creating and taking advantage of an index on start time.
It's a little more complicated than that because the desired timespan is
specified in terms of a recording's start and end time, not start time alone.
I defined a maximum duration of a recording (5 minutes) and specified this
with an extra condition in the query so that the end time can be used to
narrow the valid range of start times.
"explain query plan select ..." output confirms it's using the index with
both > and < comparisons:
0|0|0|SEARCH TABLE recording USING INDEX recording_start_time_90k (start_time_90k>? AND start_time_90k<?)
0|1|1|SEARCH TABLE video_sample_entry USING INDEX sqlite_autoindex_video_sample_entry_1 (sha1=?)
I also refactored ListMp4Recordings out of BuildMp4File to make the measurement
easier.
This is almost certain to have performance problems with large databases,
but it's a useful starting point.
No tests yet. It shouldn't be too hard to add some for moonfire-db.h, but
I'm impatient to fake up enough data to check on the performance and see
what needs to change there first.
This wraps libevent's evhttp_parse_query_str and friends. It's easier to use
than the raw libevent stuff because it handles initialization (formerly not
done properly in profiler.cc) and cleans up with RAII.