The recording::Segment was constructing a segment with no frames in it, which
was causing a panic when appending a zero-length stts to the Slices. Fix this
in a couple ways:
* Slices::append should return Err rather than panic. No reason to crash the
whole program when we have trouble serving a single .mp4 request.
* recording::Segment shouldn't produce zero-frame segments
I had an assert that fired in this case, dating back to when I hadn't plumbed
Result returns through much of .mp4 construction. Now I have, so there's no
excuse in having an assert here. Change to an error return, and tweak it to
not fire in the zero-duration case.
Also fix a problem in the test harness; I hadn't finished converting it for
multi-recording tests, and it was returning the wrong recording.
Because of that, I seem to have stumbled across a related problem in which
asking for zero duration of a non-zero duration recording will return a
recording::Segment with no frames, which will cause panics because its
corresponding .mp4 slices are zero-length. I just adjusted the panic message
here; I'll follow up with changes to address that.
* CameraDayKey::bounds (used to generate the start and end times of days in
the returned JSON) returned UTC, not matching what recordings were mapped
into that day. So fetching a day with its given bounds would return
something different. Test and fix it.
* Several time-related tests weren't calling testutil::init(), so they weren't
fixing the time zone to the expected America/Los_Angeles. If the machine
time is set to something else, they would break.
I think this is an ffmpeg bug, which I plan to report. In the meantime, this
makes the tests pass. Long-term, even if ffmpeg fixes this, I probably don't
want to continue doing acceptance tests against whatever version of ffmpeg
happens to be installed - my real targets of interest are the latest versions
of Chrome, Firefox, Safari, QuickTime, and VLC.
This was causing Firefox to fail to play multipart .mp4s which trimmed away a
prefix. In the developer console, it said NS_ERROR_DOM_MEDIA_METADATA_ERR
without giving any RESULT_DETAIL, making it a pain to diagnose. Given that the
stss is supposed to be needed for seeking, I'm surprised it didn't have any
immediately obvious impact on Chrome or VLC. Maybe they just took longer to
seek than otherwise necessary.
The bug was that when keeping track of the "next frame num" while constructing
the .mp4, I appended the number in the underlying recording, not the number
post-trimming. That meant following segments used the wrong numbers. In some
cases, it caused it to exceed the total number of samples in the generated
.mp4, which seems to be what Firefox was complaining about. Running the result
through "ffmpeg -i bad.mp4 -c copy -f mp4 good.mp4" just trimmed away the most
obviously invalid ones, leaving others that didn't point to the frames they
meant to. That was enough to make Firefox start playing the file. /shruggie
The existing tests were all with a single segment, so I added a new one to
catch this. I also added a Debug implementation to recording::Segment and
mp4::Segment.
This was totally broken in commit 1cf27c18. It would serve bytes from the
beginning of the sample file in question, not from the start of the given
range.
it should be exactly 1, but was slightly more because the fraction was
incorrectly 1 rather than 0. I'm not sure if any actual players care about
this, but it was something I noticed when looking into strange edit list
behavior.
This is intended to support HTML5 Media Source Extensions, which I expect to
be the most practical way to make a good web UI with a proper scrub bar and
such.
This feature has had very limited testing on Chrome and Firefox, and that was
not entirely successful. More work is needed before it's usable, but this
seems like a helpful progress checkpoint.
This significantly improves safety of the ffmpeg interface. The complex
ABIs aren't accessed directly from Rust. Instead, I have a small C
wrapper which uses the ffmpeg C API and the C headers at compile-time to
determine the proper ABI in the same way any C program using ffmpeg
would, so that the ABI doesn't have to be duplicated in Rust code.
I've tested with ffmpeg 2.x and ffmpeg 3.x; it seems to work properly
with both where before ffmpeg 3.x caused segfaults.
It still depends on ABI compatibility between the compiled and running
versions. C programs need this, too, and normal shared library
versioning practices provide this guarantee. But log both versions on
startup for diagnosing problems with this.
Fixes#7
The benchmarks don't get compiled with the standard "cargo test";
they require "cargo +nightly bench --features=nightly", so I didn't notice
they were broken in the previous commit. Now fixed.
This makes a huge difference in the reported time - 863 usec rather than 6
milliseconds on my laptop. Part of the difference is in reqwest client setup
(it apparently initializes a SSL_CTX that is never used in this test), part
fresh connections vs keepalive, part I don't know what. None of it seems
relevant to the logic I want to test.
serve_generated_bytes is >3X faster. One caveat is that the reactor thread may
stall when reading from the memory-mapped slice. Moonfire NVR is basically a
single-user program, so that may not be so bad, but we'll see.
It had an Arc which in hindsight isn't necessary; the actual video index
generation is fast anyway. This saves a couple pointers per cache entry and
the overhead of chasing them. LruCache itself also has some extra pointers on
it but that's something to address another day.
This reduces the working set by another 960 bytes for a typical one-hour recording, improving cache efficiency a bit more.
8 bytes from SampleIndexIterator:
* reduce the three "bytes" fields to two. Doing so as "bytes_key" vs
"bytes_nonkey" slowed it down a bit, perhaps because the "bytes" is
needed right away and requires a branch. But "bytes" vs "bytes_other"
seems fine. Looks like it can do this with cmovs in parallel with other
stuff.
* stuff "is_key" into the "i" field.
8 bytes from recording::Segment itself:
* make "frames" and "key_frame" u16s
* stuff "trailing_zero" into "video_sample_entry_id"
There were Nagle's algorithm delays in both the "fresh_client" and
"reuse_client" versions of the .mp4 serving benchmark. Now performance is much
more consistent.
* don't store sizes of mp4-format sample indexes; recalculate them.
* keep SampleIndexIterator position as a u32 rather than a usize.
This is 960 bytes for a 60-minute mp4; another small cache usage improvement.
For a one-hour recording, this is about 2 KiB, so a decent chunk of a
Raspberry Pi 2's L1 cache. Generating the Slices and searching/scanning
it should be a bit faster and pollute the cache less.
This is a pretty small optimization now when transferring a decent chunk
of the moov or mdat, but it's easy enough to do. It will be much more
noticeable if I end up interleaving the captions between each key frame.
This came up when I tried using the "bundled" feature of rusqlite. Its build
script passes -DSQLITE_DEFAULT_FOREIGN_KEYS=1, which caused a test to fail.
Fix the bug that this option revealed, and set the pragma so we'll catch
such problems in the future even when using a system library not compiled in
this way.
This was completely wrong: it overflowed on large filesystems and
double-counted the used bytes.
The new logic is still imperfect in that if there are a bunch of files in the
process of being deleted (moved from recording to reserved_sample_files but
not yet unlinked), they'll be taken out of the total capacity. Maybe it should
stat everything in the sample file directory instead of relying on the
recording table. It's definitely an improvement, though.
This page was noticeably slower than necessary because the recording_cover
index wasn't actually covering the query. Both the schema for new databases
and the upgrade query were broken (and not even in the same way).
No new schema version to correct this, at least for now. I'll probably have
another reason to change the schema soon anyway and can throw this in.
This makes it easier to understand which options are valid with each
command.
Additionally, there's more separation of implementations. The most
obvious consequence is that "moonfire-nvr ts ..." no longer uselessly
locks/opens a database.
* add a --ts subcommand to convert between numeric and human-readable
representations. This is handy when directly inspecting the SQLite database
or API output.
* also take the human-readable form in the web interface's camera view.
* to reduce confusion, when using trim=true on the web interface's camera
view, trim the displayed starting and ending times as well as the actual
.mp4 file links.
These are currently the only thing which require a nightly Rust. I haven't run
them since adding the feature gates. The feature gates were slightly broken,
and the actual benchmarks had bitrotted a bit. Fix these things. Also put them
into a separate submodule from the regular tests, so that not as many
feature gates (#[cfg(feature="nightly")]) are required.
This fixes a minor performance regression for recording lists introduced in
eee887b by ordering by the start_time_90k (the natural order of the
recording_cover index) rather than the composite_id (which requires a sort
pass).
"explain query plan" before:
0|0|0|SEARCH TABLE recording USING INDEX recording_cover (start_time_90k>? AND start_time_90k<?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
after:
0|0|0|SEARCH TABLE recording USING INDEX recording_cover (start_time_90k>? AND start_time_90k<?)
The list_aggregated_recordings algorithm is already designed to work in this
case; see the comments there. I must have forgotten to switch the order by
clause since writing that algorithm.
There's still a sort post-aggregation but that's over less data.
Dolf reported hitting this problem:
$ sudo -u moonfire-nvr RUST_LOG=info RUST_BACKTRACE=1 release/moonfire-nvr --upgrade
Jan 06 17:10:57.148 INFO Upgrading database from version 0 to version 1...
Jan 06 17:10:57.149 INFO ...database now in journal_mode delete (requested delete).
Jan 06 17:10:57.149 INFO ...from version 0 to version 1
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { description: "zero duration only allowed at end; have 3123 bytes left", cause: None }', /buildslave/rust-buildbot/slave/stable-dist-rustc-cross-host-linux/build/src/libcore/result.rs:837
The indexes were being scanned on upgrade to set the trailing zero flag which
is some sanity checking for /view.mp4 URLs. It's not a big problem to skip it
for some funny recordings to let the update proceed.
Separately, I'll add validation of the pts when writing a recording; it will
report error and end the recording (retrying a second later) rather than write
an unplayable database enty.
Probably also a good time to add a --check to spot database problems such as
this and recording rows without a matching sample file or vice versa.
* sort by newest recording first (even if time jumps backwards), which seems
more useful / less confusing.
* add a trim=true URL parameter to trim the .mp4s to not extend beyond the
range in question. Otherwise it's quite difficult to produce such a URL in
the new s= format: you'd have to manually inspect the database to find the
precise start time of the recording and do the math by hand.
As described in design/time.md:
* get the realtime-monotonic once at the start of a run and use the
monotonic clock afterward to avoid problems with local time steps
* on every recording, try to correct the latest local_time_delta at up
to 500 ppm
Let's see how this works...
The test ensures it solves the problem of the initial buffering throwing off
the start time of the first segment.
Along the way, I tested and fixed the new TrailingZero flag; it wasn't being
set.
This is as described in design/time.md. Other aspects of that design
(including using the monotonic clock and adjusting the durations to compensate
for camera clock frequency error) are not implemented yet. No new tests yet.
Just trying to get some flight miles on these ideas as soon as I can.
The advantages of the new schema are:
* overlapping recordings can be unambiguously described and viewed.
This is a significant problem right now; the clock on my cameras appears to
run faster than the (NTP-synchronized) clock on my NVR. Thus, if an
RTSP session drops and is quickly reconnected, there's likely to be
overlap.
* less I/O is required to view mp4s when there are multiple cameras.
This is a pretty dramatic difference in the number of database read
syscalls with pragma page_size = 1024 (605 -> 39 in one test),
although I'm not sure how much of that maps to actual I/O wait time.
That's probably as dramatic as it is due to overflow page chaining.
But even with larger page sizes, there's an improvement. It helps to
stop interleaving the video_index fields from different cameras.
There are changes to the JSON API to take advantage of this, described
in design/api.md.
There's an upgrade procedure, described in guide/schema.md.
This crate is a slightly-more-polished and MIT-licensed version of
resource.rs. So far it has one advantage: running the tests doesn't
require RUST_TEST_THREADS=1.
The benchmarks now require "cargo bench --features=nightly". The
extra #[cfg(nightly)] switches in the code needed for it are a bit
annoying; I may move the benches to a separate directory to avoid this.
But for now, this works.
This is a significant milestone; now the Rust branch matches the C++ branch's
features.
In the process, I switched from using serde_derive (which requires nightly
Rust) to serde_codegen (which does not). It was easier than I thought it'd
be. I'm getting close to no longer requiring nightly Rust.
It would be nice to build on stable Rust. In particular, I'm hitting
compiler bugs in Rust nightly, such at this one:
https://github.com/rust-lang/rust/issues/38177
I imagine beta/stable compilers would be less problematic.
These two features were easy to get rid of:
* alloc was used to get a Box<[u8]> to uninitialized memory.
Looks like that's possible with Vec.
* box_syntax wasn't actually used at all. (Maybe a leftover from something.)
The remaining features are:
* plugin, for clippy.
https://github.com/rust-lang/rust/issues/29597
I could easily gate it with a "nightly" cargo feature.
* proc_macro, for serde_derive.
https://github.com/rust-lang/rust/issues/35900
serde does support stable rust, although it's annoying.
https://serde.rs/codegen-stable.html
I might just wait a bit; this feature looks like it's getting close to
stabilization.
Seeking in Chrome 55 wasn't working. It apparently sends If-Match requests
with the correct etag, which Moonfire NVR was incorrectly responding to with
"Precondition failed" responses. Fix and test that.
I hadn't noticed the problem in earlier versions of Chrome. I think they were
using If-Range instead, which is already tested and working.
This test is copied from the C++ implementation. It ensures the timestamps are
calculated accurately from the pts rather than using ffmpeg's estimated
duration. The Rust implementation was doing the easy-but-inaccurate thing, so
fix that to make the test pass.
Additionally, I did this with a code structure that should ensure the Rust
code never drops a Writer without indicating to the syncer that its uuid is
abandoned. Such a bug essentially leaks the partially-written file, although a
restart would cause it to be properly unlinked and marked as such. There are
no tests (yet) that exercise this scenario, though.
* new, more thorough tests based on a "BoxCursor" which navigates the
resulting .mp4. This tests everything the C++ code was testing on
Mp4SamplePieces. And it goes beyond: it tests the actual resulting .mp4
file, not some internal logic.
* fix recording::Segment::foreach to properly handle a truncated ending.
Before this was causing a panic.
* get rid of the separate recording::Segment::init method. This was some of
the first Rust I ever wrote, and I must have thought I couldn't loan it my
locked database. I can, and that's more clean. Now Segments are never
half-initialized. Less to test, less to go wrong.
* fix recording::Segment::new to treat a trailing zero duration on a segment
with a non-zero start in the same way as it does with a zero start. I'm
still not sure what I'm doing makes sense, but at least it's not
surprisingly inconsistent.
* add separate, smaller tests of recording::Segment
* address a couple TODOs in the .mp4 code and add missing comments
* change a couple panics on database corruption into cleaner error returns
* increment the etag version given the .mp4 output has changed
I found this while bringing db.rs's test coverage up to the old
moonfire-db-test.cc. I mistakenly thought that in SQLite, an ungrouped
aggregate on a relation with no rows would return a row with a null result of
the aggregate. Instead, it returns no rows. In hindsight, this makes more
sense; it matches what grouped aggregates (have to) do.
I should have submitted/pushed more incrementally but just played with it on
my computer as I was learning the language. The new Rust version more or less
matches the functionality of the current C++ version, although there are many
caveats listed below.
Upgrade notes: when moving from the C++ version, I recommend dropping and
recreating the "recording_cover" index in SQLite3 to pick up the addition of
the "video_sync_samples" column:
$ sudo systemctl stop moonfire-nvr
$ sudo -u moonfire-nvr sqlite3 /var/lib/moonfire-nvr/db/db
sqlite> drop index recording_cover;
sqlite3> create index ...rest of command as in schema.sql...;
sqlite3> ^D
Some known visible differences from the C++ version:
* .mp4 generation queries SQLite3 differently. Before it would just get all
video indexes in a single query. Now it leads with a query that should be
satisfiable by the covering index (assuming the index has been recreated as
noted above), then queries individual recording's indexes as needed to fill
a LRU cache. I believe this is roughly similar speed for the initial hit
(which generates the moov part of the file) and significantly faster when
seeking. I would have done it a while ago with the C++ version but didn't
want to track down a lru cache library. It was easier to find with Rust.
* On startup, the Rust version cleans up old reserved files. This is as in the
design; the C++ version was just missing this code.
* The .html recording list output is a little different. It's in ascending
order, with the most current segment shorten than an hour rather than the
oldest. This is less ergonomic, but it was easy. I could fix it or just wait
to obsolete it with some fancier JavaScript UI.
* commandline argument parsing and logging have changed formats due to
different underlying libraries.
* The JSON output isn't quite right (matching the spec / C++ implementation)
yet.
Additional caveats:
* I haven't done any proof-reading of prep.sh + install instructions.
* There's a lot of code quality work to do: adding (back) comments and test
coverage, developing a good Rust style.
* The ffmpeg foreign function interface is particularly sketchy. I'd
eventually like to switch to something based on autogenerated bindings.
I'd also like to use pure Rust code where practical, but once I do on-NVR
motion detection I'll need to existing C/C++ libraries for speed (H.264
decoding + OpenCL-based analysis).
* typo: the subtitle should use its own mdhd, not alias the video one
* use 64-bit ints for the edit lists; the 32-bit values overflow at 13.25 hours
* use etags that reflect the edit list
I'm seeing what is possible performance-wise in the current C++ before
trying out Go and Rust implementations.
* use the google benchmark framework and some real data.
* use release builds - I hadn't done this in a while, and there were a
few compile errors that manifested only in release mode. Update the
readme to suggest using a release build.
* optimize the varint decoder and SampleIndexIterator to branch less.
* enable link-time optimization for release builds.
* add some support for feedback-directed optimization. Ideally "make"
would automatically produce the "generate" build outputs with a
different object/library/executable suffix, run the generate
benchmark, and then produce the "use" builds. This is not that fancy;
you have to run an arcane command:
alias cmake='cmake -DCMAKE_BUILD_TYPE=Release'
cmake -DPROFILE_GENERATE=true -DPROFILE_USE=false .. && \
make recording-bench && \
src/recording-bench && \
cmake -DPROFILE_GENERATE=false -DPROFILE_USE=true .. && \
make recording-bench && \
perf stat -e cycles,instructions,branches,branch-misses \
src/recording-bench --benchmark_repetitions=5
That said, the results are dramatic - at least 50% improvement. (The
results weren't stable before as small tweaks to the code caused a
huge shift in performance, presumably something something branch
alignment something something.)
Now it's possible to quickly determine what calendar days have data and then
query recordings for just the day(s) of interest with their returned
{start,end}_time_usec.
The helper isn't used yet. The goal is to export this on /camera/<uuid>/ as
described in a TODO in design/api.md.
The next step is to keep MoonfireDatabase::CameraData::days up-to-date:
* Init: call on every recording (replacing the current aggregated query with
a recording-by-recording query)
* InsertRecording, DeleteRecordings: call for added/removed recordings
then return it from GetCamera and pass it along to the client in
WebInterface::HandleJsonCameraDetail.
* If the end of a segment is between samples, the last included sample will
have a shortened duration.
* If the beginning of a segment not on a key frame (aka sync sample), the
prefix will be included but trimmed using an edit list. (It seems like a
ctts box might be able to accomplish the same thing, fwiw.)
* gcc (Raspbian 4.9.2-10) 4.9.2 complains about -1 in const char[]s.
gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010 was fine with this.
Use '\xff' instead.
* libjsoncpp-dev 0.6.0~rc2-3.1 doesn't have Json::writeValue.
Use an older interface instead.
* libre2-dev 20140304+dfsg-2 has a bug in which custom RE2 parsers don't
compile because the relevant constructor is only declared, not defined as
trivial. (This is fixed on my Ubuntu's libre2-dev 20150701+dfsg-2.)
Avoid using this.
I tested these in VLC and QuickTime. Both players appear to ignore the
as the track dimensions, track transformation matrix, box dimensions, and box
justification. I just left them at default values then.
Automated testing is minimal. There's a new test that the resulting .mp4
parses, but I didn't actually ensure correctness of the subtitles in any way.
* Changed README.md commensurately
* Add cameras.sql to .gitignore to not commit personal camera data
* Change CMakeLists.txt to explicitly refer to hand-built libevent dirs
There's a lot of work left to do on this:
* important latency optimization: the recording threads block
while fsync()ing sample files, which can take 250+ ms. This
should be moved to a separate thread to happen asynchronously.
* write cycle optimizations: several SQLite commits per camera per minute.
* test coverage: this drops testing of the file rotation, and
there are several error paths worth testing.
* ffmpeg oddities to investigate:
* the out-of-order first frame's pts
* measurable delay before returning packets
* it sometimes returns an initial packet it calls a "key" frame that actually
has an SEI recovery point NAL but not an IDR-coded slice NAL, even though
in the input these always seem to come together. This makes playback
starting from this recording not work at all on Chrome. The symptom is
that it loads a player-looking thing with the proper dimensions but
playback never actually starts.
I imagine these are all related but haven't taken the time to dig through
ffmpeg code and understand them. The right thing anyway may be to ditch
ffmpeg for RTSP streaming (perhaps in favor of the live555 library), as
it seems to have other omissions like making it hard/impossible to take
advantage of Sender Reports. In the meantime, I attempted to mitigate
problems by decreasing ffmpeg's probesize.
* handling overlapping recordings: right now if there's too much time drift or
a time jump, you can end up with recordings that the UI won't play without
manual database changes. It's not obvious what the right thing to do is.
* easy camera setup: currently you have to manually insert rows in the SQLite
database and restart.
but I think it's best to get something in to iterate from.
This deletes a lot of code, including:
* the ffmpeg video sink code (instead now using a bit of extra code in Stream
on top of the SampleFileWriter, SampleIndexEncoder, and MoonfireDatabase
code that's been around for a while)
* FileManager (in favor of new code using the database)
* the old UI
* RealFile and friends
* the dependency on protocol buffers, which was used for the config file
(though I'll likely have other reasons for using protocol buffers later)
* even some utilities like IsWord that were just for validating the config
I discovered that the mp4 files I was writing were viewable in VLC and in
Chrome-on-desktop (ffmpeg-based) but not in Chrome-on-Android
(libstagefright-based). It turns out that I was writing Annex B sample data
rather than the correct AVCParameterSample format. ffmpeg gives both the
"extradata" and the actual frames in Annex B format when reading from rtsp.
This is still my simple, unoptimized implementation of the Annex B parser. My
Raspberry Pi 2 is still able to record my six streams using about 30% of 1
core, so it will do for the moment at least.
In particular, this returns all the extra configuration data that will be
necessary to actually instantiate streams from the database rather than the
soon-to-be-removed configuration file.
Before, I had a gross hardcoded path in moonfire-db.cc + a hacky
Recording::sample_file_path (which is StrCat(sample_file_dir, "/", uuid),
essentially). Now, things expect to take a File* to the sample file directory
and use openat(2). Several things had to change in the process:
* RealFileSlice now takes a File* dir.
* File has an Open that returns an fd (for RealFileSlice's benefit).
* BuildMp4 now is in WebInterface rather than MoonfireDatabase. The latter
only manages the SQLite database, so it shouldn't know anything about the
sample file directory.
This reverts commit ad4beac464.
That commit wasn't as advertised; I had several other changes mixed in my
working copy. I'd also copied a working copy from one path to another, and
it turns out the cmake build subdir was still referring to the original, so
I hadn't realized this commit didn't even build. :(
I didn't properly update the new duration calculation when switching from
ascending to descending order.
Also, on the Pi, 1-hour recordings are noticeably faster to load.
* Schema revisions. The most dramatic is the addition of a covering index on
(camera_id, start_time_90k) that avoids the need to make sparse accesses
into the recording table (where the desired data is intermixed with both
the large blobs and rows from other cameras). A query over a year's data
previously took many seconds (6+ even in a form without the video_index)
and now is roughly 10X faster. Queries for a couple weeks now should be
unnoticeably fast.
Other changes to shrink the rows, such as duration_90k instead of
end_time_90k (more compact varint encoding) and video_sample_entry_id
(typically 1 byte) instead of video_sample_entry_sha1 (20 bytes).
And more CHECK constraints for good measure.
* Caching of expensive computations and logic to keep them up to date.
The top-level web view previously went through the entire recording table,
which was even slower. Now it is served from a small map in RAM.
* Expanded the scope of operations to cover (hopefully) everything needed for
recording into the SQLite database.
* Added tests of MoonfireDatabase. These are basic tests that don't
exercise a lot of error cases, but at least they exist.
The main MoonfireDatabase functionality still missing is support for quickly
seeing what calendar days have data over the full timespan of a camera. This
is more data to compute and cache.
On my laptop, with a month's data, a test query would take 0.1 to 0.2 seconds
before. Now it takes 0.001 to 0.004 seconds.
I improved this by creating and taking advantage of an index on start time.
It's a little more complicated than that because the desired timespan is
specified in terms of a recording's start and end time, not start time alone.
I defined a maximum duration of a recording (5 minutes) and specified this
with an extra condition in the query so that the end time can be used to
narrow the valid range of start times.
"explain query plan select ..." output confirms it's using the index with
both > and < comparisons:
0|0|0|SEARCH TABLE recording USING INDEX recording_start_time_90k (start_time_90k>? AND start_time_90k<?)
0|1|1|SEARCH TABLE video_sample_entry USING INDEX sqlite_autoindex_video_sample_entry_1 (sha1=?)
I also refactored ListMp4Recordings out of BuildMp4File to make the measurement
easier.
This is almost certain to have performance problems with large databases,
but it's a useful starting point.
No tests yet. It shouldn't be too hard to add some for moonfire-db.h, but
I'm impatient to fake up enough data to check on the performance and see
what needs to change there first.
This wraps libevent's evhttp_parse_query_str and friends. It's easier to use
than the raw libevent stuff because it handles initialization (formerly not
done properly in profiler.cc) and cleans up with RAII.
I wrote the old interface before playing much with SQLite. Now that I've
played around with it a bit, I found many ways to make the interface more
pleasant and fool-proof:
* it opens the database in a mode that honors foreign keys and
returns extended result codes.
* it forces locking to avoid SQLITE_BUSY and
sqlite3_{changes,last_insert_rowid} race conditions.
* it supports named bind parameters.
* it defers some errors until Step() to reduce caller verbosity.
* it automatically handles calling reset, which was quite easy to forget.
* it remembers the Step() return value, which makes the row loop every so
slightly more pleasant.
* it tracks transaction status.
This isn't as much of a speed-up as you might imagine; most of the large HTTP
content was mmap()ed files which are relatively efficient. The big improvement
here is that it's now possible to serve large files (4 GiB and up) on 32-bit
machines. This actually works: I was just able to browse a 25-hour, 37 GiB
.mp4 file on my Raspberry Pi 2 Model B. It takes about 400 ms to start serving
each request, which isn't exactly zippy but might be forgivable for such a
large file. I still intend for the common request from the web interface to be
for much smaller fragmented .mp4 files.
Speed could be improved later through caching. Right now my test code is
creating a fresh VirtualFile from a database query on each request, even
though it hasn't changed. The tricky part will be doing cache invalidation
cleanly if it does change---new recordings are added to the requested time
range, recordings are deleted, or existing recordings' timestamps are changed.
The downside to the approach here is that it requires libevent 2.1 for
evhttp_send_reply_chunk_with_cb. Unfortunately, Ubuntu 15.10 and Debian Jessie
still bundle libevent 2.0. There are a few possible improvements here:
1. fall back to assuming chunks are added immediately, so that people with
libevent 2.0 get the old bad behavior and people with libevent 2.1 get the
better behavior. This is kind of lame, though; it's easy to go through
the whole address space pretty fast, particularly when the browsers send
out requests so quickly so there may be some unintentional concurrency.
2. alter the FileSlice interface to return a pointer/destructor rather than
add something to the evbuffer. HttpServe would then add each chunk via
evbuffer_add_reference, and it'd supply a cleanupfn that (in addition to
calling the FileSlice-supplied destructor) notes that this chunk has been
fully sent. For all the currently-used FileSlices, this shouldn't be too
hard, and there are a few other reasons it might be beneficial:
* RealFileSlice could call madvise() to control the OS buffering
* RealFileSlice could track when file descriptors are open and thus
FileManager's unlink() calls don't actually free up space
* It feels dirty to expose libevent stuff through the otherwise-nice
FileSlice interface.
3. support building libevent 2.1 statically in-tree if the OS-supplied
libevent is unsuitable.
I'm tempted to go with #2, but probably not right now. More urgent to commit
support for writing the new format and the wrapper bits for viewing it.
This avoids iteration through the video index for the "interior" recordings of
a virtual file. This takes generating the size of a ~8-hour / 15 fps file from
about 60 ms to about 10 ms. I expect better savings on a Raspberry Pi 2, for
longer records, and for higher frame rates. The total time here can be
significant; one one ~day-long recording on the Pi, it was several seconds.
I'm optimistic this will help with that.
It'd also be possible to optimize DecodeVar32 (perhaps by unrolling the loop)
but better to remove a call than to optimize one.
To add the fast path, we need a new field "video_sync_samples" in the
recording table to calculate the length of the stss table. Storage cost should
be minimal; I think typically two bytes in SQLite's record format (serial type
1, value < 128), described here: <https://www.sqlite.org/fileformat2.html>.
* Fix the mdat box size, which was not properly including the length of the
header itself. (The "mp4file" tool nicely diagnosed this corruption.)
* Fix the stsc box. The first number of each entry is meant to be a chunk
index, not a sample index. This was causing strange behavior in basically
any video player for multi-recording videos.
* Populate etag and last-modified so that Range: requests can work properly.
The etag must be changed every time the generated file format changes.
There's a serial number constant for this purpose and a test meant to help
catch such problems.
This is still pretty rough. For example, there's no test coverage of virtual
files based on multiple recordings. The etag and last modified code are stubs.
And various other conditions aren't tested at all. But it does appear to work
in a test that does a round-trip from a .mp4 file, so it should be a decent
starting point.
This code isn't pretty exactly---particularly the hardcoded lengths---but it
does work. I'll have a different mechanism for calculating the length and
nesting structure forthe more dynamic parts of the moov atom. This way is
convenient when generating a single string of mostly static data.