The new behavior eliminates a couple unpleasant edge cases in which it
would never flush:
* if all recording stops, whatever was unflushed would stay that way
* if every recording attempt produces a 0-duration recording (such as if the
camera sends only one frame and thus no PTS delta can be calculated),
the list of recordings to flush would continue to grow
I moved the clocks member from LockedDatabase to Database to make this happen,
so the new DatabaseGuard (replacing a direct MutexGuard<LockedDatabase>) can
access it before acquiring the lock.
I also made the type of clock a type parameter of Database (and so several
other things throughout the system). This allowed me to drop the Arc<>, but
more importantly it means that the Clocks trait doesn't need to stay
object-safe. I plan to take advantage of that shortly.
--features=bundled enables -DSQLITE_DEFAULT_FOREIGN_KEYS=1, and so some
operations have to be done in the proper order.
* enable foreign key enforcement all the time, so I test this more reliably.
* reorder some parts of the v1->v3 order. foreign key enforcement is
immediate (rather than deferred) by default. and ensure
old_recording_playback isn't left with a dangling reference to old_recording
at the v2 stage. Instead, wait until v3 to delete tables it depends on.
This is only the database schema, which I'm adding now in the hopes of
freezing schema version 3. There's no way yet to create users, much less
actually authenticate.
These are not actually populated by the code yet. I'm trying to get the
v3 schema frozen as soon as possible; actually using the fields can come
later.
Add some explanation of their value in time.md, along with some general
musing on leap seconds, and a correction on the frequency error of my cameras.
The new numbers are taken from my odroid setup. In particular, the size check
is noticeably slower than what I'd gathered before, enough to show that it
shouldn't be performed on startup.
* Major refactoring of UI code, small UI changes.
* Single file index.js split up into separate modules
* Modules for handling UI view components
* Modules for handling JSON/Model data
* Modules for support tasks
* Module to encapsulate Moonfire API
* Main application module
* index.js simplified to just activating main app
* Settings file functionality expanded
* UI adds "Time Format" popup to allow changing time representation
* CSS changes/additions to streamline looks
* Recordings loading indicator only appears after 500ms delay, if at all
* Address first set of PR change requests from Scott.
* Add copyright headers to all files (except JSON files)
* Fix bug with entering time values in range pickers
* Fixed an erroneous comment and/or spelling error here and there
* Fixed JSDoc comments where [description] was not filled in
* Removed a TODO from NVRApplication as it no longer applies
* Fixed bug handling "infinite" case of video segment lengths
* Fixed bug in "trim" handler and trim execution
* Retrofit video continues loading from separate PR
Signed-off-by: Dolf Starreveld <dolf@starreveld.com>
* Address PR comments
Signed-off-by: Dolf Starreveld <dolf@starreveld.com>
* Address PR comments
Signed-off-by: Dolf Starreveld <dolf@starreveld.com>
* Various settings in settings-nvr.js module
* settings-nvr-local.js can override settings-nvr.js
* settings-nvr-local is unchecked file
* Both files can be straight maps, or functions returning maps
* webpack env and args available to those functions
* Changes to allow active development of UI using webpack and hotloading.
* Update to webpack 4 (will make this work)
* Update webpack.config.js accordingly
* Move webpack.config.js to its own directory
* Split webpack.config.js into base.config.js, dev.config.js and prod.config.js
* Update configs to be "right" for development vs production using --mode
* Want configuration through (optional) local file that is not
checked in
* Updated package.json for newer babel-loader
* Put in a proxy to localhost port 8080 for evelopment server.
This allows "yarn start" to work on the machine where MoonFire's
server is running. This would be the default situation. Users in
a different setup can change the proxy settings.
* separate these out into a new file, writer.rs, as dir.rs was getting
unwieldy.
* extract traits for the parts of SampleFileDir and std::fs::File they needed;
set up mock implementations.
* move clock.rs to a new base crate to be accessible from the db crate.
* add tests that exercise all the retry paths.
* bugfix: account for the new recording's bytes when calculating how much to
delete.
* bugfix: when retrying an unlink failure in collect_garbage, we shouldn't
warn about all the recordings no longer existing. Do this by retrying each
step rather than the whole procedure again.
* avoid double-panic scenarios, which I hit while tweaking the mocks. These
are quite annoying to debug as Rust doesn't print information about either
panic. I ended up using lldb to get a backtrace. Better to be cautious about
what we're doing when already panicking.
* give more context on raw::insert_recording errors, which I hit as well while
tweaking the new tests.
* Install and build script enhancements
* Use functions for all reporting for consistent output
* Update reporting text to be more clear
* Add some additional messages
* Re-check some install prep items also in install script
* Fix a bug where the wrong db directory would be configured to the service
* Satisfy PR requested changes (spelling mostly).
There may be considerable lag between being fully written and being committed
when using the flush_if_sec feature. Additionally, this is a step toward
listing and viewing recordings before they're fully written. That's a
considerable delay: 60 to 120 seconds for the first recording of a run,
0 to 60 seconds for subsequent recordings.
These recordings aren't yet included in the information returned by
/api/?days=true. They probably should be, but small steps.
I want to start having the db.rs version augment this with the uncommitted
recordings, and it's nice to have the separation of the raw db vs augmented
versions. Also, this fits with the general theme of shrinking db.rs a bit.
I had to put the raw video_sample_entry_id into the rows rather than
the video_sample_entry Arc. In hindsight, this is better anyway: the common
callers don't need to do the btree lookup and arc clone on every row. I think
I'd originally done it that way only because I was quite new to rust and
didn't understand that db could be used from within the row callback given
that both borrows are immutable.
This was considering them as 0, so it would under-delete until the next flush
them delete all at once. That effectively doubled the number of bytes not yet
deleted as they're first transferred to garbage, flushed again, then unlinked.
In hindsight, the "post_tx" step in the upgrade process introduced in
e7f5733 doesn't make sense. If the procedure fails at this stage, nothing says
it still needs to be completed. If the sample file dirs have to be updated
after the database, then there should be another database version to mark that
it's fully completed, and indeed that's the purpose version 3 serves. So get
rid of the Upgrader trait and just go back to a simple run function per
version.
In the case of the sample file dir metadata, it actually can happen before the
database transaction; the stuff written to the database later just needs to be
consistent with what it finds if there's an existing metadata file from a
half-completed update.
For safety, ensure there are no unexpected directory contents before
upgrading 1->2, and ensure the metadata matches before upgrading 2->3.
I want to be able to use it in etags without having to do a full scan of the
recording_playback in advance, which would greatly increase time to first
byte. I probably will even use it in urls to ensure the segments they point to
are stable. I haven't actually done this yet - it will wait until I implement
serving unflushed recordings - but I want to get the schema set up properly.
Every recording it starts must be sent to the syncer with at least one sample
written. It will try forever (unless the channel is down, then panic). This
avoids the situation in which it prevents something in the uncommitted
VecDeque from ever being synced and thus any further recordings from being
flushed.
The new approach is to, rather than panicking, retry forever. The assumption
is that if a given operation is failing, a following operation is unlikely to
succeed, so it's simpler to just keep trying the earlier one than come up with
ways to undo it and proceed with later operations.
I still need to apply this approach to the Writer class. It currently unwraps
(crashes) or just gives up on a recording without ever sending it to the
Syncer. Given that recordings are all synced in order, that means further ones
can never be synced.
When list_oldest_recordings was called twice with no intervening flush, it
returned the same rows twice. This led to trying to delete it twice and all
following flushes failing with a "no such recording x/y" message. Now, return
each row only once, and track how many bytes have been returned.
I think dir.rs's logic is still wrong for how many bytes to delete when
multiple recordings are flushed at once (it ignores the bytes added by the
first when computing the bytes to delete for the second), but this is
progress.
I mistakenly thought these had to be monomorphized. (The FnOnce still
does, until rust-lang/rfcs#1909 is implemented.) Turns out this way works
fine. It should result in less compile time / code size, though I didn't check
this.
This needs a separate run of "cargo +nightly bench --features=nightly", so I
missed it in a couple previous commits. I probably should set up travis-ci...
As noted in schema.sql, this can be used for disambiguation. It also may be
useful in diagnosing data integrity problems.
Also, sneak in a couple minor improvements: better diagnostics in a couple
places, fix to 1->2 upgrade procedure.
This improves the practicality of having many streams (including the doubling
of streams by having main + sub streams for each camera). With these tuned
properly, extra streams don't cause any extra write cycles in normal or error
cases. Consider the worst case in which each RTSP session immediately sends a
single frame and then fails. Moonfire retries every second, so this would
formerly cause one commit per second per stream. (flush_if_sec=0 preserves
this behavior.) Now the commits can be arbitrarily infrequent by setting
higher values of flush_if_sec.
WARNING: this isn't production-ready! I hacked up dir.rs to make tests pass
and "moonfire-nvr run" work in the best-case scenario, but it doesn't handle
errors gracefully. I've been debating what to do when writing a recording
fails. I considered "abandoning" the recording then either reusing or skipping
its id. (in the latter case, marking the file as garbage if it can't be
unlinked immediately). I think now there's no point in abandoning a recording.
If I can't write to that file, there's no reason to believe another will work
better. It's better to retry that recording forever, and perhaps put the whole
directory into an error state that stops recording until those writes go
through. I'm planning to redesign dir.rs to make this happen.
It should reduce compile time / memory usage to put quite a bit of the code
into a separate crate. I also intend to limit visibility of some things to
only within the db crate, but that's for a future change. This is the smallest
move that will compile.
The filenames now represent composite ids (stream id + recording id) rather
than a separate uuid system with its own reservation for a few benefits:
* This provides more information when there are inconsistencies.
* This avoids the need for managing the reservations during recording. I
expect this to simplify delaying flushing of newly written sample files.
Now the directory has to be scanned at startup for files that never got
written to the database, but that's acceptably fast even with millions of
files.
* Less information to keep in memory and in the recording_playback table.
I'd considered using one directory per stream, which might help if the
filesystem has trouble coping with huge directories. But that would mean each
dir has to be fsync()ed separately (more latency and/or more multithreading).
So I'll stick with this until I see concrete evidence of a problem that would
solve.
Test coverage of the error conditions is poor. I plan to do some restructuring
of the db/dir code, hopefully making steps toward testability along the way.