(I also considered the names "capabilities" and "scopes", but I think
"permissions" is the most widely understood.)
This is increasingly necessary as the web API becomes more capable.
Among other things, it allows:
* non-administrator users who can view but not access camera passwords
or change any state
* workers that update signal state based on cameras' built-in motion
detection or a security system's events but don't need to view videos
* control over what can be done without authenticating
Currently session permissions are just copied from user permissions, but
you can also imagine admin sessions vs not, as a checkbox when signing
in. This would match the standard Unix workflow of using a
non-administrative session most of the time.
Relevant to my current signals work (#28) and to the addition of an
administrative API (#35, including #66).
This is mostly just "cargo fix --edition" + Cargo.toml changes.
There's one fix for upgrading to NLL in db/writer.rs:
Writer::previously_opened wouldn't build with NLL because of a
double-borrow the previous borrow checker somehow didn't catch.
Restructure to avoid it.
I'll put elective NLL changes in a following commit.
The guide is not as quick to follow and amateur-friendly as I'd like. A
few things that might improve matters:
* complete #27 (built-in https+letsencrypt), so that when not sharing
the port, users don't need to use nginx or certbot.
* more ubiquitous IPv6 (out of my control but should happen over
time) to reduce need to share the port
* embed a dynamic DNS client
* support UPnP Internet Gateway Device Control Protocol (if common
routers have this enabled? probably not for security reasons.)
It's progress, though. Enough that I think I'll merge the auth branch
into master shortly.
I moved the clocks member from LockedDatabase to Database to make this happen,
so the new DatabaseGuard (replacing a direct MutexGuard<LockedDatabase>) can
access it before acquiring the lock.
I also made the type of clock a type parameter of Database (and so several
other things throughout the system). This allowed me to drop the Arc<>, but
more importantly it means that the Clocks trait doesn't need to stay
object-safe. I plan to take advantage of that shortly.
* separate these out into a new file, writer.rs, as dir.rs was getting
unwieldy.
* extract traits for the parts of SampleFileDir and std::fs::File they needed;
set up mock implementations.
* move clock.rs to a new base crate to be accessible from the db crate.
* add tests that exercise all the retry paths.
* bugfix: account for the new recording's bytes when calculating how much to
delete.
* bugfix: when retrying an unlink failure in collect_garbage, we shouldn't
warn about all the recordings no longer existing. Do this by retrying each
step rather than the whole procedure again.
* avoid double-panic scenarios, which I hit while tweaking the mocks. These
are quite annoying to debug as Rust doesn't print information about either
panic. I ended up using lldb to get a backtrace. Better to be cautious about
what we're doing when already panicking.
* give more context on raw::insert_recording errors, which I hit as well while
tweaking the new tests.
This improves the practicality of having many streams (including the doubling
of streams by having main + sub streams for each camera). With these tuned
properly, extra streams don't cause any extra write cycles in normal or error
cases. Consider the worst case in which each RTSP session immediately sends a
single frame and then fails. Moonfire retries every second, so this would
formerly cause one commit per second per stream. (flush_if_sec=0 preserves
this behavior.) Now the commits can be arbitrarily infrequent by setting
higher values of flush_if_sec.
WARNING: this isn't production-ready! I hacked up dir.rs to make tests pass
and "moonfire-nvr run" work in the best-case scenario, but it doesn't handle
errors gracefully. I've been debating what to do when writing a recording
fails. I considered "abandoning" the recording then either reusing or skipping
its id. (in the latter case, marking the file as garbage if it can't be
unlinked immediately). I think now there's no point in abandoning a recording.
If I can't write to that file, there's no reason to believe another will work
better. It's better to retry that recording forever, and perhaps put the whole
directory into an error state that stops recording until those writes go
through. I'm planning to redesign dir.rs to make this happen.
It should reduce compile time / memory usage to put quite a bit of the code
into a separate crate. I also intend to limit visibility of some things to
only within the db crate, but that's for a future change. This is the smallest
move that will compile.
The filenames now represent composite ids (stream id + recording id) rather
than a separate uuid system with its own reservation for a few benefits:
* This provides more information when there are inconsistencies.
* This avoids the need for managing the reservations during recording. I
expect this to simplify delaying flushing of newly written sample files.
Now the directory has to be scanned at startup for files that never got
written to the database, but that's acceptably fast even with millions of
files.
* Less information to keep in memory and in the recording_playback table.
I'd considered using one directory per stream, which might help if the
filesystem has trouble coping with huge directories. But that would mean each
dir has to be fsync()ed separately (more latency and/or more multithreading).
So I'll stick with this until I see concrete evidence of a problem that would
solve.
Test coverage of the error conditions is poor. I plan to do some restructuring
of the db/dir code, hopefully making steps toward testability along the way.
The idea is to avoid the problems described in src/schema.proto; those
possibilities have bothered me for a while. A bonus is that (in a future
commit) it can replace the sample file uuid scheme in favor of using
<camera_uuid>-<stream_type>/<recording_id> for several advantages:
* on data integrity problems (specifically, extra sample files), more
information to use to understand what happened.
* no more reserving sample files prior to using them. This avoids some extra
database transactions on startup (now there's an extra two total rather
than an extra one per stream). It also simplifies an upcoming change I
want to make in which some streams are not flushed immediately, reducing
the write load significantly (maybe one per minute total rather than one
per stream per minute).
* get rid of eight bytes per playback cache entry in RAM (and nine bytes
per recording_playback row on flash).
The implementation is still pretty rough in places:
* Lack of tests.
* Poor ode organization. In particular, SampleFileDirectory::write_meta
shouldn't be exposed beyond db. I'm thinking about moving db.rs and
SampleFileDirectory to a new crate, moonfire_nvr_db. This would improve
compile times as well.
* No tooling for renaming a sample file directory.
* Config subcommand still panics in conditions that can be reasonably
expected to happen.
This is still pretty basic support. There's no config UI support for
renaming/moving the sample file directories after they are created, and no
error checking that the files are still in the expected place. I can imagine
sysadmins getting into trouble trying to change things. I hope to address at
least some of that in a follow-up change to introduce a versioning/locking
scheme that ensures databases and sample file dirs match in some way.
A bonus change that kinda got pulled along for the ride: a dialog pops up in
the config UI while a stream is being tested. The experience was pretty bad
before; there was no indication the button worked at all until it was done,
sometimes many seconds later.
This allows each camera to have a main and a sub stream. Previously there was
a field in the schema for the sub stream's url, but it didn't do anything. Now
you can configure individual retention for main and sub streams. They show up
grouped in the UI.
No support for upgrading from schema version 1 yet.
This was completely wrong: it overflowed on large filesystems and
double-counted the used bytes.
The new logic is still imperfect in that if there are a bunch of files in the
process of being deleted (moved from recording to reserved_sample_files but
not yet unlinked), they'll be taken out of the total capacity. Maybe it should
stat everything in the sample file directory instead of relying on the
recording table. It's definitely an improvement, though.