Reading from the mmap()ed region in the tokio threads could cause
them to stall:
* That could affect UI serving when there were concurrent
UI requests (i.e., not just requests that needed the reads in
question anyway).
* If there's a faulty disk, it could cause the UI to totally hang.
Better to not mix disks between threads.
* Soon, I want to handle RTSP from the tokio threads (#37). Similarly,
we don't want RTSP streaming to block on operations from unrelated
disks.
I went with just one thread per disk which I think is sufficient.
But it'd be possible to do a fixed-size pool instead which might improve
latency when some pages are already cached.
I also dropped the memmap dependency. I had to compute the page
alignment anyway to get mremap to work, and Moonfire NVR already is
Unix-specific, so there wasn't much value from the memmap or memmap2
crates.
Fixes#88
* Use the standard UUID syntax for /etc/fstab
* Added instruction to create sample directory
* Update install.md
* Change sample ownership instead of perms
In particular, this was happening out of the box on Raspberry Pi OS Lite
20210304, as reported by ironoxidizer@gmail.com here:
https://groups.google.com/g/moonfire-nvr-users/c/2j9LvfFl2u8/m/tJcNS2WfCQAJ
* adjust main.rs to make the problem more obvious
* mention it in the troubleshooting guide
* sidestep it in the nvr docker wrapper script
also just use --networking=host rather than --publish (avoiding a proxy
process). I'm using Docker to simplify the build and deployment process,
not as a security boundary, so just do the simpler thing.
As noted in mylog's 2b1085c:
Looks like both the GNU tools' --color argument and cargo's
CARGO_TERM_COLOR expect always/never rather than on/off. Match that.
Might as well understand off/no/false and on/yes/true also.
* add more description to the troubleshooting guide
* adjust the log format to match more recent glog
* include a config for the lnav tool, which will help colorize,
browse, and search the logs.
Next up: install an ffmpeg log callback for consistency.
This eases build setup. Where Yarn requires a separate package
repository, npm is available in the standard one.
yarn's package repository signature was recently expired, and apparently
will expire again in a year. Avoid dealing with that.
Fixes#110.
Inspired by the poor error message here:
https://github.com/scottlamb/moonfire-nvr/issues/107#issuecomment-777587727
* print the friendlier Display version of the error rather than Debug.
Eg, "EROFS: Read-only filesystem" rather than "Sys(EROFS)". Do this
everywhere: on command exit, on syncer retries, and on stream
retries.
* print the most immediate problem and additional lines for each
cause.
* print the backtrace or an advertisement for RUST_BACKTRACE=1 if it's
unavailable.
* also mention RUST_BACKTRACE=1 in the troubleshooting guide.
* add context in various places, including pathnames. There are surely
many places more it'd be helpful, but this is a start.
* allow subcommands to return failure without an Error.
In particular, "moonfire-nvr check" does its own error printing
because it wants to print all the errors it finds. Printing "see
earlier errors" with a meaningless stack trace seems like it'd just
confuse. But I also want to get rid of the misleading "Success" at
the end and 0 return to the OS.
* give a rule of thumb for update time in the documentation
* log the SQLite3 version, which can affect performance
* do the vacuum in non-WAL mode, to correctly set the page size and to
avoid very slow behavior on older SQLite3 versions. Larger page sizes
are generally faster (including subsequent vacuum operations).
This won't help much for the first vacuum after this change, but it
will help afterward.
* likewise, set the page size properly on "moonfire-nvr init".
Besides being more clear about what belongs to which, this helps with
docker caching. The server and ui parts are only rebuilt when their
respective subdirectories change.
Extend this a bit further by making the webpack build not depend on
the target architecture. And adding cache dirs so parts of the server
and ui build process can be reused when layer-wide caching fails.
This brings most things reasonably up-to-date. libpasta's deps are
dragging a bit, keeping us on an older ring to avoid duplication,
and causing us to use three versions of base64. And I need to update
a few of my companion crates' parking_lot dep to match tokio.
This splits the schema and playback path. The recording path still
adjusts the frame durations and always says the wall and media durations
are the same. I expect to change that in a following commit. I wouldn't
be surprised if that shakes out some bugs in this portion.
This is useful for a combo scrub bar-based UI (#32) + live view UI (#59)
in a non-obvious way. When constructing a HTML Media Source Extensions
API SourceBuffer, the caller can specify a "mode" of either "segments"
or "sequence":
In "sequence" mode, playback assumes segments are added sequentially.
This is good enough for a live view-only UI (#59) but not for a scrub
bar UI in which you may want to seek backward to a segment you've never
seen before. You will then need to insert a segment out-of-sequence.
Imagine what happens when the user goes forward again until the end of
the segment inserted immediately before it. The user should see the
chronologically next segment or a pause for loading if it's unavailable.
The best approximation of this is to track the mapping of timestamps to
segments and insert a VTTCue with an enter/exit handler that seeks to
the right position. But seeking isn't instantaneous; the user will
likely briefly see first the segment they seeked to before. That's
janky. Additionally, the "canplaythrough" event will behave strangely.
In "segments" mode, playback respects the timestamps we set:
* The obvious choice is to use wall clock timestamps. This is fine if
they're known to be fixed and correct. They're not. The
currently-recording segment may be "unanchored", meaning its start
timestamp is not yet fixed. Older timestamps may overlap if the system
clock was stepped between runs. The latter isn't /too/ bad from a user
perspective, though it's confusing as a developer. We probably will
only end up showing the more recent recording for a given
timestamp anyway. But the former is quite annoying. It means we have
to throw away part of the SourceBuffer that we may want to seek back
(causing UI pauses when that happens) or keep our own spare copy of it
(memory bloat). I'd like to avoid the whole mess.
* Another approach is to use timestamps that are guaranteed to be in
the correct order but that may have gaps. In particular, a timestamp
of (recording_id * max_recording_duration) + time_within_recording.
But again seeking isn't instantaneous. In my experiments, there's a
visible pause between segments that drives me nuts.
* Finally, the approach that led me to this schema change. Use
timestamps that place each segment after the one before, possibly with
an intentional gap between runs (to force a wait where we have an
actual gap). This should make the browser's natural playback behavior
work properly: it never goes to an incorrect place, and it only waits
when/if we want it to. We have to maintain a mapping between its
timestamps and segment ids but that's doable.
This commit is only the schema change; the new data aren't exposed in
the API yet, much less used by a UI.
Note that stream.next_recording_id became stream.cum_recordings. I made
a slight definition change in the process: recording ids for new streams
start at 0 rather than 1. Various tests changed accordingly.
The upgrade process makes a best effort to backfill these new fields,
but of course it doesn't know the total duration or number of runs of
previously deleted rows. That's good enough.
Benefits:
* Blake3 is faster. This is most noticeable for the hashing of the
sample file data.
* we no longer need OpenSSL, which helps with shrinking the binary size
(#70). sha1 basically forced OpenSSL usage; ring deliberately doesn't
support this old algorithm, and the pure-Rust sha1 crate is painfully
slow. OpenSSL might still be a better choice than ring/rustls for TLS
but it's nice to have the option.
For the video sample entries, I decided we don't need to hash at all. I
think the id number is sufficiently stable, and it's okay---perhaps even
desirable---if an existing init segment changes for fixes like e5b83c2.
* simplify it. Go from six checked-in config files + one local one to
three checked-in configs + commandline options. I find it less
confusing to have the options plumbed through fewer layers.
* support developing against a https production server, as described in
guide/developing-ui.md.
* fix the source map. The sourceMap parameter in prod.config.js as far
as I can tell evaluated to false when run with production config, and
anyway UglifyJS seems to be incompatible with the specified
cheap-module-source-map. Use source-map instead.
The multipart stream / hanging GET approach worked in a prototype for a
single stream, but Chrome has a per-host limit of six connections. If I
try streaming all my cameras at once, I hit that limit. I can't open all
the streams, much less additional connections to load init segments and
such. Websockets apparently has a much higher limit of 256.
This doesn't take much advantage of async fns so far. For example, the
with_{form,json}_body functions are still designed to be used with
future combinators when it'd be more natural to call them from async
fns now. But it's a start.
Similarly, this still uses the old version of reqwest. Small steps.
Requires Rust 1.40 now. (1.39 is a requirement of async, and 1.40 is a
requirement of http-serve 0.2.0.)
* in markdown files, use code fences rather than indented blocks.
This is harder to screw up (one of them was off by a space so didn't
render properly) and allows me to add info strings.
* uniformly use "useradd" to create the user and group in all three
places (install-manual.md, script-functions.sh, Dockerfile) rather
than addgroup + adduser. Create a full home dir, which I suspect was
the problem in #67. Don't allow customizing group name; it's always
the same as the user.
* install the sqlite3 package so that the "moonfire-nvr sql" command
works properly.
* remove "setup_db" function, which was out of place. Since the
creation of the "moonfire-nvr init" command, this has to happen
after installation of the binary. install.md gives instructions on
this part anyway so remove it from the script.
* give a proper command to create the db dir. It was creating it
within the current directory, not within /var/lib/moonfire-nvr.
Don't bother creating sample directory; "moonfire-nvr config"
will do this.
* when setting owners on a newly created directory, use a single
"install -d" command rather than "mkdir" + "chown".
* address confusion about whether sample file dirs need to be
precreated. (Only when Moonfire NVR doesn't have write permissions
on the parent.)
* always just install the packaged version of ffmpeg rather than
building our own. This has been usable since Debian/Raspbian 9
Stretch; Debian/Raspbian 10 Buster is out now so there's no excuse
for still running Debian/Raspbian 8 Jessie.
* don't chown the UI directory; it can be owned by root as with
the binary.
* in scripts/install.sh, don't enable/start the service yet. It hasn't
been configured.
Add a new schema version 5; now 4 means the directory meta may or may
not be upgraded.
Fixes#65: now it's possible to open the directory even if it lies on a
completely full disk.
(I also considered the names "capabilities" and "scopes", but I think
"permissions" is the most widely understood.)
This is increasingly necessary as the web API becomes more capable.
Among other things, it allows:
* non-administrator users who can view but not access camera passwords
or change any state
* workers that update signal state based on cameras' built-in motion
detection or a security system's events but don't need to view videos
* control over what can be done without authenticating
Currently session permissions are just copied from user permissions, but
you can also imagine admin sessions vs not, as a checkbox when signing
in. This would match the standard Unix workflow of using a
non-administrative session most of the time.
Relevant to my current signals work (#28) and to the addition of an
administrative API (#35, including #66).
travis-ci pointed out that the dependency bump broke 1.31:
Compiling docopt v1.1.0
error[E0658]: imports can only refer to extern crate names passed with `--extern` on stable channel (see issue #53130)
--> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/docopt-1.1.0/src/parse.rs:48:5
|
48 | use regex;
| ^^^^^
|
Looks like uniform_paths was stabilized in 1.32, and I verified locally that
version builds.
The 091217b workaround of telling ffmpeg to only request the video
stream works perfectly fine for now. I'll revisit when adding audio
support (#34).
Fixes#36
Apparently with docopt, --require-auth=false doesn't work, so booleans
with a default value of true can't be turned off. Toggle the default to
false to deal with this, for now. I'd prefer the default be true, but
I also would prefer to not use a negative --no-require-auth or
--allow-unauthenticated flag. I think I'll switch from docopt to clap
in the near future; it seems to be what the cool kids use.
The guide is not as quick to follow and amateur-friendly as I'd like. A
few things that might improve matters:
* complete #27 (built-in https+letsencrypt), so that when not sharing
the port, users don't need to use nginx or certbot.
* more ubiquitous IPv6 (out of my control but should happen over
time) to reduce need to share the port
* embed a dynamic DNS client
* support UPnP Internet Gateway Device Control Protocol (if common
routers have this enabled? probably not for security reasons.)
It's progress, though. Enough that I think I'll merge the auth branch
into master shortly.
Some caveats:
* it doesn't record the peer IP yet, which makes it harder to verify
sessions are valid. This is a little annoying to do in hyper now
(see hyperium/hyper#1410). The direct peer might not be what we want
right now anyway because there's no TLS support yet (see #27). In
the meantime, the sane way to expose Moonfire NVR to the Internet is
via a proxy server, and recording the proxy's IP is not useful.
Maybe better to interpret a RFC 7239 Forwarded header (and/or
the older X-Forwarded-{For,Proto} headers).
* it doesn't ever use Secure (https-only) cookies, for a similar reason.
It's not safe to use even with a tls proxy until this is fixed.
* there's no "moonfire-nvr config" support for inspecting/invalidating
sessions yet.
* in debug builds, logging in is crazy slow. See libpasta/libpasta#9.
Some notes:
* I removed the Javascript "no-use-before-defined" lint, as some of
the functions form a cycle.
* Fixed#20 along the way. I needed to add support for properly
returning non-OK HTTP statuses to signal unauthorized and such.
* I removed the Access-Control-Allow-Origin header support, which was
at odds with the "SameSite=lax" in the cookie header. The "yarn
start" method for running a local proxy server accomplishes the same
thing as the Access-Control-Allow-Origin support in a more secure
manner.
Fixes#60
The reqwest dependency is significant because the old version required
an old version of openssl, complicating compilation on newer platforms.
reqwest also pulled in old/duplicate versions of hyper, tokio, etc.
Nice to drop a lot of that cruft.
I left rusqlite and uuid alone because they had breaking changes I
didn't want to mess with at the moment.
Bumped the minimum Rust version to 1.30.0, as required by the
new encoding_rs crate (and perhaps other things).
1.26 doesn't work with the updated rusqlite:
error[E0658]: use of unstable library feature 'duration_extras' (see issue #46507)
--> /home/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/rusqlite-0.14.0/src/busy.rs:26:49
|
26 | .and_then(|t| t.checked_add(timeout.subsec_millis().into()))
| ^^^^^^^^^^^^^
1.25 also fails with the upgraded reffers because u128 isn't stable:
error[E0658]: 128-bit type is unstable (see issue #35118)
--> /home/travis/.cargo/git/checkouts/reffers-rs-0d00fc7f893338b3/49a4d75/src/rc_bitmask.rs:194:34
|
194 | rc_bit_mask_internal!(primitive, u128, 42, 42, 42);
| ^^^^
travis-ci pointed out that building with 1.21 broke with a recent dep
upgrade (8c52c36). reffers now uses nested groups of imports, which is a
feature introduced with Rust 1.25. Prior to 1.25, it fails as follows:
error: expected one of `,` or `as`, found `::`
--> /home/travis/.cargo/git/checkouts/reffers-rs-0d00fc7f893338b3/49a4d75/src/arc.rs:6:46
|
6 | use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
| ^^ expected one of `,` or `as` here
* install.md, install-manual.md, and easy-install.md had a lot of
redundancy. Rework them so the common prefix and suffix are in
install.md and it's clear when to navigate back and forth. This
removes from very stale references to prep.sh and cameras.sql in
install-manual.md (which never should have mentioned these scripts
anyway).
* remove all the SAMPLE_MEDIA_DIR, SAMPLE_FILE_DIR, and
SAMPLE_FILE_PATH stuff from the scripts. This was too complicated
(one variable will suffice) and inconsistent in terminology (a
couple "samples dir" occurrences slipped through review; they
should have been "sample file dir"). It also wasn't really useful
enough because the procedure for a mount point is manual anyway,
and because some installs will have multiple sample file dirs
anyway.
* in the mount point procedure, fix the paths to be consistent. Also
describe the "nofail" and "Requires=" config I have on my machine.
* fix some incorrect info about how to use "moonfire-nvr config" and
describe "flush_if_sec".
* upgrade min required rust to 1.21; crossbeam-deque requires the
ord_max_min feature, apparently stabilized in this version.
* use "make --jobs=2" to build ffmpeg so it goes faster.
https://docs.travis-ci.com/user/reference/overview/ says there are 2
cores available.
* upgrade minimum required Rust from 1.17 to 1.20; reffers 0.4.2
apparently uses std::mem::ManuallyDrop, introduced in 1.20
* install ffmpeg from source (requiring sudo access) rather than using
the ancient one from Ubuntu Trusty to meet the minimum version
requirements specified in ffmpeg/build.rs.
This is only the database schema, which I'm adding now in the hopes of
freezing schema version 3. There's no way yet to create users, much less
actually authenticate.
These are not actually populated by the code yet. I'm trying to get the
v3 schema frozen as soon as possible; actually using the fields can come
later.
Add some explanation of their value in time.md, along with some general
musing on leap seconds, and a correction on the frequency error of my cameras.
* Various settings in settings-nvr.js module
* settings-nvr-local.js can override settings-nvr.js
* settings-nvr-local is unchecked file
* Both files can be straight maps, or functions returning maps
* webpack env and args available to those functions
This improves the practicality of having many streams (including the doubling
of streams by having main + sub streams for each camera). With these tuned
properly, extra streams don't cause any extra write cycles in normal or error
cases. Consider the worst case in which each RTSP session immediately sends a
single frame and then fails. Moonfire retries every second, so this would
formerly cause one commit per second per stream. (flush_if_sec=0 preserves
this behavior.) Now the commits can be arbitrarily infrequent by setting
higher values of flush_if_sec.
WARNING: this isn't production-ready! I hacked up dir.rs to make tests pass
and "moonfire-nvr run" work in the best-case scenario, but it doesn't handle
errors gracefully. I've been debating what to do when writing a recording
fails. I considered "abandoning" the recording then either reusing or skipping
its id. (in the latter case, marking the file as garbage if it can't be
unlinked immediately). I think now there's no point in abandoning a recording.
If I can't write to that file, there's no reason to believe another will work
better. It's better to retry that recording forever, and perhaps put the whole
directory into an error state that stops recording until those writes go
through. I'm planning to redesign dir.rs to make this happen.
The filenames now represent composite ids (stream id + recording id) rather
than a separate uuid system with its own reservation for a few benefits:
* This provides more information when there are inconsistencies.
* This avoids the need for managing the reservations during recording. I
expect this to simplify delaying flushing of newly written sample files.
Now the directory has to be scanned at startup for files that never got
written to the database, but that's acceptably fast even with millions of
files.
* Less information to keep in memory and in the recording_playback table.
I'd considered using one directory per stream, which might help if the
filesystem has trouble coping with huge directories. But that would mean each
dir has to be fsync()ed separately (more latency and/or more multithreading).
So I'll stick with this until I see concrete evidence of a problem that would
solve.
Test coverage of the error conditions is poor. I plan to do some restructuring
of the db/dir code, hopefully making steps toward testability along the way.
The idea is to avoid the problems described in src/schema.proto; those
possibilities have bothered me for a while. A bonus is that (in a future
commit) it can replace the sample file uuid scheme in favor of using
<camera_uuid>-<stream_type>/<recording_id> for several advantages:
* on data integrity problems (specifically, extra sample files), more
information to use to understand what happened.
* no more reserving sample files prior to using them. This avoids some extra
database transactions on startup (now there's an extra two total rather
than an extra one per stream). It also simplifies an upcoming change I
want to make in which some streams are not flushed immediately, reducing
the write load significantly (maybe one per minute total rather than one
per stream per minute).
* get rid of eight bytes per playback cache entry in RAM (and nine bytes
per recording_playback row on flash).
The implementation is still pretty rough in places:
* Lack of tests.
* Poor ode organization. In particular, SampleFileDirectory::write_meta
shouldn't be exposed beyond db. I'm thinking about moving db.rs and
SampleFileDirectory to a new crate, moonfire_nvr_db. This would improve
compile times as well.
* No tooling for renaming a sample file directory.
* Config subcommand still panics in conditions that can be reasonably
expected to happen.
This is still pretty basic support. There's no config UI support for
renaming/moving the sample file directories after they are created, and no
error checking that the files are still in the expected place. I can imagine
sysadmins getting into trouble trying to change things. I hope to address at
least some of that in a follow-up change to introduce a versioning/locking
scheme that ensures databases and sample file dirs match in some way.
A bonus change that kinda got pulled along for the ride: a dialog pops up in
the config UI while a stream is being tested. The experience was pretty bad
before; there was no indication the button worked at all until it was done,
sometimes many seconds later.