After a frustrating search for a suitable channel to use for shutdown
(tokio::sync:⌚:Receiver and
futures::future::Shared<tokio::sync::oneshot::Receiver> didn't look
quite right) in which I rethought my life decisions, I finally just made
my own (server/base/shutdown.rs). We can easily poll it or wait for it
in async or sync contexts. Most importantly, it's convenient; not that
it really matters here, but it's also efficient.
We now do a slightly better job of propagating a "graceful" shutdown
signal, and this channel will give us tools to improve it over time.
* Shut down even when writer or syncer operations are stuck. Fixes#117
* Not done yet: streamers should instantly shut down without waiting for
a connection attempt or frame or something. I'll probably
implement that when removing --rtsp-library=ffmpeg. The code should be
cleaner then.
* Not done yet: fix a couple places that sleep for up to a second when
they could shut down immediately. I just need to do the plumbing for
mock clocks to work.
I also implemented an immediate shutdown mode, activated by a second
signal. I think this will mitigate the streamer wait situation.
* prefix docker/nvr commands with sudo (fixes#142).
I was just going to link to the docker documentation on setting
up non-root access, but that's kind of a personal preference.
I included a `<details>` about it instead and made all the commands
work with sudo.
* take better advantage of github markdown's code block syntax
highlighting. Use "console" for shell session stuff, put the
"nvr" wrapper script in its own block with "bash".
* add some comments to nvr wrapper script where people need to
make changes and/or will be confused.
* add a `<details>` that talks about shutting down and restarting
the session around `nvr config` (see #151). Still not user-friendly
but at least it's better documented now.
Reading from the mmap()ed region in the tokio threads could cause
them to stall:
* That could affect UI serving when there were concurrent
UI requests (i.e., not just requests that needed the reads in
question anyway).
* If there's a faulty disk, it could cause the UI to totally hang.
Better to not mix disks between threads.
* Soon, I want to handle RTSP from the tokio threads (#37). Similarly,
we don't want RTSP streaming to block on operations from unrelated
disks.
I went with just one thread per disk which I think is sufficient.
But it'd be possible to do a fixed-size pool instead which might improve
latency when some pages are already cached.
I also dropped the memmap dependency. I had to compute the page
alignment anyway to get mremap to work, and Moonfire NVR already is
Unix-specific, so there wasn't much value from the memmap or memmap2
crates.
Fixes#88
In particular, this was happening out of the box on Raspberry Pi OS Lite
20210304, as reported by ironoxidizer@gmail.com here:
https://groups.google.com/g/moonfire-nvr-users/c/2j9LvfFl2u8/m/tJcNS2WfCQAJ
* adjust main.rs to make the problem more obvious
* mention it in the troubleshooting guide
* sidestep it in the nvr docker wrapper script
also just use --networking=host rather than --publish (avoiding a proxy
process). I'm using Docker to simplify the build and deployment process,
not as a security boundary, so just do the simpler thing.
As noted in mylog's 2b1085c:
Looks like both the GNU tools' --color argument and cargo's
CARGO_TERM_COLOR expect always/never rather than on/off. Match that.
Might as well understand off/no/false and on/yes/true also.
* add more description to the troubleshooting guide
* adjust the log format to match more recent glog
* include a config for the lnav tool, which will help colorize,
browse, and search the logs.
Next up: install an ffmpeg log callback for consistency.
Inspired by the poor error message here:
https://github.com/scottlamb/moonfire-nvr/issues/107#issuecomment-777587727
* print the friendlier Display version of the error rather than Debug.
Eg, "EROFS: Read-only filesystem" rather than "Sys(EROFS)". Do this
everywhere: on command exit, on syncer retries, and on stream
retries.
* print the most immediate problem and additional lines for each
cause.
* print the backtrace or an advertisement for RUST_BACKTRACE=1 if it's
unavailable.
* also mention RUST_BACKTRACE=1 in the troubleshooting guide.
* add context in various places, including pathnames. There are surely
many places more it'd be helpful, but this is a start.
* allow subcommands to return failure without an Error.
In particular, "moonfire-nvr check" does its own error printing
because it wants to print all the errors it finds. Printing "see
earlier errors" with a meaningless stack trace seems like it'd just
confuse. But I also want to get rid of the misleading "Success" at
the end and 0 return to the OS.
The 091217b workaround of telling ffmpeg to only request the video
stream works perfectly fine for now. I'll revisit when adding audio
support (#34).
Fixes#36
Some caveats:
* it doesn't record the peer IP yet, which makes it harder to verify
sessions are valid. This is a little annoying to do in hyper now
(see hyperium/hyper#1410). The direct peer might not be what we want
right now anyway because there's no TLS support yet (see #27). In
the meantime, the sane way to expose Moonfire NVR to the Internet is
via a proxy server, and recording the proxy's IP is not useful.
Maybe better to interpret a RFC 7239 Forwarded header (and/or
the older X-Forwarded-{For,Proto} headers).
* it doesn't ever use Secure (https-only) cookies, for a similar reason.
It's not safe to use even with a tls proxy until this is fixed.
* there's no "moonfire-nvr config" support for inspecting/invalidating
sessions yet.
* in debug builds, logging in is crazy slow. See libpasta/libpasta#9.
Some notes:
* I removed the Javascript "no-use-before-defined" lint, as some of
the functions form a cycle.
* Fixed#20 along the way. I needed to add support for properly
returning non-OK HTTP statuses to signal unauthorized and such.
* I removed the Access-Control-Allow-Origin header support, which was
at odds with the "SameSite=lax" in the cookie header. The "yarn
start" method for running a local proxy server accomplishes the same
thing as the Access-Control-Allow-Origin support in a more secure
manner.