Fixes#206. 307a388 switched to creating a single-threaded runtime for
each stream, then destroying prior to waiting for TEARDOWN on shutdown.
This meant that the shutdown process could panic with this error:
```
panic at '/home/slamb/git/retina/src/client/mod.rs:219:22': teardown Sender shouldn't be dropped: RecvError(())
```
Let's switch back to expecting a multithreaded runtime context.
Create one for the config subcommand, too.
Don't go all the way back to the old code with its channels, though.
That had the downside that the underlying retina::Session might outlive
the caller, so there could still be an active session when we start
the next one. I haven't seen this cause problems in practice but it
still doesn't seem right.
This alone improves interop and diagnostics, as noted in Retina's
release notes. We also now give the camera name to the session group
(for improved logging of TEARDOWN operations) and expose the RTSP
server's "tool" attribute in debug logs and the config UI's "Test"
button.
Fixes#209Fixes#213
* switch the config interface over to use Retina and make the test
button honor rtsp_transport = udp.
* adjust the threading model of the Retina streaming code.
Before, it spawned a background future that read from the runtime and
wrote to a channel. Other calls read from this channel.
After, it does work directly from within the block_on calls (no
channels).
The immediate motivation was that the config interface didn't have
another runtime handy. And passing in a current thread runtime
deadlocked. I later learned this is a difference between
Runtime::block_on and Handle::block_on. The former will drive IO and
timers; the latter will not.
But this is also more efficient to avoid so many thread hand-offs.
Both the context switches and the extra spinning that
tokio appears to do as mentioned here:
https://github.com/scottlamb/retina/issues/5#issuecomment-871971550
This may not be the final word on the threading model. Eventually
I may not have per-stream writing threads at all. But I think it will
be easier to look at this after getting rid of the separate
`moonfire-nvr config` subcommand in favor of a web interface.
* in tests, read `.mp4` files via the `mp4` crate rather than ffmpeg.
The annoying part is that this doesn't parse edit lists; oh well.
* simplify the `Opener` interface. Formerly, it'd take either a RTSP
URL or a path to a `.mp4` file, and they'd share some code because
they both sometimes used ffmpeg. Now, they're totally different
libraries (`retina` vs `mp4`). Pull the latter out to a `testutil`
module with a different interface that exposes more of the `mp4`
stuff. Now `Opener` is just for RTSP.
* simplify the h264 module. It had a lot of logic to deal with Annex B.
Retina doesn't use this encoding.
Fixes#36Fixes#126
* switch from json to toml.
I think this will be more user-friendly. It allows comments and has
less punctuation. Fewer surprises than yaml (which has e.g. the
"Norway problem"). I might have stayed with JSON if I could see a
good serde json library that allows comments, but hson is unmaintained
and serde-json strictly follows the spec.
* switch from camelCase to snake_case. Seems more idiomatic for TOML
and matches the Rust source.
* forbid unknown keys. Better to spot errors sooner.
* rename "trust_forward_hdrs" to "trust_forward_headers". Nothing else
is abbreviated.
This is better particularly when the user is following the docker
instructions and doesn't have a local checkout at all. It also is a
rendered HTML view rather than raw markdown.
It'd be nice to link to the exact release we're using, not tip of
master. I didn't do this now because it'll likely take some work with
build.rs to check if the user is on a tagged release or not.
Fixes#180
Before it would produce this incorrect message that told you to run
the command you just ran:
```
$ nvr init --db-dir=/nonexistent/db
E20211021 09:08:23.798 main moonfire_nvr] Exiting due to error: db dir /nonexistent/db not found; try running moonfire-nvr init
caused by: ENOENT: No such file or directory
```
Now the same command produces the following:
```
$ nvr init --db-dir=/nonexistent/db
E20211021 09:09:11.056 main moonfire_nvr] Exiting due to error: unable to create db dir /nonexistent/db
caused by: ENOENT: No such file or directory
```
Add tests just for good measure.
After a frustrating search for a suitable channel to use for shutdown
(tokio::sync:⌚:Receiver and
futures::future::Shared<tokio::sync::oneshot::Receiver> didn't look
quite right) in which I rethought my life decisions, I finally just made
my own (server/base/shutdown.rs). We can easily poll it or wait for it
in async or sync contexts. Most importantly, it's convenient; not that
it really matters here, but it's also efficient.
We now do a slightly better job of propagating a "graceful" shutdown
signal, and this channel will give us tools to improve it over time.
* Shut down even when writer or syncer operations are stuck. Fixes#117
* Not done yet: streamers should instantly shut down without waiting for
a connection attempt or frame or something. I'll probably
implement that when removing --rtsp-library=ffmpeg. The code should be
cleaner then.
* Not done yet: fix a couple places that sleep for up to a second when
they could shut down immediately. I just need to do the plumbing for
mock clocks to work.
I also implemented an immediate shutdown mode, activated by a second
signal. I think this will mitigate the streamer wait situation.
* upgrade to Retina 0.3.1 which automatically tears down sessions
* wait out stale sessions before reconnecting
* wait for teardown to complete before shutting down
This adds some pressure on #117: it will keep waiting for the stale
session to expire even if the user has requested shutdown. I'll try
to address that next.
Fixes#136
Before:
```
E20210803 09:00:31.161 main moonfire_nvr] panic at '/Users/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.10/src/server/server.rs:68:17': error binding to 0.0.0.0:80: error creating server listener: Address already in use (os error 48)
(set environment variable RUST_BACKTRACE=1 to see backtraces)
...potentially unrelated log msgs from other threads before exiting...
```
After:
```
E20210803 09:06:02.633 main moonfire_nvr] Exiting due to error: unable to bind --http-addr=0.0.0.0:80
caused by: error creating server listener: Address already in use (os error 48)
(set environment variable RUST_BACKTRACE=1 to see backtraces)
```
I see a lot of yields and such in CPU profiles. I think the workers
are frequently waking up, finding there's not much to do, and going back
to sleep. Reducing the number of worker threads seems reasonable.
This isn't well-tested and doesn't yet support an initial connection
timeout. But in a quick test, it successfully returns video!
I'd like to do some more aggressive code restructuring for zero-copy
and to have only one writer thread per sample file directory (rather
than the syncer thread + one writer thread per RTSP stream). But I'll
likely wait until I drop support for ffmpeg entirely.
This is (slightly) complicating the switch from ffmpeg to retina
as the RTSP client. And it's not really that close to what I want
to end up with for analytics:
* I'd prefer the analytics happen in a separate process for
several reasons
* Feeding the entire frame to the object detector doesn't produce
good results.
* It doesn't do anything with the results yet anyway.
I'm tired of all the boilerplate, so use the new
GPL-3.0-linking-exception license identifier instead in all the server
components.
I left the ui stuff alone because I'm just going to replace it (#111).
Add a checker for the header because it's easy to forget.
I want to make the project more accessible by not expecting folks to
match my idiosyncratic style. Now almost [1] everything is written
in the "standard" style. CI enforces this.
[1] "Almost": I used #[rustfmt::skip] in a few sections where I felt
aligning things in columns significantly improves readability.
For recovering from corruption, as in #107. These should aid in
restoring database integrity without throwing away the entire database.
I only added the conditions that came up in #107, so far.
* "Missing ... row" => --trash-orphan-sample-files
* "Recording ... missing file" => --delete-orphan-rows
* "bad video_index" => --trash-corrupt-rows
Inspired by the poor error message here:
https://github.com/scottlamb/moonfire-nvr/issues/107#issuecomment-777587727
* print the friendlier Display version of the error rather than Debug.
Eg, "EROFS: Read-only filesystem" rather than "Sys(EROFS)". Do this
everywhere: on command exit, on syncer retries, and on stream
retries.
* print the most immediate problem and additional lines for each
cause.
* print the backtrace or an advertisement for RUST_BACKTRACE=1 if it's
unavailable.
* also mention RUST_BACKTRACE=1 in the troubleshooting guide.
* add context in various places, including pathnames. There are surely
many places more it'd be helpful, but this is a start.
* allow subcommands to return failure without an Error.
In particular, "moonfire-nvr check" does its own error printing
because it wants to print all the errors it finds. Printing "see
earlier errors" with a meaningless stack trace seems like it'd just
confuse. But I also want to get rid of the misleading "Success" at
the end and 0 return to the OS.
* give a rule of thumb for update time in the documentation
* log the SQLite3 version, which can affect performance
* do the vacuum in non-WAL mode, to correctly set the page size and to
avoid very slow behavior on older SQLite3 versions. Larger page sizes
are generally faster (including subsequent vacuum operations).
This won't help much for the first vacuum after this change, but it
will help afterward.
* likewise, set the page size properly on "moonfire-nvr init".
This was mostly straightforward. The most confusing part waas the Sync
bound change on body streams. I copied what hyper did and it seemed to
work. /shruggie
Besides being more clear about what belongs to which, this helps with
docker caching. The server and ui parts are only rebuilt when their
respective subdirectories change.
Extend this a bit further by making the webpack build not depend on
the target architecture. And adding cache dirs so parts of the server
and ui build process can be reused when layer-wide caching fails.