For #136. I'm also going to make this particular case no longer panic,
but there will surely be other affected panics.
Before:
```
E20210803 08:58:31.606 main moonfire_nvr] panic at '/Users/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.10/src/server/server.rs:68:17'
```
After:
```
E20210803 08:59:51.319 main moonfire_nvr] panic at '/Users/slamb/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.10/src/server/server.rs:68:17': error binding to 0.0.0.0:80: error creating server listener: Address already in use (os error 48)
```
While I'm here, return a clean error if a non-initial video frame
includes a parameter change, rather than doing something crazy (#42).
It's still broken under ffmpeg, it's untested, and it's not as clean
as seamlessly starting a new recording with the new parameters, but
it's better than nothing.
* have a timeout for opening the connection and getting the next
video frame. The former is quite important. The latter is arguably
redundant with the keepalive timer, but this ensures we actually
get a full frame in this timespan rather than some keepalive
responses, RTCP sender reports, or partial frames.
* don't drop extra stuff on loss; just note it. I'm not sure what the
right behavior is but I think I shouldn't change too much at once.
I see a lot of yields and such in CPU profiles. I think the workers
are frequently waking up, finding there's not much to do, and going back
to sleep. Reducing the number of worker threads seems reasonable.
Moonfire NVR has some enforcement on its own; this makes retina vs
ffmpeg more of an apples-to-apples comparison.
I'm also thinking of dropping enforcement from retina; enough things
have sketchy timestamps that this policy doesn't make much sense anyway.
This isn't well-tested and doesn't yet support an initial connection
timeout. But in a quick test, it successfully returns video!
I'd like to do some more aggressive code restructuring for zero-copy
and to have only one writer thread per sample file directory (rather
than the syncer thread + one writer thread per RTSP stream). But I'll
likely wait until I drop support for ffmpeg entirely.
This is (slightly) complicating the switch from ffmpeg to retina
as the RTSP client. And it's not really that close to what I want
to end up with for analytics:
* I'd prefer the analytics happen in a separate process for
several reasons
* Feeding the entire frame to the object detector doesn't produce
good results.
* It doesn't do anything with the results yet anyway.
This caused served chunks to be truncated. On seek, nginx sometimes
served 502 errors, chrome sometimes returned
ERR_CONTENT_LENGTH_MISMATCH, and videos weren't playing properly.
Reading from the mmap()ed region in the tokio threads could cause
them to stall:
* That could affect UI serving when there were concurrent
UI requests (i.e., not just requests that needed the reads in
question anyway).
* If there's a faulty disk, it could cause the UI to totally hang.
Better to not mix disks between threads.
* Soon, I want to handle RTSP from the tokio threads (#37). Similarly,
we don't want RTSP streaming to block on operations from unrelated
disks.
I went with just one thread per disk which I think is sufficient.
But it'd be possible to do a fixed-size pool instead which might improve
latency when some pages are already cached.
I also dropped the memmap dependency. I had to compute the page
alignment anyway to get mremap to work, and Moonfire NVR already is
Unix-specific, so there wasn't much value from the memmap or memmap2
crates.
Fixes#88
* my dad's GW4089IP cameras use 720x480
* some Reolink cameras use 640x352
* I'm playing with rotated cameras (16x9 -> 9x16)
I'd prefer to calculate pasp from a configured camera aspect ratio
than to hardcode the assumption these are 16x9, but that requires
a schema change. This is an improvement for now.
* API change: in update signals, allow setting a start time relative
to now. This is an accuracy improvement in the case where the client
has been retrying an initial request for a while. Kind of an obscure
corner case but easy enough to address. And use a more convenient
enum representation.
* in update signals, choose `now` before acquiring the database lock.
If lock acquisition takes a long time, this more accurately reflects
the time the caller intended.
* in general, make Time and Duration (de)serializable and use them
in json types. This makes the types more self-describing, with
better debug printing on both the server side and on the client
library (in moonfire-playground). To make this work, base has to
import serde which initially seemed like poor layering to me, but
serde seems to be imported in some pretty foundational Rust crates
for this reason. I'll go with it.
In particular, this was happening out of the box on Raspberry Pi OS Lite
20210304, as reported by ironoxidizer@gmail.com here:
https://groups.google.com/g/moonfire-nvr-users/c/2j9LvfFl2u8/m/tJcNS2WfCQAJ
* adjust main.rs to make the problem more obvious
* mention it in the troubleshooting guide
* sidestep it in the nvr docker wrapper script
also just use --networking=host rather than --publish (avoiding a proxy
process). I'm using Docker to simplify the build and deployment process,
not as a security boundary, so just do the simpler thing.
To improve reliability of live streams (#59) on Safari.
Safari was dropping the cookie from websocket update requests.
(But it worked sometimes. I don't get why.) I saw folks on the Internet
thinking this related to HttpOnly:
* https://developer.apple.com/forums/thread/104488
* https://stackoverflow.com/q/47742807/23584
but I still see this behavior without HttpOnly. SameSite=Strict vs
SameSite=Lax appears to make a difference. Try that instead.
SameSite=Strict is pointless for us anyway as noted in a new comment.
Turning off HttpOnly would be more unfortunate security-wise.
As required for live view (#59) to work on Safari.
Safari has some "interesting" expectations:
* There must be a non-empty list of compatible brands. The major brand
is not automatically included. (Looks like ISO/IEC 14496-12 doesn't
spell out which is correct.)
* The tfdt box must be before the trun boxes. Moonfire NVR was not
compliant with ISO/IEC 14496-12:2015 section 8.8.12.1 before.
Chrome and Firefox didn't care, but Safari does.
* The mdat must be written with the small format. Safari is not
implementing the spec properly.
I figured these out by painstakingly comparing Moonfire NVR's output
with gpac's, making it match almost byte-for-byte until it worked, then
backing out changes one at a time to check which were relevant. Ugh!
Chrome appears to time out at 60 seconds of inactivity otherwise.
I think it's better to keep the stream open, even if the camera is
broken.
The implementation looks awkward, but that might be the state of Rust
async right now.
The CI nightly builds had been broken with the following error:
```
error: custom inner attributes are unstable
--> /home/runner/work/moonfire-nvr/moonfire-nvr/server/target/debug/build/moonfire-db-415ce696a754c614/out/schema.rs:10:4
|
10 | #![rustfmt::skip]
| ^^^^^^^^^^^^^
|
= note: `#[deny(soft_unstable)]` on by default
= warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
= note: for more information, see issue #64266 <https://github.com/rust-lang/rust/issues/64266>
```
I'd thought this was by mistake given that #[rustfmt::skip] is still
advertised on rustfmt's github page, but maybe not. Looks like
rust-protobuf's newest version uses
`#![cfg_attr(rustfmt, rustfmt::skip)]` to avoid this error.
Also fix a warning on nightly about an extraneous semicolon.
I also enforced some invariants in the signals code, fixing a couple
bugs. The signals code is more complex than I'd like, but hopefully
is working now.
My main goal is to support creating indexes for signals as well as
recordings. An additional goal is to just shrink db.rs a bit; it's
gotten quite large.
I think this has a minor behavior change: permission denied replies
change to HTTP 403 where they were HTTP 401. The new behavior seems
more correct, as these errors can occur when authentication has
succeeded but the session in question is not authorized for the given
operation. The UI currently doesn't care about this distinction.
I'm tired of all the boilerplate, so use the new
GPL-3.0-linking-exception license identifier instead in all the server
components.
I left the ui stuff alone because I'm just going to replace it (#111).
Add a checker for the header because it's easy to forget.
I want to make the project more accessible by not expecting folks to
match my idiosyncratic style. Now almost [1] everything is written
in the "standard" style. CI enforces this.
[1] "Almost": I used #[rustfmt::skip] in a few sections where I felt
aligning things in columns significantly improves readability.
For recovering from corruption, as in #107. These should aid in
restoring database integrity without throwing away the entire database.
I only added the conditions that came up in #107, so far.
* "Missing ... row" => --trash-orphan-sample-files
* "Recording ... missing file" => --delete-orphan-rows
* "bad video_index" => --trash-corrupt-rows
Inspired by the poor error message here:
https://github.com/scottlamb/moonfire-nvr/issues/107#issuecomment-777587727
* print the friendlier Display version of the error rather than Debug.
Eg, "EROFS: Read-only filesystem" rather than "Sys(EROFS)". Do this
everywhere: on command exit, on syncer retries, and on stream
retries.
* print the most immediate problem and additional lines for each
cause.
* print the backtrace or an advertisement for RUST_BACKTRACE=1 if it's
unavailable.
* also mention RUST_BACKTRACE=1 in the troubleshooting guide.
* add context in various places, including pathnames. There are surely
many places more it'd be helpful, but this is a start.
* allow subcommands to return failure without an Error.
In particular, "moonfire-nvr check" does its own error printing
because it wants to print all the errors it finds. Printing "see
earlier errors" with a meaningless stack trace seems like it'd just
confuse. But I also want to get rid of the misleading "Success" at
the end and 0 return to the OS.
* give a rule of thumb for update time in the documentation
* log the SQLite3 version, which can affect performance
* do the vacuum in non-WAL mode, to correctly set the page size and to
avoid very slow behavior on older SQLite3 versions. Larger page sizes
are generally faster (including subsequent vacuum operations).
This won't help much for the first vacuum after this change, but it
will help afterward.
* likewise, set the page size properly on "moonfire-nvr init".
This was mostly straightforward. The most confusing part waas the Sync
bound change on body streams. I copied what hyper did and it seemed to
work. /shruggie
Besides being more clear about what belongs to which, this helps with
docker caching. The server and ui parts are only rebuilt when their
respective subdirectories change.
Extend this a bit further by making the webpack build not depend on
the target architecture. And adding cache dirs so parts of the server
and ui build process can be reused when layer-wide caching fails.