moonfire-nvr

mirror of https://github.com/scottlamb/moonfire-nvr.git synced 2025-01-05 20:13:22 -05:00

Author	SHA1	Message	Date
Scott Lamb	579150c9d5	redact URLs within stream.rs; fixes #13	2019-02-13 22:34:19 -08:00
Scott Lamb	c271cfa2b5	make Writer enforce maximum recording duration My installation recently somehow ended up with a recording with a duration of 503793844 90,000ths of a second, way over the maximum of 5 minutes. (Looks like the machine was pretty unresponsive at the time and/or having network problems.) When this happens, the system really spirals. Every flush afterward (12 per minute with my installation) fails with a CHECK constraint failure on the recording table. It never gives up on that recording. /var/log fills pretty quickly as this failure is extremely verbose (a stack trace, and a line for each byte of video_index). Eventually the sample file dirs fill up too as it continues writing video samples while GC is stuck. The video samples are useless anyway; given that they're not referenced in the database, they'll be deleted on next startup. This ensures the offending recording is never added to the database, so we don't get the same persistent problem. Instead, writing to the recording will fail. The stream will drop and be retried. If the underlying condition that caused a too-long recording (many non-key-frames, or the camera returning a crazy duration, or the monotonic clock jumping forward extremely, or something) has gone away, the system should recover.	2019-01-29 08:26:36 -08:00
Scott Lamb	b5387af3d4	lose "extern crate" everywhere (Rust 2018 edition)	2018-12-28 21:59:39 -06:00
Scott Lamb	699ec87968	upgrade to 2018 Rust edition This is mostly just "cargo fix --edition" + Cargo.toml changes. There's one fix for upgrading to NLL in db/writer.rs: Writer::previously_opened wouldn't build with NLL because of a double-borrow the previous borrow checker somehow didn't catch. Restructure to avoid it. I'll put elective NLL changes in a following commit.	2018-12-28 14:59:06 -06:00
Scott Lamb	8dc5d64333	make with_recording_playback less monomorphized This is a minor code size reduction - instead of being monomorphized into four variants (according to "cargo llvm-lines"), it's now monomorphized into two. The stripped release binary on macOS is about 8kB smaller (0.15%). Not a huge improvement but better than nothing. Benchmarks seem unchanged (though they have a lot of variance).	2018-08-24 15:34:42 -07:00
Scott Lamb	addeb9d2f6	add a TimerGuard around db locks & ops I moved the clocks member from LockedDatabase to Database to make this happen, so the new DatabaseGuard (replacing a direct MutexGuard<LockedDatabase>) can access it before acquiring the lock. I also made the type of clock a type parameter of Database (and so several other things throughout the system). This allowed me to drop the Arc<>, but more importantly it means that the Clocks trait doesn't need to stay object-safe. I plan to take advantage of that shortly.	2018-03-23 13:31:23 -07:00
Scott Lamb	4c8daa6d24	save timestamps along with opens	2018-03-10 16:15:36 -08:00
Scott Lamb	d6fa470713	tests and fixes for Writer and Syncer * separate these out into a new file, writer.rs, as dir.rs was getting unwieldy. * extract traits for the parts of SampleFileDir and std::fs::File they needed; set up mock implementations. * move clock.rs to a new base crate to be accessible from the db crate. * add tests that exercise all the retry paths. * bugfix: account for the new recording's bytes when calculating how much to delete. * bugfix: when retrying an unlink failure in collect_garbage, we shouldn't warn about all the recordings no longer existing. Do this by retrying each step rather than the whole procedure again. * avoid double-panic scenarios, which I hit while tweaking the mocks. These are quite annoying to debug as Rust doesn't print information about either panic. I ended up using lldb to get a backtrace. Better to be cautious about what we're doing when already panicking. * give more context on raw::insert_recording errors, which I hit as well while tweaking the new tests.	2018-03-07 04:42:46 -08:00
Scott Lamb	fb4d88d3e2	make db::dir::Writer equally stubborn Every recording it starts must be sent to the syncer with at least one sample written. It will try forever (unless the channel is down, then panic). This avoids the situation in which it prevents something in the uncommitted VecDeque from ever being synced and thus any further recordings from being flushed.	2018-02-28 12:32:52 -08:00
Scott Lamb	b1d71c4e8d	improve Syncer's robustness The new approach is to, rather than panicking, retry forever. The assumption is that if a given operation is failing, a following operation is unlikely to succeed, so it's simpler to just keep trying the earlier one than come up with ways to undo it and proceed with later operations. I still need to apply this approach to the Writer class. It currently unwraps (crashes) or just gives up on a recording without ever sending it to the Syncer. Given that recordings are all synced in order, that means further ones can never be synced.	2018-02-28 11:07:55 -08:00
Scott Lamb	843e1b49c8	take FnMut closures by reference I mistakenly thought these had to be monomorphized. (The FnOnce still does, until rust-lang/rfcs#1909 is implemented.) Turns out this way works fine. It should result in less compile time / code size, though I didn't check this.	2018-02-23 09:19:42 -08:00
Scott Lamb	b037c9bdd7	knob to reduce db commits (SSD write cycles) This improves the practicality of having many streams (including the doubling of streams by having main + sub streams for each camera). With these tuned properly, extra streams don't cause any extra write cycles in normal or error cases. Consider the worst case in which each RTSP session immediately sends a single frame and then fails. Moonfire retries every second, so this would formerly cause one commit per second per stream. (flush_if_sec=0 preserves this behavior.) Now the commits can be arbitrarily infrequent by setting higher values of flush_if_sec. WARNING: this isn't production-ready! I hacked up dir.rs to make tests pass and "moonfire-nvr run" work in the best-case scenario, but it doesn't handle errors gracefully. I've been debating what to do when writing a recording fails. I considered "abandoning" the recording then either reusing or skipping its id. (in the latter case, marking the file as garbage if it can't be unlinked immediately). I think now there's no point in abandoning a recording. If I can't write to that file, there's no reason to believe another will work better. It's better to retry that recording forever, and perhaps put the whole directory into an error state that stops recording until those writes go through. I'm planning to redesign dir.rs to make this happen.	2018-02-22 16:35:34 -08:00
Scott Lamb	31adbc1e9f	initial split of database to a separate crate It should reduce compile time / memory usage to put quite a bit of the code into a separate crate. I also intend to limit visibility of some things to only within the db crate, but that's for a future change. This is the smallest move that will compile.	2018-02-20 23:15:39 -08:00
Scott Lamb	d84e754b2a	replace homegrown Error with failure crate This reduces boilerplate, making it a bit easier for me to split the db stuff out into its own crate.	2018-02-20 22:46:14 -08:00
Scott Lamb	253f3de399	reorganize the sample file directory The filenames now represent composite ids (stream id + recording id) rather than a separate uuid system with its own reservation for a few benefits: * This provides more information when there are inconsistencies. * This avoids the need for managing the reservations during recording. I expect this to simplify delaying flushing of newly written sample files. Now the directory has to be scanned at startup for files that never got written to the database, but that's acceptably fast even with millions of files. * Less information to keep in memory and in the recording_playback table. I'd considered using one directory per stream, which might help if the filesystem has trouble coping with huge directories. But that would mean each dir has to be fsync()ed separately (more latency and/or more multithreading). So I'll stick with this until I see concrete evidence of a problem that would solve. Test coverage of the error conditions is poor. I plan to do some restructuring of the db/dir code, hopefully making steps toward testability along the way.	2018-02-20 10:11:10 -08:00
Scott Lamb	89b6bccaa3	support multiple sample file directories This is still pretty basic support. There's no config UI support for renaming/moving the sample file directories after they are created, and no error checking that the files are still in the expected place. I can imagine sysadmins getting into trouble trying to change things. I hope to address at least some of that in a follow-up change to introduce a versioning/locking scheme that ensures databases and sample file dirs match in some way. A bonus change that kinda got pulled along for the ride: a dialog pops up in the config UI while a stream is being tested. The experience was pretty bad before; there was no indication the button worked at all until it was done, sometimes many seconds later.	2018-02-11 23:04:02 -08:00
Scott Lamb	dc402bdc01	schema version 2: support sub streams This allows each camera to have a main and a sub stream. Previously there was a field in the schema for the sub stream's url, but it didn't do anything. Now you can configure individual retention for main and sub streams. They show up grouped in the UI. No support for upgrading from schema version 1 yet.	2018-02-03 22:15:54 -08:00
Scott Lamb	c43fb80639	warn if a streamer op takes too long My odroid setup has been occasionally (about once a week) losing about 15 seconds of recordings on all cameras. I'm not sure why. So I'm labelling all the likely suspect spots and logging if any of them takes longer than a second. I think this will give me more information; hopefully narrow it down to network or local disk I/O.	2018-01-31 14:20:30 -08:00
Scott Lamb	7673a00bd9	serve 'video/mp4; codecs="avc1.xxxxxx"' mime type This can be used when constructing an HTML5 SourceBuffer.	2017-10-03 23:25:58 -07:00
Scott Lamb	857a66f29c	use my own ffmpeg crate This significantly improves safety of the ffmpeg interface. The complex ABIs aren't accessed directly from Rust. Instead, I have a small C wrapper which uses the ffmpeg C API and the C headers at compile-time to determine the proper ABI in the same way any C program using ffmpeg would, so that the ABI doesn't have to be duplicated in Rust code. I've tested with ffmpeg 2.x and ffmpeg 3.x; it seems to work properly with both where before ffmpeg 3.x caused segfaults. It still depends on ABI compatibility between the compiled and running versions. C programs need this, too, and normal shared library versioning practices provide this guarantee. But log both versions on startup for diagnosing problems with this. Fixes #7	2017-09-20 21:06:06 -07:00
Scott Lamb	618709734a	trim the recording playback cache a bit It had an Arc which in hindsight isn't necessary; the actual video index generation is fast anyway. This saves a couple pointers per cache entry and the overhead of chasing them. LruCache itself also has some extra pointers on it but that's something to address another day.	2017-02-28 23:28:25 -08:00
Scott Lamb	ce363162f4	trim 16 bytes from each recording::Segment This reduces the working set by another 960 bytes for a typical one-hour recording, improving cache efficiency a bit more. 8 bytes from SampleIndexIterator: * reduce the three "bytes" fields to two. Doing so as "bytes_key" vs "bytes_nonkey" slowed it down a bit, perhaps because the "bytes" is needed right away and requires a branch. But "bytes" vs "bytes_other" seems fine. Looks like it can do this with cmovs in parallel with other stuff. * stuff "is_key" into the "i" field. 8 bytes from recording::Segment itself: * make "frames" and "key_frame" u16s * stuff "trailing_zero" into "video_sample_entry_id"	2017-02-27 21:14:06 -08:00
Scott Lamb	2d0c78a6d8	style improvements * remove stuttering: mp4::Mp4Foo -> mp4::Foo * stop using a &MutexGuard<Foo> where a &Foo will do	2017-02-24 21:33:26 -08:00
Scott Lamb	14461fcad9	bugfix: only double length of first recording	2016-12-30 06:39:09 -08:00
Scott Lamb	938d8a752f	camera clock frequency correction As described in design/time.md: * get the realtime-monotonic once at the start of a run and use the monotonic clock afterward to avoid problems with local time steps * on every recording, try to correct the latest local_time_delta at up to 500 ppm Let's see how this works...	2016-12-29 21:05:57 -08:00
Scott Lamb	a71f6e66d8	test the new local time logic The test ensures it solves the problem of the initial buffering throwing off the start time of the first segment. Along the way, I tested and fixed the new TrailingZero flag; it wasn't being set.	2016-12-29 17:14:36 -08:00
Scott Lamb	c7443436a5	skip the first rotation This is as described in design/time.md.	2016-12-29 13:07:25 -08:00
Scott Lamb	cc297adc75	clean up Writer interface slightly	2016-12-29 12:33:34 -08:00
Scott Lamb	d001e4893c	new logic for calculating a recording's start time This is as described in design/time.md. Other aspects of that design (including using the monotonic clock and adjusting the durations to compensate for camera clock frequency error) are not implemented yet. No new tests yet. Just trying to get some flight miles on these ideas as soon as I can.	2016-12-28 20:56:08 -08:00
Scott Lamb	eee887b9a6	schema version 1 The advantages of the new schema are: * overlapping recordings can be unambiguously described and viewed. This is a significant problem right now; the clock on my cameras appears to run faster than the (NTP-synchronized) clock on my NVR. Thus, if an RTSP session drops and is quickly reconnected, there's likely to be overlap. * less I/O is required to view mp4s when there are multiple cameras. This is a pretty dramatic difference in the number of database read syscalls with pragma page_size = 1024 (605 -> 39 in one test), although I'm not sure how much of that maps to actual I/O wait time. That's probably as dramatic as it is due to overflow page chaining. But even with larger page sizes, there's an improvement. It helps to stop interleaving the video_index fields from different cameras. There are changes to the JSON API to take advantage of this, described in design/api.md. There's an upgrade procedure, described in guide/schema.md.	2016-12-20 22:08:18 -08:00
Scott Lamb	8df0eae567	add a basic test of Streamer, fix it This test is copied from the C++ implementation. It ensures the timestamps are calculated accurately from the pts rather than using ffmpeg's estimated duration. The Rust implementation was doing the easy-but-inaccurate thing, so fix that to make the test pass. Additionally, I did this with a code structure that should ensure the Rust code never drops a Writer without indicating to the syncer that its uuid is abandoned. Such a bug essentially leaks the partially-written file, although a restart would cause it to be properly unlinked and marked as such. There are no tests (yet) that exercise this scenario, though.	2016-12-06 18:41:44 -08:00
Scott Lamb	0a7535536d	Rust rewrite I should have submitted/pushed more incrementally but just played with it on my computer as I was learning the language. The new Rust version more or less matches the functionality of the current C++ version, although there are many caveats listed below. Upgrade notes: when moving from the C++ version, I recommend dropping and recreating the "recording_cover" index in SQLite3 to pick up the addition of the "video_sync_samples" column: $ sudo systemctl stop moonfire-nvr $ sudo -u moonfire-nvr sqlite3 /var/lib/moonfire-nvr/db/db sqlite> drop index recording_cover; sqlite> create index ...rest of command as in schema.sql...; sqlite> ^D Some known visible differences from the C++ version: * .mp4 generation queries SQLite3 differently. Before it would just get all video indexes in a single query. Now it leads with a query that should be satisfiable by the covering index (assuming the index has been recreated as noted above), then queries individual recordings' indexes as needed to fill an LRU cache. I believe this is roughly similar speed for the initial hit (which generates the moov part of the file) and significantly faster when seeking. I would have done it a while ago with the C++ version but didn't want to track down an LRU cache library. It was easier to find with Rust. * On startup, the Rust version cleans up old reserved files. This is as in the design; the C++ version was just missing this code. * The .html recording list output is a little different. It's in ascending order, with the most current segment shorter than an hour rather than the oldest. This is less ergonomic, but it was easy. I could fix it or just wait to obsolete it with some fancier JavaScript UI. * commandline argument parsing and logging have changed formats due to different underlying libraries. * The JSON output isn't quite right (matching the spec / C++ implementation) yet. Additional caveats: * I haven't done any proof-reading of prep.sh + install instructions. * There's a lot of code quality work to do: adding (back) comments and test coverage, developing a good Rust style. * The ffmpeg foreign function interface is particularly sketchy. I'd eventually like to switch to something based on autogenerated bindings. I'd also like to use pure Rust code where practical, but once I do on-NVR motion detection I'll need to use existing C/C++ libraries for speed (H.264 decoding + OpenCL-based analysis).	2016-11-25 14:34:00 -08:00

32 Commits