minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	b890bbfa63	Add local disk health checks (#14447 ) The main goal of this PR is to solve the situation where disks stop responding to operations. This generally causes an FD build-up and eventually will crash the server. This adds detection of hung disks, where calls on disk get stuck. We add functionality to `xlStorageDiskIDCheck` where it keeps track of the number of concurrent requests on a given disk. A total number of 100 operations are allowed. If this limit is reached we will block (but not reject) new requests, but we will monitor the state of the disk. If no requests have been completed or updated within a 15-second window, we mark the disk as offline. Requests that are blocked will be unblocked and return an error as "faulty disk". New requests will be rejected until the disk is marked OK again. Once a disk has been marked faulty, a check will run every 5 seconds that will attempt to write and read back a file. As long as this fails the disk will remain faulty. To prevent lots of long-running requests to mark the disk faulty we implement a callback feature that allows updating the status as parts of these operations are running. We add a reader and writer wrapper that will update the status of each successful read/write operation. This should allow fine enough granularity that a slow, but still operational disk will not reach 15 seconds where 50 operations have not progressed. Note that errors themselves are not enough to mark a disk faulty. A nil (or io.EOF) error will mark a disk as "good". * Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`. * de-couple IsOnline() from disk health tracker The purpose of IsOnline() is to ensure that we reconnect the drive only when the "drive" was - disconnected from network we need to validate if the drive is "correct" and is the same drive which belongs to this server. - drive was replaced we have to format it - we support hot swapping of the drives. IsOnline() is not meant for taking the drive offline when it is hung, it is not useful we can let the drive be online instead "return" errors for relevant calls. * return errFaultyDisk for DiskInfo() call Co-authored-by: Harshavardhana <harsha@minio.io> Possible future Improvements: * Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly. * Allow reads/writes to be aborted by the context. * Add usage stats, concurrent count, blocked operations, etc.	2022-03-09 11:38:54 -08:00
Anis Elleuch	84c690cb07	storage: Use request.Form and avoid mux matching (#13858 ) request.Form uses less memory allocation and avoids gorilla mux matching with weird characters in parameters such as '\n' - Remove Queries() to avoid matching - Ensure r.ParseForm is called to populate fields - Add a unit test for object names with '\n'	2021-12-09 08:38:46 -08:00
Klaus Post	c897b6a82d	fix: missing entries on first list resume (#13627 ) On first list resume or when specifying a custom markers entries could be missed in rare cases. Do conservative truncation of entries when forwarding. Replaces #13619	2021-11-10 10:41:21 -08:00
Klaus Post	4f3317effe	Close stream on panic (#13605 ) Always close streamHTTPResponse on panic on main thread to avoid write/flush after response handler has returned.	2021-11-08 08:41:27 -08:00
Harshavardhana	5ed781a330	check for context canceled after competing for locks (#13239 ) once we have competed for locks, verify if the context is still valid - this is to ensure that we do not start readdir() or read() calls on the drives on canceled connections.	2021-09-17 14:11:01 -07:00
Harshavardhana	66fcd02aa2	de-couple walkMu and walkReadMu for some granularity (#13231 ) This commit brings two locks instead of single lock for WalkDir() calls on top of `c25816eabc`. The main reason is to avoid contention between readMetadata() and ListDir() calls, ListDir() can take time on prefixes that are huge for readdir() but this shouldn't end up blocking all readMetadata() operations, this allows for more room for I/O while not overly penalizing all listing operations.	2021-09-17 12:14:12 -07:00
Harshavardhana	0f01e7ef0f	fix: check for xl.meta as directory fallback (#13023 ) Objects uploaded in this format for example ``` mc cp /etc/hosts alias/bucket/foo/bar/xl.meta mc ls -r alias/bucket/foo/bar ``` Won't list the object, handle this scenario.	2021-08-21 00:12:29 -07:00
Klaus Post	c25816eabc	xl walk: Limit walk concurrent IO (#12885 ) We are observing heavy system loads, potentially locking the system up for periods when concurrent listing operations are performed. We place a per-disk lock on walk IO operations. This will minimize the impact of concurrent listing operations on the entire system and de-prioritize them compared to other operations. Single list operations should remain largely unaffected.	2021-08-18 18:10:36 -07:00
Harshavardhana	ee028a4693	listObjects optimized to handle max-keys=1 when prefix is object (#13000 ) Some applications albeit poorly written rather than using headObject rely on listObjects to check for existence of object, this unusual request always has prefix=(to actual object) and max-keys=1 handle this situation specially such that we can avoid readdir() on the top level parent to avoid sorting and skipping, ensuring that such type of listObjects() always behaves similar to a headObject() call.	2021-08-18 18:05:05 -07:00
Harshavardhana	9c65168312	fix: all levels deep flat key match (#12996 ) this addresses a regression from #12984 which only addresses flat key from single level deep at bucket level. added extra tests as well to cover all these scenarios.	2021-08-18 07:40:53 -07:00
Harshavardhana	654a6e9871	always set the filter to skip navigating baseDir (#12984 ) baseDir is empty if the top level prefix does not end with `/` this causes large recursive listings without any filtering, to fix this filtering make sure to set the filter prefix appropriately. also do not navigate folders at top level that do not match the filter prefix, entries don't need to match prefix since they are never prefixed with the prefix anyways.	2021-08-17 07:43:24 -07:00
Klaus Post	89febdb3d6	Reuse small buffers (#12948 ) When reading metadata allow reuse of buffers in certain cases. Take the low-hanging fruit. Reduce GC overhead when listing.	2021-08-12 14:27:22 -07:00
Harshavardhana	a2cd3c9a1d	use ParseForm() to allow query param lookups once (#12900 ) ``` cpu: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz BenchmarkURLQueryForm BenchmarkURLQueryForm-4 247099363 4.809 ns/op 0 B/op 0 allocs/op BenchmarkURLQuery BenchmarkURLQuery-4 2517624 462.1 ns/op 432 B/op 4 allocs/op PASS ok github.com/minio/minio/cmd 3.848s ```	2021-08-07 22:43:01 -07:00
Harshavardhana	e124d88788	optimize listing operation concurrency (#12728 ) - remove use of getOnlineDisks() instead rely on fallbackDisks() when disk return errors like diskNotFound, unformattedDisk use other fallback disks to list from, instead of paying the price for checking getOnlineDisks() - optimize getDiskID() further to avoid large write locks when looking formatLastCheck time window This new change allows for a more relaxed fallback for listing allowing for more tolerance and also eventually gain more consistency in results even if using '3' disks by default.	2021-07-24 22:03:38 -07:00
Klaus Post	05aebc52c2	feat: Implement listing version 3.0 (#12605 ) Co-authored-by: Harshavardhana <harsha@minio.io>	2021-07-05 15:34:41 -07:00
Anis Elleuch	f30c996d48	trace: Add bucket/prefix to WalkDir() tracing (#12510 ) Bonus, replace os.* API with os-instrumented.go	2021-06-15 14:34:26 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	0287711dc9	fix: implement readMetadata common function for re-use (#12353 ) Previous PR #12351 added functions to read from the reader stream to reduce memory usage, use the same technique in few other places where we are not interested in reading the data part.	2021-05-21 11:41:25 -07:00
Klaus Post	9d1b6fb37d	Add XL reader without data (#12351 ) Add XL metadata reader that reads metadata only on larger files. Use for scanning and listing for now.	2021-05-21 09:10:54 -07:00
Harshavardhana	d501c5e38b	add missing responseBody drain (#12147 ) Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-26 08:59:54 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Klaus Post	2623338dc5	Inline small file data in xl.meta file (#11758 )	2021-03-29 17:00:55 -07:00
Klaus Post	9efcb9e15c	Fix listPathRaw/WalkDir cancelation (#11905 ) In #11888 we observe a lot of running, WalkDir calls. There doesn't appear to be any listerners for these calls, so they should be aborted. Ensure that WalkDir aborts when upstream cancels the request. Fixes #11888	2021-03-26 11:18:30 -07:00
Anis Elleuch	0eb146e1b2	add additional metrics per disk API latency, API call counts #11250 ) ``` mc admin info --json ``` provides these details, for now, we shall eventually expose this at Prometheus level eventually. Co-authored-by: Harshavardhana <harsha@minio.io>	2021-03-16 20:06:57 -07:00
Harshavardhana	9171d6ef65	rename all references from crawl -> scanner (#11621 )	2021-02-26 15:11:42 -08:00
Harshavardhana	b517c791e9	[feat]: use DSYNC for xl.meta writes and NOATIME for reads (#11615 ) Instead of using O_SYNC, we are better off using O_DSYNC instead since we are only ever interested in data to be persisted to disk not the associated filesystem metadata. For reads we ask customers to turn off noatime, but instead we can proactively use O_NOATIME flag to avoid atime updates upon reads.	2021-02-24 00:14:16 -08:00
Klaus Post	c5b2a8441b	fix: faster healing when disk is replaced. (#11520 )	2021-02-18 11:06:54 -08:00
Harshavardhana	c9b0f595b9	support directory objects in listing in certain scenarios (#11452 ) When a directory object is presented as a `prefix` param our implementation tend to only list objects present common to the `prefix` than the `prefix` itself, to mimic AWS S3 like flat key behavior this PR ensures that if `prefix` is directory object, it should be automatically considered to be part of the eventual listing result. fixes #11370	2021-02-05 10:12:25 -08:00
Harshavardhana	f71e192343	avoid listing an empty dir without __XLDIR__ (#11427 ) ``` minio server /tmp/disk{1...4} mc mb myminio/testbucket/ mkdir -p /tmp/disk{1..4}/testbucket/test-prefix/ ``` This would end up being listed in the current master, this PR fixes this situation. If a directory is a leaf dir we should it being listed, since it cannot be deleted anymore with DeleteObject, DeleteObjects() API calls because we natively support directories now. Avoid listing it and let healing purge this folder eventually in the background.	2021-02-03 14:06:54 -08:00
Harshavardhana	445a9bd827	fix: heal optimizations in crawler to avoid multiple healing attempts (#11173 ) Fixes two problems - Double healing when bitrot is enabled, instead heal attempt once in applyActions() before lifecycle is applied. - If applyActions() is successful and getSize() returns proper value, then object is accounted for and should be removed from the oldCache namespace map to avoid double heal attempts.	2020-12-28 10:31:00 -08:00
Klaus Post	e6ea5c2703	crawler: Missing folder heal check per set (#10876 )	2020-12-01 12:07:39 -08:00
Harshavardhana	df93102235	fix: unwrapping issues with os.Is* functions (#10949 ) reduces 3 stat calls, reducing the overall startup time significantly.	2020-11-23 08:36:49 -08:00
Harshavardhana	70d2c2ccc9	skip files that are not erasure objects or directories (#10926 ) without this change WalkDir reports errors while trying to read `format.json/xl.meta` which is a replicated file	2020-11-19 09:15:09 -08:00
Harshavardhana	9dea7020f0	allow prefix filtering for WalkDir to be optional (#10923 )	2020-11-18 12:03:16 -08:00
Klaus Post	990d074f7d	metacache: Allow prefix filtering (#10920 ) Do listings with prefix filter when bloom filter is dirty. This will forward the prefix filter to the lister which will make it only scan the folders/objects with the specified prefix. If we have a clean bloom filter we try to build a more generally useful cache so in that case, we will list all objects/folders.	2020-11-18 10:44:18 -08:00
Klaus Post	b5a3d79bce	listobjectversions: Add shortcut for Veeam blocks (#10893 ) Add shortcut for `APN/1.0 Veeam/1.0 Backup/10.0` It requests unique blocks with a specific prefix. We skip scanning the parent directory for more objects matching the prefix.	2020-11-13 16:58:20 -08:00
Klaus Post	a3017c724e	Sort directory objects correctly (#10886 ) Decode dir objects when listing and sort them correctly.	2020-11-12 13:09:34 -08:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00

38 Commits