minio

mirror of https://github.com/minio/minio.git synced 2025-05-24 10:56:12 -04:00

Author	SHA1	Message	Date
Klaus Post	b890bbfa63	Add local disk health checks (#14447) The main goal of this PR is to solve the situation where disks stop responding to operations. This generally causes an FD build-up and eventually will crash the server. This adds detection of hung disks, where calls on disk get stuck. We add functionality to `xlStorageDiskIDCheck` where it keeps track of the number of concurrent requests on a given disk. A total number of 100 operations are allowed. If this limit is reached we will block (but not reject) new requests, but we will monitor the state of the disk. If no requests have been completed or updated within a 15-second window, we mark the disk as offline. Requests that are blocked will be unblocked and return an error as "faulty disk". New requests will be rejected until the disk is marked OK again. Once a disk has been marked faulty, a check will run every 5 seconds that will attempt to write and read back a file. As long as this fails the disk will remain faulty. To prevent lots of long-running requests from marking the disk faulty we implement a callback feature that allows updating the status as parts of these operations are running. We add a reader and writer wrapper that will update the status of each successful read/write operation. This should allow fine enough granularity that a slow, but still operational disk will not reach 15 seconds where 50 operations have not progressed. Note that errors themselves are not enough to mark a disk faulty. A nil (or io.EOF) error will mark a disk as "good". * Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`. * de-couple IsOnline() from disk health tracker The purpose of IsOnline() is to ensure that we reconnect the drive only when the drive was (a) disconnected from the network, in which case we need to validate that the drive is "correct" and is the same drive that belongs to this server, or (b) replaced, in which case we have to format it (we support hot swapping of the drives). IsOnline() is not meant for taking the drive offline when it is hung; instead we can let the drive stay online and return errors for the relevant calls. * return errFaultyDisk for DiskInfo() call Co-authored-by: Harshavardhana <harsha@minio.io> Possible future Improvements: * Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly. * Allow reads/writes to be aborted by the context. * Add usage stats, concurrent count, blocked operations, etc.	2022-03-09 11:38:54 -08:00
Poorna	46ba15ab03	Return MethodNotAllowed if force del on replicated bucket (#14505 )	2022-03-08 14:28:51 -08:00
Poorna	1e39ca39c3	fix: consistent replies for incorrect range requests on replicated buckets (#14345 ) Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.	2022-03-08 13:58:55 -08:00
Krishnan Parthasarathi	80ef1ae51c	Simplify assembling of tierStats from data-usage (#14504 )	2022-03-08 12:08:29 -08:00
Krishna Srinivas	4d0715d226	Implement netperf for "mc support perf net" (#14397 ) Co-authored-by: Klaus Post <klauspost@gmail.com>	2022-03-08 09:54:38 -08:00
Klaus Post	8a274169da	heal: Fix first entry on dangling (#14495) Instead of the first entry, the last entry was returned, because a pointer to the range value was taken.	2022-03-08 09:04:20 -08:00
Harshavardhana	5d6f6d8d5b	create missing .minio.sys/config, .minio.sys/buckets during decommission (#14497 )	2022-03-07 16:18:57 -08:00
Anis Elleuch	bacf6156c1	metrics: Avoid crash when fetching tier metrics (#14493 ) Data usage does not always contain tiering info even if the data usage information is valid. Avoid a crash in that case. (e.g. the scanner scanned the namespace, the user enables tiering, prometheus scrapes the server before the scanner gets a chance to update the data usage with new tiering information)	2022-03-07 10:59:32 -08:00
Klaus Post	1d1b213f1f	scanner: Consider preselection bias when selecting for Healing (#14492 ) Healing decisions would align with skipped folder counters. This can lead to files never being selected for heal checks on "clean" paths. Use different hashing methods and take objectHealProbDiv into account when calculating the cycle. Found by @vadmeste	2022-03-07 09:25:53 -08:00
Harshavardhana	92a77cc78e	update pkg v1.1.20 to reload certs in k8s always (#14470 )	2022-03-04 20:34:39 -08:00
Harshavardhana	b0c84e3de7	fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483) This is a side effect of the optimization done in PR #13544, where certain delete operations on given object versions can cause the lastVersion indication to be skipped, leading to an `xl.meta` whose Versions[] slice is empty while the file itself is intact. This PR tries to ensure that such files are visible and deletable by regular means of listing as null 'delete-marker', and also avoids the situation where this potential issue might arise.	2022-03-04 20:01:26 -08:00
Anis Elleuch	bbc914e174	heal: Do not override heal scan mode if it is set (#14476) mc admin heal has a --scan=deep flag which enforces bitrot checking when doing the healing. Do not force override an existing heal scan option.	2022-03-04 18:25:06 -08:00
Anis Elleuch	3fca4055d2	heal: Re-heal an object when a corruption is found during normal scan (#14482) When scanning using normal mode, HealObject() can report an error saying that it found a corrupted part. This doesn't happen when HealObject() is called with the bitrot scan flag. However, when this happens, we can still restart HealObject() with the bitrot scan. This is also important because without it the scanner and the new disks healer will not be able to heal an object that doesn't exist on a specific disk and has corruption on another disk. Also, without this PR, the mc admin heal command without bitrot will report an error.	2022-03-04 18:24:34 -08:00
Harshavardhana	66afa16aed	canceled PUTs throw frivolous logs (#14475 ) remote drives might throw frivolous logs, if the caller canceled the PUT operation in such scenarios there is no reason to log.	2022-03-04 10:31:33 -08:00
Harshavardhana	0e3bafcc54	improve logs, fix banner formatting (#14456 )	2022-03-03 13:21:16 -08:00
Andreas Auernhammer	b48f719b8e	kes: remove unnecessary error conversion (#14459 ) This commit removes some duplicate code that converts KES API errors. This code was added since KES `0.18.0` changed some exported API errors. However, the KES SDK handles this error conversion itself. Therefore, it is not necessary to duplicate this behavior in MinIO. See: `21555fa624/error.go (L94)` Signed-off-by: Andreas Auernhammer <hi@aead.dev>	2022-03-03 09:42:37 -08:00
Lenin Alevski	289fcbd08c	KES dependency upgrade (#14454 ) - Updating KES dependency to v.0.18.0 - Fixing incompatibility issue when checking for errors during KES key creation Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>	2022-03-02 23:03:40 -08:00
Harshavardhana	7e803adf13	do not attempt force delete on bucket (#14452 ) caller needs to ask explicitly for force delete otherwise, the force delete might end up deleting an existing bucket with data. fixes #14445	2022-03-02 20:47:53 -08:00
Anis Elleuch	4a15bd8ff8	Return info for DiskInfo when the disk is unformatted (#14427 ) In a distributed setup, a DiskInfo REST call to an unformatted disk returns an error with no disk information, such as the disk endpoint URL, which is unexpected.	2022-03-01 15:06:47 -08:00
Klaus Post	b030ef1aca	tests: Clean up dsync package (#14415 ) Add non-constant timeouts to dsync package. Reduce test runtime by minutes. Hopefully not too aggressive.	2022-03-01 11:14:28 -08:00
Harshavardhana	cc46a99f97	skip object-lock headers without values (#14430 ) metadata headers can have headers without values as per AWS S3 spec however, we need to skip some headers that do not have values that potentially can have empty values set.	2022-03-01 11:04:47 -08:00
Xuehan Xu	becec6cb6b	correct mrf.newSetReconnected invocation's param order (#14426 ) Signed-off-by: xuxuehan <xuxuehan@qianxin.com>	2022-02-28 09:13:19 -08:00
Harshavardhana	b7c90751b0	allow drive tests to respond only drive paths	2022-02-25 18:54:46 -08:00
Harshavardhana	e43cc316ff	remove errCh usage from HealObjects() simplify it (#14414 ) errCh is not needed instead, rely on errs slice to capture and return errors instead. most probably fixes #14247	2022-02-25 12:20:41 -08:00
hellivan	03b35ecdd0	collect correct parentUser for OIDC creds auto expiration (#14400 )	2022-02-24 11:43:15 -08:00
Harshavardhana	c08540c7b7	reject speedtest when there isn't enough disk space available (#14402) Small setups do not return appropriate errors when speedtest cannot run on them; allow the tests to fail appropriately and more proactively. Many users bring toy setups; this PR simply returns an error in such situations.	2022-02-24 09:06:18 -08:00
Shireesh Anjal	3934700a08	Make audit webhook and kafka config dynamic (#14390 )	2022-02-24 09:05:33 -08:00
Harshavardhana	2d78e20120	enable CI environment additionally for MINIO_CI_CD (#14395 ) all CI/CD environments set CI=true this is enough for MinIO to be run inside CI environments, support it.	2022-02-23 16:01:59 -08:00
Harshavardhana	2e6f8bdf19	do not skip healing disks during deletes (#14394) Healing disks take active I/O, so it is possible that deleted objects might stay in the .trash folder for a really long time until the drive is fully healed. This PR changes it so that we purge the active content written to these disks as well.	2022-02-23 14:30:46 -08:00
Shireesh Anjal	25144fedd5	Send deployment id and minio version in http header (#14378 )	2022-02-23 13:36:01 -08:00
Krishnan Parthasarathi	27f64dd9a4	Add support for tier-remove and tier-verify (#14382 ) * Add tier remove support only if it's empty * Add support for tier verify	2022-02-23 13:34:25 -08:00
Harshavardhana	9d7648f02f	reduce unnecessary logging during speedtest (#14387 ) - speedtest logs calls that were canceled spuriously, in situations where it should be ignored. - all errors of interest are always sent back to the client there is no need to log them on the server console. - PUT failures should negate the increments such that GET is not attempted on unsuccessful calls. - do not attempt MRF on speedtest objects.	2022-02-23 11:59:13 -08:00
Poorna	1ef8babfef	cache: improve error reported for atime check (#14384 )	2022-02-23 11:57:06 -08:00
Poorna	4ea7bf0510	Use custom transport for site replication (#14391 ) Also, ensure that tiering uses a different instance of custom transport	2022-02-23 11:50:40 -08:00
Anis Elleuch	5dcf1d13a9	ci: Always set disks as non root disks (#14389 ) In the testing mode, reformatting disks will fail because the healing code will complain if one disk is in root mode. This commit will automatically set all disks as non-root if MINIO_CI_CD is set.	2022-02-23 10:11:33 -08:00
Shireesh Anjal	94d37d05e5	Apply dynamic config at sub-system level (#14369 ) Currently, when applying any dynamic config, the system reloads and re-applies the config of all the dynamic sub-systems. This PR refactors the code in such a way that changing config of a given dynamic sub-system will work on only that sub-system.	2022-02-22 10:59:28 -08:00
Harshavardhana	0cbdc458c5	fix: do not reload disk format.json on a reconnected disk (#14351) An onlineDisk means it's a valid disk, but it may be a re-connected disk; this PR verifies that based on LastConn() to only trigger MRF. Current code would re-load the disk 'format.json' again, which is unnecessary. A potential side effect of this is closing perfectly online disks and having them replaced by reloading 'format.json'. This PR tries to avoid this situation by making sure MRF is triggered without reloading 'format.json' because of MRF.	2022-02-21 15:51:54 -08:00
Harshavardhana	65b1a4282e	fix: console logger regression with dynamic logger webhook registration (#14346 ) fixes a regression from #14289	2022-02-17 17:50:10 -08:00
Harshavardhana	af3dc25dfe	align 32bit integers with atomic values in structs (#14344 ) fixes #14341	2022-02-17 15:22:26 -08:00
Krishnan Parthasarathi	5a0c0079a1	Don't add free-version on restore-object (#14340 )	2022-02-17 15:05:19 -08:00
Harshavardhana	af8f563ed3	allow clearing FIFO config as fallback (#14338) FIFO is already removed; users who upgrade are allowed to clear their configs.	2022-02-17 12:49:46 -08:00
Poorna	93af4a4864	Handle non existent kms key correctly (#14329) - in PutBucketEncryption API - admin APIs for `mc admin KMS key [create|info]` - PutObject API when invalid KMS key is specified	2022-02-17 11:36:14 -08:00
Shireesh Anjal	28f188e3ef	Make logger webhook config dynamic (#14289 ) It should not be required to restart the server after setting the logger webhook config.	2022-02-17 11:11:15 -08:00
Harshavardhana	d756da41b9	fix: print gateway banner on removal notice	2022-02-16 20:34:47 -08:00
Krishnan Parthasarathi	cdab4a3b85	Update hourly tier-stats only on successful tiering (#14330)	2022-02-16 17:29:12 -08:00
Klaus Post	b88c57ba93	Add fgprof profiles (#14321 ) https://github.com/felixge/fgprof#rocket-fgprof---the-full-go-profiler	2022-02-16 12:00:10 -08:00
Klaus Post	60cd513a33	Fix leaked healing goroutines (#14322 ) Only the first `listAndHeal` would ever be able to write on errCh, blocking all others infinitely. Instead read all errors but return the first non-nil, if any. The intention appears to be that this should cancel on any error, so that part is kept. Regression from #13990	2022-02-16 08:40:18 -08:00
Harshavardhana	03a6e8aee2	fix: creating steep directory structure on trash folder (#14314 ) weird directory structures get created on the '.trash' folder upon server restarts, this PR fixes this.	2022-02-15 16:34:03 -08:00
Anis Elleuch	4afbb89774	nas: Clean stale background appended files (#14295) When more than one gateway reads and writes from the same mount point and there is a load balancer pointing to those gateways, each gateway will try to create its own temporary append file but fail to clear it later when it is no longer needed. This commit creates a routine that checks all upload IDs saved in the multipart directory and removes any stale entry with the same upload ID in memory and in the temporary background append folder as well.	2022-02-15 09:25:47 -08:00
Klaus Post	5ec57a9533	Add GetObject gzip option (#14226 ) Enabled with `mc admin config set alias/ api gzip_objects=on` Standard filtering applies (1K response minimum, not compressed content type, not range request, gzip accepted by client).	2022-02-14 09:19:01 -08:00


4454 Commits