minio

mirror of https://github.com/minio/minio.git synced 2024-12-26 15:15:55 -05:00

Author	SHA1	Message	Date
Harshavardhana	9693c382a8	make renameData() more defensive during overwrites (#19548 ) instead upon any error in renameData(), we still preserve the existing dataDir in some form for recoverability in strange situations such as out of disk space type errors. Bonus: avoid running list and heal() instead allow versions disparity to return the actual versions, uuid to heal. Currently limit this to 100 versions and lesser disparate objects. an undo now reverts back the xl.meta from xl.meta.bkp during overwrites on such flaky setups. Bonus: Save N depth syscalls via skipping the parents upon overwrites and versioned updates. Flaky setup examples are stretch clusters with regular packet drops etc, we need to add some defensive code around to avoid dangling objects.	2024-04-23 10:15:52 -07:00
Harshavardhana	1173b26fc8	avoid triggering heals on metacache files if any (#19299 )	2024-03-19 20:21:15 -07:00
Harshavardhana	9a012a53ef	initialize the disk healer early on (#19143 ) This PR fixes a bug that perhaps has been long introduced, with no visible workarounds. In any deployment, if an entire erasure set is deleted, there is no way the cluster recovers.	2024-02-27 23:02:14 -08:00
Harshavardhana	f25cbdf43c	use all the available nr_requests for NVMe (#18920 )	2024-01-30 14:10:06 -08:00
Harshavardhana	196e7e072b	allow bitrot files to be healed in MRF (#18618 ) bitrot scanMode was ignored in MRF, allow it to heal relevant content if needed when seen as an error.	2023-12-08 12:26:01 -08:00
Harshavardhana	8fdfcfb562	upon RenameData() quorum error delete any partial success (#18586 ) there is potential for danglingWrites when quorum failed, where only some drives took a successful write, generally this is left to the healing routine to pick it up. However it is better that we delete it right away to avoid potential for quorum issues on version signature when there are many versions of an object.	2023-12-04 11:33:39 -08:00
Harshavardhana	1f8b9b4bd5	fix: do not listAndHeal() inline with PutObject() (#17499 ) there is a possibility that slow drives can actually add latency to the overall call, leading to a large spike in latency. this can happen if there are other parallel listObjects() calls to the same drive, in-turn causing each other to sort of serialize. this potentially improves performance and makes PutObject() also non-blocking.	2023-06-24 19:31:04 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Harshavardhana	84f31ed45d	simplify MRF, converge it to regular healing (#17026 )	2023-04-19 07:47:42 -07:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Harshavardhana	37e3f5de10	do not print object not found errors in MRF healing (#15646 )	2022-09-02 14:22:40 -07:00
Anis Elleuch	01e5632949	mrf: Fix stale MRF data showed in heal info (#14953 ) One usee reported having mc admin heal status output ETA increasing by time. It turned out it is MRF that is not clearing its data due to a bug in the code. pendingItems is increased when an object is queued to be healed but never decreasd when there is a healing error. This commit will decrease pendingItems and pendingBytes even when there is an error to give accurate reporting.	2022-05-20 07:33:18 -07:00
Aditya Manthramurthy	9aadd725d2	Avoid calling .Reset() on active timer (#14941 ) .Reset() documentation states: For a Timer created with NewTimer, Reset should be invoked only on stopped or expired timers with drained channels. This change is just to comply with this requirement as there might be some runtime dependent situation that might lead to unexpected behavior.	2022-05-18 15:37:58 -07:00
Anis Elleuch	16431d222c	heal: Enable periodic bitrot scan configuration (#14464 )	2022-04-07 08:10:40 -07:00
Harshavardhana	af3dc25dfe	align 32bit integers with atomic values in structs (#14344 ) fixes #14341	2022-02-17 15:22:26 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Harshavardhana	a19e3bc9d9	add more dangling heal related tests (#13140 ) also make sure that HealObject() never returns 'ObjectNotFound' or 'VersionNotFound' errors, as those are meaningless and not useful for the caller.	2021-09-02 20:56:13 -07:00
Harshavardhana	c11a2ac396	refactor healing to remove certain structs (#13079 ) - remove sourceCh usage from healing we already have tasks and resp channel - use read locks to lookup globalHealConfig - fix healing resolver to pick candidates quickly that need healing, without this resolver was unexpectedly skipping.	2021-08-26 14:06:04 -07:00
Harshavardhana	0559f46bbb	fix: make healObject() make non-blocking (#13071 ) healObject() should be non-blocking to ensure that scanner is not blocked for a long time, this adversely affects performance of the scanner and also affects the way usage is updated subsequently. This PR allows for a non-blocking behavior for healing, dropping operations that cannot be queued anymore.	2021-08-25 17:46:20 -07:00
Anis Elleuch	39874b77ed	mrf: Avoid rare data race and more simplification (#12791 ) This change avoids a rare data race and simplify the function that returns MRF last activity information.	2021-07-26 08:00:59 -07:00
Anis Elleuch	b0b4696a64	heal: Add MRF metrics to background heal API response (#12398 ) This commit gathers MRF metrics from all nodes in a cluster and return it to the caller. This will show information about the number of objects in the MRF queues waiting to be healed.	2021-07-15 22:32:06 -07:00

21 Commits