minio

mirror of https://github.com/minio/minio.git synced 2024-12-26 23:25:54 -05:00

Author	SHA1	Message	Date
Anis Eleuch	f85c28e960	heal: large objects fix and avoid .healing.bin corner case premature exit (#20577) xlStorage.Healing() returns nil if there is an error reading .healing.bin or if the file is empty. The healing.bin update() call returns early if .healing.bin is empty; hence, no further update of .healing.bin is possible. A .healing.bin can be empty if os.Open() with O_TRUNC is successful but the next Write returns an error. To avoid this situation, do not make healingTracker.update() return early if .healing.bin is empty, so that the file is written again. This commit also fixes wrong error log printing when an object is healed in another drive in the same erasure set but not in the drive that is actively being healed by the fresh drive healing code. Currently, it prints <nil> instead of the actual error. * heal: Scan .minio.sys metadata only during site-wide heal (#137) mc admin heal always invokes .minio.sys healing, but sometimes .minio.sys contains a lot of data (many service accounts, STS accounts, etc.), which makes the mc admin heal command very slow. Only invoke .minio.sys healing when no bucket was specified in the `mc admin heal` command.	2024-10-26 02:58:27 -07:00
Harshavardhana	006cacfefb	to turn-off healing drop legacy ENV (#20315)	2024-08-23 15:43:31 -07:00
Anis Eleuch	a8f143298f	heal: Reset healing params when a retry is decided (#20285) Currently, a healing retry of a new drive does not reset HealedBuckets, which means that the next healing retry will skip those buckets. This commit fixes that behavior. Also, the skipped objects counter will include objects that are uploaded after the healing has started.	2024-08-22 05:35:43 -07:00
Anis Eleuch	85c3db3a93	heal: Add finished flag to .healing.bin to avoid removing it (#20250) Sometimes, we need historical information in .healing.bin, such as the number of expired objects that healing skipped, which can create a drive usage disparity within the same erasure set. For that reason, this commit no longer removes .healing.bin; instead, the file has a new field called Finished so we know healing is finished on that drive.	2024-08-20 08:42:49 -07:00
Anis Eleuch	51b1f41518	heal: Persist MRF queue in the disk during shutdown (#19410)	2024-08-13 15:26:05 -07:00
Anis Eleuch	b7f319b62a	properly reload a fresh drive when found in a failed state during startup (#20145) If a drive is in a failed state when a single-node multi-drive deployment is started, a fresh replacement disk will not be properly healed unless the user restarts the node. Fix this by always adding the new fresh disk to globalLocalDrivesMap. Also remove globalLocalDrives for simplification; a map can still be used to store local node drives since the order of local drives of a node is not defined.	2024-07-24 16:30:33 -07:00
Anis Eleuch	2ec1f404ac	info: Always refresh the root disk status (#20023) Add root drive status in the disk info cache function, so unmounting a drive without restarting a local node reflects the correct value.	2024-07-02 13:41:29 -07:00
Harshavardhana	22c5a5b91b	add healing retries when there are failed heal attempts (#19986) Transient errors for long-running tasks are normal; allow the drive to retry up to 3 times before giving up on healing the drive.	2024-06-25 10:32:56 -07:00
Klaus Post	2f9018f03b	Do regular checks for healing status while scanning (#19946)	2024-06-18 09:11:04 -07:00
Harshavardhana	bbb64eaade	skip healing properly in the scanner when a drive is hotplugged (#19939) Due to how the state is passed around, SkipHealing might not always reflect the true state of the system, causing a situation where we might be healing from the scanner on the same drive that is already being healed. As a result, competing heals get triggered that slow each other down.	2024-06-17 16:39:11 -07:00
Anis Eleuch	d274566463	race: Fix rare race detected by testing (#19872) Below is the race warning: ``` WARNING: DATA RACE Write at 0x00c02d3d27c0 by goroutine 1210: github.com/minio/minio/cmd.(*healingTracker).bucketDone() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:273 +0x13a github.com/minio/minio/cmd.(*erasureObjects).healErasureSet() github.com/minio/minio/cmd/global-heal.go:525 +0x2158 github.com/minio/minio/cmd.healFreshDisk() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:450 +0x107e github.com/minio/minio/cmd.monitorLocalDisksAndHeal.func1() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:528 +0x150 github.com/minio/minio/cmd.monitorLocalDisksAndHeal.gowrap2() github.com/minio/minio/cmd/background-newdisks-heal-ops.go:538 +0x82 Previous read at 0x00c02d3d27c0 by goroutine 1446: github.com/minio/minio/cmd.(*erasureObjects).healErasureSet.func5() github.com/minio/minio/cmd/global-heal.go:232 +0xfd ```	2024-06-04 08:12:32 -07:00
Anis Eleuch	1277ad69a6	heal: Remove .healing.bin when all ES drives are healing (#19846) In the very rare case when all drives in an erasure set need to be healed, remove .healing.bin from all drives; otherwise healing will be stuck in a loop. Also, fix a unit test that sometimes fails because the test itself is wrong.	2024-05-31 07:48:50 -07:00
Aditya Manthramurthy	5f78691fcf	ldap: Add user DN attributes list config param (#19758) This change uses the updated ldap library in minio/pkg (bumped up to v3). A new config parameter is added for LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user. A test is added in sts_handlers.go that shows how to access the LDAP attributes as a claim. This is in preparation for adding SSH pubkey authentication to MinIO's SFTP integration.	2024-05-24 16:05:23 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002) Create new code paths for multiple subsystems in the code. This will make maintaining this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Harshavardhana	f8696cc8f6	fallback to globalLocalDrives for non-distributed setups	2024-02-28 14:56:08 -08:00
Anis Eleuch	2bdb9511bd	heal: Add skipped objects to the heal summary (#19142) New disk healing code skips/expires objects that ILM is supposed to expire. Give the user more visibility into this activity by counting those objects and printing the count at the end of the healing activity.	2024-02-28 09:05:40 -08:00
Harshavardhana	9a012a53ef	initialize the disk healer early on (#19143) This PR fixes a bug that was perhaps introduced long ago, with no visible workarounds. In any deployment, if an entire erasure set is deleted, there is no way the cluster recovers.	2024-02-27 23:02:14 -08:00
Harshavardhana	b6e98aed01	fix: found races in accessing globalLocalDrives (#19069) make a copy before accessing globalLocalDrives Bonus: update console v0.46.0 Signed-off-by: Harshavardhana <harsha@minio.io>	2024-02-16 17:15:57 -08:00
Anis Eleuch	68dde2359f	log: Add logger.Event to send to console and other logger targets (#19060) Add a new function logger.Event() to send the log to Console and http/kafka log webhooks. This will include some internal events such as disk healing and rebalance/decommissioning	2024-02-15 15:13:30 -08:00
Harshavardhana	32e668eb94	update() stale rebalance stats() object during pool expansion (#18882) It is entirely possible that a rebalance process that was running when it was asked to "stop" failed to write its last statistics to the disk. After this, a pool expansion can cause disruption, and all S3 API calls would fail at the IsPoolRebalancing() function. This PR makes sure that we update rebalance.bin under such conditions to avoid any runtime crashes.	2024-01-27 10:14:03 -08:00
Harshavardhana	52229a21cb	avoid reload of 'format.json' over the network under normal conditions (#18842)	2024-01-23 14:11:46 -08:00
Harshavardhana	a50ea92c64	feat: introduce list_quorum="auto" to prefer quorum drives (#18084) NOTE: This feature is not retroactive; it will not cater to previous transactions on existing setups. To enable this feature, please set the `_MINIO_DRIVE_QUORUM=on` environment variable as part of the systemd service or k8s configmap. Once this has been enabled, you also need to set `list_quorum`. ``` ~ mc admin config set alias/ api list_quorum=auto ``` A new debugging tool is available to check for any missing counters.	2023-12-29 15:52:41 -08:00
Harshavardhana	b3314e97a6	re-use the same local drive used by remote-peer (#18645) Historically, we have always kept storage-rest-server and a local storage API separate without much trouble, since they both can independently operate due to no special state() between them. However, over some time, we have added state() such as - drive monitoring threads: now there will be "2" of them per drive instead of just 1. - concurrent tokens available per drive are now twice instead of just a single shared set, allowing an unexpectedly high amount of I/O to go through. - applying serialization by using walkMutexes can now be adequately honored for both remote callers and local callers.	2023-12-13 19:27:55 -08:00
Anis Eleuch	b7d11141e1	rename Force to Immediate for clarity (#18540)	2023-11-28 22:35:16 -08:00
Harshavardhana	9878031cfd	fix: change DISK_ to DRIVE_ for some drive related envs (#18005)	2023-09-11 12:19:22 -07:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967)	2023-09-04 12:57:37 -07:00
Harshavardhana	24e86d0c59	avoid passing around poolIdx, setIdx instead pass the relevant disks (#17660)	2023-07-17 09:52:05 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469)	2023-06-19 17:53:08 -07:00
Harshavardhana	1443b5927a	allow quorum fileInfo to pick same parityBlocks (#17454) Bonus: allow replication to proceed for 503 errors such as with error code SlowDownRead	2023-06-18 18:20:15 -07:00
jiuker	bd2dc6c670	fix: in healing tracker printTo when err (#17207)	2023-05-15 10:14:48 -07:00
Anis Eleuch	224d9a752f	fix: the race in healing tracker code (#17048)	2023-04-18 14:49:56 -07:00
Harshavardhana	4c5edacae2	ignore operation timedout errors (#16891)	2023-03-26 03:16:51 -07:00
Klaus Post	628042e65e	tests: Protect globalLocalDrives against races (#16800)	2023-03-13 06:04:20 -07:00
dorman	4d7c8e3bb8	Remove redundant log (#16710) Co-authored-by: z30001483 <zekaifeng2@huawei.com>	2023-02-27 09:59:47 -08:00
Harshavardhana	bfedea9bad	fix: disk healing should honor the right pool/set index (#16712)	2023-02-27 04:55:32 -08:00
Anis Elleuch	857674c3a0	heal: Do not mark buckets as done when there are no online disks (#16621)	2023-02-14 12:50:13 -08:00
Anis Elleuch	beb1924437	Properly restart fresh disk healing when failed in some places (#16413)	2023-01-14 05:06:46 +05:30
Anis Elleuch	0333412148	fix: heal only once per disk per set among multiple disks (#16358)	2023-01-05 20:41:19 -08:00
Anis Elleuch	acc9c033ed	debug: Add X-Amz-Request-ID to lock/unlock calls (#16309)	2022-12-23 19:49:07 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162)	2022-12-06 13:46:50 -08:00
jiuker	ce53d7f6c2	add disk.Close() in healFreshDisk to indicate idiomatic flow of code (#16124)	2022-11-26 00:26:15 -08:00
Anis Elleuch	533c9d4fe3	fix: lockName to disallow parallel same erasure set healing (#15951)	2022-10-26 12:43:54 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464)	2022-08-04 16:10:08 -07:00
Poorna	426c902b87	site replication: fix healing of bucket deletes. (#15377) This PR changes the handling of bucket deletes for site replicated setups to hold on to deleted bucket state until it syncs to all the clusters participating in site replication.	2022-07-25 17:51:32 -07:00
Harshavardhana	e7ac1ea54c	allow decommission to continue when healing (#15312) Bonus: - heal buckets in case the new pools have buckets missing during startup.	2022-07-15 21:03:23 -07:00
Praveen raj Mani	b49fc33cb3	purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148)	2022-07-11 09:15:54 -07:00
Anis Elleuch	b3eda248a3	Parallelize new disks healing of different erasure sets (#15112) - Always reformat all disks when a new disk is detected; this ensures that new uploads are written to the fresh disks - Always heal all buckets first when an erasure set starts to be healed - Use a lock to prevent two disks belonging to different nodes but in the same erasure set from being healed in parallel - Heal different sets in parallel Bonus: - Avoid logging errUnformattedDisk when a new fresh disk is inserted but not yet detected by the healing mechanism (10 seconds lag)	2022-06-21 07:53:55 -07:00
Harshavardhana	6cfb1cb6fd	fix: timer usage across codebase (#14935) It seems that in some places we have been wrongly using the timer.Reset() function, nicely exposed by an example shared by @donatello: https://go.dev/play/p/qoF71_D1oXD. This PR fixes all the usage comprehensively.	2022-05-17 22:42:59 -07:00
Harshavardhana	41079f1015	heal: remove blocking healDiskMeta upon startup (#14514) This type of code is not necessary; reads of all metadata content at `.minio.sys/config` automatically trigger healing when necessary in the GetObjectNInfo() call-path. Having this code is not useful, and it also adds to the overall startup time of MinIO when there are lots of users and policies.	2022-03-10 02:45:14 -08:00
Harshavardhana	0e3bafcc54	improve logs, fix banner formatting (#14456)	2022-03-03 13:21:16 -08:00

106 Commits