minio

Commit Graph

Author	SHA1	Message	Date
Anis Eleuch	a8f143298f	heal: Reset healing params when a retry is decided (#20285 ) Currently, retrying the healing of a new drive does not reset HealedBuckets, which means that the next healing retry will skip those buckets. This commit fixes that behavior. Also, the skipped objects counter will now include objects that are uploaded after the healing is started.	2024-08-22 05:35:43 -07:00
Anis Eleuch	85c3db3a93	heal: Add finished flag to .healing.bin to avoid removing the latter (#20250 ) Sometimes, we need historical information in .healing.bin, such as the number of expired objects that healing skips, which can create a drive usage disparity within the same erasure set. For that reason, this commit no longer removes .healing.bin and instead adds a new field called Finished so we know healing has finished on that drive.	2024-08-20 08:42:49 -07:00
Klaus Post	3ffeabdfcb	Fix govet+staticcheck issues (#20263 ) This is better: https://github.com/golang/go/issues/60529	2024-08-14 10:11:51 -07:00
Harshavardhana	a9dc061d84	count metrics properly for any failures during drive heal (#20193 ) or via `mc admin heal --set 1 --pool 1`	2024-07-30 22:46:26 -07:00
Anis Eleuch	ce183cb2b4	heal: List and heal again for any listing error (#19999 ) When a fresh drive healing is finished, add more checks for drive listing errors. If there are any, re-list and heal again. Although it is infrequent for listPathRaw() to return nil when minDisks is set to 1, we still need to handle all possible cases to avoid missing the healing of any object. Also, check the HealObject result to decide if an object is healed in the fresh disk, since HealObject returns nil if an object is healed in any disk, not necessarily in the new fresh drive.	2024-07-10 09:55:36 -07:00
Harshavardhana	7bd1d899bc	remove overzealous check during HEAD() (#19940 ) due to a historic bug in CopyObject() where an inlined object loses its metadata, the check causes an incorrect fallback verifying data-dir. CopyObject() bug was fixed in `ffa91f9794` however the occurrence of this problem is historic, so the aforementioned check is stretching too much. Bonus: simplify fileInfoRaw() to read xl.json as well, also recreate buckets properly.	2024-06-17 07:29:18 -07:00
Anis Eleuch	1277ad69a6	heal: Remove .healing.bin when all ES drives are healing (#19846 ) In the very rare case when all drives in an erasure set need to be healed, remove .healing.bin from all drives; otherwise healing will be stuck in a loop. Also, fix a unit test that sometimes fails because the test itself is wrong.	2024-05-31 07:48:50 -07:00
Aditya Manthramurthy	5f78691fcf	ldap: Add user DN attributes list config param (#19758 ) This change uses the updated ldap library in minio/pkg (bumped up to v3). A new config parameter is added for LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user. A test is added in sts_handlers.go that shows how to access the LDAP attributes as a claim. This is in preparation for adding SSH pubkey authentication to MinIO's SFTP integration.	2024-05-24 16:05:23 -07:00
Harshavardhana	b534dc69ab	deprecate unexpected healing failed counters (#19705 ) simplify this to avoid verbose metrics, and make room for valid metrics to be reported for alerting etc.	2024-05-09 11:04:41 -07:00
Harshavardhana	3549e583a6	results must be a single channel to avoid overwriting `healing.bin` (#19702 )	2024-05-09 10:15:03 -07:00
Anis Eleuch	135874ebdc	heal: Avoid marking a bucket as done when remote drives are offline (#19587 )	2024-04-25 23:32:14 -07:00
Anis Eleuch	9a3c992d7a	heal: Fix regression in healing a new fresh drive (#19615 )	2024-04-25 14:55:41 -07:00
Harshavardhana	9693c382a8	make renameData() more defensive during overwrites (#19548 ) instead upon any error in renameData(), we still preserve the existing dataDir in some form for recoverability in strange situations such as out of disk space type errors. Bonus: avoid running list and heal() instead allow versions disparity to return the actual versions, uuid to heal. Currently limit this to 100 versions and lesser disparate objects. an undo now reverts back the xl.meta from xl.meta.bkp during overwrites on such flaky setups. Bonus: Save N depth syscalls via skipping the parents upon overwrites and versioned updates. Flaky setup examples are stretch clusters with regular packet drops etc, we need to add some defensive code around to avoid dangling objects.	2024-04-23 10:15:52 -07:00
Harshavardhana	95c65f4e8f	do not panic on rebalance during server restarts (#19563 ) This PR makes a feasible approach to handle all the scenarios that we must face to avoid returning "panic." Instead, we must return "errServerNotInitialized" when a bucketMetadataSys.Get() is called, allowing the caller to retry their operation and wait. Bonus fix the way data-usage-cache stores the object. Instead of storing usage-cache.bin with the bucket as `.minio.sys/buckets`, the `buckets` must be relative to the bucket `.minio.sys` as part of the object name. Otherwise, there is no way to decommission entries at `.minio.sys/buckets` and their final erasure set positions. A bucket must never have a `/` in it. Adds code to read() from existing data-usage.bin upon upgrade.	2024-04-22 10:49:30 -07:00
Harshavardhana	04101d472f	fix: add fallbackDisks for disk healing (#19425 )	2024-04-08 02:22:13 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaining this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Anis Eleuch	2bdb9511bd	heal: Add skipped objects to the heal summary (#19142 ) New disk healing code skips/expires objects that ILM is supposed to expire. Give the user more visibility into this activity by counting those objects and printing the count at the end of the healing activity.	2024-02-28 09:05:40 -08:00
Harshavardhana	9a012a53ef	initialize the disk healer early on (#19143 ) This PR fixes a bug that was perhaps introduced long ago, with no visible workarounds. In any deployment, if an entire erasure set is deleted, there is no way the cluster recovers.	2024-02-27 23:02:14 -08:00
Anis Eleuch	68dde2359f	log: Add logger.Event to send to console and other logger targets (#19060 ) Add a new function logger.Event() to send the log to Console and http/kafka log webhooks. This will include some internal events such as disk healing and rebalance/decommissioning	2024-02-15 15:13:30 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), and otherwise safely closes the channel.	2024-01-28 10:04:17 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll, which is primarily used for reading 'format.json', to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Shubhendu	e31081d79d	Heal buckets at node level (#18612 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-01-09 20:34:04 -08:00
Anis Eleuch	8432fd5ac2	prom: Add online and healing drives metrics per erasure set (#18700 )	2023-12-21 16:56:43 -08:00
Harshavardhana	e30c0e7ca3	Revert "Heal buckets at node level (#18504 )" This reverts commit `708296ae1b`.	2023-12-05 22:34:46 -08:00
Shubhendu	708296ae1b	Heal buckets at node level (#18504 )	2023-12-05 02:17:35 -08:00
Harshavardhana	109a9e3f35	skip ILM expired objects from healing (#18569 )	2023-12-01 07:56:24 -08:00
Klaus Post	5f971fea6e	Fix Mux Connect Error (#18567 ) `OpMuxConnectError` was not handled correctly. Remove local checks for single request handlers so they can run before being registered locally. Bonus: Only log IAM bootstrap on startup.	2023-12-01 00:18:04 -08:00
Harshavardhana	21ecb941fe	fix: avoid counting out of band deletes during disk heal (#18205 )	2023-10-10 14:39:48 -07:00
Anis Eleuch	41de53996b	heal: calculate the number of workers based on NRRequests (#17945 )	2023-09-11 14:48:54 -07:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Harshavardhana	c45bc32d98	skip disks under scanning when healing disks (#17822 ) Bonus: - avoid calling DiskInfo() calls when missing blocks instead heal the object using MRF operation. - change the max_sleep to 250ms beyond that we will not stop healing.	2023-08-09 12:51:47 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Anis Eleuch	9ef7eda33a	heal: Avoid objects created after the heal disk start time (#17323 )	2023-05-31 13:10:45 -07:00
Harshavardhana	84f31ed45d	simplify MRF, converge it to regular healing (#17026 )	2023-04-19 07:47:42 -07:00
Anis Eleuch	224d9a752f	fix: the race in healing tracker code (#17048 )	2023-04-18 14:49:56 -07:00
Poorna	a9269cee29	heal: avoid logging version not found (#17031 )	2023-04-13 19:45:52 -07:00
Harshavardhana	bfedea9bad	fix: disk healing should honor the right pool/set index (#16712 )	2023-02-27 04:55:32 -08:00
Klaus Post	84bb7d05a9	fix: healing deadlocks and ordering (#16643 )	2023-02-17 23:22:43 +05:30
Anis Elleuch	857674c3a0	heal: Do not mark buckets as done when there is no online disks (#16621 )	2023-02-14 12:50:13 -08:00
Anis Elleuch	b1d98febfd	New disk healing goes through the healing workers (#16568 )	2023-02-08 09:25:29 -08:00
Harshavardhana	d08e3cc895	add a way to avoid blocking queueHealTask() depending on caller (#16433 )	2023-01-19 18:50:54 +05:30
Anis Elleuch	d98116559b	Use async healing in PutObject call (#16431 )	2023-01-19 00:54:22 -08:00
Anis Elleuch	3039fd4519	Optimize background heal status to use LocalStorageInfo (#16414 )	2023-01-17 05:02:00 +05:30
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Harshavardhana	5a8df7efb3	re-implement StorageInfo to be a peer call (#16155 )	2022-12-01 14:31:35 -08:00
Klaus Post	cc1d8f0057	Check for abandoned data when healing (#16122 )	2022-11-28 10:20:55 -08:00
Krishnan Parthasarathi	96bfa77856	serialize updates to healing tracker (#15647 ) When healing is parallelized by setting the `_MINIO_HEAL_WORKERS` environment variable, multiple goroutines may race while updating the disk's healing tracker. This change serializes only these concurrent updates using a channel. Note, the healing tracker is still not concurrency safe in other contexts.	2022-09-07 08:47:21 -07:00
Krishnan Parthasarathi	99fbfe2421	Add concurrency to healing objects on a fresh disk (#15575 )	2022-08-25 13:07:15 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00


118 Commits
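
Commit 96bfa77856 above describes serializing concurrent healing tracker updates through a channel. The Go sketch below illustrates that general pattern only; it is not MinIO's code. The names healingTracker, healResult and runWorkers are hypothetical and exist only for this example. Workers never touch the tracker directly: they send results on one channel, and a single goroutine applies them.

```go
package main

import (
	"fmt"
	"sync"
)

// healingTracker is a hypothetical, simplified stand-in for a per-drive
// healing tracker. It is not safe for concurrent use on its own.
type healingTracker struct {
	ItemsHealed uint64
	BytesDone   uint64
}

// healResult is a hypothetical message a worker sends after healing one object.
type healResult struct {
	bytes uint64
}

// runWorkers starts the given number of worker goroutines, each reporting
// perWorker healed objects, and returns the tracker once all results
// have been applied.
func runWorkers(workers, perWorker int) healingTracker {
	var tracker healingTracker
	results := make(chan healResult)
	done := make(chan struct{})

	// A single goroutine owns the tracker, so updates are serialized
	// without a mutex.
	go func() {
		defer close(done)
		for r := range results {
			tracker.ItemsHealed++
			tracker.BytesDone += r.bytes
		}
	}()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				results <- healResult{bytes: 1024}
			}
		}()
	}

	wg.Wait()      // all workers have sent their results
	close(results) // lets the updater goroutine finish its range loop
	<-done         // tracker is only read after the updater has exited
	return tracker
}

func main() {
	t := runWorkers(4, 250)
	fmt.Println(t.ItemsHealed, t.BytesDone) // prints "1000 1024000"
}
```

As the commit message notes for the real tracker, this protects only the updates that flow through the channel; any other code path that touches the tracker needs its own synchronization.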