minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	eac4e4b279	honor replaced disk properly by updating globalLocalDrives (#19038 ) globalLocalDrives was not being updated during HealFormat(), so the server had to be restarted for healing to continue.	2024-02-12 13:00:20 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Put, List, and Multipart operations each rely heavily on calling GetBucketInfo() to verify whether the bucket exists. This has a large performance cost when there are tons of servers involved. We did optimize this part by vectorizing the bucket calls, but that is not enough; beyond 100 nodes the cost becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Harshavardhana	486e2e48ea	enable xattr capture by default (#18911 ) - healing must not set the write xattr because that is the job of active healing to update. what we need to preserve is permanent deletes. - remove older env for drive monitoring and enable it accordingly, as a global value.	2024-01-29 23:03:58 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace when close() is attempted on a 'nil' channel, and otherwise closes the channel safely.	2024-01-28 10:04:17 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll, which is primarily used for reading 'format.json', to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	e377bb949a	migrate bootstrap logic directly to websockets (#18855 ) improve performance for startup sequences by 2x for 300+ nodes.	2024-01-24 13:36:44 -08:00
Harshavardhana	52229a21cb	avoid reload of 'format.json' over the network under normal conditions (#18842 )	2024-01-23 14:11:46 -08:00
Harshavardhana	a0e1163fb6	reject reference format from a different deployment (#18800 ) reference format is constant for any lifetime of a minio cluster, we do not have to ever replace it during HealFormat() as it will never change. additionally we should simply reject reference formats that we do not understand early on.	2024-01-16 15:13:14 -08:00
Harshavardhana	e5c8794b8b	avoid disk monitoring leaks under various conditions (#18777 ) - HealFormat() was leaking healthcheck goroutines for disks; we are only interested in enabling healthcheck for the newly formatted disk, not for existing disks. - When a disk is a root-disk, a random disk monitor was leaking while we ignored the drive. - When loading the disk for each erasure set, we were leaking goroutines for the prepare-storage.go disks which were replaced via the globalLocalDrives slice. - Avoid disk monitoring utilizing health tokens, which would prematurely exhaust the tokens that were meant for incoming I/O. This is ensured by not writing an O_DIRECT aligned buffer and instead writing only 2048 worth of content as O_DSYNC, which is sufficient.	2024-01-12 01:48:36 -08:00
Shubhendu	e31081d79d	Heal buckets at node level (#18612 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-01-09 20:34:04 -08:00
Harshavardhana	a50ea92c64	feat: introduce list_quorum="auto" to prefer quorum drives (#18084 ) NOTE: This feature is not retroactive; it will not cater to previous transactions on existing setups. To enable this feature, please set the `_MINIO_DRIVE_QUORUM=on` environment variable as part of the systemd service or k8s configmap. Once this has been enabled, you also need to set `list_quorum`. ``` ~ mc admin config set alias/ api list_quorum=auto ``` A new debugging tool is available to check for any missing counters.	2023-12-29 15:52:41 -08:00
Harshavardhana	5b2ced0119	re-use globalLocalDrives properly (#18721 )	2023-12-29 09:30:10 -08:00
Anis Eleuch	8432fd5ac2	prom: Add online and healing drives metrics per erasure set (#18700 )	2023-12-21 16:56:43 -08:00
Harshavardhana	7c948adf88	allow pre-allocating buffers to reduce frequent GCs during growth (#18686 ) This PR also increases per node bpool memory from 1024 entries to 2048 entries; along with that, it also moves the byte pool centrally instead of being per pool.	2023-12-21 08:59:38 -08:00
Harshavardhana	b3314e97a6	re-use the same local drive used by remote-peer (#18645 ) historically, we have always kept storage-rest-server and a local storage API separate without much trouble, since they could both operate independently with no special state() between them. however, over time, we have added state() such as - drive monitoring threads: there are now "2" of them per drive instead of just 1. - concurrent tokens available per drive are now doubled instead of being a single shared set, allowing an unexpectedly high amount of I/O to go through. - applying serialization by using walkMutexes can now be adequately honored for both remote callers and local callers.	2023-12-13 19:27:55 -08:00
Harshavardhana	e30c0e7ca3	Revert "Heal buckets at node level (#18504 )" This reverts commit `708296ae1b`.	2023-12-05 22:34:46 -08:00
Shubhendu	708296ae1b	Heal buckets at node level (#18504 )	2023-12-05 02:17:35 -08:00
Klaus Post	2229509362	fix: leaking offline disks in MarkOffline() thread (#18414 ) `monitorAndConnectEndpoints` will continue to attempt to reconnect offline disks. Since disks were never closed, a `MarkOffline` would continue to try to check these disks forever. Close previous disks.	2023-11-09 09:33:32 -08:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Harshavardhana	eb55034dfe	optimize deletePrefix, use direct set location via object name (#17827 ) instead of fanning out the calls for an object force delete, we can assume the set location and not do fan-out calls. Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>	2023-08-09 16:30:22 -07:00
Harshavardhana	b0f0e53bba	fix: make sure to correctly initialize health checks (#17765 ) health checks were missing for drives replaced since - HealFormat() would replace the drives without a health check - disconnected drives when they reconnect via connectEndpoint() the loop also loses health checks for local disks and merges these into a single code. - other than this separate cleanUp, health check variables to avoid overloading them with similar requirements. - also ensure that we compete via context selector for disk monitoring such that the canceled disks don't linger around longer waiting for the ticker to trigger. - allow disabling active monitoring.	2023-08-01 10:54:26 -07:00
Harshavardhana	81be718674	fix: optimize DiskInfo() call avoid metrics when not needed (#17763 )	2023-07-31 15:20:48 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Klaus Post	c839b64f6a	fix: compressed+encrypted block overhead (#17289 )	2023-05-26 10:57:07 -07:00
jiuker	f037c9b286	Protecting the read index is not out of bounds (#17226 )	2023-05-17 12:09:41 -07:00
Praveen raj Mani	72802a5972	Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069 )	2023-04-25 22:57:40 -07:00
Harshavardhana	84f31ed45d	simplify MRF, converge it to regular healing (#17026 )	2023-04-19 07:47:42 -07:00
Harshavardhana	6825bd7e75	fix: inlined objects don't need to honor long locks (#17039 )	2023-04-17 12:16:37 -07:00
Poorna	d1e775313d	support decommissioning of tiered objects (#16751 )	2023-03-16 07:48:05 -07:00
Harshavardhana	b4ef5ff294	remove unnecessary code checking for supported features (#16423 )	2023-01-17 19:37:47 +05:30
Anis Elleuch	2146ed4033	xl: Quit early when EC config is incorrect (#16390 ) Co-authored-by: Anis Elleuch <anis@min.io>	2023-01-09 23:07:45 -08:00
Anis Elleuch	0333412148	fix: heal only once per disk per set among multiple disks (#16358 )	2023-01-05 20:41:19 -08:00
Harshavardhana	a15a2556c3	converge listBuckets() as a peer call (#16346 )	2023-01-03 23:39:40 -08:00
Harshavardhana	f1bbb7fef5	vectorize cluster-wide calls such as bucket operations (#16313 )	2023-01-03 08:16:39 -08:00
Harshavardhana	f93183f66e	fix: a deadlock by refactoring listBuckets() under site replication (#16323 )	2022-12-29 00:08:31 -08:00
Harshavardhana	b882310e2b	avoid locks for internal and invalid buckets in MakeBucket() (#16302 )	2022-12-23 07:46:00 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Harshavardhana	5a8df7efb3	re-implement StorageInfo to be a peer call (#16155 )	2022-12-01 14:31:35 -08:00
Klaus Post	cc1d8f0057	Check for abandoned data when healing (#16122 )	2022-11-28 10:20:55 -08:00
Klaus Post	98ba622679	Reduce temporary file clean-up waits (#16110 )	2022-11-22 07:23:36 -08:00
Klaus Post	ecc932d5dd	Clean entire tmp-old on restart (#15979 )	2022-10-31 07:27:50 -07:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Klaus Post	3c605c93fe	warn when 0 parity has been set as default parity (#15790 )	2022-10-04 22:41:42 -07:00
Klaus Post	a9f1ad7924	Add extended checksum support (#15433 )	2022-08-29 16:57:16 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00
Poorna	426c902b87	site replication: fix healing of bucket deletes. (#15377 ) This PR changes the handling of bucket deletes for site replicated setups to hold on to deleted bucket state until it syncs to all the clusters participating in site replication.	2022-07-25 17:51:32 -07:00
Anis Elleuch	ed0cbfb31e	fix: rootdisk detection by not using cached value when GetDiskInfo() errors out (#15249 ) GetDiskInfo() uses timedValue to cache the disk info for one second. timedValue behavior was recently changed to return an old cached value when calculating a new value returns an error. When a mount point is empty, GetDiskInfo() will return errUnformattedDisk, timedValue will return cached disk info with unexpected IsRootDisk value, e.g. false if the mount point belongs to a root disk. Therefore, the mount point will be considered a valid disk and will be formatted as well. This commit will also add more defensive code when marking root disks: always mark a disk offline for any GetDiskInfo() error except errUnformattedDisk. The server will try anyway to reconnect to those disks every 10 seconds.	2022-07-07 17:05:23 -07:00
Harshavardhana	32b2f6117e	fix: do not pass around sync.Map (#15250 ) it is not safe to pass around sync.Map through pointers, as it may be concurrently updated by different callers. this PR simplifies by avoiding sync.Map altogether, we do not need sync.Map to keep object->erasureMap association. This PR fixes a crash when concurrently using this value when audit logs are configured. ``` fatal error: concurrent map iteration and map write goroutine 247651580 [running]: runtime.throw({0x277a6c1?, 0xc002381400?}) runtime/panic.go:992 +0x71 fp=0xc004d29b20 sp=0xc004d29af0 pc=0x438671 runtime.mapiternext(0xc0d6e87f18?) runtime/map.go:871 +0x4eb fp=0xc004d29b90 sp=0xc004d29b20 pc=0x41002b ```	2022-07-07 17:04:25 -07:00
Klaus Post	ac055b09e9	Add detailed scanner metrics (#15161 )	2022-07-05 14:45:49 -07:00
Harshavardhana	bd099f5e71	fix: change timedValue to return the previously cached value (#15169 ) the caller can interpret the underlying error and decide accordingly; in places where we do not interpret the errors upon timedValue.Get(), we should simply use the previously cached value instead of returning "empty". Bonus: remove some unused code	2022-06-25 08:50:16 -07:00


185 Commits