minio

Commit Graph

Author	SHA1	Message	Date
Poorna	e029f8a9d7	set kms keyid in replication opts (#20542 )	2024-10-09 23:49:55 -07:00
Klaus Post	f1302c40fe	Fix uninitialized replication stats (#20260 ) Services are unfrozen before `initBackgroundReplication` is finished. This means that the globalReplicationStats write is racy. Switch to an atomic pointer. Provide the `ReplicationPool` with the stats, so it doesn't have to be grabbed from the atomic pointer on every use. All other loads and checks are nil, and calls return empty values when stats still haven't been initialized.	2024-08-15 05:04:40 -07:00
Harshavardhana	89c58ce87d	enhance getActualSize() to return valid values for most situations (#20228 )	2024-08-08 08:29:58 -07:00
Poorna	2d40433bc1	remove replication throttle deadline for objects > 128MiB (#20184 ) context deadline was introduced to avoid a slow transfer from blocking replication queue(s) shared by other buckets that may not be under throttling. This PR removes this context deadline for larger objects since they are anyway restricted to a limited set of workers. Otherwise, objects would get dequeued when the throttle limit is exceeded and cannot proceed within the deadline.	2024-07-29 15:14:52 -07:00
Poorna	6651c655cb	fix replication of checksum when encryption is enabled (#20161 ) - Adding functional tests - Return checksum header on GET/HEAD, previously this was returning InvalidPartNumber error	2024-07-29 01:02:16 -07:00
Poorna	641a56da0d	fix panic in replication queuing (#20169 ) Regression from #20077 ``` Jul 26 19:08:29 minio-dr-0101a minio[275423]: Error: grid handler (NSScanner) panic: runtime error: index out of range [4] with length 1 (errors.errorString) Jul 26 19:08:29 minio-dr-0101a minio[275423]: 33: internal/logger/logger.go:268:logger.LogIf() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 32: internal/grid/connection.go:50:grid.gridLogIf() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 31: internal/grid/muxserver.go:234:grid.(muxServer).handleRequests.func1() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 30: cmd/bucket-replication.go:2165:cmd.(ReplicationPool).queueReplicaTask() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 29: cmd/bucket-replication.go:3440:cmd.queueReplicationHeal() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 28: cmd/data-scanner.go:1396:cmd.(scannerItem).healReplication() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 27: cmd/data-scanner.go:1220:cmd.(scannerItem).applyActions() Jul 26 19:08:29 minio-dr-0101a minio[275423]: 26: cmd/xl-storage.go:627:cmd.(xlStorage).NSScanner.func2() ```	2024-07-26 13:48:21 -07:00
jiuker	132e7413ba	fix: check once ready for site-replication (#20149 )	2024-07-26 10:27:42 -07:00
Anis Eleuch	b7f319b62a	properly reload a fresh drive when found in a failed state during startup (#20145 ) When a drive is in a failed state when a single node multiple drives deployment is started, a replacement of a fresh disk will not be properly healed unless the user restarts the node. Fix this by always adding the new fresh disk to globalLocalDrivesMap. Also remove globalLocalDrives for simplification, a map to store local node drives can still be used since the order of local drives of a node is not defined.	2024-07-24 16:30:33 -07:00
Poorna	989c318a28	replication: make large workers configurable (#20077 ) This PR also improves throttling by reducing tokens requested from rate limiter based on available tokens to avoid exceeding throttle wait deadlines	2024-07-12 07:57:31 -07:00
Anis Eleuch	4d7d008741	bootstrap: Speed up bucket metadata loading (#19969 ) Currently, bucket metadata is being loaded serially inside ListBuckets Objet API. Fix that by loading the bucket metadata as the number of erasure sets * 10, which is a good approximation.	2024-06-21 15:22:24 -07:00
Klaus Post	ad04afe381	Fix SSEC multipart checksum replication (#19915 ) * Multipart SSEC checksums were not transferred. * Remove key mismatch logging. This key is user-controlled with SSEC. * If the source is SSEC and the destination reports ErrSSEEncryptedObject, assume replication is good.	2024-06-12 23:56:12 -07:00
Klaus Post	d2eed44c78	Fix replication checksum transfer (#19906 ) Compression will be disabled by default if SSE-C is specified. So we can still honor SSE-C.	2024-06-10 10:40:33 -07:00
Klaus Post	a2cab02554	Fix SSE-C checksums (#19896 ) Compression will be disabled by default if SSE-C is specified. So we can still honor SSE-C.	2024-06-10 08:31:51 -07:00
Shubhendu	2f6e03fb60	Calculate correct object size while replication (#19888 ) It was missing in case of `replicateObject` but was present for `replicateAll` already Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-06-06 12:31:01 -07:00
Poorna	5aaef9790f	replication: pass checksum headers to replica (#19834 )	2024-06-06 02:36:42 -07:00
Klaus Post	0a63dc199c	Add trace sizes to more trace types (#19864 ) Add trace sizes to * ILM traces * Replication traces * Healing traces * Decommission traces * Rebalance traces * (s)ftp traces * http traces.	2024-06-03 08:45:54 -07:00
Harshavardhana	e8d14c0d90	verify preconditions during CompleteMultipart (#19713 ) Bonus: hold the write lock properly to apply optimistic concurrency during NewMultipartUpload()	2024-05-10 17:31:22 -07:00
Klaus Post	847ee5ac45	Make WalkDir return errors (#19677 ) If used, 'opts.Marker` will cause many missed entries since results are returned unsorted, and pools are serialized. Switch to fully concurrent listing and merging across pools to return sorted entries.	2024-05-06 13:27:52 -07:00
Harshavardhana	8161411c5d	fix: a crash in RemoveReplication target (#19640 ) calling a remote target remove with a perfectly well constructed ARN can lead to a crash for a bucket with no replication configured. This PR fixes, and adds a crash check for ImportMetadata as well.	2024-04-30 18:09:56 -07:00
Harshavardhana	95c65f4e8f	do not panic on rebalance during server restarts (#19563 ) This PR makes a feasible approach to handle all the scenarios that we must face to avoid returning "panic." Instead, we must return "errServerNotInitialized" when a bucketMetadataSys.Get() is called, allowing the caller to retry their operation and wait. Bonus fix the way data-usage-cache stores the object. Instead of storing usage-cache.bin with the bucket as `.minio.sys/buckets`, the `buckets` must be relative to the bucket `.minio.sys` as part of the object name. Otherwise, there is no way to decommission entries at `.minio.sys/buckets` and their final erasure set positions. A bucket must never have a `/` in it. Adds code to read() from existing data-usage.bin upon upgrade.	2024-04-22 10:49:30 -07:00
Harshavardhana	1aa8896ad6	Revert "cleanup: Simplify usage of MinIOSourceProxyRequest (#19553 )" This reverts commit `928c0181bf`. This change was not correct, reverting. We track 3 states with the ProxyRequest header - if replication process wants to know if object is already replicated with a HEAD, it shouldn't proxy back - Poorna	2024-04-20 02:05:54 -07:00
Robert Lützner	928c0181bf	cleanup: Simplify usage of MinIOSourceProxyRequest (#19553 ) This replaces a convoluted condition that ultimately evaluated to "is this HTTP header present in the request or not?"	2024-04-19 05:23:31 -07:00
Harshavardhana	7e3166475d	simplify common functions in replication (#19480 )	2024-04-11 17:27:32 -07:00
Harshavardhana	96d226c0b1	remove frivolous log about abort-multipart failure in replication (#19413 )	2024-04-05 04:39:55 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaing this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Shubhendu	468a9fae83	Enable replication of SSE-C objects (#19107 ) If site replication enabled across sites, replicate the SSE-C objects as well. These objects could be read from target sites using the same client encryption keys. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-28 10:44:56 -07:00
Poorna	d990661d1f	replication: enforce precondition for multipart (#19306 )	2024-03-20 18:12:37 -07:00
Harshavardhana	b6e98aed01	fix: found races in accessing globalLocalDrives (#19069 ) make a copy before accessing globalLocalDrives Bonus: update console v0.46.0 Signed-off-by: Harshavardhana <harsha@minio.io>	2024-02-16 17:15:57 -08:00
Poorna	a9cf32811c	Fix panic in tagging request proxying (#19032 )	2024-02-11 18:18:43 -08:00
Poorna	27d02ea6f7	metrics: add replication metrics on proxied requests (#18957 )	2024-02-05 22:00:45 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Each Put, List, Multipart operations heavily rely on making GetBucketInfo() call to verify if bucket exists or not on a regular basis. This has a large performance cost when there are tons of servers involved. We did optimize this part by vectorizing the bucket calls, however its not enough, beyond 100 nodes and this becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Poorna	7ffc162ea8	exclude veeam virtual objects from replication (#18918 ) Fixes: #18916	2024-01-30 10:43:58 -08:00
Poorna	bcfd7fbbcf	reuse transports for callhome and remote tgt validation (#18912 )	2024-01-29 23:05:39 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), if not safely closes it.	2024-01-28 10:04:17 -08:00
Poorna	b6e9d235fe	fix replication error logs to include target endpoint (#18863 )	2024-01-24 13:05:43 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Poorna	b2b26d9c95	support proxying of tagging requests in replication (#18649 ) support proxying of tagging requests in active-active replication Note: even if proxying is successful, PutObjectTagging/DeleteObjectTagging will continue to report a 404 since the object is not present locally.	2024-01-12 23:51:33 -08:00
Anis Eleuch	8a0ba093dd	audit: Fix merrs and derrs object dangling message (#18714 ) merrs and derrs are empty when a dangling object is deleted. Fix the bug and adds invalid-meta data for data blocks	2023-12-27 22:27:04 -08:00
Harshavardhana	b3314e97a6	re-use the same local drive used by remote-peer (#18645 ) historically, we have always kept storage-rest-server and a local storage API separate without much trouble, since they both can independently operate due to no special state() between them. however, over some time, we have added state() such as - drive monitoring threads now there will be "2" of them per drive instead of just 1. - concurrent tokens available per drive are now twice instead of just single shared, allowing unexpectedly high amount of I/O to go through. - applying serialization by using walkMutexes can now be adequately honored for both remote callers and local callers.	2023-12-13 19:27:55 -08:00
Poorna	3781a0f9ad	replication: Pass metadata timestamps in CopyObject call (#18647 ) Regression from #18285. CopyObject options were inheriting source MTime for metadata timestamps if unspecified, removing this prevented metadata updates from being applied on target.	2023-12-13 15:28:55 -08:00
Poorna	6b06da76cb	add configuration to limit replication workers (#18601 )	2023-12-07 16:22:00 -08:00
Harshavardhana	53ce92b9ca	fix: use the right channel to feed the data in (#18605 ) this PR fixes a regression in batch replication where we weren't sending any data from the Walk() results due to incorrect channels being used.	2023-12-06 18:17:03 -08:00
Krishnan Parthasarathi	c397fb6c7a	Minor fixes to bucket replication (#18578 )	2023-12-01 16:13:08 -08:00
Harshavardhana	bd0819330d	avoid Walk() API listing objects without quorum (#18535 ) This allows batch replication to basically do not attempt to copy objects that do not have read quorum. This PR also allows walk() to provide custom values for quorum under batch replication, and key rotation.	2023-11-27 17:20:04 -08:00
Harshavardhana	a4cfb5e1ed	return errors if dataDir is missing during HeadObject() (#18477 ) Bonus: allow replication to attempt Deletes/Puts when the remote returns quorum errors of some kind, this is to ensure that MinIO can rewrite the namespace with the latest version that exists on the source.	2023-11-20 21:33:47 -08:00
Anis Eleuch	02331a612c	batch-repl: Replicate missing metadata and standard headers (#18484 ) - Replicate Expires when the source is local or remote - Replicate metadata when the source is remote	2023-11-18 19:12:44 -08:00
Krishnan Parthasarathi	9569a85cee	Avoid allocs for MRF on-disk header (#18425 )	2023-11-10 19:54:46 -08:00
Poorna	03dc65e12d	Reload replication targets lazily if missing (#18333 ) There can be rare situations where errors seen in bucket metadata load on startup or subsequent metadata updates can result in missing replication remotes. Attempt a refresh of remote targets backed by a good replication config lazily in 5 minute intervals if there ever occurs a situation where remote targets go AWOL.	2023-10-27 21:08:53 -07:00
Harshavardhana	e1e33077e8	fix: tests and resync replication status (#18244 )	2023-10-13 17:03:34 -07:00
Harshavardhana	74e0c9ab9b	reduce unnecessary logging, simplify certain error handling (#18196 ) remove a bunch of unnecessary logs	2023-10-10 00:33:42 -07:00

1 2 3 4 5 ...

271 Commits