minio

mirror of https://github.com/minio/minio.git synced 2024-12-26 15:15:55 -05:00

Author	SHA1	Message	Date
Harshavardhana	8161411c5d	fix: a crash in RemoveReplication target (#19640 ) calling a remote target remove with a perfectly well constructed ARN can lead to a crash for a bucket with no replication configured. This PR fixes, and adds a crash check for ImportMetadata as well.	2024-04-30 18:09:56 -07:00
Harshavardhana	95c65f4e8f	do not panic on rebalance during server restarts (#19563 ) This PR makes a feasible approach to handle all the scenarios that we must face to avoid returning "panic." Instead, we must return "errServerNotInitialized" when a bucketMetadataSys.Get() is called, allowing the caller to retry their operation and wait. Bonus fix the way data-usage-cache stores the object. Instead of storing usage-cache.bin with the bucket as `.minio.sys/buckets`, the `buckets` must be relative to the bucket `.minio.sys` as part of the object name. Otherwise, there is no way to decommission entries at `.minio.sys/buckets` and their final erasure set positions. A bucket must never have a `/` in it. Adds code to read() from existing data-usage.bin upon upgrade.	2024-04-22 10:49:30 -07:00
Harshavardhana	1aa8896ad6	Revert "cleanup: Simplify usage of MinIOSourceProxyRequest (#19553 )" This reverts commit `928c0181bf`. This change was not correct, reverting. We track 3 states with the ProxyRequest header - if replication process wants to know if object is already replicated with a HEAD, it shouldn't proxy back - Poorna	2024-04-20 02:05:54 -07:00
Robert Lützner	928c0181bf	cleanup: Simplify usage of MinIOSourceProxyRequest (#19553 ) This replaces a convoluted condition that ultimately evaluated to "is this HTTP header present in the request or not?"	2024-04-19 05:23:31 -07:00
Harshavardhana	7e3166475d	simplify common functions in replication (#19480 )	2024-04-11 17:27:32 -07:00
Harshavardhana	96d226c0b1	remove frivolous log about abort-multipart failure in replication (#19413 )	2024-04-05 04:39:55 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaing this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Shubhendu	468a9fae83	Enable replication of SSE-C objects (#19107 ) If site replication enabled across sites, replicate the SSE-C objects as well. These objects could be read from target sites using the same client encryption keys. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-28 10:44:56 -07:00
Poorna	d990661d1f	replication: enforce precondition for multipart (#19306 )	2024-03-20 18:12:37 -07:00
Harshavardhana	b6e98aed01	fix: found races in accessing globalLocalDrives (#19069 ) make a copy before accessing globalLocalDrives Bonus: update console v0.46.0 Signed-off-by: Harshavardhana <harsha@minio.io>	2024-02-16 17:15:57 -08:00
Poorna	a9cf32811c	Fix panic in tagging request proxying (#19032 )	2024-02-11 18:18:43 -08:00
Poorna	27d02ea6f7	metrics: add replication metrics on proxied requests (#18957 )	2024-02-05 22:00:45 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Each Put, List, Multipart operations heavily rely on making GetBucketInfo() call to verify if bucket exists or not on a regular basis. This has a large performance cost when there are tons of servers involved. We did optimize this part by vectorizing the bucket calls, however its not enough, beyond 100 nodes and this becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Poorna	7ffc162ea8	exclude veeam virtual objects from replication (#18918 ) Fixes: #18916	2024-01-30 10:43:58 -08:00
Poorna	bcfd7fbbcf	reuse transports for callhome and remote tgt validation (#18912 )	2024-01-29 23:05:39 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), if not safely closes it.	2024-01-28 10:04:17 -08:00
Poorna	b6e9d235fe	fix replication error logs to include target endpoint (#18863 )	2024-01-24 13:05:43 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Poorna	b2b26d9c95	support proxying of tagging requests in replication (#18649 ) support proxying of tagging requests in active-active replication Note: even if proxying is successful, PutObjectTagging/DeleteObjectTagging will continue to report a 404 since the object is not present locally.	2024-01-12 23:51:33 -08:00
Anis Eleuch	8a0ba093dd	audit: Fix merrs and derrs object dangling message (#18714 ) merrs and derrs are empty when a dangling object is deleted. Fix the bug and adds invalid-meta data for data blocks	2023-12-27 22:27:04 -08:00
Harshavardhana	b3314e97a6	re-use the same local drive used by remote-peer (#18645 ) historically, we have always kept storage-rest-server and a local storage API separate without much trouble, since they both can independently operate due to no special state() between them. however, over some time, we have added state() such as - drive monitoring threads now there will be "2" of them per drive instead of just 1. - concurrent tokens available per drive are now twice instead of just single shared, allowing unexpectedly high amount of I/O to go through. - applying serialization by using walkMutexes can now be adequately honored for both remote callers and local callers.	2023-12-13 19:27:55 -08:00
Poorna	3781a0f9ad	replication: Pass metadata timestamps in CopyObject call (#18647 ) Regression from #18285. CopyObject options were inheriting source MTime for metadata timestamps if unspecified, removing this prevented metadata updates from being applied on target.	2023-12-13 15:28:55 -08:00
Poorna	6b06da76cb	add configuration to limit replication workers (#18601 )	2023-12-07 16:22:00 -08:00
Harshavardhana	53ce92b9ca	fix: use the right channel to feed the data in (#18605 ) this PR fixes a regression in batch replication where we weren't sending any data from the Walk() results due to incorrect channels being used.	2023-12-06 18:17:03 -08:00
Krishnan Parthasarathi	c397fb6c7a	Minor fixes to bucket replication (#18578 )	2023-12-01 16:13:08 -08:00
Harshavardhana	bd0819330d	avoid Walk() API listing objects without quorum (#18535 ) This allows batch replication to basically do not attempt to copy objects that do not have read quorum. This PR also allows walk() to provide custom values for quorum under batch replication, and key rotation.	2023-11-27 17:20:04 -08:00
Harshavardhana	a4cfb5e1ed	return errors if dataDir is missing during HeadObject() (#18477 ) Bonus: allow replication to attempt Deletes/Puts when the remote returns quorum errors of some kind, this is to ensure that MinIO can rewrite the namespace with the latest version that exists on the source.	2023-11-20 21:33:47 -08:00
Anis Eleuch	02331a612c	batch-repl: Replicate missing metadata and standard headers (#18484 ) - Replicate Expires when the source is local or remote - Replicate metadata when the source is remote	2023-11-18 19:12:44 -08:00
Krishnan Parthasarathi	9569a85cee	Avoid allocs for MRF on-disk header (#18425 )	2023-11-10 19:54:46 -08:00
Poorna	03dc65e12d	Reload replication targets lazily if missing (#18333 ) There can be rare situations where errors seen in bucket metadata load on startup or subsequent metadata updates can result in missing replication remotes. Attempt a refresh of remote targets backed by a good replication config lazily in 5 minute intervals if there ever occurs a situation where remote targets go AWOL.	2023-10-27 21:08:53 -07:00
Harshavardhana	e1e33077e8	fix: tests and resync replication status (#18244 )	2023-10-13 17:03:34 -07:00
Harshavardhana	74e0c9ab9b	reduce unnecessary logging, simplify certain error handling (#18196 ) remove a bunch of unnecessary logs	2023-10-10 00:33:42 -07:00
Poorna	72871dbb9a	delete replication: avoid overwriting replication decision (#18174 ) from ObjectInfo unless version purge status is present. Otherwise there is potential to make incorrect replication decision if Stat returned an error	2023-10-05 21:09:45 -06:00
Poorna	b73699fad8	replication: pass user tags while queueing (#18052 ) Continues from #18032 - otherwise replication will fail on tag based rules.	2023-09-19 03:18:28 -07:00
jiuker	9947c01c8e	feat: SSE-KMS use uuid instead of read all data to md5. (#17958 )	2023-09-18 10:00:54 -07:00
Harshavardhana	fa6d082bfd	reduce all major allocations in replication path (#18032 ) - remove targetClient for passing around via replicationObjectInfo{} - remove cloing to object info unnecessarily - remove objectInfo from replicationObjectInfo{} (only require necessary fields)	2023-09-16 02:28:06 -07:00
Harshavardhana	a2aabfabd9	add backups for usage-caches to rely on upon error (#18029 ) This allows scanner to avoid lengthy scans, skip things appropriately and also not lose metrics in any manner. reduce longer deadlines for usage-cache loads/saves to match the disk timeout which is 2minutes now per IOP.	2023-09-14 11:53:52 -07:00
Poorna	96fbf18201	replication: queue existing objects to same workers as incoming (#18020 ) Previously existing objects were queued to single worker and MRF re-queues are also handled by same worker - this does not fully use the available bandwidth in case there is no incoming workload.	2023-09-12 21:59:15 -07:00
Harshavardhana	1df5e31706	optimize MRF replication queue to avoid memory leaks (#18007 )	2023-09-11 20:59:11 -07:00
Poorna	703ed46d79	fix: replication of tags while removing (#17989 ) A tag removal was not being replicated prior to this change	2023-09-06 19:05:02 -07:00
Poorna	13a2dc8485	replication resync: avoid blocking on results channel. (#17981 ) continues fix in #17775	2023-09-05 20:22:39 -07:00
Harshavardhana	5b114b43f7	refactor bandwidth throttling for replication target (#17980 ) This refactor is to allow using the bandwidth throttling for other purposes.	2023-09-05 20:21:59 -07:00
Poorna	d665e855de	replication: remove check for empty version id (#17964 )	2023-09-01 13:46:10 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Poorna	4a6af93c83	mark replication target offline if network timeouts seen (#17907 ) regular target liveness check every 5 secs will toggle state back as target returns online.	2023-08-24 09:24:26 -07:00
Harshavardhana	1c5af7c31a	serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886 ) we expect a certain level of IOPs and latency so this is okay. fixes other miscellaneous bugs - such as hanging on mrfCh <- when the context is canceled - queuing MRF heal when the context is canceled - remove unused saveStateCh channel	2023-08-21 16:44:50 -07:00
Poorna	dfaf735073	replication: fix queuing of large uploads (#17831 ) Fixes regression from #17687	2023-08-10 15:48:42 -07:00
Harshavardhana	b732a673dc	reduce logging in bucket replication in retry scenarios (#17820 )	2023-08-08 13:27:40 -07:00
Poorna	26c23b30f4	replication: set context timeout for NewMultipartUpload calls (#17807 )	2023-08-05 12:27:07 -07:00
Poorna	311380f8cb	replication resync: fix queueing (#17775 ) Assign resync of all versions of object to the same worker to avoid locking contention. Fixes parallel resync implementation in #16707	2023-08-01 11:51:15 -07:00

1 2 3 4 5 ...

253 Commits