minio

Commit Graph

Author	SHA1	Message	Date
Poorna	6b9fd256e1	Persist in-memory replication stats to disk (#15594 ) to avoid relying on scanner-calculated replication metrics. This will improve the accuracy of the replication stats reported. This PR also adds on to #15556 by handing replication traffic that could not be queued by available workers to the MRF queue so that entries in `PENDING` status are healed faster.	2022-09-12 12:40:02 -07:00
Anis Elleuch	cf52691959	Save resync status in the backend using a last update timestamp (#15638 ) Currently, there is a short time window where the code is allowed to save the status of a replication resync. Currently, the window is `now.Sub(st.EndTime) <= resyncTimeInterval`. Also, any failure to write in the backend disks is not retried. Refactor the code a little bit to rely on the last timestamp of a successful write of the resync status of any given bucket in the backend disks.	2022-09-01 16:53:36 -07:00
Anis Elleuch	10e75116ef	Avoid replicating dirs in listing with replication enabled (#15641 ) When replication is enabled in a particular bucket, the listing will send objects to bucket replication, but it is also sending prefixes for non recursive listing which is useless and shows a lot of error logs. This commit will ignore prefixes.	2022-09-01 15:22:11 -07:00
Harshavardhana	433b6fa8fe	upgrade golang-lint to the latest (#15600 )	2022-08-26 12:52:29 -07:00
Harshavardhana	edba7c987b	fix: objects matching prefixes should not leave delete markers (#15586 ) This is needed to ensure that we do not leave prefixes where version is suspended, instead we never leave versions on these paths.	2022-08-24 13:46:29 -07:00
Poorna	4155c5b695	replication: improve MRF healing. (#15556 ) This PR improves the replication failure healing by persisting most recent failures to disk and re-queuing them until the replication is successful. While this does not eliminate the need for healing during a full scan, queuing MRF vastly improves the ETA to keeping replicated buckets in sync as it does not wait for the scanner visit to detect unreplicated object versions.	2022-08-22 16:53:06 -07:00
Harshavardhana	ae4ee95d25	change default lock retry interval to 50ms (#15560 ) competing calls on the same object on versioned bucket mutating calls on the same object may unexpected have higher delays. This can be reproduced with a replicated bucket overwriting the same object writes, deletes repeatedly. For longer locks like scanner keep the 1sec interval	2022-08-19 16:21:05 -07:00
Harshavardhana	e9055e9ef7	fix: walk() should cancel itself upon context cancellation (#15553 ) This PR fixes possible leaks that may emanate from not listening on context cancelation or timeouts. ``` goroutine 60957610 [chan send, 16 minutes]: github.com/minio/minio/cmd.(erasureServerPools).Walk.func1.1.1(...) github.com/minio/minio/cmd/erasure-server-pool.go:1724 +0x368 github.com/minio/minio/cmd.listPathRaw({0x4a9a740, 0xc0666dffc0},... github.com/minio/minio/cmd/metacache-set.go:1022 +0xfc4 github.com/minio/minio/cmd.(erasureServerPools).Walk.func1.1() github.com/minio/minio/cmd/erasure-server-pool.go:1764 +0x528 created by github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1 github.com/minio/minio/cmd/erasure-server-pool.go:1697 +0x1b7 ```	2022-08-18 17:49:08 -07:00
Poorna	21fe14201f	replication: centralize healthcheck for remote targets (#15516 ) This PR moves health check from minio-go client to being managed on the server. Additionally integrating health check into site replication	2022-08-16 17:46:22 -07:00
Poorna	21bf5b4db7	replication: heal proactively upon access (#15501 ) Queue failed/pending replication for healing during listing and GET/HEAD API calls. This includes healing of existing objects that were never replicated or those in the middle of a resync operation. This PR also fixes a bug in ListObjectVersions where lifecycle filtering should be done.	2022-08-09 15:00:24 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00
Poorna	5e0776e96a	replication: Include replica object versions for resync (#15427 )	2022-07-28 13:43:02 -07:00
Harshavardhana	5e763b71dc	use logger.LogOnce to reduce printing disconnection logs (#15408 ) fixes #15334 - re-use net/url parsed value for http.Request{} - remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649 - unwrapErrs upto leafErr to ensure that we store exactly the correct errors	2022-07-27 09:44:59 -07:00
Poorna	cab8d3d568	feat: add API to return list of objects waiting to be replicated (#15091 )	2022-07-21 11:05:44 -07:00
Poorna	b4f6901903	resync: Avoid concurrent access/write on map (#15286 ) fixes a crash ``` fatal error: concurrent map iteration and map write minio[19309]: goroutine 18640 [running]: minio[19309]: runtime.throw({0x27a3399?, 0x1785?}) minio[19309]: runtime/panic.go:992 +0x71 fp=0xc0062f1c80 sp=0xc0062f1c50 pc=0x438671 minio[19309]: runtime.mapiternext(0xc0062f1e90?) minio[19309]: runtime/map.go:871 +0x4eb fp=0xc0062f1cf0 sp=0xc0062f1c80 pc=0x41002b minio[19309]: github.com/minio/minio/cmd.(*ReplicationPool).periodicResyncMetaSave(0xc0056c00c0, {0x4d06a48, 0xc0005b2480}, {0x4d22fc0, 0xc0015ea0 ```	2022-07-13 16:29:10 -07:00
Harshavardhana	0a8b78cb84	fix: simplify passing auditLog eventType (#15278 ) Rename Trigger -> Event to be a more appropriate name for the audit event. Bonus: fixes a bug in AddMRFWorker() it did not cancel the waitgroup, leading to waitgroup leaks.	2022-07-12 10:43:32 -07:00
Harshavardhana	31c4fdbf79	fix: resyncing 'null' version on pre-existing content (#15043 ) PR #15041 fixed replicating 'null' version however due to a regression from #14994 caused the target versions for these 'null' versioned objects to have different 'versions', this may cause confusion with bi-directional replication and cause double replication. This PR fixes this properly by making sure we replicate the correct versions on the objects.	2022-06-06 15:14:56 -07:00
Harshavardhana	48e367ff7d	reject resync start on misconfigured replication rules (#15041 ) we expect resync to start on buckets with replication rule ExistingObjects enabled, if not we reject such calls.	2022-06-06 02:54:39 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) We need to make sure if we cannot read bucket metadata for some reason, and bucket metadata is not missing and returning corrupted information we should panic such handlers to disallow I/O to protect the overall state on the system. In-case of such corruption we have a mechanism now to force recreate the metadata on the bucket, using `x-minio-force-create` header with `PUT /bucket` API call. Additionally fix the versioning config updated state to be set properly for the site replication healing to trigger correctly.	2022-05-31 02:57:57 -07:00
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) Main motivation is move towards a common backend format for all different types of modes in MinIO, allowing for a simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication, transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Poorna	5c81d0d89a	site replication: heal missing/invalid replication config (#14979 ) Validate remote target ARNs and heal any stale rules in the replication config	2022-05-26 17:57:23 -07:00
Harshavardhana	f8650a3493	fetch bucket replication stats across peers in single call (#14956 ) current implementation relied on recursively calling one bucket at a time across all peers, this would be very slow and chatty when there are 100's of buckets which would mean 100*peerCount amount of network operations. This PR attempts to reduce this entire call into `peerCount` amount of network calls only. This functionality addresses also a concern where the Prometheus metrics would significantly slow down when one of the peers is offline.	2022-05-23 09:15:30 -07:00
Harshavardhana	6cfb1cb6fd	fix: timer usage across codebase (#14935 ) it seems in some places we have been wrongly using the timer.Reset() function, nicely exposed by an example shared by @donatello https://go.dev/play/p/qoF71_D1oXD this PR fixes all the usage comprehensively	2022-05-17 22:42:59 -07:00
Harshavardhana	62aa42cccf	avoid replication proxy on version excluded paths (#14878 ) no need to attempt proxying objects that were never replicated, but do have local `null` versions on them.	2022-05-08 16:50:31 -07:00
Harshavardhana	5cffd3780a	fix: multiple fixes in prefix exclude implementation (#14877 ) - do not need to restrict prefix exclusions that do not have `/` as suffix, relax this requirement as spark may have staging folders with other autogenerated characters , so we are better off doing full prefix March and skip. - multiple delete objects was incorrectly creating a null delete marker on a versioned bucket instead of creating a proper versioned delete marker. - do not suspend paths on the excluded prefixes during delete operations to avoid creating `null` delete markers, honor suspension of versioning only at bucket level for delete markers.	2022-05-07 22:06:44 -07:00
Krishnan Parthasarathi	ad8e611098	feat: implement prefix-level versioning exclusion (#14828 ) Spark/Hadoop workloads which use Hadoop MR Committer v1/v2 algorithm upload objects to a temporary prefix in a bucket. These objects are 'renamed' to a different prefix on Job commit. Object storage admins are forced to configure separate ILM policies to expire these objects and their versions to reclaim space. Our solution: This can be avoided by simply marking objects under these prefixes to be excluded from versioning, as shown below. Consequently, these objects are excluded from replication, and don't require ILM policies to prune unnecessary versions. - MinIO Extension to Bucket Version Configuration ```xml <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> <Status>Enabled</Status> <ExcludeFolders>true</ExcludeFolders> <ExcludedPrefixes> <Prefix>app1-jobs//_temporary/</Prefix> </ExcludedPrefixes> <ExcludedPrefixes> <Prefix>app2-jobs//__magic/</Prefix> </ExcludedPrefixes> <!-- .. up to 10 prefixes in all --> </VersioningConfiguration> ``` Note: `ExcludeFolders` excludes all folders in a bucket from versioning. This is required to prevent the parent folders from accumulating delete markers, especially those which are shared across spark workloads spanning projects/teams. - To enable version exclusion on a list of prefixes ``` mc version enable --excluded-prefixes "app1-jobs//_temporary/,app2-jobs//_magic," --exclude-prefix-marker myminio/test ```	2022-05-06 19:05:28 -07:00
Poorna	3a64580663	Add support for site replication healing (#14572 ) heal bucket metadata and IAM entries for sites participating in site replication from the site with the most updated entry. Co-authored-by: Harshavardhana <harsha@minio.io> Co-authored-by: Aditya Manthramurthy <aditya@minio.io>	2022-04-24 02:36:31 -07:00
Krishnan Parthasarathi	7b81967a3c	Fix handling of object versions pending purge (#14555 ) - GetObject() with vid should return 405 - GetObject() without vid should return 404 - ListObjects() should ignore this object if this is the "latest" version of the object - ListObjectVersions() should list this object as "DELETE marker" - Remove data parts before sync'ing the version pending purge	2022-03-16 16:59:43 -07:00
Poorna	1e39ca39c3	fix: consistent replies for incorrect range requests on replicated buckets (#14345 ) Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.	2022-03-08 13:58:55 -08:00
Poorna	ed3418c046	Refactor replication resync to be an active process (#14266 ) When resync is triggered, walk the bucket namespace and resync objects that are unreplicated. This PR also adds an API to report resync progress.	2022-02-10 10:16:52 -08:00
Poorna	288e276abe	Specify tags in options while selecting replication targets (#14126 ) When the replication rule is based on tag matches, the replication process should pick up targets matching the tags specified in the replication rule. Fixing regression due to #12880	2022-01-19 10:45:42 -08:00
Harshavardhana	cc3f139d1f	replication: attempt abort multipart-upload at max 3 times on remote (#14087 ) this is mainly an attempt to relinquish space on the remote site, if this still doesn't do it we give and let the admin know with a log message.	2022-01-11 22:32:29 -08:00
Poorna	54a98773f8	fix: replication of tag removal (#14056 ) Currently tag removal leaves replication state as `PENDING` because the `HEAD` api returns just a tag count but not the actual tags, and this is treated as a no-op	2022-01-10 19:06:10 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Aditya Manthramurthy	997e808088	fix; race in bucket replication stats (#13942 ) - r.ulock was not locked when r.UsageCache was being modified Bonus: - simplify code by removing some unnecessary clone methods - we can do this because go arrays are values (not pointers/references) that are automatically copied on assignment. - remove some unnecessary map allocation calls	2021-12-17 15:33:13 -08:00
Poorna K	e270ab65b3	fix: healing of replication delete markers (#13933 ) A corner case can occur where the delete-marker was propagated but the metadata could not be updated on the primary. Sending a RemoveObject call with the Delete marker version would end up permanently deleting the version on target. Instead, perform a Stat on the delete-marker version on target and redo replication only if the delete-marker is missing on target.	2021-12-16 15:34:55 -08:00
Poorna K	d422d24278	replication: warn if insufficient workers (#13899 ) This should give an early warning if configured replication workers are insufficient to meet application workload.	2021-12-13 18:22:56 -08:00
Harshavardhana	914bfb2d9c	fix: allow compaction on replicated buckets (#13711 ) currently getReplicationConfig() failure incorrectly returns error on unexpected buckets upon upgrade, we should always calculate usage as much as possible.	2021-11-19 14:46:14 -08:00
Klaus Post	faf013ec84	Improve performance on multiple versions (#13573 ) Existing: ```go type xlMetaV2 struct { Versions []xlMetaV2Version `json:"Versions" msg:"Versions"` } ``` Serialized as regular MessagePack. ```go //msgp:tuple xlMetaV2VersionHeader type xlMetaV2VersionHeader struct { VersionID [16]byte ModTime int64 Type VersionType Flags xlFlags } ``` Serialize as streaming MessagePack, format: ``` int(headerVersion) int(xlmetaVersion) int(nVersions) for each version { binary blob, xlMetaV2VersionHeader, serialized binary blob, xlMetaV2Version, serialized. } ``` xlMetaV2VersionHeader is <= 30 bytes serialized. Deserialized struct can easily be reused and does not contain pointers, so efficient as a slice (single allocation) This allows quickly parsing everything as slices of bytes (no copy). Versions are always saved sorted by modTime, newest first. No more need to sort on load. * Allows checking if a version exists. * Allows reading single version without unmarshal all. * Allows reading latest version of type without unmarshal all. * Allows reading latest version without unmarshal of all. * Allows checking if the latest is deleteMarker by reading first entry. * Allows adding/updating/deleting a version with only header deserialization. * Reduces allocations on conversion to FileInfo(s).	2021-11-18 12:15:22 -08:00
Anis Elleuch	4caed7cc0d	metrics: Add replication latency metrics (#13515 ) Add a new Prometheus metric for bucket replication latency e.g.: minio_bucket_replication_latency_ns{ bucket="testbucket", operation="upload", range="LESS_THAN_1_MiB", server="127.0.0.1:9001", targetArn="arn:minio:replication::45da043c-14f5-4da4-9316-aba5f77bf730:testbucket"} 2.2015663e+07 Co-authored-by: Klaus Post <klauspost@gmail.com>	2021-11-17 12:10:57 -08:00
Harshavardhana	661b263e77	add gocritic/ruleguard checks back again, cleanup code. (#13665 ) - remove some duplicated code - reported a bug, separately fixed in #13664 - using strings.ReplaceAll() when needed - using filepath.ToSlash() use when needed - remove all non-Go style comments from the codebase Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2021-11-16 09:28:29 -08:00
Harshavardhana	4ed0eb7012	remove double reads updating object metadata (#13542 ) Removes RLock/RUnlock for updating metadata, since we already take a write lock to update metadata, this change removes reading of xl.meta as well as an additional lock, the performance gain should increase 3x theoretically for - PutObjectRetention - PutObjectLegalHold This optimization is mainly for Veeam like workloads that require a certain level of iops from these API calls, we were losing iops.	2021-10-30 08:22:04 -07:00
Poorna K	e7f559c582	Fixes to replication metrics (#13493 ) For reporting ReplicaSize and loading initial replication metrics correctly.	2021-10-21 18:52:55 -07:00
Poorna Krishnamoorthy	7f6ed35347	Allow null versions to be replicated (#13310 ) for pre-existing objects present in a bucket prior to enabling existing object replication. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>	2021-09-28 10:26:12 -07:00
Poorna Krishnamoorthy	19ecdc75a8	replication: Simplify metrics calculation (#13274 ) Also doing some code cleanup	2021-09-22 10:48:45 -07:00
Poorna Krishnamoorthy	806b10b934	fix: improve error messages returned during replication setup (#13261 )	2021-09-21 13:03:20 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Harshavardhana	0892f1e406	fix: multipart replication and encrypted etag for sse-s3 (#13171 ) Replication was not working properly for encrypted objects in single PUT object for preserving etag, We need to make sure to preserve etag such that replication works properly and not gets into infinite loops of copying due to ETag mismatches.	2021-09-08 22:25:23 -07:00
Poorna Krishnamoorthy	9af4e7b1da	Add healthcheck back for replication targets (#13168 ) This will allow objects to relinquish read lock held during replication earlier if the target is known to be down without waiting for connection timeout when replication is attempted.	2021-09-08 15:34:50 -07:00
Poorna Krishnamoorthy	a366143c5b	Remove replication permission check (#13135 ) Fixes #13105	2021-09-02 09:31:13 -07:00

1 2 3

142 Commits