Commit Graph

177 Commits

Author SHA1 Message Date
Poorna c33a237067
fix: under site replication disallow remote target modification (#16628) 2023-02-15 20:22:13 -08:00
jiuker a15b6f21b8
remove incorrect use of WaitGroup (#16596) 2023-02-12 20:59:45 -08:00
Poorna 876e1a91b2
replication: Fix typo checking PreconditionFailed status code (#16517) 2023-02-02 19:22:02 +05:30
Poorna 820d94447c
replication: fix target bucket passed on GET proxy (#16495) 2023-01-27 10:24:51 -08:00
Poorna ed20134a7b
replication: detect proxy header presence correctly (#16489) 2023-01-27 01:29:32 -08:00
Harshavardhana e64b9f6751
fix: disallow SSE-C encrypted objects on replicated buckets (#16467) 2023-01-24 15:46:33 -08:00
Poorna ddad231921
replication: Avoid logging PreConditionFailed error (#16450) 2023-01-21 07:33:04 +05:30
Poorna 1b02e046c2
Fix bandwidth monitoring to be per remote target (#16360) 2023-01-19 18:52:16 +05:30
Harshavardhana 2937711390
fix: DeleteObject() API with versionId under replication (#16325) 2022-12-28 22:48:33 -08:00
Anis Elleuch acc9c033ed
debug: Add X-Amz-Request-ID to lock/unlock calls (#16309) 2022-12-23 19:49:07 -08:00
Poorna de0b43de32
persist replication stats with leader lock (#16282) 2022-12-22 14:25:13 -08:00
Poorna 6423e4c767
Remove site replication config if it succeeded locally (#16279) 2022-12-22 01:31:20 -08:00
Harshavardhana 2fc182d8e6
fix: iso8601TimeFormat padding issue for certain nanoseconds (#16207) 2022-12-12 10:28:30 -08:00
Aditya Manthramurthy a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Klaus Post a713aee3d5
Run staticcheck on CI (#16170) 2022-12-05 11:18:50 -08:00
Klaus Post 1cd875de1e
Persist updated metadata (#16160) 2022-12-02 08:35:04 -08:00
Anis Elleuch 641ab24aec
repl: resync orchestrator to use global shared lock (#16154) 2022-12-01 12:10:09 -08:00
Klaus Post a22b4adf4c
distribute replication ops based on names (#16083) 2022-11-17 15:20:09 -08:00
Klaus Post b7bb122be8
fix: replication auto-scaling deadlock (#16084) 2022-11-17 07:35:02 -08:00
Klaus Post 8a07000e58
fix: refactor getReplicationDiff for safe use (#16051) 2022-11-15 07:59:21 -08:00
Poorna d6bc141bd1
feat: Add support for site level resync (#15753) 2022-11-14 07:16:40 -08:00
Poorna 34d28dd79f
replication: Avoid blocking on mrf save (#16045) 2022-11-10 10:20:02 -08:00
Klaus Post 2894dd4d1a
fix: hold lock while serializing replication stats (#16007) 2022-11-04 09:59:14 -07:00
Klaus Post 0f0e154315
fix: inconsistent replication delete marker timestamps (#15956) 2022-10-27 09:46:52 -07:00
Harshavardhana 23b329b9df
remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Anis Elleuch fc6c794972
Audit dangling object removal (#15933) 2022-10-24 11:35:07 -07:00
Poorna e4e90b53c1
fix: delete-marker replication check properly (#15923) 2022-10-21 14:45:06 -07:00
Harshavardhana 59e33b3b21
validate setBucketTarget properly as per BucketExists() call (#15860) 2022-10-13 17:46:49 -07:00
Poorna 0e3c92c027 attempt delete marker replication after object is replicated (#15857)
Ensure delete marker replication success, especially since the
recent optimizations to heal on HEAD, LIST and GET can force
replication attempts on delete marker before underlying object
version could have synced.
2022-10-13 17:45:23 -07:00
Harshavardhana 97112c69be
fix: replication stats() to not crash under any situation (#15851)
Co-authored-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2022-10-12 15:47:41 -07:00
Anis Elleuch e856e10ac2
ignore VersionNotFound in addition to ObjectNotFound while replicating (#15814) 2022-10-07 16:11:41 -07:00
Poorna 8ea6fb368d
Add auto configuration of replication workers (#15636) 2022-09-24 16:20:28 -07:00
Harshavardhana 50a8ba6a6f
fix: parse and save retainUntilDate in correct time format (#15741) 2022-09-23 08:49:27 -07:00
Harshavardhana 124544d834
add pre-conditions support for PUT calls during replication (#15674)
PUT shall only proceed if pre-conditions are met, the new
code uses

- x-minio-source-mtime
- x-minio-source-etag

to verify if the object indeed needs to be replicated
or not, allowing us to avoid StatObject() call.
2022-09-14 18:44:04 -07:00
Poorna a0fb0c1835
panic if replication config could not be read from disk (#15685)
If replication config could not be read from bucket metadata for some
reason, issue a panic so that unexpected replication outcomes can
be avoided for replicated buckets.

For similar reasons, adding a panic while fetching object-lock config
if it failed for reason other than non-existence of config.
2022-09-13 21:23:33 -07:00
Poorna 6b9fd256e1
Persist in-memory replication stats to disk (#15594)
to avoid relying on scanner-calculated replication metrics.
This will improve the accuracy of the replication stats reported.

This PR also adds on to #15556 by handing replication
traffic that could not be queued by available workers to the 
MRF queue so that entries in `PENDING` status are healed faster.
2022-09-12 12:40:02 -07:00
Anis Elleuch cf52691959
Save resync status in the backend using a last update timestamp (#15638)
Currently, there is a short time window where the code is allowed 
to save the status of a replication resync. Currently, the window is
`now.Sub(st.EndTime) <= resyncTimeInterval`. Also, any failure to 
write in the backend disks is not retried.

Refactor the code a little bit to rely on the last timestamp of a
successful write of the resync status of any given bucket in the 
backend disks.
2022-09-01 16:53:36 -07:00
Anis Elleuch 10e75116ef
Avoid replicating dirs in listing with replication enabled (#15641)
When replication is enabled in a particular bucket, the listing will send
objects to bucket replication, but it is also sending prefixes for non
recursive listing which is useless and shows a lot of error logs.

This commit will ignore prefixes.
2022-09-01 15:22:11 -07:00
Harshavardhana 433b6fa8fe
upgrade golang-lint to the latest (#15600) 2022-08-26 12:52:29 -07:00
Harshavardhana edba7c987b
fix: objects matching prefixes should not leave delete markers (#15586)
This is needed to ensure that we do not leave prefixes where
version is suspended, instead we never leave versions on
these paths.
2022-08-24 13:46:29 -07:00
Poorna 4155c5b695
replication: improve MRF healing. (#15556)
This PR improves the replication failure healing by persisting
most recent failures to disk and re-queuing them until the replication
is successful.

While this does not eliminate the need for healing during a full scan, 
queuing MRF vastly improves the ETA to keeping replicated buckets 
in sync as it does not wait for the scanner visit to detect unreplicated 
object versions.
2022-08-22 16:53:06 -07:00
Harshavardhana ae4ee95d25
change default lock retry interval to 50ms (#15560)
competing calls on the same object on versioned bucket
mutating calls on the same object may unexpected have
higher delays.

This can be reproduced with a replicated bucket
overwriting the same object writes, deletes repeatedly.

For longer locks like scanner keep the 1sec interval
2022-08-19 16:21:05 -07:00
Harshavardhana e9055e9ef7
fix: walk() should cancel itself upon context cancellation (#15553)
This PR fixes possible leaks that may emanate from not
listening on context cancelation or timeouts.

```
goroutine 60957610 [chan send, 16 minutes]:
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1.1(...)
        github.com/minio/minio/cmd/erasure-server-pool.go:1724 +0x368
github.com/minio/minio/cmd.listPathRaw({0x4a9a740, 0xc0666dffc0},...
        github.com/minio/minio/cmd/metacache-set.go:1022 +0xfc4
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1()
        github.com/minio/minio/cmd/erasure-server-pool.go:1764 +0x528
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1
        github.com/minio/minio/cmd/erasure-server-pool.go:1697 +0x1b7
```
2022-08-18 17:49:08 -07:00
Poorna 21fe14201f
replication: centralize healthcheck for remote targets (#15516)
This PR moves health check from minio-go client to being
managed on the server.

Additionally integrating health check into site replication
2022-08-16 17:46:22 -07:00
Poorna 21bf5b4db7
replication: heal proactively upon access (#15501)
Queue failed/pending replication for healing during listing and GET/HEAD
API calls. This includes healing of existing objects that were never
replicated or those in the middle of a resync operation.

This PR also fixes a bug in ListObjectVersions where lifecycle filtering
should be done.
2022-08-09 15:00:24 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Poorna 5e0776e96a
replication: Include replica object versions for resync (#15427) 2022-07-28 13:43:02 -07:00
Harshavardhana 5e763b71dc
use logger.LogOnce to reduce printing disconnection logs (#15408)
fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
2022-07-27 09:44:59 -07:00
Poorna cab8d3d568
feat: add API to return list of objects waiting to be replicated (#15091) 2022-07-21 11:05:44 -07:00
Poorna b4f6901903
resync: Avoid concurrent access/write on map (#15286)
fixes a crash

```
fatal error: concurrent map iteration and map write
minio[19309]: goroutine 18640 [running]:
minio[19309]: runtime.throw({0x27a3399?, 0x1785?})
minio[19309]: runtime/panic.go:992 +0x71 fp=0xc0062f1c80 sp=0xc0062f1c50 pc=0x438671
minio[19309]: runtime.mapiternext(0xc0062f1e90?)
minio[19309]: runtime/map.go:871 +0x4eb fp=0xc0062f1cf0 sp=0xc0062f1c80 pc=0x41002b
minio[19309]: github.com/minio/minio/cmd.(*ReplicationPool).periodicResyncMetaSave(0xc0056c00c0, {0x4d06a48, 0xc0005b2480}, {0x4d22fc0, 0xc0015ea0
```
2022-07-13 16:29:10 -07:00