Ensure delete marker replication succeeds, especially since the
recent optimizations to heal on HEAD, LIST and GET can force
replication attempts on a delete marker before the underlying object
version has synced.
Move to using the `xl.meta` data structure to keep temporary partInfo;
this allows for a future change where we move different parts to
different drives.
PUT shall only proceed if pre-conditions are met. The new code uses
- x-minio-source-mtime
- x-minio-source-etag
to verify whether the object actually needs to be replicated,
allowing us to avoid a StatObject() call.
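A minimal sketch of such a precondition check, with hypothetical helper names and an assumed timestamp format (not MinIO's actual replication handler):
```
package sketch

import (
	"net/http"
	"time"
)

// destinationHasSameVersion reports whether the destination already holds
// the object version described by the source headers; if so, the
// replication PUT can be skipped without an extra StatObject() call.
// The RFC3339Nano mtime format is an assumption for this sketch.
func destinationHasSameVersion(h http.Header, destModTime time.Time, destETag string) bool {
	srcMTime, err := time.Parse(time.RFC3339Nano, h.Get("x-minio-source-mtime"))
	if err != nil {
		return false // no usable precondition, let the PUT proceed
	}
	srcETag := h.Get("x-minio-source-etag")
	return srcETag != "" && srcETag == destETag && srcMTime.Equal(destModTime)
}
```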
When limiting the listing, do not count delete markers, since they may be discarded.
Extend the limit, since we may be discarding the forward-to marker.
Fix directories always being sent to resolve, since they didn't return as a match.
On occasion this test fails:
```
2022-09-12T17:22:44.6562737Z === RUN TestGetObjectWithOutdatedDisks
2022-09-12T17:22:44.6563751Z erasure-object_test.go:1214: Test 2: Expected data to have md5sum = `c946b71bb69c07daf25470742c967e7c`, found `7d16d23f07072af1a809707ba101ae07`
```
Theory: Both objects are written with the same timestamp due to lower timer resolution on Windows. This results in falling back to secondary resolution, which is deterministic but effectively random with respect to write order.
Solution: Instead of hacking in a wait, request the specific version we want. This should still keep the test relevant.
Bonus: Remote action dependency for vulncheck
If the replication config could not be read from bucket metadata for some
reason, issue a panic so that unexpected replication outcomes can
be avoided for replicated buckets.
For similar reasons, a panic is also issued while fetching the object-lock
config if it fails for a reason other than the non-existence of the config.
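A hedged sketch of the intent, using placeholder names (`readBucketConfig`, `errConfigNotFound`) rather than MinIO's actual metadata APIs:
```
package sketch

import (
	"context"
	"errors"
	"fmt"
)

// Placeholders standing in for the bucket metadata layer.
var errConfigNotFound = errors.New("config not found")

func readBucketConfig(ctx context.Context, bucket string) ([]byte, error) {
	return nil, errConfigNotFound
}

// loadReplicationConfig treats a missing config as benign, but panics on any
// other read failure so replication decisions are never made on bad data.
func loadReplicationConfig(ctx context.Context, bucket string) []byte {
	cfg, err := readBucketConfig(ctx, bucket)
	if err != nil {
		if errors.Is(err, errConfigNotFound) {
			return nil // bucket simply has no replication config
		}
		panic(fmt.Errorf("unable to read replication config for bucket %q: %w", bucket, err))
	}
	return cfg
}
```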
This avoids relying on scanner-calculated replication metrics and will
improve the accuracy of the reported replication stats.
This PR also builds on #15556 by handing replication
traffic that could not be queued by available workers to the
MRF queue, so that entries in `PENDING` status are healed faster.
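The worker/MRF handoff amounts to a non-blocking channel send; the channel and type names below are illustrative, not the actual implementation:
```
package sketch

// replicationTask and both queues are placeholders for the real internals.
type replicationTask struct{ bucket, object string }

var (
	workerCh = make(chan replicationTask, 100)  // bounded pool of replication workers
	mrfCh    = make(chan replicationTask, 1000) // MRF queue, healed later by the MRF sweeper
)

// queueReplication hands the task to a worker if one is free; otherwise it
// falls back to the MRF queue instead of blocking, so the entry is not lost
// and its PENDING status is healed later.
func queueReplication(t replicationTask) {
	select {
	case workerCh <- t:
	default:
		select {
		case mrfCh <- t:
		default:
			// both queues full; the scanner will pick the entry up eventually
		}
	}
}
```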
500k is a reasonable limit for any single MinIO
cluster deployment; in the future we may increase this
value, but for now we are keeping this limit.
When healing is parallelized by setting the `_MINIO_HEAL_WORKERS`
environment variable, multiple goroutines may race while updating the disk's
healing tracker. This change serializes only these concurrent updates using a
channel. Note that the healing tracker is still not concurrency-safe in other contexts.
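A minimal sketch of the serialization pattern, with placeholder types standing in for the real healing tracker:
```
package sketch

// healingTracker is a stand-in for the per-drive healing state.
type healingTracker struct {
	ItemsHealed uint64
	BytesHealed uint64
}

// startTrackerUpdater funnels every update through one goroutine, so parallel
// heal workers never mutate the tracker concurrently.
func startTrackerUpdater(t *healingTracker) chan<- func(*healingTracker) {
	updates := make(chan func(*healingTracker))
	go func() {
		for apply := range updates {
			apply(t) // only this goroutine ever touches t
		}
	}()
	return updates
}

// A heal worker then sends a closure instead of writing the fields directly:
//   updates <- func(t *healingTracker) { t.ItemsHealed++ }
```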
This PR is a continuation of the previous change: instead
of returning an error, trigger a spot heal on the
'xl.meta' and return only after the healing is complete.
This allows future GETs on the same resource to be
consistent for any version of the object.
xl.meta gets written and never rolled back; however,
we definitely need to validate the state that is
persisted on disk. If there are inconsistencies
- beyond write quorum, we should return an error
to the client
- if write quorum was achieved but there are
inconsistent xl.meta's, we should simply trigger
an MRF on them
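Roughly, the decision looks like the sketch below; `queueMRFHeal` and the error counting are illustrative placeholders, not the actual erasure code path:
```
package sketch

import "errors"

var errWriteQuorum = errors.New("write quorum not met for xl.meta")

// queueMRFHeal is a placeholder for handing the object to the MRF healer.
func queueMRFHeal(bucket, object string) {}

// validateXLMeta counts the drives that returned a consistent xl.meta. Below
// write quorum the client gets an error; at or above quorum, drives that
// disagree are simply queued for MRF healing.
func validateXLMeta(bucket, object string, perDriveErrs []error, writeQuorum int) error {
	consistent := 0
	for _, err := range perDriveErrs {
		if err == nil {
			consistent++
		}
	}
	if consistent < writeQuorum {
		return errWriteQuorum
	}
	if consistent < len(perDriveErrs) {
		queueMRFHeal(bucket, object) // quorum met, heal the stragglers asynchronously
	}
	return nil
}
```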
The `clusterInfo` struct in admin-handlers is the same as
madmin.ClusterRegistrationInfo, except for small differences in field
names.
Removing it and using madmin.ClusterRegistrationInfo in its place will
help in the following ways:
- The JSON payload generated by mc for cluster registration will
be consistent (same keys) with the cluster.info generated by minio as part
of the profile and inspect zip
- health-analyzer can parse cluster.info using the same struct and
won't have to define its own
Currently, there is only a short time window during which the code is allowed
to save the status of a replication resync; the window is
`now.Sub(st.EndTime) <= resyncTimeInterval`. Also, any failure to
write to the backend disks is not retried.
Refactor the code a little to rely on the timestamp of the last
successful write of the resync status of any given bucket to the
backend disks.
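A hedged sketch of that bookkeeping, with invented field and function names:
```
package sketch

import "time"

// resyncStatus is a placeholder for the per-bucket resync state.
type resyncStatus struct {
	LastSavedAt time.Time // last successful write of the status to the backend disks
}

// saveResyncStatus persists the status only when the last successful write is
// older than the save interval, and advances the timestamp only on success,
// so a failed write is naturally retried on the next attempt.
func saveResyncStatus(st *resyncStatus, persist func() error, interval time.Duration) error {
	if time.Since(st.LastSavedAt) < interval {
		return nil // saved recently, nothing to do
	}
	if err := persist(); err != nil {
		return err // LastSavedAt unchanged; the next call retries the write
	}
	st.LastSavedAt = time.Now()
	return nil
}
```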
When replication is enabled on a particular bucket, the listing will send
objects to bucket replication, but it also sends prefixes for non-recursive
listings, which is useless and produces a lot of error logs.
This commit ignores such prefixes.
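The filtering amounts to skipping directory entries before queueing; the names below are illustrative:
```
package sketch

import "strings"

// queueForReplication is a placeholder for handing an object to replication.
func queueForReplication(name string) {}

// enqueueListing sends only real object entries to replication; prefix
// entries from non-recursive listings (ending in "/") are skipped, since
// replicating them is meaningless and only produces error logs.
func enqueueListing(entries []string) {
	for _, name := range entries {
		if strings.HasSuffix(name, "/") {
			continue // prefix, not an object
		}
		queueForReplication(name)
	}
}
```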
Under some sequence of events the following code would
reach an infinite loop:
```
idx1, idx2 := 0, 1
for ; idx2 != idx1; idx2++ {
	// idx2 starts ahead of idx1 and only ever increments,
	// so the loop condition never becomes false.
	fmt.Println(idx2)
}
```
Fixes #15639
A lot of warning messages are printed in CI/CD failures generated by go
test. Avoid that by requiring at least Error level for logging when
running go test.
Inlined data is often bigger than the allowed
O_DIRECT alignment, so we can potentially write
'xl.meta' without O_DSYNC and instead rely on
O_DIRECT + fdatasync().
This PR allows O_DIRECT on inlined data, gaining
the benefits of O_DIRECT while performing a single
fdatasync() at the end.
A performance boost can be observed for small
objects < 128KiB. The boost is mainly seen on HDD,
and is marginal on NVMe setups.
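A rough sketch of such a write path on Linux, assuming an already-aligned buffer and using golang.org/x/sys/unix; this is illustrative, not MinIO's actual xl.meta writer:
```
//go:build linux

package sketch

import "golang.org/x/sys/unix"

// writeAligned writes buf (assumed to already be O_DIRECT-aligned and padded)
// using O_DIRECT and finishes with a single fdatasync(), instead of opening
// the file with O_DSYNC.
func writeAligned(path string, buf []byte) error {
	fd, err := unix.Open(path, unix.O_CREAT|unix.O_WRONLY|unix.O_DIRECT, 0o644)
	if err != nil {
		return err
	}
	defer unix.Close(fd)

	if _, err := unix.Write(fd, buf); err != nil {
		return err
	}
	// One fdatasync at the end flushes the data, avoiding per-write O_DSYNC cost.
	return unix.Fdatasync(fd)
}
```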
When a node finds a change in the other replication cluster and applies
it to itself, it will already notify the other peers. There is no need for
all nodes in a given cluster to do site replication healing; one node is
sufficient.
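One way to express "only one node heals" is a deterministic choice over the peer list; the helper below is purely illustrative and not how MinIO actually picks the node:
```
package sketch

import "sort"

// shouldHealSiteReplication returns true only on the node that sorts first
// among the cluster's peers, so exactly one node runs the site replication
// healing pass. The election scheme here is an illustrative assumption.
func shouldHealSiteReplication(localAddr string, peers []string) bool {
	if len(peers) == 0 {
		return true
	}
	sorted := append([]string(nil), peers...)
	sort.Strings(sorted)
	return sorted[0] == localAddr
}
```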