Commit Graph

4692 Commits

Author SHA1 Message Date
Harshavardhana
124544d834
add pre-conditions support for PUT calls during replication (#15674)
PUT shall only proceed if pre-conditions are met, the new
code uses

- x-minio-source-mtime
- x-minio-source-etag

to verify if the object indeed needs to be replicated
or not, allowing us to avoid StatObject() call.
2022-09-14 18:44:04 -07:00
Poorna
b910904fa6
change replication stats save path for windows (#15690) 2022-09-14 13:49:13 -07:00
Klaus Post
eee1ce305c
When listing, do not count delete markers (#15689)
When limiting listing do not count delete, since they may be discarded.

Extend limit, since we may be discarding the forward-to marker.

Fix directories always being sent to resolve, since they didn't return as match.
2022-09-14 12:11:27 -07:00
Klaus Post
5c61c3ccdc
Fix flaky TestGetObjectWithOutdatedDisks (#15687)
On occasion this test fails:

```
2022-09-12T17:22:44.6562737Z === RUN   TestGetObjectWithOutdatedDisks
2022-09-12T17:22:44.6563751Z     erasure-object_test.go:1214: Test 2: Expected data to have md5sum = `c946b71bb69c07daf25470742c967e7c`, found `7d16d23f07072af1a809707ba101ae07`
2
```

Theory: Both objects are written with the same timestamp due to lower timer resolution on Windows. This results in secondary resolution, which is deterministic, but random.

Solution: Instead of hacking in a wait we request the specific version we want. Should still keep the test relevant.

Bonus: Remote action dependency for vulncheck
2022-09-14 08:17:39 -07:00
Poorna
a0fb0c1835
panic if replication config could not be read from disk (#15685)
If replication config could not be read from bucket metadata for some
reason, issue a panic so that unexpected replication outcomes can
be avoided for replicated buckets.

For similar reasons, adding a panic while fetching object-lock config
if it failed for reason other than non-existence of config.
2022-09-13 21:23:33 -07:00
Aditya Manthramurthy
e152b2a975
Pass groups claim into condition values (#15679)
This allows using `jwt:groups` as a multi-valued condition key in policies.
2022-09-13 09:45:36 -07:00
Poorna
6b9fd256e1
Persist in-memory replication stats to disk (#15594)
to avoid relying on scanner-calculated replication metrics.
This will improve the accuracy of the replication stats reported.

This PR also adds on to #15556 by handing replication
traffic that could not be queued by available workers to the 
MRF queue so that entries in `PENDING` status are healed faster.
2022-09-12 12:40:02 -07:00
Klaus Post
ff9a74b91f
Add fast max-keys=1 support for Listing (#15670)
Add a listing option to stop when the limit is reached.  
This can be used by stateless listings for fast results.
2022-09-09 08:13:06 -07:00
Harshavardhana
b579163802
limit number of buckets to 500k (#15668)
500k is a reasonable limit for any single MinIO
cluster deployment, in future we may increase this
value.

However for now we are going to keep this limit.
2022-09-09 03:06:34 -07:00
Krishnan Parthasarathi
96bfa77856
serialize updates to healing tracker (#15647)
When healing is parallelized by setting the ` _MINIO_HEAL_WORKERS` 
environment variable, multiple goroutines may race while updating the disk's 
healing tracker. This change serializes only these concurrent updates using a
channel. Note, the healing tracker is still not concurrency safe in other contexts.
2022-09-07 08:47:21 -07:00
Harshavardhana
8e997eba4a
fix: trigger Heal when xl.meta needs healing during PUT (#15661)
This PR is a continuation of the previous change instead
of returning an error, instead trigger a spot heal on the
'xl.meta' and return only after the healing is complete.

This allows for future GETs on the same resource to be
consistent for any version of the object.
2022-09-07 07:25:39 -07:00
Harshavardhana
228c6686f8
allow non-standards fallback for all http.TimeFormats (#15662)
fixes #15645
2022-09-07 07:24:54 -07:00
Harshavardhana
7776d064cf
allow non-standards fallback for Expires header (#15655)
fixes #15645
2022-09-05 19:18:18 -07:00
Harshavardhana
2d9b5a65f1
verify RenameData() versions to be consistent (#15649)
xl.meta gets written and never rolled back, however
we definitely need to validate the state that is
persisted on the disk, if there are inconsistencies

- more than write quorum we should return an error
  to the client

- if write quorum was achieved however there are
  inconsistent xl.meta's we should simply trigger
  an MRF on them
2022-09-05 16:51:37 -07:00
Shireesh Anjal
c240da6568
Reuse madmin.ClusterRegistrationInfo (#15654)
The `clusterInfo` struct in admin-handlers is same as
madmin.ClusterRegistrationInfo, except for small differences in field
names.

Removing this and using madmin.ClusterRegistrationInfo in its place will
help in following ways:

- The JSON payload generated by mc in case of cluster registration will
  be consistent (same keys) with cluster.info generated by minio as part
  of the profile and inspect zip
- health-analyzer can parse the cluster.info using the same struct and
  won't have to define it's own
2022-09-05 10:02:25 -07:00
Harshavardhana
157272dc5b
fix: use optimized json.NewEncoder instead for metrics (#15648) 2022-09-05 08:06:35 -07:00
yudoutingle
f4c56026a2
fix: potential deadLock caused by unlocking a non-existing lock (#15635) 2022-09-02 14:24:32 -07:00
Harshavardhana
37e3f5de10
do not print object not found errors in MRF healing (#15646) 2022-09-02 14:22:40 -07:00
Harshavardhana
5ea629beb2
avoid printing io.ErrUnexpectedEOF for .metacache objects (#15642) 2022-09-02 12:47:17 -07:00
Anis Elleuch
cf52691959
Save resync status in the backend using a last update timestamp (#15638)
Currently, there is a short time window where the code is allowed 
to save the status of a replication resync. Currently, the window is
`now.Sub(st.EndTime) <= resyncTimeInterval`. Also, any failure to 
write in the backend disks is not retried.

Refactor the code a little bit to rely on the last timestamp of a
successful write of the resync status of any given bucket in the 
backend disks.
2022-09-01 16:53:36 -07:00
Anis Elleuch
10e75116ef
Avoid replicating dirs in listing with replication enabled (#15641)
When replication is enabled in a particular bucket, the listing will send
objects to bucket replication, but it is also sending prefixes for non
recursive listing which is useless and shows a lot of error logs.

This commit will ignore prefixes.
2022-09-01 15:22:11 -07:00
Harshavardhana
f649968c69
tier: avoid stats infinite loop in forwardTo method (#15640)
under some sequence of events following code would
reach an infinite loop.

```
idx1, idx2 := 0, 1
for ; idx2 != idx1; idx2++ {
        fmt.Println(idx2)
}
```

fixes #15639
2022-09-01 13:51:06 -07:00
Harshavardhana
bcedc2b0d9
fix: add healing metric type for heal tracing (#15631)
changes the `heal.checkBucket` to `heal.Bucket` instead
since the latter is more meaningful.
2022-08-31 12:28:03 -07:00
Klaus Post
8e4a45ec41
fix: encrypt checksums in metadata (#15620) 2022-08-31 08:13:23 -07:00
Klaus Post
dec942beb6
feat: Add healing trace (#15616) 2022-08-31 01:56:12 -07:00
Abirdcfly
d4e0f13bb3
chore: remove duplicate word in comments (#15607)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>

Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-08-30 08:26:43 -07:00
Anis Elleuch
1f28a3bb80
Avoid messages from go test output (#15601)
A lot of warning messages are printed in CI/CD failures generated by go
test. Avoid that by requiring at least Error level for logging when
doing go test.
2022-08-30 08:23:40 -07:00
Krishnan Parthasarathi
3a1d3a7952
audit-log: Add time to get/restore object from remote-tier (#15602) 2022-08-29 21:33:59 -07:00
Klaus Post
a9f1ad7924
Add extended checksum support (#15433) 2022-08-29 16:57:16 -07:00
Poorna
929b9e164e
site replication: Avoid returning root svcacct info in sr metadata (#15608)
Service accounts of root users should not be replicated.
2022-08-29 11:19:51 -07:00
Harshavardhana
97376f6e8f
improve performance for inlined data (#15603)
inlined data often is bigger than the allowed
O_DIRECT alignment, so potentially we can write
'xl.meta' without O_DSYNC instead we can rely on
O_DIRECT + fdatasync() instead.

This PR allows O_DIRECT on inlined data that
would gain the benefits of performing O_DIRECT,
eventually performing an fdatasync() at the end.

Performance boost can be observed here for small
objects < 128KiB. The performance boost is mainly
seen on HDD, and marginal on NVMe setups.
2022-08-29 11:19:29 -07:00
Febriananda Wida Pramudita
1f22a16b15
fix: endpoints for single local disks must retain port info (#15585) 2022-08-26 12:53:15 -07:00
Harshavardhana
433b6fa8fe
upgrade golang-lint to the latest (#15600) 2022-08-26 12:52:29 -07:00
Krishnan Parthasarathi
99fbfe2421
Add concurrency to healing objects on a fresh disk (#15575) 2022-08-25 13:07:15 -07:00
Poorna
b1b6264bea
fix: validate deployment id when adding peer clusters (#15591)
Fixes: #15573
2022-08-25 11:30:52 -07:00
Aditya Manthramurthy
18dffb26e7
Allow querying a single target in config get API (#15587) 2022-08-25 00:17:05 -07:00
Harshavardhana
edba7c987b
fix: objects matching prefixes should not leave delete markers (#15586)
This is needed to ensure that we do not leave prefixes where
version is suspended, instead we never leave versions on
these paths.
2022-08-24 13:46:29 -07:00
Anis Elleuch
b737c83a66
Ensure that only one node performs site replication healing (#15584)
When a node finds a change in the other replication cluster and applies
to itself will already notify other peers. No need for all nodes in a
given cluster to do site replication healing, only one node is
sufficient.
2022-08-24 13:46:09 -07:00
Anis Elleuch
97a6322de1
Fix regression in notifying peers about new policy mapping (#15583)
Switch from mux.Vars() to r.Form to avoid the issue of missing arguments
passed to LoadPolicyMappingHandler.
2022-08-24 12:34:52 -07:00
Klaus Post
037fe4afdc
Add listing block reuse (#15579)
When streaming results, pool metadata slices when sent.
2022-08-24 09:11:16 -07:00
Aditya Manthramurthy
afbb63a197
Factor out external event notification funcs (#15574)
This change moves external event notification functionality into
`event-notification.go`. This simplifies notification related code.
2022-08-24 06:42:36 -07:00
Harshavardhana
8902561f3c
use new xxml for XML responses to support rare control characters (#15511)
use new xxml/XML responses to support rare control characters

fixes #15023
2022-08-23 17:04:11 -07:00
Anis Elleuch
b8cdf060c8
Properly replicate policy mapping for virtual users (#15558)
Currently, replicating policy mapping for STS users does not work. Fix
it is by passing user type to PolicyDBSet.
2022-08-23 11:11:45 -07:00
Poorna
4155c5b695
replication: improve MRF healing. (#15556)
This PR improves the replication failure healing by persisting
most recent failures to disk and re-queuing them until the replication
is successful.

While this does not eliminate the need for healing during a full scan, 
queuing MRF vastly improves the ETA to keeping replicated buckets 
in sync as it does not wait for the scanner visit to detect unreplicated 
object versions.
2022-08-22 16:53:06 -07:00
Poorna
471467d310
fix: ensure metadata update happens after deletemarker replication (#15564)
Fixes regression caused by #15521
2022-08-22 15:59:06 -07:00
Harshavardhana
ae4ee95d25
change default lock retry interval to 50ms (#15560)
competing calls on the same object on versioned bucket
mutating calls on the same object may unexpected have
higher delays.

This can be reproduced with a replicated bucket
overwriting the same object writes, deletes repeatedly.

For longer locks like scanner keep the 1sec interval
2022-08-19 16:21:05 -07:00
Harshavardhana
e9055e9ef7
fix: walk() should cancel itself upon context cancellation (#15553)
This PR fixes possible leaks that may emanate from not
listening on context cancelation or timeouts.

```
goroutine 60957610 [chan send, 16 minutes]:
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1.1(...)
        github.com/minio/minio/cmd/erasure-server-pool.go:1724 +0x368
github.com/minio/minio/cmd.listPathRaw({0x4a9a740, 0xc0666dffc0},...
        github.com/minio/minio/cmd/metacache-set.go:1022 +0xfc4
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1()
        github.com/minio/minio/cmd/erasure-server-pool.go:1764 +0x528
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1
        github.com/minio/minio/cmd/erasure-server-pool.go:1697 +0x1b7
```
2022-08-18 17:49:08 -07:00
Harshavardhana
d350b666ff
feat: add idempotent delete marker support (#15521)
The bottom line is delete markers are a nuisance,
most applications are not version aware and this
has simply complicated the version management.

AWS S3 gave an unnecessary complication overhead
for customers, they need to now manage these
markers by applying ILM settings and clean
them up on a regular basis.

To make matters worse all these delete markers
get replicated as well in a replicated setup,
requiring two ILM settings on each site.

This PR is an attempt to address this inferior
implementation by deviating MinIO towards an
idempotent delete marker implementation i.e
MinIO will never create any more than single
consecutive delete markers.

This significantly reduces operational overhead
by making versioning more useful for real data.

This is an S3 spec deviation for pragmatic reasons.
2022-08-18 16:41:59 -07:00
Harshavardhana
895357607a
avoid using errors.As for 'errors.New' use errors.Is (#15549)
Bonus: ignore coredns CVE, for now, there is no fix yet

https://github.com/coredns/coredns/issues/5574
2022-08-18 11:10:49 -07:00
Harshavardhana
bf38c0c0d1
fix: increase concurrency of DeleteObjects() to N/10th (#15546)
instead of keeping the value 10 and static, make
the concurrency a function of incoming number of
objects being deleted.
2022-08-18 09:33:56 -07:00