Commit Graph

4354 Commits

Author SHA1 Message Date
Anis Elleuch
1f92fc3fc0
Always check for root disks unless MINIO_CI_CD is set (#14232)
The current code considers a pool with all root disks to be as part
of a testing environment even if there are other pools with mounted
disks. This will result to illegitimate writing in root disks.

Fix this by simplifing the logic: require MINIO_CI_CD in order to skip
root disk check.
2022-02-13 15:42:07 -08:00
Harshavardhana
fad3d66093
parallelize background cleanup on local disks across sets (#14290) 2022-02-11 14:22:48 -08:00
Poorna
ed3418c046
Refactor replication resync to be an active process (#14266)
When resync is triggered, walk the bucket namespace and
resync objects that are unreplicated. This PR also adds
an API to report resync progress.
2022-02-10 10:16:52 -08:00
Anis Elleuch
71bab74148
Fix adding bucket forwarder handler in server mode (#14288)
MinIO configuration is loaded after the initialization of the server
handlers, which will miss the initialization of the bucket forwarder
handler.

Though the federation is deprecated, let's fix this for the time being.
2022-02-10 08:49:36 -08:00
Anis Elleuch
661ea57907
restore: Add quotes some fields in x-amz-restore header (#14281)
S3 spec returns x-amz-restore header in HEAD/GET object with the
following format:

```
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012
00:00:00 GMT"
```

This commit adds quotes as the current code does not support it. It will
also supports the old format saved in the disk (in xl.meta) for backward
compatibility.
2022-02-09 13:17:41 -08:00
Anis Elleuch
1f18efb0ba
gateway: Active bucket forwarding handler (#14277)
A regression removed support of federation in the gateway mode. 
Enable it again.

Federation is deprecated for a while but let's fix this for the time being.
2022-02-09 09:31:47 -08:00
Daniel
8ae46bce93
fix the error logs have been omitted because of retryCount never exceed 10 (#14268) 2022-02-09 03:14:22 -08:00
Harshavardhana
f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Krishnan Parthasarathi
0ee2933234
Export tier metrics via Prometheus (#13413)
e.g
```
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"}
1.3631488e+08

minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1

minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1
```
2022-02-08 12:45:28 -08:00
Shireesh Anjal
9890f579f8
Add subsystem level validation on config set (#14269)
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
2022-02-08 10:36:41 -08:00
Anis Elleuch
2ee337ead5
prometheus: Add incoming requests metrics since last scrape (#14261)
Some users running MinIO claim that their system became slow. One 
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow, 
due to disk issues for example.
2022-02-07 16:30:14 -08:00
Harshavardhana
3c87e1e60d
fix: rename some function names to avoid confusion (#14262) 2022-02-07 11:49:07 -08:00
Harshavardhana
0cac868a36
speed-up startup time, do not block on ListBuckets() (#14240)
Bonus fixes #13816
2022-02-07 10:39:57 -08:00
Harshavardhana
186c477f3c init console server after server config is initialized
fixes #14259
2022-02-07 00:17:33 -08:00
Harshavardhana
6123377e66
speedup getFormatErasureInQuorum use driveCount (#14239)
startup speed-up, currently getFormatErasureInQuorum()
would spend up to 2-3secs when there are 3000+ drives
for example in a setup, simplify this implementation
to use drive counts.
2022-02-04 12:21:21 -08:00
Harshavardhana
0256dae657
fix: quorum requirement for DeleteMarkers and parity upgraded objects (#14248)
DeleteMarkers do not have a default quorum, i.e it is possible that
DeleteMarkers were created with n/2+1 quorum as well to make sure
that we satisfy situations such as those we need to make sure delete
markers only expect n/2 read quorum.

Additionally we should also look at additional metadata on the
actual objects that might have been "erasure" upgraded with new
parity when disks are down.

In such a scenario do not default to the standard storage class
parity, instead use the parityBlocks present on the FileInfo to
ensure that we are dealing with the correct quorum for READs and
DELETEs.
2022-02-04 02:47:36 -08:00
Harshavardhana
84b121bbe1
return error with empty x-amz-copy-source-range headers (#14249)
fixes #14246
2022-02-03 16:58:27 -08:00
Harshavardhana
01e550a9be
ignore unreadable metrics on certain closed systems (#14234)
fixes #14233
2022-02-03 09:45:12 -08:00
Poorna
63a2e0bab6
Remove notification from NotificationSys on bucket deletion (#14236) 2022-02-02 17:11:56 -08:00
Harshavardhana
24657859a8
when o_direct is disabled do not attempt fadvise call (#14230) 2022-02-02 08:54:52 -08:00
Sidhartha Mani
d7df6bc738
add support for speedtest drive (#14182) 2022-02-01 22:38:05 -08:00
Poorna
a4e1de93a7
Add API for removing site(s) from site replication (#14104) 2022-02-01 17:26:09 -08:00
Klaus Post
067d21d0f2
fs: Retry listing if no marker (#14221)
Retry listings, when no next marker is returned and the result isn't truncated.

This can happen when an object is queued, but no info can be fetched.

Fixes #14190
2022-02-01 10:00:14 -08:00
Shireesh Anjal
3882da6ac5
Add subnet proxy config (#14225)
Will store the HTTP(S) proxy URL to use for connecting to SUBNET.
2022-02-01 09:52:38 -08:00
Anis Elleuch
127e8bf3b6
heal: Avoid printing repetitive error to heal a root disk (#14220)
The healing code repeatedly tries to heal a root disk when it is empty
the reason is that connectEndpoint() returns errUnformattedDisk even
if the disk is a root disk. Changing that to returning another error
will avoid queueing the disk to the healing code in each connect disks
iteration.
2022-01-31 17:28:20 -08:00
Harshavardhana
74faed166a
Add quota usage as part of prometheus metrics (#14222)
Bonus: pass caller context when needed to all bucket metadata handling calls.
2022-01-31 17:27:43 -08:00
Harshavardhana
dbd05d6e82
remove FIFO bucket quota, use ILM expiration instead (#14206) 2022-01-31 11:07:04 -08:00
Harshavardhana
b5d35c7e09
ignore disk metrics for single drive mode (#14212)
fixes #14211
2022-01-31 00:44:26 -08:00
Poorna
0f88cdc80e
Return all stats in SiteReplicationStatus API if options unset (#14207) 2022-01-28 21:19:38 -08:00
Poorna
38e3c7a8f7
Added filters for SiteReplicationStatus API to support new UI changes (#14177) 2022-01-28 15:37:55 -08:00
Poorna
a4be47d7ad
Validate config before saving changes after config reset (#14203) 2022-01-27 18:28:16 -08:00
Harshavardhana
aaea94a48d
update quorum requirement to list all objects (#14201)
some upgraded objects might not get listed due
to different quorum ratios across objects.

make sure to list all objects that satisfy the
maximum possible quorum.
2022-01-27 17:00:15 -08:00
Aditya Manthramurthy
c3d9c45f58
Ensure that AssumeRole calls are sent to Audit log (#14202)
When authentication fails MinIO was not sending out an Audit log 
event for this STS call
2022-01-27 16:17:11 -08:00
Klaus Post
a2a48cc065
Optimize read locker cleanup (#14200)
When objects hold a lot of read locks cleanup time grows exponentially.

```
BEFORE:

Unable to complete tests.

AFTER:

=== RUN   Test_localLocker_expireOldLocksExpire/100-locks/1-read
    local-locker_test.go:298: Scan Took: 0s. Left: 100/100
    local-locker_test.go:317: Expire 50% took: 0s. Left: 44/44
    local-locker_test.go:331: Expire rest took: 0s. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/100-locks/100-read
    local-locker_test.go:298: Scan Took: 0s. Left: 10000/100
    local-locker_test.go:317: Expire 50% took: 1ms. Left: 5000/100
    local-locker_test.go:331: Expire rest took: 1ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/100-locks/1000-read
    local-locker_test.go:298: Scan Took: 2ms. Left: 100000/100
    local-locker_test.go:317: Expire 50% took: 55ms. Left: 50038/100
    local-locker_test.go:331: Expire rest took: 29ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/10000-locks/1-read
    local-locker_test.go:298: Scan Took: 1ms. Left: 10000/10000
    local-locker_test.go:317: Expire 50% took: 2ms. Left: 5019/5019
    local-locker_test.go:331: Expire rest took: 2ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/10000-locks/100-read
    local-locker_test.go:298: Scan Took: 23ms. Left: 1000000/10000
    local-locker_test.go:317: Expire 50% took: 160ms. Left: 499798/10000
    local-locker_test.go:331: Expire rest took: 138ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/10000-locks/1000-read
    local-locker_test.go:298: Scan Took: 200ms. Left: 10000000/10000
    local-locker_test.go:317: Expire 50% took: 5.888s. Left: 5000196/10000
    local-locker_test.go:331: Expire rest took: 3.417s. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/1000000-locks/1-read
    local-locker_test.go:298: Scan Took: 133ms. Left: 1000000/1000000
    local-locker_test.go:317: Expire 50% took: 348ms. Left: 500255/500255
    local-locker_test.go:331: Expire rest took: 307ms. Left: 0/0
```
2022-01-27 14:10:57 -08:00
Harshavardhana
cf407f7176
do not expect 'speedtest' to be a bucket (#14199)
fixes #14196
2022-01-27 08:13:03 -08:00
Harshavardhana
d6dd17a483
make sure to pass groups for all credentials while verifying policies (#14193)
fixes #14180
2022-01-26 21:53:36 -08:00
Aditya Manthramurthy
7dfa565d00
Identity LDAP: Allow multiple search base DNs (#14191)
This change allows the MinIO server to lookup users in different directory
sub-trees by allowing specification of multiple search bases separated by
semicolons.
2022-01-26 15:05:59 -08:00
Krishnan Parthasarathi
d2e5f01542
feat: maintain in-memory tier stats for the last 24hrs (#13782) 2022-01-26 14:33:10 -08:00
yfanswer
f4e373e0d2
de-couple cache completeMultipartUpload with caller context (#14181) 2022-01-26 11:55:58 -08:00
Harshavardhana
57118919d2
cached diskIDs are not needed for scanner healing (#14170)
This PR removes an unnecessary state that gets
passed around for DiskIDs, which is not necessary
since each disk exactly knows which pool and which
set it belongs to on a running system.

Currently cached DiskId's won't work properly
because it always ends up skipping offline disks
and never runs healing when disks are offline, as
it expects all the cached diskIDs to be present
always. This also sort of made things in-flexible
in terms perhaps a new diskID for `format.json`.
(however this is not a big issue)

This is an unnecessary requirement that healing
via scanner needs all drives to be online, instead
healing should trigger even when partial nodes
and drives are available this ensures that we
keep the SLA in-tact on the objects when disks
are offline for a prolonged period of time.
2022-01-26 08:34:56 -08:00
Klaus Post
7db05a80dd
locking: Fix wrong map id (#14184)
Wrong resource is being fetched, since idx is incremented, but mapID is reused.

Regression caused by #13454 - that part didn't optimize anything anyway.
2022-01-26 08:34:09 -08:00
Anis Elleuch
45a99c3fd3
publish storage API latency through node metrics (#14117)
Publish storage functions latency to help compare the performance 
of different disks in a single deployment.

e.g.:
```
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/1",server="localhost:9001"} 226
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/2",server="localhost:9002"} 1180
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/3",server="localhost:9003"} 1183
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/4",server="localhost:9004"} 1625
```
2022-01-25 16:31:44 -08:00
Harshavardhana
b68f0cbde4
ignore remote disks with diskID empty as offline (#14168)
concurrent loading of erasure sets can now expose a
situation in a distributed setup that might return
diskID as empty, treat such disks as offline.
2022-01-24 19:40:02 -08:00
Krishnan Parthasarathi
ebc3627c73
further improvements to newXLStorage (#14166)
- create internal erasure volumes only if the disk is unformatted
- return a copy of format data in xlStorage.ReadAll
- parse env vars only once, to be re-used by xl-storage
2022-01-24 17:09:12 -08:00
Harshavardhana
5a9f133491
speed up startup sequence for all operations (#14148)
This speed-up is intended for faster startup times
for almost all MinIO operations. Changes here are

- Drives are not re-read for 'format.json' on a regular
  basis once read during init is remembered and refreshed
  at 5 second intervals.

- Do not do O_DIRECT tests on drives with existing 'format.json'
  only fresh setups need this check.

- Parallelize initializing erasureSets for multiple sets.

- Avoid re-reading format.json when migrating 'format.json'
  from really old V1->V2->V3

- Keep a copy of local drives for any given server in memory
  for a quick lookup.
2022-01-24 11:28:45 -08:00
Harshavardhana
f6d13f57bb
fix: correct parentUser lookup for OIDC auto expiration (#14154)
fixes #14026

This is a regression from #13884
2022-01-22 16:36:11 -08:00
Poorna
48da4aeee0
Add API for removing site(s) from site replication (#14022) 2022-01-21 08:48:21 -08:00
Harshavardhana
7f214a0e46
use dnscache resolver for resolving command line endpoints (#14135)
this helps in caching the resolved values early on, avoids
causing further resolution for individual nodes when
object layer comes online.

this can speed up our startup time during, upgrades etc by
an order of magnitude.

additional changes in connectLoadInitFormats() and parallelize
all calls that might be potentially blocking.
2022-01-20 13:03:15 -08:00
Klaus Post
e1a0a1e73c
fs: Return prefix as listing marker if no objects (#14143)
Fixes #14132
2022-01-20 10:55:18 -08:00
Harshavardhana
9d588319dd
support site replication to replicate IAM users,groups (#14128)
- Site replication was missing replicating users,
  groups when an empty site was added.

- Add site replication for groups and users when they
  are disabled and enabled.

- Add support for replicating bucket quota config.
2022-01-19 20:02:24 -08:00
Klaus Post
0012ca8ca5
Fix inconsistent metadata after healing (#14125)
When calculating signatures empty part ETags were not discarded, leading 
to a different signature compared to freshly created ones.

This would mean that after a heal signature of the healed metadata would be 
different. Fixing the calculation of signature will make these consistent.

Furthermore when inconsistent entries, with zero version ID, with the same 
mod times but different signatures, the one with the lowest signature would 
be picked for quorum check. Since this is 50/50, we fall back to a simple 
quorum count on all signatures.

Each of these fixes by themselves will lead to quorum. Tests were added 
for regressions and expected outcomes.
2022-01-19 10:48:00 -08:00
Poorna
288e276abe
Specify tags in options while selecting replication targets (#14126)
When the replication rule is based on tag matches, the replication process
should pick up targets matching the tags specified in the replication
rule.

Fixing regression due to #12880
2022-01-19 10:45:42 -08:00
Jarbitz
f22e745514
fix: ListBucketUsers comment doc (#14129) 2022-01-19 10:45:13 -08:00
Krishnan Parthasarathi
070c31eac5
Wait for updates collector when disk.NSScanner returns error (#14127) 2022-01-19 00:46:43 -08:00
Harshavardhana
70e1cbda21
allow disabling O_DIRECT in certain environments for reads (#14115)
repeated reads on single large objects in HPC like
workloads, need the following option to disable
O_DIRECT for a more effective usage of the kernel
page-cache.

However this optional should be used in very specific
situations only, and shouldn't be enabled on all
servers.

NVMe servers benefit always from keeping O_DIRECT on.
2022-01-17 08:34:14 -08:00
Harshavardhana
60f2df54e0
Add envVars for CLI arguments (#14114)
fixes #14107
2022-01-15 16:20:02 -08:00
Harshavardhana
ba708f51f2
fix: copyMetrics to avoid map references elsewhere (#14113)
map labels might have been referenced else, this
can lead to concurrent access at lower layers.

avoid this by copying the information while
concurrently serving the metrics.
2022-01-14 16:48:19 -08:00
Harshavardhana
0df31f63ab
reject changing pools when there are pending decommissions in-progress (#14102)
do not allow mutation to pool command line when there are
unfinished decommissions in place, disallow such scenarios
to avoid user mistakes.

also add testcases to cover all relevant scenarios.
2022-01-14 10:32:35 -08:00
Klaus Post
64d4da5a37
Add Put input readahead (#14084)
When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs.

This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers.

We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from.

This improves PutObject and PutObjectParts for these cases.
2022-01-14 10:01:25 -08:00
Harshavardhana
7aec38a73e
Simplify the messaging for internode versions (#14103)
provide a cleaner message instead of cryptic
logs, also provide the relevant link on how to do
recommended way to upgrade.
2022-01-13 17:25:08 -08:00
Klaus Post
a2fd8caa69
Ignore version not found in deleteVersions (#14093)
When deleting multiple versions it "gives" up with an errFileVersionNotFound if 
a version cannot be found. This effectively skips deleting other versions 
sent in the same request. 

This can happen on inconsistent objects. We should ignore errFileVersionNotFound 
and continue with others.

We already ignore these at the caller level, this PR is continuation of 54a9877
2022-01-13 14:28:07 -08:00
Harshavardhana
f546636c52
fix: use renameAll instead of deleteObject() for purging temporary files (#14096)
This PR simplifies few things

- Multipart parts are renamed, upon failure are unrenamed() keep this
  multipart specific behavior it is needed and works fine.

- AbortMultipart should blindly delete once lock is acquired instead
  of re-reading metadata and calculating quorum, abort is a delete()
  operation and client has no business looking for errors on this.

- Skip Access() calls to folders that are operating on
  `.minio.sys/multipart` folder as well.
2022-01-13 11:07:41 -08:00
Harshavardhana
38ccc4f672
fix: make sure to avoid calling RenameData() on disconnected disks. (#14094)
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.

- This PR ensures that we nil the erasure writers such that they
  are skipped in RenameData() call.

- RenameData() doesn't need to "Access()" calls for `.minio.sys`
  folders they always exist.

- Make sure PutObject() never returns ObjectNotFound{} for any
  errors, make sure it always returns "WriteQuorum" when renameData()
  fails with ObjectNotFound{}. Return appropriate errors for all
  other cases.
2022-01-12 18:49:01 -08:00
Harshavardhana
cc3f139d1f
replication: attempt abort multipart-upload at max 3 times on remote (#14087)
this is mainly an attempt to relinquish space on the remote
site, if this still doesn't do it we give and let the admin
know with a log message.
2022-01-11 22:32:29 -08:00
Harshavardhana
d50442da01
fix: simplify usage calculation and progress (#14086) 2022-01-11 18:48:43 -08:00
Harshavardhana
404b05a44c
fix: ignore drained pool in Healing, hold lock additionally (#14080) 2022-01-11 12:27:47 -08:00
Harshavardhana
3d7c1ad31d
ignore configNotFound error in AccountInfo() (#14082)
fixes #14081
2022-01-11 08:43:18 -08:00
yinhen
d300e775a6
Avoid reconnect of disk during startup sequence (#14070) 2022-01-10 23:33:58 -08:00
Harshavardhana
7ee2d1c339
fix: when healing log path when we give up (#14079) 2022-01-10 21:22:17 -08:00
Poorna
54a98773f8
fix: replication of tag removal (#14056)
Currently tag removal leaves replication state as `PENDING` 
because the `HEAD` api returns just a tag count but not the 
actual tags, and this is treated as a no-op
2022-01-10 19:06:10 -08:00
Harshavardhana
737a3f0bad
fix: decommission bugfixes found during migration of .minio.sys/config (#14078) 2022-01-10 17:26:00 -08:00
Harshavardhana
3bd9636a5b
do not remove Sid from svcaccount policies (#14064)
fixes #13905
2022-01-10 14:26:26 -08:00
Harshavardhana
76b21de0c6
feat: decommission feature for pools (#14012)
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```

```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID  │ Pools                           │ Capacity                         │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```

```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID  │ Pools                           │ Capacity                         │ Status   │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```

> NOTE: Canceled decommission will not make the pool active again, since we might have
> Potentially partial duplicate content on the other pools, to avoid this scenario be
> very sure to start decommissioning as a planned activity.

```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID  │ Pools                           │ Capacity                         │ Status             │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```
2022-01-10 09:07:49 -08:00
Harshavardhana
b7c5e45fff
heal: isObjectDangling should return false when it cannot decide (#14053)
In a multi-pool setup when disks are coming up, or in a single pool
setup let's say with 100's of erasure sets with a slow network.

It's possible when healing is attempted on `.minio.sys/config`
folder, it can lead to healing unexpectedly deleting some policy
files as dangling due to a mistake in understanding when `isObjectDangling`
is considered to be 'true'.

This issue happened in commit 30135eed86
when we assumed the validMeta with empty ErasureInfo is considered
to be fully dangling. This implementation issue gets exposed when
the server is starting up.

This is most easily seen with multiple-pool setups because of the
disconnected fashion pools that come up. The decision to purge the
object as dangling is taken incorrectly prior to the correct state
being achieved on each pool, when the corresponding drive let's say
returns 'errDiskNotFound', a 'delete' is triggered. At this point,
the 'drive' comes online because this is part of the startup sequence
as drives can come online lazily.

This kind of situation exists because we allow (totalDisks/2) number
of drives to be online when the server is being restarted.

Implementation made an incorrect assumption here leading to policies
getting deleted.

Added tests to capture the implementation requirements.
2022-01-07 19:11:54 -08:00
Aditya Manthramurthy
0a224654c2
fix: progagation of service accounts for site replication (#14054)
- Only non-root-owned service accounts are replicated for now.
- Add integration tests for OIDC with site replication
2022-01-07 17:41:43 -08:00
Aditya Manthramurthy
1981fe2072
Add internal IDP and OIDC users support for site-replication (#14041)
- This allows site-replication to be configured when using OpenID or the
  internal IDentity Provider.

- Internal IDP IAM users and groups will now be replicated to all members of the
  set of replicated sites.

- When using OpenID as the external identity provider, STS and service accounts
  are replicated.

- Currently this change dis-allows root service accounts from being
  replicated (TODO: discuss security implications).
2022-01-06 15:52:43 -08:00
Minio Trusted
76877eb6fa move gofumpt to golang-ci 2022-01-06 13:08:21 -08:00
Klaus Post
3d66d053c7
Add small client TLS PSK cache (#14039) 2022-01-06 11:34:02 -08:00
Klaus Post
0e31cff762
fix: DeleteMultipleObjects to finish even if cancelled + concurrent sets (#14038)
* Process sets concurrently.
* Disconnect context from request.
* Insert context cancellation checks.
* errFileNotFound and errFileVersionNotFound are ok, unless creating delete markers.
2022-01-06 10:47:49 -08:00
Shireesh Anjal
c27110e37d
Add timeinfo to health data (#14013)
Capture RoundtripDuration to figure out 
NTP issues in subnet health analyzer.
2022-01-06 01:51:10 -08:00
Harshavardhana
89441a22aa
enforceRetentionForDeletion should return false early for delete-marker (#14033) 2022-01-05 17:05:28 -08:00
Poorna
4d39fd4165
Add API for cluster replication status visibility (#13885) 2022-01-05 02:44:08 -08:00
Harshavardhana
001b77e7e1
use readConfig/saveConfig to simplify I/O on usage/tracker info (#14019) 2022-01-03 10:22:58 -08:00
Harshavardhana
a60ac7ca17
fix: audit log to support object names in multipleObjectNames() handler (#14017) 2022-01-03 01:28:52 -08:00
Harshavardhana
42ba0da6b0
fix: initialize new drwMutex for each attempt in 'for {' loop. (#14009)
It is possible that GetLock() call remembers a previously
failed releaseAll() when there are networking issues, now
this state can have potential side effects.

This PR tries to avoid this side affect by making sure
to initialize NewNSLock() for each GetLock() attempts
made to avoid any prior state in the memory that can
interfere with the new lock grants.
2022-01-02 09:15:34 -08:00
Harshavardhana
f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana
79df2c7ce7
correctly calculate read quorum based on the available fileInfo (#14000)
The current usage of assuming `default` parity of `4` is not correct
for all objects stored on MinIO, objects in .minio.sys have maximum
parity, healing won't trigger on these objects due to incorrect
verification of quorum.
2021-12-28 15:33:03 -08:00
Harshavardhana
866a95de38
fix: choose appropriate quorum for a given erasure set (#13998)
multiObject delete should honor expected quorum
2021-12-28 12:41:52 -08:00
Minio Trusted
bb97eafa82 madmin-go v1.1.23 and pkg v1.1.11 2021-12-26 23:23:18 -08:00
Harshavardhana
c980804514
trim values from envrionment files (#13991)
trim values to remove any spaces, newlines
from the files while importing credentials
and other values.
2021-12-25 22:02:54 -08:00
Harshavardhana
b883803b21
fix: healing across pools removing dangling objects (#13990)
adds other simplifications to the code when running
namespace heals across pools.
2021-12-25 09:01:44 -08:00
Harshavardhana
7e3a7d7044
add healing for invalid shards by skipping the blocks (#13978)
Built on top of #13945, now we need to simply skip the
shards and its automated.
2021-12-23 23:01:46 -08:00
Aditya Manthramurthy
5a96cbbeaa
Fix user privilege escalation bug (#13976)
The AddUser() API endpoint was accepting a policy field. 
This API is used to update a user's secret key and account 
status, and allows a regular user to update their own secret key. 

The policy update is also applied though does not appear to 
be used by any existing client-side functionality.

This fix changes the accepted request body type and removes 
the ability to apply policy changes as that is possible via the 
policy set API.

NOTE: Changing passwords can be disabled as a workaround
for this issue by adding an explicit "Deny" rule to disable the API
for users.
2021-12-23 09:21:21 -08:00
Harshavardhana
54ec0a1308
add configurable delta for skipping shards (#13967)
This PR is an attempt to make this configurable
as not all situations have same level of tolerable
delta, i.e disks are replaced days apart or even
hours.

There is also a possibility that nodes have drifted
in time, when NTP is not configured on the system.
2021-12-22 11:43:01 -08:00
Harshavardhana
1cf726348f
return meaningful error for disabled users (#13968)
fixes #13958
2021-12-22 11:40:21 -08:00
Harshavardhana
0e3037631f
skip inconsistent shards if possible (#13945)
data shards were wrong due to a healing bug
reported in #13803 mainly with unaligned object
sizes.

This PR is an attempt to automatically avoid
these shards, with available information about
the `xl.meta` and actually disk mtime.
2021-12-21 10:08:26 -08:00
Aditya Manthramurthy
6fbf4f96b6
Move last remaining IAM notification calls into IAMSys methods (#13941) 2021-12-21 02:16:50 -08:00
Aditya Manthramurthy
526e10a2e0
Fix regression in STS permissions via group in internal IDP (#13955)
- When using MinIO's internal IDP, STS credential permissions did not check the
groups of a user.

- Also fix bug in policy checking in AccountInfo call
2021-12-20 14:07:16 -08:00
Harshavardhana
499872f31d
Add configurable channel queue_size for audit/logger webhook targets (#13819)
Also log all the missed events and logs instead of silently
swallowing the events.

Bonus: Extend the logger webhook to support mTLS
similar to audit webhook target.
2021-12-20 13:16:53 -08:00
Anis Elleuch
5cc16e098c
env: Remove quotes when parsing a config env file (#13953)
The code parsing the config environment file does not remove 
quotes of environment variables values. This commit adds this 
capability.
2021-12-20 13:13:06 -08:00
Aditya Manthramurthy
1f4e0bd17c
fix: access for root user's STS credential (#13947)
add a test to cover this case
2021-12-19 23:05:20 -08:00
Aditya Manthramurthy
997e808088
fix; race in bucket replication stats (#13942)
- r.ulock was not locked when r.UsageCache was being modified

Bonus:

- simplify code by removing some unnecessary clone methods - we can 
do this because go arrays are values (not pointers/references) that are 
automatically copied on assignment.

- remove some unnecessary map allocation calls
2021-12-17 15:33:13 -08:00
Shireesh Anjal
13441ad0f8
Add IsKubernetes and IsDocker to health data (#13936) 2021-12-17 14:46:54 -08:00
Harshavardhana
aa508591c1
cache only metrics served from the disks (#13940)
do not need to cache in-memory instant metrics
2021-12-17 11:40:09 -08:00
Harshavardhana
818f0201fc
re-implement prometheus metrics endpoint to be simpler (#13922)
data-structures were repeatedly initialized
this causes GC pressure, instead re-use the
collectors.

Initialize collectors in `init()`, also make
sure to honor the cache semantics for performance
requirements.

Avoid a global map and a global lock for metrics
lookup instead let them all be lock-free unless
the cache is being invalidated.
2021-12-17 10:11:04 -08:00
Aditya Manthramurthy
890f43ffa5
Map policy to parent for STS (#13884)
When STS credentials are created for a user, a unique (hopefully stable) parent
user value exists for the credential, which corresponds to the user for whom the
credentials are created. The access policy is mapped to this parent-user and is
persisted. This helps ensure that all STS credentials of a user have the same
policy assignment at all times.

Before this change, for an OIDC STS credential, when the policy claim changes in
the provider (when not using RoleARNs), the change would not take effect on
existing credentials, but only on new ones.

To support existing STS credentials without parent-user policy mappings, we
lookup the policy in the policy claim value. This behavior should be deprecated
when such support is no longer required, as it can still lead to stale
policy mappings.

Additionally this change also simplifies the implementation for all non-RoleARN
STS credentials. Specifically, for AssumeRole (internal IDP) STS credentials,
policies are picked up from the parent user's policies; for
AssumeRoleWithCertificate STS credentials, policies are picked up from the
parent user mapping created when the STS credential is generated.
AssumeRoleWithLDAP already picks up policies mapped to the virtual parent user.
2021-12-17 00:46:30 -08:00
Poorna K
e270ab65b3
fix: healing of replication delete markers (#13933)
A corner case can occur where the delete-marker was propagated 
but the metadata could not be updated on the primary. Sending 
a RemoveObject call with the Delete marker version would end 
up permanently deleting the version on target. Instead, perform 
a Stat on the delete-marker version on target and redo replication 
only if the delete-marker is missing on target.
2021-12-16 15:34:55 -08:00
Anis Elleuch
926373f9c1
Run the data scanner routine in a loop (#13928)
After the introduction of Refresh logic in locks, the data scanner can
quit when the data scanner lock is not able to get refreshed. In that
case, the context of the data scanner will get canceled and
runDataScanner() will quit. Another server would pick the scanning
routine but after some time, all nodes can just have all scanning
routine aborted, as described above.

This fix will just run the data scanner in a loop.
2021-12-16 08:32:15 -08:00
Poorna K
111c6177d2
Deprecate caching for erasure/distributed mode (#13909)
Fixes: #13907

Also removing default value of `writethrough` for cache commit
which was interfering with cache_after setting
2021-12-15 16:48:34 -08:00
Poorna K
b42cfcea60
Disallow versioning/replication change in cluster replication setup (#13910) 2021-12-15 10:37:08 -08:00
Klaus Post
aca6dfbd60
Check for nil RPC in listing (#13917)
Fixes #13915
2021-12-15 09:19:11 -08:00
Harshavardhana
5f7e6d03ff
copy bucket slice to avoid skipping .minio.sys/buckets (#13912)
healing was skipping `.minio.sys/buckets` path so
essentially not healing `.usage.json` - fix this
by making a copy of `buckets` slice.
2021-12-15 09:18:09 -08:00
Harshavardhana
88ad742da0
fix: error handling cases in site-replication (#13901)
- Allow proper SRError to be propagated to
  handlers and converted appropriately.

- Make sure to enable object locking on buckets
  when requested in MakeBucketHook.

- When DNSConfig is enabled attempt to delete it
  first before deleting buckets locally.
2021-12-14 14:09:57 -08:00
Krishnan Parthasarathi
44a9339c0a
Newer noncurrent versions (#13815)
- Rename MaxNoncurrentVersions tag to NewerNoncurrentVersions

Note: We apply overlapping NewerNoncurrentVersions rules such that 
we honor the highest among applicable limits. e.g if 2 overlapping rules 
are configured with 2 and 3 noncurrent versions to be retained, we 
will retain 3.

- Expire newer noncurrent versions after noncurrent days
- MinIO extension: allow noncurrent days to be zero, allowing expiry 
  of noncurrent version as soon as more than configured 
  NewerNoncurrentVersions are present.
- Allow NewerNoncurrentVersions rules on object-locked buckets
- No x-amz-expiration when NewerNoncurrentVersions configured
- ComputeAction should skip rules with NewerNoncurrentVersions > 0
- Add unit tests for lifecycle.ComputeAction
- Support lifecycle rules with MaxNoncurrentVersions
- Extend ExpectedExpiryTime to work with zero days
- Fix all-time comparisons to be relative to UTC
2021-12-14 09:41:44 -08:00
Harshavardhana
113c7ff49a
add code to parse secrets natively instead of shell scripts (#13883) 2021-12-13 18:23:31 -08:00
Poorna K
d422d24278
replication: warn if insufficient workers (#13899)
This should give an early warning if configured replication 
workers are insufficient to meet application workload.
2021-12-13 18:22:56 -08:00
Aditya Manthramurthy
de400f3473
Allow setting non-existent policy on a user/group (#13898) 2021-12-13 15:55:52 -08:00
Harshavardhana
8144a125ce
check for update in background (#13889) 2021-12-13 09:43:03 -08:00
jiangfucheng
88c0d0120c
update heal object unit test (#13886) 2021-12-11 09:04:07 -08:00
Aditya Manthramurthy
44fefe5b9f
Add option to policy info API to return create/mod timestamps (#13796)
- This introduces a new admin API with a query parameter (v=2) to return a
response with the timestamps

- Older API still works for compatibility/smooth transition in console
2021-12-11 09:03:39 -08:00
Aditya Manthramurthy
f2bd026d0e
Allow OIDC user to query user info if policies permit (#13882) 2021-12-10 15:03:39 -08:00
Klaus Post
81e43b87c2
Don't zero buffer if big enough (#13877)
Only append zeroed bytes when we don't have enough space anyway.
2021-12-10 13:08:10 -08:00
Aditya Manthramurthy
a02e17f15c
Add tests to ensure that OIDC user can create IAM users (#13881) 2021-12-10 13:04:21 -08:00
Harshavardhana
5b7c00ff52
add more tests to cover areas for weird object names (#13873)
continuation of #13858 to add more tests and also validate the 
written object data.
2021-12-09 17:52:53 -08:00
Aditya Manthramurthy
b9f0046ee7
Allow STS credentials to create users (#13874)
- allow any regular user to change their own password
- allow STS credentials to create users if permissions allow

Bonus: do not allow changes to sts/service account credentials (via add user API)
2021-12-09 17:48:51 -08:00
Harshavardhana
3b79f7e4ae
ignore if volume exists in MakeVolBulk, return other errors (#13866) 2021-12-09 15:55:42 -08:00
Aditya Manthramurthy
85d2df02b9
fix: user listing with LDAP (#13872)
Users listing was showing just a weird policy 
mapping output which does not make sense here.
2021-12-09 15:55:28 -08:00
Harshavardhana
2f1e8ba612
add more directory marker tests and fix a bug (#13871)
ListObjects() should never list a delete-marked folder
if latest is delete marker and delimiter is not provided.

ListObjectVersions() should list a delete-marked folder
even if latest is delete marker and delimiter is not
provided.

Enhance further versioning listing on the buckets
2021-12-09 14:59:23 -08:00
Anis Elleuch
84c690cb07
storage: Use request.Form and avoid mux matching (#13858)
request.Form uses less memory allocation and avoids gorilla mux matching
with weird characters in parameters such as '\n'

- Remove Queries() to avoid matching
- Ensure r.ParseForm is called to populate fields
- Add a unit test for object names with '\n'
2021-12-09 08:38:46 -08:00
Harshavardhana
239bbad7ab
add test to expect prefix without a directory object (#13865)
Motivation is to cover more areas
2021-12-09 08:36:54 -08:00
Harshavardhana
dcff6c996d
fix: do not list delete-marked objects (#13864)
delete marked objects should not be considered
for listing when listing is delimited, this issue
as introduced in PR #13804 which was mainly to
address listing of directories in listing when
delimited.

This PR fixes this properly and adds tests to
ensure that we behave in accordance with how
an S3 API behaves for ListObjects() without
versions.
2021-12-08 17:34:52 -08:00
Poorna K
0a66a6f1e5
Avoid cache GC of writebacks before commit syncs (#13860)
Save part.1 for writebacks in a separate folder
and move it to cache dir atomically while saving
the cache metadata. This is to avoid GC mistaking
part.1 as orphaned cache entries and purging them.

This PR also fixes object size being overwritten during
retries for write-back mode.
2021-12-08 14:52:31 -08:00
Harshavardhana
e82a5c5c54
fix: site replication issues and add tests (#13861)
- deleting policies was deleting all LDAP
  user mapping, this was a regression introduced
  in #13567

- deleting of policies is properly sent across
  all sites.

- remove unexpected errors instead embed the real
  errors as part of the 500 error response.
2021-12-08 11:50:15 -08:00
Harshavardhana
b9aae1aaae
fix: speedtest should exit upon errors cleanly (#13851)
- deleteBucket() should be called for cleanup
  if client abruptly disconnects

- out of disk errors should be sent to client
  properly and also cancel the calls

- limit concurrency to available MAXPROCS not
  32 for auto-tuned setup, if procs are beyond
  32 then continue normally. this is to handle
  smaller setups.

fixes #13834
2021-12-06 16:36:14 -08:00
Harshavardhana
7d70afc937
fix: potential crash in diskCache when fileScorer is empty (#13850)
```
goroutine 115 [running]:
github.com/minio/minio/cmd.(*diskCache).purge.func3({0xc007a10a40, 0x40}, 0x40)
   github.com/minio/minio/cmd/disk-cache-backend.go:430 +0x90d
```
2021-12-06 15:55:29 -08:00
Aditya Manthramurthy
12b63061c2
Fix LDAP service account creation (#13849)
- when a user has only group permissions
- fixes regression from ac74237f0 (#13657)
- fixes https://github.com/minio/console/issues/1291
2021-12-06 15:55:11 -08:00
Klaus Post
038fdeea83
snowball: return errors on failures (#13836)
Return errors when untar fails at once.

Current error handling was quite a mess. Errors are written 
to the stream, but processing continues.

Instead, return errors when they occur and transform 
internal errors to bad request errors, since it is likely a 
problem with the input.

Fixes #13832
2021-12-06 09:45:23 -08:00
Anis Elleuch
0b6225bcc3
Better error msg when version mismatch of internode API (#13845)
Sometimes, we see an error message like "Server expects 'storage' API
version 'v41', instead found 'v41'" shows a more generic error message
with the path of the REST call.
2021-12-06 09:44:48 -08:00
Anis Elleuch
f286ef8e17
isMultipart to test on parts sizes only if object is encrypted (#13839)
ObjectInfo.isMultipart() is testing if parts sizes are compatible with
encrypted parts but this only can be done if the object is encrypted.
2021-12-06 09:43:43 -08:00
Harshavardhana
b120bcb60a
validate if cached value is empty before use (#13830)
fixes a crash reproduced while running hadoop tests

```
goroutine 201564 [running]:
github.com/minio/minio/cmd.metaCacheEntries.resolve({0xc0206ab7a0, 0x4, 0xc0015b1908}, 0xc0212a7040)
	github.com/minio/minio/cmd/metacache-entries.go:352 +0x58a
```

Bonus: HeadBucket() should always provide content-type
2021-12-06 02:59:51 -08:00
Harshavardhana
be34fc9134
fix: kms-id header should have arn:aws:kms: prefix (#13833)
arn:aws:kms: is a must for KMS keyID.
2021-12-06 00:39:32 -08:00
Harshavardhana
8591d17d82
return appropriate errors upon parseErrors (#13831) 2021-12-05 11:36:26 -08:00
Harshavardhana
f6190d6751
Add single drive support for directory prefixes in Listing (#13829)
This fixes the compatibility issue with Hadoop 3.3.1

fixes #13710
2021-12-03 18:08:40 -08:00
Aditya Manthramurthy
4f35054d29
Ensure that role ARNs don't collide (#13817)
This is to prepare for multiple providers enhancement.
2021-12-03 13:15:56 -08:00
Shireesh Anjal
d29df6714a
Introduce new config subnet api_key (#13793)
The earlier approach of using a license token for 
communicating with SUBNET is being replaced 
with a simpler mechanism of API keys. Unlike the 
license which is a JWT token, these API keys will 
be simple UUID tokens and don't have any embedded 
information in them. SUBNET would generate the 
API key on cluster registration, and then it would 
be saved in this config, to be used for subsequent 
communication with SUBNET.
2021-12-03 09:32:11 -08:00
jiangfucheng
7460fb8349
fix padding error and compatible with uploaded objects (#13803) 2021-12-03 09:26:30 -08:00
Harshavardhana
a7c430355a
fix: throw appropriate errors when all disks fail (#13820)
when all disks fail with same error, fail server
startup anyways - we cannot proceed.

fixes #13818
2021-12-03 09:25:17 -08:00
Aditya Manthramurthy
b14527b7af
If role policy is configured, require that role ARN be set in STS (#13814) 2021-12-02 15:43:39 -08:00
Klaus Post
3db931dc0e
Improve listing consistency with version merging (#13723) 2021-12-02 11:29:16 -08:00
Klaus Post
8309ddd486
Fix panic (not fatal) on connection drops (#13811)
Fix more regressions from #13597 with double closed channels.

```
panic: "POST /minio/storage/data/distxl-plain/s1/d2/v42/createfile?disk-id=c789f7e1-2b52-442a-b518-aa2dac03f3a1&file-path=f6161668-b939-4543-9873-91b9da4cdff6%2F5eafa986-a3bf-4b1c-8bc0-03a37de390a3%2Fpart.1&length=2621760&volume=.minio.sys%2Ftmp": send on closed channel
goroutine 1977 [running]:
runtime/debug.Stack()
        c:/go/src/runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
        d:/minio/minio/cmd/generic-handlers.go:468 +0x8e
panic({0x2928860, 0x4fb17e0})
        c:/go/src/runtime/panic.go:1038 +0x215
github.com/minio/minio/cmd.keepHTTPReqResponseAlive.func2({0x4fe4ea0, 0xc02737d8a0})
        d:/minio/minio/cmd/storage-rest-server.go:818 +0x48
github.com/minio/minio/cmd.(*storageRESTServer).CreateFileHandler(0xc0015a8510, {0x50073e0, 0xc0273ec460}, 0xc029b9a400)
        d:/minio/minio/cmd/storage-rest-server.go:334 +0x1d2
net/http.HandlerFunc.ServeHTTP(...)
        c:/go/src/net/http/server.go:2046
github.com/minio/minio/cmd.httpTraceHdrs.func1({0x50073e0, 0xc0273ec460}, 0x0)
        d:/minio/minio/cmd/handler-utils.go:372 +0x53
net/http.HandlerFunc.ServeHTTP(0x5007380, {0x50073e0, 0xc0273ec460}, 0x10)
        c:/go/src/net/http/server.go:2046 +0x2f
github.com/minio/minio/cmd.addCustomHeaders.func1({0x5007380, 0xc0273dcf00}, 0xc0273f7340)
```

Reverts but adds write checks.
2021-12-02 11:22:32 -08:00