Commit Graph

4209 Commits

Author SHA1 Message Date
Harshavardhana f6d13f57bb
fix: correct parentUser lookup for OIDC auto expiration (#14154)
fixes #14026

This is a regression from #13884
2022-01-22 16:36:11 -08:00
Poorna 48da4aeee0
Add API for removing site(s) from site replication (#14022) 2022-01-21 08:48:21 -08:00
Harshavardhana 7f214a0e46
use dnscache resolver for resolving command line endpoints (#14135)
this helps in caching the resolved values early on, avoids
causing further resolution for individual nodes when
object layer comes online.

this can speed up our startup time during, upgrades etc by
an order of magnitude.

additional changes in connectLoadInitFormats() and parallelize
all calls that might be potentially blocking.
2022-01-20 13:03:15 -08:00
Klaus Post e1a0a1e73c
fs: Return prefix as listing marker if no objects (#14143)
Fixes #14132
2022-01-20 10:55:18 -08:00
Harshavardhana 9d588319dd
support site replication to replicate IAM users,groups (#14128)
- Site replication was missing replicating users,
  groups when an empty site was added.

- Add site replication for groups and users when they
  are disabled and enabled.

- Add support for replicating bucket quota config.
2022-01-19 20:02:24 -08:00
Klaus Post 0012ca8ca5
Fix inconsistent metadata after healing (#14125)
When calculating signatures empty part ETags were not discarded, leading 
to a different signature compared to freshly created ones.

This would mean that after a heal signature of the healed metadata would be 
different. Fixing the calculation of signature will make these consistent.

Furthermore when inconsistent entries, with zero version ID, with the same 
mod times but different signatures, the one with the lowest signature would 
be picked for quorum check. Since this is 50/50, we fall back to a simple 
quorum count on all signatures.

Each of these fixes by themselves will lead to quorum. Tests were added 
for regressions and expected outcomes.
2022-01-19 10:48:00 -08:00
Poorna 288e276abe
Specify tags in options while selecting replication targets (#14126)
When the replication rule is based on tag matches, the replication process
should pick up targets matching the tags specified in the replication
rule.

Fixing regression due to #12880
2022-01-19 10:45:42 -08:00
Jarbitz f22e745514
fix: ListBucketUsers comment doc (#14129) 2022-01-19 10:45:13 -08:00
Krishnan Parthasarathi 070c31eac5
Wait for updates collector when disk.NSScanner returns error (#14127) 2022-01-19 00:46:43 -08:00
Harshavardhana 70e1cbda21
allow disabling O_DIRECT in certain environments for reads (#14115)
repeated reads on single large objects in HPC like
workloads, need the following option to disable
O_DIRECT for a more effective usage of the kernel
page-cache.

However this optional should be used in very specific
situations only, and shouldn't be enabled on all
servers.

NVMe servers benefit always from keeping O_DIRECT on.
2022-01-17 08:34:14 -08:00
Harshavardhana 60f2df54e0
Add envVars for CLI arguments (#14114)
fixes #14107
2022-01-15 16:20:02 -08:00
Harshavardhana ba708f51f2
fix: copyMetrics to avoid map references elsewhere (#14113)
map labels might have been referenced else, this
can lead to concurrent access at lower layers.

avoid this by copying the information while
concurrently serving the metrics.
2022-01-14 16:48:19 -08:00
Harshavardhana 0df31f63ab
reject changing pools when there are pending decommissions in-progress (#14102)
do not allow mutation to pool command line when there are
unfinished decommissions in place, disallow such scenarios
to avoid user mistakes.

also add testcases to cover all relevant scenarios.
2022-01-14 10:32:35 -08:00
Klaus Post 64d4da5a37
Add Put input readahead (#14084)
When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs.

This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers.

We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from.

This improves PutObject and PutObjectParts for these cases.
2022-01-14 10:01:25 -08:00
Harshavardhana 7aec38a73e
Simplify the messaging for internode versions (#14103)
provide a cleaner message instead of cryptic
logs, also provide the relevant link on how to do
recommended way to upgrade.
2022-01-13 17:25:08 -08:00
Klaus Post a2fd8caa69
Ignore version not found in deleteVersions (#14093)
When deleting multiple versions it "gives" up with an errFileVersionNotFound if 
a version cannot be found. This effectively skips deleting other versions 
sent in the same request. 

This can happen on inconsistent objects. We should ignore errFileVersionNotFound 
and continue with others.

We already ignore these at the caller level, this PR is continuation of 54a9877
2022-01-13 14:28:07 -08:00
Harshavardhana f546636c52
fix: use renameAll instead of deleteObject() for purging temporary files (#14096)
This PR simplifies few things

- Multipart parts are renamed, upon failure are unrenamed() keep this
  multipart specific behavior it is needed and works fine.

- AbortMultipart should blindly delete once lock is acquired instead
  of re-reading metadata and calculating quorum, abort is a delete()
  operation and client has no business looking for errors on this.

- Skip Access() calls to folders that are operating on
  `.minio.sys/multipart` folder as well.
2022-01-13 11:07:41 -08:00
Harshavardhana 38ccc4f672
fix: make sure to avoid calling RenameData() on disconnected disks. (#14094)
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.

- This PR ensures that we nil the erasure writers such that they
  are skipped in RenameData() call.

- RenameData() doesn't need to "Access()" calls for `.minio.sys`
  folders they always exist.

- Make sure PutObject() never returns ObjectNotFound{} for any
  errors, make sure it always returns "WriteQuorum" when renameData()
  fails with ObjectNotFound{}. Return appropriate errors for all
  other cases.
2022-01-12 18:49:01 -08:00
Harshavardhana cc3f139d1f
replication: attempt abort multipart-upload at max 3 times on remote (#14087)
this is mainly an attempt to relinquish space on the remote
site, if this still doesn't do it we give and let the admin
know with a log message.
2022-01-11 22:32:29 -08:00
Harshavardhana d50442da01
fix: simplify usage calculation and progress (#14086) 2022-01-11 18:48:43 -08:00
Harshavardhana 404b05a44c
fix: ignore drained pool in Healing, hold lock additionally (#14080) 2022-01-11 12:27:47 -08:00
Harshavardhana 3d7c1ad31d
ignore configNotFound error in AccountInfo() (#14082)
fixes #14081
2022-01-11 08:43:18 -08:00
yinhen d300e775a6
Avoid reconnect of disk during startup sequence (#14070) 2022-01-10 23:33:58 -08:00
Harshavardhana 7ee2d1c339
fix: when healing log path when we give up (#14079) 2022-01-10 21:22:17 -08:00
Poorna 54a98773f8
fix: replication of tag removal (#14056)
Currently tag removal leaves replication state as `PENDING` 
because the `HEAD` api returns just a tag count but not the 
actual tags, and this is treated as a no-op
2022-01-10 19:06:10 -08:00
Harshavardhana 737a3f0bad
fix: decommission bugfixes found during migration of .minio.sys/config (#14078) 2022-01-10 17:26:00 -08:00
Harshavardhana 3bd9636a5b
do not remove Sid from svcaccount policies (#14064)
fixes #13905
2022-01-10 14:26:26 -08:00
Harshavardhana 76b21de0c6
feat: decommission feature for pools (#14012)
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```

```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID  │ Pools                           │ Capacity                         │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```

```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID  │ Pools                           │ Capacity                         │ Status   │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```

> NOTE: Canceled decommission will not make the pool active again, since we might have
> Potentially partial duplicate content on the other pools, to avoid this scenario be
> very sure to start decommissioning as a planned activity.

```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID  │ Pools                           │ Capacity                         │ Status             │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```
2022-01-10 09:07:49 -08:00
Harshavardhana b7c5e45fff
heal: isObjectDangling should return false when it cannot decide (#14053)
In a multi-pool setup when disks are coming up, or in a single pool
setup let's say with 100's of erasure sets with a slow network.

It's possible when healing is attempted on `.minio.sys/config`
folder, it can lead to healing unexpectedly deleting some policy
files as dangling due to a mistake in understanding when `isObjectDangling`
is considered to be 'true'.

This issue happened in commit 30135eed86
when we assumed the validMeta with empty ErasureInfo is considered
to be fully dangling. This implementation issue gets exposed when
the server is starting up.

This is most easily seen with multiple-pool setups because of the
disconnected fashion pools that come up. The decision to purge the
object as dangling is taken incorrectly prior to the correct state
being achieved on each pool, when the corresponding drive let's say
returns 'errDiskNotFound', a 'delete' is triggered. At this point,
the 'drive' comes online because this is part of the startup sequence
as drives can come online lazily.

This kind of situation exists because we allow (totalDisks/2) number
of drives to be online when the server is being restarted.

Implementation made an incorrect assumption here leading to policies
getting deleted.

Added tests to capture the implementation requirements.
2022-01-07 19:11:54 -08:00
Aditya Manthramurthy 0a224654c2
fix: progagation of service accounts for site replication (#14054)
- Only non-root-owned service accounts are replicated for now.
- Add integration tests for OIDC with site replication
2022-01-07 17:41:43 -08:00
Aditya Manthramurthy 1981fe2072
Add internal IDP and OIDC users support for site-replication (#14041)
- This allows site-replication to be configured when using OpenID or the
  internal IDentity Provider.

- Internal IDP IAM users and groups will now be replicated to all members of the
  set of replicated sites.

- When using OpenID as the external identity provider, STS and service accounts
  are replicated.

- Currently this change dis-allows root service accounts from being
  replicated (TODO: discuss security implications).
2022-01-06 15:52:43 -08:00
Minio Trusted 76877eb6fa move gofumpt to golang-ci 2022-01-06 13:08:21 -08:00
Klaus Post 3d66d053c7
Add small client TLS PSK cache (#14039) 2022-01-06 11:34:02 -08:00
Klaus Post 0e31cff762
fix: DeleteMultipleObjects to finish even if cancelled + concurrent sets (#14038)
* Process sets concurrently.
* Disconnect context from request.
* Insert context cancellation checks.
* errFileNotFound and errFileVersionNotFound are ok, unless creating delete markers.
2022-01-06 10:47:49 -08:00
Shireesh Anjal c27110e37d
Add timeinfo to health data (#14013)
Capture RoundtripDuration to figure out 
NTP issues in subnet health analyzer.
2022-01-06 01:51:10 -08:00
Harshavardhana 89441a22aa
enforceRetentionForDeletion should return false early for delete-marker (#14033) 2022-01-05 17:05:28 -08:00
Poorna 4d39fd4165
Add API for cluster replication status visibility (#13885) 2022-01-05 02:44:08 -08:00
Harshavardhana 001b77e7e1
use readConfig/saveConfig to simplify I/O on usage/tracker info (#14019) 2022-01-03 10:22:58 -08:00
Harshavardhana a60ac7ca17
fix: audit log to support object names in multipleObjectNames() handler (#14017) 2022-01-03 01:28:52 -08:00
Harshavardhana 42ba0da6b0
fix: initialize new drwMutex for each attempt in 'for {' loop. (#14009)
It is possible that GetLock() call remembers a previously
failed releaseAll() when there are networking issues, now
this state can have potential side effects.

This PR tries to avoid this side affect by making sure
to initialize NewNSLock() for each GetLock() attempts
made to avoid any prior state in the memory that can
interfere with the new lock grants.
2022-01-02 09:15:34 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana 79df2c7ce7
correctly calculate read quorum based on the available fileInfo (#14000)
The current usage of assuming `default` parity of `4` is not correct
for all objects stored on MinIO, objects in .minio.sys have maximum
parity, healing won't trigger on these objects due to incorrect
verification of quorum.
2021-12-28 15:33:03 -08:00
Harshavardhana 866a95de38
fix: choose appropriate quorum for a given erasure set (#13998)
multiObject delete should honor expected quorum
2021-12-28 12:41:52 -08:00
Minio Trusted bb97eafa82 madmin-go v1.1.23 and pkg v1.1.11 2021-12-26 23:23:18 -08:00
Harshavardhana c980804514
trim values from envrionment files (#13991)
trim values to remove any spaces, newlines
from the files while importing credentials
and other values.
2021-12-25 22:02:54 -08:00
Harshavardhana b883803b21
fix: healing across pools removing dangling objects (#13990)
adds other simplifications to the code when running
namespace heals across pools.
2021-12-25 09:01:44 -08:00
Harshavardhana 7e3a7d7044
add healing for invalid shards by skipping the blocks (#13978)
Built on top of #13945, now we need to simply skip the
shards and its automated.
2021-12-23 23:01:46 -08:00
Aditya Manthramurthy 5a96cbbeaa
Fix user privilege escalation bug (#13976)
The AddUser() API endpoint was accepting a policy field. 
This API is used to update a user's secret key and account 
status, and allows a regular user to update their own secret key. 

The policy update is also applied though does not appear to 
be used by any existing client-side functionality.

This fix changes the accepted request body type and removes 
the ability to apply policy changes as that is possible via the 
policy set API.

NOTE: Changing passwords can be disabled as a workaround
for this issue by adding an explicit "Deny" rule to disable the API
for users.
2021-12-23 09:21:21 -08:00
Harshavardhana 54ec0a1308
add configurable delta for skipping shards (#13967)
This PR is an attempt to make this configurable
as not all situations have same level of tolerable
delta, i.e disks are replaced days apart or even
hours.

There is also a possibility that nodes have drifted
in time, when NTP is not configured on the system.
2021-12-22 11:43:01 -08:00
Harshavardhana 1cf726348f
return meaningful error for disabled users (#13968)
fixes #13958
2021-12-22 11:40:21 -08:00