do not allow mutation to pool command line when there are
unfinished decommissions in place, disallow such scenarios
to avoid user mistakes.
also add testcases to cover all relevant scenarios.
When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs.
This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers.
We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from.
This improves PutObject and PutObjectParts for these cases.
When deleting multiple versions it "gives" up with an errFileVersionNotFound if
a version cannot be found. This effectively skips deleting other versions
sent in the same request.
This can happen on inconsistent objects. We should ignore errFileVersionNotFound
and continue with others.
We already ignore these at the caller level, this PR is continuation of 54a9877
This PR simplifies few things
- Multipart parts are renamed, upon failure are unrenamed() keep this
multipart specific behavior it is needed and works fine.
- AbortMultipart should blindly delete once lock is acquired instead
of re-reading metadata and calculating quorum, abort is a delete()
operation and client has no business looking for errors on this.
- Skip Access() calls to folders that are operating on
`.minio.sys/multipart` folder as well.
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.
- This PR ensures that we nil the erasure writers such that they
are skipped in RenameData() call.
- RenameData() doesn't need to "Access()" calls for `.minio.sys`
folders they always exist.
- Make sure PutObject() never returns ObjectNotFound{} for any
errors, make sure it always returns "WriteQuorum" when renameData()
fails with ObjectNotFound{}. Return appropriate errors for all
other cases.
Currently tag removal leaves replication state as `PENDING`
because the `HEAD` api returns just a tag count but not the
actual tags, and this is treated as a no-op
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```
```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```
```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```
```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```
```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```
> NOTE: Canceled decommission will not make the pool active again, since we might have
> Potentially partial duplicate content on the other pools, to avoid this scenario be
> very sure to start decommissioning as a planned activity.
```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```
In a multi-pool setup when disks are coming up, or in a single pool
setup let's say with 100's of erasure sets with a slow network.
It's possible when healing is attempted on `.minio.sys/config`
folder, it can lead to healing unexpectedly deleting some policy
files as dangling due to a mistake in understanding when `isObjectDangling`
is considered to be 'true'.
This issue happened in commit 30135eed86
when we assumed the validMeta with empty ErasureInfo is considered
to be fully dangling. This implementation issue gets exposed when
the server is starting up.
This is most easily seen with multiple-pool setups because of the
disconnected fashion pools that come up. The decision to purge the
object as dangling is taken incorrectly prior to the correct state
being achieved on each pool, when the corresponding drive let's say
returns 'errDiskNotFound', a 'delete' is triggered. At this point,
the 'drive' comes online because this is part of the startup sequence
as drives can come online lazily.
This kind of situation exists because we allow (totalDisks/2) number
of drives to be online when the server is being restarted.
Implementation made an incorrect assumption here leading to policies
getting deleted.
Added tests to capture the implementation requirements.
- This allows site-replication to be configured when using OpenID or the
internal IDentity Provider.
- Internal IDP IAM users and groups will now be replicated to all members of the
set of replicated sites.
- When using OpenID as the external identity provider, STS and service accounts
are replicated.
- Currently this change dis-allows root service accounts from being
replicated (TODO: discuss security implications).
It is possible that GetLock() call remembers a previously
failed releaseAll() when there are networking issues, now
this state can have potential side effects.
This PR tries to avoid this side affect by making sure
to initialize NewNSLock() for each GetLock() attempts
made to avoid any prior state in the memory that can
interfere with the new lock grants.
The current usage of assuming `default` parity of `4` is not correct
for all objects stored on MinIO, objects in .minio.sys have maximum
parity, healing won't trigger on these objects due to incorrect
verification of quorum.
The AddUser() API endpoint was accepting a policy field.
This API is used to update a user's secret key and account
status, and allows a regular user to update their own secret key.
The policy update is also applied though does not appear to
be used by any existing client-side functionality.
This fix changes the accepted request body type and removes
the ability to apply policy changes as that is possible via the
policy set API.
NOTE: Changing passwords can be disabled as a workaround
for this issue by adding an explicit "Deny" rule to disable the API
for users.
This PR is an attempt to make this configurable
as not all situations have same level of tolerable
delta, i.e disks are replaced days apart or even
hours.
There is also a possibility that nodes have drifted
in time, when NTP is not configured on the system.
data shards were wrong due to a healing bug
reported in #13803 mainly with unaligned object
sizes.
This PR is an attempt to automatically avoid
these shards, with available information about
the `xl.meta` and actually disk mtime.
- When using MinIO's internal IDP, STS credential permissions did not check the
groups of a user.
- Also fix bug in policy checking in AccountInfo call
Also log all the missed events and logs instead of silently
swallowing the events.
Bonus: Extend the logger webhook to support mTLS
similar to audit webhook target.
- r.ulock was not locked when r.UsageCache was being modified
Bonus:
- simplify code by removing some unnecessary clone methods - we can
do this because go arrays are values (not pointers/references) that are
automatically copied on assignment.
- remove some unnecessary map allocation calls
data-structures were repeatedly initialized
this causes GC pressure, instead re-use the
collectors.
Initialize collectors in `init()`, also make
sure to honor the cache semantics for performance
requirements.
Avoid a global map and a global lock for metrics
lookup instead let them all be lock-free unless
the cache is being invalidated.
When STS credentials are created for a user, a unique (hopefully stable) parent
user value exists for the credential, which corresponds to the user for whom the
credentials are created. The access policy is mapped to this parent-user and is
persisted. This helps ensure that all STS credentials of a user have the same
policy assignment at all times.
Before this change, for an OIDC STS credential, when the policy claim changes in
the provider (when not using RoleARNs), the change would not take effect on
existing credentials, but only on new ones.
To support existing STS credentials without parent-user policy mappings, we
lookup the policy in the policy claim value. This behavior should be deprecated
when such support is no longer required, as it can still lead to stale
policy mappings.
Additionally this change also simplifies the implementation for all non-RoleARN
STS credentials. Specifically, for AssumeRole (internal IDP) STS credentials,
policies are picked up from the parent user's policies; for
AssumeRoleWithCertificate STS credentials, policies are picked up from the
parent user mapping created when the STS credential is generated.
AssumeRoleWithLDAP already picks up policies mapped to the virtual parent user.
A corner case can occur where the delete-marker was propagated
but the metadata could not be updated on the primary. Sending
a RemoveObject call with the Delete marker version would end
up permanently deleting the version on target. Instead, perform
a Stat on the delete-marker version on target and redo replication
only if the delete-marker is missing on target.