Simplify MRF queueing and add backlog handler
- Limit retries to 3 to avoid repeated re-queueing. Entries that fall off
are retried when the scanner revisits this object or upon access
(a minimal retry sketch follows this list).
- Change MRF to have each node process only its own MRF entries.
- Collect MRF backlog per node to allow visibility into the current backlog.
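A minimal sketch of what a bounded-retry re-queue could look like; `mrfEntry`, `maxMRFRetries`, and the channel-based queue are hypothetical names for illustration, not the actual implementation:

```go
package main

import "log"

const maxMRFRetries = 3 // assumed limit from the change description

// mrfEntry is a hypothetical queue element tracking how often a heal was attempted.
type mrfEntry struct {
	bucket, object string
	retries        int
}

// requeue puts an entry back on the queue unless it has exhausted its retries;
// exhausted entries are dropped and left for the scanner (or a later access) to heal.
func requeue(queue chan mrfEntry, e mrfEntry) {
	e.retries++
	if e.retries >= maxMRFRetries {
		log.Printf("dropping %s/%s from MRF queue after %d retries", e.bucket, e.object, e.retries)
		return
	}
	select {
	case queue <- e: // re-queued for another attempt
	default: // queue full: drop, the scanner will pick it up later
	}
}

func main() {
	q := make(chan mrfEntry, 8)
	requeue(q, mrfEntry{bucket: "photos", object: "a.jpg"})
}
```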
Now it would list details of all KMS instances with the additional
attributes `endpoint` and `version`. In the case of a k8s-based
deployment, the list would consist of a single entry.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
It is better to record the correct API name so that any verification
around audit logs, done to figure out whether the required APIs are
called the required number of times, is correct.
Here in this case of policy attach, the API `AttachDetachPolicyBuiltin`
would be called with `requestPath` as `/minio/admin/v3/idp/builtin/policy/attach`,
and in the case of detach policy the value would be `/minio/admin/v3/idp/builtin/policy/detach`.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Also add jitter to the shutdown poll to verify whether the shutdown
sequence can finish before 500ms; this reduces the overall
time taken during a "restart" of the service.
Provides a speedup for `mc admin service restart` during
active I/O, and also ensures that systemd doesn't treat the
returned 'error' as a failure; certain configurations in
systemd can cause it to 'auto-restart' the process by itself,
which can interfere with `mc admin service restart`.
It can be observed that restarting the service is now
much snappier.
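A minimal sketch of how a jittered poll bounded at roughly 500ms could look; the `shutdownDone` channel and the sleep bounds are hypothetical and only illustrate the idea:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// waitForShutdown polls for shutdown completion for up to ~500ms,
// sleeping a small jittered interval between checks so that callers
// return quickly when the shutdown sequence finishes early.
func waitForShutdown(shutdownDone <-chan struct{}) bool {
	deadline := time.Now().Add(500 * time.Millisecond)
	for time.Now().Before(deadline) {
		select {
		case <-shutdownDone:
			return true // finished before the deadline
		default:
		}
		// jittered sleep between 10ms and 60ms
		time.Sleep(10*time.Millisecond + time.Duration(rand.Intn(50))*time.Millisecond)
	}
	return false
}

func main() {
	done := make(chan struct{})
	go func() { time.Sleep(120 * time.Millisecond); close(done) }()
	fmt.Println("shutdown finished early:", waitForShutdown(done))
}
```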
On unversioned buckets it is possible that 0-byte objects
might lose quorum on flaky systems; allow them to be treated the same
as DELETE markers, since practically speaking they have no
content.
Optimize the DeleteObject API to avoid an extra
GetObjectInfo call on the replicating side.
For the receiving side, it is just a regular
DeleteObject call.
Bonus: fix a corner case where the purged version is
absent on the target (either because replication is not yet
complete, the target version was already deleted in a
one-way replication, or replication was disabled).
In such cases, mark the version purge complete.
Since `addCustomerHeaders` middleware was after the `httpTracer`
middleware, the request ID was not set in the http tracing context. By
reordering these middleware functions, the request ID header becomes
available. We also avoid setting the tracing context key again in
`newContext`.
Bonus: All middleware functions are renamed with a "Middleware" suffix
to avoid confusion with http Handler functions.
* Reduce allocations
* Add stringsHasPrefixFold, which compares string prefixes while ignoring case and without allocating (see the sketch after this list).
* Reuse all msgp.Readers
* Reuse metadata buffers when not reading data.
* Make type safe. Make buffer 4K instead of 8.
* Unslice
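A minimal sketch of what a case-insensitive, allocation-free prefix check could look like; this illustrates the idea rather than the exact helper added here:

```go
package main

import (
	"fmt"
	"strings"
)

// stringsHasPrefixFold reports whether s begins with prefix, ignoring case.
// Slicing a string and strings.EqualFold do not allocate, unlike
// strings.HasPrefix(strings.ToLower(s), strings.ToLower(prefix)).
func stringsHasPrefixFold(s, prefix string) bool {
	return len(s) >= len(prefix) && strings.EqualFold(s[:len(prefix)], prefix)
}

func main() {
	fmt.Println(stringsHasPrefixFold("X-Amz-Meta-Foo", "x-amz-meta-")) // true
	fmt.Println(stringsHasPrefixFold("Content-Type", "x-amz-"))        // false
}
```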
DNS refresh() in the case of MinIO can safely re-use
the previous values on bare-metal setups, since
bare-metal arrangements do not commonly change DNS in any
manner.
This PR simplifies that: we only ever need DNS caching
on bare-metal setups.
- On containerized setups, do not enable DNS
caching at all, as it may have adverse effects on
the overall effectiveness of k8s DNS systems.
k8s DNS systems are dynamic and expect applications
to avoid managing DNS caching themselves; instead they
provide cleaner container-native caching
implementations that must be used.
- update IsDocker() detection, including the podman runtime (see the detection sketch after this list)
- move to the minio/dnscache fork for a simpler package
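A minimal sketch of one common way to detect a Docker or podman container, assuming detection based on the well-known marker files `/.dockerenv` (Docker) and `/run/.containerenv` (podman); the actual detection logic in the change may differ:

```go
package main

import (
	"fmt"
	"os"
)

// isContainerized reports whether the process appears to run inside a
// Docker or podman container, based on marker files each runtime creates.
func isContainerized() bool {
	for _, marker := range []string{
		"/.dockerenv",        // created by Docker
		"/run/.containerenv", // created by podman
	} {
		if _, err := os.Stat(marker); err == nil {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("running in a container:", isContainerized())
}
```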
The following extension allows users to specify immediate purge of
all versions as soon as the latest version of the object has
expired.
```
<LifecycleConfiguration>
    <Rule>
        <ID>ClassADocRule</ID>
        <Filter>
            <Prefix>classA/</Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Expiration>
            <Days>3650</Days>
            <ExpiredObjectAllVersions>true</ExpiredObjectAllVersions>
        </Expiration>
    </Rule>
    ...
```
- look for requested encryption while compressing,
not just via HTTP headers but also via multipart
metadata
- look for SSE-S3 ETag decryption, not just via HTTP
headers but also via multipart metadata

fixes #17519
Current decommission traces were missing for:
- Skipped ILM expired versions
- Skipped single DELETE marked version
- A success or failure in decommissioning DELETE marker
- allow additional info to be shared in DecomStatus() API
There is a possibility that slow drives can add latency
to the overall call, leading to a large spike in latency.
This can happen if there are other parallel listObjects()
calls to the same drive, in turn causing each other to
more or less serialize.
This potentially improves performance and also makes PutObject()
non-blocking.
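A minimal sketch of the general non-blocking dispatch pattern referred to here, assuming a hypothetical bounded work queue; it illustrates how a hot path such as PutObject() can hand work off without waiting on a slow drive, and is not the actual code from this change:

```go
package main

import (
	"fmt"
	"time"
)

type diskTask struct{ volume, path string }

// enqueueNonBlocking hands a task to a background worker without ever
// blocking the caller: if the worker is busy and the queue is full,
// the task is simply skipped and handled by a later pass.
func enqueueNonBlocking(queue chan<- diskTask, t diskTask) bool {
	select {
	case queue <- t:
		return true
	default:
		return false // slow drive: do not let it stall the caller
	}
}

func main() {
	queue := make(chan diskTask, 4)
	go func() {
		for t := range queue {
			time.Sleep(50 * time.Millisecond) // simulate a slow drive
			fmt.Println("processed", t.volume, t.path)
		}
	}()
	for i := 0; i < 10; i++ {
		ok := enqueueNonBlocking(queue, diskTask{"data", fmt.Sprintf("obj-%d", i)})
		fmt.Println("queued:", ok)
	}
	time.Sleep(300 * time.Millisecond)
}
```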
This change adds a `Secret` property to `HelpKV` to identify secrets
like passwords and auth tokens that should not be revealed by the server
in its configuration fetching APIs. Configuration reporting APIs now do
not return secrets.
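A minimal sketch of the idea, assuming a simplified `HelpKV` struct; the real struct in MinIO's config package has more fields, and the redaction shown here is only illustrative:

```go
package main

import "fmt"

// HelpKV describes a configuration key; Secret marks values that the
// configuration-reporting APIs must never return.
type HelpKV struct {
	Key         string
	Description string
	Secret      bool
}

// redact replaces secret values before configuration is reported.
func redact(help []HelpKV, values map[string]string) map[string]string {
	out := make(map[string]string, len(values))
	secret := make(map[string]bool, len(help))
	for _, h := range help {
		secret[h.Key] = h.Secret
	}
	for k, v := range values {
		if secret[k] {
			out[k] = "" // never reveal passwords or auth tokens
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	help := []HelpKV{
		{Key: "endpoint", Description: "server endpoint"},
		{Key: "auth_token", Description: "authentication token", Secret: true},
	}
	fmt.Println(redact(help, map[string]string{"endpoint": "http://localhost:9000", "auth_token": "s3cr3t"}))
}
```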
For policy attach/detach API to work correctly the server should hold a
lock before reading existing policy mapping and until after writing the
updated policy mapping. This is fixed in this change.
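A minimal sketch of the locking pattern being described, with a plain `sync.Mutex` standing in for whatever lock the server actually uses; the point is that the read of the existing mapping and the write of the updated mapping happen under one critical section:

```go
package main

import (
	"fmt"
	"sync"
)

type policyStore struct {
	mu      sync.Mutex
	mapping map[string][]string // user -> attached policies
}

// attach holds the lock across the read-modify-write so that two
// concurrent attach/detach calls cannot lose each other's updates.
func (s *policyStore) attach(user string, policies ...string) {
	s.mu.Lock()
	defer s.mu.Unlock()

	existing := append([]string(nil), s.mapping[user]...) // read
	for _, p := range policies {
		found := false
		for _, e := range existing {
			if e == p {
				found = true
				break
			}
		}
		if !found {
			existing = append(existing, p) // modify
		}
	}
	s.mapping[user] = existing // write
}

func main() {
	s := &policyStore{mapping: map[string][]string{"alice": {"A"}}}
	s.attach("alice", "A", "B")
	fmt.Println(s.mapping["alice"]) // [A B]
}
```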
A site replication bug, where LDAP policy attach/detach were not
correctly propagated is also fixed in this change.
Bonus: Additionally, the server responds with the actual (or net)
changes performed in the attach/detach API call. For example, if a user
already has policy A applied, and a call to attach policies A and B is
performed, the server will respond that B was attached successfully.
A continuation of PR #17479: rebalance behavior must
also match the decommission behavior.
Fixes a bug where rebalance would ignore rebalancing object
versions after one of the versions returned "ObjectNotFound".
While decommissioning, it can so happen that the non-current
versions are all expired but there is a DEL marker as the
latest version.
Such objects should not be decommissioned; instead,
calculate the remaining versions, and if only one version
remains and that version is a DEL marker, consider such
an object not to be scheduled for decommissioning (a sketch of this check follows).
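A minimal sketch of the check described above, with hypothetical types standing in for the server's internal version metadata:

```go
package main

import "fmt"

// fileVersion is a hypothetical, simplified view of an object version.
type fileVersion struct {
	versionID    string
	deleteMarker bool
}

// skipDecommission reports whether an object whose only remaining
// version is a DEL marker should be left out of decommissioning.
func skipDecommission(remaining []fileVersion) bool {
	return len(remaining) == 1 && remaining[0].deleteMarker
}

func main() {
	remaining := []fileVersion{{versionID: "v3", deleteMarker: true}}
	fmt.Println("skip:", skipDecommission(remaining)) // skip: true
}
```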
With the current asynchronous behaviour of sending notification events
to the targets, we can't provide guaranteed delivery, as the systems
might go through restarts.
For such event-driven use cases, we can provide an option to enable
synchronous events, where the APIs wait until the event is successfully
sent or persisted.
This commit adds the 'MINIO_API_SYNC_EVENTS' env, which when set to 'on'
enables sending/persisting events to targets synchronously.
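A minimal sketch of how such a switch could behave, with a hypothetical `sendEvent` function and the env read; it only illustrates the sync-versus-async distinction, not MinIO's actual event plumbing:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// sendEvent is a stand-in for delivering (or persisting) one notification event.
func sendEvent(name string) error {
	time.Sleep(20 * time.Millisecond) // simulate target latency
	fmt.Println("delivered:", name)
	return nil
}

// dispatch sends the event synchronously when MINIO_API_SYNC_EVENTS=on,
// so the API call returns only after the event is sent or persisted;
// otherwise delivery happens in the background as before.
func dispatch(name string) error {
	if strings.EqualFold(os.Getenv("MINIO_API_SYNC_EVENTS"), "on") {
		return sendEvent(name) // caller waits for delivery
	}
	go sendEvent(name) // fire and forget
	return nil
}

func main() {
	os.Setenv("MINIO_API_SYNC_EVENTS", "on")
	_ = dispatch("s3:ObjectCreated:Put")
}
```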
A state is updated with a delete marker, which does not have parity or
data blocks defined, which can cause integer divide-by-zero panics.
This commit adds a check to avoid such panics.
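A minimal sketch of the kind of guard that avoids this class of panic; the field names are hypothetical and only illustrate checking for zero data/parity blocks before dividing:

```go
package main

import (
	"errors"
	"fmt"
)

type versionInfo struct {
	deleteMarker       bool
	dataBlocks, parity int
	size               int64
}

// shardSize returns the per-block size, guarding against delete markers
// (or otherwise incomplete metadata) where dataBlocks is zero.
func shardSize(v versionInfo) (int64, error) {
	if v.deleteMarker || v.dataBlocks == 0 {
		return 0, errors.New("no erasure layout for this version")
	}
	return v.size / int64(v.dataBlocks), nil
}

func main() {
	_, err := shardSize(versionInfo{deleteMarker: true})
	fmt.Println(err) // no panic, just an error
}
```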
on "unversioned" buckets there are situations
when successive concurrent I/O can lead to
an inconsistent state() with mtime while the
etag might be the same for the object on disk.
in such a scenario it is possible for us to
allow reading of the object since etag matches
and if etag matches we are guaranteed that we
have enough copies the object will be readable
and same.
This PR allows fallback in such scenarios.
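A minimal sketch of an ETag-based fallback when mtimes disagree across drives; the types are hypothetical and the quorum threshold is illustrative:

```go
package main

import "fmt"

type diskMeta struct {
	etag  string
	mtime int64
}

// readableByETag reports whether enough drives agree on the ETag to
// serve the object even though their mtimes are inconsistent.
func readableByETag(metas []diskMeta, readQuorum int) bool {
	count := map[string]int{}
	for _, m := range metas {
		count[m.etag]++
		if count[m.etag] >= readQuorum {
			return true
		}
	}
	return false
}

func main() {
	metas := []diskMeta{
		{etag: "abc", mtime: 100},
		{etag: "abc", mtime: 101}, // mtime drifted, same content
		{etag: "abc", mtime: 100},
	}
	fmt.Println(readableByETag(metas, 2)) // true
}
```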
This PR also returns the replication status in
proxy calls and defers the replication attempt if
HEAD on the object version returned an error different
from NoSuchKey.
A specific node should perform the decommissioning task; however, routing the
start-decommission request to that node was not working properly.
Co-authored-by: Anis Elleuch <anis@min.io>
Fixes an issue where bucket replication could cause
ETags for replicated SSE-S3 single-part PUT objects
to fail, as we would attempt a decryption while listing
or during a stat() operation.
- lifecycle must return InvalidArgument for rule errors
- do not return `null` versionId in HTTP header
- reject mixed SSE uploads with correct error message
- getObjectTagging to be allowed for anonymous policies
- return correct errors for invalid retention period
- return sorted list of tags for an object
- putObjectTagging must return 200 OK, not 204
- return 409 ErrObjectLockConfigurationNotAllowed for existing buckets
PUT calls cannot afford large latency build-ups due
to a contentious usage.json, or worse, failing with
some unexpected error; this can happen when this file is
concurrently being updated via the scanner or is being
healed during a disk-replacement heal.
While these updates are fairly quick in theory, stressed clusters
can quickly show visible latency that adds up, leading to
invalid errors returned during PUT.
It is perhaps okay for us to relax this error-return requirement;
instead, make sure that we log that we are proceeding to take in
the requests while quota enforcement is using an older value of the
quota. These things will reconcile themselves eventually,
via the scanner making sure to overwrite usage.json.
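A minimal sketch of the relaxed behaviour described above, with hypothetical names for the usage read and the cached value; the idea is to keep serving PUTs against the last known usage rather than failing them when usage.json is momentarily unreadable:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var lastKnownUsage uint64 // cached from the previous successful read

// currentUsage returns the latest usage if usage.json is readable, and
// otherwise falls back to the cached value instead of failing the PUT.
func currentUsage(read func() (uint64, error)) uint64 {
	u, err := read()
	if err != nil {
		log.Printf("usage.json unavailable (%v); enforcing quota with stale usage %d", err, lastKnownUsage)
		return lastKnownUsage
	}
	lastKnownUsage = u
	return u
}

func main() {
	lastKnownUsage = 1 << 30
	u := currentUsage(func() (uint64, error) { return 0, errors.New("being healed") })
	fmt.Println("enforcing quota against usage:", u)
}
```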
Bonus: make sure that storage-rest-client sets ExpectTimeouts to
'true', so that a DiskInfo() call with a context timeout does
not prematurely disconnect the servers, leading to a longer
healthCheck back-off routine. This can easily pile up while also
causing active callers to disconnect, leading to quorum loss.
DiskInfo is actively used in the PUT and Multipart call paths for
upgrading parity when disks are down; it in turn shouldn't cause
more disks to go down.