healing disks take active I/O it is possible
that deleted objects might stay in .trash
folder for a really long time until the drive
is fully healed.
this PR changes it such that we are making sure
we purge the active content written to these
disks as well.
- speedtest logs calls that were canceled
spuriously, in situations where it should
be ignored.
- all errors of interest are always sent back
to the client there is no need to log them
on the server console.
- PUT failures should negate the increments
such that GET is not attempted on unsuccessful
calls.
- do not attempt MRF on speedtest objects.
In the testing mode, reformatting disks will fail because the healing
code will complain if one disk is in root mode. This commit will
automatically set all disks as non-root if MINIO_CI_CD is set.
Currently, when applying any dynamic config, the system reloads and
re-applies the config of all the dynamic sub-systems.
This PR refactors the code in such a way that changing config of a given
dynamic sub-system will work on only that sub-system.
An onlineDisk means its a valid disk but it may be a
re-connected disk, this PR verifies that based on LastConn()
to only trigger MRF. Current code would again re-load the
disk 'format.json' which is not necessary and perhaps an
unnecessary call.
A potential side affect of this is closing perfectly online
disks and getting re-replaced by reloading 'format.json'.
This PR tries to avoid this situation by making sure MRF
is triggered but not reloading 'format.json' because of MRF.
Only the first `listAndHeal` would ever be able to write on errCh, blocking all others infinitely.
Instead read all errors but return the first non-nil, if any.
The intention appears to be that this should cancel on any error,
so that part is kept.
Regression from #13990
When more than one gateway reads and writes from the same mount point
and there is a load balancer pointing to those gateways. Each gateway
will try to create its own temporary append file but fails to clear it later
when not needed.
This commit creates a routine that checks all upload IDs saved in
multipart directory and remove any stale entry with the same upload id
in the memory and in the temporary background append folder as well.
Enabled with `mc admin config set alias/ api gzip_objects=on`
Standard filtering applies (1K response minimum, not compressed content
type, not range request, gzip accepted by client).
The current code considers a pool with all root disks to be as part
of a testing environment even if there are other pools with mounted
disks. This will result to illegitimate writing in root disks.
Fix this by simplifing the logic: require MINIO_CI_CD in order to skip
root disk check.
MinIO configuration is loaded after the initialization of the server
handlers, which will miss the initialization of the bucket forwarder
handler.
Though the federation is deprecated, let's fix this for the time being.
S3 spec returns x-amz-restore header in HEAD/GET object with the
following format:
```
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012
00:00:00 GMT"
```
This commit adds quotes as the current code does not support it. It will
also supports the old format saved in the disk (in xl.meta) for backward
compatibility.
A regression removed support of federation in the gateway mode.
Enable it again.
Federation is deprecated for a while but let's fix this for the time being.
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.
This is a continuation of quorum related error returned by
multi-object delete API from #14248
This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
Some users running MinIO claim that their system became slow. One
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow,
due to disk issues for example.
startup speed-up, currently getFormatErasureInQuorum()
would spend up to 2-3secs when there are 3000+ drives
for example in a setup, simplify this implementation
to use drive counts.
DeleteMarkers do not have a default quorum, i.e it is possible that
DeleteMarkers were created with n/2+1 quorum as well to make sure
that we satisfy situations such as those we need to make sure delete
markers only expect n/2 read quorum.
Additionally we should also look at additional metadata on the
actual objects that might have been "erasure" upgraded with new
parity when disks are down.
In such a scenario do not default to the standard storage class
parity, instead use the parityBlocks present on the FileInfo to
ensure that we are dealing with the correct quorum for READs and
DELETEs.
Retry listings, when no next marker is returned and the result isn't truncated.
This can happen when an object is queued, but no info can be fetched.
Fixes#14190
The healing code repeatedly tries to heal a root disk when it is empty
the reason is that connectEndpoint() returns errUnformattedDisk even
if the disk is a root disk. Changing that to returning another error
will avoid queueing the disk to the healing code in each connect disks
iteration.