Retries must not be counted as failed messages: a message
that fails should increment the failure counter once, not
once per retry. This PR fixes that.
Also, we do not need to retry 10 times; instead we should
retry at most 3 times, with some jitter, to deliver the
messages.
- we already have MRF for most recent failures
- we trigger healing during HEAD/GET operations
These are enough. Also change the default max wait
from 5sec to 1sec for the default scanner speed.
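
The retry policy described above (at most 3 attempts with jitter, and
one failure count per message rather than per attempt) can be sketched
as follows. This is a minimal illustration, not the code from this PR:
the `stats` type, `deliver`, `errSend` and the delay value are all
hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// maxAttempts is an illustrative constant matching the "at most 3" above.
const maxAttempts = 3

// errSend is a hypothetical error used to simulate a failing target.
var errSend = errors.New("send failed")

// stats tracks delivery outcomes; failed counts messages, not attempts.
type stats struct {
	attempts int
	failed   int
}

// deliver calls send up to maxAttempts times, sleeping a random duration
// in [0, baseDelay) between attempts. The failed counter is incremented
// once, only after the final attempt has failed.
func (s *stats) deliver(send func() error, baseDelay time.Duration) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		s.attempts++
		if err = send(); err == nil {
			return nil
		}
		if i < maxAttempts-1 && baseDelay > 0 {
			// Jitter spreads out retries from concurrent senders.
			time.Sleep(time.Duration(rand.Int63n(int64(baseDelay))))
		}
	}
	s.failed++
	return err
}

func main() {
	s := &stats{}
	err := s.deliver(func() error { return errSend }, 10*time.Millisecond)
	fmt.Println(err, s.attempts, s.failed)
}
```

Keeping the counter increment outside the loop is what guarantees a
single failure per message regardless of how many retries ran.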
Configs from a 2020 server throw an
error due to deprecation of the keys,
even though an attempt is made to parse
them. We should have chosen the existing
defaults instead; this PR fixes that.
Fix drive rotational status calculation
If a MinIO drive path is mounted on a partition and not a whole disk,
getting the rotational status fails because Linux does not expose
that status for partitions; in other words,
/sys/block/<drive-partition-name>/queue/rotational does not exist.
To fix the issue, the code now looks up the rotational status of the
disk that hosts the partition, which can be derived from the
real path of /sys/class/block/<drive-partition-name>.
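
A minimal sketch of that lookup, assuming the usual Linux sysfs layout
in which the real path of a partition ends in `.../block/<disk>/<partition>`.
The function names (`isRotational`, `candidateDirs`, `parseRotational`)
and the `sysRoot` parameter are hypothetical and not taken from the
MinIO code base.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// candidateDirs returns the directories that may hold queue/rotational:
// the device itself (a whole disk) and its parent (the disk that hosts a
// partition).
func candidateDirs(realPath string) []string {
	return []string{realPath, filepath.Dir(realPath)}
}

// parseRotational interprets the content of a queue/rotational file,
// where "1" means a rotational (spinning) disk.
func parseRotational(data []byte) bool {
	return strings.TrimSpace(string(data)) == "1"
}

// isRotational resolves <sysRoot>/class/block/<name> to its real path and
// reads queue/rotational from the device or from its hosting disk.
// sysRoot is normally "/sys"; it is a parameter only for illustration.
func isRotational(sysRoot, name string) (bool, error) {
	realPath, err := filepath.EvalSymlinks(filepath.Join(sysRoot, "class", "block", name))
	if err != nil {
		return false, err
	}
	for _, dir := range candidateDirs(realPath) {
		data, err := os.ReadFile(filepath.Join(dir, "queue", "rotational"))
		if err == nil {
			return parseRotational(data), nil
		}
		if !os.IsNotExist(err) {
			return false, err
		}
	}
	return false, fmt.Errorf("no rotational status found for %s", name)
}

func main() {
	// "sda1" is only an example device name.
	rotational, err := isRotational("/sys", "sda1")
	fmt.Println(rotational, err)
}
```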
Keys are helpful to ensure strict ordering of messages; however, the
code currently uses a random request ID for every log, so using the
request ID as a Kafka key does not serve any purpose.
This commit removes the usage of the key, which also fixes the audit issue
for internal subsystems that do not have a request ID.
This commit updates the minio/kes-go dependency
to v0.2.0 and updates the existing code to work
with the new KES APIs.
The `SetPolicy` handler got removed since it
may not get implemented by KES at all and could
not have been used in the past since stateless KES
is read-only w.r.t. policies and identities.
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`.
This meant that in distributed systems listings would never be able to continue and would restart on every call.
Fix issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.
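
The GOB failure mode can be reproduced in isolation. The sketch below
uses two hypothetical stand-in types (`opaquePrefix` and
`exportedPrefix`), not the real `lifecycle.Prefix`, to show that
`encoding/gob` rejects a struct whose fields are all unexported while
accepting one with exported fields. It illustrates the error only and
does not claim to be the fix applied in this change.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// opaquePrefix is a hypothetical type whose fields are all unexported,
// so gob has nothing it is allowed to encode.
type opaquePrefix struct {
	value string
	set   bool
}

// exportedPrefix carries the same data in exported fields.
type exportedPrefix struct {
	Value string
	Set   bool
}

// gobEncode encodes v into a throwaway buffer and returns any error.
func gobEncode(v interface{}) error {
	var buf bytes.Buffer
	return gob.NewEncoder(&buf).Encode(v)
}

func main() {
	// Expected to fail with a "has no exported fields" error.
	fmt.Println(gobEncode(opaquePrefix{value: "logs/", set: true}))
	// Expected to succeed.
	fmt.Println(gobEncode(exportedPrefix{Value: "logs/", Set: true}))
}
```

Because the error is returned by `Encode` at runtime rather than caught
at compile time, logging it is the only way such a failure becomes
visible.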
Bonus:
- avoid DiskInfo() calls when blocks are missing;
  instead heal the object using an MRF operation.
- change max_sleep to 250ms; beyond that we will
  not stop healing.
Send() is synchronous and can affect the latency of S3 requests when the
logger buffer is full.
Avoid checking if the HTTP target is online or not and increase the
workers anyway since the buffer is already full.
Also, avoid flooding the logs when the audit target is down.
Slower drives already get knocked off by active monitoring
when they are too slow, so we do not need to block calls arbitrarily.
Serializing adds latency to already slow calls; remove
it for SSDs/NVMes.
Also, add a select with context when writing to the `out <-`
channel, to avoid any potential blocking.
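
The select-with-context pattern mentioned above looks roughly like
this. It is a generic sketch: `sendOrCancel` and `sendToStalledReader`
are hypothetical helpers, and the channel element type is chosen only
for the example.

```go
package main

import (
	"context"
	"fmt"
)

// sendOrCancel writes v to out, but gives up as soon as ctx is done so
// the caller never blocks forever on a reader that has gone away.
func sendOrCancel(ctx context.Context, out chan<- string, v string) error {
	select {
	case out <- v:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// sendToStalledReader simulates a write to an unbuffered channel that
// nobody reads, with a context that has already been cancelled.
func sendToStalledReader() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()

	out := make(chan string) // unbuffered and never read from
	return sendOrCancel(ctx, out, "entry")
}

func main() {
	// A plain `out <- v` here would block forever; the select returns
	// the context error instead.
	fmt.Println(sendToStalledReader())
}
```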
It now lists details of all KMS instances, with the additional
attributes `endpoint` and `version`. In a k8s-based
deployment the list consists of a single entry.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Also add jitter to the shutdown poll, to check whether the
shutdown sequence can finish before 500ms; this reduces the
overall time taken during a "restart" of the service.
This speeds up `mc admin service restart` during
active I/O. It also ensures that systemd doesn't treat the
returned 'error' as a failure; certain configurations in
systemd can cause it to 'auto-restart' the process by itself,
which can interfere with `mc admin service restart`.
Restarting the service is now noticeably snappier.