This PR adds new bucket replication graphs for better, more granular
monitoring of bucket replication. It also arranges all replication
graphs together.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Track the replication transfer rate across different nodes,
the number of active workers in use, and in-queue stats to get
an idea of the current workload.
This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level but at the cluster level.
Add prometheus metric to track credential errors since uptime
There is recurring confusion between the Community Helm Chart in this repo and the MinIO Kubernetes Operator Helm Chart.
This change seeks to clarify the differences between the two charts and which is community maintained versus MinIO maintained.
replicationTimestamp might differ if there were retries
in replication and the retried attempt overwrote in
quorum: enough shards then carry a newer timestamp, making
the existing timestamps on xl.meta invalid. We do not rely
on this value for anything external.
This is purely a hint for debugging purposes, but there
is no real value in it; considering the object itself
is intact, we do not have to spend time healing this
situation.
We may consider healing this situation in the future, but
that needs to be decoupled to make sure that we do not
over-calculate how much we have to heal.
.metacache objects are transient in nature and are better left to
use the page-cache effectively, to avoid spending more IOPS on the
disks. This allows incoming calls not to be taxed heavily by
multiple large batch listings.
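As a rough illustration (not the actual MinIO code), the sketch below reads a hypothetical transient object with plain buffered I/O so the kernel page cache can serve repeat reads, which is the behavior being preserved by not bypassing the cache (e.g. via O_DIRECT) for these objects:

```go
// Sketch: reading a transient ".metacache"-style object through the
// kernel page cache. A plain os.Open/read path lets the kernel cache
// the pages, so repeated listings of the same transient object do not
// pay extra disk IOPS, unlike a direct-I/O read that bypasses the cache.
package main

import (
	"io"
	"log"
	"os"
)

func main() {
	const path = "/tmp/example.metacache" // hypothetical path

	f, err := os.Open(path) // buffered I/O: served from page cache when hot
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	n, err := io.Copy(io.Discard, f)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes (repeat reads are likely served from memory)", n)
}
```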
Given a versionId, the mtime is always the same; it
can never be different from its original value.
versionIds also do not conflict, since they are UUIDs
and practically unique forever.
We expect a certain level of IOPS and latency, so this is okay.
Fixes other miscellaneous bugs, such as:
- hanging on `mrfCh <-` when the context is canceled (see the sketch
after this list)
- queuing an MRF heal when the context is canceled
- the unused saveStateCh channel, which is now removed
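As a generic sketch of the first fix (the `healEntry` and `queueMRF` names are hypothetical; only `mrfCh` comes from the description above), a bare channel send blocks forever once the receiver is gone, while a `select` that also watches the context returns promptly on cancellation:

```go
package main

import (
	"context"
	"fmt"
)

type healEntry struct{ object string } // hypothetical payload type

// queueMRF sketches a context-aware send: if ctx is canceled
// (e.g. during shutdown) the send does not block forever on mrfCh.
func queueMRF(ctx context.Context, mrfCh chan<- healEntry, e healEntry) bool {
	select {
	case mrfCh <- e:
		return true
	case <-ctx.Done():
		// Caller is shutting down; drop the entry instead of hanging.
		return false
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	mrfCh := make(chan healEntry) // unbuffered: a bare `mrfCh <- e` would block

	cancel() // simulate shutdown before any receiver exists
	fmt.Println("queued:", queueMRF(ctx, mrfCh, healEntry{object: "bucket/object"}))
}
```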
In a distributed setup with a load balancer, any server may
randomly report the metrics `minio_cluster_bucket_total` and
`minio_cluster_usage_object_total`, so while graphing them we
should take the max of the reported values.
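As one way to illustrate this, a hedged Go sketch using the prometheus/client_golang query API (the Prometheus address is a placeholder for your deployment) aggregates with `max()` so the highest value reported by any server is used:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Placeholder Prometheus address; adjust for your deployment.
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatal(err)
	}
	prom := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Any one server may report the cluster-level series, so take the
	// max across all reporting instances rather than a sum or average.
	for _, q := range []string{
		`max(minio_cluster_bucket_total)`,
		`max(minio_cluster_usage_object_total)`,
	} {
		result, warnings, err := prom.Query(ctx, q, time.Now())
		if err != nil {
			log.Fatal(err)
		}
		if len(warnings) > 0 {
			log.Println("warnings:", warnings)
		}
		fmt.Printf("%s => %v\n", q, result)
	}
}
```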
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
This commit updates the minio/kes-go dependency
to v0.2.0 and updates the existing code to work
with the new KES APIs.
The `SetPolicy` handler got removed since it
may not get implemented by KES at all, and it could
not have been used in the past anyway because stateless
KES is read-only w.r.t. policies and identities.
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
Bonus fixes include:
- no need to write the final xl.meta, since (renameData) does this
already; saves some IOPS.
- make sure to purge the multipart directory properly using a
recursive delete (see the sketch below), otherwise these directories
can easily pile up and end up relying on the stale uploads cleanup.
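For the recursive-delete point, here is a minimal standalone sketch (a hypothetical temp directory stands in for the multipart upload directory; this is not the actual MinIO helper):

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	// Hypothetical multipart upload directory with a leftover part file.
	uploadDir, err := os.MkdirTemp("", "multipart-upload-")
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile(filepath.Join(uploadDir, "part.1"), []byte("data"), 0o644); err != nil {
		log.Fatal(err)
	}

	// A plain remove fails on a non-empty directory, leaving it to pile
	// up until the stale-uploads cleanup eventually runs.
	if err := os.Remove(uploadDir); err != nil {
		log.Println("non-recursive remove:", err) // "directory not empty"
	}

	// A recursive delete purges the directory and its contents right away.
	if err := os.RemoveAll(uploadDir); err != nil {
		log.Fatal(err)
	}
	log.Println("multipart directory purged")
}
```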
fixes #17863
This reverts commit bf3901342c.
This is to fix a regression caused when there are inconsistent
versions, but one version is in quorum. The SuccessorModTime
issue must be fixed differently.
Batch status can perpetually wait after completion
due to a race between MetricsHandler() returning
the active metrics at 1-second intervals and the deletion
of metrics after job completion.
This PR ensures that we keep the 'status' around
for a while, i.e. up to 24 hours, for all batch jobs.
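A rough sketch of the retention idea (hypothetical `statusStore` and `jobStatus` types, not the actual MinIO implementation): completed job statuses carry a completion time and are purged only once they are older than 24 hours, so a poller racing with completion can still read them.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const statusRetention = 24 * time.Hour

type jobStatus struct {
	State       string
	CompletedAt time.Time
}

// statusStore keeps batch-job statuses around after completion so that
// pollers (e.g. a metrics handler scraping every second) never miss them.
type statusStore struct {
	mu   sync.Mutex
	jobs map[string]jobStatus
}

func newStatusStore() *statusStore {
	return &statusStore{jobs: make(map[string]jobStatus)}
}

func (s *statusStore) complete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.jobs[id] = jobStatus{State: "complete", CompletedAt: time.Now()}
}

func (s *statusStore) get(id string) (jobStatus, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	st, ok := s.jobs[id]
	return st, ok
}

// purgeExpired removes only statuses older than the retention window,
// instead of deleting them immediately on completion.
func (s *statusStore) purgeExpired(now time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, st := range s.jobs {
		if now.Sub(st.CompletedAt) > statusRetention {
			delete(s.jobs, id)
		}
	}
}

func main() {
	store := newStatusStore()
	store.complete("job-1")
	store.purgeExpired(time.Now()) // within 24h: the status survives
	if st, ok := store.get("job-1"); ok {
		fmt.Println("job-1:", st.State)
	}
}
```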
Two fields in lifecycles made GOB encoding consistently fail with `gob: type lifecycle.Prefix has no exported fields`.
This meant that in distributed setups, listings would never be able to continue and would restart on every call.
Fix the issues and be sure to log these errors at least once per bucket. We may see some connectivity errors here, but we shouldn't hide them.
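For reference, gob encodes only exported struct fields, and a struct with no exported fields at all fails with exactly this kind of error. A minimal standalone sketch (hypothetical type names, not the actual lifecycle.Prefix definition):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// broken mirrors the failing shape: only unexported fields, which gob
// cannot encode; Encode returns "gob: type main.broken has no exported fields".
type broken struct {
	prefix string
	set    bool
}

// fixed exports the fields so gob encoding (and hence resuming listings
// across nodes) works.
type fixed struct {
	Prefix string
	Set    bool
}

func main() {
	var buf bytes.Buffer

	if err := gob.NewEncoder(&buf).Encode(broken{prefix: "logs/", set: true}); err != nil {
		fmt.Println("broken:", err)
	}

	buf.Reset()
	if err := gob.NewEncoder(&buf).Encode(fixed{Prefix: "logs/", Set: true}); err != nil {
		fmt.Println("fixed:", err)
	} else {
		fmt.Println("fixed: encoded", buf.Len(), "bytes")
	}
}
```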