local disk metrics were polluting cluster metrics
Please remove them instead of adding relevant ones.
- batch job metrics were incorrectly kept at bucket
metrics endpoint, move it to cluster metrics.
- add tier metrics to cluster peer metrics from the node.
- fix missing set level cluster health metrics
In distributed setup with a load balancer, randmoly any server
would report the metrics `minio_cluster_bucket_total` and
`minio_cluster_usage_object_total` and while graphing it, we should
take max of reported values.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.
some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.
Bonus: Added new cluster metrics for usage, objects and histograms
PR #15052 caused a regression, add the missing metrics back.
Bonus:
- internode information should be only for distributed setups
- update the dashboard to include 4xx and 5xx error panels.