Commit Graph

92 Commits

Author SHA1 Message Date
Anugrah Vijay 6acf038a84
docs: fix bucket metrics API\ path in docs (#18661) 2023-12-18 08:21:08 -08:00
Praveen raj Mani 10ca0a6936
Label the notification target metrics by their target IDs (#18633)
This patch adds the targetID to the existing notification target metrics
and deprecates the current target metrics which points to the overall
event notification subsystem
2023-12-14 09:09:26 -08:00
Shubhendu 6d4c1156d6
Changed the expression to render the value (#18627)
The metrics `minio_bucket_replication_received_bytes` and
`minio_bucket_replication_sent_bytes` are additive in nature
and rendering the value as is looks fine.

Also added sort order for few graphs for better reading of tool
tips as keeping ones with highest value at top helps.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-12-13 10:05:47 -08:00
Shireesh Anjal 7350a29fec
Capture percentage of cpu load and memory used (#18596)
By default the cpu load is the cumulative of all cores. Capture the
percentage load (load * 100 / cpu-count)

Also capture the percentage memory used (used * 100 / total)
2023-12-06 13:19:59 -08:00
Harshavardhana e98172d72d
avoid hot-tier SLA to be tied to warm-tier SLA (#18581)
it is okay if the warm-tier cannot keep up, we should continue
to take I/O at hot-tier, only fail hot-tier or block it when
we are disk full.

Bonus: add metrics counter for these missed tasks, we will
know for sure if one of the node is lagging behind or is
losing too many tasks during transitioning.
2023-12-02 13:02:12 -08:00
Shubhendu 317b40ef90
Fixed broken docs link (#18486)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-11-20 12:04:49 -08:00
Shubhendu e938ece492
Added guidelines for setting prometheus alerts (#18479)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-11-19 10:16:08 -08:00
Shubhendu e4b619ce1a
Added graph for Erasure Set Tolerance value (#18472)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-11-17 10:38:15 -08:00
vicmunoz da95a2d13f
fix: object versions metric help (#18388) 2023-11-03 11:43:52 -07:00
Shubhendu ef67c39910
Added graphs for KMS metrics (#18321)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-10-30 03:20:53 -07:00
Shireesh Anjal 6d20ec3bea
Add support for resource metrics (#18057)
Add a new endpoint for "resource" metrics `/v2/metrics/resource`

This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.

Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.

Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
2023-09-30 13:40:20 -07:00
Harshavardhana 822cbd4b43 add couple of missing things from #18027 2023-09-13 23:26:48 -07:00
Ravind Kumar 3c19a9308d
DOCS-987: Reorganizing list.md for better RST compatibility (#18027) 2023-09-13 23:23:37 -07:00
Shubhendu e47e625f73
Added replication graphs for site replication metrics (#17951)
This dashboard graphs the metrics when site replication is enabled
across MinIO instances.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-08-31 08:31:16 -07:00
Shubhendu 0ce9e00ffa
Added node scanner and node drives graphs (#17949)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-08-30 14:01:51 -07:00
Shubhendu c778c381b5
Added new bucket replication graphs (#17947)
This PR adds new bucket replication graphs for better and granular
monitoring of bucket replication. Also arranged all replication graphs
together.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-08-30 11:57:41 -07:00
Poorna b48bbe08b2
Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Shubhendu c3c8441a1d
Corrected the count of buckets and objects graphs (#17883)
In distributed setup with a load balancer, randmoly any server
would report the metrics `minio_cluster_bucket_total` and
`minio_cluster_usage_object_total` and while graphing it, we should
take max of reported values.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-08-21 09:04:38 -07:00
Harshavardhana 8a9b886011
update grafana dashboard with disk -> drive rename (#17857) 2023-08-15 16:04:20 -07:00
Harshavardhana c4ca0a5a57
add two more drive metrics when metrics is available (#17854) 2023-08-15 10:55:47 -07:00
Shubhendu b6b6d6e8d8
Removed replication dashboard (#17815)
As all replication metrics are moved at bucket level, all replication
graphs as well are added under minio-bucket.json. Removing the independent
replication dashboard.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-08-08 08:13:45 -07:00
Harshavardhana 114fab4c70
export cluster health as prometheus metrics (#17741) 2023-07-28 01:16:53 -07:00
Shubhendu e1731d9403
Added bucket specific grafana dashboard (#17727)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-07-26 15:10:11 -07:00
Harshavardhana e1094dde08 update MinIO replication dashboard with latest metrics 2023-07-21 17:30:04 -07:00
Doom And Love d004c45386
grafana-dashboard: Update scrape_jobs variable to be single select (#17696)
Set `includeAll` and `mult` to be false since this dashboard only works with a single value being selected
2023-07-21 14:12:44 -07:00
Krishnan Parthasarathi 9eeee92d36
Add deletemarker_total metric (#17689) 2023-07-20 07:52:32 -07:00
Harshavardhana c0a5bdaed9
update grafana dashboard JSON with the new metrics (#17683) 2023-07-19 08:16:04 -07:00
Harshavardhana 6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.

some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
Harshavardhana 7605d07bb2
add support for bucket level request count per API (#17468)
New metrics added to calculate API request count
per bucket, per API.  Captures errors, including
4xx, 5xx HTTP status codes separately.
2023-06-21 09:41:59 -07:00
Anis Eleuch 46d45a6923
grafana: Add TCP dial errors panel (#17101) 2023-04-28 11:11:17 -07:00
Anis Eleuch 2448a9e047
grafana: Remove minio_s3_requests_errors_total metric (#17094) 2023-04-27 10:55:30 -07:00
Jiffs Maverick 61101d82d9 Rename inodes metric in grafana dashboards (#17030) 2023-04-21 11:07:30 -07:00
Harshavardhana e47a31f9fc
fix: object size distribution in metrics for all objects (#16539) 2023-02-04 21:10:10 -08:00
Anis Elleuch e73894fa50
grafana: Show one metric for the total data growth (#16449) 2023-01-20 09:39:28 -08:00
Anis Elleuch b8943fdf19
doc: Update prometheus metrics list (#16329) 2022-12-29 15:08:22 -08:00
Daryl White d44f3526dc
Update links to documentation site (#15750) 2022-09-28 21:28:45 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
MohammadReza f4d5c861f3
update grafana dashboard (#15357) 2022-07-21 15:17:44 -07:00
Harshavardhana 8082d1fed6
add bucket level S3 received/sent bytes (#15084)
adds bucket level metrics for bytes received and sent bytes on all S3 API calls.
2022-06-14 15:14:24 -07:00
Minio Trusted f34b2ef90b update dashboard Data Usage Growth as time series 2022-06-13 22:05:36 -07:00
Harshavardhana 7413045f0e
fix: add missing minio_s3_requests_total (#15070)
PR #15052 caused a regression, add the missing metrics back.

Bonus:

- internode information should be only for distributed setups 
- update the dashboard to include 4xx and 5xx error panels.
2022-06-11 00:50:31 -07:00
Anis Elleuch 5fb420c703
prometheus: Add S3 4xx and 5xx S3 monitoring (#15052)
Currently minio_s3_requests_errors_total covers 4xx and 
5xx S3 responses which can be confusing when s3 applications 
sent a lot of HEAD requests with obvious 404 responses or 
when the replication is enabled.

Add 
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total

to help users monitor 4xx and 5xx HTTP status codes separately.
2022-06-08 11:22:34 -07:00
Minio Trusted f63645546d update minimum goroutine threshold on dashboard 2022-06-06 22:13:54 -07:00
Harshavardhana c2630bb3a3 add total usage pie chart based on total/free bytes 2022-05-28 09:53:53 -07:00
Eco 81d2b54dfd
doc: typo fix for ttfb entry in table (#14647) 2022-03-29 09:42:02 -07:00
Harshavardhana e3e0532613
cleanup markdown docs across multiple files (#14296)
enable markdown-linter
2022-02-11 16:51:25 -08:00
Krishnan Parthasarathi 0ee2933234
Export tier metrics via Prometheus (#13413)
e.g
```
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"}
1.3631488e+08

minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1

minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1
```
2022-02-08 12:45:28 -08:00
Harshavardhana 74faed166a
Add quota usage as part of prometheus metrics (#14222)
Bonus: pass caller context when needed to all bucket metadata handling calls.
2022-01-31 17:27:43 -08:00
chrisbecke ef0b8367b5
Update minio-overview.json data source panel (#13730)
Add missing datasource in `Healing` panel.
2021-11-23 09:01:07 -08:00
Mani 7b82411e6f
change the unit of measurement from TB to TiB (#13686) 2021-11-18 20:06:37 -08:00