Commit Graph

171 Commits

Author SHA1 Message Date
Krutika Dhananjay
58659f26f4 Drop v3 metrics from community docs (#21678) 2025-11-06 02:38:16 -08:00
Daryl White
0848e69602 Update docs links throughout (#21513) 2025-08-12 11:20:36 -07:00
Johannes Horn
d002beaee3 feat: add variable for datasource in grafana dashboards (#21470) 2025-08-03 18:46:49 -07:00
Shubhendu
2d8ba15b9e Correct spelling (#21225) 2025-04-23 08:13:23 -07:00
Alexander Kalaj
46922c71b7 Updating Prom queries to include tilde needed to work (#21054) 2025-03-22 08:22:29 -07:00
TripleChecker
bc4008ced4 Fix typos (#20970) 2025-02-26 01:25:50 -08:00
Erfan
4208d7af5a docs: remove redundant prometheus metric (#20618) 2024-11-06 07:44:21 -08:00
Allan Roger Reid
b3ab7546ee Fix typos in README.md (#20599) 2024-10-31 10:38:53 -07:00
Andrea Longo
db2c7ed1d1 Docs: link to prom collector repo for info on debug metrics (#20209)
link to prom collector repo for info on debug metrics
2024-08-02 15:30:11 -07:00
Poorna
74c047cb03 fix replication last hour metric (#20199)
also adding missing recent_backlog_count metric to v3 metrics
2024-08-01 17:55:27 -07:00
Andrea Longo
3bc39db34e Restructure metrics v3 readme for docs use (#20114) 2024-07-29 11:48:51 -07:00
Anis Eleuch
21cf29330e grafana: Fix the unit in Open FDs panel (#20144) 2024-07-24 07:51:03 -07:00
Shubhendu
3bd3470d0b Corrected names of node replication metrics (#19932)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-06-13 15:26:54 -07:00
Bala FA
7edc352d23 Add ILM metrics in metrics-v3 (#19539)
Signed-off-by: Bala.FA <bala@minio.io>
2024-06-06 02:36:25 -07:00
Shubhendu
39ac720826 Remove hardcoded override as not needed (#19868)
Fixes: https://github.com/minio/minio/issues/19867

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-06-04 06:24:37 -07:00
Shireesh Anjal
a591e06ae5 Add cluster scanner metrics in metrics-v3 (#19517)
endpoint: /minio/metrics/v3/cluster/scanner
metrics:
 - bucket_scans_finished (counter)
 - bucket_scans_started (counter)
 - directories_scanned (counter)
 - last_activity_nano_seconds (gauge)
 - objects_scanned (counter)
 - versions_scanned (counter)
2024-05-24 12:29:25 -07:00
Shireesh Anjal
5659cddc84 Add cluster config metrics in metrics-v3 (#19507)
endpoint: /minio/metrics/v3/cluster/config
metrics:
- write_quorum
- rrs_parity
- standard_parity
2024-05-24 05:50:46 -07:00
Shubhendu
1654a9b7e6 Use point in time values for gauge metrics in graphs (#19690)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-05-24 04:11:51 -07:00
Shireesh Anjal
673a521711 Change endpoint of v3 notification metrics (#19804)
from /cluster/notification to /notification
2024-05-24 04:10:24 -07:00
Shireesh Anjal
7981509cc8 Add cluster and bucket replication metrics in metrics-v3 (#19546)
endpoint: /minio/metrics/v3/cluster/replication
metrics:
- average_active_workers
- average_queued_bytes
- average_queued_count
- average_transfer_rate
- current_active_workers
- current_transfer_rate
- last_minute_queued_bytes
- last_minute_queued_count
- max_active_workers
- max_queued_bytes
- max_queued_count
- max_transfer_rate
- recent_backlog_count

endpoint: /minio/metrics/v3/api/bucket/replication
metrics:
- last_hour_failed_bytes
- last_hour_failed_count
- last_minute_failed_bytes
- last_minute_failed_count
- latency_ms
- proxied_delete_tagging_requests_total
- proxied_get_requests_failures
- proxied_get_requests_total
- proxied_get_tagging_requests_failures
- proxied_get_tagging_requests_total
- proxied_head_requests_failures
- proxied_head_requests_total
- proxied_put_tagging_requests_failures
- proxied_put_tagging_requests_total
- sent_bytes
- sent_count
- total_failed_bytes
- total_failed_count
- proxied_delete_tagging_requests_failures
2024-05-23 00:41:18 -07:00
Shireesh Anjal
3bab4822f3 Add logger webhook metrics in metrics-v3 (#19515)
endpoint: /minio/metrics/v3/cluster/webhook
metrics:
- failed_messages (counter)
- online (gauge)
- queue_length (gauge)
- total_messages (counter)
2024-05-14 00:27:33 -07:00
Shireesh Anjal
5808190398 Add more metrics to v3/cluster/erasure-set (#19714)
Metrics being added:

- read_tolerance: No of drive failures that can be tolerated without
  disrupting read operations
- write_tolerance: No of drive failures that can be tolerated without
  disrupting write operations
- read_health: Health of the erasure set in a pool for read operations
  (1=healthy, 0=unhealthy)
- write_health: Health of the erasure set in a pool for write operations
  (1=healthy, 0=unhealthy)
2024-05-14 00:25:56 -07:00
Shireesh Anjal
b2a82248b1 Move /system/go to /debug/go (#19707) 2024-05-14 00:25:37 -07:00
Shireesh Anjal
074d70112d Consolidate drive health related metrics into single metric (#19706)
Instead of having "online" and "healing" as two metrics, replace with a
single metric "health" which can have following values:

0 = offline
1 = healthy
2 = healing
2024-05-12 10:23:50 -07:00
Shireesh Anjal
60d7e8143a Move /cluster/audit to /audit (#19708)
As the audit metrics are server level and not 
overall cluster level.
2024-05-10 07:50:39 -07:00
Shireesh Anjal
04f92f1291 Change endpoint format for per-bucket metrics (#19655)
Per-bucket metrics endpoints always start with /bucket and the bucket
name is appended to the path. e.g. if the collector path is /bucket/api,
the endpoint for the bucket "mybucket" would be
/minio/metrics/v3/bucket/api/mybucket

Change the existing bucket api endpoint accordingly from /api/bucket to
/bucket/api
2024-05-02 10:37:57 -07:00
Bala FA
e5b16adb1c Add cluster IAM metrics in metrics-v3 (#19595)
Signed-off-by: Bala.FA <bala@minio.io>
2024-05-02 01:20:42 -07:00
Shireesh Anjal
4caa3422bd Add process metrics in metrics-v3 (#19612)
endpoint: /minio/metrics/v3/system/process
metrics:
- locks_read_total
- locks_write_total
- cpu_total_seconds
- go_routine_total
- io_rchar_bytes
- io_read_bytes
- io_wchar_bytes
- io_write_bytes
- start_time_seconds
- uptime_seconds
- file_descriptor_limit_total
- file_descriptor_open_total
- syscall_read_total
- syscall_write_total
- resident_memory_bytes
- virtual_memory_bytes
- virtual_memory_max_bytes

Since the standard process collector implements only a subset of these
metrics, remove it and implement our own custom process collector that
captures all the process metrics we need.
2024-04-26 09:07:23 -07:00
Harshavardhana
c54ffde568 add metrics ioerror counter for alerts on I/O errors (#19618) 2024-04-25 15:01:31 -07:00
Bala FA
14cdadfb56 Add cluster notification metrics in metrics-v3 (#19533)
Signed-off-by: Bala.FA <bala@minio.io>
2024-04-23 21:10:35 -07:00
Shireesh Anjal
f7b665347e Add system CPU metrics to metrics-v3 (#19560)
endpoint: /minio/metrics/v3/system/cpu

metrics:
- minio_system_cpu_avg_idle
- minio_system_cpu_avg_iowait
- minio_system_cpu_load
- minio_system_cpu_load_perc
- minio_system_cpu_nice
- minio_system_cpu_steal
- minio_system_cpu_system
- minio_system_cpu_user
2024-04-23 16:56:12 -07:00
Shireesh Anjal
ca5fab8656 Add cluster audit metrics in metrics-v3 (#19514)
endpoint: /minio/metrics/v3/cluster/audit
metrics:
- failed_messages (counter)
- total_messages (counter)
- target_queue_length (gauge)
2024-04-17 02:18:02 -07:00
Shireesh Anjal
6df76ca73c Add system memory metrics in v3 (#19486)
Following memory metrics will be added under /system/memory

- available
- buffers
- cache
- free
- shared
- total
- used
- used_perc
2024-04-16 22:10:25 -07:00
Markus Wagner
0cf3d93360 removed hardcoded datasource uid (#19477) 2024-04-15 03:03:01 -07:00
Shubhendu
d3a07c29ba Correct sample for node scrape configuration (#19491)
As node metrics should be scraped per node basis, use a sample
configuartion using all the nodes in targets.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-04-12 08:49:30 -07:00
Harshavardhana
41ec038523 remove permission denied error for being drive error (#19478) 2024-04-11 14:22:15 -07:00
Shireesh Anjal
08d3d06a06 Add drive metrics in metrics-v3 (#19452)
Add following metrics:

- used_inodes
- total_inodes
- healing
- online
- reads_per_sec
- reads_kb_per_sec
- reads_await
- writes_per_sec
- writes_kb_per_sec
- writes_await
- perc_util

To be able to calculate the `per_sec` values, we capture the IOStats-related 
data in the beginning (along with the time at which they were captured), 
and compare them against the current values subsequently. This is because 
dividing by "time since server uptime." doesn't work in k8s environments.
2024-04-11 10:46:34 -07:00
Shubhendu
d96d696841 Dont use deprecated angular (#19396)
Support for Angular would be stopped with newer versions of grafana

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-04-03 19:01:53 -07:00
Shubhendu
d87f91720b Split the replication dashboard in cluster and node level (#19374)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-03-28 10:15:39 -07:00
Shubhendu
d63e603040 Pre populate the server names using a query (#19367)
User doesn't need to remember and enter the server values,
rather they can select from the pre populated list.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-03-28 08:14:26 -07:00
Shubhendu
3d4fc28ec9 Render node graphs by node (#19356)
As total drives count, online vs offline are per node basis, its
corect to select node for which graphs need to be rendered.

Set prometheus scrape jobs to fetch metrics from all nodes. A sample
scrape job for node metrics could be as below

```
- job_name: minio-job-node
  bearer_token: <token>
  metrics_path: /minio/v2/metrics/node
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
  - targets: [tenant1-ss-0-0.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-1.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-2.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-3.tenant1-hl.tenant-ns.svc.cluster.local:9000]
```

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-03-27 10:41:08 -07:00
Shubhendu
53a14c7301 Adding dashboard for MinIO node metrics (#19329)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-03-26 08:01:28 -07:00
Aditya Manthramurthy
b2c5b75efa feat: Add Metrics V3 API (#19068)
Metrics v3 is mainly a reorganization of metrics into smaller groups of
metrics and the removal of internal aggregation of metrics received from
peer nodes in a MinIO cluster.

This change adds the endpoint `/minio/metrics/v3` as the top-level metrics
endpoint and under this, various sub-endpoints are implemented. These
are currently documented in `docs/metrics/v3.md`

The handler will serve metrics at any path
`/minio/metrics/v3/PATH`, as follows:

when PATH is a sub-endpoint listed above => serves the group of
metrics under that path; or when PATH is a (non-empty) parent 
directory of the sub-endpoints listed above => serves metrics
from each child sub-endpoint of PATH. otherwise, returns a no 
resource found error

All available metrics are listed in the `docs/metrics/v3.md`. More will
be added subsequently.
2024-03-10 01:15:15 -08:00
Ravind Kumar
f3e7c42425 Update metrics list.md with new metrics from RELEASE.2024-01-05 (#19161) 2024-02-29 14:53:54 -08:00
Shubhendu
f46bee242c Re-organized grafana dashboards (#19157)
Moved different dashboards to their specific directories. Also
mentioned that these dashbards are examples of how to create
graphs using MinIO provided and metrics and customers should
change / add graphs on their specific need basis.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-02-29 10:35:20 -08:00
schmittey
c44f311c4f Add missing yaml syntax highlighting in prometheus README.md (#19087) 2024-02-20 16:22:37 -08:00
Shubhendu
cb7dab17cb Graph cluster and bucket replication proxied requests (#19078)
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2024-02-20 01:45:00 -08:00
Harshavardhana
eac4e4b279 honor replaced disk properly by updating globalLocalDrives (#19038)
globalLocalDrives seem to be not updated during the
HealFormat() leads to a requirement where the server
needs to be restarted for the healing to continue.
2024-02-12 13:00:20 -08:00
Poorna
27d02ea6f7 metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Daniel Valdivia
403ec7cf21 fix: metrics URI path in prometheus docs (#18907) 2024-01-29 14:34:21 -08:00