Commit Graph

24 Commits

Author SHA1 Message Date
Andrea Longo
db2c7ed1d1
Docs: link to prom collector repo for info on debug metrics (#20209)
link to prom collector repo for info on debug metrics
2024-08-02 15:30:11 -07:00
Poorna
74c047cb03
fix replication last hour metric (#20199)
also adding missing recent_backlog_count metric to v3 metrics
2024-08-01 17:55:27 -07:00
Andrea Longo
3bc39db34e
Restructure metrics v3 readme for docs use (#20114) 2024-07-29 11:48:51 -07:00
Bala FA
7edc352d23
Add ILM metrics in metrics-v3 (#19539)
Signed-off-by: Bala.FA <bala@minio.io>
2024-06-06 02:36:25 -07:00
Shireesh Anjal
a591e06ae5
Add cluster scanner metrics in metrics-v3 (#19517)
endpoint: /minio/metrics/v3/cluster/scanner
metrics:
 - bucket_scans_finished (counter)
 - bucket_scans_started (counter)
 - directories_scanned (counter)
 - last_activity_nano_seconds (gauge)
 - objects_scanned (counter)
 - versions_scanned (counter)
2024-05-24 12:29:25 -07:00
Shireesh Anjal
5659cddc84
Add cluster config metrics in metrics-v3 (#19507)
endpoint: /minio/metrics/v3/cluster/config
metrics:
- write_quorum
- rrs_parity
- standard_parity
2024-05-24 05:50:46 -07:00
Shireesh Anjal
673a521711
Change endpoint of v3 notification metrics (#19804)
from /cluster/notification to /notification
2024-05-24 04:10:24 -07:00
Shireesh Anjal
7981509cc8
Add cluster and bucket replication metrics in metrics-v3 (#19546)
endpoint: /minio/metrics/v3/cluster/replication
metrics:
- average_active_workers
- average_queued_bytes
- average_queued_count
- average_transfer_rate
- current_active_workers
- current_transfer_rate
- last_minute_queued_bytes
- last_minute_queued_count
- max_active_workers
- max_queued_bytes
- max_queued_count
- max_transfer_rate
- recent_backlog_count

endpoint: /minio/metrics/v3/api/bucket/replication
metrics:
- last_hour_failed_bytes
- last_hour_failed_count
- last_minute_failed_bytes
- last_minute_failed_count
- latency_ms
- proxied_delete_tagging_requests_total
- proxied_get_requests_failures
- proxied_get_requests_total
- proxied_get_tagging_requests_failures
- proxied_get_tagging_requests_total
- proxied_head_requests_failures
- proxied_head_requests_total
- proxied_put_tagging_requests_failures
- proxied_put_tagging_requests_total
- sent_bytes
- sent_count
- total_failed_bytes
- total_failed_count
- proxied_delete_tagging_requests_failures
2024-05-23 00:41:18 -07:00
Shireesh Anjal
3bab4822f3
Add logger webhook metrics in metrics-v3 (#19515)
endpoint: /minio/metrics/v3/cluster/webhook
metrics:
- failed_messages (counter)
- online (gauge)
- queue_length (gauge)
- total_messages (counter)
2024-05-14 00:27:33 -07:00
Shireesh Anjal
5808190398
Add more metrics to v3/cluster/erasure-set (#19714)
Metrics being added:

- read_tolerance: No of drive failures that can be tolerated without
  disrupting read operations
- write_tolerance: No of drive failures that can be tolerated without
  disrupting write operations
- read_health: Health of the erasure set in a pool for read operations
  (1=healthy, 0=unhealthy)
- write_health: Health of the erasure set in a pool for write operations
  (1=healthy, 0=unhealthy)
2024-05-14 00:25:56 -07:00
Shireesh Anjal
b2a82248b1
Move /system/go to /debug/go (#19707) 2024-05-14 00:25:37 -07:00
Shireesh Anjal
074d70112d
Consolidate drive health related metrics into single metric (#19706)
Instead of having "online" and "healing" as two metrics, replace with a
single metric "health" which can have following values:

0 = offline
1 = healthy
2 = healing
2024-05-12 10:23:50 -07:00
Shireesh Anjal
60d7e8143a
Move /cluster/audit to /audit (#19708)
As the audit metrics are server level and not 
overall cluster level.
2024-05-10 07:50:39 -07:00
Shireesh Anjal
04f92f1291
Change endpoint format for per-bucket metrics (#19655)
Per-bucket metrics endpoints always start with /bucket and the bucket
name is appended to the path. e.g. if the collector path is /bucket/api,
the endpoint for the bucket "mybucket" would be
/minio/metrics/v3/bucket/api/mybucket

Change the existing bucket api endpoint accordingly from /api/bucket to
/bucket/api
2024-05-02 10:37:57 -07:00
Bala FA
e5b16adb1c
Add cluster IAM metrics in metrics-v3 (#19595)
Signed-off-by: Bala.FA <bala@minio.io>
2024-05-02 01:20:42 -07:00
Shireesh Anjal
4caa3422bd
Add process metrics in metrics-v3 (#19612)
endpoint: /minio/metrics/v3/system/process
metrics:
- locks_read_total
- locks_write_total
- cpu_total_seconds
- go_routine_total
- io_rchar_bytes
- io_read_bytes
- io_wchar_bytes
- io_write_bytes
- start_time_seconds
- uptime_seconds
- file_descriptor_limit_total
- file_descriptor_open_total
- syscall_read_total
- syscall_write_total
- resident_memory_bytes
- virtual_memory_bytes
- virtual_memory_max_bytes

Since the standard process collector implements only a subset of these
metrics, remove it and implement our own custom process collector that
captures all the process metrics we need.
2024-04-26 09:07:23 -07:00
Harshavardhana
c54ffde568
add metrics ioerror counter for alerts on I/O errors (#19618) 2024-04-25 15:01:31 -07:00
Bala FA
14cdadfb56
Add cluster notification metrics in metrics-v3 (#19533)
Signed-off-by: Bala.FA <bala@minio.io>
2024-04-23 21:10:35 -07:00
Shireesh Anjal
f7b665347e
Add system CPU metrics to metrics-v3 (#19560)
endpoint: /minio/metrics/v3/system/cpu

metrics:
- minio_system_cpu_avg_idle
- minio_system_cpu_avg_iowait
- minio_system_cpu_load
- minio_system_cpu_load_perc
- minio_system_cpu_nice
- minio_system_cpu_steal
- minio_system_cpu_system
- minio_system_cpu_user
2024-04-23 16:56:12 -07:00
Shireesh Anjal
ca5fab8656
Add cluster audit metrics in metrics-v3 (#19514)
endpoint: /minio/metrics/v3/cluster/audit
metrics:
- failed_messages (counter)
- total_messages (counter)
- target_queue_length (gauge)
2024-04-17 02:18:02 -07:00
Shireesh Anjal
6df76ca73c
Add system memory metrics in v3 (#19486)
Following memory metrics will be added under /system/memory

- available
- buffers
- cache
- free
- shared
- total
- used
- used_perc
2024-04-16 22:10:25 -07:00
Harshavardhana
41ec038523
remove permission denied error for being drive error (#19478) 2024-04-11 14:22:15 -07:00
Shireesh Anjal
08d3d06a06
Add drive metrics in metrics-v3 (#19452)
Add following metrics:

- used_inodes
- total_inodes
- healing
- online
- reads_per_sec
- reads_kb_per_sec
- reads_await
- writes_per_sec
- writes_kb_per_sec
- writes_await
- perc_util

To be able to calculate the `per_sec` values, we capture the IOStats-related 
data in the beginning (along with the time at which they were captured), 
and compare them against the current values subsequently. This is because 
dividing by "time since server uptime." doesn't work in k8s environments.
2024-04-11 10:46:34 -07:00
Aditya Manthramurthy
b2c5b75efa
feat: Add Metrics V3 API (#19068)
Metrics v3 is mainly a reorganization of metrics into smaller groups of
metrics and the removal of internal aggregation of metrics received from
peer nodes in a MinIO cluster.

This change adds the endpoint `/minio/metrics/v3` as the top-level metrics
endpoint and under this, various sub-endpoints are implemented. These
are currently documented in `docs/metrics/v3.md`

The handler will serve metrics at any path
`/minio/metrics/v3/PATH`, as follows:

when PATH is a sub-endpoint listed above => serves the group of
metrics under that path; or when PATH is a (non-empty) parent 
directory of the sub-endpoints listed above => serves metrics
from each child sub-endpoint of PATH. otherwise, returns a no 
resource found error

All available metrics are listed in the `docs/metrics/v3.md`. More will
be added subsequently.
2024-03-10 01:15:15 -08:00