minio

mirror of https://github.com/minio/minio.git synced 2025-07-13 02:51:05 -04:00

Author	SHA1	Message	Date
Shubhendu	3bd3470d0b	Corrected names of node replication metrics (#19932 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-06-13 15:26:54 -07:00
Bala FA	7edc352d23	Add ILM metrics in metrics-v3 (#19539 ) Signed-off-by: Bala.FA <bala@minio.io>	2024-06-06 02:36:25 -07:00
Shubhendu	39ac720826	Remove hardcoded `override` as not needed (#19868 ) Fixes: https://github.com/minio/minio/issues/19867 Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-06-04 06:24:37 -07:00
Shireesh Anjal	a591e06ae5	Add cluster scanner metrics in metrics-v3 (#19517 ) endpoint: /minio/metrics/v3/cluster/scanner metrics: - bucket_scans_finished (counter) - bucket_scans_started (counter) - directories_scanned (counter) - last_activity_nano_seconds (gauge) - objects_scanned (counter) - versions_scanned (counter)	2024-05-24 12:29:25 -07:00
Shireesh Anjal	5659cddc84	Add cluster config metrics in metrics-v3 (#19507 ) endpoint: /minio/metrics/v3/cluster/config metrics: - write_quorum - rrs_parity - standard_parity	2024-05-24 05:50:46 -07:00
Shubhendu	1654a9b7e6	Use point in time values for `gauge` metrics in graphs (#19690 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-05-24 04:11:51 -07:00
Shireesh Anjal	673a521711	Change endpoint of v3 notification metrics (#19804 ) from /cluster/notification to /notification	2024-05-24 04:10:24 -07:00
Shireesh Anjal	7981509cc8	Add cluster and bucket replication metrics in metrics-v3 (#19546 ) endpoint: /minio/metrics/v3/cluster/replication metrics: - average_active_workers - average_queued_bytes - average_queued_count - average_transfer_rate - current_active_workers - current_transfer_rate - last_minute_queued_bytes - last_minute_queued_count - max_active_workers - max_queued_bytes - max_queued_count - max_transfer_rate - recent_backlog_count endpoint: /minio/metrics/v3/api/bucket/replication metrics: - last_hour_failed_bytes - last_hour_failed_count - last_minute_failed_bytes - last_minute_failed_count - latency_ms - proxied_delete_tagging_requests_total - proxied_get_requests_failures - proxied_get_requests_total - proxied_get_tagging_requests_failures - proxied_get_tagging_requests_total - proxied_head_requests_failures - proxied_head_requests_total - proxied_put_tagging_requests_failures - proxied_put_tagging_requests_total - sent_bytes - sent_count - total_failed_bytes - total_failed_count - proxied_delete_tagging_requests_failures	2024-05-23 00:41:18 -07:00
Shireesh Anjal	3bab4822f3	Add logger webhook metrics in metrics-v3 (#19515 ) endpoint: /minio/metrics/v3/cluster/webhook metrics: - failed_messages (counter) - online (gauge) - queue_length (gauge) - total_messages (counter)	2024-05-14 00:27:33 -07:00
Shireesh Anjal	5808190398	Add more metrics to v3/cluster/erasure-set (#19714 ) Metrics being added: - read_tolerance: No of drive failures that can be tolerated without disrupting read operations - write_tolerance: No of drive failures that can be tolerated without disrupting write operations - read_health: Health of the erasure set in a pool for read operations (1=healthy, 0=unhealthy) - write_health: Health of the erasure set in a pool for write operations (1=healthy, 0=unhealthy)	2024-05-14 00:25:56 -07:00
Shireesh Anjal	b2a82248b1	Move /system/go to /debug/go (#19707 )	2024-05-14 00:25:37 -07:00
Shireesh Anjal	074d70112d	Consolidate drive health related metrics into single metric (#19706 ) Instead of having "online" and "healing" as two metrics, replace with a single metric "health" which can have following values: 0 = offline 1 = healthy 2 = healing	2024-05-12 10:23:50 -07:00
Shireesh Anjal	60d7e8143a	Move /cluster/audit to /audit (#19708 ) As the audit metrics are server level and not overall cluster level.	2024-05-10 07:50:39 -07:00
Shireesh Anjal	04f92f1291	Change endpoint format for per-bucket metrics (#19655 ) Per-bucket metrics endpoints always start with /bucket and the bucket name is appended to the path. e.g. if the collector path is /bucket/api, the endpoint for the bucket "mybucket" would be /minio/metrics/v3/bucket/api/mybucket Change the existing bucket api endpoint accordingly from /api/bucket to /bucket/api	2024-05-02 10:37:57 -07:00
Bala FA	e5b16adb1c	Add cluster IAM metrics in metrics-v3 (#19595 ) Signed-off-by: Bala.FA <bala@minio.io>	2024-05-02 01:20:42 -07:00
Shireesh Anjal	4caa3422bd	Add process metrics in `metrics-v3` (#19612 ) endpoint: /minio/metrics/v3/system/process metrics: - locks_read_total - locks_write_total - cpu_total_seconds - go_routine_total - io_rchar_bytes - io_read_bytes - io_wchar_bytes - io_write_bytes - start_time_seconds - uptime_seconds - file_descriptor_limit_total - file_descriptor_open_total - syscall_read_total - syscall_write_total - resident_memory_bytes - virtual_memory_bytes - virtual_memory_max_bytes Since the standard process collector implements only a subset of these metrics, remove it and implement our own custom process collector that captures all the process metrics we need.	2024-04-26 09:07:23 -07:00
Harshavardhana	c54ffde568	add metrics ioerror counter for alerts on I/O errors (#19618 )	2024-04-25 15:01:31 -07:00
Bala FA	14cdadfb56	Add cluster notification metrics in metrics-v3 (#19533 ) Signed-off-by: Bala.FA <bala@minio.io>	2024-04-23 21:10:35 -07:00
Shireesh Anjal	f7b665347e	Add system CPU metrics to metrics-v3 (#19560 ) endpoint: /minio/metrics/v3/system/cpu metrics: - minio_system_cpu_avg_idle - minio_system_cpu_avg_iowait - minio_system_cpu_load - minio_system_cpu_load_perc - minio_system_cpu_nice - minio_system_cpu_steal - minio_system_cpu_system - minio_system_cpu_user	2024-04-23 16:56:12 -07:00
Shireesh Anjal	ca5fab8656	Add cluster audit metrics in metrics-v3 (#19514 ) endpoint: /minio/metrics/v3/cluster/audit metrics: - failed_messages (counter) - total_messages (counter) - target_queue_length (gauge)	2024-04-17 02:18:02 -07:00
Shireesh Anjal	6df76ca73c	Add system memory metrics in v3 (#19486 ) Following memory metrics will be added under /system/memory - available - buffers - cache - free - shared - total - used - used_perc	2024-04-16 22:10:25 -07:00
Markus Wagner	0cf3d93360	removed hardcoded datasource uid (#19477 )	2024-04-15 03:03:01 -07:00
Shubhendu	d3a07c29ba	Correct sample for node scrape configuration (#19491 ) As node metrics should be scraped on a per-node basis, use a sample configuration that lists all the nodes in targets. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-04-12 08:49:30 -07:00
Harshavardhana	41ec038523	remove permission denied error for being drive error (#19478 )	2024-04-11 14:22:15 -07:00
Shireesh Anjal	08d3d06a06	Add drive metrics in metrics-v3 (#19452 ) Add following metrics: - used_inodes - total_inodes - healing - online - reads_per_sec - reads_kb_per_sec - reads_await - writes_per_sec - writes_kb_per_sec - writes_await - perc_util To be able to calculate the `per_sec` values, we capture the IOStats-related data in the beginning (along with the time at which they were captured), and compare them against the current values subsequently. This is because dividing by "time since server uptime." doesn't work in k8s environments.	2024-04-11 10:46:34 -07:00
Shubhendu	d96d696841	Don't use deprecated Angular (#19396 ) Support for Angular will be dropped in newer versions of Grafana. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-04-03 19:01:53 -07:00
Shubhendu	d87f91720b	Split the replication dashboard in cluster and node level (#19374 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-28 10:15:39 -07:00
Shubhendu	d63e603040	Pre populate the server names using a query (#19367 ) User doesn't need to remember and enter the server values, rather they can select from the pre populated list. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-28 08:14:26 -07:00
Shubhendu	3d4fc28ec9	Render node graphs by node (#19356 ) As the total drive count and the online vs. offline counts are per node, it is correct to select the node for which graphs need to be rendered. Set Prometheus scrape jobs to fetch metrics from all nodes. A sample scrape job for node metrics could be as below ``` - job_name: minio-job-node bearer_token: <token> metrics_path: /minio/v2/metrics/node scheme: https tls_config: insecure_skip_verify: true static_configs: - targets: [tenant1-ss-0-0.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-1.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-2.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-3.tenant1-hl.tenant-ns.svc.cluster.local:9000] ``` Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-27 10:41:08 -07:00
Shubhendu	53a14c7301	Adding dashboard for MinIO node metrics (#19329 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-26 08:01:28 -07:00
Aditya Manthramurthy	b2c5b75efa	feat: Add Metrics V3 API (#19068 ) Metrics v3 is mainly a reorganization of metrics into smaller groups of metrics and the removal of internal aggregation of metrics received from peer nodes in a MinIO cluster. This change adds the endpoint `/minio/metrics/v3` as the top-level metrics endpoint and under this, various sub-endpoints are implemented. These are currently documented in `docs/metrics/v3.md` The handler will serve metrics at any path `/minio/metrics/v3/PATH`, as follows: when PATH is a sub-endpoint listed above => serves the group of metrics under that path; or when PATH is a (non-empty) parent directory of the sub-endpoints listed above => serves metrics from each child sub-endpoint of PATH. otherwise, returns a no resource found error All available metrics are listed in the `docs/metrics/v3.md`. More will be added subsequently.	2024-03-10 01:15:15 -08:00
Ravind Kumar	f3e7c42425	Update metrics list.md with new metrics from RELEASE.2024-01-05 (#19161 )	2024-02-29 14:53:54 -08:00
Shubhendu	f46bee242c	Re-organized grafana dashboards (#19157 ) Moved different dashboards to their specific directories. Also mentioned that these dashboards are examples of how to create graphs using MinIO-provided metrics, and that customers should change or add graphs according to their specific needs. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-02-29 10:35:20 -08:00
schmittey	c44f311c4f	Add missing yaml syntax highlighting in prometheus README.md (#19087 )	2024-02-20 16:22:37 -08:00
Shubhendu	cb7dab17cb	Graph cluster and bucket replication proxied requests (#19078 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-02-20 01:45:00 -08:00
Harshavardhana	eac4e4b279	honor replaced disk properly by updating globalLocalDrives (#19038 ) globalLocalDrives does not appear to be updated during HealFormat(), so the server has to be restarted for the healing to continue.	2024-02-12 13:00:20 -08:00
Poorna	27d02ea6f7	metrics: add replication metrics on proxied requests (#18957 )	2024-02-05 22:00:45 -08:00
Daniel Valdivia	403ec7cf21	fix: metrics URI path in prometheus docs (#18907 )	2024-01-29 14:34:21 -08:00
Harshavardhana	944f3c1477	remove local disk metrics from cluster metrics (#18886 ) local disk metrics were polluting cluster metrics. Please remove them instead of adding relevant ones. - batch job metrics were incorrectly kept at bucket metrics endpoint, move it to cluster metrics. - add tier metrics to cluster peer metrics from the node. - fix missing set level cluster health metrics	2024-01-28 12:53:59 -08:00
Cesar N	1a91edecae	Update list.md node_cpu wording (#18878 )	2024-01-26 18:57:58 -08:00
Harshavardhana	e11d851aee	add new drive I/O waiting/tokens metric (#18836 ) Bonus: add virtual memory used as well part of the system resource metrics.	2024-01-19 14:51:36 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Shubhendu	9434fff215	Added list of scanner metrics to document (#18731 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-01-02 10:41:33 -08:00
Mario Bros	fbd8dfe60f	Adding ~ to match job when multiple jobs (#18706 )	2023-12-27 15:39:20 -08:00
Shubhendu	9d7660b409	Graph cluster wide where applicable (#18705 ) Graph the maximum value reported across nodes at cluster level for applicable scenarios. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-12-22 08:14:32 -08:00
Krishnan Parthasarathi	56b7045c20	Export tier metrics (#18678 ) minio_node_tier_ttlb_seconds - Distribution of time to last byte for streaming objects from warm tier minio_node_tier_requests_success - Number of requests to download object from warm tier that were successful minio_node_tier_requests_failure - Number of requests to download object from warm tier that failed	2023-12-20 20:13:40 -08:00
Anugrah Vijay	6acf038a84	docs: fix bucket metrics API path in docs (#18661 )	2023-12-18 08:21:08 -08:00
Praveen raj Mani	10ca0a6936	Label the notification target metrics by their target IDs (#18633 ) This patch adds the targetID to the existing notification target metrics and deprecates the current target metrics which points to the overall event notification subsystem	2023-12-14 09:09:26 -08:00
Shubhendu	6d4c1156d6	Changed the expression to render the value (#18627 ) The metrics `minio_bucket_replication_received_bytes` and `minio_bucket_replication_sent_bytes` are additive in nature and rendering the value as is looks fine. Also added sort order for few graphs for better reading of tool tips as keeping ones with highest value at top helps. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-12-13 10:05:47 -08:00
Shireesh Anjal	7350a29fec	Capture percentage of cpu load and memory used (#18596 ) By default the cpu load is the cumulative of all cores. Capture the percentage load (load * 100 / cpu-count) Also capture the percentage memory used (used * 100 / total)	2023-12-06 13:19:59 -08:00
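
The path handling described in the Metrics V3 commit (b2c5b75efa) can be sketched as follows. This is a minimal illustration in Python, not MinIO's Go implementation: the `SUB_ENDPOINTS` table and the `resolve` function are hypothetical names, and the three sub-endpoints with their metric names are copied from the commit messages above only as sample data (later commits moved some endpoints).

```python
# Hypothetical registry: sub-endpoint path -> metric names in that group.
# The entries are sample data taken from the commit messages in this log.
SUB_ENDPOINTS = {
    "/cluster/config": ["write_quorum", "rrs_parity", "standard_parity"],
    "/cluster/webhook": ["failed_messages", "online", "queue_length", "total_messages"],
    "/system/memory": ["available", "buffers", "cache", "free",
                       "shared", "total", "used", "used_perc"],
}


def resolve(path):
    """Return the metric names served for PATH under /minio/metrics/v3.

    Returns None when nothing matches (the "no resource found" case).
    """
    # Normalize to a single leading slash and no trailing slash.
    path = "/" + path.strip("/")

    # Case 1: PATH is a sub-endpoint, so serve exactly that group.
    if path in SUB_ENDPOINTS:
        return list(SUB_ENDPOINTS[path])

    # An empty PATH is not treated as a parent directory.
    if path == "/":
        return None

    # Case 2: PATH is a non-empty parent directory, so serve every child group.
    prefix = path + "/"
    children = sorted(p for p in SUB_ENDPOINTS if p.startswith(prefix))
    if not children:
        return None  # Case 3: no resource found.

    metrics = []
    for child in children:
        metrics.extend(SUB_ENDPOINTS[child])
    return metrics


print(resolve("/cluster/config"))
print(resolve("/cluster"))
print(resolve("/unknown"))
```

Matching on `path + "/"` rather than on the bare prefix keeps a partial name such as `/clust` from being treated as the parent of `/cluster/config`.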


159 Commits