minio

mirror of https://github.com/minio/minio.git synced 2025-11-25 03:56:17 -05:00

Author	SHA1	Message	Date
Shubhendu	3d4fc28ec9	Render node graphs by node (#19356 ) As the total drive count and the online vs. offline counts are reported per node, it is correct to select the node for which graphs need to be rendered. Set prometheus scrape jobs to fetch metrics from all nodes. A sample scrape job for node metrics could be as below ``` - job_name: minio-job-node bearer_token: <token> metrics_path: /minio/v2/metrics/node scheme: https tls_config: insecure_skip_verify: true static_configs: - targets: [tenant1-ss-0-0.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-1.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-2.tenant1-hl.tenant-ns.svc.cluster.local:9000,tenant1-ss-0-3.tenant1-hl.tenant-ns.svc.cluster.local:9000] ``` Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-27 10:41:08 -07:00
Shubhendu	53a14c7301	Adding dashboard for MinIO node metrics (#19329 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-26 08:01:28 -07:00
Aditya Manthramurthy	b2c5b75efa	feat: Add Metrics V3 API (#19068 ) Metrics v3 is mainly a reorganization of metrics into smaller groups of metrics and the removal of internal aggregation of metrics received from peer nodes in a MinIO cluster. This change adds the endpoint `/minio/metrics/v3` as the top-level metrics endpoint and under this, various sub-endpoints are implemented. These are currently documented in `docs/metrics/v3.md`. The handler will serve metrics at any path `/minio/metrics/v3/PATH`, as follows: when PATH is a sub-endpoint listed above => serves the group of metrics under that path; or when PATH is a (non-empty) parent directory of the sub-endpoints listed above => serves metrics from each child sub-endpoint of PATH; otherwise, returns a no resource found error. All available metrics are listed in `docs/metrics/v3.md`. More will be added subsequently.	2024-03-10 01:15:15 -08:00
Ravind Kumar	f3e7c42425	Update metrics list.md with new metrics from RELEASE.2024-01-05 (#19161 )	2024-02-29 14:53:54 -08:00
Shubhendu	f46bee242c	Re-organized grafana dashboards (#19157 ) Moved different dashboards to their specific directories. Also mentioned that these dashboards are examples of how to create graphs using MinIO-provided metrics, and that customers should change or add graphs based on their specific needs. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-02-29 10:35:20 -08:00
schmittey	c44f311c4f	Add missing yaml syntax highlighting in prometheus README.md (#19087 )	2024-02-20 16:22:37 -08:00
Shubhendu	cb7dab17cb	Graph cluster and bucket replication proxied requests (#19078 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-02-20 01:45:00 -08:00
Harshavardhana	eac4e4b279	honor replaced disk properly by updating globalLocalDrives (#19038 ) globalLocalDrives does not seem to be updated during HealFormat(), which leads to a requirement where the server needs to be restarted for the healing to continue.	2024-02-12 13:00:20 -08:00
Poorna	27d02ea6f7	metrics: add replication metrics on proxied requests (#18957 )	2024-02-05 22:00:45 -08:00
Daniel Valdivia	403ec7cf21	fix: metrics URI path in prometheus docs (#18907 )	2024-01-29 14:34:21 -08:00
Harshavardhana	944f3c1477	remove local disk metrics from cluster metrics (#18886 ) local disk metrics were polluting cluster metrics. Please remove them instead of adding relevant ones. - batch job metrics were incorrectly kept at bucket metrics endpoint, move them to cluster metrics. - add tier metrics to cluster peer metrics from the node. - fix missing set level cluster health metrics	2024-01-28 12:53:59 -08:00
Cesar N	1a91edecae	Update list.md node_cpu wording (#18878 )	2024-01-26 18:57:58 -08:00
Harshavardhana	e11d851aee	add new drive I/O waiting/tokens metric (#18836 ) Bonus: add virtual memory used as well part of the system resource metrics.	2024-01-19 14:51:36 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Shubhendu	9434fff215	Added list of scanner metrics to document (#18731 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-01-02 10:41:33 -08:00
Mario Bros	fbd8dfe60f	Adding ~ to match job when multiple jobs (#18706 )	2023-12-27 15:39:20 -08:00
Shubhendu	9d7660b409	Graph cluster wide where applicable (#18705 ) Graph the maximum value reported across nodes at cluster level for applicable scenarios. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-12-22 08:14:32 -08:00
Krishnan Parthasarathi	56b7045c20	Export tier metrics (#18678 ) minio_node_tier_ttlb_seconds - Distribution of time to last byte for streaming objects from warm tier; minio_node_tier_requests_success - Number of requests to download object from warm tier that were successful; minio_node_tier_requests_failure - Number of requests to download object from warm tier that failed	2023-12-20 20:13:40 -08:00
Anugrah Vijay	6acf038a84	docs: fix bucket metrics API path in docs (#18661 )	2023-12-18 08:21:08 -08:00
Praveen raj Mani	10ca0a6936	Label the notification target metrics by their target IDs (#18633 ) This patch adds the targetID to the existing notification target metrics and deprecates the current target metrics which points to the overall event notification subsystem	2023-12-14 09:09:26 -08:00
Shubhendu	6d4c1156d6	Changed the expression to render the value (#18627 ) The metrics `minio_bucket_replication_received_bytes` and `minio_bucket_replication_sent_bytes` are additive in nature and rendering the value as is looks fine. Also added sort order for few graphs for better reading of tool tips as keeping ones with highest value at top helps. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-12-13 10:05:47 -08:00
Shireesh Anjal	7350a29fec	Capture percentage of cpu load and memory used (#18596 ) By default the cpu load is the cumulative load across all cores. Capture the percentage load (load * 100 / cpu-count). Also capture the percentage memory used (used * 100 / total).	2023-12-06 13:19:59 -08:00
Harshavardhana	e98172d72d	avoid hot-tier SLA to be tied to warm-tier SLA (#18581 ) it is okay if the warm-tier cannot keep up, we should continue to take I/O at hot-tier, only fail hot-tier or block it when we are disk full. Bonus: add metrics counter for these missed tasks, we will know for sure if one of the nodes is lagging behind or is losing too many tasks during transitioning.	2023-12-02 13:02:12 -08:00
Shubhendu	317b40ef90	Fixed broken docs link (#18486 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-11-20 12:04:49 -08:00
Shubhendu	e938ece492	Added guidelines for setting prometheus alerts (#18479 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-11-19 10:16:08 -08:00
Shubhendu	e4b619ce1a	Added graph for Erasure Set Tolerance value (#18472 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-11-17 10:38:15 -08:00
vicmunoz	da95a2d13f	fix: object versions metric help (#18388 )	2023-11-03 11:43:52 -07:00
Shubhendu	ef67c39910	Added graphs for KMS metrics (#18321 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-10-30 03:20:53 -07:00
Shireesh Anjal	6d20ec3bea	Add support for resource metrics (#18057 ) Add a new endpoint for "resource" metrics `/v2/metrics/resource` This should return system metrics related to drives, network, CPU and memory. Except for drives, other metrics should have corresponding "avg" and "max" values also. Reuse the real-time feature to capture the required data, introducing CPU and memory metrics in it. Collect the data every minute and keep updating the average and max values accordingly, returning the latest values when the API is called.	2023-09-30 13:40:20 -07:00
Harshavardhana	822cbd4b43	add couple of missing things from #18027	2023-09-13 23:26:48 -07:00
Ravind Kumar	3c19a9308d	DOCS-987: Reorganizing list.md for better RST compatibility (#18027 )	2023-09-13 23:23:37 -07:00
Shubhendu	e47e625f73	Added replication graphs for site replication metrics (#17951 ) This dashboard graphs the metrics when site replication is enabled across MinIO instances. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-08-31 08:31:16 -07:00
Shubhendu	0ce9e00ffa	Added node scanner and node drives graphs (#17949 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-08-30 14:01:51 -07:00
Shubhendu	c778c381b5	Added new bucket replication graphs (#17947 ) This PR adds new bucket replication graphs for better and granular monitoring of bucket replication. Also arranged all replication graphs together. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-08-30 11:57:41 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Shubhendu	c3c8441a1d	Corrected the count of buckets and objects graphs (#17883 ) In a distributed setup with a load balancer, any server may randomly report the metrics `minio_cluster_bucket_total` and `minio_cluster_usage_object_total`, so while graphing them we should take the max of the reported values. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-08-21 09:04:38 -07:00
Harshavardhana	8a9b886011	update grafana dashboard with disk -> drive rename (#17857 )	2023-08-15 16:04:20 -07:00
Harshavardhana	c4ca0a5a57	add two more drive metrics when metrics is available (#17854 )	2023-08-15 10:55:47 -07:00
Shubhendu	b6b6d6e8d8	Removed replication dashboard (#17815 ) As all replication metrics are moved at bucket level, all replication graphs as well are added under minio-bucket.json. Removing the independent replication dashboard. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-08-08 08:13:45 -07:00
Harshavardhana	114fab4c70	export cluster health as prometheus metrics (#17741 )	2023-07-28 01:16:53 -07:00
Shubhendu	e1731d9403	Added bucket specific grafana dashboard (#17727 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-07-26 15:10:11 -07:00
Harshavardhana	e1094dde08	update MinIO replication dashboard with latest metrics	2023-07-21 17:30:04 -07:00
Doom And Love	d004c45386	grafana-dashboard: Update scrape_jobs variable to be single select (#17696 ) Set `includeAll` and `multi` to be false since this dashboard only works with a single value being selected	2023-07-21 14:12:44 -07:00
Krishnan Parthasarathi	9eeee92d36	Add deletemarker_total metric (#17689 )	2023-07-20 07:52:32 -07:00
Harshavardhana	c0a5bdaed9	update grafana dashboard JSON with the new metrics (#17683 )	2023-07-19 08:16:04 -07:00
Harshavardhana	6426b74770	move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663 ) users/customers do not have a reasonable number of buckets anymore, this is why we must avoid overpopulating cluster endpoints, instead move the bucket monitoring to a separate endpoint. this is a breaking change for a couple of metrics, but it is imperative that we do it to improve the responsiveness of our Prometheus cluster endpoint. Bonus: Added new cluster metrics for usage, objects and histograms	2023-07-18 22:25:12 -07:00
Harshavardhana	8af0773baf	remove deprecated Content-Security-Policy (#17580 ) https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/block-all-mixed-content	2023-07-06 09:18:38 -07:00
Harshavardhana	7605d07bb2	add support for bucket level request count per API (#17468 ) New metrics added to calculate API request count per bucket, per API. Captures errors, including 4xx, 5xx HTTP status codes separately.	2023-06-21 09:41:59 -07:00
Anis Eleuch	46d45a6923	grafana: Add TCP dial errors panel (#17101 )	2023-04-28 11:11:17 -07:00
Anis Eleuch	2448a9e047	grafana: Remove minio_s3_requests_errors_total metric (#17094 )	2023-04-27 10:55:30 -07:00
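
The sample scrape job quoted in commit 3d4fc28ec9 loses its line breaks in this listing. Laid out as YAML it would look roughly as follows. The indentation is reconstructed and is an assumption based on the usual Prometheus `scrape_configs` layout, not copied from the commit; `<token>` and the target hostnames are the placeholders from the commit message.

```yaml
# Reconstructed layout (assumed indentation) of the node-metrics scrape job
# from commit 3d4fc28ec9. Values are taken verbatim from the commit message.
- job_name: minio-job-node
  bearer_token: <token>
  metrics_path: /minio/v2/metrics/node
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
  # One target per MinIO node, so that per-node graphs can be selected.
  - targets: [
      tenant1-ss-0-0.tenant1-hl.tenant-ns.svc.cluster.local:9000,
      tenant1-ss-0-1.tenant1-hl.tenant-ns.svc.cluster.local:9000,
      tenant1-ss-0-2.tenant1-hl.tenant-ns.svc.cluster.local:9000,
      tenant1-ss-0-3.tenant1-hl.tenant-ns.svc.cluster.local:9000
    ]
```

Listing every node as a target matches the commit's stated intent of fetching metrics from all nodes.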


131 Commits