minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	6426b74770	move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663 ) Users and customers no longer have a reasonably small number of buckets, so we must avoid overpopulating the cluster endpoints and instead move bucket monitoring to a separate endpoint. This is a breaking change for a couple of metrics, but it is imperative to improve the responsiveness of our Prometheus cluster endpoint. Bonus: Added new cluster metrics for usage, objects and histograms	2023-07-18 22:25:12 -07:00
drivebyer	04c792476f	fix: provide a possible slice cap for heal failed metrics items (#17647 ) Signed-off-by: Wu <yang.wu@daocloud.io>	2023-07-14 11:02:45 -07:00
Harshavardhana	5b7c83341b	move per bucket metrics to peer location (#17627 )	2023-07-11 07:46:24 -07:00
Anis Eleuch	6d0bc5ab1e	prometheus: Fix internode stats (#17594 ) Internode calculation was done inside S3 handlers, fix it by moving it to internode handlers. Remove admin stats since it is not used.	2023-07-08 07:35:11 -07:00
Harshavardhana	abb1f22057	Revert "change ttfb_distribution metrics to histogramMetric (#17115 )" This reverts commit `9112ca4e29`.	2023-07-07 13:57:37 -07:00
Harshavardhana	7605d07bb2	add support for bucket level request count per API (#17468 ) New metrics added to calculate API request count per bucket, per API. Captures errors, including 4xx, 5xx HTTP status codes separately.	2023-06-21 09:41:59 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
drivebyer	ad2ab6eb3e	fix: Give accurate cap to slice (#17224 )	2023-05-17 15:14:09 -07:00
jiuker	9a799065b3	fix: make slice cap of right size (#17192 )	2023-05-16 08:10:07 -07:00
Shireesh Anjal	c326e5a34e	Add metrics for webhook endpoint stats (#17179 )	2023-05-11 11:24:37 -07:00
Harshavardhana	5569acd95c	disallow EC:0 if not set during server startup (#17141 )	2023-05-04 14:44:30 -07:00
Harshavardhana	9112ca4e29	change ttfb_distribution metrics to histogramMetric (#17115 )	2023-05-03 07:31:00 -07:00
Anis Eleuch	a42650c065	Add minio_bucket_usage_version_total metric to Prometheus (#17023 )	2023-04-12 20:08:07 -07:00
Harshavardhana	3b7781835e	add lock metrics per node (#16943 )	2023-04-03 21:23:24 -07:00
Klaus Post	d85da9236e	Add Object Version count histogram (#16739 )	2023-03-10 08:53:59 -08:00
Harshavardhana	901887e6bf	feat: add lambda transformation functions target (#16507 )	2023-03-07 08:12:41 -08:00
ferhat elmas	714283fae2	cleanup ignored static analysis (#16767 )	2023-03-06 08:56:10 -08:00
Aditya Manthramurthy	8cde38404d	Add metrics for custom auth plugin (#16701 )	2023-02-27 09:55:18 -08:00
Andreas Auernhammer	74887c7372	kms: add support for KES API keys and switch to KES Go SDK (#16617 ) Signed-off-by: Andreas Auernhammer <hi@aead.dev>	2023-02-14 07:19:20 -08:00
Anis Elleuch	e05205756f	metrics: Add more logs when unable to read bucket usage (#16405 )	2023-01-13 02:32:00 +05:30
Anis Elleuch	27417459fb	metrics: Show healing info for all nodes (#16315 )	2022-12-26 08:35:32 -08:00
Harshavardhana	2433698372	fix: remove unnecessary logs for client conn errors (#16261 )	2022-12-15 08:25:05 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Anis Elleuch	932d2c3c62	Add X-Amz-Request-Id to internode calls (#16146 )	2022-12-06 09:27:26 -08:00
Harshavardhana	5a8df7efb3	re-implement StorageInfo to be a peer call (#16155 )	2022-12-01 14:31:35 -08:00
Klaus Post	5b242f1d11	Add Audit target metrics (#16044 )	2022-11-10 10:20:21 -08:00
Klaus Post	bbc312fce6	Add notification queue metrics (#16026 )	2022-11-08 16:36:47 -08:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Anis Elleuch	048a46ec2a	Add RPC tcp timeout/errs and AVG duration to prometheus (#15747 )	2022-09-26 09:04:26 -07:00
Poorna	6b9fd256e1	Persist in-memory replication stats to disk (#15594 ) to avoid relying on scanner-calculated replication metrics. This will improve the accuracy of the replication stats reported. This PR also adds on to #15556 by handing replication traffic that could not be queued by available workers to the MRF queue so that entries in `PENDING` status are healed faster.	2022-09-12 12:40:02 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00
Harshavardhana	5e763b71dc	use logger.LogOnce to reduce printing disconnection logs (#15408 ) fixes #15334 - re-use net/url parsed value for http.Request{} - remove gosimple, structcheck and unused due to https://github.com/golangci/golangci-lint/issues/2649 - unwrapErrs up to leafErr to ensure that we store exactly the correct errors	2022-07-27 09:44:59 -07:00
Andreas Auernhammer	f800cee4fa	metric: add KMS-related metrics (#15258 ) This commit adds a minimal set of KMS-related metrics: ``` # HELP minio_cluster_kms_online Reports whether the KMS is online (1) or offline (0) # TYPE minio_cluster_kms_online gauge minio_cluster_kms_online{server="127.0.0.1:9000"} 1 # HELP minio_cluster_kms_request_error Number of KMS requests that failed with a well-defined error # TYPE minio_cluster_kms_request_error counter minio_cluster_kms_request_error{server="127.0.0.1:9000"} 16790 # HELP minio_cluster_kms_request_success Number of KMS requests that succeeded # TYPE minio_cluster_kms_request_success counter minio_cluster_kms_request_success{server="127.0.0.1:9000"} 348031 ``` Currently, we report whether the KMS is available and how many requests succeeded/failed. However, KES exposes much more metrics that can be exposed if necessary. See: https://pkg.go.dev/github.com/minio/kes#Metric Signed-off-by: Andreas Auernhammer <hi@aead.dev>	2022-07-11 09:17:28 -07:00
Anis Elleuch	ed0cbfb31e	fix: rootdisk detection by not using cached value when GetDiskInfo() errors out (#15249 ) GetDiskInfo() uses timedValue to cache the disk info for one second. timedValue behavior was recently changed to return an old cached value when calculating a new value returns an error. When a mount point is empty, GetDiskInfo() will return errUnformattedDisk, timedValue will return cached disk info with unexpected IsRootDisk value, e.g. false if the mount point belongs to a root disk. Therefore, the mount point will be considered a valid disk and will be formatted as well. This commit will also add more defensive code when marking root disks: always mark a disk offline for any GetDiskInfo() error except errUnformattedDisk. The server will try anyway to reconnect to those disks every 10 seconds.	2022-07-07 17:05:23 -07:00
Anis Elleuch	8d98282afd	Better reporting of total/free usable capacity of the cluster (#15230 ) The current code uses approximation using a ratio. The approximation can skew if we have multiple pools with different disk capacities. Replace the algorithm with a simpler one which counts data disks and ignore parity disks.	2022-07-06 13:29:49 -07:00
Klaus Post	ac055b09e9	Add detailed scanner metrics (#15161 )	2022-07-05 14:45:49 -07:00
Harshavardhana	63ac260bd5	Simplify Prometheus metrics gather (#15210 )	2022-07-01 13:18:39 -07:00
Harshavardhana	bd099f5e71	fix: change timedValue to return the previously cached value (#15169 ) The caller can interpret the underlying error and decide accordingly; in places where we do not interpret the errors from timedValue.Get(), we should simply use the previously cached value instead of returning "empty". Bonus: remove some unused code	2022-06-25 08:50:16 -07:00
Harshavardhana	8082d1fed6	add bucket level S3 received/sent bytes (#15084 ) adds bucket level metrics for bytes received and sent bytes on all S3 API calls.	2022-06-14 15:14:24 -07:00
Harshavardhana	7413045f0e	fix: add missing minio_s3_requests_total (#15070 ) PR #15052 caused a regression, add the missing metrics back. Bonus: - internode information should be only for distributed setups - update the dashboard to include 4xx and 5xx error panels.	2022-06-11 00:50:31 -07:00
Anis Elleuch	5fb420c703	prometheus: Add S3 4xx and 5xx S3 monitoring (#15052 ) Currently minio_s3_requests_errors_total covers 4xx and 5xx S3 responses, which can be confusing when S3 applications send a lot of HEAD requests with obvious 404 responses or when replication is enabled. Add - minio_s3_requests_4xx_errors_total - minio_s3_requests_5xx_errors_total to help users monitor 4xx and 5xx HTTP status codes separately.	2022-06-08 11:22:34 -07:00
Harshavardhana	2420f6c000	fix: make metrics endpoint responsive by reducing the chatter (#15055 ) peerOnlineCounter was making NxN calls to many peers, this can be really long and tedious if there are random servers that are going down. Instead we should calculate online peers from the point of view of "self" and return those online and offline appropriately by performing a healthcheck.	2022-06-08 02:43:13 -07:00
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) The main motivation is to move towards a common backend format for all the different modes in MinIO, allowing for simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication, and transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Harshavardhana	f8650a3493	fetch bucket replication stats across peers in single call (#14956 ) current implementation relied on recursively calling one bucket at a time across all peers, this would be very slow and chatty when there are 100's of buckets which would mean 100*peerCount amount of network operations. This PR attempts to reduce this entire call into `peerCount` amount of network calls only. This functionality addresses also a concern where the Prometheus metrics would significantly slow down when one of the peers is offline.	2022-05-23 09:15:30 -07:00
Aditya Manthramurthy	165d60421d	Add metrics for observing IAM sync operations (#14680 )	2022-04-03 13:08:59 -07:00
Poorna	9e25475475	Validate tier manager is initialized in tier Empty() check (#14646 ) Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>	2022-03-29 10:10:06 -07:00
Krishnan Parthasarathi	80ef1ae51c	Simplify assembling of tierStats from data-usage (#14504 )	2022-03-08 12:08:29 -08:00
Anis Elleuch	bacf6156c1	metrics: Avoid crash when fetching tier metrics (#14493 ) Data usage does not always contain tiering info even if the data usage information is valid. Avoid a crash in that case. (e.g. the scanner scanned the namespace, the user enables tiering, prometheus scrapes the server before the scanner gets a chance to update the data usage with new tiering information)	2022-03-07 10:59:32 -08:00
Krishnan Parthasarathi	0ee2933234	Export tier metrics via Prometheus (#13413 ) e.g ``` minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08 minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892 minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"} 1.3631488e+08 minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1 minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0 minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1 minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3 minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2 minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1 ```	2022-02-08 12:45:28 -08:00
Anis Elleuch	2ee337ead5	prometheus: Add incoming requests metrics since last scrape (#14261 ) Some users running MinIO claim that their system became slow. One way to investigate is to look at the Prometheus history of the number of requests reaching the server. The existing current S3 requests metric is not enough because it can increase if the system really becomes slow, due to disk issues for example.	2022-02-07 16:30:14 -08:00


90 Commits