Commit Graph

109 Commits

Author SHA1 Message Date
jiuker
6ca6788bb7
feat: add events_errors_total metric (#18610) 2023-12-07 16:21:17 -08:00
Harshavardhana
e98172d72d
avoid hot-tier SLA to be tied to warm-tier SLA (#18581)
it is okay if the warm-tier cannot keep up, we should continue
to take I/O at hot-tier, only fail hot-tier or block it when
we are disk full.

Bonus: add metrics counter for these missed tasks, we will
know for sure if one of the node is lagging behind or is
losing too many tasks during transitioning.
2023-12-02 13:02:12 -08:00
Krishnan Parthasarathi
a50f26b7f5
Implement batch-expiration for objects (#17946)
Based on an initial PR from -
https://github.com/minio/minio/pull/17792

But fully completes it with newer finalized YAML spec.
2023-12-02 02:51:33 -08:00
Anis Eleuch
fe63664164
prom: Add drive failure tolerance per erasure set (#18424) 2023-11-13 00:59:48 -08:00
vicmunoz
da95a2d13f
fix: object versions metric help (#18388) 2023-11-03 11:43:52 -07:00
Klaus Post
128256e3ab
Add event counters (#18232)
Export metric for global events sent and skipped for the lifetime of the server.
2023-10-12 15:39:22 -07:00
Harshavardhana
6829ae5b13
completely remove drive caching layer from gateway days (#18217)
This has already been deprecated for close to a year now.
2023-10-11 21:18:17 -07:00
Matthew Toohey
f731e7ea36
Fix current_send_in_progress metric always being zero (#18160) 2023-10-09 17:28:17 -07:00
Shireesh Anjal
6d20ec3bea
Add support for resource metrics (#18057)
Add a new endpoint for "resource" metrics `/v2/metrics/resource`

This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.

Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.

Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
2023-09-30 13:40:20 -07:00
Harshavardhana
9788d85ea3
remove logging for invalid metadata values (#18068) 2023-09-20 15:49:55 -07:00
Poorna
812f5a02d7
metrics: fix panic in replication stats reporting (#17979) 2023-09-05 10:26:18 -07:00
Poorna
b48bbe08b2
Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Harshavardhana
ba4566e86d
add missing IAM node metrics to cluster and node endpoint (#17908) 2023-08-24 09:26:37 -07:00
Harshavardhana
c4ca0a5a57
add two more drive metrics when metrics is available (#17854) 2023-08-15 10:55:47 -07:00
Yang Wu
23e4895dfc
Create metrics slice when necessary (#17809) 2023-08-07 02:21:22 -07:00
Harshavardhana
2fa561f22e
do not crash on invalid metric values (#17764)
```
minio[1032735]: panic: label value "\xc0.\xc0." is not valid UTF-8
minio[1032735]: goroutine 1781101 [running]:
minio[1032735]: github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
```

log such errors for investigation
2023-08-01 00:55:39 -07:00
Harshavardhana
114fab4c70
export cluster health as prometheus metrics (#17741) 2023-07-28 01:16:53 -07:00
drivebyer
a7fb3a3853
fix: Create metrics slice when necessary in getCacheMetrics() (#17711) 2023-07-24 08:40:21 -07:00
Krishnan Parthasarathi
9eeee92d36
Add deletemarker_total metric (#17689) 2023-07-20 07:52:32 -07:00
Harshavardhana
6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.

some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
drivebyer
04c792476f
fix: provide a possible slice cap for heal failed metrics items (#17647)
Signed-off-by: Wu <yang.wu@daocloud.io>
2023-07-14 11:02:45 -07:00
Harshavardhana
5b7c83341b
move per bucket metrics to peer location (#17627) 2023-07-11 07:46:24 -07:00
Anis Eleuch
6d0bc5ab1e
prometheus: Fix internode stats (#17594)
Internode calculation was done inside S3 handlers, fix it by moving it
to internode handlers.

Remove admin stats since it is not used.
2023-07-08 07:35:11 -07:00
Harshavardhana
abb1f22057 Revert "change ttfb_distribution metrics to histogramMetric (#17115)"
This reverts commit 9112ca4e29.
2023-07-07 13:57:37 -07:00
Harshavardhana
7605d07bb2
add support for bucket level request count per API (#17468)
New metrics added to calculate API request count
per bucket, per API.  Captures errors, including
4xx, 5xx HTTP status codes separately.
2023-06-21 09:41:59 -07:00
Aditya Manthramurthy
5a1612fe32
Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
drivebyer
ad2ab6eb3e
fix: Give accurate cap to slice (#17224) 2023-05-17 15:14:09 -07:00
jiuker
9a799065b3
fix: make slice cap of right size (#17192) 2023-05-16 08:10:07 -07:00
Shireesh Anjal
c326e5a34e
Add metrics for webhook endpoint stats (#17179) 2023-05-11 11:24:37 -07:00
Harshavardhana
5569acd95c
disallow EC:0 if not set during server startup (#17141) 2023-05-04 14:44:30 -07:00
Harshavardhana
9112ca4e29
change ttfb_distribution metrics to histogramMetric (#17115) 2023-05-03 07:31:00 -07:00
Anis Eleuch
a42650c065
Add minio_bucket_usage_version_total metric to Prometheus (#17023) 2023-04-12 20:08:07 -07:00
Harshavardhana
3b7781835e
add lock metrics per node (#16943) 2023-04-03 21:23:24 -07:00
Klaus Post
d85da9236e
Add Object Version count histogram (#16739) 2023-03-10 08:53:59 -08:00
Harshavardhana
901887e6bf
feat: add lambda transformation functions target (#16507) 2023-03-07 08:12:41 -08:00
ferhat elmas
714283fae2
cleanup ignored static analysis (#16767) 2023-03-06 08:56:10 -08:00
Aditya Manthramurthy
8cde38404d
Add metrics for custom auth plugin (#16701) 2023-02-27 09:55:18 -08:00
Andreas Auernhammer
74887c7372
kms: add support for KES API keys and switch to KES Go SDK (#16617)
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2023-02-14 07:19:20 -08:00
Anis Elleuch
e05205756f
metrics: Add more logs when unable to read bucket usage (#16405) 2023-01-13 02:32:00 +05:30
Anis Elleuch
27417459fb
metrics: Show healing info for all nodes (#16315) 2022-12-26 08:35:32 -08:00
Harshavardhana
2433698372
fix: remove unnecessary logs for client conn errors (#16261) 2022-12-15 08:25:05 -08:00
Aditya Manthramurthy
a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Anis Elleuch
932d2c3c62
Add X-Amz-Request-Id to internode calls (#16146) 2022-12-06 09:27:26 -08:00
Harshavardhana
5a8df7efb3
re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00
Klaus Post
5b242f1d11
Add Audit target metrics (#16044) 2022-11-10 10:20:21 -08:00
Klaus Post
bbc312fce6
Add notification queue metrics (#16026) 2022-11-08 16:36:47 -08:00
Harshavardhana
23b329b9df
remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Anis Elleuch
048a46ec2a
Add RPC tcp timeout/errs and AVG duration to prometheus (#15747) 2022-09-26 09:04:26 -07:00
Poorna
6b9fd256e1
Persist in-memory replication stats to disk (#15594)
to avoid relying on scanner-calculated replication metrics.
This will improve the accuracy of the replication stats reported.

This PR also adds on to #15556 by handing replication
traffic that could not be queued by available workers to the 
MRF queue so that entries in `PENDING` status are healed faster.
2022-09-12 12:40:02 -07:00
ebozduman
b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00