Commit Graph

83 Commits

Author SHA1 Message Date
Anis Eleuch
46d45a6923 grafana: Add TCP dial errors panel (#17101) 2023-04-28 11:11:17 -07:00
Anis Eleuch
2448a9e047 grafana: Remove minio_s3_requests_errors_total metric (#17094) 2023-04-27 10:55:30 -07:00
Jiffs Maverick
61101d82d9 Rename inodes metric in grafana dashboards (#17030) 2023-04-21 11:07:30 -07:00
Harshavardhana
e47a31f9fc fix: object size distribution in metrics for all objects (#16539) 2023-02-04 21:10:10 -08:00
Anis Elleuch
e73894fa50 grafana: Show one metric for the total data growth (#16449) 2023-01-20 09:39:28 -08:00
Anis Elleuch
b8943fdf19 doc: Update prometheus metrics list (#16329) 2022-12-29 15:08:22 -08:00
Harshavardhana
23b329b9df remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Daryl White
d44f3526dc Update links to documentation site (#15750) 2022-09-28 21:28:45 -07:00
ebozduman
b57e7321e7 Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
MohammadReza
f4d5c861f3 update grafana dashboard (#15357) 2022-07-21 15:17:44 -07:00
Harshavardhana
8082d1fed6 add bucket level S3 received/sent bytes (#15084)
adds bucket level metrics for bytes received and sent bytes on all S3 API calls.
2022-06-14 15:14:24 -07:00
Minio Trusted
f34b2ef90b update dashboard Data Usage Growth as time series 2022-06-13 22:05:36 -07:00
Harshavardhana
7413045f0e fix: add missing minio_s3_requests_total (#15070)
PR #15052 caused a regression, add the missing metrics back.

Bonus:

- internode information should be only for distributed setups 
- update the dashboard to include 4xx and 5xx error panels.
2022-06-11 00:50:31 -07:00
Anis Elleuch
5fb420c703 prometheus: Add S3 4xx and 5xx S3 monitoring (#15052)
Currently minio_s3_requests_errors_total covers 4xx and 
5xx S3 responses which can be confusing when s3 applications 
sent a lot of HEAD requests with obvious 404 responses or 
when the replication is enabled.

Add 
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total

to help users monitor 4xx and 5xx HTTP status codes separately.
2022-06-08 11:22:34 -07:00
Minio Trusted
f63645546d update minimum goroutine threshold on dashboard 2022-06-06 22:13:54 -07:00
Harshavardhana
c2630bb3a3 add total usage pie chart based on total/free bytes 2022-05-28 09:53:53 -07:00
Krishna Srinivas
389ec21d0c Update documentation for /minio/health/cluster (#14889) 2022-05-12 09:54:07 -07:00
Eco
81d2b54dfd doc: typo fix for ttfb entry in table (#14647) 2022-03-29 09:42:02 -07:00
Harshavardhana
e3e0532613 cleanup markdown docs across multiple files (#14296)
enable markdown-linter
2022-02-11 16:51:25 -08:00
Krishnan Parthasarathi
0ee2933234 Export tier metrics via Prometheus (#13413)
e.g
```
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"}
1.3631488e+08

minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1

minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1
```
2022-02-08 12:45:28 -08:00
Harshavardhana
74faed166a Add quota usage as part of prometheus metrics (#14222)
Bonus: pass caller context when needed to all bucket metadata handling calls.
2022-01-31 17:27:43 -08:00
chrisbecke
ef0b8367b5 Update minio-overview.json data source panel (#13730)
Add missing datasource in `Healing` panel.
2021-11-23 09:01:07 -08:00
Mani
7b82411e6f change the unit of measurement from TB to TiB (#13686) 2021-11-18 20:06:37 -08:00
Ashish Kumar Sinha
3d2bc15e9a Add grafana json file for replication metrics (#13678) 2021-11-17 14:49:46 -08:00
jandres - moscardo
1aa08f594d Update README.md prometheus (#13514)
Modify the doc to warn users about Prometheus sending `domain:port`
2021-11-02 12:27:30 -07:00
Harshavardhana
90e505e58f calculate API requests/error as increase() intervals not as rate() 2021-09-12 11:28:28 -07:00
Harshavardhana
1250312287 fail ready/liveness if etcd is unhealthy in gateway mode (#13146) 2021-09-03 17:05:41 -07:00
Nitish Tiwari
60394ddf83 Add support for changing job name in Grafana dashboard (#13050) 2021-08-24 09:51:09 -07:00
Krishnan Parthasarathi
30b77f59b1 doc: Add ilm prometheus metrics information (#12994) 2021-08-17 12:19:36 -07:00
Matt Sarrel
109c8acf4f fixed typo in metrics README.md (#12874) 2021-08-04 12:48:57 -07:00
Nitish Tiwari
32017454ee fix typo in Grafana dashboard json (#12471) 2021-06-09 08:04:12 -07:00
Nitish Tiwari
00c5d7e1b3 Add healing related metrics in official dashboard (#12456) 2021-06-07 12:46:54 -07:00
Poorna Krishnamoorthy
3690de0c6b Drop Pending size and count from replication metrics (#12378)
Real-time metrics calculated in-memory rely on the initial
replication metrics saved with data usage. However, this can
lag behind the actual state of the cluster at the time of server 
restart leading to inaccurate Pending size/counts reported to
Prometheus. Dropping the Pending metrics as this can be more 
reliably monitored by applications with replication notifications.

Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-05-31 20:26:52 -07:00
Nitish Tiwari
a592d3be19 fix the dashboard to use $rate_interval (#12277)
refer https://grafana.com/blog/2020/09/28/new-in-grafana-7.2-__rate_interval-for-prometheus-rate-queries-that-just-work/
for further information
2021-05-12 08:06:47 -07:00
Harshavardhana
2fd9c13b50 rename minio-cluster to minio-job as per prometheus config 2021-05-06 12:39:58 -07:00
Nitish Tiwari
ddc1e4b5b3 Update Grafana dashboard to use the new v2 cluster metrics (#12220)
Fixes #11543
2021-05-06 14:44:03 +05:30
Harshavardhana
8a9d15ace2 update prometheus metrics with failed_count 2021-04-04 09:52:37 -07:00
Poorna Krishnamoorthy
47c09a1e6f Various improvements in replication (#11949)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.

- add API to get replication metrics

- add MRF worker to handle spill-over replication operations

- multiple issues found with replication
- fixes an issue when client sends a bucket
 name with `/` at the end from SetRemoteTarget
 API call make sure to trim the bucket name to 
 avoid any extra `/`.

- hold write locks in GetObjectNInfo during replication
  to ensure that object version stack is not overwritten
  while reading the content.

- add additional protection during WriteMetadata() to
  ensure that we always write a valid FileInfo{} and avoid
  ever writing empty FileInfo{} to the lowest layers.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-03 09:03:42 -07:00
Ritesh H Shukla
23b03dadb8 Add process uptime metric (#11844) 2021-03-20 21:23:27 -07:00
Harshavardhana
2c198ae7b6 fix: prometheus metrics disks_online count when disks are down (#11689)
prometheus metrics was using total disks instead
of online disk count, when disks were down, this
PR fixes this and also adds a new metric for
total_disk_count
2021-03-03 11:18:41 -08:00
Krishna Srinivas
876b79b8d8 read-health check endpoint returns success if cluster can serve read requests (#11310) 2021-02-09 01:00:44 -08:00
Ritesh H Shukla
c4848f9b4f Add process start time to cluster metrics. (#11405) 2021-02-01 23:02:18 -08:00
Ritesh H Shukla
0bf2d84f96 update new metrics url docs (#11342) 2021-01-25 01:03:07 -08:00
Ritesh H Shukla
7575c24037 Add open FD and FD limit to cluster metrics (#11328) 2021-01-22 18:30:16 -08:00
Harshavardhana
c080f04e66 fix: prometheus metrics link typo update to latest 2021-01-22 01:53:23 -08:00
Ritesh H Shukla
b4add82bb6 Updated Prometheus metrics (#11141)
* Add metrics for nodes online and offline
* Add cluster capacity metrics
* Introduce v2 metrics
2021-01-18 20:35:38 -08:00
Harshavardhana
14792cdbc6 docs: fix the metrics formatting (#11081) 2020-12-10 18:15:47 -08:00
Harshavardhana
97856bfebf fix: grafana double counting for bucket usage, histrogram and objects (#11070) 2020-12-09 20:30:37 -08:00
Nitish Tiwari
54d243cd98 fix: grafana dashboard calculating online nodes (#11041)
Also use a generic name instead of diff names per revision
2020-12-09 00:26:42 -08:00
Ritesh H Shukla
04848dfa1c Add documentation for bucket replication related metrics (#11055) 2020-12-08 12:48:10 -08:00