Add top level metrics document to summarize monitoring endpoints (#5923)

Minio server supports healthcheck and Prometheus-related
unauthenticated endpoints. This document summarizes this
information in a single place and adds links to more detailed
documentation where needed.
Nitish Tiwari 2018-05-16 00:53:21 +05:30 committed by Dee Koder
parent 5c21e89559
commit 9cab0f25e0
3 changed files with 22 additions and 38 deletions


@ -1,36 +0,0 @@
## Minio Prometheus Metric
Minio server exposes an endpoint for Prometheus to scrape server data at `/minio/prometheus/metrics`.
### Prometheus probe
Prometheus is used to monitor Minio server information such as HTTP requests, disk storage, and network stats. It uses a config file named `prometheus.yml` to scrape data from the server. The values for `metrics_path` and `targets` need to be set in this file to specify the endpoint and URL, as shown:
```
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: minio
    metrics_path: /minio/prometheus/metrics
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9000']
```
Prometheus can be run by executing:
```
./prometheus --config.file=prometheus.yml
```
### List of Minio metrics exposed
Minio exposes the following metrics to Prometheus:
- `minio_disk_storage_bytes` : Total byte count of disk storage available to current Minio server instance
- `minio_disk_storage_free_bytes` : Total byte count of free disk storage available to current Minio server instance
- `minio_http_requests_duration_seconds_bucket` : The bucket into which observations are counted for creating Histogram
- `minio_http_requests_duration_seconds_count` : The number of observations so far, i.e. total HTTP requests (HEAD/GET/PUT/POST/DELETE).
- `minio_http_requests_duration_seconds_sum` : The current aggregate time spent servicing all HTTP requests (HEAD/GET/PUT/POST/DELETE) in seconds
- `minio_http_requests_total` : Total number of requests served by current Minio server instance
- `minio_network_received_bytes_total` : Total number of bytes received by current Minio server instance
- `minio_network_sent_bytes_total` : Total number of bytes sent by current Minio server instance
- `minio_offline_disks` : Total number of offline disks for current Minio server instance
- `minio_total_disks` : Total number of disks for current Minio server instance
- `minio_server_start_time_seconds` : Unix time in seconds when current Minio server instance started
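
The metrics listed above can be combined into derived values once they are scraped. The following is a minimal sketch, not part of Minio or Prometheus: it assumes the endpoint returns the Prometheus text exposition format without trailing timestamps, and the helper names (`parse_samples`, `metric_total`, `mean_request_seconds`, `offline_disk_fraction`) are hypothetical. Series that carry labels are summed per metric name.

```python
def parse_samples(text):
    """Parse text exposition lines into {series: value}.

    Assumption: each sample line is '<series> <value>' with no timestamp.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


def metric_total(samples, name):
    """Sum every series of a metric, with or without labels."""
    return sum(
        value
        for series, value in samples.items()
        if series == name or series.startswith(name + "{")
    )


def mean_request_seconds(samples):
    """Average time per HTTP request: histogram sum divided by count."""
    count = metric_total(samples, "minio_http_requests_duration_seconds_count")
    total = metric_total(samples, "minio_http_requests_duration_seconds_sum")
    return total / count if count else 0.0


def offline_disk_fraction(samples):
    """Share of disks reported offline, or 0.0 if no disks are reported."""
    disks = metric_total(samples, "minio_total_disks")
    return metric_total(samples, "minio_offline_disks") / disks if disks else 0.0
```

Passing the response body of `/minio/prometheus/metrics` to `parse_samples` and then calling `mean_request_seconds` gives the average request latency since the server started. In a real deployment Prometheus itself would normally compute such values from the scraped series.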

docs/metrics/README.md Normal file

@ -0,0 +1,20 @@
## Minio Monitoring Guide
Minio server exposes monitoring data over unauthenticated endpoints, so monitoring tools can pick up the data without you having to share Minio server credentials. This document lists the monitoring endpoints and the relevant documentation.
### Healthcheck Probe
Minio server has two healthcheck-related endpoints: a liveness probe to indicate if the server is working fine, and a readiness probe to indicate if the server is not accepting connections due to heavy load.
- Liveness probe available at `/minio/health/live`
- Readiness probe available at `/minio/health/ready`
Read more on how to use these endpoints in [Minio healthcheck guide](./healthcheck/README.md).
### Prometheus Probe
Minio server exposes Prometheus-compatible data on a single endpoint.
- Prometheus data available at `/minio/prometheus/metrics`
To use this endpoint, set up Prometheus to scrape data from it. Read more on how to use Prometheus to monitor Minio server in [How to monitor Minio server with Prometheus](https://github.com/minio/cookbook/blob/master/docs/how-to-monitor-minio-with-prometheus.md).


@ -4,7 +4,7 @@ Minio server exposes two un-authenticated, healthcheck endpoints - liveness prob
### Liveness probe
This probe is used to identify situations where the server is running but may not behave optimally, i.e. sluggish response or corrupt backend. Such problems can be *only* fixed by a restart.
This probe is used to identify situations where the server is running but may not behave optimally, i.e. sluggish response or corrupt back-end. Such problems can be *only* fixed by a restart.
Internally, Minio liveness probe handler does a ListBuckets call. If successful, the server returns 200 OK, otherwise 503 Service Unavailable.
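
The status codes described above can be checked from a small script. This is a minimal sketch under stated assumptions: it sends a plain GET with Python's standard `urllib`, the base URL is supplied by the caller, and the function names and verdict strings (`"live"`, `"restart-needed"`, `"unknown"`, `"unreachable"`) are hypothetical labels chosen for illustration.

```python
import urllib.error
import urllib.request

LIVENESS_PATH = "/minio/health/live"


def interpret_liveness(status_code):
    """Map the liveness probe's HTTP status to a verdict string."""
    if status_code == 200:
        return "live"
    if status_code == 503:
        # 503 Service Unavailable: the probe's internal check failed.
        return "restart-needed"
    return "unknown"


def check_liveness(base_url, timeout=5.0):
    """Probe the liveness endpoint of the server at base_url."""
    url = base_url.rstrip("/") + LIVENESS_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret_liveness(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx responses; the code is still usable.
        return interpret_liveness(err.code)
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return "unreachable"
```

For a server on the address used elsewhere in these docs, the call would be `check_liveness("http://localhost:9000")`. An orchestrator's built-in HTTP probe is usually a better fit than a script; the sketch only shows how the two documented status codes are meant to be read.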