Add top level metrics document to summarize monitoring endpoints (#5923)

Minio server supports unauthenticated healthcheck and
Prometheus-related endpoints. This document summarizes this
information in a single place and adds links to more detailed
documentation where needed.
Nitish Tiwari 2018-05-16 00:53:21 +05:30 committed by Dee Koder
parent 5c21e89559
commit 9cab0f25e0
3 changed files with 22 additions and 38 deletions

@ -1,36 +0,0 @@
## Minio Prometheus Metrics
Minio server exposes an endpoint for Prometheus to scrape server data at `/minio/prometheus/metrics`.
### Prometheus probe
Prometheus is used to monitor Minio server information such as HTTP requests, disk storage, and network stats. It uses a config file named `prometheus.yml` to scrape data from the server. The `metrics_path` and `targets` values must be set in this file to specify the endpoint and URL, as shown:
```
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: minio
    metrics_path: /minio/prometheus/metrics
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9000']
```
Prometheus can be run by executing:
```
./prometheus --config.file=prometheus.yml
```
### List of Minio metrics exposed
Minio exposes the following metrics to Prometheus:
- `minio_disk_storage_bytes` : Total byte count of disk storage available to current Minio server instance
- `minio_disk_storage_free_bytes` : Total byte count of free disk storage available to current Minio server instance
- `minio_http_requests_duration_seconds_bucket` : Histogram buckets into which request-duration observations are counted
- `minio_http_requests_duration_seconds_count` : The number of observations so far, i.e. the total number of HTTP requests (HEAD/GET/PUT/POST/DELETE)
- `minio_http_requests_duration_seconds_sum` : The aggregate time spent servicing all HTTP requests (HEAD/GET/PUT/POST/DELETE), in seconds
- `minio_http_requests_total` : Total number of requests served by current Minio server instance
- `minio_network_received_bytes_total` : Total number of bytes received by current Minio server instance
- `minio_network_sent_bytes_total` : Total number of bytes sent by current Minio server instance
- `minio_offline_disks` : Total number of offline disks for current Minio server instance
- `minio_total_disks` : Total number of disks for current Minio server instance
- `minio_server_start_time_seconds` : Unix time, in seconds, at which the current Minio server instance started
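
The metrics listed above can be combined into a few derived values. The following Python sketch is only an illustration: the `SAMPLE` payload is made up, the helper names `parse_metrics` and `summarize` are hypothetical, and it assumes label-free samples without timestamps, which may not match what a real Minio server returns.

```python
# Hypothetical payload in the Prometheus text exposition format.
# The values are invented for illustration only.
SAMPLE = """\
# TYPE minio_disk_storage_bytes gauge
minio_disk_storage_bytes 1000
minio_disk_storage_free_bytes 250
minio_http_requests_duration_seconds_sum 12.5
minio_http_requests_duration_seconds_count 50
minio_offline_disks 0
minio_total_disks 4
"""


def parse_metrics(text):
    """Map each sample name to its value.

    Assumes one `name value` pair per line, with no trailing timestamp.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def summarize(samples):
    """Derive a few summary values from the raw samples."""
    total = samples["minio_disk_storage_bytes"]
    free = samples["minio_disk_storage_free_bytes"]
    count = samples["minio_http_requests_duration_seconds_count"]
    seconds = samples["minio_http_requests_duration_seconds_sum"]
    return {
        # Assumes "used" is simply total minus free.
        "disk_used_fraction": (total - free) / total if total else 0.0,
        # Mean latency is the histogram sum divided by its count.
        "mean_request_seconds": seconds / count if count else 0.0,
        "online_disks": int(samples["minio_total_disks"] - samples["minio_offline_disks"]),
    }


if __name__ == "__main__":
    print(summarize(parse_metrics(SAMPLE)))
```

Dividing `_sum` by `_count` gives the mean duration over the server's whole lifetime; a Prometheus query over a time window is usually the better tool for recent latency.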

docs/metrics/README.md Normal file

@ -0,0 +1,20 @@
## Minio Monitoring Guide
Minio server exposes monitoring data over unauthenticated endpoints, so monitoring tools can collect the data without you having to share Minio server credentials. This document lists the monitoring endpoints and the relevant documentation.
### Healthcheck Probe
Minio server has two healthcheck endpoints: a liveness probe that indicates whether the server is working properly, and a readiness probe that indicates whether the server is not accepting connections due to heavy load.
- Liveness probe available at `/minio/health/live`
- Readiness probe available at `/minio/health/ready`
Read more on how to use these endpoints in the [Minio healthcheck guide](./healthcheck/README.md).
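
As a rough illustration of how a monitoring script might query the two healthcheck endpoints, here is a minimal Python sketch using only the standard library. The helper names `probe` and `health`, the `opener` parameter, and the `localhost:9000` address are assumptions made for this example, as is the choice to treat only a 200 response as healthy.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable.

    `opener` is a hypothetical hook so the function can be exercised
    without a running server; it defaults to urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example 503) arrive as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def health(base_url, opener=urllib.request.urlopen):
    """Query both healthcheck endpoints; only a 200 counts as healthy here."""
    base = base_url.rstrip("/")
    return {
        "live": probe(base + "/minio/health/live", opener) == 200,
        "ready": probe(base + "/minio/health/ready", opener) == 200,
    }


if __name__ == "__main__":
    # Assumes a Minio server listening on localhost:9000.
    print(health("http://localhost:9000"))
```

Because the endpoints are unauthenticated, the script needs no access or secret keys.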
### Prometheus Probe
Minio server exposes Prometheus-compatible data on a single endpoint.
- Prometheus data available at `/minio/prometheus/metrics`
To use this endpoint, set up Prometheus to scrape data from it. Read more on how to use Prometheus to monitor Minio server in [How to monitor Minio server with Prometheus](https://github.com/minio/cookbook/blob/master/docs/how-to-monitor-minio-with-prometheus.md).

@ -4,11 +4,11 @@ Minio server exposes two un-authenticated, healthcheck endpoints - liveness prob
### Liveness probe
This probe identifies situations where the server is running but may not behave optimally, e.g. sluggish responses or a corrupt backend. Such problems can *only* be fixed by a restart.
This probe identifies situations where the server is running but may not behave optimally, e.g. sluggish responses or a corrupt back-end. Such problems can *only* be fixed by a restart.
Internally, the Minio liveness probe handler makes a ListBuckets call. If the call succeeds, the server returns 200 OK, otherwise 503 Service Unavailable.
When the liveness probe fails, Kubernetes-like platforms restart the container.
When the liveness probe fails, Kubernetes-like platforms restart the container.
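
The status-code rule described above can be sketched in a few lines of Python. This is not Minio's actual handler, which is written in Go; `liveness_status` and the `list_buckets` callable are hypothetical names that stand in for the real ListBuckets call.

```python
def liveness_status(list_buckets):
    """Return 200 if the storage call succeeds, otherwise 503.

    `list_buckets` is a hypothetical zero-argument callable standing in
    for the server's ListBuckets operation.
    """
    try:
        list_buckets()
    except Exception:
        # Any failure to list buckets is reported as Service Unavailable.
        return 503
    return 200


if __name__ == "__main__":
    print(liveness_status(lambda: ["bucket-a", "bucket-b"]))
```

An orchestrator polling this result would restart the container only after the probe reports a failure.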
### Readiness probe