minio/docs/metrics/v3.md
Shireesh Anjal 3bab4822f3
Add logger webhook metrics in metrics-v3 (#19515)
endpoint: /minio/metrics/v3/cluster/webhook
metrics:
- failed_messages (counter)
- online (gauge)
- queue_length (gauge)
- total_messages (counter)
2024-05-14 00:27:33 -07:00

305 lines
31 KiB
Markdown

# Metrics Version 3
In metrics version 3, all metrics are available under the endpoint:
```
/minio/metrics/v3
```
however, a specific path under this is required.
Metrics are organized into groups at paths **relative** to the top-level endpoint above.
## Metrics Request Handling
Each endpoint below can be queried at different intervals as needed via a scrape configuration in Prometheus or a compatible metrics collection tool.
For ease of configuration, each (non-empty) parent of the path serves all metric endpoints that are at descendant paths. For example, to query all system metrics one needs to only scrape `/minio/metrics/v3/system/`.
Some metrics are bucket specific. These will have a `/bucket` component in their path. As the number of buckets can be large, the metrics scrape operation needs to be provided with a specific list of buckets via the `bucket` query parameter. Only metrics for the given buckets will be returned (with the bucket label set). For example to query API metrics for buckets `test1` and `test2`, make a scrape request to `/minio/metrics/v3/api/bucket?buckets=test1,test2`.
Instead of a metrics scrape, it is also possible to list the metrics that would be returned by a path. This is done by adding a `?list` query parameter. The MinIO server will then list all possible metrics that could be returned. During an actual metrics scrape, only available metrics are returned - not all of them. With the `list` query parameter, the output format can be selected - just set the request `Content-Type` to `application/json` for JSON output, or `text/plain` for a simple markdown formatted table. The latter is the default.
## Request, System and Cluster Metrics
At a high level metrics are grouped into three categories, listed in the following sub-sections. The path in each of the tables is relative to the top-level endpoint.
### Request metrics
These are metrics about requests served by the (current) node.
| Path | Description |
|-----------------|--------------------------------------------------|
| `/api/requests` | Metrics over all requests |
| `/api/bucket` | Metrics over all requests split by bucket labels |
| | |
### Audit metrics
These are metrics about the minio process and the node.
| Path | Description |
|----------|----------------------------------------|
| `/audit` | Metrics related to audit functionality |
### Logger webhook metrics
These are metrics about the minio logger webhooks
| Path | Description |
|-------------------|------------------------------------|
| `/logger/webhook` | Metrics related to logger webhooks |
### System metrics
These are metrics about the minio process and the node.
| Path | Description |
|-----------------------------|---------------------------------------------------|
| `/system/drive` | Metrics about drives on the system |
| `/system/memory` | Metrics about memory on the system |
| `/system/network/internode` | Metrics about internode requests made by the node |
| `/system/process` | Standard process metrics |
| | |
### Debug metrics
These are metrics for debugging
| Path | Description |
|-----------------------------|---------------------------------------------------|
| `/debug/go` | Standard Go lang metrics |
| | |
### Cluster metrics
These present metrics about the whole MinIO cluster.
| Path | Description |
|--------------------------|-----------------------------|
| `/cluster/health` | Cluster health metrics |
| `/cluster/usage/objects` | Object statistics |
| `/cluster/usage/buckets` | Object statistics by bucket |
| `/cluster/erasure-set` | Erasure set metrics |
| | |
## Metrics Listing
Each of the following sub-sections list metrics returned by each of the endpoints.
The standard metrics group for GoCollector is not shown below.
### `/api/requests`
| Name | Type | Help | Labels |
|------------------------------------------------|-----------|---------------------------------------------------------|----------------------------------|
| `minio_api_requests_rejected_auth_total` | `counter` | Total number of requests rejected for auth failure | `type,pool_index,server` |
| `minio_api_requests_rejected_header_total` | `counter` | Total number of requests rejected for invalid header | `type,pool_index,server` |
| `minio_api_requests_rejected_timestamp_total` | `counter` | Total number of requests rejected for invalid timestamp | `type,pool_index,server` |
| `minio_api_requests_rejected_invalid_total` | `counter` | Total number of invalid requests | `type,pool_index,server` |
| `minio_api_requests_waiting_total` | `gauge` | Total number of requests in the waiting queue | `type,pool_index,server` |
| `minio_api_requests_incoming_total` | `gauge` | Total number of incoming requests | `type,pool_index,server` |
| `minio_api_requests_inflight_total` | `gauge` | Total number of requests currently in flight | `name,type,pool_index,server` |
| `minio_api_requests_total` | `counter` | Total number of requests | `name,type,pool_index,server` |
| `minio_api_requests_errors_total` | `counter` | Total number of requests with (4xx and 5xx) errors | `name,type,pool_index,server` |
| `minio_api_requests_5xx_errors_total` | `counter` | Total number of requests with 5xx errors | `name,type,pool_index,server` |
| `minio_api_requests_4xx_errors_total` | `counter` | Total number of requests with 4xx errors | `name,type,pool_index,server` |
| `minio_api_requests_canceled_total` | `counter` | Total number of requests canceled by the client | `name,type,pool_index,server` |
| `minio_api_requests_ttfb_seconds_distribution` | `counter` | Distribution of time to first byte across API calls | `name,type,le,pool_index,server` |
| `minio_api_requests_traffic_sent_bytes` | `counter` | Total number of bytes sent | `type,pool_index,server` |
| `minio_api_requests_traffic_received_bytes` | `counter` | Total number of bytes received | `type,pool_index,server` |
### `/bucket/api`
| Name | Type | Help | Labels |
|----------------------------------------------|-----------|------------------------------------------------------------------|-----------------------------------------|
| `minio_bucket_api_traffic_received_bytes` | `counter` | Total number of bytes sent for a bucket | `bucket,type,server,pool_index` |
| `minio_bucket_api_traffic_sent_bytes` | `counter` | Total number of bytes received for a bucket | `bucket,type,server,pool_index` |
| `minio_bucket_api_inflight_total` | `gauge` | Total number of requests currently in flight for a bucket | `bucket,name,type,server,pool_index` |
| `minio_bucket_api_total` | `counter` | Total number of requests for a bucket | `bucket,name,type,server,pool_index` |
| `minio_bucket_api_canceled_total` | `counter` | Total number of requests canceled by the client for a bucket | `bucket,name,type,server,pool_index` |
| `minio_bucket_api_4xx_errors_total` | `counter` | Total number of requests with 4xx errors for a bucket | `bucket,name,type,server,pool_index` |
| `minio_bucket_api_5xx_errors_total` | `counter` | Total number of requests with 5xx errors for a bucket | `bucket,name,type,server,pool_index` |
| `minio_bucket_api_ttfb_seconds_distribution` | `counter` | Distribution of time to first byte across API calls for a bucket | `bucket,name,le,type,server,pool_index` |
### `/audit`
| Name | Type | Help | Labels |
|-----------------------------------|-----------|----------------------------------------------------------|--------------------|
| `minio_audit_failed_messages` | `counter` | Total number of messages that failed to send since start | `target_id,server` |
| `minio_audit_target_queue_length` | `gauge` | Number of unsent messages in queue for target | `target_id,server` |
| `minio_audit_total_messages` | `counter` | Total number of messages sent since start | `target_id,server` |
### `/system/drive`
| Name | Type | Help | Labels |
|------------------------------------------------|-----------|--------------------------------------------------------------------|-----------------------------------------------------|
| `minio_system_drive_used_bytes` | `gauge` | Total storage used on a drive in bytes | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_free_bytes` | `gauge` | Total storage free on a drive in bytes | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_total_bytes` | `gauge` | Total storage available on a drive in bytes | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_used_inodes` | `gauge` | Total used inodes on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_free_inodes` | `gauge` | Total free inodes on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_total_inodes` | `gauge` | Total inodes available on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_timeout_errors_total` | `counter` | Total timeout errors on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_io_errors_total` | `counter` | Total I/O errors on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_availability_errors_total` | `counter` | Total availability errors (I/O errors, timeouts) on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_waiting_io` | `gauge` | Total waiting I/O operations on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_api_latency_micros` | `gauge` | Average last minute latency in µs for drive API storage operations | `drive,api,set_index,drive_index,pool_index,server` |
| `minio_system_drive_offline_count` | `gauge` | Count of offline drives | `pool_index,server` |
| `minio_system_drive_online_count` | `gauge` | Count of online drives | `pool_index,server` |
| `minio_system_drive_count` | `gauge` | Count of all drives | `pool_index,server` |
| `minio_system_drive_health` | `gauge` | Drive health (0 = offline, 1 = healthy, 2 = healing) | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_reads_per_sec` | `gauge` | Reads per second on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_reads_kb_per_sec` | `gauge` | Kilobytes read per second on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_reads_await` | `gauge` | Average time for read requests served on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_writes_per_sec` | `gauge` | Writes per second on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_writes_kb_per_sec` | `gauge` | Kilobytes written per second on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_writes_await` | `gauge` | Average time for write requests served on a drive | `drive,set_index,drive_index,pool_index,server` |
| `minio_system_drive_perc_util` | `gauge` | Percentage of time the disk was busy | `drive,set_index,drive_index,pool_index,server` |
### `/system/memory`
| Name | Type | Help | Labels |
|----------------------------------|---------|------------------------------------|----------|
| `minio_system_memory_used` | `gauge` | Used memory on the node | `server` |
| `minio_system_memory_used_perc` | `gauge` | Used memory percentage on the node | `server` |
| `minio_system_memory_free` | `gauge` | Free memory on the node | `server` |
| `minio_system_memory_total` | `gauge` | Total memory on the node | `server` |
| `minio_system_memory_buffers` | `gauge` | Buffers memory on the node | `server` |
| `minio_system_memory_cache` | `gauge` | Cache memory on the node | `server` |
| `minio_system_memory_shared` | `gauge` | Shared memory on the node | `server` |
| `minio_system_memory_available` | `gauge` | Available memory on the node | `server` |
### `/system/cpu`
| Name | Type | Help | Labels |
|-------------------------------|---------|------------------------------------|----------|
| `minio_system_cpu_avg_idle` | `gauge` | Average CPU idle time | `server` |
| `minio_system_cpu_avg_iowait` | `gauge` | Average CPU IOWait time | `server` |
| `minio_system_cpu_load` | `gauge` | CPU load average 1min | `server` |
| `minio_system_cpu_load_perc` | `gauge` | CPU load average 1min (percentage) | `server` |
| `minio_system_cpu_nice` | `gauge` | CPU nice time | `server` |
| `minio_system_cpu_steal` | `gauge` | CPU steal time | `server` |
| `minio_system_cpu_system` | `gauge` | CPU system time | `server` |
| `minio_system_cpu_user` | `gauge` | CPU user time | `server` |
### `/system/network/internode`
| Name | Type | Help | Labels |
|------------------------------------------------------|-----------|----------------------------------------------------------|---------------------|
| `minio_system_network_internode_errors_total` | `counter` | Total number of failed internode calls | `server,pool_index` |
| `minio_system_network_internode_dial_errors_total` | `counter` | Total number of internode TCP dial timeouts and errors | `server,pool_index` |
| `minio_system_network_internode_dial_avg_time_nanos` | `gauge` | Average dial time of internodes TCP calls in nanoseconds | `server,pool_index` |
| `minio_system_network_internode_sent_bytes_total` | `counter` | Total number of bytes sent to other peer nodes | `server,pool_index` |
| `minio_system_network_internode_recv_bytes_total` | `counter` | Total number of bytes received from other peer nodes | `server,pool_index` |
### `/system/process`
| Name | Type | Help | Labels |
|-------------------------------|-----------|----------------------------------------------------------------------------------------------------------------|----------|
| `locks_read_total` | `gauge` | Number of current READ locks on this peer | `server` |
| `locks_write_total` | `gauge` | Number of current WRITE locks on this peer | `server` |
| `cpu_total_seconds` | `counter` | Total user and system CPU time spent in seconds | `server` |
| `go_routine_total` | `gauge` | Total number of go routines running | `server` |
| `io_rchar_bytes` | `counter` | Total bytes read by the process from the underlying storage system including cache, /proc/[pid]/io rchar | `server` |
| `io_read_bytes` | `counter` | Total bytes read by the process from the underlying storage system, /proc/[pid]/io read_bytes | `server` |
| `io_wchar_bytes` | `counter` | Total bytes written by the process to the underlying storage system including page cache, /proc/[pid]/io wchar | `server` |
| `io_write_bytes` | `counter` | Total bytes written by the process to the underlying storage system, /proc/[pid]/io write_bytes | `server` |
| `start_time_seconds` | `gauge` | Start time for MinIO process in seconds since Unix epoc | `server` |
| `uptime_seconds` | `gauge` | Uptime for MinIO process in seconds | `server` |
| `file_descriptor_limit_total` | `gauge` | Limit on total number of open file descriptors for the MinIO Server process | `server` |
| `file_descriptor_open_total` | `gauge` | Total number of open file descriptors by the MinIO Server process | `server` |
| `syscall_read_total` | `counter` | Total read SysCalls to the kernel. /proc/[pid]/io syscr | `server` |
| `syscall_write_total` | `counter` | Total write SysCalls to the kernel. /proc/[pid]/io syscw | `server` |
| `resident_memory_bytes` | `gauge` | Resident memory size in bytes | `server` |
| `virtual_memory_bytes` | `gauge` | Virtual memory size in bytes | `server` |
| `virtual_memory_max_bytes` | `gauge` | Maximum virtual memory size in bytes | `server` |
### `/cluster/health`
| Name | Type | Help | Labels |
|----------------------------------------------------|---------|------------------------------------------------|--------|
| `minio_cluster_health_drives_offline_count` | `gauge` | Count of offline drives in the cluster | |
| `minio_cluster_health_drives_online_count` | `gauge` | Count of online drives in the cluster | |
| `minio_cluster_health_drives_count` | `gauge` | Count of all drives in the cluster | |
| `minio_cluster_health_nodes_offline_count` | `gauge` | Count of offline nodes in the cluster | |
| `minio_cluster_health_nodes_online_count` | `gauge` | Count of online nodes in the cluster | |
| `minio_cluster_health_capacity_raw_total_bytes` | `gauge` | Total cluster raw storage capacity in bytes | |
| `minio_cluster_health_capacity_raw_free_bytes` | `gauge` | Total cluster raw storage free in bytes | |
| `minio_cluster_health_capacity_usable_total_bytes` | `gauge` | Total cluster usable storage capacity in bytes | |
| `minio_cluster_health_capacity_usable_free_bytes` | `gauge` | Total cluster usable storage free in bytes | |
### `/cluster/usage/objects`
| Name | Type | Help | Labels |
|----------------------------------------------------------|---------|----------------------------------------------------------------|---------|
| `minio_cluster_usage_objects_since_last_update_seconds` | `gauge` | Time since last update of usage metrics in seconds | |
| `minio_cluster_usage_objects_total_bytes` | `gauge` | Total cluster usage in bytes | |
| `minio_cluster_usage_objects_count` | `gauge` | Total cluster objects count | |
| `minio_cluster_usage_objects_versions_count` | `gauge` | Total cluster object versions (including delete markers) count | |
| `minio_cluster_usage_objects_delete_markers_count` | `gauge` | Total cluster delete markers count | |
| `minio_cluster_usage_objects_buckets_count` | `gauge` | Total cluster buckets count | |
| `minio_cluster_usage_objects_size_distribution` | `gauge` | Cluster object size distribution | `range` |
| `minio_cluster_usage_objects_version_count_distribution` | `gauge` | Cluster object version count distribution | `range` |
### `/cluster/usage/buckets`
| Name | Type | Help | Labels |
|-----------------------------------------------------------------|---------|------------------------------------------------------------------|----------------|
| `minio_cluster_usage_buckets_since_last_update_seconds` | `gauge` | Time since last update of usage metrics in seconds | |
| `minio_cluster_usage_buckets_total_bytes` | `gauge` | Total bucket size in bytes | `bucket` |
| `minio_cluster_usage_buckets_objects_count` | `gauge` | Total objects count in bucket | `bucket` |
| `minio_cluster_usage_buckets_versions_count` | `gauge` | Total object versions (including delete markers) count in bucket | `bucket` |
| `minio_cluster_usage_buckets_delete_markers_count` | `gauge` | Total delete markers count in bucket | `bucket` |
| `minio_cluster_usage_buckets_quota_total_bytes` | `gauge` | Total bucket quota in bytes | `bucket` |
| `minio_cluster_usage_buckets_object_size_distribution` | `gauge` | Bucket object size distribution | `range,bucket` |
| `minio_cluster_usage_buckets_object_version_count_distribution` | `gauge` | Bucket object version count distribution | `range,bucket` |
### `/cluster/erasure-set`
| Name | Type | Help | Labels |
|--------------------------------------------------|---------|-----------------------------------------------------------------------------------|------------------|
| `minio_cluster_erasure_set_overall_write_quorum` | `gauge` | Overall write quorum across pools and sets | |
| `minio_cluster_erasure_set_overall_health` | `gauge` | Overall health across pools and sets (1=healthy, 0=unhealthy) | |
| `minio_cluster_erasure_set_read_quorum` | `gauge` | Read quorum for the erasure set in a pool | `pool_id,set_id` |
| `minio_cluster_erasure_set_write_quorum` | `gauge` | Write quorum for the erasure set in a pool | `pool_id,set_id` |
| `minio_cluster_erasure_set_online_drives_count` | `gauge` | Count of online drives in the erasure set in a pool | `pool_id,set_id` |
| `minio_cluster_erasure_set_healing_drives_count` | `gauge` | Count of healing drives in the erasure set in a pool | `pool_id,set_id` |
| `minio_cluster_erasure_set_health` | `gauge` | Health of the erasure set in a pool (1=healthy, 0=unhealthy) | `pool_id,set_id` |
| `minio_cluster_erasure_set_read_tolerance` | `gauge` | No of drive failures that can be tolerated without disrupting read operations | `pool_id,set_id` |
| `minio_cluster_erasure_set_write_tolerance` | `gauge` | No of drive failures that can be tolerated without disrupting write operations | `pool_id,set_id` |
| `minio_cluster_erasure_set_read_health` | `gauge` | Health of the erasure set in a pool for read operations (1=healthy, 0=unhealthy) | `pool_id,set_id` |
| `minio_cluster_erasure_set_write_health` | `gauge` | Health of the erasure set in a pool for write operations (1=healthy, 0=unhealthy) | `pool_id,set_id` |
### `/cluster/notification`
| Name | Type | Help | Labels |
|-------------------------------------------------------|-----------|------------------------------------------------------------------------------------------|--------|
| `minio_cluster_notification_current_send_in_progress` | `counter` | Number of concurrent async Send calls active to all targets | |
| `minio_cluster_notification_events_errors_total` | `counter` | Events that were failed to be sent to the targets | |
| `minio_cluster_notification_events_sent_total` | `counter` | Total number of events sent to the targets | |
| `minio_cluster_notification_events_skipped_total` | `counter` | Events that were skipped to be sent to the targets due to the in-memory queue being full | |
### `/cluster/iam`
| Name | Type | Help | Labels |
|-----------------------------------------------------------------|-----------|--------------------------------------------------------------------------------------------------------------------------|--------|
| `minio_cluster_iam_last_sync_duration_millis` | `counter` | Last successful IAM data sync duration in milliseconds | |
| `minio_cluster_iam_plugin_authn_service_failed_requests_minute` | `counter` | When plugin authentication is configured, returns failed requests count in the last full minute | |
| `minio_cluster_iam_plugin_authn_service_last_fail_seconds` | `counter` | When plugin authentication is configured, returns time (in seconds) since the last failed request to the service | |
| `minio_cluster_iam_plugin_authn_service_last_succ_seconds` | `counter` | When plugin authentication is configured, returns time (in seconds) since the last successful request to the service | |
| `minio_cluster_iam_plugin_authn_service_succ_avg_rtt_ms_minute` | `counter` | When plugin authentication is configured, returns average round-trip-time of successful requests in the last full minute | |
| `minio_cluster_iam_plugin_authn_service_succ_max_rtt_ms_minute` | `counter` | When plugin authentication is configured, returns maximum round-trip-time of successful requests in the last full minute | |
| `minio_cluster_iam_plugin_authn_service_total_requests_minute` | `counter` | When plugin authentication is configured, returns total requests count in the last full minute | |
| `minio_cluster_iam_since_last_sync_millis` | `counter` | Time (in milliseconds) since last successful IAM data sync | |
| `minio_cluster_iam_sync_failures` | `counter` | Number of failed IAM data syncs since server start | |
| `minio_cluster_iam_sync_successes` | `counter` | Number of successful IAM data syncs since server start | |
### `/logger/webhook`
| Name | Type | Help | Labels |
|-----------------------------------------|-----------|----------------------------------------------|------------------------|
| `minio_logger_webhook_failed_messages` | `counter` | Number of messages that failed to send | `server,name,endpoint` |
| `minio_logger_webhook_queue_length` | `gauge` | Webhook queue length | `server,name,endpoint` |
| `minio_logger_webhook_total_message` | `counter` | Total number of messages sent to this target | `server,name,endpoint` |