mirror of
https://github.com/minio/minio.git
synced 2024-12-30 17:13:20 -05:00
4caa3422bd
endpoint: /minio/metrics/v3/system/process metrics: - locks_read_total - locks_write_total - cpu_total_seconds - go_routine_total - io_rchar_bytes - io_read_bytes - io_wchar_bytes - io_write_bytes - start_time_seconds - uptime_seconds - file_descriptor_limit_total - file_descriptor_open_total - syscall_read_total - syscall_write_total - resident_memory_bytes - virtual_memory_bytes - virtual_memory_max_bytes Since the standard process collector implements only a subset of these metrics, remove it and implement our own custom process collector that captures all the process metrics we need.
256 lines
26 KiB
Markdown
256 lines
26 KiB
Markdown
# Metrics Version 3
|
|
|
|
In metrics version 3, all metrics are available under the endpoint:
|
|
|
|
```
|
|
/minio/metrics/v3
|
|
```
|
|
|
|
however, a specific path under this is required.
|
|
|
|
Metrics are organized into groups at paths **relative** to the top-level endpoint above.
|
|
|
|
## Metrics Request Handling
|
|
|
|
Each endpoint below can be queried at different intervals as needed via a scrape configuration in Prometheus or a compatible metrics collection tool.
|
|
|
|
For ease of configuration, each (non-empty) parent of the path serves all metric endpoints that are at descendant paths. For example, to query all system metrics one needs to only scrape `/minio/metrics/v3/system/`.
|
|
|
|
Some metrics are bucket specific. These will have a `/bucket` component in their path. As the number of buckets can be large, the metrics scrape operation needs to be provided with a specific list of buckets via the `bucket` query parameter. Only metrics for the given buckets will be returned (with the bucket label set). For example to query API metrics for buckets `test1` and `test2`, make a scrape request to `/minio/metrics/v3/api/bucket?buckets=test1,test2`.
|
|
|
|
Instead of a metrics scrape, it is also possible to list the metrics that would be returned by a path. This is done by adding a `?list` query parameter. The MinIO server will then list all possible metrics that could be returned. During an actual metrics scrape, only available metrics are returned - not all of them. With the `list` query parameter, the output format can be selected - just set the request `Content-Type` to `application/json` for JSON output, or `text/plain` for a simple markdown formatted table. The latter is the default.
|
|
|
|
## Request, System and Cluster Metrics
|
|
|
|
At a high level metrics are grouped into three categories, listed in the following sub-sections. The path in each of the tables is relative to the top-level endpoint.
|
|
|
|
### Request metrics
|
|
|
|
These are metrics about requests served by the (current) node.
|
|
|
|
| Path | Description |
|
|
|-----------------|--------------------------------------------------|
|
|
| `/api/requests` | Metrics over all requests |
|
|
| `/api/bucket` | Metrics over all requests split by bucket labels |
|
|
| | |
|
|
|
|
|
|
### System metrics
|
|
|
|
These are metrics about the minio process and the node.
|
|
|
|
| Path | Description |
|
|
|-----------------------------|---------------------------------------------------|
|
|
| `/system/drive` | Metrics about drives on the system |
|
|
| `/system/memory` | Metrics about memory on the system |
|
|
| `/system/network/internode` | Metrics about internode requests made by the node |
|
|
| `/system/process` | Standard process metrics |
|
|
| `/system/go` | Standard Go lang metrics |
|
|
| | |
|
|
|
|
### Cluster metrics
|
|
|
|
These present metrics about the whole MinIO cluster.
|
|
|
|
| Path | Description |
|
|
|--------------------------|-----------------------------|
|
|
| `/cluster/health` | Cluster health metrics |
|
|
| `/cluster/usage/objects` | Object statistics |
|
|
| `/cluster/usage/buckets` | Object statistics by bucket |
|
|
| `/cluster/erasure-set` | Erasure set metrics |
|
|
| | |
|
|
|
|
## Metrics Listing
|
|
|
|
Each of the following sub-sections list metrics returned by each of the endpoints.
|
|
|
|
The standard metrics group for GoCollector is not shown below.
|
|
|
|
### `/api/requests`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|------------------------------------------------|-----------|---------------------------------------------------------|----------------------------------|
|
|
| `minio_api_requests_rejected_auth_total` | `counter` | Total number of requests rejected for auth failure | `type,pool_index,server` |
|
|
| `minio_api_requests_rejected_header_total` | `counter` | Total number of requests rejected for invalid header | `type,pool_index,server` |
|
|
| `minio_api_requests_rejected_timestamp_total` | `counter` | Total number of requests rejected for invalid timestamp | `type,pool_index,server` |
|
|
| `minio_api_requests_rejected_invalid_total` | `counter` | Total number of invalid requests | `type,pool_index,server` |
|
|
| `minio_api_requests_waiting_total` | `gauge` | Total number of requests in the waiting queue | `type,pool_index,server` |
|
|
| `minio_api_requests_incoming_total` | `gauge` | Total number of incoming requests | `type,pool_index,server` |
|
|
| `minio_api_requests_inflight_total` | `gauge` | Total number of requests currently in flight | `name,type,pool_index,server` |
|
|
| `minio_api_requests_total` | `counter` | Total number of requests | `name,type,pool_index,server` |
|
|
| `minio_api_requests_errors_total` | `counter` | Total number of requests with (4xx and 5xx) errors | `name,type,pool_index,server` |
|
|
| `minio_api_requests_5xx_errors_total` | `counter` | Total number of requests with 5xx errors | `name,type,pool_index,server` |
|
|
| `minio_api_requests_4xx_errors_total` | `counter` | Total number of requests with 4xx errors | `name,type,pool_index,server` |
|
|
| `minio_api_requests_canceled_total` | `counter` | Total number of requests canceled by the client | `name,type,pool_index,server` |
|
|
| `minio_api_requests_ttfb_seconds_distribution` | `counter` | Distribution of time to first byte across API calls | `name,type,le,pool_index,server` |
|
|
| `minio_api_requests_traffic_sent_bytes` | `counter` | Total number of bytes sent | `type,pool_index,server` |
|
|
| `minio_api_requests_traffic_received_bytes` | `counter` | Total number of bytes received | `type,pool_index,server` |
|
|
|
|
### `/api/bucket`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|----------------------------------------------|-----------|------------------------------------------------------------------|-----------------------------------------|
|
|
| `minio_api_bucket_traffic_received_bytes` | `counter` | Total number of bytes sent for a bucket | `bucket,type,server,pool_index` |
|
|
| `minio_api_bucket_traffic_sent_bytes` | `counter` | Total number of bytes received for a bucket | `bucket,type,server,pool_index` |
|
|
| `minio_api_bucket_inflight_total` | `gauge` | Total number of requests currently in flight for a bucket | `bucket,name,type,server,pool_index` |
|
|
| `minio_api_bucket_total` | `counter` | Total number of requests for a bucket | `bucket,name,type,server,pool_index` |
|
|
| `minio_api_bucket_canceled_total` | `counter` | Total number of requests canceled by the client for a bucket | `bucket,name,type,server,pool_index` |
|
|
| `minio_api_bucket_4xx_errors_total` | `counter` | Total number of requests with 4xx errors for a bucket | `bucket,name,type,server,pool_index` |
|
|
| `minio_api_bucket_5xx_errors_total` | `counter` | Total number of requests with 5xx errors for a bucket | `bucket,name,type,server,pool_index` |
|
|
| `minio_api_bucket_ttfb_seconds_distribution` | `counter` | Distribution of time to first byte across API calls for a bucket | `bucket,name,le,type,server,pool_index` |
|
|
|
|
### `/system/drive`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|------------------------------------------------|-----------|--------------------------------------------------------------------|-----------------------------------------------------|
|
|
| `minio_system_drive_used_bytes` | `gauge` | Total storage used on a drive in bytes | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_free_bytes` | `gauge` | Total storage free on a drive in bytes | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_total_bytes` | `gauge` | Total storage available on a drive in bytes | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_used_inodes` | `gauge` | Total used inodes on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_free_inodes` | `gauge` | Total free inodes on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_total_inodes` | `gauge` | Total inodes available on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_timeout_errors_total` | `counter` | Total timeout errors on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_io_errors_total` | `counter` | Total I/O errors on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_availability_errors_total` | `counter` | Total availability errors (I/O errors, timeouts) on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_waiting_io` | `gauge` | Total waiting I/O operations on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_api_latency_micros` | `gauge` | Average last minute latency in µs for drive API storage operations | `drive,api,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_offline_count` | `gauge` | Count of offline drives | `pool_index,server` |
|
|
| `minio_system_drive_online_count` | `gauge` | Count of online drives | `pool_index,server` |
|
|
| `minio_system_drive_count` | `gauge` | Count of all drives | `pool_index,server` |
|
|
| `minio_system_drive_healing` | `gauge` | Is it healing? | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_online` | `gauge` | Is it online? | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_reads_per_sec` | `gauge` | Reads per second on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_reads_kb_per_sec` | `gauge` | Kilobytes read per second on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_reads_await` | `gauge` | Average time for read requests served on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_writes_per_sec` | `gauge` | Writes per second on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_writes_kb_per_sec` | `gauge` | Kilobytes written per second on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_writes_await` | `gauge` | Average time for write requests served on a drive | `drive,set_index,drive_index,pool_index,server` |
|
|
| `minio_system_drive_perc_util` | `gauge` | Percentage of time the disk was busy | `drive,set_index,drive_index,pool_index,server` |
|
|
|
|
### `/system/memory`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|----------------------------------|---------|------------------------------------|----------|
|
|
| `minio_system_memory_used` | `gauge` | Used memory on the node | `server` |
|
|
| `minio_system_memory_used_perc` | `gauge` | Used memory percentage on the node | `server` |
|
|
| `minio_system_memory_free` | `gauge` | Free memory on the node | `server` |
|
|
| `minio_system_memory_total` | `gauge` | Total memory on the node | `server` |
|
|
| `minio_system_memory_buffers` | `gauge` | Buffers memory on the node | `server` |
|
|
| `minio_system_memory_cache` | `gauge` | Cache memory on the node | `server` |
|
|
| `minio_system_memory_shared` | `gauge` | Shared memory on the node | `server` |
|
|
| `minio_system_memory_available` | `gauge` | Available memory on the node | `server` |
|
|
|
|
### `/system/cpu`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|-------------------------------|---------|------------------------------------|----------|
|
|
| `minio_system_cpu_avg_idle` | `gauge` | Average CPU idle time | `server` |
|
|
| `minio_system_cpu_avg_iowait` | `gauge` | Average CPU IOWait time | `server` |
|
|
| `minio_system_cpu_load` | `gauge` | CPU load average 1min | `server` |
|
|
| `minio_system_cpu_load_perc` | `gauge` | CPU load average 1min (percentage) | `server` |
|
|
| `minio_system_cpu_nice` | `gauge` | CPU nice time | `server` |
|
|
| `minio_system_cpu_steal` | `gauge` | CPU steal time | `server` |
|
|
| `minio_system_cpu_system` | `gauge` | CPU system time | `server` |
|
|
| `minio_system_cpu_user` | `gauge` | CPU user time | `server` |
|
|
|
|
### `/system/network/internode`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|------------------------------------------------------|-----------|----------------------------------------------------------|---------------------|
|
|
| `minio_system_network_internode_errors_total` | `counter` | Total number of failed internode calls | `server,pool_index` |
|
|
| `minio_system_network_internode_dial_errors_total` | `counter` | Total number of internode TCP dial timeouts and errors | `server,pool_index` |
|
|
| `minio_system_network_internode_dial_avg_time_nanos` | `gauge` | Average dial time of internodes TCP calls in nanoseconds | `server,pool_index` |
|
|
| `minio_system_network_internode_sent_bytes_total` | `counter` | Total number of bytes sent to other peer nodes | `server,pool_index` |
|
|
| `minio_system_network_internode_recv_bytes_total` | `counter` | Total number of bytes received from other peer nodes | `server,pool_index` |
|
|
|
|
### `/system/process`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|-------------------------------|-----------|----------------------------------------------------------------------------------------------------------------|----------|
|
|
| `locks_read_total` | `gauge` | Number of current READ locks on this peer | `server` |
|
|
| `locks_write_total` | `gauge` | Number of current WRITE locks on this peer | `server` |
|
|
| `cpu_total_seconds` | `counter` | Total user and system CPU time spent in seconds | `server` |
|
|
| `go_routine_total` | `gauge` | Total number of go routines running | `server` |
|
|
| `io_rchar_bytes` | `counter` | Total bytes read by the process from the underlying storage system including cache, /proc/[pid]/io rchar | `server` |
|
|
| `io_read_bytes` | `counter` | Total bytes read by the process from the underlying storage system, /proc/[pid]/io read_bytes | `server` |
|
|
| `io_wchar_bytes` | `counter` | Total bytes written by the process to the underlying storage system including page cache, /proc/[pid]/io wchar | `server` |
|
|
| `io_write_bytes` | `counter` | Total bytes written by the process to the underlying storage system, /proc/[pid]/io write_bytes | `server` |
|
|
| `start_time_seconds` | `gauge` | Start time for MinIO process in seconds since Unix epoc | `server` |
|
|
| `uptime_seconds` | `gauge` | Uptime for MinIO process in seconds | `server` |
|
|
| `file_descriptor_limit_total` | `gauge` | Limit on total number of open file descriptors for the MinIO Server process | `server` |
|
|
| `file_descriptor_open_total` | `gauge` | Total number of open file descriptors by the MinIO Server process | `server` |
|
|
| `syscall_read_total` | `counter` | Total read SysCalls to the kernel. /proc/[pid]/io syscr | `server` |
|
|
| `syscall_write_total` | `counter` | Total write SysCalls to the kernel. /proc/[pid]/io syscw | `server` |
|
|
| `resident_memory_bytes` | `gauge` | Resident memory size in bytes | `server` |
|
|
| `virtual_memory_bytes` | `gauge` | Virtual memory size in bytes | `server` |
|
|
| `virtual_memory_max_bytes` | `gauge` | Maximum virtual memory size in bytes | `server` |
|
|
|
|
### `/cluster/health`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|----------------------------------------------------|---------|------------------------------------------------|--------|
|
|
| `minio_cluster_health_drives_offline_count` | `gauge` | Count of offline drives in the cluster | |
|
|
| `minio_cluster_health_drives_online_count` | `gauge` | Count of online drives in the cluster | |
|
|
| `minio_cluster_health_drives_count` | `gauge` | Count of all drives in the cluster | |
|
|
| `minio_cluster_health_nodes_offline_count` | `gauge` | Count of offline nodes in the cluster | |
|
|
| `minio_cluster_health_nodes_online_count` | `gauge` | Count of online nodes in the cluster | |
|
|
| `minio_cluster_health_capacity_raw_total_bytes` | `gauge` | Total cluster raw storage capacity in bytes | |
|
|
| `minio_cluster_health_capacity_raw_free_bytes` | `gauge` | Total cluster raw storage free in bytes | |
|
|
| `minio_cluster_health_capacity_usable_total_bytes` | `gauge` | Total cluster usable storage capacity in bytes | |
|
|
| `minio_cluster_health_capacity_usable_free_bytes` | `gauge` | Total cluster usable storage free in bytes | |
|
|
|
|
### `/cluster/audit`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|-------------------------------------------|-----------|----------------------------------------------------------|-------------|
|
|
| `minio_cluster_audit_failed_messages` | `counter` | Total number of messages that failed to send since start | `target_id` |
|
|
| `minio_cluster_audit_target_queue_length` | `gauge` | Number of unsent messages in queue for target | `target_id` |
|
|
| `minio_cluster_audit_total_messages` | `counter` | Total number of messages sent since start | `target_id` |
|
|
|
|
### `/cluster/usage/objects`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|----------------------------------------------------------|---------|----------------------------------------------------------------|---------|
|
|
| `minio_cluster_usage_objects_since_last_update_seconds` | `gauge` | Time since last update of usage metrics in seconds | |
|
|
| `minio_cluster_usage_objects_total_bytes` | `gauge` | Total cluster usage in bytes | |
|
|
| `minio_cluster_usage_objects_count` | `gauge` | Total cluster objects count | |
|
|
| `minio_cluster_usage_objects_versions_count` | `gauge` | Total cluster object versions (including delete markers) count | |
|
|
| `minio_cluster_usage_objects_delete_markers_count` | `gauge` | Total cluster delete markers count | |
|
|
| `minio_cluster_usage_objects_buckets_count` | `gauge` | Total cluster buckets count | |
|
|
| `minio_cluster_usage_objects_size_distribution` | `gauge` | Cluster object size distribution | `range` |
|
|
| `minio_cluster_usage_objects_version_count_distribution` | `gauge` | Cluster object version count distribution | `range` |
|
|
|
|
### `/cluster/usage/buckets`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|-----------------------------------------------------------------|---------|------------------------------------------------------------------|----------------|
|
|
| `minio_cluster_usage_buckets_since_last_update_seconds` | `gauge` | Time since last update of usage metrics in seconds | |
|
|
| `minio_cluster_usage_buckets_total_bytes` | `gauge` | Total bucket size in bytes | `bucket` |
|
|
| `minio_cluster_usage_buckets_objects_count` | `gauge` | Total objects count in bucket | `bucket` |
|
|
| `minio_cluster_usage_buckets_versions_count` | `gauge` | Total object versions (including delete markers) count in bucket | `bucket` |
|
|
| `minio_cluster_usage_buckets_delete_markers_count` | `gauge` | Total delete markers count in bucket | `bucket` |
|
|
| `minio_cluster_usage_buckets_quota_total_bytes` | `gauge` | Total bucket quota in bytes | `bucket` |
|
|
| `minio_cluster_usage_buckets_object_size_distribution` | `gauge` | Bucket object size distribution | `range,bucket` |
|
|
| `minio_cluster_usage_buckets_object_version_count_distribution` | `gauge` | Bucket object version count distribution | `range,bucket` |
|
|
|
|
### `/cluster/erasure-set`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|--------------------------------------------------|---------|---------------------------------------------------------------|------------------|
|
|
| `minio_cluster_erasure_set_overall_write_quorum` | `gauge` | Overall write quorum across pools and sets | |
|
|
| `minio_cluster_erasure_set_overall_health` | `gauge` | Overall health across pools and sets (1=healthy, 0=unhealthy) | |
|
|
| `minio_cluster_erasure_set_read_quorum` | `gauge` | Read quorum for the erasure set in a pool | `pool_id,set_id` |
|
|
| `minio_cluster_erasure_set_write_quorum` | `gauge` | Write quorum for the erasure set in a pool | `pool_id,set_id` |
|
|
| `minio_cluster_erasure_set_online_drives_count` | `gauge` | Count of online drives in the erasure set in a pool | `pool_id,set_id` |
|
|
| `minio_cluster_erasure_set_healing_drives_count` | `gauge` | Count of healing drives in the erasure set in a pool | `pool_id,set_id` |
|
|
| `minio_cluster_erasure_set_health` | `gauge` | Health of the erasure set in a pool (1=healthy, 0=unhealthy) | `pool_id,set_id` |
|
|
|
|
### `/cluster/notification`
|
|
|
|
| Name | Type | Help | Labels |
|
|
|-------------------------------------------------------|-----------|------------------------------------------------------------------------------------------|--------|
|
|
| `minio_cluster_notification_current_send_in_progress` | `counter` | Number of concurrent async Send calls active to all targets | |
|
|
| `minio_cluster_notification_events_errors_total` | `counter` | Events that were failed to be sent to the targets | |
|
|
| `minio_cluster_notification_events_sent_total` | `counter` | Total number of events sent to the targets | |
|
|
| `minio_cluster_notification_events_skipped_total` | `counter` | Events that were skipped to be sent to the targets due to the in-memory queue being full | |
|