Commit Graph

42 Commits

Author SHA1 Message Date
Aditya Manthramurthy 4c8562bcec
Fix v2 metrics: Send all ttfb api labels (#20191)
Fix a regression in #19733 where TTFB metrics for all APIs except
GetObject were removed in v2 and v3 metrics. This causes breakage for
existing v2 metrics users. Instead we continue to send TTFB for all APIs
in V2 but only send for GetObject in V3.
2024-07-30 15:28:46 -07:00
Shireesh Anjal 0e59e50b39
Capture ttfb api metrics only for GetObject (#19733)
as that is the only API where the TTFB metric is beneficial, and
capturing this for all APIs exponentially increases the response size in
large clusters.
2024-05-14 23:25:13 -07:00
Aditya Manthramurthy b2c5b75efa
feat: Add Metrics V3 API (#19068)
Metrics v3 is mainly a reorganization of metrics into smaller groups of
metrics and the removal of internal aggregation of metrics received from
peer nodes in a MinIO cluster.

This change adds the endpoint `/minio/metrics/v3` as the top-level metrics
endpoint and under this, various sub-endpoints are implemented. These
are currently documented in `docs/metrics/v3.md`

The handler will serve metrics at any path
`/minio/metrics/v3/PATH`, as follows:

when PATH is a sub-endpoint listed above => serves the group of
metrics under that path; or when PATH is a (non-empty) parent 
directory of the sub-endpoints listed above => serves metrics
from each child sub-endpoint of PATH. otherwise, returns a no 
resource found error

All available metrics are listed in the `docs/metrics/v3.md`. More will
be added subsequently.
2024-03-10 01:15:15 -08:00
Aditya Manthramurthy 9a4d003ac7
Add common middleware to S3 API handlers (#19171)
The middleware sets up tracing, throttling, gzipped responses and
collecting API stats.

Additionally, this change updates the names of handler functions in
metric labels to be the same as the name derived from Go lang reflection
on the handler name.

The metric api labels are now stored in memory the same as the handler
name - they will be camelcased, e.g. `GetObject` instead of `getobject`.

For compatibility, we lowercase the metric api label values when emitting the metrics.
2024-03-04 10:05:56 -08:00
Harshavardhana 6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.

some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
Harshavardhana 5b7c83341b
move per bucket metrics to peer location (#17627) 2023-07-11 07:46:24 -07:00
Anis Eleuch 6d0bc5ab1e
prometheus: Fix internode stats (#17594)
Internode calculation was done inside S3 handlers, fix it by moving it
to internode handlers.

Remove admin stats since it is not used.
2023-07-08 07:35:11 -07:00
Harshavardhana 7605d07bb2
add support for bucket level request count per API (#17468)
New metrics added to calculate API request count
per bucket, per API.  Captures errors, including
4xx, 5xx HTTP status codes separately.
2023-06-21 09:41:59 -07:00
Harshavardhana a5835cecbf
fix: regression in counting total requests (#17024) 2023-04-12 14:37:19 -07:00
Harshavardhana ae029191a3
liveness returns "busy" if queued requests > available capacity (#16719) 2023-02-27 08:34:52 -08:00
Anis Elleuch 1f1dcdce65
move HTTP recorder to an internal library (#16128) 2022-11-28 10:20:27 -08:00
Harshavardhana 8082d1fed6
add bucket level S3 received/sent bytes (#15084)
adds bucket level metrics for bytes received and sent bytes on all S3 API calls.
2022-06-14 15:14:24 -07:00
Harshavardhana 7413045f0e
fix: add missing minio_s3_requests_total (#15070)
PR #15052 caused a regression, add the missing metrics back.

Bonus:

- internode information should be only for distributed setups 
- update the dashboard to include 4xx and 5xx error panels.
2022-06-11 00:50:31 -07:00
Anis Elleuch 5fb420c703
prometheus: Add S3 4xx and 5xx S3 monitoring (#15052)
Currently minio_s3_requests_errors_total covers 4xx and 
5xx S3 responses which can be confusing when s3 applications 
sent a lot of HEAD requests with obvious 404 responses or 
when the replication is enabled.

Add 
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total

to help users monitor 4xx and 5xx HTTP status codes separately.
2022-06-08 11:22:34 -07:00
Harshavardhana af3dc25dfe
align 32bit integers with atomic values in structs (#14344)
fixes #14341
2022-02-17 15:22:26 -08:00
Anis Elleuch 2ee337ead5
prometheus: Add incoming requests metrics since last scrape (#14261)
Some users running MinIO claim that their system became slow. One 
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow, 
due to disk issues for example.
2022-02-07 16:30:14 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana 18338d60d5 treat all 2xx, 3xx as good status-codes
fixes #13560
2021-11-02 14:12:43 -07:00
Klaus Post f31a00de01
fix: http stats race in traffic metering (#12956)
Traffic metering was not protected against concurrent updates.

```
WARNING: DATA RACE
Read at 0x00c02b0dace8 by goroutine 235:
  github.com/minio/minio/cmd.setHTTPStatsHandler.func1()
      d:/minio/minio/cmd/generic-handlers.go:360 +0x27d
  net/http.HandlerFunc.ServeHTTP()
...

Previous write at 0x00c02b0dace8 by goroutine 994:
  github.com/minio/minio/internal/http/stats.(*IncomingTrafficMeter).Read()
      d:/minio/minio/internal/http/stats/http-traffic-recorder.go:34 +0xd2

```
2021-08-13 07:30:03 -07:00
zxxxhonest 3a0ca7af8c
panic: unaligned 64-bit atomic operation (#12559)
goroutine 1 [running]:
runtime/internal/atomic.panicUnaligned()
        /usr/local/go/src/runtime/internal/atomic/unaligned.go:8 +0x24

golang doc:
// BUG(rsc): On x86-32, the 64-bit functions use instructions unavailable before the Pentium MMX.
//
// On non-Linux ARM, the 64-bit functions use instructions unavailable before the ARMv6k core.
//
// On ARM, x86-32, and 32-bit MIPS,
// it is the caller's responsibility to arrange for 64-bit
// alignment of 64-bit words accessed atomically. The first word in a
// variable or in an allocated struct, array, or slice can be relied upon to be
// 64-bit aligned.
2021-06-23 07:15:43 -07:00
Harshavardhana 1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00
Harshavardhana 069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Ritesh H Shukla 3ddd8b04d1
fix: handle unsupported APIs more granularly (#11674) 2021-03-30 23:19:36 -07:00
Harshavardhana d7f32ad649 xl: avoid sending Delete() remote call for fully successful runs
an optimization to avoid extra syscalls in PutObject(),
adds up to our PutObject response times.
2021-03-24 17:32:12 -07:00
Klaus Post 749e9c5771
metrics: Add canceled requests (#11881)
Add metric for canceled requests
2021-03-24 10:25:27 -07:00
Anis Elleuch 98d3f94996
metrics: Add the number of requests in the waiting queue (#11580)
We can use this metric to check if there are too many S3 clients in the
queue and could explain why some of those S3 clients are timing out.

```
minio_s3_requests_waiting_total{server="127.0.0.1:9000"} 9981
```

If max_requests is 10000 then there is a strong possibility that clients
are timing out because of the queue deadline.
2021-02-20 00:21:55 -08:00
Ritesh H Shukla b4add82bb6
Updated Prometheus metrics (#11141)
* Add metrics for nodes online and offline
* Add cluster capacity metrics
* Introduce v2 metrics
2021-01-18 20:35:38 -08:00
Harshavardhana 3a0082f0f1
fix: TTFB prometheus metrics calculation (#11082)
until now metrics was reporting entire call
duration instead of ttfb's this PR fixes it
2020-12-10 23:02:25 -08:00
Harshavardhana 6ac48a65cb
fix: use unused cacheMetrics code in prometheus (#9588)
remove all other unusued/deadcode
2020-05-13 08:15:26 -07:00
Harshavardhana 27d716c663
simplify usage of mutexes and atomic constants (#9501) 2020-05-03 22:35:40 -07:00
Anis Elleuch 27632ca6ec
audit: Merge ResponseWriter with RecordAPIStats (#9496)
ResponseWriter & RecordAPIStats has similar role, merge them.

This commit will also fix wrong auditing for STS and Web and others
since they are using ResponseWriter instead of the RecordAPIStats.
2020-04-30 11:27:19 -07:00
Anis Elleuch d090a17ed0
fix: Audit tests on the correct response writer type (#9445) 2020-04-29 22:17:36 -07:00
Klaus Post 8d98662633
re-implement data usage crawler to be more efficient (#9075)
Implementation overview: 

https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b
2020-03-18 16:19:29 -07:00
Harshavardhana b1ad99edbf
fix: avoid crash copy map before reading (#8825)
code of this form is always racy, when the
map itself is being written to as well

```
func (r Map) retMap() map[string]string {
     .. lock ..
     return r.internalMap
}

func (r Map) addMap(k, v string) {
     .. lock ..
     r.internalMap[k] = v
}
```

Anyone reading from `retMap()` is not protected
because of locking and we need to make sure
to avoid code in this manner. Always safe to
copy the map and return.
2020-01-16 01:35:30 -08:00
Praveen raj Mani 8836d57e3c The prometheus metrics refractoring (#8003)
The measures are consolidated to the following metrics

- `disk_storage_used` : Disk space used by the disk.
- `disk_storage_available`: Available disk space left on the disk.
- `disk_storage_total`: Total disk space on the disk.
- `disks_offline`: Total number of offline disks in current MinIO instance.
- `disks_total`: Total number of disks in current MinIO instance.
- `s3_requests_total`: Total number of s3 requests in current MinIO instance.
- `s3_errors_total`: Total number of errors in s3 requests in current MinIO instance.
- `s3_requests_current`: Total number of active s3 requests in current MinIO instance.
- `internode_rx_bytes_total`: Total number of internode bytes received by current MinIO server instance.
- `internode_tx_bytes_total`: Total number of bytes sent to the other nodes by current MinIO server instance.
- `s3_rx_bytes_total`: Total number of s3 bytes received by current MinIO server instance.
- `s3_tx_bytes_total`: Total number of s3 bytes sent by current MinIO server instance.
- `minio_version_info`: Current MinIO version with commit-id.
- `s3_ttfb_seconds_bucket`: Histogram that holds the latency information of the requests.

And this PR also modifies the current StorageInfo queries

- Decouples StorageInfo from ServerInfo .
- StorageInfo is enhanced to give endpoint information.

NOTE: ADMIN API VERSION IS BUMPED UP IN THIS PR

Fixes #7873
2019-10-22 21:01:14 -07:00
kannappanr 5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana 396d78352d Support HTTP/2.0 (#7204)
Fixes #6704
2019-02-14 17:53:46 -08:00
Bala FA 72fa2b4537 Add RPC counters for HTTP stats. (#6206)
This patch introduces separate counters for HTTP stats for minio
reserved bucket.

Fixes #6158
2018-08-30 14:17:58 +05:30
Nitish Tiwari e6ebcc4cb6 Remove redundant prometheus data points (#5934)
Removed field minio_http_requests_total as it was redundant with
minio_http_requests_duration_seconds_count

Also removed field minio_server_start_time_seconds as it was
redundant with process_start_time_seconds
2018-05-15 12:23:43 -07:00
Ashish Kumar Sinha 9ebb72aa99 Introduce new unauthenticated endpoint /metric (#5723) (#5829)
/metric exposes Promethus compatible data for scraping metrics

Fixes: #5723
2018-04-18 16:01:42 -07:00
Anis Elleuch 83abad0b37 admin: ServerInfo() returns info for each node (#4150)
ServerInfo() will gather information from all nodes before returning
it back to the client.
2017-04-21 07:15:53 -07:00
Harshavardhana 27749c2124 admin/info: Add HTTPStats value as part of serverInfo() struct. (#4049)
Remove our counter implementation instead use atomic external
package which supports more types and methods.
2017-04-06 23:08:33 -07:00