Commit Graph

22 Commits

Author SHA1 Message Date
Klaus Post 6e38d0f3ab
Add more bootstrap info in debug mode (#17362) 2023-06-08 08:39:47 -07:00
Harshavardhana 2f9e2147f5
allow quota enforcement to rely on older values (#17351)
PUT calls cannot afford to have large latency build-ups due
to contentious usage.json, or worse letting them fail with
some unexpected error, this can happen when this file is
concurrently being updated via scanner or it is being
healed during a disk replacement heal.

However, these are fairly quick in theory, stressed clusters
can quickly show visible latency this can add up leading to
invalid errors returned during PUT.

It is perhaps okay for us to relax this error return requirement
instead, make sure that we log that we are proceeding to take in
the requests while the quota is using an older value for the quota
enforcement. These things will reconcile themselves eventually,
via scanner making sure to overwrite the usage.json.

Bonus: make sure that storage-rest-client sets ExpectTimeouts to
be 'true', such that DiskInfo() call with contextTimeout does
not prematurely disconnect the servers leading to a longer
healthCheck, back-off routine. This can easily pile up while also
causing active callers to disconnect, leading to quorum loss.

DiskInfo is actively used in the PUT, Multipart call path for
upgrading parity when disks are down, it in-turn shouldn't cause
more disks to go down.
2023-06-05 16:56:35 -07:00
jiuker fb5ce3b87a
record err time when remote node is offline (#17262) 2023-05-30 10:07:26 -07:00
jiuker b1b00a5055
fix: Avoid Income globalStats twice upon error (#17263) 2023-05-22 07:42:27 -07:00
Anis Eleuch 4640b13c66
Use expontential backoff algo for internode reconnections (#17052) 2023-05-02 12:35:52 -07:00
Anis Eleuch 0b7ca094e4
Remove Expect 100-continue in internode communications (#17061) 2023-04-26 09:33:45 -07:00
Harshavardhana d19cbc81b5
fix: do not return IAM/Bucket metadata replication errors to client (#16486) 2023-01-26 11:11:54 -08:00
Anis Elleuch 932d2c3c62
Add X-Amz-Request-Id to internode calls (#16146) 2022-12-06 09:27:26 -08:00
Klaus Post ddeca9f12a
fix: filter rest errors and logs returned (#16019) 2022-11-07 10:38:08 -08:00
Anis Elleuch 048a46ec2a
Add RPC tcp timeout/errs and AVG duration to prometheus (#15747) 2022-09-26 09:04:26 -07:00
Klaus Post ff12080ff5
Remove deprecated io/ioutil (#15707) 2022-09-19 11:05:16 -07:00
Anis Elleuch 4a92134235
prometheus: track errors during REST read/write calls (#15678)
minio_inter_node_traffic_errors_total currently does not track
requests body write/read errors of internode REST communications.

This commit fixes this by wrapping resp.Body.
2022-09-12 12:40:51 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Harshavardhana 5e763b71dc
use logger.LogOnce to reduce printing disconnection logs (#15408)
fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
2022-07-27 09:44:59 -07:00
Harshavardhana 785b429737
add reconnect duration allows for verifying disconnect intervals (#15306) 2022-07-15 14:41:24 -07:00
Klaus Post 9004d69c6f
Make ReqInfo concurrency safe (#15204)
Some read/writes of ReqInfo did not get appropriate locks, leading to races.

Make sure reading and writing holds appropriate locks.
2022-06-30 10:48:50 -07:00
Harshavardhana 65b4b100a8
de-couple caller context to avoid internal races (#15195)
```
fatal error: concurrent map iteration and map write
fatal error: concurrent map iteration and map write

goroutine 745335841 [running]:
runtime.throw({0x273e67b?, 0x80?})
        runtime/panic.go:992 +0x71 fp=0xc0390bc240 sp=0xc0390bc210 pc=0x438671
runtime.mapiternext(0x40d987?)
        runtime/map.go:871 +0x4eb fp=0xc0390bc2b0 sp=0xc0390bc240 pc=0x41002b
runtime.mapiterinit(0x46bec7?, 0x4ef76c?, 0xc0017cc9c0?)
        runtime/map.go:861 +0x228 fp=0xc0390bc2d0 sp=0xc0390bc2b0 pc=0x40fae8
reflect.mapiterinit(0x1b5?, 0xc0?, 0x235bcc0?)
```

```
github.com/minio/minio/internal/rest/client.go:151 +0x5f4 fp=0xc0390bd988 sp=0xc0390bd730 pc=0x153e434
```
2022-06-29 14:44:26 -07:00
Harshavardhana 1a56ebea70
cleanup dsync tests and remove net/rpc references (#14118) 2022-01-18 12:44:38 -08:00
Harshavardhana ffd497673f
internode lockArgs should use messagepack (#13329)
it would seem like using `bufio.Scan()` is very
slow for heavy concurrent I/O, ie. when r.Body
is slow , instead use a proper
binary exchange format, to marshal and unmarshal
the LockArgs datastructure in a cleaner way.

this PR increases performance of the locking
sub-system for tiny repeated read lock requests
on same object.

```
BenchmarkLockArgs
BenchmarkLockArgs-4              6417609               185.7 ns/op            56 B/op          2 allocs/op
BenchmarkLockArgsOld
BenchmarkLockArgsOld-4           1187368              1015 ns/op            4096 B/op          1 allocs/op
```
2021-09-30 11:53:01 -07:00
Harshavardhana da74e2f167
move internal/net to pkg/net package (#12505) 2021-06-14 14:54:37 -07:00
Anis Elleuch 6c8be64cdb
rest: healthcheck should not update failure metrics (#12458)
Otherwise, we can see high numbers of networking issues when a node is
down.
2021-06-08 14:09:26 -07:00
Harshavardhana 1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00