Streams can return errors if the cancelation is picked up before the response
stream close is picked up. Under extreme load, this could lead to missing
responses.
Send server mux ack async so a blocked send cannot block newMuxStream
call. Stream will not progress until mux has been acked.
in k8s things really do come online very asynchronously,
we need to use implementation that allows this randomness.
To facilitate this move WriteAll() as part of the
websocket layer instead.
Bonus: avoid instances of dnscache usage on k8s
New disk healing code skips/expires objects that ILM supposed to expire.
Add more visibility to the user about this activity by calculating those
objects and print it at the end of healing activity.
This PR fixes a bug that perhaps has been long introduced,
with no visible workarounds. In any deployment, if an entire
erasure set is deleted, there is no way the cluster recovers.
Currently, if one object tag matches with one lifecycle tag filter, ILM
will select it, however, this is wrong. All the Tag filters in the
lifecycle document should be satisfied.
This change is to decouple need for root credentials to match between
site replication deployments.
Also ensuring site replication config initialization is re-tried until
it succeeds, this deoendency is critical to STS flow in site replication
scenario.
Currently, we read from `/proc/diskstats` which is found to be
un-reliable in k8s environments. We can read from `sysfs` instead.
Also, cache the latest drive io stats to find the diff and update
the metrics.
* Remove lock for cached operations.
* Rename "Relax" to `ReturnLastGood`.
* Add `CacheError` to allow caching values even on errors.
* Add NoWait that will return current value with async fetching if within 2xTTL.
* Make benchmark somewhat representative.
```
Before: BenchmarkCache-12 16408370 63.12 ns/op 0 B/op
After: BenchmarkCache-12 428282187 2.789 ns/op 0 B/op
```
* Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning.
* Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics.
* Do async fetching of usage caches.
It also fixes a long-standing bug in expiring transitioned objects.
The expiration action was deleting the current version in the case'
of tiered objects instead of adding a delete marker.
only enable md5sum if explicitly asked by the client, otherwise
its not necessary to compute md5sum when SSE-KMS/SSE-C is enabled.
this is continuation of #17958
If network conditions have filled the output queue before a reconnect happens blocked sends could stop reconnects from happening. In short `respMu` would be held for a mux client while sending - if the queue is full this will never get released and closing the mux client will hang.
A) Use the mux client context instead of connection context for sends, so sends are unblocked when the mux client is canceled.
B) Use a `TryLock` on "close" and cancel the request if we cannot get the lock at once. This will unblock any attempts to send.
data-dir not being present is okay, however we can still
rely on the `rename()` atomic call instead of relying on
write xl.meta write which may truncate the io.EOF.
Add a new function logger.Event() to send the log to Console and
http/kafka log webhooks. This will include some internal events such as
disk healing and rebalance/decommissioning