allow quota enforcement to rely on older values (#17351)

PUT calls cannot afford large latency build-ups caused by
contention on usage.json, or worse, fail with some unexpected
error. This can happen when the file is being updated
concurrently by the scanner, or is being healed during a
disk replacement heal.

Although these operations are fairly quick in theory, on stressed
clusters the latency quickly becomes visible and can add up,
leading to invalid errors being returned during PUT.

It is perhaps okay for us to relax this error return requirement.
Instead, make sure that we log that we are proceeding to take in
requests while quota enforcement is using an older value.
These things will reconcile themselves eventually, as the
scanner makes sure to overwrite usage.json.

Bonus: make sure that storage-rest-client sets ExpectTimeouts to
'true', so that a DiskInfo() call with contextTimeout does
not prematurely disconnect the servers, leading to a longer
healthCheck back-off routine. This can easily pile up while also
causing active callers to disconnect, leading to quorum loss.

DiskInfo is actively used in the PUT and Multipart call paths for
upgrading parity when disks are down; it in turn should not cause
more disks to go down.
Harshavardhana
2023-06-05 16:56:35 -07:00
committed by GitHub
parent 75c6fc4f02
commit 2f9e2147f5
7 changed files with 29 additions and 24 deletions

@@ -200,8 +200,8 @@ func (di *distLockInstance) GetRLock(ctx context.Context, timeout *dynamicTimeou
}) {
timeout.LogFailure()
defer cancel()
-if err := newCtx.Err(); err == context.Canceled {
-return LockContext{ctx: ctx, cancel: func() {}}, err
+if errors.Is(newCtx.Err(), context.Canceled) {
+return LockContext{ctx: ctx, cancel: func() {}}, newCtx.Err()
}
return LockContext{ctx: ctx, cancel: func() {}}, OperationTimedOut{}
}
@@ -255,8 +255,8 @@ func (li *localLockInstance) GetLock(ctx context.Context, timeout *dynamicTimeou
li.ns.unlock(li.volume, li.paths[si], readLock)
}
}
-if err := ctx.Err(); err == context.Canceled {
-return LockContext{}, err
+if errors.Is(ctx.Err(), context.Canceled) {
+return LockContext{}, ctx.Err()
}
return LockContext{}, OperationTimedOut{}
}