Streams can return errors if the cancelation is picked up before the response
stream close is picked up. Under extreme load, this could lead to missing
responses.
Send the server mux ack asynchronously so that a blocked send cannot block the
`newMuxStream` call. The stream will not progress until the mux has been acked.
If network conditions have filled the output queue before a reconnect happens, blocked sends could stop reconnects from happening. In short, `respMu` would be held for a mux client while sending; if the queue is full, the lock will never be released and closing the mux client will hang.
A) Use the mux client context instead of connection context for sends, so sends are unblocked when the mux client is canceled.
B) Use a `TryLock` on "close" and cancel the request if we cannot get the lock at once. This will unblock any attempts to send.
Fix reported races. These are actually synchronized by network calls,
but the change should add some extra safety for untimely disconnects.
Race reported:
```
WARNING: DATA RACE
Read at 0x00c00171c9c0 by goroutine 214:
github.com/minio/minio/internal/grid.(*muxClient).addResponse()
e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:519 +0x111
github.com/minio/minio/internal/grid.(*muxClient).error()
e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:470 +0x21d
github.com/minio/minio/internal/grid.(*Connection).handleDisconnectClientMux()
e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1391 +0x15b
github.com/minio/minio/internal/grid.(*Connection).handleMsg()
e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1190 +0x1ab
github.com/minio/minio/internal/grid.(*Connection).handleMessages.func1()
e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:981 +0x610
Previous write at 0x00c00171c9c0 by goroutine 1081:
github.com/minio/minio/internal/grid.(*muxClient).roundtrip()
e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:94 +0x324
github.com/minio/minio/internal/grid.(*muxClient).traceRoundtrip()
e:/gopath/src/github.com/minio/minio/internal/grid/trace.go:74 +0x10e4
github.com/minio/minio/internal/grid.(*Subroute).Request()
e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:366 +0x230
github.com/minio/minio/internal/grid.(*SingleHandler[go.shape.*github.com/minio/minio/cmd.DiskInfoOptions,go.shape.*github.com/minio/minio/cmd.DiskInfo]).Call()
e:/gopath/src/github.com/minio/minio/internal/grid/handlers.go:554 +0x3fd
github.com/minio/minio/cmd.(*storageRESTClient).DiskInfo()
e:/gopath/src/github.com/minio/minio/cmd/storage-rest-client.go:314 +0x270
github.com/minio/minio/cmd.erasureObjects.getOnlineDisksWithHealingAndInfo.func1()
e:/gopath/src/github.com/minio/minio/cmd/erasure.go:293 +0x171
```
This read will always happen after the write, since there is a network call in between.
However, a disconnect could come in while we are setting up the call, so we protect against that with extra checks.
We have observed cases where a blocked stream will also block cancellations.
This happens when the response channel is blocked and we want to push an error.
The push holds the response mutex, which prevents all other operations until upstream is unblocked.
Make this behavior non-blocking: if the channel is blocked, spawn a goroutine that will send the response and close the output.
Still a lot of "dancing". Added a test for this and reviewed.
Do not rely on `connChange` to do reconnects.
Instead, block while the connection is running and reconnect
when `handleMessages` returns.
Add fully async monitoring instead of monitoring on the main goroutine
and keep this to avoid full network lockup.
This PR adds a WebSocket grid feature that allows servers to communicate via
a single two-way connection.
There are two request types:
* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small
roundtrips with small payloads.
* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`,
which allows for different combinations of full two-way streams with an initial payload.
Only a single stream is created between two machines - and there is, as such, no
server/client relation since both sides can initiate and handle requests. Which server
initiates the request is decided deterministically on the server names.
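A minimal sketch of such a deterministic choice, reading "initiates" as "opens the connection". Both that reading and the plain string comparison are assumptions; the text only says the decision is deterministic on the server names.

```go
package main

// shouldDial reports whether local is the side that opens the connection.
// Assumption: plain string ordering. The only property that matters is that
// both sides reach opposite answers from the same pair of names.
func shouldDial(local, remote string) bool {
	return local < remote
}

func main() {
	a, b := "http://node1:9000", "http://node2:9000"
	if shouldDial(a, b) == shouldDial(b, a) {
		panic("exactly one side must dial")
	}
}
```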
Requests are made through a mux client and server, which handles message
passing, congestion, cancelation, timeouts, etc.
If a connection is lost, all requests are canceled, and the calling server will try
to reconnect. Registered handlers can operate directly on byte
slices or use a higher-level generics abstraction.
There is no versioning of handlers/clients, and incompatible changes should
be handled by adding new handlers.
The request path can be changed to a new one for any protocol changes.
First, all servers create a "Manager." The manager must know its address
as well as all remote addresses. This will manage all connections.
To get a connection to any remote, ask the manager to provide it, given
the remote address, using:
```
func (m *Manager) Connection(host string) *Connection
```
All server-side handlers must also be registered on the manager. This will
make sure that all incoming requests are served. The number of in-flight
requests and responses must also be given for streaming requests.
The "Connection" returned manages the mux-clients. Requests issued
to the connection will be sent to the remote.
* `func (c *Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)`
performs a single request and returns the result. Any deadline provided on the request is
forwarded to the server, and canceling the context will make the function return at once.
* `func (c *Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st *Stream, err error)`
will initiate a remote call and send the initial payload.
```Go
// A Stream is a two-way stream.
// All responses *must* be read by the caller.
// If the call is canceled through the context,
// the appropriate error will be returned.
type Stream struct {
	// Responses from the remote server.
	// Channel will be closed after an error or when the remote closes.
	// All responses *must* be read by the caller until either an error is returned or the channel is closed.
	// Canceling the context will cause the context cancellation error to be returned.
	Responses <-chan Response

	// Requests sent to the server.
	// If the handler is defined with 0 incoming capacity this will be nil.
	// Channel *must* be closed to signal the end of the stream.
	// If the request context is canceled, the stream will no longer process requests.
	Requests chan<- []byte
}

type Response struct {
	Msg []byte
	Err error
}
```
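The contract in those comments (close `Requests`, read `Responses` to the end) can be sketched as follows. The types are local copies so the example compiles on its own, and `newUpperStream` is a hypothetical in-process stand-in for a remote handler, not part of the grid API.

```go
package main

import "strings"

// Local copies of the types above so the sketch compiles on its own.
type Response struct {
	Msg []byte
	Err error
}

type Stream struct {
	Responses <-chan Response
	Requests  chan<- []byte
}

// newUpperStream is a hypothetical in-process "remote" that upper-cases
// every request and closes Responses once Requests is closed.
func newUpperStream() *Stream {
	reqs := make(chan []byte, 1)
	resps := make(chan Response, 1)
	go func() {
		defer close(resps)
		for req := range reqs {
			resps <- Response{Msg: []byte(strings.ToUpper(string(req)))}
		}
	}()
	return &Stream{Responses: resps, Requests: reqs}
}

// collect sends all inputs, closes Requests, and drains Responses to the end.
func collect(st *Stream, inputs []string) ([]string, error) {
	go func() {
		defer close(st.Requests) // must be closed to signal the end of the stream
		for _, in := range inputs {
			st.Requests <- []byte(in)
		}
	}()
	var out []string
	var firstErr error
	for resp := range st.Responses { // keep reading until the channel is closed
		if resp.Err != nil {
			if firstErr == nil {
				firstErr = resp.Err
			}
			continue
		}
		out = append(out, string(resp.Msg))
	}
	return out, firstErr
}

func main() {
	out, err := collect(newUpperStream(), []string{"a", "b"})
	if err != nil || len(out) != 2 || out[0] != "A" || out[1] != "B" {
		panic("unexpected stream output")
	}
}
```

Sending from a separate goroutine keeps the caller free to drain `Responses`, so neither side stalls on a full channel.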
There are generic versions of the server/client handlers that allow the use of
type-safe implementations for data types that support msgpack marshal/unmarshal.
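The idea of a typed layer over byte-slice handlers can be sketched with Go generics. This is not the grid package's API: `rawHandler`, `typed`, `diskQuery` and `diskReply` are hypothetical names, and `encoding/json` stands in for msgpack only to keep the example standard-library only.

```go
package main

import "encoding/json"

// rawHandler is the byte-level handler shape: []byte => ([]byte, error).
type rawHandler func(req []byte) ([]byte, error)

// typed wraps a type-safe function as a rawHandler.
// Assumption: encoding/json stands in for msgpack to stay stdlib-only.
func typed[Req, Resp any](fn func(Req) (Resp, error)) rawHandler {
	return func(b []byte) ([]byte, error) {
		var req Req
		if err := json.Unmarshal(b, &req); err != nil {
			return nil, err
		}
		resp, err := fn(req)
		if err != nil {
			return nil, err
		}
		return json.Marshal(resp)
	}
}

// Hypothetical request and response types for illustration.
type diskQuery struct{ ID string }
type diskReply struct{ Total int }

func main() {
	h := typed(func(q diskQuery) (diskReply, error) {
		return diskReply{Total: len(q.ID)}, nil
	})
	out, err := h([]byte(`{"ID":"abc"}`))
	if err != nil || string(out) != `{"Total":3}` {
		panic("unexpected reply")
	}
}
```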