epoll contention on TCP causes latency build-up when
we have high volume ingress. This PR is an attempt to
relieve this pressure.
upstream issue https://github.com/golang/go/issues/65064
It seems to be a deeper problem; we have not yet tried the fix
proposed in that issue, but this change helps without requiring
compiler changes.
Of course, this is a workaround for now, hoping for a
more comprehensive fix from the Go runtime.
Use `runtime.Gosched()` if we have fewer than maxMergeMessages and the
queue is empty, so more messages have a chance to arrive. Raise
maxMergeMessages to 50 to merge more messages into a single write.
Add a length check for an early bailout in readAllInto when the packet length is known.
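A hedged sketch of what such a bailout can look like; the function shape and limits are assumptions, not the actual readAllInto signature:

```go
import (
	"fmt"
	"io"
)

// readPacket reads exactly `length` bytes when the packet length is already
// known, bailing out before any allocation if the length is implausible.
func readPacket(r io.Reader, dst []byte, length, maxLength int64) ([]byte, error) {
	if length > maxLength {
		return nil, fmt.Errorf("packet too large: %d > %d", length, maxLength)
	}
	if int64(cap(dst)) < length {
		dst = make([]byte, length)
	}
	dst = dst[:length]
	_, err := io.ReadFull(r, dst)
	return dst, err
}
```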
Split the read and write sides of handleMessages into two separate functions
Cosmetic. The only non-copy-and-paste change is that `cancel(ErrDisconnected)` is moved
into the defer on `readStream`.
Add `ConnDialer` to abstract connection creation (a rough sketch follows the list below).
- `IncomingConn(ctx context.Context, conn net.Conn)` is provided as an entry point for
incoming custom connections.
- `ConnectWS` is provided to create web socket connections.
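A rough sketch of the abstraction, assuming an interface shape; the actual `ConnDialer` API may differ:

```go
import (
	"context"
	"net"
)

// ConnDialer abstracts how the grid obtains a net.Conn to a remote.
// The interface shape here is an assumption for illustration only.
type ConnDialer interface {
	Dial(ctx context.Context) (net.Conn, error)
}

// tcpDialer is a trivial example implementation; a websocket dialer
// (ConnectWS) or a pre-accepted connection (IncomingConn) would satisfy
// the same contract.
type tcpDialer struct{ addr string }

func (d tcpDialer) Dial(ctx context.Context) (net.Conn, error) {
	var nd net.Dialer
	return nd.DialContext(ctx, "tcp", d.addr)
}
```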
When a reconnection happens, `handleMessages` must be able to complete and exit.
A full queue can prevent this.
Deadlock chain (May 10th release)
```
1 @ 0x44110e 0x453125 0x109f88c 0x109f7d5 0x10a472c 0x10a3f72 0x10a34ed 0x4795e1
# 0x109f88b github.com/minio/minio/internal/grid.(*Connection).send+0x3eb github.com/minio/minio/internal/grid/connection.go:548
# 0x109f7d4 github.com/minio/minio/internal/grid.(*Connection).queueMsg+0x334 github.com/minio/minio/internal/grid/connection.go:586
# 0x10a472b github.com/minio/minio/internal/grid.(*Connection).handleAckMux+0xab github.com/minio/minio/internal/grid/connection.go:1284
# 0x10a3f71 github.com/minio/minio/internal/grid.(*Connection).handleMsg+0x231 github.com/minio/minio/internal/grid/connection.go:1211
# 0x10a34ec github.com/minio/minio/internal/grid.(*Connection).handleMessages.func1+0x6cc github.com/minio/minio/internal/grid/connection.go:1019
---> blocks ---> via (Connection).handleMsgWg
1 @ 0x44110e 0x454165 0x454134 0x475325 0x486b08 0x10a161a 0x10a1465 0x2470e67 0x7395a9 0x20e61af 0x20e5f1f 0x7395a9 0x22f781c 0x7395a9 0x22f89a5 0x7395a9 0x22f6e82 0x7395a9 0x22f49a2 0x7395a9 0x2206e45 0x7395a9 0x22f4d9c 0x7395a9 0x210ba06 0x7395a9 0x23089c2 0x7395a9 0x22f86e9 0x7395a9 0xd42582 0x2106c04
# 0x475324 sync.runtime_Semacquire+0x24 runtime/sema.go:62
# 0x486b07 sync.(*WaitGroup).Wait+0x47 sync/waitgroup.go:116
# 0x10a1619 github.com/minio/minio/internal/grid.(*Connection).reconnected+0xb9 github.com/minio/minio/internal/grid/connection.go:857
# 0x10a1464 github.com/minio/minio/internal/grid.(*Connection).handleIncoming+0x384 github.com/minio/minio/internal/grid/connection.go:825
```
Add a queue cleaner in reconnected that will pop old messages so `handleMessages` can
send messages without blocking and exit appropriately for the connection to be re-established.
Messages will likely be dropped by the remote anyway, but some may still succeed,
so we only drop messages when running out of space.
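A sketch of the cleaner, assuming the outgoing queue is a buffered channel (names and shapes are illustrative):

```go
import "time"

// startQueueCleaner pops the oldest queued messages, but only once the queue
// has run out of space, so senders never block while a reconnect is pending.
func startQueueCleaner(done <-chan struct{}, outQueue chan []byte) {
	go func() {
		t := time.NewTicker(10 * time.Millisecond)
		defer t.Stop()
		for {
			select {
			case <-done:
				return
			case <-t.C:
				// Drop only when full; queued messages that still fit may
				// yet succeed once the connection is re-established.
				for len(outQueue) >= cap(outQueue) {
					select {
					case <-outQueue:
					default:
					}
				}
			}
		}
	}()
}
```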
This change uses the updated ldap library in minio/pkg (bumped
up to v3). A new config parameter is added for LDAP configuration to
specify extra user attributes to load from the LDAP server and to store
them as additional claims for the user.
A test is added in sts_handlers.go that shows how to access the LDAP
attributes as a claim.
This is in preparation for adding SSH pubkey authentication to MinIO's SFTP
integration.
Do not log errors on oneway streams when sending ping fails. Instead, cancel the stream.
This also makes sure pings are sent when blocked on sending responses.
LastPong is saved as nanoseconds after a connection or reconnection, but
saved as seconds when receiving a pong message. The code deciding whether
a pong is too old can be skewed, since it assumes LastPong is always in
seconds.
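One way to avoid the mismatch, sketched here as an assumption rather than the exact fix, is to keep LastPong in a single unit everywhere and convert only at the comparison:

```go
import (
	"sync/atomic"
	"time"
)

// pingTracker keeps LastPong in UnixNano everywhere, so the staleness check
// never has to guess the unit.
type pingTracker struct {
	lastPong atomic.Int64 // always UnixNano
}

func (p *pingTracker) gotPong() {
	p.lastPong.Store(time.Now().UnixNano())
}

func (p *pingTracker) pongTooOld(maxAge time.Duration) bool {
	last := time.Unix(0, p.lastPong.Load())
	return time.Since(last) > maxAge
}
```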
Create new code paths for multiple subsystems in the code. This will
make maintaining them easier later.
Also introduce bugLogIf() for errors that should not happen in the first
place.
Use `ODirectPoolSmall` buffers for inline data in PutObject.
Add a separate call for inline data that fetches a buffer for the payload before unmarshaling.
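A rough sketch of the pattern; the pool here is a plain sync.Pool stand-in and the helper name is made up, since the exact ODirectPoolSmall API is not shown here:

```go
import (
	"io"
	"sync"
)

// smallBufPool stands in for ODirectPoolSmall in this sketch.
var smallBufPool = sync.Pool{
	New: func() any { b := make([]byte, 0, 32<<10); return &b },
}

// readInline fetches a pooled buffer sized for small inline payloads,
// reads the data into it, and hands it to the unmarshal step.
func readInline(r io.Reader, size int64, unmarshal func([]byte) error) error {
	bp := smallBufPool.Get().(*[]byte)
	defer smallBufPool.Put(bp)
	buf := *bp
	if int64(cap(buf)) < size {
		buf = make([]byte, size)
	}
	buf = buf[:size]
	if _, err := io.ReadFull(r, buf); err != nil {
		return err
	}
	return unmarshal(buf)
}
```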
Fix races in IAM cache
Fixes #19344
At the top level we only grab a read lock, but we write to the cache if we manage to fetch it.
a03dac41eb/cmd/iam-store.go (L446) is also flipped to what it should be, AFAICT.
Change the internal cache structure to a concurrency-safe implementation.
Bonus: Also switch the grid implementation.
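A minimal sketch of the race being fixed, with illustrative types: under the read lock we must not mutate the shared map, so either take the write lock before storing or use a concurrency-safe structure.

```go
import "sync"

type iamCache struct {
	mu    sync.RWMutex
	users map[string]string
}

func (c *iamCache) lookup(name string, fetch func(string) (string, error)) (string, error) {
	c.mu.RLock()
	v, ok := c.users[name]
	c.mu.RUnlock()
	if ok {
		return v, nil
	}
	v, err := fetch(name)
	if err != nil {
		return "", err
	}
	c.mu.Lock() // upgrade to the write lock before mutating the map
	c.users[name] = v
	c.mu.Unlock()
	return v, nil
}
```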
Streams can return errors if the cancelation is picked up before the response
stream close is picked up. Under extreme load, this could lead to missing
responses.
Send the server mux ack asynchronously so a blocked send cannot block the
newMuxStream call. The stream will not progress until the mux has been acked.
In k8s, things really do come online very asynchronously;
we need an implementation that allows for this randomness.
To facilitate this, move WriteAll() into the
websocket layer instead.
Bonus: avoid instances of dnscache usage on k8s
If network conditions have filled the output queue before a reconnect happens, blocked sends could stop reconnects from happening. In short, `respMu` would be held for a mux client while sending; if the queue is full, this will never be released and closing the mux client will hang.
A) Use the mux client context instead of connection context for sends, so sends are unblocked when the mux client is canceled.
B) Use a `TryLock` on "close" and cancel the request if we cannot get the lock at once. This will unblock any attempts to send.
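A sketch of (B), assuming a mux client that holds `respMu` while sending and has a cancel function (names are illustrative):

```go
import (
	"context"
	"sync"
)

type muxClient struct {
	respMu   sync.Mutex
	cancelFn context.CancelCauseFunc
}

// close tries to take respMu; if a blocked send holds it, cancel the request
// first so the sender unblocks, then take the lock and clean up.
func (m *muxClient) close() {
	if !m.respMu.TryLock() {
		m.cancelFn(context.Canceled) // unblock any in-flight send
		m.respMu.Lock()
	}
	defer m.respMu.Unlock()
	// ... release response channel and other resources here ...
}
```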
- bucket metadata does not need to look for legacy things
anymore if b.Created is non-zero
- stagger bucket metadata loads across lots of nodes to
avoid the current thundering herd problem.
- Remove deadlines for RenameData, RenameFile - these
calls should not ever be timed out and should wait
until completion or wait for client timeout. Do not
choose timeouts for applications during the WRITE phase.
- increase R/W buffer size, increase maxMergeMessages to 30
We have observed cases where a blocked stream will block cancellations.
This happens when the response channel is blocked and we want to push an error.
The response mutex is then held, which prevents all other operations until upstream is unblocked.
Make this behavior non-blocking; if blocked, spawn a goroutine that will send the response and close the output.
There is still a lot of "dancing". Added a test for this and reviewed.
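A sketch of the non-blocking push; the response struct and helper below are illustrative stand-ins, not the actual stream code:

```go
// response is an illustrative stand-in for the stream's response type.
type response struct {
	Msg []byte
	Err error
}

// sendErrResp tries to deliver the terminating error without blocking; if the
// receiver is slow, a goroutine finishes the delivery so the caller (and the
// response mutex) is not held up.
func sendErrResp(responses chan<- response, err error) {
	resp := response{Err: err}
	select {
	case responses <- resp:
		close(responses)
	default:
		go func() {
			responses <- resp
			close(responses)
		}()
	}
}
```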
Do not rely on `connChange` to do reconnects.
Instead, block while the connection is running and reconnect
when handleMessages returns.
Add fully async monitoring instead of monitoring on the main goroutine,
and keep it to avoid full network lockup.
Add separate reconnection mutex
Give more safety around reconnects and make sure a state change isn't missed.
Tested with several runs of `λ go test -race -v -count=500`
Adds a separate mutex and doesn't mix in the testing mutex.
Race checks would occasionally show a race on the handleMsgWg WaitGroup, triggered by debug messages (used in tests only).
Use the `connMu` mutex to protect this against concurrent Wait/Add.
Fixes #18827
When rejecting incoming grid requests, fill out the rejection reason and log it once.
This will give more context when startup is failing. It is already logged after a retry on the caller.
`OpMuxConnectError` was not handled correctly.
Remove local checks for single request handlers so they can
run before being registered locally.
Bonus: Only log IAM bootstrap on startup.
This PR adds a WebSocket grid feature that allows servers to communicate via
a single two-way connection.
There are two request types:
* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small
roundtrips with small payloads.
* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`,
which allows for different combinations of full two-way streams with an initial payload.
Only a single stream is created between two machines - and there is, as such, no
server/client relation, since both sides can initiate and handle requests. Which server
initiates the underlying connection is decided deterministically based on the server names.
Requests are made through a mux client and server, which handles message
passing, congestion, cancelation, timeouts, etc.
If a connection is lost, all requests are canceled, and the calling server will try
to reconnect. Registered handlers can operate directly on byte
slices or use a higher-level generics abstraction.
There is no versioning of handlers/clients, and incompatible changes should
be handled by adding new handlers.
The request path can be changed to a new one for any protocol changes.
First, all servers create a "Manager." The manager must know its address
as well as all remote addresses. This will manage all connections.
To get a connection to any remote, ask the manager to provide it given
the remote address, using:
```
func (m *Manager) Connection(host string) *Connection
```
All serverside handlers must also be registered on the manager. This will
make sure that all incoming requests are served. The number of in-flight
requests and responses must also be given for streaming requests.
The "Connection" returned manages the mux-clients. Requests issued
to the connection will be sent to the remote.
* `func (c *Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)`
performs a single request and returns the result. Any deadline provided on the request is
forwarded to the server, and canceling the context will make the function return at once.
* `func (c *Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st *Stream, err error)`
will initiate a remote call and send the initial payload.
```Go
// A Stream is a two-way stream.
// All responses *must* be read by the caller.
// If the call is canceled through the context,
// the appropriate error will be returned.
type Stream struct {
	// Responses from the remote server.
	// Channel will be closed after an error or when the remote closes.
	// All responses *must* be read by the caller until either an error is returned or the channel is closed.
	// Canceling the context will cause the context cancellation error to be returned.
	Responses <-chan Response

	// Requests sent to the server.
	// If the handler is defined with 0 incoming capacity this will be nil.
	// Channel *must* be closed to signal the end of the stream.
	// If the request context is canceled, the stream will no longer process requests.
	Requests chan<- []byte
}

type Response struct {
	Msg []byte
	Err error
}
```
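As a usage illustration (handler ID, payloads, and error handling below are placeholders), a caller on one server could do:

```go
import (
	"context"

	"github.com/minio/minio/internal/grid"
)

func example(ctx context.Context, c *grid.Connection, h grid.HandlerID) error {
	// Single roundtrip.
	resp, err := c.Request(ctx, h, []byte("ping"))
	if err != nil {
		return err
	}
	_ = resp

	// Two-way stream with an initial payload.
	st, err := c.NewStream(ctx, h, []byte("start"))
	if err != nil {
		return err
	}
	if st.Requests != nil {
		st.Requests <- []byte("more")
		close(st.Requests) // must be closed to signal end of the stream
	}
	// All responses must be read until the channel closes.
	for r := range st.Responses {
		if r.Err != nil {
			return r.Err
		}
		_ = r.Msg
	}
	return nil
}
```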
There are generic versions of the server/client handlers that allow the use of
type-safe implementations for data types that support msgpack marshal/unmarshal.
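Purely as an illustration of the idea (this is not the package's actual generic API), a typed call could be layered on top of `Connection.Request` for msgp-style types:

```go
import (
	"context"

	"github.com/minio/minio/internal/grid"
)

// typedRequest marshals a typed request with msgpack, performs the roundtrip,
// and unmarshals the typed response. The constraint shapes follow the common
// tinylib/msgp method set; the real generic handlers may differ.
func typedRequest[Req interface {
	MarshalMsg([]byte) ([]byte, error)
}, Resp interface {
	UnmarshalMsg([]byte) ([]byte, error)
}](ctx context.Context, c *grid.Connection, h grid.HandlerID, req Req, resp Resp) (Resp, error) {
	payload, err := req.MarshalMsg(nil)
	if err != nil {
		return resp, err
	}
	b, err := c.Request(ctx, h, payload)
	if err != nil {
		return resp, err
	}
	_, err = resp.UnmarshalMsg(b)
	return resp, err
}
```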