This PR adds a WebSocket grid feature that allows servers to communicate via
a single two-way connection.
There are two request types:
* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small
roundtrips with small payloads.
* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`,
which allows for different combinations of full two-way streams with an initial payload.
Only a single stream is created between two machines - and there is, as such, no
server/client relation since both sides can initiate and handle requests. Which server
initiates the request is decided deterministically on the server names.
Requests are made through a mux client and server, which handles message
passing, congestion, cancelation, timeouts, etc.
If a connection is lost, all requests are canceled, and the calling server will try
to reconnect. Registered handlers can operate directly on byte
slices or use a higher-level generics abstraction.
There is no versioning of handlers/clients, and incompatible changes should
be handled by adding new handlers.
The request path can be changed to a new one for any protocol changes.
First, all servers create a "Manager." The manager must know its address
as well as all remote addresses. This will manage all connections.
To get a connection to any remote, ask the manager to provide it given
the remote address using.
```
func (m *Manager) Connection(host string) *Connection
```
All serverside handlers must also be registered on the manager. This will
make sure that all incoming requests are served. The number of in-flight
requests and responses must also be given for streaming requests.
The "Connection" returned manages the mux-clients. Requests issued
to the connection will be sent to the remote.
* `func (c *Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)`
performs a single request and returns the result. Any deadline provided on the request is
forwarded to the server, and canceling the context will make the function return at once.
* `func (c *Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st *Stream, err error)`
will initiate a remote call and send the initial payload.
```Go
// A Stream is a two-way stream.
// All responses *must* be read by the caller.
// If the call is canceled through the context,
//The appropriate error will be returned.
type Stream struct {
// Responses from the remote server.
// Channel will be closed after an error or when the remote closes.
// All responses *must* be read by the caller until either an error is returned or the channel is closed.
// Canceling the context will cause the context cancellation error to be returned.
Responses <-chan Response
// Requests sent to the server.
// If the handler is defined with 0 incoming capacity this will be nil.
// Channel *must* be closed to signal the end of the stream.
// If the request context is canceled, the stream will no longer process requests.
Requests chan<- []byte
}
type Response struct {
Msg []byte
Err error
}
```
There are generic versions of the server/client handlers that allow the use of type
safe implementations for data types that support msgpack marshal/unmarshal.
Wrong resource is being fetched, since idx is incremented, but mapID is reused.
Regression caused by #13454 - that part didn't optimize anything anyway.
Refresh was doing a linear scan of all locked resources. This was adding
up to significant delays in locking on high load systems with long
running requests.
Add a secondary index for O(log(n)) UID -> resource lookups.
Multiple resources are stored in consecutive strings.
Bonus fixes:
* On multiple Unlock entries unlock the write locks we can.
* Fix `expireOldLocks` skipping checks on entry after expiring one.
* Return fast on canTakeUnlock/canTakeLock.
* Prealloc some places.
A multi resources lock is a single lock UID with multiple associated
resources. This is created for example by multi objects delete
operation. This commit changes the behavior of Refresh() to iterate over
all locks having the same UID and refresh them.
Bonus: Fix showing top locks for multi delete objects
In the event when a lock is not refreshed in the cluster, this latter
will be automatically removed in the subsequent cleanup of non
refreshed locks routine, but it forgot to clean the local server,
hence having the same weird stale locks present.
This commit will remove the lock locally also in remote nodes, if
removing a lock from a remote node will fail, it will be anyway
removed later in the locks cleanup routine.
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
- lock maintenance loop was incorrectly sleeping
as well as using ticker badly, leading to
extra expiration routines getting triggered
that could flood the network.
- multipart upload cleanup should be based on
timer instead of ticker, to ensure that long
running jobs don't get triggered twice.
- make sure to get right lockers for object name
for some flaky networks this may be too fast of a value
choose a defensive value, and let this be addressed
properly in a new refactor of dsync with renewal logic.
Also enable faster fallback delay to cater for misconfigured
IPv6 servers
refer
- https://golang.org/pkg/net/#Dialer
- https://tools.ietf.org/html/rfc6555
This refactor is done for few reasons below
- to avoid deadlocks in scenarios when number
of nodes are smaller < actual erasure stripe
count where in N participating local lockers
can lead to deadlocks across systems.
- avoids expiry routines to run 1000 of separate
network operations and routes per disk where
as each of them are still accessing one single
local entity.
- it is ideal to have since globalLockServer
per instance.
- In a 32node deployment however, each server
group is still concentrated towards the
same set of lockers that partipicate during
the write/read phase, unlike previous minio/dsync
implementation - this potentially avoids send
32 requests instead we will still send at max
requests of unique nodes participating in a
write/read phase.
- reduces overall chattiness on smaller setups.
only newly replaced drives get the new `format.json`,
this avoids disks reloading their in-memory reference
format, ensures that drives are online without
reloading the in-memory reference format.
keeping reference format in-tact means UUIDs
never change once they are formatted.
lockers currently might leave stale lockers,
in unknown ways waiting for downed lockers.
locker check interval is high enough to safely
cleanup stale locks.
- select lockers which are non-local and online to have
affinity towards remote servers for lock contention
- optimize lock retry interval to avoid sending too many
messages during lock contention, reduces average CPU
usage as well
- if bucket is not set, when deleteObject fails make sure
setPutObjHeaders() honors lifecycle only if bucket name
is set.
- fix top locks to list out always the oldest lockers always,
avoid getting bogged down into map's unordered nature.
- Add owner information for expiry, locking, unlocking a resource
- TopLocks returns now locks in quorum by default, provides
a way to capture stale locks as well with `?stale=true`
- Simplify the quorum handling for locks to avoid from storage
class, because there were challenges to make it consistent
across all situations.
- And other tiny simplifications to reset locks.
Context timeout might race on each other when timeouts are lower
i.e when two lock attempts happened very quickly on the same resource
and the servers were yet trying to establish quorum.
This situation can lead to locks held which wouldn't be unlocked
and subsequent lock attempts would fail.
This would require a complete server restart. A potential of this
issue happening is when server is booting up and we are trying
to hold a 'transaction.lock' in quick bursts of timeout.
Change distributed locking to allow taking bulk locks
across objects, reduces usually 1000 calls to 1.
Also allows for situations where multiple clients sends
delete requests to objects with following names
```
{1,2,3,4,5}
```
```
{5,4,3,2,1}
```
will block and ensure that we do not fail the request
on each other.
Use reference format to initialize lockers
during startup, also handle `nil` for NetLocker
in dsync and remove *errorLocker* implementation
Add further tuning parameters such as
- DialTimeout is now 15 seconds from 30 seconds
- KeepAliveTimeout is not 20 seconds, 5 seconds
more than default 15 seconds
- ResponseHeaderTimeout to 10 seconds
- ExpectContinueTimeout is reduced to 3 seconds
- DualStack is enabled by default remove setting
it to `true`
- Reduce IdleConnTimeout to 30 seconds from
1 minute to avoid idleConn build up
Fixes#8773
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.
In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
Added support for new RPC support using HTTP POST. RPC's
arguments and reply are Gob encoded and sent as HTTP
request/response body.
This patch also removes Go RPC based implementation.