minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	51aa59a737	perf: websocket grid connectivity for all internode communication (#18461 ) This PR adds a WebSocket grid feature that allows servers to communicate via a single two-way connection. There are two request types: * Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small roundtrips with small payloads. * Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`, which allows for different combinations of full two-way streams with an initial payload. Only a single stream is created between two machines - and there is, as such, no server/client relation since both sides can initiate and handle requests. Which server initiates the request is decided deterministically on the server names. Requests are made through a mux client and server, which handles message passing, congestion, cancelation, timeouts, etc. If a connection is lost, all requests are canceled, and the calling server will try to reconnect. Registered handlers can operate directly on byte slices or use a higher-level generics abstraction. There is no versioning of handlers/clients, and incompatible changes should be handled by adding new handlers. The request path can be changed to a new one for any protocol changes. First, all servers create a "Manager." The manager must know its address as well as all remote addresses. This will manage all connections. To get a connection to any remote, ask the manager to provide it given the remote address using. ``` func (m Manager) Connection(host string) Connection ``` All serverside handlers must also be registered on the manager. This will make sure that all incoming requests are served. The number of in-flight requests and responses must also be given for streaming requests. The "Connection" returned manages the mux-clients. Requests issued to the connection will be sent to the remote. * `func (c Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)` performs a single request and returns the result. Any deadline provided on the request is forwarded to the server, and canceling the context will make the function return at once. `func (c Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st Stream, err error)` will initiate a remote call and send the initial payload. ```Go // A Stream is a two-way stream. // All responses must be read by the caller. // If the call is canceled through the context, //The appropriate error will be returned. type Stream struct { // Responses from the remote server. // Channel will be closed after an error or when the remote closes. // All responses must be read by the caller until either an error is returned or the channel is closed. // Canceling the context will cause the context cancellation error to be returned. Responses <-chan Response // Requests sent to the server. // If the handler is defined with 0 incoming capacity this will be nil. // Channel must be closed to signal the end of the stream. // If the request context is canceled, the stream will no longer process requests. Requests chan<- []byte } type Response struct { Msg []byte Err error } ``` There are generic versions of the server/client handlers that allow the use of type safe implementations for data types that support msgpack marshal/unmarshal.	2023-11-20 17:09:35 -08:00
Harshavardhana	3b7781835e	add lock metrics per node (#16943 )	2023-04-03 21:23:24 -07:00
Klaus Post	8b0ab6ead6	Revert "Make localLocker lock attempts cancellable (#16510 )" (#16884 )	2023-03-23 10:26:21 -07:00
Klaus Post	cdb1b48ad9	Make localLocker lock attempts cancellable (#16510 )	2023-01-31 09:41:17 -08:00
yudoutingle	f4c56026a2	fix: potential deadLock caused by unlocking a non-existing lock (#15635 )	2022-09-02 14:24:32 -07:00
Klaus Post	a2a48cc065	Optimize read locker cleanup (#14200 ) When objects hold a lot of read locks cleanup time grows exponentially. ``` BEFORE: Unable to complete tests. AFTER: === RUN Test_localLocker_expireOldLocksExpire/100-locks/1-read local-locker_test.go:298: Scan Took: 0s. Left: 100/100 local-locker_test.go:317: Expire 50% took: 0s. Left: 44/44 local-locker_test.go:331: Expire rest took: 0s. Left: 0/0 === RUN Test_localLocker_expireOldLocksExpire/100-locks/100-read local-locker_test.go:298: Scan Took: 0s. Left: 10000/100 local-locker_test.go:317: Expire 50% took: 1ms. Left: 5000/100 local-locker_test.go:331: Expire rest took: 1ms. Left: 0/0 === RUN Test_localLocker_expireOldLocksExpire/100-locks/1000-read local-locker_test.go:298: Scan Took: 2ms. Left: 100000/100 local-locker_test.go:317: Expire 50% took: 55ms. Left: 50038/100 local-locker_test.go:331: Expire rest took: 29ms. Left: 0/0 === RUN Test_localLocker_expireOldLocksExpire/10000-locks/1-read local-locker_test.go:298: Scan Took: 1ms. Left: 10000/10000 local-locker_test.go:317: Expire 50% took: 2ms. Left: 5019/5019 local-locker_test.go:331: Expire rest took: 2ms. Left: 0/0 === RUN Test_localLocker_expireOldLocksExpire/10000-locks/100-read local-locker_test.go:298: Scan Took: 23ms. Left: 1000000/10000 local-locker_test.go:317: Expire 50% took: 160ms. Left: 499798/10000 local-locker_test.go:331: Expire rest took: 138ms. Left: 0/0 === RUN Test_localLocker_expireOldLocksExpire/10000-locks/1000-read local-locker_test.go:298: Scan Took: 200ms. Left: 10000000/10000 local-locker_test.go:317: Expire 50% took: 5.888s. Left: 5000196/10000 local-locker_test.go:331: Expire rest took: 3.417s. Left: 0/0 === RUN Test_localLocker_expireOldLocksExpire/1000000-locks/1-read local-locker_test.go:298: Scan Took: 133ms. Left: 1000000/1000000 local-locker_test.go:317: Expire 50% took: 348ms. Left: 500255/500255 local-locker_test.go:331: Expire rest took: 307ms. Left: 0/0 ```	2022-01-27 14:10:57 -08:00
Klaus Post	7db05a80dd	locking: Fix wrong map id (#14184 ) Wrong resource is being fetched, since idx is incremented, but mapID is reused. Regression caused by #13454 - that part didn't optimize anything anyway.	2022-01-26 08:34:09 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Klaus Post	9afdbe3648	fix: RLock UID memory leak (#13607 ) UID were misnamed in RLock, leading to memory buildup. Regression in #13430	2021-11-08 07:35:50 -08:00
Harshavardhana	44e4bdc6f4	restrict multi object delete > 1000 objects (#13454 ) AWS S3 returns error if > 1000 objects are sent per MultiObject delete request, we should comply no reason to not comply.	2021-10-18 08:38:33 -07:00
Klaus Post	779060bc16	Locker: Improve Refresh speed (#13430 ) Refresh was doing a linear scan of all locked resources. This was adding up to significant delays in locking on high load systems with long running requests. Add a secondary index for O(log(n)) UID -> resource lookups. Multiple resources are stored in consecutive strings. Bonus fixes: * On multiple Unlock entries unlock the write locks we can. * Fix `expireOldLocks` skipping checks on entry after expiring one. * Return fast on canTakeUnlock/canTakeLock. * Prealloc some places.	2021-10-15 03:12:13 -07:00
Anis Elleuch	e05886561d	lock: Fix Refresh logic with multi resources lock (#13092 ) A multi resources lock is a single lock UID with multiple associated resources. This is created for example by multi objects delete operation. This commit changes the behavior of Refresh() to iterate over all locks having the same UID and refresh them. Bonus: Fix showing top locks for multi delete objects	2021-08-27 13:07:55 -07:00
Anis Elleuch	06b71c99ee	locks: Ensure local lock removal after a failed refresh (#12979 ) In the event when a lock is not refreshed in the cluster, this latter will be automatically removed in the subsequent cleanup of non refreshed locks routine, but it forgot to clean the local server, hence having the same weird stale locks present. This commit will remove the lock locally also in remote nodes, if removing a lock from a remote node will fail, it will be anyway removed later in the locks cleanup routine.	2021-08-27 08:59:36 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Anis Elleuch	0b34dfb479	lock: Timeout Unlock RPC call (#12213 ) RPC unlock call needs to be timed out otherwise this can block indefinitely. Signed-off-by: Anis Elleuch <anis@min.io>	2021-05-11 02:11:29 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Anis Elleuch	7be7109471	locking: Add Refresh for better locking cleanup (#11535 ) Co-authored-by: Anis Elleuch <anis@min.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-03-03 18:36:43 -08:00
Harshavardhana	88c1bb0720	fix: improper ticker usage in goroutines (#11468 ) - lock maintenance loop was incorrectly sleeping as well as using ticker badly, leading to extra expiration routines getting triggered that could flood the network. - multipart upload cleanup should be based on timer instead of ticker, to ensure that long running jobs don't get triggered twice. - make sure to get right lockers for object name	2021-02-05 19:23:48 -08:00
Harshavardhana	da55a05587	fix aggressive expiration detection (#11446 ) for some flaky networks this may be too fast of a value choose a defensive value, and let this be addressed properly in a new refactor of dsync with renewal logic. Also enable faster fallback delay to cater for misconfigured IPv6 servers refer - https://golang.org/pkg/net/#Dialer - https://tools.ietf.org/html/rfc6555	2021-02-04 16:56:40 -08:00
Harshavardhana	9cdd981ce7	fix: expire locks only on participating lockers (#11335 ) additionally also add a new ForceUnlock API, to allow forcibly unlocking locks if possible.	2021-01-25 10:01:27 -08:00
Harshavardhana	a4f6705874	expire stale locks when owner is down (#11247 ) fixes #11246	2021-01-07 19:16:18 -08:00
Harshavardhana	4550ac6fff	fix: refactor locks to apply them uniquely per node (#11052 ) This refactor is done for few reasons below - to avoid deadlocks in scenarios when number of nodes are smaller < actual erasure stripe count where in N participating local lockers can lead to deadlocks across systems. - avoids expiry routines to run 1000 of separate network operations and routes per disk where as each of them are still accessing one single local entity. - it is ideal to have since globalLockServer per instance. - In a 32node deployment however, each server group is still concentrated towards the same set of lockers that partipicate during the write/read phase, unlike previous minio/dsync implementation - this potentially avoids send 32 requests instead we will still send at max requests of unique nodes participating in a write/read phase. - reduces overall chattiness on smaller setups.	2020-12-10 07:28:37 -08:00
Harshavardhana	029758cb20	fix: retain the previous UUID for newly replaced drives (#10759 ) only newly replaced drives get the new `format.json`, this avoids disks reloading their in-memory reference format, ensures that drives are online without reloading the in-memory reference format. keeping reference format in-tact means UUIDs never change once they are formatted.	2020-10-26 10:29:29 -07:00
Harshavardhana	d9db7f3308	expire lockers if lockers are offline (#10749 ) lockers currently might leave stale lockers, in unknown ways waiting for downed lockers. locker check interval is high enough to safely cleanup stale locks.	2020-10-24 13:23:16 -07:00
Harshavardhana	736e58dd68	fix: handle concurrent lockers with multiple optimizations (#10640 ) - select lockers which are non-local and online to have affinity towards remote servers for lock contention - optimize lock retry interval to avoid sending too many messages during lock contention, reduces average CPU usage as well - if bucket is not set, when deleteObject fails make sure setPutObjHeaders() honors lifecycle only if bucket name is set. - fix top locks to list out always the oldest lockers always, avoid getting bogged down into map's unordered nature.	2020-10-08 12:32:32 -07:00
Harshavardhana	eafa775952	fix: add lock ownership to expire locks (#10571 ) - Add owner information for expiry, locking, unlocking a resource - TopLocks returns now locks in quorum by default, provides a way to capture stale locks as well with `?stale=true` - Simplify the quorum handling for locks to avoid from storage class, because there were challenges to make it consistent across all situations. - And other tiny simplifications to reset locks.	2020-09-25 19:21:52 -07:00
Harshavardhana	fe157166ca	fix: Pass context all the way down to the network call in lockers (#10161 ) Context timeout might race on each other when timeouts are lower i.e when two lock attempts happened very quickly on the same resource and the servers were yet trying to establish quorum. This situation can lead to locks held which wouldn't be unlocked and subsequent lock attempts would fail. This would require a complete server restart. A potential of this issue happening is when server is booting up and we are trying to hold a 'transaction.lock' in quick bursts of timeout.	2020-07-29 23:15:34 -07:00
Harshavardhana	ab7d3cd508	fix: Speed up multi-object delete by taking bulk locks (#8974 ) Change distributed locking to allow taking bulk locks across objects, reduces usually 1000 calls to 1. Also allows for situations where multiple clients sends delete requests to objects with following names ``` {1,2,3,4,5} ``` ``` {5,4,3,2,1} ``` will block and ensure that we do not fail the request on each other.	2020-02-21 11:29:57 +05:30
Harshavardhana	5aa5dcdc6d	lock: improve locker initialization at init (#8776 ) Use reference format to initialize lockers during startup, also handle `nil` for NetLocker in dsync and remove errorLocker implementation Add further tuning parameters such as - DialTimeout is now 15 seconds from 30 seconds - KeepAliveTimeout is not 20 seconds, 5 seconds more than default 15 seconds - ResponseHeaderTimeout to 10 seconds - ExpectContinueTimeout is reduced to 3 seconds - DualStack is enabled by default remove setting it to `true` - Reduce IdleConnTimeout to 30 seconds from 1 minute to avoid idleConn build up Fixes #8773	2020-01-10 02:35:06 -08:00
Harshavardhana	720442b1a2	Add lock expiry handler to expire state locks (#8562 )	2019-11-25 16:39:43 -08:00
Harshavardhana	347b29d059	Implement bucket expansion (#8509 )	2019-11-19 17:42:27 -08:00
Harshavardhana	e9b2bf00ad	Support MinIO to be deployed on more than 32 nodes (#8492 ) This PR implements locking from a global entity into a more localized set level entity, allowing for locks to be held only on the resources which are writing to a collection of disks rather than a global level. In this process this PR also removes the top-level limit of 32 nodes to an unlimited number of nodes. This is a precursor change before bring in bucket expansion.	2019-11-13 12:17:45 -08:00
Krishna Srinivas	338e9a9be9	Put object client disconnect (#7824 ) Fail putObject and postpolicy in case client prematurely disconnects Use request's context to cancel lock requests on client disconnects	2019-06-28 22:09:17 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Harshavardhana	df35d7db9d	Introduce staticcheck for stricter builds (#7035 )	2019-02-13 18:29:36 +05:30
kannappanr	ce870466ff	Top Locks command implementation (#7052 ) API to list locks used in distributed XL mode	2019-01-24 07:22:14 -08:00
Bala FA	6a53dd1701	Implement HTTP POST based RPC (#5840 ) Added support for new RPC support using HTTP POST. RPC's arguments and reply are Gob encoded and sent as HTTP request/response body. This patch also removes Go RPC based implementation.	2018-06-06 14:21:56 +05:30

37 Commits