minio

mirror of https://github.com/minio/minio.git synced 2025-11-21 10:16:03 -05:00

Author	SHA1	Message	Date
Klaus Post	51aa59a737	perf: websocket grid connectivity for all internode communication (#18461)	2023-11-20 17:09:35 -08:00

This PR adds a WebSocket grid feature that allows servers to communicate via a single two-way connection.

There are two request types:

* Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small roundtrips with small payloads.
* Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`, which allows for different combinations of full two-way streams with an initial payload.

Only a single stream is created between two machines - and there is, as such, no server/client relation since both sides can initiate and handle requests. Which server initiates the request is decided deterministically on the server names.

Requests are made through a mux client and server, which handles message passing, congestion, cancelation, timeouts, etc.

If a connection is lost, all requests are canceled, and the calling server will try to reconnect. Registered handlers can operate directly on byte slices or use a higher-level generics abstraction.

There is no versioning of handlers/clients, and incompatible changes should be handled by adding new handlers.

The request path can be changed to a new one for any protocol changes.

First, all servers create a "Manager." The manager must know its address as well as all remote addresses. This will manage all connections.

To get a connection to any remote, ask the manager to provide it given the remote address using:

```
func (m Manager) Connection(host string) Connection
```

All serverside handlers must also be registered on the manager. This will make sure that all incoming requests are served. The number of in-flight requests and responses must also be given for streaming requests.

The "Connection" returned manages the mux-clients. Requests issued to the connection will be sent to the remote.

* `func (c Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)` performs a single request and returns the result. Any deadline provided on the request is forwarded to the server, and canceling the context will make the function return at once.
* `func (c Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st Stream, err error)` will initiate a remote call and send the initial payload.

```Go
// A Stream is a two-way stream.
// All responses must be read by the caller.
// If the call is canceled through the context,
// The appropriate error will be returned.
type Stream struct {
	// Responses from the remote server.
	// Channel will be closed after an error or when the remote closes.
	// All responses must be read by the caller until either an error is returned or the channel is closed.
	// Canceling the context will cause the context cancellation error to be returned.
	Responses <-chan Response

	// Requests sent to the server.
	// If the handler is defined with 0 incoming capacity this will be nil.
	// Channel must be closed to signal the end of the stream.
	// If the request context is canceled, the stream will no longer process requests.
	Requests chan<- []byte
}

type Response struct {
	Msg []byte
	Err error
}
```

There are generic versions of the server/client handlers that allow the use of type safe implementations for data types that support msgpack marshal/unmarshal.
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967)	2023-09-04 12:57:37 -07:00
Klaus Post	45a717a142	Avoid per request URL parsing (#17593) Every request does a `url.Parse(c.url.String())` to clone a URL. The host will also be static, so we rewrite that on creation.	2023-07-07 22:07:30 -07:00
Klaus Post	6e38d0f3ab	Add more bootstrap info in debug mode (#17362)	2023-06-08 08:39:47 -07:00
Harshavardhana	2f9e2147f5	allow quota enforcement to rely on older values (#17351) PUT calls cannot afford large latency build-ups due to a contended usage.json, or worse, failing with some unexpected error. This can happen when the file is concurrently being updated by the scanner or is being healed during a disk replacement heal. These operations are fairly quick in theory, but stressed clusters can quickly show visible latency that adds up, leading to invalid errors returned during PUT. It is perhaps okay for us to relax this error return requirement; instead, make sure that we log that we are proceeding to take in the requests while quota enforcement is using an older value. These things will reconcile themselves eventually, via the scanner overwriting usage.json. Bonus: make sure that storage-rest-client sets ExpectTimeouts to 'true', so that a DiskInfo() call with contextTimeout does not prematurely disconnect the servers, leading to a longer healthCheck back-off routine. This can easily pile up while also causing active callers to disconnect, leading to quorum loss. DiskInfo is actively used in the PUT and Multipart call paths for upgrading parity when disks are down; it in turn shouldn't cause more disks to go down.	2023-06-05 16:56:35 -07:00
jiuker	fb5ce3b87a	record err time when remote node is offline (#17262)	2023-05-30 10:07:26 -07:00
jiuker	b1b00a5055	fix: Avoid Income globalStats twice upon error (#17263)	2023-05-22 07:42:27 -07:00
Anis Eleuch	4640b13c66	Use exponential backoff algo for internode reconnections (#17052)	2023-05-02 12:35:52 -07:00
Anis Eleuch	0b7ca094e4	Remove Expect 100-continue in internode communications (#17061)	2023-04-26 09:33:45 -07:00
Harshavardhana	d19cbc81b5	fix: do not return IAM/Bucket metadata replication errors to client (#16486)	2023-01-26 11:11:54 -08:00
Anis Elleuch	932d2c3c62	Add X-Amz-Request-Id to internode calls (#16146)	2022-12-06 09:27:26 -08:00
Klaus Post	ddeca9f12a	fix: filter rest errors and logs returned (#16019)	2022-11-07 10:38:08 -08:00
Anis Elleuch	6287e8c571	fix: race when accessing REST TCP dial values (#15770)	2022-09-29 09:27:58 -07:00
Anis Elleuch	048a46ec2a	Add RPC tcp timeout/errs and AVG duration to prometheus (#15747)	2022-09-26 09:04:26 -07:00
Klaus Post	ff12080ff5	Remove deprecated io/ioutil (#15707)	2022-09-19 11:05:16 -07:00
Anis Elleuch	4a92134235	prometheus: track errors during REST read/write calls (#15678) minio_inter_node_traffic_errors_total currently does not track request body read/write errors in internode REST communications. This commit fixes this by wrapping resp.Body.	2022-09-12 12:40:51 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464)	2022-08-04 16:10:08 -07:00
Harshavardhana	5e763b71dc	use logger.LogOnce to reduce printing disconnection logs (#15408)	2022-07-27 09:44:59 -07:00

fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unused due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs up to leafErr to ensure that we store exactly the correct errors
Harshavardhana	785b429737	add reconnect duration allows for verifying disconnect intervals (#15306)	2022-07-15 14:41:24 -07:00
Klaus Post	9004d69c6f	Make ReqInfo concurrency safe (#15204) Some read/writes of ReqInfo did not get appropriate locks, leading to races. Make sure reading and writing holds appropriate locks.	2022-06-30 10:48:50 -07:00
Harshavardhana	65b4b100a8	de-couple caller context to avoid internal races (#15195)	2022-06-29 14:44:26 -07:00

```
fatal error: concurrent map iteration and map write
fatal error: concurrent map iteration and map write

goroutine 745335841 [running]:
runtime.throw({0x273e67b?, 0x80?})
	runtime/panic.go:992 +0x71 fp=0xc0390bc240 sp=0xc0390bc210 pc=0x438671
runtime.mapiternext(0x40d987?)
	runtime/map.go:871 +0x4eb fp=0xc0390bc2b0 sp=0xc0390bc240 pc=0x41002b
runtime.mapiterinit(0x46bec7?, 0x4ef76c?, 0xc0017cc9c0?)
	runtime/map.go:861 +0x228 fp=0xc0390bc2d0 sp=0xc0390bc2b0 pc=0x40fae8
reflect.mapiterinit(0x1b5?, 0xc0?, 0x235bcc0?)
```

```
github.com/minio/minio/internal/rest/client.go:151 +0x5f4 fp=0xc0390bd988 sp=0xc0390bd730 pc=0x153e434
```
Harshavardhana	1a56ebea70	cleanup dsync tests and remove net/rpc references (#14118)	2022-01-18 12:44:38 -08:00
Harshavardhana	661b263e77	add gocritic/ruleguard checks back again, cleanup code. (#13665)	2021-11-16 09:28:29 -08:00

- remove some duplicated code
- reported a bug, separately fixed in #13664
- using strings.ReplaceAll() when needed
- using filepath.ToSlash() when needed
- remove all non-Go style comments from the codebase

Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
Harshavardhana	ffd497673f	internode lockArgs should use messagepack (#13329)	2021-09-30 11:53:01 -07:00

It would seem that using `bufio.Scan()` is very slow for heavy concurrent I/O, i.e. when r.Body is slow. Instead, use a proper binary exchange format to marshal and unmarshal the LockArgs data structure in a cleaner way.

This PR increases performance of the locking sub-system for tiny repeated read lock requests on the same object.

```
BenchmarkLockArgs
BenchmarkLockArgs-4       6417609    185.7 ns/op      56 B/op    2 allocs/op
BenchmarkLockArgsOld
BenchmarkLockArgsOld-4    1187368     1015 ns/op    4096 B/op    1 allocs/op
```
Harshavardhana	da74e2f167	move internal/net to pkg/net package (#12505)	2021-06-14 14:54:37 -07:00
Anis Elleuch	6c8be64cdb	rest: healthcheck should not update failure metrics (#12458) Otherwise, we can see high numbers of networking issues when a node is down.	2021-06-08 14:09:26 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00

27 Commits