minio

mirror of https://github.com/minio/minio.git synced 2025-11-22 10:37:42 -05:00

Author	SHA1	Message	Date
Klaus Post	e8a476ef5a	Keep larger merge buffers for RPC (#20654 ) Keep larger merge buffers When sending large messages >1K, the merge buffer would continuously be reallocated. This could happen on listings, where blocks are typically 4->8K. Keep merge buffer of up to 256KB. Benchmark with 4096b messages: ``` benchmark old ns/op new ns/op delta BenchmarkRequests/servers=2/bytes/par=32-32 8271 6360 -23.10% BenchmarkRequests/servers=2/bytes/par=64-32 7840 4731 -39.66% BenchmarkRequests/servers=2/bytes/par=128-32 7291 4740 -34.99% BenchmarkRequests/servers=2/bytes/par=256-32 7095 4580 -35.45% BenchmarkRequests/servers=2/bytes/par=512-32 6757 4584 -32.16% BenchmarkRequests/servers=2/bytes/par=1024-32 6429 4453 -30.74% benchmark old bytes new bytes delta BenchmarkRequests/servers=2/bytes/par=32-32 12090 821 -93.21% BenchmarkRequests/servers=2/bytes/par=64-32 17423 820 -95.29% BenchmarkRequests/servers=2/bytes/par=128-32 18493 822 -95.56% BenchmarkRequests/servers=2/bytes/par=256-32 18892 821 -95.65% BenchmarkRequests/servers=2/bytes/par=512-32 19064 826 -95.67% BenchmarkRequests/servers=2/bytes/par=1024-32 19038 842 -95.58% ```	2024-11-16 09:18:48 -08:00
Klaus Post	b5177993b3	Make DeadlineConn http.Listener compatible (#20635 ) HTTP likes to slap an infinite read deadline on a connection and do a blocking read while the response is being written. This effectively means that a reading deadline becomes the request-response deadline. Instead of enforcing our timeout, we pass it through and keep the "infinite deadline" sticky on connections. However, we still "record" when reads are aborted, so we never overwrite that. The HTTP server should have `ReadTimeout` and `IdleTimeout` set for the deadline to be effective. Use --idle-timeout for incoming connections.	2024-11-12 12:41:41 -08:00
Harshavardhana	8c9ab85cfa	Add multipart uploads cache for ListMultipartUploads() (#20407 ) this cache will be honored only when `prefix=""` while performing ListMultipartUploads() operation. This is mainly to satisfy applications like alluxio for their underfs implementation and tests. replaces https://github.com/minio/minio/pull/20181	2024-09-09 09:58:30 -07:00
Klaus Post	3ffeabdfcb	Fix govet+staticcheck issues (#20263 ) This is better: https://github.com/golang/go/issues/60529	2024-08-14 10:11:51 -07:00
Harshavardhana	2e0fd2cba9	implement a safer completeMultipart implementation (#20227 ) - optimize writing part.N.meta by writing both part.N and its meta in sequence without network component. - remove part.N.meta, part.N which were partially successful, in quorum loss situations during renamePart() - allow for strict read quorum check arbitrated via ETag for the given part number, this makes it doubly safe upon final commit. - return an appropriate error when read quorum is missing, instead of returning InvalidPart{}, which is a non-retryable error. This kind of situation can happen when many nodes are going offline in rotation, an example of such a restart() behavior is statefulset updates in k8s. fixes #20091	2024-08-12 01:38:15 -07:00
Harshavardhana	a17f14f73a	separate lock from common grid to avoid epoll contention (#20180 ) epoll contention on TCP causes latency build-up when we have high volume ingress. This PR is an attempt to relieve this pressure. upstream issue https://github.com/golang/go/issues/65064 It seems to be a deeper problem; we haven't yet tried the fix provided in that issue, but this change helps without changing the compiler. Of course, this is a workaround for now, hoping for a more comprehensive fix from Go runtime.	2024-07-29 11:10:04 -07:00
Klaus Post	59788e25c7	Update connection deadlines less frequently (#20166 ) Only set write deadline on connections every second. Combine the 2 write locations into 1.	2024-07-26 10:40:11 -07:00
Klaus Post	15b609ecea	Expose RPC reconnections and ping time (#20157 ) - Keeps track of reconnection count. - Keeps track of connection ping roundtrip times. Sends timestamp in ping message. - Allow ping without payload.	2024-07-25 14:07:21 -07:00
Klaus Post	c0e2886e37	Tweak grid for less writes (#20129 ) Use `runtime.Gosched()` if we have less than maxMergeMessages and the queue is empty. Up maxMergeMessages to 50 to merge more messages into a single write. Add length check for an early bailout on readAllInto when we know packet length.	2024-07-23 03:28:14 -07:00
Harshavardhana	8e618d45fc	remove unnecessary LRU for internode auth token (#20119 ) removes contentious usage of mutexes in LRU, which were never really reused in any manner; we do not need it. To trust hosts, the correct way is TLS certs; this PR completely removes this dependency, which has never been useful. ``` 0 0% 100% 25.83s 26.76% github.com/hashicorp/golang-lru/v2/expirable.(LRU[...]) 0 0% 100% 28.03s 29.04% github.com/hashicorp/golang-lru/v2/expirable.(LRU[...]) ``` Bonus: use `x-minio-time` as a nanosecond value, a more straightforward mechanism that avoids unnecessary parsing of time strings.	2024-07-22 00:04:48 -07:00
Klaus Post	ded373e600	Split handleMessages (cosmetic) (#20095 ) Split the read and write sides of handleMessages into two separate functions Cosmetic. The only non-copy-and-paste change is that `cancel(ErrDisconnected)` is moved into the defer on `readStream`.	2024-07-15 12:02:30 -07:00
Klaus Post	0d0b0aa599	Abstract grid connections (#20038 ) Add `ConnDialer` to abstract connection creation. - `IncomingConn(ctx context.Context, conn net.Conn)` is provided as an entry point for incoming custom connections. - `ConnectWS` is provided to create web socket connections.	2024-07-08 14:44:00 -07:00
Klaus Post	3415c4dd1e	Fix reconnected deadlock with full queue (#19964 ) When a reconnection happens, `handleMessages` must be able to complete and exit. This can be prevented in a full queue. Deadlock chain (May 10th release) ``` 1 @ 0x44110e 0x453125 0x109f88c 0x109f7d5 0x10a472c 0x10a3f72 0x10a34ed 0x4795e1 # 0x109f88b github.com/minio/minio/internal/grid.(Connection).send+0x3eb github.com/minio/minio/internal/grid/connection.go:548 # 0x109f7d4 github.com/minio/minio/internal/grid.(Connection).queueMsg+0x334 github.com/minio/minio/internal/grid/connection.go:586 # 0x10a472b github.com/minio/minio/internal/grid.(Connection).handleAckMux+0xab github.com/minio/minio/internal/grid/connection.go:1284 # 0x10a3f71 github.com/minio/minio/internal/grid.(Connection).handleMsg+0x231 github.com/minio/minio/internal/grid/connection.go:1211 # 0x10a34ec github.com/minio/minio/internal/grid.(Connection).handleMessages.func1+0x6cc github.com/minio/minio/internal/grid/connection.go:1019 ---> blocks ---> via (Connection).handleMsgWg 1 @ 0x44110e 0x454165 0x454134 0x475325 0x486b08 0x10a161a 0x10a1465 0x2470e67 0x7395a9 0x20e61af 0x20e5f1f 0x7395a9 0x22f781c 0x7395a9 0x22f89a5 0x7395a9 0x22f6e82 0x7395a9 0x22f49a2 0x7395a9 0x2206e45 0x7395a9 0x22f4d9c 0x7395a9 0x210ba06 0x7395a9 0x23089c2 0x7395a9 0x22f86e9 0x7395a9 0xd42582 0x2106c04 # 0x475324 sync.runtime_Semacquire+0x24 runtime/sema.go:62 # 0x486b07 sync.(WaitGroup).Wait+0x47 sync/waitgroup.go:116 # 0x10a1619 github.com/minio/minio/internal/grid.(Connection).reconnected+0xb9 github.com/minio/minio/internal/grid/connection.go:857 # 0x10a1464 github.com/minio/minio/internal/grid.(Connection).handleIncoming+0x384 github.com/minio/minio/internal/grid/connection.go:825 ``` Add a queue cleaner in reconnected that will pop old messages so `handleMessages` can send messages without blocking and exit appropriately for the connection to be re-established. 
Messages are likely dropped by the remote, but we may have some that can succeed, so we only drop when running out of space.	2024-06-20 16:11:40 -07:00
Klaus Post	d2eed44c78	Fix replication checksum transfer (#19906 ) Compression will be disabled by default if SSE-C is specified. So we can still honor SSE-C.	2024-06-10 10:40:33 -07:00
Anis Eleuch	789cbc6fb2	heal: Dangling check to evaluate object parts separately (#19797 )	2024-06-10 08:51:27 -07:00
Klaus Post	f00187033d	Two way streams for upcoming locking enhancements (#19796 )	2024-06-07 08:51:52 -07:00
Klaus Post	e72429c79c	Add sizes to traces (#19851 ) added to storage and grid traces. Can provide more context for traces that aren't HTTP. Others may apply.	2024-05-31 22:17:37 -07:00
Klaus Post	c5b3f5553f	Add per connection RPC metrics (#19852 ) Provides individual and aggregate stats for each RPC connection. Example: ``` "rpc": { "collectedAt": "2024-05-31T14:33:29.1373103+02:00", "connected": 30, "disconnected": 0, "outgoingStreams": 69, "incomingStreams": 0, "outgoingBytes": 174822796, "incomingBytes": 175821566, "outgoingMessages": 768595, "incomingMessages": 768589, "outQueue": 0, "lastPongTime": "2024-05-31T12:33:28Z", "byDestination": { "http://127.0.0.1:9001": { "collectedAt": "2024-05-31T14:33:29.1373103+02:00", "connected": 5, "disconnected": 0, "outgoingStreams": 2, "incomingStreams": 0, "outgoingBytes": 38432543, "incomingBytes": 66604052, "outgoingMessages": 229496, "incomingMessages": 229575, "outQueue": 0, "lastPongTime": "2024-05-31T12:33:27Z" }, "http://127.0.0.1:9002": { "collectedAt": "2024-05-31T14:33:29.1373103+02:00", "connected": 5, "disconnected": 0, "outgoingStreams": 6, "incomingStreams": 0, "outgoingBytes": 38215680, "incomingBytes": 66121283, "outgoingMessages": 228525, "incomingMessages": 228510, "outQueue": 0, "lastPongTime": "2024-05-31T12:33:27Z" }, ... ```	2024-05-31 22:16:24 -07:00
Aditya Manthramurthy	5f78691fcf	ldap: Add user DN attributes list config param (#19758 ) This change uses the updated ldap library in minio/pkg (bumped up to v3). A new config parameter is added for LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user. A test is added in sts_handlers.go that shows how to access the LDAP attributes as a claim. This is in preparation for adding SSH pubkey authentication to MinIO's SFTP integration.	2024-05-24 16:05:23 -07:00
Shubhendu	7c7650b7c3	Add sufficient deadlines and countermeasures to handle hung node scenario (#19688 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2024-05-22 16:07:14 -07:00
Harshavardhana	ae14681c3e	Revert "Fix two-way stream cancelation and pings (#19763 )" This reverts commit `4d698841f4`.	2024-05-22 03:00:00 -07:00
Klaus Post	4d698841f4	Fix two-way stream cancelation and pings (#19763 ) Do not log errors on oneway streams when sending ping fails. Instead, cancel the stream. This also makes sure pings are sent when blocked on sending responses.	2024-05-22 01:25:25 -07:00
Harshavardhana	0b3eb7f218	add more deadlines and pass around context under most situations (#19752 )	2024-05-15 15:19:00 -07:00
Klaus Post	6d3e0c7db6	Tweak one way stream ping (#19743 ) Do not log errors on oneway streams when sending ping fails. Instead cancel the stream. This also makes sure pings are sent when blocked on sending responses. I will do a separate PR that includes this and adds pings to two-way streams as well as tests for pings.	2024-05-15 08:39:21 -07:00
Anis Eleuch	67bd71b7a5	grid: Fix a window of a disconnected node not marked as offline (#19703 ) LastPong is saved as nanoseconds after a connection or reconnection but saved as seconds when receiving a pong message. The code deciding if a pong is too old can be skewed since it assumes LastPong is only in seconds.	2024-05-08 17:50:13 -07:00
jiuker	6bb10a81a6	avoid data race for testing (#19635 )	2024-04-30 08:03:35 -07:00
Harshavardhana	9693c382a8	make renameData() more defensive during overwrites (#19548 ) upon any error in renameData(), we now still preserve the existing dataDir in some form for recoverability in strange situations such as out-of-disk-space errors. Bonus: avoid running list and heal(); instead, allow versions disparity to return the actual versions and uuid to heal. Currently this is limited to disparate objects with 100 versions or fewer. An undo now reverts xl.meta from xl.meta.bkp during overwrites on such flaky setups. Bonus: Save N depth syscalls by skipping the parents upon overwrites and versioned updates. Examples of flaky setups are stretch clusters with regular packet drops; we need to add some defensive code to avoid dangling objects.	2024-04-23 10:15:52 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaining this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Klaus Post	912bbb2f1d	Always return slice with cap (#19395 ) Documentation promised this - so we should do it as well. Try to get a buffer and stash if it isn't big enough.	2024-04-02 08:56:18 -07:00
Klaus Post	b435806d91	Reduce big message RPC allocations (#19390 ) Use `ODirectPoolSmall` buffers for inline data in PutObject. Add a separate call for inline data that will fetch a buffer for the inline data before unmarshal.	2024-04-01 16:42:09 -07:00
Klaus Post	7ff4164d65	Fix races in IAM cache lazy loading (#19346 ) Fix races in IAM cache Fixes #19344 On the top level we only grab a read lock, but we write to the cache if we manage to fetch it. `a03dac41eb/cmd/iam-store.go (L446)` is also flipped to what it should be AFAICT. Change the internal cache structure to a concurrency safe implementation. Bonus: Also switch grid implementation.	2024-03-26 11:12:57 -07:00
Klaus Post	5c32058ff3	cosmetic: Move request goroutines to methods (#19241 ) Cosmetic change, but it breaks up a big code block and makes goroutine dumps of streams more readable, so it is clearer what each goroutine is doing.	2024-03-13 11:43:58 -07:00
Klaus Post	51f62a8da3	Port ListBuckets to websockets layer & some cleanup (#19199 )	2024-03-08 11:08:18 -08:00
Klaus Post	40fb3371fa	Mux: Send async mux ack and fix stream error responses (#19149 ) Streams can return errors if the cancelation is picked up before the response stream close is picked up. Under extreme load, this could lead to missing responses. Send server mux ack async so a blocked send cannot block newMuxStream call. Stream will not progress until mux has been acked.	2024-02-28 10:05:18 -08:00
Harshavardhana	51874a5776	fix: allow DNS disconnection events to happen in k8s (#19145 ) in k8s things really do come online very asynchronously, so we need to use an implementation that allows this randomness. To facilitate this, move WriteAll() into the websocket layer instead. Bonus: avoid instances of dnscache usage on k8s	2024-02-28 09:54:52 -08:00
Klaus Post	92180bc793	Add array recycling safety (#19103 ) Nil entries when recycling arrays.	2024-02-21 12:27:35 -08:00
Klaus Post	22aa16ab12	Fix grid reconnection deadlock (#19101 ) If network conditions have filled the output queue before a reconnect happens, blocked sends could stop reconnects from happening. In short, `respMu` would be held for a mux client while sending; if the queue is full, this will never get released and closing the mux client will hang. A) Use the mux client context instead of connection context for sends, so sends are unblocked when the mux client is canceled. B) Use a `TryLock` on "close" and cancel the request if we cannot get the lock at once. This will unblock any attempts to send.	2024-02-21 07:49:34 -08:00
Klaus Post	e06168596f	Convert more peer <--> peer REST calls (#19004 ) * Convert more peer <--> peer REST calls * Clean up in general. * Add JSON wrapper. * Add slice wrapper. * Add option to make handler return nil error if no connection is given, `IgnoreNilConn`. Converts the following: ``` + HandlerGetMetrics + HandlerGetResourceMetrics + HandlerGetMemInfo + HandlerGetProcInfo + HandlerGetOSInfo + HandlerGetPartitions + HandlerGetNetInfo + HandlerGetCPUs + HandlerServerInfo + HandlerGetSysConfig + HandlerGetSysServices + HandlerGetSysErrors + HandlerGetAllBucketStats + HandlerGetBucketStats + HandlerGetSRMetrics + HandlerGetPeerMetrics + HandlerGetMetacacheListing + HandlerUpdateMetacacheListing + HandlerGetPeerBucketMetrics + HandlerStorageInfo + HandlerGetLocks + HandlerBackgroundHealStatus + HandlerGetLastDayTierStats + HandlerSignalService + HandlerGetBandwidth ```	2024-02-19 14:54:46 -08:00
Harshavardhana	607cafadbc	converge clusterRead health into cluster health (#19063 )	2024-02-15 16:48:36 -08:00
Klaus Post	8e68ff9321	Add extra disconnect safety (#19022 ) Fix reported races that are actually synchronized by network calls. But this should add some extra safety for untimely disconnects. Race reported: ``` WARNING: DATA RACE Read at 0x00c00171c9c0 by goroutine 214: github.com/minio/minio/internal/grid.(muxClient).addResponse() e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:519 +0x111 github.com/minio/minio/internal/grid.(muxClient).error() e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:470 +0x21d github.com/minio/minio/internal/grid.(Connection).handleDisconnectClientMux() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1391 +0x15b github.com/minio/minio/internal/grid.(Connection).handleMsg() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:1190 +0x1ab github.com/minio/minio/internal/grid.(Connection).handleMessages.func1() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:981 +0x610 Previous write at 0x00c00171c9c0 by goroutine 1081: github.com/minio/minio/internal/grid.(muxClient).roundtrip() e:/gopath/src/github.com/minio/minio/internal/grid/muxclient.go:94 +0x324 github.com/minio/minio/internal/grid.(muxClient).traceRoundtrip() e:/gopath/src/github.com/minio/minio/internal/grid/trace.go:74 +0x10e4 github.com/minio/minio/internal/grid.(Subroute).Request() e:/gopath/src/github.com/minio/minio/internal/grid/connection.go:366 +0x230 github.com/minio/minio/internal/grid.(SingleHandler[go.shape.github.com/minio/minio/cmd.DiskInfoOptions,go.shape.github.com/minio/minio/cmd.DiskInfo]).Call() e:/gopath/src/github.com/minio/minio/internal/grid/handlers.go:554 +0x3fd github.com/minio/minio/cmd.(storageRESTClient).DiskInfo() e:/gopath/src/github.com/minio/minio/cmd/storage-rest-client.go:314 +0x270 github.com/minio/minio/cmd.erasureObjects.getOnlineDisksWithHealingAndInfo.func1() e:/gopath/src/github.com/minio/minio/cmd/erasure.go:293 +0x171 ``` This read will always happen after the 
write, since there is a network call in between. However a disconnect could come in while we are setting up the call, so we protect against that with extra checks.	2024-02-09 08:43:38 -08:00
Harshavardhana	035a3ea4ae	optimize startup sequence performance (#19009 ) - bucket metadata does not need to look for legacy things anymore if b.Created is non-zero - stagger bucket metadata loads across lots of nodes to avoid the current thundering herd problem. - Remove deadlines for RenameData, RenameFile - these calls should not ever be timed out and should wait until completion or wait for client timeout. Do not choose timeouts for applications during the WRITE phase. - increase R/W buffer size, increase maxMergeMessages to 30	2024-02-08 11:21:21 -08:00
Klaus Post	7ec43bd177	Fix blocked streams blocking reconnects (#19017 ) We have observed cases where a blocked stream will block for cancellations. This happens when response channel is blocked and we want to push an error. This will have the response mutex locked, which will prevent all other operations until upstream is unblocked. Make this behavior non-blocking and if blocked spawn a goroutine that will send the response and close the output. Still a lot of "dancing". Added a test for this and reviewed.	2024-02-08 10:15:27 -08:00
Klaus Post	9bcc46d93d	Fix second muxclient context leak (#18987 ) Subrouted requests were also leaking contexts in mux clients. Similar to #18956	2024-02-06 13:35:16 -08:00
Klaus Post	22687c1f50	Add websocket TCP write timeouts (#18988 ) Add 3 second write timeout to writes. This will make dead TCP connections terminate in a reasonable time. Fixes writes blocking for reconnection.	2024-02-06 13:34:46 -08:00
Harshavardhana	100c35c281	avoid excessive logs when peer is down (#18969 )	2024-02-04 23:25:42 -08:00
Harshavardhana	960d604013	disconnected returns, an unexpected error to List() returning 500s (#18959 ) provide the error string appropriately so that the matching of error types works. Also add a string based fallback for the said error.	2024-02-03 01:04:33 -08:00
Klaus Post	63bf5f42a1	Fix mux client memory leak (#18956 ) Add missing client cancellation, resulting in memory buildup tracing back to context.WithCancelCause/context.WithCancelDeadlineCause	2024-02-02 15:31:06 -08:00
Harshavardhana	ff80cfd83d	move Make,Delete,Head,Heal bucket calls to websockets (#18951 )	2024-02-02 14:54:54 -08:00
Klaus Post	ce0cb913bc	Fix ineffective recycling (#18952 ) Recycle would always be called on the dummy value `any(newRT())` instead of the actual value given to the recycle function. Caught by race tests, but mostly harmless, except for reduced perf. Other minor cleanups. Introduced in #18940 (unreleased)	2024-02-02 08:48:12 -08:00
Klaus Post	b192bc348c	Improve object reuse for grid messages (#18940 ) Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers. This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool. Add internal request reuse of internal types. Add for safe (pointerless) types explicitly. Only log params for internal types. Doing Sprint(obj) is just a bit too messy.	2024-02-01 12:41:20 -08:00
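
The merge-buffer change in e8a476ef5a (#20654) can be illustrated with a minimal Go sketch. This is not the code from `internal/grid`; the `merger` type, its `merge` method, and the `maxRetained` constant are hypothetical names, and only the 256KB limit comes from the commit message. The idea shown is to reuse the merge buffer between writes while its capacity stays at or below the limit, and to drop it only when a single merge grows it beyond that.

```go
package main

import "fmt"

// maxRetained is the largest buffer capacity kept between merges.
// 256KB is the figure from the commit message; the name is hypothetical.
const maxRetained = 256 << 10

// merger is a hypothetical stand-in for the component that merges
// several outgoing messages into one write.
type merger struct {
	buf []byte // retained scratch buffer, reused across calls
}

// merge concatenates msgs into the retained buffer and returns the result.
// The returned slice is only valid until the next call to merge.
func (m *merger) merge(msgs ...[]byte) []byte {
	out := m.buf[:0] // reuse retained capacity, if any
	for _, msg := range msgs {
		out = append(out, msg...)
	}
	if cap(out) <= maxRetained {
		m.buf = out[:0] // keep the (possibly grown) buffer for next time
	} else {
		m.buf = nil // too large to keep around; let the GC reclaim it
	}
	return out
}

func main() {
	m := &merger{}
	msg := make([]byte, 4096)

	first := m.merge(msg, msg)
	capAfterFirst := cap(m.buf)
	second := m.merge(msg, msg)

	// The second merge fits in the retained buffer, so no reallocation occurs.
	fmt.Println(len(first), len(second), cap(m.buf) == capAfterFirst)
}
```

With 4096-byte messages, as in the benchmark quoted in the commit, the second call reuses the capacity allocated by the first, which is the effect the reduced bytes-per-op figures point to.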

71 Commits