minio

mirror of https://github.com/minio/minio.git synced 2025-11-27 12:53:45 -05:00

Author	SHA1	Message	Date
Shubhendu	7c7650b7c3	Add sufficient deadlines and countermeasures to handle hung node scenario (#19688 ) Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2024-05-22 16:07:14 -07:00
Harshavardhana	ca80eced24	usage of deadline conn at Accept() breaks websocket (#19789 ) fortunately not wired up to use, however if anyone enables deadlines for conn then sporadically MinIO startups fail.	2024-05-22 10:49:27 -07:00
Klaus Post	d4b391de1b	Add PutObject Ring Buffer (#19605 ) Replace the `io.Pipe` from streamingBitrotWriter -> CreateFile with a fixed size ring buffer. This will add an output buffer for encoded shards to be written to disk - potentially via RPC. This will remove blocking when `(*streamingBitrotWriter).Write` is called, and it writes hashes and data. With current settings, the write looks like this: ``` Outbound ┌───────────────────┐ ┌────────────────┐ ┌───────────────┐ ┌────────────────┐ │ │ Parr. │ │ (http body) │ │ │ │ │ Bitrot Hash │ Write │ Pipe │ Read │ HTTP buffer │ Write (syscall) │ TCP Buffer │ │ Erasure Shard │ ──────────► │ (unbuffered) │ ────────────► │ (64K Max) │ ───────────────────► │ (4MB) │ │ │ │ │ │ (io.Copy) │ │ │ └───────────────────┘ └────────────────┘ └───────────────┘ └────────────────┘ ``` We write a Hash (32 bytes). Since the pipe is unbuffered, it will block until the 32 bytes have been delivered to the TCP buffer, and the next Read hits the Pipe. Then we write the shard data. This will typically be bigger than 64KB, so it will block until two blocks have been read from the pipe. When we insert a ring buffer: ``` Outbound ┌───────────────────┐ ┌────────────────┐ ┌───────────────┐ ┌────────────────┐ │ │ │ │ (http body) │ │ │ │ │ Bitrot Hash │ Write │ Ring Buffer │ Read │ HTTP buffer │ Write (syscall) │ TCP Buffer │ │ Erasure Shard │ ──────────► │ (2MB) │ ────────────► │ (64K Max) │ ───────────────────► │ (4MB) │ │ │ │ │ │ (io.Copy) │ │ │ └───────────────────┘ └────────────────┘ └───────────────┘ └────────────────┘ ``` The hash+shard will fit within the ring buffer, so writes will not block - but will complete after a memcopy. Reads can fill the 64KB buffer if there is data for it. If the network is congested, the ring buffer will become filled, and all syscalls will be on full buffers. Only when the ring buffer is filled will erasure coding start blocking. Since there is always "space" to write output data, we remove the parallel writing since we are always writing to memory now, and the goroutine synchronization overhead probably not worth taking. If the output were blocked in the existing, we would still wait for it to unblock in parallel write, so it would make no difference there - except now the ring buffer smoothes out the load. There are some micro-optimizations we could look at later. The biggest is that, in most cases, we could encode directly to the ring buffer - if we are not at a boundary. Also, "force filling" the Read requests (i.e., blocking until a full read can be completed) could be investigated and maybe allow concurrent memory on read and write.	2024-05-14 17:11:04 -07:00
Harshavardhana	f3a52cc195	simplify listener implementation setup customizations in right place (#19589 )	2024-04-23 21:08:47 -07:00
Shubhendu	468a9fae83	Enable replication of SSE-C objects (#19107 ) If site replication enabled across sites, replicate the SSE-C objects as well. These objects could be read from target sites using the same client encryption keys. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2024-03-28 10:44:56 -07:00
Harshavardhana	364d3a0ac9	fix: new staticheck and linter issues reported (#19340 )	2024-03-27 08:10:40 -07:00
Harshavardhana	2c2f5d871c	debug: introduce support for configuring client connect WRITE deadline (#19170 ) just like client-conn-read-deadline, added a new flag that does client-conn-write-deadline as well. Both are not configured by default, since we do not yet know what is the right value. Allow this to be configurable if needed.	2024-03-01 08:00:42 -08:00
Harshavardhana	607cafadbc	converge clusterRead health into cluster health (#19063 )	2024-02-15 16:48:36 -08:00
Harshavardhana	997ba3a574	introduce reader deadlines for net.Conn (#19023 ) Bonus: set "retry-after" header for AWS SDKs if possible to honor them.	2024-02-09 13:25:16 -08:00
Harshavardhana	2ddf2ca934	allow configuring maximum idle connections per host (#18908 )	2024-01-29 16:50:37 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Poorna	b2b26d9c95	support proxying of tagging requests in replication (#18649 ) support proxying of tagging requests in active-active replication Note: even if proxying is successful, PutObjectTagging/DeleteObjectTagging will continue to report a 404 since the object is not present locally.	2024-01-12 23:51:33 -08:00
Sveinn	9b8ba97f9f	feat: add support for GetObjectAttributes API (#18732 )	2024-01-05 10:43:06 -08:00
Klaus Post	69294cf98a	Disable DMA optimization on windows (#18575 ) It appears that Windows can lock up when errors occur. Use regular copy here.	2023-12-01 16:13:19 -08:00
Klaus Post	51aa59a737	perf: websocket grid connectivity for all internode communication (#18461 ) This PR adds a WebSocket grid feature that allows servers to communicate via a single two-way connection. There are two request types: * Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small roundtrips with small payloads. * Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`, which allows for different combinations of full two-way streams with an initial payload. Only a single stream is created between two machines - and there is, as such, no server/client relation since both sides can initiate and handle requests. Which server initiates the request is decided deterministically on the server names. Requests are made through a mux client and server, which handles message passing, congestion, cancelation, timeouts, etc. If a connection is lost, all requests are canceled, and the calling server will try to reconnect. Registered handlers can operate directly on byte slices or use a higher-level generics abstraction. There is no versioning of handlers/clients, and incompatible changes should be handled by adding new handlers. The request path can be changed to a new one for any protocol changes. First, all servers create a "Manager." The manager must know its address as well as all remote addresses. This will manage all connections. To get a connection to any remote, ask the manager to provide it given the remote address using. ``` func (m Manager) Connection(host string) Connection ``` All serverside handlers must also be registered on the manager. This will make sure that all incoming requests are served. The number of in-flight requests and responses must also be given for streaming requests. The "Connection" returned manages the mux-clients. Requests issued to the connection will be sent to the remote. * `func (c Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)` performs a single request and returns the result. Any deadline provided on the request is forwarded to the server, and canceling the context will make the function return at once. `func (c Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st Stream, err error)` will initiate a remote call and send the initial payload. ```Go // A Stream is a two-way stream. // All responses must be read by the caller. // If the call is canceled through the context, //The appropriate error will be returned. type Stream struct { // Responses from the remote server. // Channel will be closed after an error or when the remote closes. // All responses must be read by the caller until either an error is returned or the channel is closed. // Canceling the context will cause the context cancellation error to be returned. Responses <-chan Response // Requests sent to the server. // If the handler is defined with 0 incoming capacity this will be nil. // Channel must be closed to signal the end of the stream. // If the request context is canceled, the stream will no longer process requests. Requests chan<- []byte } type Response struct { Msg []byte Err error } ``` There are generic versions of the server/client handlers that allow the use of type safe implementations for data types that support msgpack marshal/unmarshal.	2023-11-20 17:09:35 -08:00
Harshavardhana	91d8bddbd1	use sendfile/splice implementation to perform DMA (#18411 ) sendfile implementation to perform DMA on all platforms Go stdlib already supports sendfile/splice implementations for - Linux - Windows - *BSD - Solaris Along with this change however O_DIRECT for reads() must be removed as well since we need to use sendfile() implementation The main reason to add O_DIRECT for reads was to reduce the chances of page-cache causing OOMs for MinIO, however it would seem that avoiding buffer copies from user-space to kernel space this issue is not a problem anymore. There is no Go based memory allocation required, and neither the page-cache is referenced back to MinIO. This page- cache reference is fully owned by kernel at this point, this essentially should solve the problem of page-cache build up. With this now we also support SG - when NIC supports Scatter/Gather https://en.wikipedia.org/wiki/Gather/scatter_(vector_addressing)	2023-11-10 10:10:14 -08:00
Harshavardhana	754f7a8a39	replace io.Discard usage to fix some NUMA copy() latencies (#18394 ) replace io.Discard usage to fix NUMA copy() latencies On NUMA systems copying from 8K buffer allocated via io.Discard leads to large latency build-up for every ``` copy(new8kbuf, largebuf) ``` can in-cur upto 1ms worth of latencies on NUMA systems due to memory sharding across NUMA nodes.	2023-11-06 14:26:08 -08:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Harshavardhana	114fab4c70	export cluster health as prometheus metrics (#17741 )	2023-07-28 01:16:53 -07:00
Harshavardhana	2d1cda2061	fix: do not os.Exit(1) while writing goroutines during shutdown (#17640 ) Also shutdown poll add jitter, to verify if the shutdown sequence can finish before 500ms, this reduces the overall time taken during "restart" of the service. Provides speedup for `mc admin service restart` during active I/O, also ensures that systemd doesn't treat the returned 'error' as a failure, certain configurations in systemd can cause it to 'auto-restart' the process by-itself which can interfere with `mc admin service restart`. It can be observed how now restarting the service is much snappier.	2023-07-12 07:18:30 -07:00
Poorna	fb49aead9b	replication: add validation API (#17520 ) To check if replication is set up properly on a bucket.	2023-07-10 20:09:20 -07:00
Harshavardhana	28a01f0320	update missing license header in files (#17603 )	2023-07-08 10:42:05 -07:00
Harshavardhana	e37c4efc6e	fix: upon DNS refresh() failure use previous values (#17561 ) DNS refresh() in-case of MinIO can safely re-use the previous values on bare-metal setups, since bare-metal arrangements do not change DNS in any manner commonly. This PR simplifies that, we only ever need DNS caching on bare-metal setups. - On containerized setups do not enable DNS caching at all, as it may have adverse effects on the overall effectiveness of k8s DNS systems. k8s DNS systems are dynamic and expect applications to avoid managing DNS caching themselves, instead provide a cleaner container native caching implementations that must be used. - update IsDocker() detection, including podman runtime - move to minio/dnscache fork for a simpler package	2023-07-03 12:30:51 -07:00
Harshavardhana	7f782983ca	fix: for FTP server driver allow implicit trust of TLS (#17541 ) fixes #17535	2023-06-30 08:04:13 -07:00
Anis Eleuch	0f0dcf0c5e	tar: Avoid storing snowball extraction header in extract objects (#17389 )	2023-06-12 09:42:06 -07:00
Anis Eleuch	bb24346e04	listen: Only error out if not able to bind any interface (#17353 )	2023-06-12 09:09:28 -07:00
Klaus Post	6e38d0f3ab	Add more bootstrap info in debug mode (#17362 )	2023-06-08 08:39:47 -07:00
Anis Eleuch	eba378e4a1	vrf: Fix testing for loopback coming from the address (#17372 )	2023-06-07 09:53:05 -07:00
Krishnan Parthasarathi	62df731006	Add updatedAt for GetBucketLifecycleConfig (#17271 )	2023-05-24 22:52:39 -07:00
Harshavardhana	5569acd95c	disallow EC:0 if not set during server startup (#17141 )	2023-05-04 14:44:30 -07:00
Harshavardhana	9571b0825e	add configurable VRF interface and user-timeout (#17108 )	2023-05-03 14:12:25 -07:00
Anis Eleuch	31b5acc245	tcp: Increase user timeout to 10 minutes (#17087 )	2023-04-26 17:48:31 -07:00
Harshavardhana	a5835cecbf	fix: regression in counting total requests (#17024 )	2023-04-12 14:37:19 -07:00
Anis Eleuch	c259a8ea38	Set tcp user timeout to clean sockets with data in the buffer (#16887 )	2023-03-24 08:10:58 -07:00
Harshavardhana	901887e6bf	feat: add lambda transformation functions target (#16507 )	2023-03-07 08:12:41 -08:00
Harshavardhana	5c98223c89	add correct HostId instead of deploymentId for error responses (#16686 )	2023-02-22 15:41:09 +05:30
Harshavardhana	65c104a589	add x-amz-id-2 to indicate the node that received the request (#16474 )	2023-01-25 09:14:10 -08:00
Aditya Manthramurthy	698862ec5d	Fix transports/timeouts related regressions (#16427 )	2023-01-18 10:06:38 +05:30
Allan Roger Reid	9815dac48f	fix: allow bind on ipv6 loopback failures (#16388 )	2023-01-11 08:47:39 +05:30
Anis Elleuch	1c85652cff	lint: Fix in darwin environment (#16368 )	2023-01-05 10:12:01 -08:00
Aditya Manthramurthy	2d60bf8c50	Refactor HTTP transports (#16222 )	2022-12-12 20:31:21 -08:00
Harshavardhana	419f351df3	avoid logging gzipped body in trace output (#16172 )	2022-12-05 13:21:27 -08:00
Anis Elleuch	1f1dcdce65	move HTTP recorder to an internal library (#16128 )	2022-11-28 10:20:27 -08:00
Shireesh Anjal	5246e3be84	Send health diagnostics data as part of callhome (#16006 )	2022-11-15 13:53:05 -08:00
Harshavardhana	944c62daf4	skip flaky tests on windows OS (#16015 )	2022-11-07 00:11:21 -08:00
Poorna	e4e90b53c1	fix: delete-marker replication check properly (#15923 )	2022-10-21 14:45:06 -07:00
Klaus Post	bd3dfad8b9	Add concurrent Snowball extraction + options (#15836 )	2022-10-18 13:50:21 -07:00
Poorna	0e3c92c027	attempt delete marker replication after object is replicated (#15857 ) Ensure delete marker replication success, especially since the recent optimizations to heal on HEAD, LIST and GET can force replication attempts on delete marker before underlying object version could have synced.	2022-10-13 17:45:23 -07:00
Klaus Post	ff12080ff5	Remove deprecated io/ioutil (#15707 )	2022-09-19 11:05:16 -07:00
Klaus Post	a9f1ad7924	Add extended checksum support (#15433 )	2022-08-29 16:57:16 -07:00

1 2

74 Commits