minio

mirror of https://github.com/minio/minio.git synced 2025-12-06 07:42:32 -05:00

Author	SHA1	Message	Date
Harshavardhana	f8696cc8f6	fallback to globalLocalDrives for non-distributed setups	2024-02-28 14:56:08 -08:00
Harshavardhana	51874a5776	fix: allow DNS disconnection events to happen in k8s (#19145 ) in k8s things really do come online very asynchronously, we need to use implementation that allows this randomness. To facilitate this move WriteAll() as part of the websocket layer instead. Bonus: avoid instances of dnscache usage on k8s	2024-02-28 09:54:52 -08:00
Harshavardhana	9a012a53ef	initialize the disk healer early on (#19143 ) This PR fixes a bug that perhaps has been long introduced, with no visible workarounds. In any deployment, if an entire erasure set is deleted, there is no way the cluster recovers.	2024-02-27 23:02:14 -08:00
Klaus Post	e06168596f	Convert more peer <--> peer REST calls (#19004 ) * Convert more peer <--> peer REST calls * Clean up in general. * Add JSON wrapper. * Add slice wrapper. * Add option to make handler return nil error if no connection is given, `IgnoreNilConn`. Converts the following: ``` + HandlerGetMetrics + HandlerGetResourceMetrics + HandlerGetMemInfo + HandlerGetProcInfo + HandlerGetOSInfo + HandlerGetPartitions + HandlerGetNetInfo + HandlerGetCPUs + HandlerServerInfo + HandlerGetSysConfig + HandlerGetSysServices + HandlerGetSysErrors + HandlerGetAllBucketStats + HandlerGetBucketStats + HandlerGetSRMetrics + HandlerGetPeerMetrics + HandlerGetMetacacheListing + HandlerUpdateMetacacheListing + HandlerGetPeerBucketMetrics + HandlerStorageInfo + HandlerGetLocks + HandlerBackgroundHealStatus + HandlerGetLastDayTierStats + HandlerSignalService + HandlerGetBandwidth ```	2024-02-19 14:54:46 -08:00
Klaus Post	b192bc348c	Improve object reuse for grid messages (#18940 ) Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers. This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool. Add internal request reuse of internal types. Add for safe (pointerless) types explicitly. Only log params for internal types. Doing Sprint(obj) is just a bit too messy.	2024-02-01 12:41:20 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Each Put, List, Multipart operations heavily rely on making GetBucketInfo() call to verify if bucket exists or not on a regular basis. This has a large performance cost when there are tons of servers involved. We did optimize this part by vectorizing the bucket calls, however its not enough, beyond 100 nodes and this becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), if not safely closes it.	2024-01-28 10:04:17 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll that is primarily is used for reading 'format.json' to to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	52229a21cb	avoid reload of 'format.json' over the network under normal conditions (#18842 )	2024-01-23 14:11:46 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Harshavardhana	21d60eab7c	remove all older unused APIs (#18769 )	2024-01-17 20:41:23 -08:00
Anis Eleuch	3f4488c589	scanner: Allow full throttle if there is no parallel disk ops (#18109 )	2024-01-02 13:51:24 -08:00
Harshavardhana	a50ea92c64	feat: introduce list_quorum="auto" to prefer quorum drives (#18084 ) NOTE: This feature is not retro-active; it will not cater to previous transactions on existing setups. To enable this feature, please set ` _MINIO_DRIVE_QUORUM=on` environment variable as part of systemd service or k8s configmap. Once this has been enabled, you need to also set `list_quorum`. ``` ~ mc admin config set alias/ api list_quorum=auto` ``` A new debugging tool is available to check for any missing counters.	2023-12-29 15:52:41 -08:00
Klaus Post	6c89a81af4	Fix CreateFile shared buffer corruption. (#18652 ) `(xlStorageDiskIDCheck).CreateFile` wraps the incoming reader in `xioutil.NewDeadlineReader`. The wrapped reader is handed to `(xlStorage).CreateFile`. This performs a Read call via `writeAllDirect`, which reads into an `ODirectPool` buffer. `(*DeadlineReader).Read` spawns an async read into the buffer. If a timeout is hit while reading, the read operation returns to `writeAllDirect`. The operation returns an error and the buffer is reused. However, if the async `Read` call unblocks, it will write to the now recycled buffer. Fix: Remove the `DeadlineReader` - it is inherently unsafe. Instead, rely on the network timeouts. This is not a disk timeout, anyway. Regression in https://github.com/minio/minio/pull/17745	2023-12-14 10:51:57 -08:00
Harshavardhana	b3314e97a6	re-use the same local drive used by remote-peer (#18645 ) historically, we have always kept storage-rest-server and a local storage API separate without much trouble, since they both can independently operate due to no special state() between them. however, over some time, we have added state() such as - drive monitoring threads now there will be "2" of them per drive instead of just 1. - concurrent tokens available per drive are now twice instead of just single shared, allowing unexpectedly high amount of I/O to go through. - applying serialization by using walkMutexes can now be adequately honored for both remote callers and local callers.	2023-12-13 19:27:55 -08:00
Harshavardhana	05bb655efc	avoid caching metrics for timeout errors per drive (#18584 ) Bonus: combine the loop for drive/REST registration.	2023-12-04 11:54:13 -08:00
Klaus Post	69294cf98a	Disable DMA optimization on windows (#18575 ) It appears that Windows can lock up when errors occur. Use regular copy here.	2023-12-01 16:13:19 -08:00
Klaus Post	961b0b524e	Do not require restart when a disk is unreachable during node boot (#18576 ) A disk that is not able to initialize when an instance is started will never have a handler registered, which means a user will need to restart the node after fixing the disk; This will also prevent showing the wrong 'upgrade is needed.' error message in that case. When the disk is still failing, print an error every 30 minutes; Disk reconnection will be retried every 30 seconds. Co-authored-by: Anis Elleuch <anis@min.io>	2023-12-01 12:01:14 -08:00
Klaus Post	5f971fea6e	Fix Mux Connect Error (#18567 ) `OpMuxConnectError` was not handled correctly. Remove local checks for single request handlers so they can run before being registered locally. Bonus: Only log IAM bootstrap on startup.	2023-12-01 00:18:04 -08:00
Klaus Post	ca488cce87	Add detailed parameter tracing + custom prefix (#18518 ) * Allow per handler custom prefix. * Add automatic parameter extraction	2023-11-26 01:32:59 -08:00
Harshavardhana	a4cfb5e1ed	return errors if dataDir is missing during HeadObject() (#18477 ) Bonus: allow replication to attempt Deletes/Puts when the remote returns quorum errors of some kind, this is to ensure that MinIO can rewrite the namespace with the latest version that exists on the source.	2023-11-20 21:33:47 -08:00
Klaus Post	51aa59a737	perf: websocket grid connectivity for all internode communication (#18461 ) This PR adds a WebSocket grid feature that allows servers to communicate via a single two-way connection. There are two request types: * Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small roundtrips with small payloads. * Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`, which allows for different combinations of full two-way streams with an initial payload. Only a single stream is created between two machines - and there is, as such, no server/client relation since both sides can initiate and handle requests. Which server initiates the request is decided deterministically on the server names. Requests are made through a mux client and server, which handles message passing, congestion, cancelation, timeouts, etc. If a connection is lost, all requests are canceled, and the calling server will try to reconnect. Registered handlers can operate directly on byte slices or use a higher-level generics abstraction. There is no versioning of handlers/clients, and incompatible changes should be handled by adding new handlers. The request path can be changed to a new one for any protocol changes. First, all servers create a "Manager." The manager must know its address as well as all remote addresses. This will manage all connections. To get a connection to any remote, ask the manager to provide it given the remote address using. ``` func (m Manager) Connection(host string) Connection ``` All serverside handlers must also be registered on the manager. This will make sure that all incoming requests are served. The number of in-flight requests and responses must also be given for streaming requests. The "Connection" returned manages the mux-clients. Requests issued to the connection will be sent to the remote. * `func (c Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)` performs a single request and returns the result. Any deadline provided on the request is forwarded to the server, and canceling the context will make the function return at once. `func (c Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st Stream, err error)` will initiate a remote call and send the initial payload. ```Go // A Stream is a two-way stream. // All responses must be read by the caller. // If the call is canceled through the context, //The appropriate error will be returned. type Stream struct { // Responses from the remote server. // Channel will be closed after an error or when the remote closes. // All responses must be read by the caller until either an error is returned or the channel is closed. // Canceling the context will cause the context cancellation error to be returned. Responses <-chan Response // Requests sent to the server. // If the handler is defined with 0 incoming capacity this will be nil. // Channel must be closed to signal the end of the stream. // If the request context is canceled, the stream will no longer process requests. Requests chan<- []byte } type Response struct { Msg []byte Err error } ``` There are generic versions of the server/client handlers that allow the use of type safe implementations for data types that support msgpack marshal/unmarshal.	2023-11-20 17:09:35 -08:00
Harshavardhana	91d8bddbd1	use sendfile/splice implementation to perform DMA (#18411 ) sendfile implementation to perform DMA on all platforms Go stdlib already supports sendfile/splice implementations for - Linux - Windows - *BSD - Solaris Along with this change however O_DIRECT for reads() must be removed as well since we need to use sendfile() implementation The main reason to add O_DIRECT for reads was to reduce the chances of page-cache causing OOMs for MinIO, however it would seem that avoiding buffer copies from user-space to kernel space this issue is not a problem anymore. There is no Go based memory allocation required, and neither the page-cache is referenced back to MinIO. This page- cache reference is fully owned by kernel at this point, this essentially should solve the problem of page-cache build up. With this now we also support SG - when NIC supports Scatter/Gather https://en.wikipedia.org/wiki/Gather/scatter_(vector_addressing)	2023-11-10 10:10:14 -08:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Harshavardhana	124e28578c	remove strict persistence requirements for List() .metacache objects (#17917 ) .metacache objects are transient in nature, and are better left to use page-cache effectively to avoid using more IOPs on the disks. this allows for incoming calls to be not taxed heavily due to multiple large batch listings.	2023-08-25 07:58:11 -07:00
Klaus Post	7c8746732b	Return cancelled storage calls as 499 (#17895 ) Make upstream cancels more visible - right now they are just reported as "forbidden".	2023-08-22 11:10:41 -07:00
Harshavardhana	81be718674	fix: optimize DiskInfo() call avoid metrics when not needed (#17763 )	2023-07-31 15:20:48 -07:00
Anis Eleuch	df29d25e6b	return different status code for internode communication (#17655 ) mc admin trace -a will be able to quickly show 401 Unauthorized header to pinpoint trivial issues between nodes, such as wrong root credentials and skewed time.	2023-07-14 18:34:55 -07:00
Klaus Post	4f89e5bba9	Add active disk health checks (#17539 ) Add check every 2 minutes to see if a write+read operation can complete. If disk is unresponsive for 2 minutes or returns errFaultyDisk, take it offline.	2023-07-13 11:41:55 -07:00
Anis Eleuch	6d0bc5ab1e	prometheus: Fix internode stats (#17594 ) Internode calculation was done inside S3 handlers, fix it by moving it to internode handlers. Remove admin stats since it is not used.	2023-07-08 07:35:11 -07:00
Klaus Post	ff5988f4e0	Reduce allocations (#17584 ) * Reduce allocations * Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating. * Reuse all msgp.Readers * Reuse metadata buffers when not reading data. * Make type safe. Make buffer 4K instead of 8. * Unslice	2023-07-06 16:02:08 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
jiuker	e3a070e3de	put *msgp.Reader back to pool (#17156 )	2023-05-08 07:51:39 -07:00
Harshavardhana	31b0decd46	migrate to minio/mux from gorilla/mux (#16456 )	2023-01-23 16:42:47 +05:30
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Klaus Post	cc1d8f0057	Check for abandoned data when healing (#16122 )	2022-11-28 10:20:55 -08:00
Klaus Post	ecc932d5dd	Clean entire tmp-old on restart (#15979 )	2022-10-31 07:27:50 -07:00
Klaus Post	bd3dfad8b9	Add concurrent Snowball extraction + options (#15836 )	2022-10-18 13:50:21 -07:00
Klaus Post	ff12080ff5	Remove deprecated io/ioutil (#15707 )	2022-09-19 11:05:16 -07:00
Harshavardhana	2d9b5a65f1	verify RenameData() versions to be consistent (#15649 ) xl.meta gets written and never rolled back, however we definitely need to validate the state that is persisted on the disk, if there are inconsistencies - more than write quorum we should return an error to the client - if write quorum was achieved however there are inconsistent xl.meta's we should simply trigger an MRF on them	2022-09-05 16:51:37 -07:00
Abirdcfly	d4e0f13bb3	chore: remove duplicate word in comments (#15607 ) Signed-off-by: Abirdcfly <fp544037857@gmail.com> Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2022-08-30 08:26:43 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00
Klaus Post	f939d1c183	Independent Multipart Uploads (#15346 ) Do completely independent multipart uploads. In distributed mode, a lock was held to merge each multipart upload as it was added. This lock was highly contested and retries are expensive (timewise) in distributed mode. Instead, each part adds its metadata information uniquely. This eliminates the per object lock required for each to merge. The metadata is read back and merged by "CompleteMultipartUpload" without locks when constructing final object. Co-authored-by: Harshavardhana <harsha@minio.io>	2022-07-19 08:35:29 -07:00
Praveen raj Mani	b49fc33cb3	purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148 )	2022-07-11 09:15:54 -07:00
Anis Elleuch	d4e565e595	Add defensive check for one stream message size (#15029 ) In a streaming response, the client knows the size of a streamed message but never checks the message size. Add the check to error out if the response message is truncated.	2022-06-02 09:16:26 -07:00
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) Main motivation is move towards a common backend format for all different types of modes in MinIO, allowing for a simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication, transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Harshavardhana	507f993075	attempt to real resolve when there is a quorum failure on reads (#14613 )	2022-04-20 12:49:05 -07:00
Anis Elleuch	16431d222c	heal: Enable periodic bitrot scan configuration (#14464 )	2022-04-07 08:10:40 -07:00
Klaus Post	b890bbfa63	Add local disk health checks (#14447 ) The main goal of this PR is to solve the situation where disks stop responding to operations. This generally causes an FD build-up and eventually will crash the server. This adds detection of hung disks, where calls on disk get stuck. We add functionality to `xlStorageDiskIDCheck` where it keeps track of the number of concurrent requests on a given disk. A total number of 100 operations are allowed. If this limit is reached we will block (but not reject) new requests, but we will monitor the state of the disk. If no requests have been completed or updated within a 15-second window, we mark the disk as offline. Requests that are blocked will be unblocked and return an error as "faulty disk". New requests will be rejected until the disk is marked OK again. Once a disk has been marked faulty, a check will run every 5 seconds that will attempt to write and read back a file. As long as this fails the disk will remain faulty. To prevent lots of long-running requests to mark the disk faulty we implement a callback feature that allows updating the status as parts of these operations are running. We add a reader and writer wrapper that will update the status of each successful read/write operation. This should allow fine enough granularity that a slow, but still operational disk will not reach 15 seconds where 50 operations have not progressed. Note that errors themselves are not enough to mark a disk faulty. A nil (or io.EOF) error will mark a disk as "good". * Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`. * de-couple IsOnline() from disk health tracker The purpose of IsOnline() is to ensure that we reconnect the drive only when the "drive" was - disconnected from network we need to validate if the drive is "correct" and is the same drive which belongs to this server. - drive was replaced we have to format it - we support hot swapping of the drives. IsOnline() is not meant for taking the drive offline when it is hung, it is not useful we can let the drive be online instead "return" errors for relevant calls. * return errFaultyDisk for DiskInfo() call Co-authored-by: Harshavardhana <harsha@minio.io> Possible future Improvements: * Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly. * Allow reads/writes to be aborted by the context. * Add usage stats, concurrent count, blocked operations, etc.	2022-03-09 11:38:54 -08:00
Anis Elleuch	4a15bd8ff8	Return info for DiskInfo when the disk is unformatted (#14427 ) In a distributed setup, a DiskInfo REST call to an unformatted disk returns an error with no disk information, such as the disk endpoint URL, which is unexpected.	2022-03-01 15:06:47 -08:00

1 2 3 4

182 Commits