minio

Commit Graph

Author	SHA1	Message	Date
Anis Eleuch	7ebceacac6	heal: Fix deep scan failing to heal objects (#117 ) The verify file handler response format was changed from gob to msgp since two months but we forgot updating the verify handler client. VerifyFile is only called during a heal deep scan (bitrot check). HealObject() will fail in that case and will mark all disks corrupted and will return early (as unrecoverable object but it will also not be removed) It is a bit rare for HealObject to be called with a deep scan flag. It is called when a HealObject with a normal scan (e.g. new drive healing) detects a bitrot corruption, therefore healing objects with a detected bitrot corruption will fail.	2024-10-13 06:07:21 -07:00
Harshavardhana	2e0fd2cba9	implement a safer completeMultipart implementation (#20227 ) - optimize writing part.N.meta by writing both part.N and its meta in sequence without network component. - remove part.N.meta, part.N which were partially successful, in quorum loss situations during renamePart() - allow for strict read quorum check arbitrated via ETag for the given part number, this makes it double safer upon final commit. - return an appropriate error when read quorum is missing, instead of returning InvalidPart{}, which is non-retryable error. This kind of situation can happen when many nodes are going offline in rotation, an example of such a restart() behavior is statefulset updates in k8s. fixes #20091	2024-08-12 01:38:15 -07:00
Harshavardhana	80ff907d08	add DeleteBulk support, add sufficient deadlines per rename() (#20185 ) deadlines per moveToTrash() allows for a more granular timeout approach for syscalls, instead of an aggregate timeout. This PR also enhances multipart state cleanup to be optimal by removing 100's of multipart network rename() calls into single network call.	2024-07-29 18:56:40 -07:00
Harshavardhana	3ae104edae	change Read* calls over net/http to move to http.MethodGet (#20173 ) - ReadVersion - ReadFile - ReadXL Further changes include to - Compact internode resource RPC paths - Compact internode query params To optimize on parsing by gorilla/mux as the length of this string increases latency in gorilla/mux - reduce to a meaningful string.	2024-07-29 01:00:12 -07:00
Harshavardhana	064f36ca5a	move to GET for internal stream READs instead of POST (#20160 ) the main reason is to let Go net/http perform necessary bookkeeping properly, and in essence, from a consistency point of view, it's GETs all the way. Deprecate sendFile() as it's buggy inside Go runtime.	2024-07-26 05:55:01 -07:00
Harshavardhana	91805bcab6	add optimizations to bring performance on unversioned READS (#20128 ) allow non-inlined on disk to be inlined via an unversioned ReadVersion() call, we only need ReadXL() to resolve objects with multiple versions only. The choice of this block makes it to be dynamic and chosen by the user via `mc admin config set` Other bonus things - Start measuring internode TTFB performance. - Set TCP_NODELAY, TCP_CORK for low latency	2024-07-23 03:53:03 -07:00
Harshavardhana	7bd1d899bc	remove overzealous check during HEAD() (#19940 ) due to a historic bug in CopyObject() where an inlined object loses its metadata, the check causes an incorrect fallback verifying data-dir. CopyObject() bug was fixed in `ffa91f9794` however the occurrence of this problem is historic, so the aforementioned check is stretching too much. Bonus: simplify fileInfoRaw() to read xl.json as well, also recreate buckets properly.	2024-06-17 07:29:18 -07:00
Anis Eleuch	789cbc6fb2	heal: Dangling check to evaluate object parts separately (#19797 )	2024-06-10 08:51:27 -07:00
Aditya Manthramurthy	5f78691fcf	ldap: Add user DN attributes list config param (#19758 ) This change uses the updated ldap library in minio/pkg (bumped up to v3). A new config parameter is added for LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user. A test is added in sts_handlers.go that shows how to access the LDAP attributes as a claim. This is in preparation for adding SSH pubkey authentication to MinIO's SFTP integration.	2024-05-24 16:05:23 -07:00
Harshavardhana	0b3eb7f218	add more deadlines and pass around context under most situations (#19752 )	2024-05-15 15:19:00 -07:00
Harshavardhana	d3db7d31a3	fix: add deadlines for all synchronous REST callers (#19741 ) add deadlines that can be dynamically changed via the drive max timeout values. Bonus: optimize "file not found" case and hung drives/network - circuit break the check and return right away instead of waiting.	2024-05-15 09:52:29 -07:00
Harshavardhana	9a267f9270	allow caller context during reloads() to cancel (#19687 ) canceled callers might linger around longer, can potentially overwhelm the system. Instead provide a caller context so that canceled callers don't hold on to them. Bonus: we have no reason to cache errors, we should never cache errors otherwise we can potentially have quorum errors creeping in unexpectedly. We should let the cache when invalidating hit the actual resources instead.	2024-05-08 17:51:34 -07:00
Harshavardhana	a372c6a377	a bunch of fixes for error handling (#19627 ) - handle errFileCorrupt properly - micro-optimization of sending done() response quicker to close the goroutine. - fix logger.Event() usage in a couple of places - handle the rest of the client to return a different error other than lastErr() when the client is closed.	2024-04-28 10:53:50 -07:00
Harshavardhana	9693c382a8	make renameData() more defensive during overwrites (#19548 ) instead upon any error in renameData(), we still preserve the existing dataDir in some form for recoverability in strange situations such as out of disk space type errors. Bonus: avoid running list and heal() instead allow versions disparity to return the actual versions, uuid to heal. Currently limit this to 100 versions and lesser disparate objects. an undo now reverts back the xl.meta from xl.meta.bkp during overwrites on such flaky setups. Bonus: Save N depth syscalls via skipping the parents upon overwrites and versioned updates. Flaky setup examples are stretch clusters with regular packet drops etc, we need to add some defensive code around to avoid dangling objects.	2024-04-23 10:15:52 -07:00
Harshavardhana	d1c58fc2eb	remove older deploymentID fix behavior to speed up startup (#19497 ) since mid 2018 we do not have any deployments without deployment-id, it is time to put this code to rest, this PR removes this old code as its no longer valuable. on setups with 1000's of drives these are all quite expensive operations.	2024-04-15 01:25:46 -07:00
Harshavardhana	074febd9e1	remove SetDiskLoc() rely on the endpoint values instead (#19475 ) the disk location never changes in the lifetime of a MinIO cluster, even if it did validate this close to the disk instead at the higher layer. Return appropriate errors indicating an invalid drive, so that the drive is not recognized as part of a valid drive.	2024-04-11 10:45:28 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaining this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Klaus Post	b435806d91	Reduce big message RPC allocations (#19390 ) Use `ODirectPoolSmall` buffers for inline data in PutObject. Add a separate call for inline data that will fetch a buffer for the inline data before unmarshal.	2024-04-01 16:42:09 -07:00
Harshavardhana	deeadd1a37	fix: convert multiple callers to use toStorageErr(err) correctly (#19339 ) we must attempt to convert all errors at storage-rest-client into StorageErr() regardless of what functionality is being called in, this PR fixes this for multiple callers including some internally used functions.	2024-03-25 23:24:59 -07:00
Harshavardhana	51874a5776	fix: allow DNS disconnection events to happen in k8s (#19145 ) in k8s things really do come online very asynchronously, we need to use implementation that allows this randomness. To facilitate this move WriteAll() as part of the websocket layer instead. Bonus: avoid instances of dnscache usage on k8s	2024-02-28 09:54:52 -08:00
Aditya Manthramurthy	62ce52c8fd	cachevalue: simplify exported interface (#19137 ) - Also add cache options type	2024-02-28 09:09:09 -08:00
Klaus Post	2b5e4b853c	Improve caching (#19130 ) * Remove lock for cached operations. * Rename "Relax" to `ReturnLastGood`. * Add `CacheError` to allow caching values even on errors. * Add NoWait that will return current value with async fetching if within 2xTTL. * Make benchmark somewhat representative. ``` Before: BenchmarkCache-12 16408370 63.12 ns/op 0 B/op After: BenchmarkCache-12 428282187 2.789 ns/op 0 B/op ``` * Remove `storageRESTClient.scanning`. Nonsensical - RPC clients will not have any idea about scanning. * Always fetch remote diskinfo metrics and cache them. Seems most calls are requesting metrics. * Do async fetching of usage caches.	2024-02-26 10:49:19 -08:00
Harshavardhana	a3ac62596c	move timedValue -> cachevalue package (#19114 )	2024-02-23 13:28:14 -08:00
Harshavardhana	2faba02d6b	fix: allow diskInfo at storageRPC to be cached (#19112 ) Bonus: convert timedValue into a typed implementation	2024-02-23 09:21:38 -08:00
Klaus Post	e06168596f	Convert more peer <--> peer REST calls (#19004 ) * Convert more peer <--> peer REST calls * Clean up in general. * Add JSON wrapper. * Add slice wrapper. * Add option to make handler return nil error if no connection is given, `IgnoreNilConn`. Converts the following: ``` + HandlerGetMetrics + HandlerGetResourceMetrics + HandlerGetMemInfo + HandlerGetProcInfo + HandlerGetOSInfo + HandlerGetPartitions + HandlerGetNetInfo + HandlerGetCPUs + HandlerServerInfo + HandlerGetSysConfig + HandlerGetSysServices + HandlerGetSysErrors + HandlerGetAllBucketStats + HandlerGetBucketStats + HandlerGetSRMetrics + HandlerGetPeerMetrics + HandlerGetMetacacheListing + HandlerUpdateMetacacheListing + HandlerGetPeerBucketMetrics + HandlerStorageInfo + HandlerGetLocks + HandlerBackgroundHealStatus + HandlerGetLastDayTierStats + HandlerSignalService + HandlerGetBandwidth ```	2024-02-19 14:54:46 -08:00
Harshavardhana	035a3ea4ae	optimize startup sequence performance (#19009 ) - bucket metadata does not need to look for legacy things anymore if b.Created is non-zero - stagger bucket metadata loads across lots of nodes to avoid the current thundering herd problem. - Remove deadlines for RenameData, RenameFile - these calls should not ever be timed out and should wait until completion or wait for client timeout. Do not choose timeouts for applications during the WRITE phase. - increase R/W buffer size, increase maxMergeMessages to 30	2024-02-08 11:21:21 -08:00
Harshavardhana	960d604013	disconnected returns, an unexpected error to List() returning 500s (#18959 ) provide the error string appropriately so that the matching of error types works. Also add a string based fallback for the said error.	2024-02-03 01:04:33 -08:00
Klaus Post	b192bc348c	Improve object reuse for grid messages (#18940 ) Allow internal types to support a `Recycler` interface, which will allow for sharing of common types across handlers. This means that all `grid.MSS` (and similar) objects are shared across in a common pool instead of a per-handler pool. Add internal request reuse of internal types. Add for safe (pointerless) types explicitly. Only log params for internal types. Doing Sprint(obj) is just a bit too messy.	2024-02-01 12:41:20 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Each Put, List, Multipart operations heavily rely on making GetBucketInfo() call to verify if bucket exists or not on a regular basis. This has a large performance cost when there are tons of servers involved. We did optimize this part by vectorizing the bucket calls, however its not enough, beyond 100 nodes and this becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Harshavardhana	1d3bd02089	avoid close 'nil' panics if any (#18890 ) brings a generic implementation that prints a stack trace for 'nil' channel closes(), if not safely closes it.	2024-01-28 10:04:17 -08:00
Harshavardhana	6347fb6636	add missing proper error return in WalkDir() (#18884 ) without this the caller might end up returning incorrect errors and not ignoring the drive properly.	2024-01-27 16:13:41 -08:00
Harshavardhana	74851834c0	further bootstrap/startup optimization for reading 'format.json' (#18868 ) - Move RenameFile to websockets - Move ReadAll, which is primarily used for reading 'format.json', to websockets - Optimize DiskInfo calls, and provide a way to make a NoOp DiskInfo call.	2024-01-25 12:45:46 -08:00
Harshavardhana	52229a21cb	avoid reload of 'format.json' over the network under normal conditions (#18842 )	2024-01-23 14:11:46 -08:00
Harshavardhana	21d60eab7c	remove all older unused APIs (#18769 )	2024-01-17 20:41:23 -08:00
Anis Eleuch	a47fc75c26	xl: Remove wrong wording for errCorruptedFormat (#18775 ) Also add errCorruptedBackend to make it easier to differentiate between corrupted content or something else wrong in the backend drive	2024-01-12 14:48:44 -08:00
Anis Eleuch	3f4488c589	scanner: Allow full throttle if there is no parallel disk ops (#18109 )	2024-01-02 13:51:24 -08:00
Harshavardhana	a50ea92c64	feat: introduce list_quorum="auto" to prefer quorum drives (#18084 ) NOTE: This feature is not retro-active; it will not cater to previous transactions on existing setups. To enable this feature, please set `_MINIO_DRIVE_QUORUM=on` environment variable as part of systemd service or k8s configmap. Once this has been enabled, you need to also set `list_quorum`. ``` ~ mc admin config set alias/ api list_quorum=auto ``` A new debugging tool is available to check for any missing counters.	2023-12-29 15:52:41 -08:00
Harshavardhana	9032f49f25	DiskInfo() must return errDiskNotFound not internal errors (#18514 )	2023-11-24 09:07:14 -08:00
Harshavardhana	a4cfb5e1ed	return errors if dataDir is missing during HeadObject() (#18477 ) Bonus: allow replication to attempt Deletes/Puts when the remote returns quorum errors of some kind, this is to ensure that MinIO can rewrite the namespace with the latest version that exists on the source.	2023-11-20 21:33:47 -08:00
Klaus Post	51aa59a737	perf: websocket grid connectivity for all internode communication (#18461 ) This PR adds a WebSocket grid feature that allows servers to communicate via a single two-way connection. There are two request types: * Single requests, which are `[]byte => ([]byte, error)`. This is for efficient small roundtrips with small payloads. * Streaming requests which are `[]byte, chan []byte => chan []byte (and error)`, which allows for different combinations of full two-way streams with an initial payload. Only a single stream is created between two machines - and there is, as such, no server/client relation since both sides can initiate and handle requests. Which server initiates the request is decided deterministically on the server names. Requests are made through a mux client and server, which handles message passing, congestion, cancelation, timeouts, etc. If a connection is lost, all requests are canceled, and the calling server will try to reconnect. Registered handlers can operate directly on byte slices or use a higher-level generics abstraction. There is no versioning of handlers/clients, and incompatible changes should be handled by adding new handlers. The request path can be changed to a new one for any protocol changes. First, all servers create a "Manager." The manager must know its address as well as all remote addresses. This will manage all connections. To get a connection to any remote, ask the manager to provide it given the remote address using. ``` func (m Manager) Connection(host string) Connection ``` All serverside handlers must also be registered on the manager. This will make sure that all incoming requests are served. The number of in-flight requests and responses must also be given for streaming requests. The "Connection" returned manages the mux-clients. Requests issued to the connection will be sent to the remote. 
* `func (c Connection) Request(ctx context.Context, h HandlerID, req []byte) ([]byte, error)` performs a single request and returns the result. Any deadline provided on the request is forwarded to the server, and canceling the context will make the function return at once. `func (c Connection) NewStream(ctx context.Context, h HandlerID, payload []byte) (st Stream, err error)` will initiate a remote call and send the initial payload. ```Go // A Stream is a two-way stream. // All responses must be read by the caller. // If the call is canceled through the context, //The appropriate error will be returned. type Stream struct { // Responses from the remote server. // Channel will be closed after an error or when the remote closes. // All responses must be read by the caller until either an error is returned or the channel is closed. // Canceling the context will cause the context cancellation error to be returned. Responses <-chan Response // Requests sent to the server. // If the handler is defined with 0 incoming capacity this will be nil. // Channel must be closed to signal the end of the stream. // If the request context is canceled, the stream will no longer process requests. Requests chan<- []byte } type Response struct { Msg []byte Err error } ``` There are generic versions of the server/client handlers that allow the use of type safe implementations for data types that support msgpack marshal/unmarshal.	2023-11-20 17:09:35 -08:00
Harshavardhana	754f7a8a39	replace io.Discard usage to fix some NUMA copy() latencies (#18394 ) replace io.Discard usage to fix NUMA copy() latencies On NUMA systems copying from 8K buffer allocated via io.Discard leads to large latency build-up for every ``` copy(new8kbuf, largebuf) ``` can incur up to 1ms worth of latencies on NUMA systems due to memory sharding across NUMA nodes.	2023-11-06 14:26:08 -08:00
Harshavardhana	8e32de3ba9	cache DiskInfo() metrics call separately (#18270 )	2023-10-18 11:17:32 -07:00
Harshavardhana	ac3a19138a	fix: set scanning details locally to avoid cached values (#18092 ) atomic variable results such as scanning must not use cached values, instead rely on real-time information.	2023-09-25 08:26:29 -07:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Harshavardhana	124e28578c	remove strict persistence requirements for List() .metacache objects (#17917 ) .metacache objects are transient in nature, and are better left to use page-cache effectively to avoid using more IOPs on the disks. this allows for incoming calls to be not taxed heavily due to multiple large batch listings.	2023-08-25 07:58:11 -07:00
Harshavardhana	b0f0e53bba	fix: make sure to correctly initialize health checks (#17765 ) health checks were missing for drives replaced since - HealFormat() would replace the drives without a health check - disconnected drives when they reconnect via connectEndpoint() the loop also loses health checks for local disks and merges these into a single code. - other than this separate cleanUp, health check variables to avoid overloading them with similar requirements. - also ensure that we compete via context selector for disk monitoring such that the canceled disks don't linger around longer waiting for the ticker to trigger. - allow disabling active monitoring.	2023-08-01 10:54:26 -07:00
Harshavardhana	81be718674	fix: optimize DiskInfo() call avoid metrics when not needed (#17763 )	2023-07-31 15:20:48 -07:00
Harshavardhana	82075e8e3a	use strconv variants to improve on performance per 'op' (#17626 ) ``` BenchmarkItoa BenchmarkItoa-8 673628088 1.946 ns/op 0 B/op 0 allocs/op BenchmarkFormatInt BenchmarkFormatInt-8 592919769 2.012 ns/op 0 B/op 0 allocs/op BenchmarkSprint BenchmarkSprint-8 26149144 49.06 ns/op 2 B/op 1 allocs/op BenchmarkSprintBool BenchmarkSprintBool-8 26440180 45.92 ns/op 4 B/op 1 allocs/op BenchmarkFormatBool BenchmarkFormatBool-8 1000000000 0.2558 ns/op 0 B/op 0 allocs/op ```	2023-07-11 07:46:58 -07:00
Klaus Post	ff5988f4e0	Reduce allocations (#17584 ) * Reduce allocations * Add stringsHasPrefixFold which can compare string prefixes, while ignoring case and not allocating. * Reuse all msgp.Readers * Reuse metadata buffers when not reading data. * Make type safe. Make buffer 4K instead of 8. * Unslice	2023-07-06 16:02:08 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
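
The message for commit 51aa59a737 quotes a `Stream` type and states that the caller must read all responses until an error is returned or the channel is closed. A minimal sketch of that reading loop follows. The `Stream` and `Response` types are copied locally from the commit message rather than imported from MinIO; `drain` and `newEchoStream` are hypothetical helpers invented for illustration and are not part of the grid package.

```go
package main

import (
	"errors"
	"fmt"
)

// Response mirrors the type quoted in the commit message.
type Response struct {
	Msg []byte
	Err error
}

// Stream mirrors the two-way stream type quoted in the commit message.
type Stream struct {
	Responses <-chan Response
	Requests  chan<- []byte
}

// drain (hypothetical) reads responses until the channel is closed
// or a response carries an error, as the commit message requires.
func drain(st Stream) ([][]byte, error) {
	var out [][]byte
	for resp := range st.Responses {
		if resp.Err != nil {
			return out, resp.Err
		}
		out = append(out, resp.Msg)
	}
	return out, nil
}

// newEchoStream (hypothetical) stands in for a remote handler:
// it echoes every request and fails on an empty one.
func newEchoStream() Stream {
	reqs := make(chan []byte, 4)
	resps := make(chan Response, 4)
	go func() {
		defer close(resps)
		for req := range reqs {
			if len(req) == 0 {
				resps <- Response{Err: errors.New("empty request")}
				return
			}
			resps <- Response{Msg: append([]byte("echo:"), req...)}
		}
	}()
	return Stream{Responses: resps, Requests: reqs}
}

func main() {
	st := newEchoStream()
	st.Requests <- []byte("a")
	st.Requests <- []byte("b")
	close(st.Requests) // closing Requests signals the end of the stream
	msgs, err := drain(st)
	fmt.Println(len(msgs), err)
}
```

Closing `Requests` before draining follows the rule in the quoted comments that the channel must be closed to signal the end of the stream.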

191 Commits