minio

Commit Graph

Author	SHA1	Message	Date
Krishnan Parthasarathi	04be352ae9	Relax quorum agreement on DataDir values (#20232 ) Previously, we checked if we had a quorum on the DataDir value. We are removing this check, which allows reading objects with different DataDir values in a few drives (due to a rebalance-stop race bug) provided their eTags or ModTimes match.	2024-08-12 12:02:21 -07:00
Harshavardhana	2e0fd2cba9	implement a safer completeMultipart implementation (#20227 ) - optimize writing part.N.meta by writing both part.N and its meta in sequence without network component. - remove part.N.meta, part.N which were partially success ful, in quorum loss situations during renamePart() - allow for strict read quorum check arbitrated via ETag for the given part number, this makes it double safer upon final commit. - return an appropriate error when read quorum is missing, instead of returning InvalidPart{}, which is non-retryable error. This kind of situation can happen when many nodes are going offline in rotation, an example of such a restart() behavior is statefulset updates in k8s. fixes #20091	2024-08-12 01:38:15 -07:00
Poorna	6651c655cb	fix replication of checksum when encryption is enabled (#20161 ) - Adding functional tests - Return checksum header on GET/HEAD, previously this was returning InvalidPartNumber error	2024-07-29 01:02:16 -07:00
Krishnan Parthasarathi	4a1edfd9aa	Different read quorum for tiered objects (#20115 ) For a non-tiered object, MinIO requires that EcM (# of data blocks) of xl.meta agree, corresponding to the number of data blocks needed to read this object. OTOH, tiered objects have metadata in the hot tier and data in the warm tier. The data and its integrity are offloaded to the warm tier. This allows us to reduce the read quorum from EcM (typically > N/2, where N - erasure stripe width) to N/2 + 1. The simple majority of metadata ensures consensus on what the object is and where it is located.	2024-07-25 14:02:50 -07:00
Aditya Manthramurthy	5f78691fcf	ldap: Add user DN attributes list config param (#19758 ) This change uses the updated ldap library in minio/pkg (bumped up to v3). A new config parameter is added for LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user. A test is added in sts_handlers.go that shows how to access the LDAP attributes as a claim. This is in preparation for adding SSH pubkey authentication to MinIO's SFTP integration.	2024-05-24 16:05:23 -07:00
Krishnan Parthasarathi	1228d6bf1a	Return NumVersions in quorum when available (#19766 ) Similar to https://github.com/minio/minio/pull/17925	2024-05-17 13:57:37 -07:00
Harshavardhana	a372c6a377	a bunch of fixes for error handling (#19627 ) - handle errFileCorrupt properly - micro-optimization of sending done() response quicker to close the goroutine. - fix logger.Event() usage in a couple of places - handle the rest of the client to return a different error other than lastErr() when the client is closed.	2024-04-28 10:53:50 -07:00
Poorna	e7aa26dc29	fix: allow DeleteObject unversioned objects with insufficient read quorum (#19581 ) Since the object is being permanently deleted, the lack of read quorum should not matter as long as sufficient disks are online to complete the deletion with parity requirements. If several pools have the same object with insufficient read quorum, attempt to delete object from all the pools where it exists	2024-04-25 17:31:12 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaing this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Anis Eleuch	24b4f9d748	Fix quorum calculation with zero parity objects (#19250 ) Currently, the code relies on object parity to decide whether it is a delete marker or a regular object. In the case of a delete marker, the return quorum is half of the disks in the erasure set. However, this calculation must be corrected with objects with EC = 0, mainly because EC is not a one-time fixed configuration. Though all data are correct, the manifested symptom is a 503 with an EC=0 object. This bug was manifested after we introduced the fast Get Object feature that does not read all data from all disks in case of inlined objects	2024-03-12 12:59:11 -07:00
Krishnan Parthasarathi	a7577da768	Improve expiration of tiered objects (#18926 ) - Use a shared worker pool for all ILM expiry tasks - Free version cleanup executes in a separate goroutine - Add a free version only if removing the remote object fails - Add ILM expiry metrics to the node namespace - Move tier journal tasks to expiryState - Remove unused on-disk journal for tiered objects pending deletion - Distribute expiry tasks across workers such that the expiry of versions of the same object serialized - Ability to resize worker pool without server restart - Make scaling down of expiryState workers' concurrency safe; Thanks @klauspost - Add error logs when expiryState and transition state are not initialized (yet) * metrics: Add missed tier journal entry tasks * Initialize the ILM worker pool after the object layer	2024-03-01 21:11:03 -08:00
Harshavardhana	c599c11e70	fix: relax metadata checks for healing (#19165 ) we should do this to ensure that we focus on data healing as primary focus, fixing metadata as part of healing must be done but making data available is the main focus. the main reason is metadata inconsistencies can cause data availability issues, which must be avoided at all cost. will be bringing in an additional healing mechanism that involves "metadata-only" heal, for now we do not expect to have these checks. continuation of #19154 Bonus: add a pro-active healthcheck to perform a connection	2024-02-29 22:49:01 -08:00
Harshavardhana	80ca120088	remove checkBucketExist check entirely to avoid fan-out calls (#18917 ) Each Put, List, Multipart operations heavily rely on making GetBucketInfo() call to verify if bucket exists or not on a regular basis. This has a large performance cost when there are tons of servers involved. We did optimize this part by vectorizing the bucket calls, however its not enough, beyond 100 nodes and this becomes fairly visible in terms of performance.	2024-01-30 12:43:25 -08:00
Harshavardhana	708cebe7f0	add necessary protection err, fileInfo slice reads and writes (#18854 ) protection was in place. However, it covered only some areas, so we re-arranged the code to ensure we could hold locks properly. Along with this, remove the DataShardFix code altogether, in deployments with many drive replacements, this can affect and lead to quorum loss.	2024-01-24 01:08:23 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Harshavardhana	38637897ba	fix: listing SSE encrypted multipart objects (#18786 ) GetActualSize() was heavily relying on o.Parts() to be non-empty to figure out if the object is multipart or not, However, we have many indicators of whether an object is multipart or not. Blindly assuming that o.Parts == nil is not a multipart, is an incorrect expectation instead, multipart must be obtained via - Stored metadata value indicating this is a multipart encrypted object. - Rely on <meta>-actual-size metadata to get the object's actual size. This value is preserved for additional reasons such as these. - ETag != 32 length	2024-01-15 00:57:49 -08:00
Harshavardhana	fba883839d	feat: bring new HDD related performance enhancements (#18239 ) Optionally allows customers to enable - Enable an external cache to catch GET/HEAD responses - Enable skipping disks that are slow to respond in GET/HEAD when we have already achieved a quorum	2023-11-22 13:46:17 -08:00
Poorna	96ec8fcba1	Preserve replica timestamps in multipart (#18318 ) Also a backward compatibility fix to use x-amz-replica-status if present as replication status.	2023-10-25 21:24:10 -07:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Krishnan Parthasarathi	71c32e9b48	Return successorModTime in quorum when available (#17925 )	2023-09-04 08:24:17 -07:00
Harshavardhana	18b3655c99	with xlv2 format we never had to fill in checksumInfo() (#17963 ) - this PR avoids sending a large ChecksumInfo slice when its not needed - also for a file with XLV2 format there is no reason to allocate Checksum slice while reading	2023-09-01 13:45:58 -07:00
Harshavardhana	1ea7826c0e	do not have to consider replicationTimestamp for healing and quorum (#17922 ) replicationTimestamp might differ if there were retries in replication and the retried attempt overwrote in quorum but enough shards with newer timestamp causing the existing timestamps on xl.meta to be invalid, we do not rely on this value for anything external. this is purely a hint for debugging purposes, but there is no real value in it considering the object itself is in-tact we do not have to spend time healing this situation. we may consider healing this situation in future but that needs to be decoupled to make sure that we do not over calculate how much we have to heal.	2023-08-25 15:31:15 -07:00
Harshavardhana	9ebd10d3f4	Revert "Include SuccessorModTime for FileInfo quorum (#17732 )" (#17860 ) This reverts commit `bf3901342c`. This is to fix a regression caused when there are inconsistent versions, but one version is in quorum. SuccessorModTime issue must be fixed differently.	2023-08-16 07:51:33 -07:00
Krishnan Parthasarathi	bf3901342c	Include SuccessorModTime for FileInfo quorum (#17732 )	2023-07-26 17:04:16 -07:00
Harshavardhana	a566bcf613	treat 0-byte objects to honor same quorum as delete marker (#17633 ) on unversioned buckets its possible that 0-byte objects might lose quorum on flaky systems, allow them to be same as DELETE markers. Since practically speak they have no content.	2023-07-11 21:53:49 -07:00
Harshavardhana	1443b5927a	allow quorum fileInfo to pick same parityBlocks (#17454 ) Bonus: allow replication to proceed for 503 errors such as with error code SlowDownRead	2023-06-18 18:20:15 -07:00
Harshavardhana	64de61d15d	fallback on etags if they match when mtime is not same (#17424 ) on "unversioned" buckets there are situations when successive concurrent I/O can lead to an inconsistent state() with mtime while the etag might be the same for the object on disk. in such a scenario it is possible for us to allow reading of the object since etag matches and if etag matches we are guaranteed that we have enough copies the object will be readable and same. This PR allows fallback in such scenarios.	2023-06-17 19:18:20 -07:00
jiuker	0474791cf8	fix: set time format right (#17402 )	2023-06-14 07:49:13 -07:00
Harshavardhana	d5059840ef	fix: for delete marked objects choose appropriate parity (#17287 )	2023-05-26 09:57:44 -07:00
Anis Eleuch	a30a55f3b1	Add object parity in listing V2M and listing versions M (#17238 )	2023-05-19 09:42:45 -07:00
Praveen raj Mani	72802a5972	Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069 )	2023-04-25 22:57:40 -07:00
Harshavardhana	8fd07bcd51	simplify sort.Sort by using sort.Slice (#17066 )	2023-04-24 13:28:18 -07:00
Harshavardhana	6825bd7e75	fix: inlined objects don't need to honor long locks (#17039 )	2023-04-17 12:16:37 -07:00
Krishnan Parthasarathi	f92450d8b3	commonParity should pick readable FileInfo (#17032 )	2023-04-14 16:23:28 -07:00
Harshavardhana	72daccd468	fix: scanner in healing cycle must use actual size (#16589 )	2023-02-10 06:53:03 -08:00
Harshavardhana	c242e6c391	fix: calculate common parity properly (#16406 )	2023-01-13 03:28:16 +05:30
Harshavardhana	2937711390	fix: DeleteObject() API with versionId under replication (#16325 )	2022-12-28 22:48:33 -08:00
Krishnan Parthasarathi	40a2c6b882	Return remote tier as StorageClass for transitioned objects (#16035 )	2022-11-09 15:57:34 -08:00
Anis Elleuch	783dd875f7	refactor objectQuorumFromMeta() to search for parity quorum (#15844 )	2022-10-12 16:42:45 -07:00
Harshavardhana	228c6686f8	allow non-standards fallback for all http.TimeFormats (#15662 ) fixes #15645	2022-09-07 07:24:54 -07:00
Harshavardhana	7776d064cf	allow non-standards fallback for Expires header (#15655 ) fixes #15645	2022-09-05 19:18:18 -07:00
Klaus Post	a9f1ad7924	Add extended checksum support (#15433 )	2022-08-29 16:57:16 -07:00
Harshavardhana	65166e4ce4	fix: readQuorum calculation when defaultParityCount is 0 (#15363 ) when parity is '0' the readQuorum must be equal to the number of data disks.	2022-07-21 07:25:54 -07:00
Harshavardhana	ce8397f7d9	use partInfo only for intermediate part.x.meta (#15353 )	2022-07-19 18:56:24 -07:00
Klaus Post	911a17b149	Add compressed file index (#15247 )	2022-07-11 17:30:56 -07:00
Harshavardhana	9c605ad153	allow support for parity '0', '1' enabling support for 2,3 drive setups (#15171 ) allows for further granular setups - 2 drives (1 parity, 1 data) - 3 drives (1 parity, 2 data) Bonus: allows '0' parity as well.	2022-06-27 20:22:18 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) We need to make sure if we cannot read bucket metadata for some reason, and bucket metadata is not missing and returning corrupted information we should panic such handlers to disallow I/O to protect the overall state on the system. In-case of such corruption we have a mechanism now to force recreate the metadata on the bucket, using `x-minio-force-create` header with `PUT /bucket` API call. Additionally fix the versioning config updated state to be set properly for the site replication healing to trigger correctly.	2022-05-31 02:57:57 -07:00
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) Main motivation is move towards a common backend format for all different types of modes in MinIO, allowing for a simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication, transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Harshavardhana	9d07cde385	use crypto/sha256 only for FIPS 140-2 compliance (#14983 ) It would seem like the PR #11623 had chewed more than it wanted to, non-fips build shouldn't really be forced to use slower crypto/sha256 even for presumed "non-performance" codepaths. In MinIO there are really no "non-performance" codepaths. This assumption seems to have had an adverse effect in certain areas of CPU usage. This PR ensures that we stick to sha256-simd on all non-FIPS builds, our most common build to ensure we get the best out of the CPU at any given point in time.	2022-05-27 06:00:19 -07:00
Anis Elleuch	77dc99e71d	Do not use inline data size in xl.meta quorum calculation (#14831 ) * Do not use inline data size in xl.meta quorum calculation Data shards of one object can different inline/not-inline decision in multiple disks. This happens with outdated disks when inline decision changes. For example, enabling bucket versioning configuration will change the small file threshold. When the parity of an object becomes low, GET object can return 503 because it is not unable to calculate the xl.meta quorum, just because some xl.meta has inline data and other are not. So this commit will be disable taking the size of the inline data into consideration when calculating the xl.meta quorum. * Add tests for simulatenous inline/notinline object Co-authored-by: Anis Elleuch <anis@min.io>	2022-05-24 06:26:38 -07:00

1 2

100 Commits