minio

Commit Graph

Author	SHA1	Message	Date
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Krishnan Parthasarathi	71c32e9b48	Return successorModTime in quorum when available (#17925 )	2023-09-04 08:24:17 -07:00
Harshavardhana	18b3655c99	with xlv2 format we never had to fill in checksumInfo() (#17963 ) - this PR avoids sending a large ChecksumInfo slice when it's not needed - also for a file with XLV2 format there is no reason to allocate Checksum slice while reading	2023-09-01 13:45:58 -07:00
Harshavardhana	1ea7826c0e	do not have to consider replicationTimestamp for healing and quorum (#17922 ) replicationTimestamp might differ if there were retries in replication and the retried attempt overwrote in quorum but enough shards with newer timestamp causing the existing timestamps on xl.meta to be invalid. We do not rely on this value for anything external. This is purely a hint for debugging purposes, but there is no real value in it; considering the object itself is intact, we do not have to spend time healing this situation. We may consider healing this situation in future, but that needs to be decoupled to make sure that we do not overcalculate how much we have to heal.	2023-08-25 15:31:15 -07:00
Harshavardhana	9ebd10d3f4	Revert "Include SuccessorModTime for FileInfo quorum (#17732 )" (#17860 ) This reverts commit `bf3901342c`. This is to fix a regression caused when there are inconsistent versions, but one version is in quorum. SuccessorModTime issue must be fixed differently.	2023-08-16 07:51:33 -07:00
Krishnan Parthasarathi	bf3901342c	Include SuccessorModTime for FileInfo quorum (#17732 )	2023-07-26 17:04:16 -07:00
Harshavardhana	a566bcf613	treat 0-byte objects to honor same quorum as delete marker (#17633 ) on unversioned buckets it's possible that 0-byte objects might lose quorum on flaky systems, allow them to be same as DELETE markers, since practically speaking they have no content.	2023-07-11 21:53:49 -07:00
Harshavardhana	1443b5927a	allow quorum fileInfo to pick same parityBlocks (#17454 ) Bonus: allow replication to proceed for 503 errors such as with error code SlowDownRead	2023-06-18 18:20:15 -07:00
Harshavardhana	64de61d15d	fallback on etags if they match when mtime is not same (#17424 ) on "unversioned" buckets there are situations when successive concurrent I/O can lead to an inconsistent state() with mtime while the etag might be the same for the object on disk. in such a scenario it is possible for us to allow reading of the object since etag matches and if etag matches we are guaranteed that we have enough copies the object will be readable and same. This PR allows fallback in such scenarios.	2023-06-17 19:18:20 -07:00
jiuker	0474791cf8	fix: set time format right (#17402 )	2023-06-14 07:49:13 -07:00
Harshavardhana	d5059840ef	fix: for delete marked objects choose appropriate parity (#17287 )	2023-05-26 09:57:44 -07:00
Anis Eleuch	a30a55f3b1	Add object parity in listing V2M and listing versions M (#17238 )	2023-05-19 09:42:45 -07:00
Praveen raj Mani	72802a5972	Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069 )	2023-04-25 22:57:40 -07:00
Harshavardhana	8fd07bcd51	simplify sort.Sort by using sort.Slice (#17066 )	2023-04-24 13:28:18 -07:00
Harshavardhana	6825bd7e75	fix: inlined objects don't need to honor long locks (#17039 )	2023-04-17 12:16:37 -07:00
Krishnan Parthasarathi	f92450d8b3	commonParity should pick readable FileInfo (#17032 )	2023-04-14 16:23:28 -07:00
Harshavardhana	72daccd468	fix: scanner in healing cycle must use actual size (#16589 )	2023-02-10 06:53:03 -08:00
Harshavardhana	c242e6c391	fix: calculate common parity properly (#16406 )	2023-01-13 03:28:16 +05:30
Harshavardhana	2937711390	fix: DeleteObject() API with versionId under replication (#16325 )	2022-12-28 22:48:33 -08:00
Krishnan Parthasarathi	40a2c6b882	Return remote tier as StorageClass for transitioned objects (#16035 )	2022-11-09 15:57:34 -08:00
Anis Elleuch	783dd875f7	refactor objectQuorumFromMeta() to search for parity quorum (#15844 )	2022-10-12 16:42:45 -07:00
Harshavardhana	228c6686f8	allow non-standards fallback for all http.TimeFormats (#15662 ) fixes #15645	2022-09-07 07:24:54 -07:00
Harshavardhana	7776d064cf	allow non-standards fallback for Expires header (#15655 ) fixes #15645	2022-09-05 19:18:18 -07:00
Klaus Post	a9f1ad7924	Add extended checksum support (#15433 )	2022-08-29 16:57:16 -07:00
Harshavardhana	65166e4ce4	fix: readQuorum calculation when defaultParityCount is 0 (#15363 ) when parity is '0' the readQuorum must be equal to the number of data disks.	2022-07-21 07:25:54 -07:00
Harshavardhana	ce8397f7d9	use partInfo only for intermediate part.x.meta (#15353 )	2022-07-19 18:56:24 -07:00
Klaus Post	911a17b149	Add compressed file index (#15247 )	2022-07-11 17:30:56 -07:00
Harshavardhana	9c605ad153	allow support for parity '0', '1' enabling support for 2,3 drive setups (#15171 ) allows for further granular setups - 2 drives (1 parity, 1 data) - 3 drives (1 parity, 2 data) Bonus: allows '0' parity as well.	2022-06-27 20:22:18 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) We need to make sure if we cannot read bucket metadata for some reason, and bucket metadata is not missing and returning corrupted information we should panic such handlers to disallow I/O to protect the overall state on the system. In case of such corruption we have a mechanism now to force recreate the metadata on the bucket, using `x-minio-force-create` header with `PUT /bucket` API call. Additionally fix the versioning config updated state to be set properly for the site replication healing to trigger correctly.	2022-05-31 02:57:57 -07:00
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) Main motivation is move towards a common backend format for all different types of modes in MinIO, allowing for a simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication, transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Harshavardhana	9d07cde385	use crypto/sha256 only for FIPS 140-2 compliance (#14983 ) It would seem like the PR #11623 had chewed more than it wanted to, non-fips build shouldn't really be forced to use slower crypto/sha256 even for presumed "non-performance" codepaths. In MinIO there are really no "non-performance" codepaths. This assumption seems to have had an adverse effect in certain areas of CPU usage. This PR ensures that we stick to sha256-simd on all non-FIPS builds, our most common build to ensure we get the best out of the CPU at any given point in time.	2022-05-27 06:00:19 -07:00
Anis Elleuch	77dc99e71d	Do not use inline data size in xl.meta quorum calculation (#14831 ) * Do not use inline data size in xl.meta quorum calculation Data shards of one object can have different inline/not-inline decisions on multiple disks. This happens with outdated disks when the inline decision changes. For example, enabling bucket versioning configuration will change the small file threshold. When the parity of an object becomes low, GET object can return 503 because it is unable to calculate the xl.meta quorum, just because some xl.meta have inline data and others do not. So this commit disables taking the size of the inline data into consideration when calculating the xl.meta quorum. * Add tests for simultaneous inline/notinline object Co-authored-by: Anis Elleuch <anis@min.io>	2022-05-24 06:26:38 -07:00
Krishnan Parthasarathi	ad8e611098	feat: implement prefix-level versioning exclusion (#14828 ) Spark/Hadoop workloads which use Hadoop MR Committer v1/v2 algorithm upload objects to a temporary prefix in a bucket. These objects are 'renamed' to a different prefix on Job commit. Object storage admins are forced to configure separate ILM policies to expire these objects and their versions to reclaim space. Our solution: This can be avoided by simply marking objects under these prefixes to be excluded from versioning, as shown below. Consequently, these objects are excluded from replication, and don't require ILM policies to prune unnecessary versions. - MinIO Extension to Bucket Version Configuration ```xml <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> <Status>Enabled</Status> <ExcludeFolders>true</ExcludeFolders> <ExcludedPrefixes> <Prefix>app1-jobs//_temporary/</Prefix> </ExcludedPrefixes> <ExcludedPrefixes> <Prefix>app2-jobs//__magic/</Prefix> </ExcludedPrefixes> <!-- .. up to 10 prefixes in all --> </VersioningConfiguration> ``` Note: `ExcludeFolders` excludes all folders in a bucket from versioning. This is required to prevent the parent folders from accumulating delete markers, especially those which are shared across spark workloads spanning projects/teams. - To enable version exclusion on a list of prefixes ``` mc version enable --excluded-prefixes "app1-jobs//_temporary/,app2-jobs//_magic," --exclude-prefix-marker myminio/test ```	2022-05-06 19:05:28 -07:00
Krishnan Parthasarathi	5a0c0079a1	Don't add free-version on restore-object (#14340 )	2022-02-17 15:05:19 -08:00
Harshavardhana	0256dae657	fix: quorum requirement for DeleteMarkers and parity upgraded objects (#14248 ) DeleteMarkers do not have a default quorum, i.e. it is possible that DeleteMarkers were created with n/2+1 quorum as well. To satisfy such situations we need to make sure delete markers only expect n/2 read quorum. Additionally we should also look at additional metadata on the actual objects that might have been "erasure" upgraded with new parity when disks are down. In such a scenario do not default to the standard storage class parity, instead use the parityBlocks present on the FileInfo to ensure that we are dealing with the correct quorum for READs and DELETEs.	2022-02-04 02:47:36 -08:00
Harshavardhana	54ec0a1308	add configurable delta for skipping shards (#13967 ) This PR is an attempt to make this configurable as not all situations have same level of tolerable delta, i.e disks are replaced days apart or even hours. There is also a possibility that nodes have drifted in time, when NTP is not configured on the system.	2021-12-22 11:43:01 -08:00
Harshavardhana	0e3037631f	skip inconsistent shards if possible (#13945 ) data shards were wrong due to a healing bug reported in #13803 mainly with unaligned object sizes. This PR is an attempt to automatically avoid these shards, with available information about the `xl.meta` and actual disk mtime.	2021-12-21 10:08:26 -08:00
Harshavardhana	28f95f1fbe	quorum calculation getLatestFileInfo should be itself (#13717 ) FileInfo quorum shouldn't be passed down, instead inferred after obtaining a maximally occurring FileInfo. This PR also changes other functions that rely on wrong quorum calculation. Update tests as well to handle the proper requirement. All these changes are needed when migrating from older deployments where we used to set N/2 quorum for reads to EC:4 parity in newer releases.	2021-11-22 09:36:29 -08:00
Harshavardhana	c791de0e1e	re-implement pickValidInfo dataDir, move to quorum calculation (#13681 ) dataDir loosely based on maxima is incorrect and does not work in all situations such as disks in the following order - xl.json migration to xl.meta there may be partial xl.json's leftover if some disks are not yet connected when the disk is yet to come up, since xl.json mtime and xl.meta is same the dataDir maxima doesn't work properly leading to quorum issues. - it's also possible that XLV1 might be true among the disks available, make sure to keep FileInfo based on common quorum and skip unexpected disks with the older data format. Also, this PR tests upgrade from older to a newer release if the data is readable and matches the checksum. NOTE: this is just initial work we can build on top of this to do further tests.	2021-11-21 10:41:30 -08:00
Klaus Post	faf013ec84	Improve performance on multiple versions (#13573 ) Existing: ```go type xlMetaV2 struct { Versions []xlMetaV2Version `json:"Versions" msg:"Versions"` } ``` Serialized as regular MessagePack. ```go //msgp:tuple xlMetaV2VersionHeader type xlMetaV2VersionHeader struct { VersionID [16]byte ModTime int64 Type VersionType Flags xlFlags } ``` Serialize as streaming MessagePack, format: ``` int(headerVersion) int(xlmetaVersion) int(nVersions) for each version { binary blob, xlMetaV2VersionHeader, serialized binary blob, xlMetaV2Version, serialized. } ``` xlMetaV2VersionHeader is <= 30 bytes serialized. Deserialized struct can easily be reused and does not contain pointers, so efficient as a slice (single allocation) This allows quickly parsing everything as slices of bytes (no copy). Versions are always saved sorted by modTime, newest first. No more need to sort on load. * Allows checking if a version exists. * Allows reading single version without unmarshal all. * Allows reading latest version of type without unmarshal all. * Allows reading latest version without unmarshal of all. * Allows checking if the latest is deleteMarker by reading first entry. * Allows adding/updating/deleting a version with only header deserialization. * Reduces allocations on conversion to FileInfo(s).	2021-11-18 12:15:22 -08:00
Harshavardhana	661b263e77	add gocritic/ruleguard checks back again, cleanup code. (#13665 ) - remove some duplicated code - reported a bug, separately fixed in #13664 - using strings.ReplaceAll() when needed - using filepath.ToSlash() use when needed - remove all non-Go style comments from the codebase Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2021-11-16 09:28:29 -08:00
Harshavardhana	94d587e6fc	fix: delete-markers without quorum were unreadable (#13351 ) DeleteMarkers were unreadable if they had quorum based guarantees, this PR tries to fix this behavior appropriately. DeleteMarkers with sufficient quorum should be allowed and the returned error should be set accordingly with or without version-id. This also allows for overwrites which may not be possible in a multi-pool setup. fixes #12787	2021-10-04 08:53:38 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Harshavardhana	0892f1e406	fix: multipart replication and encrypted etag for sse-s3 (#13171 ) Replication was not working properly for encrypted objects in single PUT object for preserving etag. We need to make sure to preserve etag such that replication works properly and does not get into infinite loops of copying due to ETag mismatches.	2021-09-08 22:25:23 -07:00
Krishnan Parthasarathi	db35bcf2ce	heal: Remove transitioned objects' parts from outdated disks (#13018 ) Bonus: check equality for replication and other metadata	2021-08-23 13:14:55 -07:00
Krishnan Parthasarathi	e210cb3670	fix: use transition/replication fields in FileInfo quorum calculation (#13010 )	2021-08-19 14:55:42 -07:00
Harshavardhana	ef4d023c85	fix: various performance improvements to tiering (#12965 ) - deletes should always Sweep() for tiering at the end and does not need an extra getObjectInfo() call - puts, copy and multipart writes should conditionally do getObjectInfo() when tiering targets are configured - introduce 'TransitionedObject' struct for ease of usage and understanding. - multiple-pools optimization deletes don't need to hold read locks verifying objects across namespace and pools.	2021-08-17 07:50:00 -07:00
Harshavardhana	3c34e18a4e	allow multipart uploads for single part multipart (#12821 ) it's possible that some multipart uploads would have uploaded only single parts so relying on `len(o.Parts)` alone is not sufficient, we need to look for ETag pattern to be absolutely sure.	2021-07-28 22:11:55 -07:00
Krishnan Parthasarathi	a1df230518	Add a 'free' version to track deletion of tiered object content (#12470 )	2021-06-30 19:32:07 -07:00
Aditya Manthramurthy	30a3921d3e	[Tiering] Support remote tiers with object versioning (#12342 ) - Adds versioning support for S3 based remote tiers that have versioning enabled. This ensures that when reading or deleting we specify the specific version ID of the object. In case of deletion, this is important to ensure that the object version is actually deleted instead of simply being marked for deletion. - Stores the remote object's version id in the tier-journal. Tier-journal file version is not bumped up as serializing the new struct version is compatible with old journals without the remote object version id. - `storageRESTVersion` is bumped up as FileInfo struct now includes a `TransitionRemoteVersionID` member. - Azure and GCS support for this feature will be added subsequently. Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>	2021-06-03 14:26:51 -07:00
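
Commits 65166e4ce4 and 9c605ad153 state a read-quorum rule in prose: with parity '0' the readQuorum must equal the number of data disks, and a 2-drive setup is 1 parity plus 1 data. A minimal Go sketch of that arithmetic follows. The function name `readQuorum` and its validation are hypothetical, and treating read quorum as total drives minus parity for every parity value is an assumption generalized from those two messages, not code taken from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// readQuorum is a hypothetical helper, not MinIO's implementation.
// Assumption: the read quorum equals the number of data drives,
// i.e. total drives minus parity drives.
func readQuorum(totalDrives, parity int) (int, error) {
	if totalDrives <= 0 || parity < 0 || parity >= totalDrives {
		return 0, errors.New("invalid drive/parity combination")
	}
	return totalDrives - parity, nil
}

func main() {
	// Layouts named in the commit messages: 2 drives (1 parity),
	// 3 drives (1 parity), plus a parity-0 layout.
	layouts := []struct{ drives, parity int }{{2, 1}, {3, 1}, {4, 0}}
	for _, l := range layouts {
		q, err := readQuorum(l.drives, l.parity)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("drives=%d parity=%d readQuorum=%d\n", l.drives, l.parity, q)
	}
}
```

With parity 0 every drive holds data, so under this assumption a read needs all of them, which is the case 65166e4ce4 describes.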


82 Commits