minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	aa874010e2	fix: regression in resolving the right versions (#15430 ) fix: regression in resolving right versions commit `d480022711` caused a regression in real resolver, by picking up incorrect versionID.	2022-07-29 10:03:53 -07:00
Harshavardhana	ce8397f7d9	use partInfo only for intermediate part.x.meta (#15353 )	2022-07-19 18:56:24 -07:00
Harshavardhana	7da9e3a6f8	support encrypted/compressed objects properly during decommission (#15320 ) fixes #15314	2022-07-16 19:35:24 -07:00
Klaus Post	0149382cdc	Add padding to compressed+encrypted files (#15282 ) Add up to 256 bytes of padding for compressed+encrypted files. This will obscure the obvious cases of extremely compressible content and leave a similar output size for a very wide variety of inputs. This does not mean the compression ratio doesn't leak information about the content, but the outcome space is much smaller, so often less information is leaked.	2022-07-13 07:52:15 -07:00
Klaus Post	911a17b149	Add compressed file index (#15247 )	2022-07-11 17:30:56 -07:00
Praveen raj Mani	b49fc33cb3	purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148 )	2022-07-11 09:15:54 -07:00
Anis Elleuch	54a061bdda	Save minio version information centrally (#15181 )	2022-06-29 14:45:49 -07:00
Harshavardhana	9c605ad153	allow support for parity '0', '1' enabling support for 2,3 drive setups (#15171 ) allows for further granular setups - 2 drives (1 parity, 1 data) - 3 drives (1 parity, 2 data) Bonus: allows '0' parity as well.	2022-06-27 20:22:18 -07:00
Harshavardhana	6722f58668	save MinIO version with each version (8-bytes extra) (#15170 ) store MinIO version along with each version in 'xl.meta' for future purposes, can be used as ways to add specific code for bug fixes if any.	2022-06-27 03:59:41 -07:00
Minio Trusted	e2d4d097e7	do not print errors upon 'nil' err	2022-06-06 17:33:41 -07:00
Harshavardhana	df9eeb7f8f	fix: do not log concurrently when multiple disks return errors (#15044 ) since the values inside 'context' are mutated internally by logger, make sure to log serially upon errors not concurrently.	2022-06-06 15:15:11 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) We need to make sure if we cannot read bucket metadata for some reason, and bucket metadata is not missing and returning corrupted information we should panic such handlers to disallow I/O to protect the overall state on the system. In-case of such corruption we have a mechanism now to force recreate the metadata on the bucket, using `x-minio-force-create` header with `PUT /bucket` API call. Additionally fix the versioning config updated state to be set properly for the site replication healing to trigger correctly.	2022-05-31 02:57:57 -07:00
Harshavardhana	d480022711	fix: invalidate outdated disks appropriately during readAllXL (#15002 ) readAllXL would return inlined data for outdated disks causing "read" to return incorrect content to the client, this PR fixes this behavior by making sure we skip such outdated disks appropriately based on the latest ModTime on the disk.	2022-05-30 12:43:54 -07:00
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) Main motivation is move towards a common backend format for all different types of modes in MinIO, allowing for a simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication, transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Harshavardhana	38caddffe7	fix: copyObject on versioned bucket when updating metadata (#14971 ) updating metadata with CopyObject on a versioned bucket causes the latest version to be not readable, this PR fixes this properly by handling the inline data bug fix introduced in PR #14780. This bug affects only inlined data.	2022-05-24 17:27:45 -07:00
Harshavardhana	5cffd3780a	fix: multiple fixes in prefix exclude implementation (#14877 ) - do not need to restrict prefix exclusions that do not have `/` as suffix, relax this requirement as spark may have staging folders with other autogenerated characters , so we are better off doing full prefix March and skip. - multiple delete objects was incorrectly creating a null delete marker on a versioned bucket instead of creating a proper versioned delete marker. - do not suspend paths on the excluded prefixes during delete operations to avoid creating `null` delete markers, honor suspension of versioning only at bucket level for delete markers.	2022-05-07 22:06:44 -07:00
Krishnan Parthasarathi	ad8e611098	feat: implement prefix-level versioning exclusion (#14828 ) Spark/Hadoop workloads which use Hadoop MR Committer v1/v2 algorithm upload objects to a temporary prefix in a bucket. These objects are 'renamed' to a different prefix on Job commit. Object storage admins are forced to configure separate ILM policies to expire these objects and their versions to reclaim space. Our solution: This can be avoided by simply marking objects under these prefixes to be excluded from versioning, as shown below. Consequently, these objects are excluded from replication, and don't require ILM policies to prune unnecessary versions. - MinIO Extension to Bucket Version Configuration ```xml <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> <Status>Enabled</Status> <ExcludeFolders>true</ExcludeFolders> <ExcludedPrefixes> <Prefix>app1-jobs//_temporary/</Prefix> </ExcludedPrefixes> <ExcludedPrefixes> <Prefix>app2-jobs//__magic/</Prefix> </ExcludedPrefixes> <!-- .. up to 10 prefixes in all --> </VersioningConfiguration> ``` Note: `ExcludeFolders` excludes all folders in a bucket from versioning. This is required to prevent the parent folders from accumulating delete markers, especially those which are shared across spark workloads spanning projects/teams. - To enable version exclusion on a list of prefixes ``` mc version enable --excluded-prefixes "app1-jobs//_temporary/,app2-jobs//_magic," --exclude-prefix-marker myminio/test ```	2022-05-06 19:05:28 -07:00
Harshavardhana	c7df1ffc6f	avoid concurrent reads and writes to opts.UserDefined (#14862 ) do not modify opts.UserDefined after object-handler has set all the necessary values, any mutation needed should be done on a copy of this value not directly. As there are other pieces of code that access opts.UserDefined concurrently this becomes challenging. fixes #14856	2022-05-05 04:14:41 -07:00
Anis Elleuch	44a3b58e52	Add audit log for decommissioning (#14858 )	2022-05-04 00:45:27 -07:00
Harshavardhana	507f993075	attempt to real resolve when there is a quorum failure on reads (#14613 )	2022-04-20 12:49:05 -07:00
Harshavardhana	73a6a60785	fix: replication deleteObject() regression and CopyObject() behavior (#14780 ) This PR fixes two issues - The first fix is a regression from #14555, the fix itself in #14555 is correct but the interpretation of that information by the object layer code for "replication" was not correct. This PR tries to fix this situation by making sure the "Delete" replication works as expected when "VersionPurgeStatus" is already set. Without this fix, there is a DELETE marker created incorrectly on the source where the "DELETE" was triggered. - The second fix is perhaps an older problem started since we inlined-data on the disk for small objects, CopyObject() incorrectly inline's a non-inlined data. This is due to the fact that we have code where we read the `part.1` under certain conditions where the size of the `part.1` is less than the specific "threshold". This eventually causes problems when we are "deleting" the data that is only inlined, which means dataDir is ignored leaving such dataDir on the disk, that looks like an inconsistent content on the namespace. fixes #14767	2022-04-20 10:22:05 -07:00
Harshavardhana	153a612253	fetch bucket retention config once for ILM evalAction (#14727 ) This is mainly an optimization, does not change any existing functionality.	2022-04-11 13:25:32 -07:00
Krishnan Parthasarathi	7b81967a3c	Fix handling of object versions pending purge (#14555 ) - GetObject() with vid should return 405 - GetObject() without vid should return 404 - ListObjects() should ignore this object if this is the "latest" version of the object - ListObjectVersions() should list this object as "DELETE marker" - Remove data parts before sync'ing the version pending purge	2022-03-16 16:59:43 -07:00
Harshavardhana	0e3bafcc54	improve logs, fix banner formatting (#14456 )	2022-03-03 13:21:16 -08:00
Harshavardhana	9d7648f02f	reduce unnecessary logging during speedtest (#14387 ) - speedtest logs calls that were canceled spuriously, in situations where it should be ignored. - all errors of interest are always sent back to the client there is no need to log them on the server console. - PUT failures should negate the increments such that GET is not attempted on unsuccessful calls. - do not attempt MRF on speedtest objects.	2022-02-23 11:59:13 -08:00
Harshavardhana	f19a414e09	fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273 ) Deleting bulk objects had an issue since the relevant versionID is not passed through the layers to ensure that the dangling object purge actually works cleanly. This is a continuation of quorum related error returned by multi-object delete API from #14248 This PR ensures that we pass down correct information as well as extend the scope of dangling object detection.	2022-02-08 20:08:23 -08:00
Harshavardhana	aaea94a48d	update quorum requirement to list all objects (#14201 ) some upgraded objects might not get listed due to different quorum ratios across objects. make sure to list all objects that satisfy the maximum possible quorum.	2022-01-27 17:00:15 -08:00
Klaus Post	64d4da5a37	Add Put input readahead (#14084 ) When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs. This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers. We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from. This improves PutObject and PutObjectParts for these cases.	2022-01-14 10:01:25 -08:00
Harshavardhana	f546636c52	fix: use renameAll instead of deleteObject() for purging temporary files (#14096 ) This PR simplifies few things - Multipart parts are renamed, upon failure are unrenamed() keep this multipart specific behavior it is needed and works fine. - AbortMultipart should blindly delete once lock is acquired instead of re-reading metadata and calculating quorum, abort is a delete() operation and client has no business looking for errors on this. - Skip Access() calls to folders that are operating on `.minio.sys/multipart` folder as well.	2022-01-13 11:07:41 -08:00
Harshavardhana	38ccc4f672	fix: make sure to avoid calling RenameData() on disconnected disks. (#14094 ) Large clusters with multiple sets, or multi-pool setups at times might fail and report unexpected "file not found" errors. This can become a problem during startup sequence when some files need to be created at multiple locations. - This PR ensures that we nil the erasure writers such that they are skipped in RenameData() call. - RenameData() doesn't need to "Access()" calls for `.minio.sys` folders they always exist. - Make sure PutObject() never returns ObjectNotFound{} for any errors, make sure it always returns "WriteQuorum" when renameData() fails with ObjectNotFound{}. Return appropriate errors for all other cases.	2022-01-12 18:49:01 -08:00
Poorna	54a98773f8	fix: replication of tag removal (#14056 ) Currently tag removal leaves replication state as `PENDING` because the `HEAD` api returns just a tag count but not the actual tags, and this is treated as a no-op	2022-01-10 19:06:10 -08:00
Harshavardhana	76b21de0c6	feat: decommission feature for pools (#14012 ) ``` λ mc admin decommission start alias/ http://minio{1...2}/data{1...4} ``` ``` λ mc admin decommission status alias/ ┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐ │ ID │ Pools │ Capacity │ Status │ │ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │ │ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │ └─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘ ``` ``` λ mc admin decommission status alias/ http://minio{1...2}/data{1...4} Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB] Time Remaining: 4 hours (started 3 hours ago) ``` ``` λ mc admin decommission status alias/ http://minio{1...2}/data{1...4} ERROR: This pool is not scheduled for decommissioning currently. ``` ``` λ mc admin decommission cancel alias/ ┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐ │ ID │ Pools │ Capacity │ Status │ │ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │ └─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘ ``` > NOTE: Canceled decommission will not make the pool active again, since we might have > Potentially partial duplicate content on the other pools, to avoid this scenario be > very sure to start decommissioning as a planned activity. ``` λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4} ┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐ │ ID │ Pools │ Capacity │ Status │ │ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │ └─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘ ```	2022-01-10 09:07:49 -08:00
Klaus Post	0e31cff762	fix: DeleteMultipleObjects to finish even if cancelled + concurrent sets (#14038 ) * Process sets concurrently. * Disconnect context from request. * Insert context cancellation checks. * errFileNotFound and errFileVersionNotFound are ok, unless creating delete markers.	2022-01-06 10:47:49 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Harshavardhana	866a95de38	fix: choose appropriate quorum for a given erasure set (#13998 ) multiObject delete should honor expected quorum	2021-12-28 12:41:52 -08:00
Harshavardhana	54ec0a1308	add configurable delta for skipping shards (#13967 ) This PR is an attempt to make this configurable as not all situations have same level of tolerable delta, i.e disks are replaced days apart or even hours. There is also a possibility that nodes have drifted in time, when NTP is not configured on the system.	2021-12-22 11:43:01 -08:00
Harshavardhana	0e3037631f	skip inconsistent shards if possible (#13945 ) data shards were wrong due to a healing bug reported in #13803 mainly with unaligned object sizes. This PR is an attempt to automatically avoid these shards, with available information about the `xl.meta` and actually disk mtime.	2021-12-21 10:08:26 -08:00
Harshavardhana	b280a37c4d	add delete-marker proactively in DeleteObject() (#13795 ) single object delete was not working properly on a bucket when versioning was suspended, current version 'null' object was never removed. added unit tests to cover the behavior fixes #13783	2021-11-30 18:30:06 -08:00
Harshavardhana	c791de0e1e	re-implement pickValidInfo dataDir, move to quorum calculation (#13681 ) dataDir loosely based on maxima is incorrect and does not work in all situations such as disks in the following order - xl.json migration to xl.meta there may be partial xl.json's leftover if some disks are not yet connected when the disk is yet to come up, since xl.json mtime and xl.meta is same the dataDir maxima doesn't work properly leading to quorum issues. - its also possible that XLV1 might be true among the disks available, make sure to keep FileInfo based on common quorum and skip unexpected disks with the older data format. Also, this PR tests upgrade from older to a newer release if the data is readable and matches the checksum. NOTE: this is just initial work we can build on top of this to do further tests.	2021-11-21 10:41:30 -08:00
Harshavardhana	661b263e77	add gocritic/ruleguard checks back again, cleanup code. (#13665 ) - remove some duplicated code - reported a bug, separately fixed in #13664 - using strings.ReplaceAll() when needed - using filepath.ToSlash() use when needed - remove all non-Go style comments from the codebase Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2021-11-16 09:28:29 -08:00
Harshavardhana	bb639d9f29	remove double reads delete versions (#13544 ) deleting collection of versions belonging to same object, we can avoid re-reading the xl.meta from the disk instead purge all the requested versions in-memory, the tradeoff is to allocate a map to de-dup the versions, allow disks to be read only once per object. additionally reduce the data transfer between nodes by shortening msgp data values.	2021-11-01 10:50:07 -07:00
Harshavardhana	4ed0eb7012	remove double reads updating object metadata (#13542 ) Removes RLock/RUnlock for updating metadata, since we already take a write lock to update metadata, this change removes reading of xl.meta as well as an additional lock, the performance gain should increase 3x theoretically for - PutObjectRetention - PutObjectLegalHold This optimization is mainly for Veeam like workloads that require a certain level of iops from these API calls, we were losing iops.	2021-10-30 08:22:04 -07:00
Harshavardhana	d693431183	fix: ReadFileStream should return an error when size mismatches (#13435 ) offset+length should match the Size() of the individual parts return 'errFileCorrupt' otherwise, to trigger healing of the individual parts do not error out prematurely when healing such bitrot's upon successful parts being written to the client. another issue this PR fixes is to not return and error to the client if we have just triggered a heal on a specific part of the object, instead continue to read all the content and let the heal happen asynchronously later.	2021-10-13 19:49:14 -07:00
Harshavardhana	200caab82b	fix: multi-pool setup make sure acquire locks properly (#13280 ) This was a regression introduced in '14bb969782' this has the potential to cause corruption when there are concurrent overwrites attempting to update the content on the namespace. This PR adds a situation where PutObject(), CopyObject() compete properly for the same locks with NewMultipartUpload() however it ends up turning off competing locks for the actual object with GetObject() and DeleteObject() - since they do not compete due to concurrent I/O on a versioned bucket it can lead to loss of versions. This PR fixes this bug with multi-pool setup with replication that causes corruption of inlined data due to lack of competing locks in a multi-pool setup. Instead CompleteMultipartUpload holds the necessary locks when finishing the transaction, knowing the exact location of an object to schedule the multipart upload doesn't need to compete in this manner, a pool id location for existing object.	2021-09-22 21:46:24 -07:00
Krishnan Parthasarathi	31d7cc2cd4	erasure: Set fi.IsLatest when adding a new version (#13277 )	2021-09-22 19:17:09 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Harshavardhana	03b7bebc96	fix: invalid quorum calculation in TransitionObject (#13125 ) Quorum calculation should be based on the existing metadata, custom quorum calculation can lead to unreadable content.	2021-09-01 08:57:42 -07:00
Harshavardhana	35f2552fc5	reduce extra getObjectInfo() calls during ILM transition (#13091 ) * reduce extra getObjectInfo() calls during ILM transition This PR also changes expiration logic to be non-blocking, scanner is now free from additional costs incurred due to slower object layer calls and hitting the drives. * move verifying expiration inside locks	2021-08-27 17:06:47 -07:00
Harshavardhana	bbf3576f70	remove unecessary metadata structs in applyTransitionAction() (#13059 )	2021-08-24 12:24:00 -07:00
Poorna Krishnamoorthy	674c6f7a7b	fix: resync of replication of delete markers (#12932 ) Fixes #12919	2021-08-23 14:48:22 -07:00

1 2 3 4

180 Commits