minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	e0f4dd6027	remove unnecessary logs from WalkDir(), PutObject() (#16818 )	2023-03-15 11:52:23 -07:00
ferhat elmas	714283fae2	cleanup ignored static analysis (#16767 )	2023-03-06 08:56:10 -08:00
Klaus Post	9acf1024e4	Remove bloom filter (#16682 ) Removes the bloom filter since it has such limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner. Also saves a tiny bit of CPU on each write operation.	2023-02-24 09:03:31 +05:30
Harshavardhana	a0f06eac2a	add Veeam SOS API first implementation (#16688 )	2023-02-22 19:54:57 +05:30
Krishnan Parthasarathi	d136ac0596	Don't close transition task channel on server exit (#16627 )	2023-02-15 22:09:25 -08:00
Krishnan Parthasarathi	cea2ca8c8e	Add restore-status header for multipart objects (#16508 )	2023-01-31 07:53:45 +05:30
Harshavardhana	67fce4a5b3	fix: dangling delete() upon success should return 404 (#16494 )	2023-01-27 12:43:45 -08:00
Anis Elleuch	d98116559b	Use async healing in PutObject call (#16431 )	2023-01-19 00:54:22 -08:00
Harshavardhana	2937711390	fix: DeleteObject() API with versionId under replication (#16325 )	2022-12-28 22:48:33 -08:00
Anis Elleuch	acc9c033ed	debug: Add X-Amz-Request-ID to lock/unlock calls (#16309 )	2022-12-23 19:49:07 -08:00
Krishnan Parthasarathi	2fa35def2c	Fix DeleteObject when only free versions remain (#16289 )	2022-12-21 16:24:07 -08:00
Anis Elleuch	89db3fdb5d	Do not return an error when version disparity is detected (#16269 )	2022-12-16 08:52:12 -08:00
Harshavardhana	dfe73629a3	fix: delete marker discrepancies via DeleteObject() API (#16195 )	2022-12-08 18:15:16 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Klaus Post	a713aee3d5	Run staticcheck on CI (#16170 )	2022-12-05 11:18:50 -08:00
Klaus Post	1cd875de1e	Persist updated metadata (#16160 )	2022-12-02 08:35:04 -08:00
Poorna	63fc6ba2cd	preserve replicated ETag properly on target (#16129 )	2022-11-26 14:43:32 -08:00
Harshavardhana	91f45c4aa6	avoid inconsistent versions healing when versions are large (#16066 )	2022-11-14 18:35:26 -08:00
Anis Elleuch	7260241511	Remove some logs caused by external apps (#16027 )	2022-11-08 13:29:05 -08:00
Harshavardhana	fd6f6fc8df	cleanup stale parent multipart directories (#15980 )	2022-11-01 08:00:02 -07:00
Harshavardhana	136d41775f	remove numAvailableDisks check as it doesn't serve any purpose (#15954 )	2022-10-27 09:05:24 -07:00
Poorna	7dd8b6c8ed	ensure ILM expiry creates non null deleteMarker for versioned bucket (#15947 )	2022-10-26 16:09:27 -07:00
Anis Elleuch	fc6c794972	Audit dangling object removal (#15933 )	2022-10-24 11:35:07 -07:00
Anis Elleuch	ac85c2af76	lifecycle: refactor rules filtering and tagging support (#15914 )	2022-10-21 10:46:53 -07:00
Harshavardhana	928feb0889	remove unused debug param from evalActionFromLifecycle (#15813 )	2022-10-07 10:24:12 -07:00
Harshavardhana	9e5853ecc0	optimize double reads by reusing results from checkUploadIDExists() (#15692 ) Move to using the `xl.meta` data structure to keep temporary partInfo; this allows for a future change where different parts are moved to different drives.	2022-09-15 12:43:49 -07:00
Harshavardhana	124544d834	add pre-conditions support for PUT calls during replication (#15674 ) PUT proceeds only if the pre-conditions are met. The new code uses - x-minio-source-mtime - x-minio-source-etag to verify whether the object actually needs to be replicated, which avoids a StatObject() call.	2022-09-14 18:44:04 -07:00
Harshavardhana	8e997eba4a	fix: trigger Heal when xl.meta needs healing during PUT (#15661 ) This PR is a continuation of the previous change: instead of returning an error, trigger a spot heal on the 'xl.meta' and return only after the healing is complete. This allows future GETs on the same resource to be consistent for any version of the object.	2022-09-07 07:25:39 -07:00
Harshavardhana	2d9b5a65f1	verify RenameData() versions to be consistent (#15649 ) xl.meta gets written and never rolled back, but we still need to validate the state persisted on the disk. If there are inconsistencies: - if they prevent write quorum, we should return an error to the client - if write quorum was achieved but there are inconsistent xl.meta's, we should simply trigger an MRF on them	2022-09-05 16:51:37 -07:00
Harshavardhana	5ea629beb2	avoid printing io.ErrUnexpectedEOF for .metacache objects (#15642 )	2022-09-02 12:47:17 -07:00
Klaus Post	8e4a45ec41	fix: encrypt checksums in metadata (#15620 )	2022-08-31 08:13:23 -07:00
Klaus Post	a9f1ad7924	Add extended checksum support (#15433 )	2022-08-29 16:57:16 -07:00
Poorna	471467d310	fix: ensure metadata update happens after deletemarker replication (#15564 ) Fixes regression caused by #15521	2022-08-22 15:59:06 -07:00
Harshavardhana	d350b666ff	feat: add idempotent delete marker support (#15521 ) The bottom line is that delete markers are a nuisance: most applications are not version aware, and this has simply complicated version management. AWS S3 has added unnecessary complexity for customers, who now need to manage these markers by applying ILM settings and cleaning them up on a regular basis. To make matters worse, all these delete markers get replicated as well in a replicated setup, requiring two ILM settings on each site. This PR is an attempt to address this inferior implementation by deviating MinIO towards an idempotent delete marker implementation, i.e. MinIO will never create more than a single consecutive delete marker. This significantly reduces operational overhead by making versioning more useful for real data. This is an S3 spec deviation for pragmatic reasons.	2022-08-18 16:41:59 -07:00
Anis Elleuch	b3edb25377	bloom: healObject to mark a path dirty only for dangling objects (#15458 ) The path is marked dirty automatically when healObject() is called, which is wrong. HealObject() is called during self-healing and this will lead to an increase in the false positive result of the bloom filter. Also move NSUpdated() from renameData() and call it directly in CompleteMultipart and PutObject, this is not a functional change but it will make it less prone to errors in the future.	2022-08-02 16:57:39 -07:00
Harshavardhana	aa874010e2	fix: regression in resolving the right versions (#15430 ) commit `d480022711` caused a regression in the real resolver by picking up an incorrect versionID.	2022-07-29 10:03:53 -07:00
Harshavardhana	ce8397f7d9	use partInfo only for intermediate part.x.meta (#15353 )	2022-07-19 18:56:24 -07:00
Harshavardhana	7da9e3a6f8	support encrypted/compressed objects properly during decommission (#15320 ) fixes #15314	2022-07-16 19:35:24 -07:00
Klaus Post	0149382cdc	Add padding to compressed+encrypted files (#15282 ) Add up to 256 bytes of padding for compressed+encrypted files. This will obscure the obvious cases of extremely compressible content and leave a similar output size for a very wide variety of inputs. This does not mean the compression ratio doesn't leak information about the content, but the outcome space is much smaller, so often less information is leaked.	2022-07-13 07:52:15 -07:00
Klaus Post	911a17b149	Add compressed file index (#15247 )	2022-07-11 17:30:56 -07:00
Praveen raj Mani	b49fc33cb3	purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148 )	2022-07-11 09:15:54 -07:00
Anis Elleuch	54a061bdda	Save minio version information centrally (#15181 )	2022-06-29 14:45:49 -07:00
Harshavardhana	9c605ad153	allow support for parity '0', '1' enabling support for 2,3 drive setups (#15171 ) allows for further granular setups - 2 drives (1 parity, 1 data) - 3 drives (1 parity, 2 data) Bonus: allows '0' parity as well.	2022-06-27 20:22:18 -07:00
Harshavardhana	6722f58668	save MinIO version with each version (8-bytes extra) (#15170 ) store MinIO version along with each version in 'xl.meta' for future purposes, can be used as ways to add specific code for bug fixes if any.	2022-06-27 03:59:41 -07:00
Minio Trusted	e2d4d097e7	do not print errors upon 'nil' err	2022-06-06 17:33:41 -07:00
Harshavardhana	df9eeb7f8f	fix: do not log concurrently when multiple disks return errors (#15044 ) since the values inside 'context' are mutated internally by logger, make sure to log serially upon errors not concurrently.	2022-06-06 15:15:11 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) If we cannot read bucket metadata for some reason, and the bucket metadata is not missing but returns corrupted information, we should panic such handlers to disallow I/O and protect the overall state of the system. In case of such corruption, we now have a mechanism to force-recreate the metadata on the bucket, using the `x-minio-force-create` header with the `PUT /bucket` API call. Additionally, fix the versioning config updated state to be set properly so that site replication healing triggers correctly.	2022-05-31 02:57:57 -07:00
Harshavardhana	d480022711	fix: invalidate outdated disks appropriately during readAllXL (#15002 ) readAllXL would return inlined data for outdated disks causing "read" to return incorrect content to the client, this PR fixes this behavior by making sure we skip such outdated disks appropriately based on the latest ModTime on the disk.	2022-05-30 12:43:54 -07:00
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) The main motivation is to move towards a common backend format for all the different modes in MinIO, allowing for simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication and transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Harshavardhana	38caddffe7	fix: copyObject on versioned bucket when updating metadata (#14971 ) updating metadata with CopyObject on a versioned bucket makes the latest version unreadable; this PR fixes this properly by handling the inline data bug fix introduced in PR #14780. This bug affects only inlined data.	2022-05-24 17:27:45 -07:00
Harshavardhana	5cffd3780a	fix: multiple fixes in prefix exclude implementation (#14877 ) - do not require excluded prefixes to end with `/`; relax this requirement, as Spark may have staging folders with other autogenerated characters, so we are better off doing a full prefix match and skip. - multiple delete objects was incorrectly creating a null delete marker on a versioned bucket instead of creating a proper versioned delete marker. - do not suspend paths on the excluded prefixes during delete operations to avoid creating `null` delete markers; honor suspension of versioning only at bucket level for delete markers.	2022-05-07 22:06:44 -07:00
Krishnan Parthasarathi	ad8e611098	feat: implement prefix-level versioning exclusion (#14828 ) Spark/Hadoop workloads which use Hadoop MR Committer v1/v2 algorithm upload objects to a temporary prefix in a bucket. These objects are 'renamed' to a different prefix on Job commit. Object storage admins are forced to configure separate ILM policies to expire these objects and their versions to reclaim space. Our solution: This can be avoided by simply marking objects under these prefixes to be excluded from versioning, as shown below. Consequently, these objects are excluded from replication, and don't require ILM policies to prune unnecessary versions. - MinIO Extension to Bucket Version Configuration ```xml <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> <Status>Enabled</Status> <ExcludeFolders>true</ExcludeFolders> <ExcludedPrefixes> <Prefix>app1-jobs//_temporary/</Prefix> </ExcludedPrefixes> <ExcludedPrefixes> <Prefix>app2-jobs//__magic/</Prefix> </ExcludedPrefixes> <!-- .. up to 10 prefixes in all --> </VersioningConfiguration> ``` Note: `ExcludeFolders` excludes all folders in a bucket from versioning. This is required to prevent the parent folders from accumulating delete markers, especially those which are shared across spark workloads spanning projects/teams. - To enable version exclusion on a list of prefixes ``` mc version enable --excluded-prefixes "app1-jobs//_temporary/,app2-jobs//_magic," --exclude-prefix-marker myminio/test ```	2022-05-06 19:05:28 -07:00
Harshavardhana	c7df1ffc6f	avoid concurrent reads and writes to opts.UserDefined (#14862 ) do not modify opts.UserDefined after object-handler has set all the necessary values, any mutation needed should be done on a copy of this value not directly. As there are other pieces of code that access opts.UserDefined concurrently this becomes challenging. fixes #14856	2022-05-05 04:14:41 -07:00
Anis Elleuch	44a3b58e52	Add audit log for decommissioning (#14858 )	2022-05-04 00:45:27 -07:00
Harshavardhana	507f993075	attempt to real resolve when there is a quorum failure on reads (#14613 )	2022-04-20 12:49:05 -07:00
Harshavardhana	73a6a60785	fix: replication deleteObject() regression and CopyObject() behavior (#14780 ) This PR fixes two issues - The first is a regression from #14555; the fix itself in #14555 is correct, but the object layer code's interpretation of that information for "replication" was not. This PR fixes this by making sure "Delete" replication works as expected when "VersionPurgeStatus" is already set. Without this fix, a DELETE marker is incorrectly created on the source where the "DELETE" was triggered. - The second is perhaps an older problem, present since we started inlining data on the disk for small objects: CopyObject() incorrectly inlines non-inlined data. This is because we have code that reads `part.1` under certain conditions where the size of `part.1` is less than a specific "threshold". This eventually causes problems when "deleting" data that is only inlined: the dataDir is ignored and left on the disk, which looks like inconsistent content on the namespace. fixes #14767	2022-04-20 10:22:05 -07:00
Harshavardhana	153a612253	fetch bucket retention config once for ILM evalAction (#14727 ) This is mainly an optimization, does not change any existing functionality.	2022-04-11 13:25:32 -07:00
Krishnan Parthasarathi	7b81967a3c	Fix handling of object versions pending purge (#14555 ) - GetObject() with vid should return 405 - GetObject() without vid should return 404 - ListObjects() should ignore this object if this is the "latest" version of the object - ListObjectVersions() should list this object as "DELETE marker" - Remove data parts before sync'ing the version pending purge	2022-03-16 16:59:43 -07:00
Harshavardhana	0e3bafcc54	improve logs, fix banner formatting (#14456 )	2022-03-03 13:21:16 -08:00
Harshavardhana	9d7648f02f	reduce unnecessary logging during speedtest (#14387 ) - speedtest logs calls that were canceled spuriously, in situations where it should be ignored. - all errors of interest are always sent back to the client there is no need to log them on the server console. - PUT failures should negate the increments such that GET is not attempted on unsuccessful calls. - do not attempt MRF on speedtest objects.	2022-02-23 11:59:13 -08:00
Harshavardhana	f19a414e09	fix: allow dangling objects to be purged properly in deleteMultipleObjects() (#14273 ) Deleting bulk objects had an issue since the relevant versionID was not passed through the layers, so the dangling object purge did not work cleanly. This is a continuation of the quorum-related error returned by the multi-object delete API from #14248. This PR ensures that we pass down correct information as well as extend the scope of dangling object detection.	2022-02-08 20:08:23 -08:00
Harshavardhana	aaea94a48d	update quorum requirement to list all objects (#14201 ) some upgraded objects might not get listed due to different quorum ratios across objects. make sure to list all objects that satisfy the maximum possible quorum.	2022-01-27 17:00:15 -08:00
Klaus Post	64d4da5a37	Add Put input readahead (#14084 ) When reading input for PutObject or PutObjectPart, add a readahead buffer for big inputs. This makes network reads and hashing run asynchronously, separately from erasure coding and writes. This will reduce overall latency in distributed setups where the input comes from upstream and writes go to other servers. We will read 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from. This improves PutObject and PutObjectPart for these cases.	2022-01-14 10:01:25 -08:00
Harshavardhana	f546636c52	fix: use renameAll instead of deleteObject() for purging temporary files (#14096 ) This PR simplifies a few things - Multipart parts are renamed and, upon failure, unrenamed(); keep this multipart-specific behavior, as it is needed and works fine. - AbortMultipart should blindly delete once the lock is acquired instead of re-reading metadata and calculating quorum; abort is a delete() operation and the client has no business looking for errors on it. - Skip Access() calls for folders that operate on the `.minio.sys/multipart` folder as well.	2022-01-13 11:07:41 -08:00
Harshavardhana	38ccc4f672	fix: make sure to avoid calling RenameData() on disconnected disks. (#14094 ) Large clusters with multiple sets, or multi-pool setups, may at times fail and report unexpected "file not found" errors. This can become a problem during the startup sequence when some files need to be created at multiple locations. - This PR ensures that we nil the erasure writers such that they are skipped in the RenameData() call. - RenameData() doesn't need "Access()" calls for `.minio.sys` folders; they always exist. - Make sure PutObject() never returns ObjectNotFound{} for any errors; it should always return "WriteQuorum" when renameData() fails with ObjectNotFound{}. Return appropriate errors for all other cases.	2022-01-12 18:49:01 -08:00
Poorna	54a98773f8	fix: replication of tag removal (#14056 ) Currently tag removal leaves replication state as `PENDING` because the `HEAD` api returns just a tag count but not the actual tags, and this is treated as a no-op	2022-01-10 19:06:10 -08:00
Harshavardhana	76b21de0c6	feat: decommission feature for pools (#14012 ) ``` λ mc admin decommission start alias/ http://minio{1...2}/data{1...4} ``` ``` λ mc admin decommission status alias/ ┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐ │ ID │ Pools │ Capacity │ Status │ │ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │ │ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │ └─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘ ``` ``` λ mc admin decommission status alias/ http://minio{1...2}/data{1...4} Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB] Time Remaining: 4 hours (started 3 hours ago) ``` ``` λ mc admin decommission status alias/ http://minio{1...2}/data{1...4} ERROR: This pool is not scheduled for decommissioning currently. ``` ``` λ mc admin decommission cancel alias/ ┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐ │ ID │ Pools │ Capacity │ Status │ │ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │ └─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘ ``` > NOTE: Canceled decommission will not make the pool active again, since we might have > Potentially partial duplicate content on the other pools, to avoid this scenario be > very sure to start decommissioning as a planned activity. ``` λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4} ┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐ │ ID │ Pools │ Capacity │ Status │ │ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │ └─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘ ```	2022-01-10 09:07:49 -08:00
Klaus Post	0e31cff762	fix: DeleteMultipleObjects to finish even if cancelled + concurrent sets (#14038 ) * Process sets concurrently. * Disconnect context from request. * Insert context cancellation checks. * errFileNotFound and errFileVersionNotFound are ok, unless creating delete markers.	2022-01-06 10:47:49 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Harshavardhana	866a95de38	fix: choose appropriate quorum for a given erasure set (#13998 ) multiObject delete should honor expected quorum	2021-12-28 12:41:52 -08:00
Harshavardhana	54ec0a1308	add configurable delta for skipping shards (#13967 ) This PR is an attempt to make this configurable as not all situations have same level of tolerable delta, i.e disks are replaced days apart or even hours. There is also a possibility that nodes have drifted in time, when NTP is not configured on the system.	2021-12-22 11:43:01 -08:00
Harshavardhana	0e3037631f	skip inconsistent shards if possible (#13945 ) data shards were wrong due to a healing bug reported in #13803, mainly with unaligned object sizes. This PR is an attempt to automatically avoid these shards, using the available information about the `xl.meta` and the actual disk mtime.	2021-12-21 10:08:26 -08:00
Harshavardhana	b280a37c4d	add delete-marker proactively in DeleteObject() (#13795 ) single object delete was not working properly on a bucket when versioning was suspended, current version 'null' object was never removed. added unit tests to cover the behavior fixes #13783	2021-11-30 18:30:06 -08:00
Harshavardhana	c791de0e1e	re-implement pickValidInfo dataDir, move to quorum calculation (#13681 ) dataDir loosely based on maxima is incorrect and does not work in all situations, such as disks in the following order - xl.json migration to xl.meta: there may be partial xl.json's left over if some disks are not yet connected when the disk is yet to come up; since xl.json mtime and xl.meta mtime are the same, the dataDir maxima doesn't work properly, leading to quorum issues. - it's also possible that XLV1 might be true among the disks available; make sure to keep FileInfo based on common quorum and skip unexpected disks with the older data format. Also, this PR tests upgrading from an older to a newer release, checking that the data is readable and matches the checksum. NOTE: this is just initial work; we can build on top of this to do further tests.	2021-11-21 10:41:30 -08:00
Harshavardhana	661b263e77	add gocritic/ruleguard checks back again, cleanup code. (#13665 ) - remove some duplicated code - reported a bug, separately fixed in #13664 - using strings.ReplaceAll() when needed - using filepath.ToSlash() use when needed - remove all non-Go style comments from the codebase Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2021-11-16 09:28:29 -08:00
Harshavardhana	bb639d9f29	remove double reads delete versions (#13544 ) deleting collection of versions belonging to same object, we can avoid re-reading the xl.meta from the disk instead purge all the requested versions in-memory, the tradeoff is to allocate a map to de-dup the versions, allow disks to be read only once per object. additionally reduce the data transfer between nodes by shortening msgp data values.	2021-11-01 10:50:07 -07:00
Harshavardhana	4ed0eb7012	remove double reads updating object metadata (#13542 ) Removes RLock/RUnlock for updating metadata: since we already take a write lock to update metadata, this change removes the reading of xl.meta as well as an additional lock. The performance gain should theoretically be 3x for - PutObjectRetention - PutObjectLegalHold This optimization is mainly for Veeam-like workloads that require a certain level of IOPS from these API calls, where we were losing IOPS.	2021-10-30 08:22:04 -07:00
Harshavardhana	d693431183	fix: ReadFileStream should return an error when size mismatches (#13435 ) offset+length should match the Size() of the individual parts; return 'errFileCorrupt' otherwise, to trigger healing of the individual parts. Do not error out prematurely when healing such bitrot after successful parts have been written to the client. Another issue this PR fixes: do not return an error to the client if we have just triggered a heal on a specific part of the object; instead, continue to read all the content and let the heal happen asynchronously later.	2021-10-13 19:49:14 -07:00
Harshavardhana	200caab82b	fix: multi-pool setup make sure acquire locks properly (#13280 ) This was a regression introduced in '14bb969782' this has the potential to cause corruption when there are concurrent overwrites attempting to update the content on the namespace. This PR adds a situation where PutObject(), CopyObject() compete properly for the same locks with NewMultipartUpload() however it ends up turning off competing locks for the actual object with GetObject() and DeleteObject() - since they do not compete due to concurrent I/O on a versioned bucket it can lead to loss of versions. This PR fixes this bug with multi-pool setup with replication that causes corruption of inlined data due to lack of competing locks in a multi-pool setup. Instead CompleteMultipartUpload holds the necessary locks when finishing the transaction, knowing the exact location of an object to schedule the multipart upload doesn't need to compete in this manner, a pool id location for existing object.	2021-09-22 21:46:24 -07:00
Krishnan Parthasarathi	31d7cc2cd4	erasure: Set fi.IsLatest when adding a new version (#13277 )	2021-09-22 19:17:09 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Harshavardhana	03b7bebc96	fix: invalid quorum calculation in TransitionObject (#13125 ) Quorum calculation should be based on the existing metadata, custom quorum calculation can lead to unreadable content.	2021-09-01 08:57:42 -07:00
Harshavardhana	35f2552fc5	reduce extra getObjectInfo() calls during ILM transition (#13091 ) * reduce extra getObjectInfo() calls during ILM transition This PR also changes expiration logic to be non-blocking, scanner is now free from additional costs incurred due to slower object layer calls and hitting the drives. * move verifying expiration inside locks	2021-08-27 17:06:47 -07:00
Harshavardhana	bbf3576f70	remove unnecessary metadata structs in applyTransitionAction() (#13059 )	2021-08-24 12:24:00 -07:00
Poorna Krishnamoorthy	674c6f7a7b	fix: resync of replication of delete markers (#12932 ) Fixes #12919	2021-08-23 14:48:22 -07:00
Krishnan Parthasarathi	b7e3651d3c	Set free-version id in case of version/version-suspended buckets (#12982 ) This free-version id may be used to track tiered object contents of the object (version) being deleted.	2021-08-17 08:59:48 -07:00
Klaus Post	24722ddd02	Remove inline data hack (#12946 ) move the code down to the storage layer, this logic decouples the inline data from the size parameter making it flexible and future proof.	2021-08-13 08:25:54 -07:00
Klaus Post	3eac02f676	Use metadata reader in ReadVersion (#12942 ) Use `readMetadata` when reading version information without data requested. Reduces IO on inlined data. Bonus: Inline compressed data as well when compression is enabled.	2021-08-12 10:05:24 -07:00
Harshavardhana	40a2fa8e81	fix: add more optimizations to putMetacacheObject() (#12916 ) - avoid extra lookup for 'xl.meta' since we are definitely sure that it doesn't exist. - use this in newMultipartUpload() as well - also additionally do not write with O_DSYNC to avoid loading the drives, instead create 'xl.meta' for listing operations without O_DSYNC since these are ephemeral objects. - do the same with newMultipartUpload() since it gets synced when the PutObjectPart() is attempted, we do not need to tax newMultipartUpload() instead.	2021-08-10 11:12:22 -07:00
Harshavardhana	54ab3a1d5b	implement putMetacacheObject() optimizing List operations (#12903 ) removes unexpected features from regular putObject() such as - increasing parity when disks are down, avoids a lot of DiskInfo() calls. - triggering MRF for metacache objects if disks are offline - avoiding renames from temporary location to actual namespace, not needed since metacache files are unique.	2021-08-09 06:58:54 -07:00
Harshavardhana	035882d292	fix: remove parentIsObject() check (#12851 ) we will allow situations such as ``` a/b/1.txt a/b ``` and ``` a/b a/b/1.txt ``` we are going to document that this usecase is not supported and we will never support it, if any application does this users have to delete the top level parent to make sure namespace is accessible at lower level. rest of the situations where the prefixes get created across sets are supported as is.	2021-08-03 13:26:57 -07:00
Krishnan Parthasarathi	0a62ae4e61	Revert ignoring inlined objects for transition (#12843 )	2021-07-30 16:45:17 -07:00
Harshavardhana	a51799d9f0	feat: Add support for audit notifications for transition (#12842 ) This PR adds audit notifications for transitioning objects, similar to audit logging for expiration and replication traffic.	2021-07-30 12:45:25 -07:00
Harshavardhana	f175ff8f66	add healing fixes for delete-marker (#12788 ) - delete-markers are incorrectly reported as corrupt with wrong data sent to client 'mc admin heal -r' on objects with delete marker will report as 'grey' incorrectly. - do not heal delete-markers during HeadObject() this can lead to inconsistent order of heals on the object, although this is not an issue in terms of order of versions it is rather simpler to keep the same order on all drives. - defaultHealResult() should handle 'err == nil' case such that valid cases should be handled as 'drive' status OK.	2021-07-26 08:01:41 -07:00
Krishnan Parthasarathi	6ea083d197	Add deployment-id and source bucket to transitioned object name (#12693 ) This allows remote bucket admin to identify the origin of transitioned objects by simply inspecting the object prefixes. e.g let's take a remote tier TIER-1 pointing to a remote bucket (prefix) testbucket/testprefix-1. The remote bucket admin can list all transitioned objects from a MinIO deployment identified by '2e78e906-1c5d-4f94-8689-9df44cafde39' and source bucket 'mybucket' like so, ``` $ ./mc ls -r minio-tier-target/testbucket/testprefix-1/2e78e906-1c5d-4f94-8689-9df44cafde39/mybucket/ [2021-07-12 17:15:50 PDT] 160B 48/fb/48fbc0e6-3a73-458b-9337-8e722c619ca4 [2021-07-12 16:58:46 PDT] 160B 7d/1c/7d1c96bd-031a-48d4-99ea-b1304e870830 ```	2021-07-20 10:49:52 -07:00
Krishnan Parthasarathi	29eea52e14	Skip transitioning of object versions if inlined (#12705 )	2021-07-16 09:38:27 -07:00
Anis Elleuch	b0b4696a64	heal: Add MRF metrics to background heal API response (#12398 ) This commit gathers MRF metrics from all nodes in a cluster and return it to the caller. This will show information about the number of objects in the MRF queues waiting to be healed.	2021-07-15 22:32:06 -07:00
Harshavardhana	affee27b05	fix: speed up erasure code upgrade checks (#12626 ) DiskInfo() calls can stagger and wait if run serially, timing out at 10 secs per drive. To avoid this, let's check DiskInfo in parallel to avoid delays when nodes get disconnected.	2021-07-08 01:04:37 -07:00
Krishnan Parthasarathi	a1df230518	Add a 'free' version to track deletion of tiered object content (#12470 )	2021-06-30 19:32:07 -07:00
Harshavardhana	41caf89cf4	fix: apply pre-conditions first on object metadata (#12545 ) This change in error flow complies with AWS S3 behavior for applications depending on specific error conditions. fixes #12543	2021-06-24 09:44:00 -07:00
Anis Elleuch	7722b91e1d	s3: Force a prefix removal using a special header (#12504 ) An S3 client can send `x-minio-force-delete: true` to remove a prefix.	2021-06-15 18:43:14 -07:00
Klaus Post	d524544494	Fix nil disk check in parity upgrade feature (#12444 ) Fixes #12443	2021-06-04 09:38:19 -07:00
Aditya Manthramurthy	30a3921d3e	[Tiering] Support remote tiers with object versioning (#12342 ) - Adds versioning support for S3 based remote tiers that have versioning enabled. This ensures that when reading or deleting we specify the specific version ID of the object. In case of deletion, this is important to ensure that the object version is actually deleted instead of simply being marked for deletion. - Stores the remote object's version id in the tier-journal. Tier-journal file version is not bumped up as serializing the new struct version is compatible with old journals without the remote object version id. - `storageRESTVersion` is bumped up as FileInfo struct now includes a `TransitionRemoteVersionID` member. - Azure and GCS support for this feature will be added subsequently. Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>	2021-06-03 14:26:51 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	81d5688d56	move the dependency to minio/pkg for common libraries (#12397 )	2021-05-28 15:17:01 -07:00
Harshavardhana	89bb9f17d7	fix: when parityDrives hits > len(storageDisks)/2, keep maxParity (#12387 ) Additionally move out `x-minio-internal-erasure-upgraded` from the HTTP headers list, as it's an internal header, and rename it elsewhere accordingly.	2021-05-27 13:38:04 -07:00
Klaus Post	acc452b7ce	Add more erasure codes on degraded systems. (#11852 ) In cases where a cluster is degraded, we do not uphold our consistency guarantee and we will write fewer erasure codes and rely on healing to recreate the missing shards. In some cases, replacing known bad disks in practice takes days. We want to change the behavior of a known degraded system to keep the erasure code promise of the storage class for each object. This will create the objects with the same confidence as a fully functional cluster. The tradeoff will be that objects created during a partial outage will take up slightly more space. This means that when the storage class is EC:4, 4 parity shards should always be written, even if some disks are unavailable. When an object is created on a set, the disks are immediately checked. If any disks are unavailable, additional parity shards will be made for each offline disk, up to 50% of the number of disks. We add an internal metadata field with the actual and intended erasure code level; this can optionally be picked up later by the scanner if we decide that data like this should be re-sharded.	2021-05-27 11:38:09 -07:00
Harshavardhana	4840974d7a	fix: inline data upon overwrites should be readable (#12369 ) This PR fixes two bugs - Remove fi.Data upon overwrite of objects from inlined-data to non-inlined-data - Workaround for an existing bug on disk with latest releases to ignore fi.Data and instead read from the disk for non-inlined-data - Additionally add a reserved metadata header to indicate data is inlined for a given version.	2021-05-25 16:33:06 -07:00
Klaus Post	cde6469b88	Fix hanging erasure writes (#12253 ) However, this slice is also used for closing the writers, so close is never called on these. Furthermore when an error is returned from a write it is now reported to the reader. bonus: remove unused heal param from `newBitrotWriter`. * Remove copy, now that we don't mutate.	2021-05-17 08:32:28 -07:00
Harshavardhana	f1e479d274	remove more duplicate bloom filter trackers (#12302 ) At some places bloom filter tracker was getting updated for `.minio.sys/tmp` bucket, there is no reason to update bloom filters for those. And add a missing bloom filter update for MakeBucket() Bonus: purge unused function deleteEmptyDir()	2021-05-17 08:25:48 -07:00
Harshavardhana	2ab9dc7609	do not update bloomFilters for temporary objects	2021-05-15 19:54:07 -07:00
Harshavardhana	d84261aa6d	fix: ensure proper usage of DataDir (#12300 ) - GetObject() should always use a common dataDir to read from when it starts reading, this allows the code in erasure decoding to have sane expectations. - Healing should always heal on the common dataDir, this allows the code in dangling object detection to purge dangling content. Both situations can happen under certain types of retries during PUT when the server is restarting, etc., where some namespace entries might be left over.	2021-05-14 16:50:47 -07:00
Harshavardhana	e84f533c6c	add missing wait groups for certain io.Pipe() usage (#12264 ) wait groups are necessary with io.Pipes() to avoid races when a blocking function may not be expected and a Write() -> Close() before Read() race with each other. We should avoid such situations. Co-authored-by: Klaus Post <klauspost@gmail.com>	2021-05-11 09:18:37 -07:00
Harshavardhana	1aa5858543	move madmin to github.com/minio/madmin-go (#12239 )	2021-05-06 08:52:02 -07:00
Krishnan Parthasarathi	860bf1bab2	Add IsRemote method on FileInfo, ObjectInfo (#12209 ) Provides a convenient method to know if an object's contents are in its remote tier.	2021-05-04 08:40:42 -07:00
Harshavardhana	0d3ddf7286	fix: improve NewObjectReader implementation for careful cleanup usage (#12199 ) cleanup functions should never be cleaned before the reader is instantiated, this type of design leads to situations where order of lockers and places for them to use becomes confusing. Allow WithCleanupFuncs() if the caller wishes to add cleanupFns to be run upon close() or an error during initialization of the reader. Also make sure streams are closed before we unlock the resources, this allows for ordered cleanup of resources.	2021-04-30 18:37:58 -07:00
Harshavardhana	64f6020854	fix: cleanup locking, cancel context upon lock timeout (#12183 ) Upon errors to acquire the lock, the context would still leak, since cancel would never be called. Since the lock is never acquired, proactively clear it before returning.	2021-04-29 20:55:21 -07:00
Harshavardhana	336c8ac99f	fix: do not heal when disks are down (#12186 ) HeadObject() was erroneously attempting a heal when disks are down, avoid it.	2021-04-29 09:54:16 -07:00
Anis Elleuch	9e797532dc	lock: Always cancel the returned Get(R)Lock context (#12162 ) * lock: Always cancel the returned Get(R)Lock context There is a leak with cancel created inside the locking mechanism. The cancel purpose was to cancel operations such as erasure get/put that are holding non-refreshable locks. This PR will ensure the created context.Cancel is passed to the unlock API so it will cleanup and avoid leaks. * locks: Avoid returning nil cancel in local lockers Since there is no Refresh mechanism in the local locking mechanism, we do not generate a new context or cancel. Currently, a nil cancel function is returned but this can cause a crash. Return a dummy function instead.	2021-04-27 16:12:50 -07:00
Poorna Krishnamoorthy	4be0f92067	Fix multipart restore to remove part match (#12161 ) Part ETags are not available after multipart finalizes, removing this check as not useful. Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-26 18:24:06 -07:00
Harshavardhana	2966823818	use jsoniter for json marshal/unmarshal in KMS (#12146 ) Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-26 16:01:52 -07:00
Harshavardhana	4eb9b6eaf8	preserve metadata multipart restore (#12139 ) Avoid re-reading xl.meta; instead just use the success criteria from PutObjectPart() and check that the ETag matches per part. If they match, then the parts have been successfully restored as is. Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-24 19:07:27 -07:00
Poorna Krishnamoorthy	5d954ea228	fix: versionID and MTime for restored object (#12145 ) Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>	2021-04-24 19:04:35 -07:00
Harshavardhana	cbfdf97abf	Use CompleteMultipartUpload in RestoreTransitionedObject Signed-off-by: Krishnan Parthasarathi <kp@minio.io>	2021-04-23 11:58:53 -07:00
Krishnan Parthasarathi	3831027c54	fix: compiler errors in restoreTransitionedObject (#12120 )	2021-04-23 11:58:53 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	a7acfa6158	fix: pick valid FileInfo additionally based on dataDir (#12116 ) * fix: pick valid FileInfo additionally based on dataDir historically we have always relied on modTime to be consistent and same, we can now add additional reference to look for the same dataDir value. A dataDir is the same for an object at a given point in time for a given version, let's say a `null` version is overwritten in quorum we do not by mistake pick up the fileInfo's incorrectly. * make sure to not preserve fi.Data Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-21 19:06:08 -07:00
Harshavardhana	2ef824bbb2	collapse two distinct calls into single RenameData() call (#12093 ) This is an optimization by reducing one extra system call, and many network operations. This reduction should increase the performance for small file workloads.	2021-04-20 10:44:39 -07:00
Harshavardhana	d46386246f	api: Introduce metadata update APIs to update only metadata (#11962 ) Current implementation heavily relies on readAllFileInfo but with the advent of xl.meta inlined with data, we cannot easily avoid reading data when we are only interested in updating metadata; this invariably leads to write amplification during metadata updates, repeatedly reading data when we are only interested in updating metadata. This PR ensures that we implement a metadata only update API at storage layer, that handles updates to metadata alone for any given version - given the version is valid and present. This helps reduce the chattiness for following calls.. - PutObjectTags - DeleteObjectTags - PutObjectLegalHold - PutObjectRetention - ReplicateObject (updates metadata on replication status)	2021-04-04 13:32:31 -07:00
Poorna Krishnamoorthy	47c09a1e6f	Various improvements in replication (#11949 ) - collect real time replication metrics for prometheus. - add pending_count, failed_count metric for total pending/failed replication operations. - add API to get replication metrics - add MRF worker to handle spill-over replication operations - multiple issues found with replication - fixes an issue when client sends a bucket name with `/` at the end from SetRemoteTarget API call make sure to trim the bucket name to avoid any extra `/`. - hold write locks in GetObjectNInfo during replication to ensure that object version stack is not overwritten while reading the content. - add additional protection during WriteMetadata() to ensure that we always write a valid FileInfo{} and avoid ever writing empty FileInfo{} to the lowest layers. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-03 09:03:42 -07:00
Harshavardhana	434e5c0cfe	allow preserving legacyXLv1 with inline data format (#11951 ) Current master breaks an important requirement: we need to preserve the legacyXLv1 format, but it is simply ignored and overwritten, causing a myriad of issues by leaving stale files on the namespace, etc. For now, let's still use the two-phase approach of writing to `tmp` and then renaming the content to the actual namespace.	2021-04-01 22:12:03 -07:00
Harshavardhana	204c610d84	do not use dataDir to reference inline data use versionID (#11942 ) versionID is the one that needs to be preserved and as well as overwritten in case of replication, transition etc - dataDir is an ephemeral entity that changes during overwrites - make sure that versionID is used to save the object content. this would break things if you are already running the latest master, please wipe your current content and re-do your setup after this change.	2021-04-01 13:09:23 -07:00
Klaus Post	4dcce17eb9	Determine small objects on shard size (#11935 ) Use shard size to determine when to inline data. For unversioned objects, use 128K/shard and for versioned 16K thresholds.	2021-03-31 09:19:14 -07:00
Klaus Post	2623338dc5	Inline small file data in xl.meta file (#11758 )	2021-03-29 17:00:55 -07:00
Harshavardhana	d7f32ad649	xl: avoid sending Delete() remote call for fully successful runs. An optimization to avoid extra syscalls in PutObject(), which add up to our PutObject response times.	2021-03-24 17:32:12 -07:00
Harshavardhana	75741dbf4a	xl: remove cleanupDir instead use Delete() (#11880 ) use a single call to remove directly at disk instead of doing recursively at network layer.	2021-03-24 09:08:05 -07:00
Harshavardhana	51a8619a79	[feat] Add configurable deadline for writers (#11822 ) This PR adds deadlines per Write() calls, such that slow drives are timed-out appropriately and the overall responsiveness for Writes() is always up to a predefined threshold providing applications sustained latency even if one of the drives is slow to respond.	2021-03-18 14:09:55 -07:00
Harshavardhana	6160188bf3	fix: erasure index based reading based on actual ParityBlocks (#11792 ) in some setups with ordering issues in drive configuration, we should rely on expected parityBlocks instead of `len(disks)/2`	2021-03-15 20:03:13 -07:00
Anis Elleuch	eac66e67ec	Use maximum parity for config files (#11740 ) Some deployments have low parity (EC:2), but we really do not need to save our config data with the same parity configuration. N/2 would be better to keep MinIO configurations intact when an unexpected number of drives fail.	2021-03-09 10:19:47 -08:00
Anis Elleuch	7be7109471	locking: Add Refresh for better locking cleanup (#11535 ) Co-authored-by: Anis Elleuch <anis@min.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-03-03 18:36:43 -08:00
Klaus Post	c3217bd6eb	Use actual size for buffer selection (#11687 ) For compressed inputs, this will be -1, but the object may be small.	2021-03-03 16:28:10 -08:00
Klaus Post	85620dfe93	use bucket in path in distribution hash (#11634 ) Use bucket in erasure distribution hash. For the rare cases where objects with the same names are uploaded to many buckets.	2021-02-25 10:11:31 -08:00
Klaus Post	11b2220696	Don't autoheal if disks are healing (#11558 ) Don't spawn automatic healing ops if a disk is healing.	2021-02-17 10:18:12 -08:00
Harshavardhana	aa8450a2a1	fix: parallelize getPoolIdx() for object lookup (#11547 )	2021-02-16 19:36:15 -08:00
Harshavardhana	7d4a2d2b68	fix: multiple pool reads parallelize when possible (#11537 )	2021-02-16 02:43:47 -08:00
Harshavardhana	cbf4bb62e0	fix: getPoolIdx decouple from top level options (#11512 ) top-level options shouldn't be passed down for GetObjectInfo() while verifying the objects in different pools, this is to make sure that we always get the value from the pool where the object exists.	2021-02-10 11:45:02 -08:00
Poorna Krishnamoorthy	93eb549a83	fix: duplicate delete marker attempts in bi-directional replication (#11491 )	2021-02-09 15:11:43 -08:00
Harshavardhana	68d299e719	fix: case-insensitive lookups for metadata (#11489 ) continuation of #11487, with more changes	2021-02-08 18:12:28 -08:00
Poorna Krishnamoorthy	f9c5636c2d	fix: lookup metadata case insensitively (#11487 ) while setting replication options	2021-02-08 16:19:05 -08:00

315 Commits