minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	f7cecf0945	Make isIndexedMetaV2 return errors (#15012 ) Indexed streams would be decoded by the legacy loader if there was an error loading it. Return an error when the stream is indexed and it cannot be loaded. Fixes "unknown minor metadata version" on corrupted xl.meta files and returns an actual error.	2022-05-31 19:06:57 -07:00
Harshavardhana	f046f557fa	request only 1 best version for latest version resolution (#14625 ) ListObjects, ListObjectsV2 calls are being heavily taxed when there are many versions on objects left over from a previous release or ILM was never setup to clean them up. Instead of being absolutely correct at resolving the exact latest version of an object, we simply rely on the top most 1 version and resolve the rest. Once we have obtained the top most "1" version for ListObject, ListObjectsV2 call we break out.	2022-03-25 08:50:07 -07:00
Krishnan Parthasarathi	7b81967a3c	Fix handling of object versions pending purge (#14555 ) - GetObject() with vid should return 405 - GetObject() without vid should return 404 - ListObjects() should ignore this object if this is the "latest" version of the object - ListObjectVersions() should list this object as "DELETE marker" - Remove data parts before sync'ing the version pending purge	2022-03-16 16:59:43 -07:00
Harshavardhana	cf94d1f1f1	do not crash readXLMetaNoData - if the `xl.meta` has incorrect content (#14538 ) ``` tmp = buf[want:] ``` Would potentially crash when `buf` is truncated for some reason and does not have the expected bytes, this is of course considered not normal and is an odd situation. But we do not need to crash here instead allow for errors to be returned and let callers handle the errors.	2022-03-14 09:07:46 -07:00
Harshavardhana	b0c84e3de7	fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483 ) This is a side effect of the optimization done in PR #13544, where certain types of delete operations on given object versions can cause the lastVersion indication to be skipped, which leads to an `xl.meta` where the Versions[] slice is empty while the entire file is intact by itself. This PR tries to ensure that such files are visible and deletable by regular means of listing as null 'delete-marker' and also avoids the situation where this potential issue might arise.	2022-03-04 20:01:26 -08:00
Harshavardhana	0e3bafcc54	improve logs, fix banner formatting (#14456 )	2022-03-03 13:21:16 -08:00
Klaus Post	0012ca8ca5	Fix inconsistent metadata after healing (#14125 ) When calculating signatures empty part ETags were not discarded, leading to a different signature compared to freshly created ones. This would mean that after a heal signature of the healed metadata would be different. Fixing the calculation of signature will make these consistent. Furthermore when inconsistent entries, with zero version ID, with the same mod times but different signatures, the one with the lowest signature would be picked for quorum check. Since this is 50/50, we fall back to a simple quorum count on all signatures. Each of these fixes by themselves will lead to quorum. Tests were added for regressions and expected outcomes.	2022-01-19 10:48:00 -08:00
Harshavardhana	b7c5e45fff	heal: isObjectDangling should return false when it cannot decide (#14053 ) In a multi-pool setup when disks are coming up, or in a single pool setup let's say with 100's of erasure sets with a slow network. It's possible when healing is attempted on `.minio.sys/config` folder, it can lead to healing unexpectedly deleting some policy files as dangling due to a mistake in understanding when `isObjectDangling` is considered to be 'true'. This issue happened in commit `30135eed86` when we assumed the validMeta with empty ErasureInfo is considered to be fully dangling. This implementation issue gets exposed when the server is starting up. This is most easily seen with multiple-pool setups because of the disconnected fashion pools that come up. The decision to purge the object as dangling is taken incorrectly prior to the correct state being achieved on each pool, when the corresponding drive let's say returns 'errDiskNotFound', a 'delete' is triggered. At this point, the 'drive' comes online because this is part of the startup sequence as drives can come online lazily. This kind of situation exists because we allow (totalDisks/2) number of drives to be online when the server is being restarted. Implementation made an incorrect assumption here leading to policies getting deleted. Added tests to capture the implementation requirements.	2022-01-07 19:11:54 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Klaus Post	81e43b87c2	Don't zero buffer if big enough (#13877 ) Only append zeroed bytes when we don't have enough space anyway.	2021-12-10 13:08:10 -08:00
Klaus Post	3db931dc0e	Improve listing consistency with version merging (#13723 )	2021-12-02 11:29:16 -08:00
Harshavardhana	b280a37c4d	add delete-marker proactively in DeleteObject() (#13795 ) single object delete was not working properly on a bucket when versioning was suspended, current version 'null' object was never removed. added unit tests to cover the behavior fixes #13783	2021-11-30 18:30:06 -08:00
Klaus Post	faf013ec84	Improve performance on multiple versions (#13573 ) Existing: ```go type xlMetaV2 struct { Versions []xlMetaV2Version `json:"Versions" msg:"Versions"` } ``` Serialized as regular MessagePack. ```go //msgp:tuple xlMetaV2VersionHeader type xlMetaV2VersionHeader struct { VersionID [16]byte ModTime int64 Type VersionType Flags xlFlags } ``` Serialize as streaming MessagePack, format: ``` int(headerVersion) int(xlmetaVersion) int(nVersions) for each version { binary blob, xlMetaV2VersionHeader, serialized binary blob, xlMetaV2Version, serialized. } ``` xlMetaV2VersionHeader is <= 30 bytes serialized. Deserialized struct can easily be reused and does not contain pointers, so efficient as a slice (single allocation) This allows quickly parsing everything as slices of bytes (no copy). Versions are always saved sorted by modTime, newest first. No more need to sort on load. * Allows checking if a version exists. * Allows reading single version without unmarshal all. * Allows reading latest version of type without unmarshal all. * Allows reading latest version without unmarshal of all. * Allows checking if the latest is deleteMarker by reading first entry. * Allows adding/updating/deleting a version with only header deserialization. * Reduces allocations on conversion to FileInfo(s).	2021-11-18 12:15:22 -08:00
Harshavardhana	200caab82b	fix: multi-pool setup make sure acquire locks properly (#13280 ) This was a regression introduced in '14bb969782' this has the potential to cause corruption when there are concurrent overwrites attempting to update the content on the namespace. This PR adds a situation where PutObject(), CopyObject() compete properly for the same locks with NewMultipartUpload() however it ends up turning off competing locks for the actual object with GetObject() and DeleteObject() - since they do not compete due to concurrent I/O on a versioned bucket it can lead to loss of versions. This PR fixes this bug with multi-pool setup with replication that causes corruption of inlined data due to lack of competing locks in a multi-pool setup. Instead CompleteMultipartUpload holds the necessary locks when finishing the transaction, knowing the exact location of an object to schedule the multipart upload doesn't need to compete in this manner, a pool id location for existing object.	2021-09-22 21:46:24 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Klaus Post	308371b434	Clean up ToFileInfo and avoid copy (#13144 ) Simplify code and remove an iteration of all versions. Remove unneeded copy.	2021-09-03 12:31:32 -07:00
Klaus Post	1080609c86	Reuse buffers when writing metadata (#13040 ) Simplify returning buffers. Tested using `warp mixed --duration=1m --obj.size=100K`: ``` Operation: DELETE Operations: 7148 -> 7642 * Average: +6.77% (+8.1) obj/s ------------------- Operation: GET Operations: 32200 -> 34403 * Average: +6.74% (+3.5 MiB/s) throughput, +6.74% (+36.2) obj/s * First Byte: Average: -105.403µs (-3%), Median: -309µs (-11%), Best: -2.7µs (-0%), Worst: +3.5637ms (+3%) ------------------- Operation: PUT Operations: 10741 -> 11475 * Average: +6.78% (+1.2 MiB/s) throughput, +6.78% (+12.1) obj/s ------------------- Operation: STAT Operations: 21465 -> 22927 * Average: +6.71% (+24.0) obj/s ```	2021-08-23 11:17:27 -07:00
Klaus Post	24722ddd02	Remove inline data hack (#12946 ) move the code down to the storage layer, this logic decouples the inline data from the size parameter making it flexible and future proof.	2021-08-13 08:25:54 -07:00
Klaus Post	89febdb3d6	Reuse small buffers (#12948 ) When reading metadata allow reuse of buffers in certain cases. Take the low-hanging fruit. Reduce GC overhead when listing.	2021-08-12 14:27:22 -07:00
Krishnan Parthasarathi	29eea52e14	Skip transitioning of object versions if inlined (#12705 )	2021-07-16 09:38:27 -07:00
Poorna Krishnamoorthy	a6ec405443	fix: UpdateObjectVersion should compare versionID through versions (#12726 ) fixes #12703	2021-07-15 15:01:59 -07:00
Krishnan Parthasarathi	a1df230518	Add a 'free' version to track deletion of tiered object content (#12470 )	2021-06-30 19:32:07 -07:00
Aditya Manthramurthy	30a3921d3e	[Tiering] Support remote tiers with object versioning (#12342 ) - Adds versioning support for S3 based remote tiers that have versioning enabled. This ensures that when reading or deleting we specify the specific version ID of the object. In case of deletion, this is important to ensure that the object version is actually deleted instead of simply being marked for deletion. - Stores the remote object's version id in the tier-journal. Tier-journal file version is not bumped up as serializing the new struct version is compatible with old journals without the remote object version id. - `storageRESTVersion` is bumped up as FileInfo struct now includes a `TransitionRemoteVersionID` member. - Azure and GCS support for this feature will be added subsequently. Co-authored-by: Krishnan Parthasarathi <krisis@users.noreply.github.com>	2021-06-03 14:26:51 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	ebf75ef10d	fix: remove all unused code (#12360 )	2021-05-24 09:28:19 -07:00
Harshavardhana	0287711dc9	fix: implement readMetadata common function for re-use (#12353 ) Previous PR #12351 added functions to read from the reader stream to reduce memory usage, use the same technique in few other places where we are not interested in reading the data part.	2021-05-21 11:41:25 -07:00
Klaus Post	9d1b6fb37d	Add XL reader without data (#12351 ) Add XL metadata reader that reads metadata only on larger files. Use for scanning and listing for now.	2021-05-21 09:10:54 -07:00
Klaus Post	254698f126	fix: minor allocation improvements in xlMetaV2 (#12133 )	2021-05-07 09:11:05 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	2ef824bbb2	collapse two distinct calls into single RenameData() call (#12093 ) This is an optimization by reducing one extra system call, and many network operations. This reduction should increase the performance for small file workloads.	2021-04-20 10:44:39 -07:00
Harshavardhana	1456f9f090	fix: preserve shared dataDir during suspend overwrites (#12058 ) CopyObject() when shares dataDir needs to be preserved, and upon versioning suspended overwrites should still preserve the dataDir.	2021-04-15 08:44:05 -07:00
Harshavardhana	b70c298c27	update findDataDir to skip inline data (#12050 )	2021-04-14 22:44:27 -07:00
Harshavardhana	e85b28398b	fix: pre-allocate certain slices with expected capacity (#12044 ) Avoids append() based tiny allocations on known allocated slices repeated access.	2021-04-12 13:45:06 -07:00
Harshavardhana	641150f2a2	change updateVersion to only update keys, no deletes (#12032 ) there are situations where metadata can have keys with empty values, preserve existing behavior	2021-04-10 09:13:12 -07:00
Klaus Post	f0ca0b3ca9	Add metadata checksum (#12017 ) - Add 32-bit checksum (32 LSB part of xxhash64) of the serialized metadata. This will ensure that we always reject corrupted metadata. - Add automatic repair of inline data, so the data structure can be used. If data was corrupted, we remove all unreadable entries to ensure that operations can succeed on the object. Since higher layers add bitrot checks this is not a big problem. Cannot downgrade to v1.1 metadata, but since that isn't released, no need for a major bump.	2021-04-08 17:29:54 -07:00
Harshavardhana	d46386246f	api: Introduce metadata update APIs to update only metadata (#11962 ) Current implementation heavily relies on readAllFileInfo but with the advent of xl.meta inlined with data, we cannot easily avoid reading data when we are only interested in updating metadata. This invariably leads to write amplification during metadata updates, repeatedly reading data when we are only interested in updating metadata. This PR ensures that we implement a metadata only update API at storage layer, that handles updates to metadata alone for any given version - given the version is valid and present. This helps reduce the chattiness for the following calls: - PutObjectTags - DeleteObjectTags - PutObjectLegalHold - PutObjectRetention - ReplicateObject (updates metadata on replication status)	2021-04-04 13:32:31 -07:00
Poorna Krishnamoorthy	47c09a1e6f	Various improvements in replication (#11949 ) - collect real time replication metrics for prometheus. - add pending_count, failed_count metric for total pending/failed replication operations. - add API to get replication metrics - add MRF worker to handle spill-over replication operations - multiple issues found with replication - fixes an issue when client sends a bucket name with `/` at the end from SetRemoteTarget API call make sure to trim the bucket name to avoid any extra `/`. - hold write locks in GetObjectNInfo during replication to ensure that object version stack is not overwritten while reading the content. - add additional protection during WriteMetadata() to ensure that we always write a valid FileInfo{} and avoid ever writing empty FileInfo{} to the lowest layers. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-03 09:03:42 -07:00
Harshavardhana	204c610d84	do not use dataDir to reference inline data use versionID (#11942 ) versionID is the one that needs to be preserved and as well as overwritten in case of replication, transition etc - dataDir is an ephemeral entity that changes during overwrites - make sure that versionID is used to save the object content. this would break things if you are already running the latest master, please wipe your current content and re-do your setup after this change.	2021-04-01 13:09:23 -07:00
Klaus Post	2623338dc5	Inline small file data in xl.meta file (#11758 )	2021-03-29 17:00:55 -07:00
Harshavardhana	99b733d44c	fix: deletion of delete marker regression (#11465 ) fixes #11440 fixes #11451 fixes #11454	2021-02-05 15:06:23 -08:00
Anis Elleuch	1887c25279	xl: Fix feeding NumVersions & SuccessorModTime to lifecycle (#11462 ) After recent refactor where lifecycle started to rely on ObjectInfo to make decisions, it turned out there are some issues calculating Successor Modtime and NumVersions, hence the lifecycle is not working as expected in a versioning bucket in some cases. This commit fixes the behavior.	2021-02-05 11:59:08 -08:00
Harshavardhana	f108873c48	fix: replication metadata comparison and other fixes (#11410 ) - using miniogo.ObjectInfo.UserMetadata is not correct - using UserTags from Map->String() can change order - ContentType comparison needs to be removed. - Compare both lowercase and uppercase key names. - do not silently error out constructing PutObjectOptions if tag parsing fails - avoid notification for empty object info, failed operations should rely on valid objInfo for notification in all situations - optimize copyObject implementation, also introduce a new replication event - clone ObjectInfo() before scheduling for replication - add additional headers for comparison - remove strings.EqualFold comparison to avoid unexpected bugs - fix pool based proxying with multiple pools - compare only specific metadata Co-authored-by: Poorna Krishnamoorthy <poornas@users.noreply.github.com>	2021-02-03 20:41:33 -08:00
Anis Elleuch	65aa2bc614	ilm: Remove object in HEAD/GET if having an applicable ILM rule (#11296 ) Remove an object on the fly if there is a lifecycle rule with delete expiry action for the corresponding object.	2021-02-01 09:52:11 -08:00
Harshavardhana	d1a8f0b786	fix possible crashes on deleteMarker replication (#11308 ) Delete marker can have `metaSys` set to nil, that can lead to crashes after the delete marker has been healed. Additionally also fix isObjectDangling check for transitioned objects, that do not have parts should be treated similar to Delete marker.	2021-01-20 13:12:12 -08:00
Harshavardhana	bdd094bc39	fix: avoid sending errors on missing objects on locked buckets (#10994 ) make sure multi-object delete returned errors that are AWS S3 compatible	2020-11-28 21:15:45 -08:00
Poorna Krishnamoorthy	2ff655a745	Refactor replication, ILM handling in DELETE API (#10945 )	2020-11-25 11:24:50 -08:00
Poorna Krishnamoorthy	39f3d5493b	Show Delete replication status header (#10946 ) X-Minio-Replication-Delete-Status header shows the status of the replication of a permanent delete of a version. All GETs are disallowed and return 405 on this object version. In the case of replicating delete markers. X-Minio-Replication-DeleteMarker-Status shows the status of replication, and would similarly return 405. Additionally, this PR adds reporting of delete marker event completion and updates documentation	2020-11-21 23:48:50 -08:00
Poorna Krishnamoorthy	1ebf6f146a	Add support for ILM transition (#10565 ) This PR adds transition support for ILM to transition data to another MinIO target represented by a storage class ARN. Subsequent GET or HEAD for that object will be streamed from the transition tier. If PostRestoreObject API is invoked, the transitioned object can be restored for duration specified to the source cluster.	2020-11-19 18:47:17 -08:00
Harshavardhana	9a34fd5c4a	Revert "Revert "Add delete marker replication support (#10396 )"" This reverts commit `267d7bf0a9`.	2020-11-19 18:43:58 -08:00

70 Commits