minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	b883803b21	fix: healing across pools removing dangling objects (#13990 ) adds other simplifications to the code when running namespace heals across pools.	2021-12-25 09:01:44 -08:00
Harshavardhana	7e3a7d7044	add healing for invalid shards by skipping the blocks (#13978 ) Built on top of #13945, now we need to simply skip the shards and it's automated.	2021-12-23 23:01:46 -08:00
Harshavardhana	0e3037631f	skip inconsistent shards if possible (#13945 ) data shards were wrong due to a healing bug reported in #13803 mainly with unaligned object sizes. This PR is an attempt to automatically avoid these shards, with available information about the `xl.meta` and actually disk mtime.	2021-12-21 10:08:26 -08:00
jiangfucheng	7460fb8349	fix padding error and compatible with uploaded objects (#13803 )	2021-12-03 09:26:30 -08:00
Harshavardhana	28f95f1fbe	quorum calculation getLatestFileInfo should be itself (#13717 ) FileInfo quorum shouldn't be passed down, instead inferred after obtaining a maximally occurring FileInfo. This PR also changes other functions that rely on wrong quorum calculation. Update tests as well to handle the proper requirement. All these changes are needed when migrating from older deployments where we used to set N/2 quorum for reads to EC:4 parity in newer releases.	2021-11-22 09:36:29 -08:00
Harshavardhana	c791de0e1e	re-implement pickValidInfo dataDir, move to quorum calculation (#13681 ) dataDir loosely based on maxima is incorrect and does not work in all situations such as disks in the following order - xl.json migration to xl.meta there may be partial xl.json's leftover if some disks are not yet connected when the disk is yet to come up, since xl.json mtime and xl.meta is same the dataDir maxima doesn't work properly leading to quorum issues. - its also possible that XLV1 might be true among the disks available, make sure to keep FileInfo based on common quorum and skip unexpected disks with the older data format. Also, this PR tests upgrade from older to a newer release if the data is readable and matches the checksum. NOTE: this is just initial work we can build on top of this to do further tests.	2021-11-21 10:41:30 -08:00
Harshavardhana	4545ecad58	ignore swapped drives instead of throwing errors (#13655 ) - add checks such that swapped disks are detected and ignored - never used for normal operations. - implement `unrecognizedDisk` to be ignored with all operations returning `errDiskNotFound`. - also add checks such that we do not load unexpected disks while connecting automatically. - additionally humanize the values when printing the errors. Bonus: fixes handling of non-quorum situations in getLatestFileInfo(), that does not work when 2 drives are down, currently this function would return errors incorrectly.	2021-11-15 09:46:55 -08:00
Harshavardhana	94d587e6fc	fix: delete-markers without quorum were unreadable (#13351 ) DeleteMarkers were unreadable if they had quorum based guarantees, this PR tries to fix this behavior appropriately. DeleteMarkers with sufficient should be allowed and the return error should be accordingly with or without version-id. This also allows for overwrites which may not be possible in a multi-pool setup. fixes #12787	2021-10-04 08:53:38 -07:00
Anis Elleuch	1d9e91e00f	Fix wrong reporting of total disks after restart (#13326 ) A restart of the cluster and a failed disk will wrongly count the number of total disks.	2021-09-29 11:36:19 -07:00
Klaus Post	0e7fdcee30	Healing: Decide healing inlining based on metadata (#13178 ) Don't perform an independent evaluation of inlining, but mirror the decision made when uploading the object. Leads to some objects being inlined or not based on new metrics. Instead respect previous decision.	2021-09-09 08:55:43 -07:00
Harshavardhana	a19e3bc9d9	add more dangling heal related tests (#13140 ) also make sure that HealObject() never returns 'ObjectNotFound' or 'VersionNotFound' errors, as those are meaningless and not useful for the caller.	2021-09-02 20:56:13 -07:00
Krishnan Parthasarathi	db35bcf2ce	heal: Remove transitioned objects' parts from outdated disks (#13018 ) Bonus: check equality for replication and other metadata	2021-08-23 13:14:55 -07:00
Harshavardhana	035882d292	fix: remove parentIsObject() check (#12851 ) we will allow situations such as ``` a/b/1.txt a/b ``` and ``` a/b a/b/1.txt ``` we are going to document that this usecase is not supported and we will never support it, if any application does this users have to delete the top level parent to make sure namespace is accessible at lower level. rest of the situations where the prefixes get created across sets are supported as is.	2021-08-03 13:26:57 -07:00
Harshavardhana	a9d9b520ec	remove short circuited healing optimization (#12796 ) this healing optimization caused multiple regressions in healing - delete-markers incorrectly missing heal and returning incorrect healing results to client. - missing individual 'parts' such as for restored object or simply for all objects just missing few parts. This optimization is not necessary, we should proceed to verify all cases possible not just when metadata is inconsistent.	2021-07-26 16:51:09 -07:00
Harshavardhana	a3f7d575e0	improve delete-marker healing (#12794 ) delete-markers missing on drives were not healed due to few things disksWithAllParts() does not know how to deal with delete markers, add support for that. fixes #12787	2021-07-26 11:48:09 -07:00
Harshavardhana	f175ff8f66	add healing fixes for delete-marker (#12788 ) - delete-markers are incorrectly reported as corrupt with wrong data sent to client 'mc admin heal -r' on objects with delete marker will report as 'grey' incorrectly. - do not heal delete-markers during HeadObject() this can lead to inconsistent order of heals on the object, although this is not an issue in terms of order of versions it is rather simpler to keep the same order on all drives. - defaultHealResult() should handle 'err == nil' case such that valid cases should be handled as 'drive' status OK.	2021-07-26 08:01:41 -07:00
Harshavardhana	542fe4ea2e	fix: legacy objects with 10MiB blockSize should use right buffers (#12459 ) healing code was using incorrect buffers to heal older objects with 10MiB erasure blockSize, incorrect calculation of such buffers can lead to incorrect premature closure of io.Pipe() during healing. fixes #12410	2021-06-07 10:06:06 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	b5ebfd35b4	fix: always prefer DataBlocks present in FileInfo (#12386 )	2021-05-27 10:11:50 -07:00
Klaus Post	3fff50120b	Revert heal locks (#12365 ) A lot of healing is likely to be on non-existing objects and locks are very expensive and will slow down scanning significantly. In cases where all are valid or, all are broken allow rejection without locking. Keep the existing behavior, but move the check for dangling objects to after the lock has been acquired. ``` _, err = getLatestFileInfo(ctx, partsMetadata, errs) if err != nil { return er.purgeObjectDangling(ctx, bucket, object, versionID, partsMetadata, errs, []error{}, opts) } ``` Revert "heal: Hold lock when reading xl.meta from disks (#12362)" This reverts commit `abd32065aa`	2021-05-25 17:02:06 -07:00
Harshavardhana	4840974d7a	fix: inline data upon overwrites should be readable (#12369 ) This PR fixes two bugs - Remove fi.Data upon overwrite of objects from inlined-data to non-inlined-data - Workaround for an existing bug on disk with latest releases to ignore fi.Data and instead read from the disk for non-inlined-data - Additionally add a reserved metadata header to indicate data is inlined for a given version.	2021-05-25 16:33:06 -07:00
Harshavardhana	ed4941a5f3	fix: calculate dataBlocks properly in healing (#12364 )	2021-05-25 09:34:27 -07:00
Anis Elleuch	abd32065aa	heal: Hold lock when reading xl.meta from disks (#12362 ) Lock is held in healObject() after reading xl.meta from disks the first time. This commit will hold the lock from the beginning of HealObject() Co-authored-by: Anis Elleuch <anis@min.io>	2021-05-24 13:39:38 -07:00
Klaus Post	cde6469b88	Fix hanging erasure writes (#12253 ) However, this slice is also used for closing the writers, so close is never called on these. Furthermore when an error is returned from a write it is now reported to the reader. bonus: remove unused heal param from `newBitrotWriter`. * Remove copy, now that we don't mutate.	2021-05-17 08:32:28 -07:00
Harshavardhana	f1e479d274	remove more duplicate bloom filter trackers (#12302 ) At some places bloom filter tracker was getting updated for `.minio.sys/tmp` bucket, there is no reason to update bloom filters for those. And add a missing bloom filter update for MakeBucket() Bonus: purge unused function deleteEmptyDir()	2021-05-17 08:25:48 -07:00
Harshavardhana	d84261aa6d	fix: ensure proper usage of DataDir (#12300 ) - GetObject() should always use a common dataDir to read from when it starts reading, this allows the code in erasure decoding to have sane expectations. - Healing should always heal on the common dataDir, this allows the code in dangling object detection to purge dangling content. These both situations can happen under certain types of retries during PUT when server is restarting etc, some namespace entries might be left over.	2021-05-14 16:50:47 -07:00
Krishnan Parthasarathi	0bab1c1895	Heal restored object contents on disk (#12238 )	2021-05-06 16:06:57 -07:00
Harshavardhana	1aa5858543	move madmin to github.com/minio/madmin-go (#12239 )	2021-05-06 08:52:02 -07:00
Harshavardhana	ff36baeaa7	fix: attempt to drain the ReadFileStream for connection pooling (#12208 ) avoid time_wait build up with getObject requests if there are pending callers and they timeout, can lead to time_wait states Bonus share the same buffer pool with erasure healing logic, additionally also fixes a race where parallel readers were never cleanup during Encode() phase, because pipe.Reader end was never closed(). Added closer right away upon an error during Encode to make sure to avoid racy Close() while stream was still being Read().	2021-05-04 10:12:08 -07:00
Krishnan Parthasarathi	860bf1bab2	Add IsRemote method on FileInfo, ObjectInfo (#12209 ) Provides a convenient method to know if an object's contents are in its remote tier.	2021-05-04 08:40:42 -07:00
Harshavardhana	f7a87b30bf	Revert "deprecate embedded browser (#12163 )" This reverts commit `736d8cbac4`. Bring contrib files for older contributions	2021-04-30 08:50:39 -07:00
Harshavardhana	64f6020854	fix: cleanup locking, cancel context upon lock timeout (#12183 ) upon errors to acquire lock context would still leak, since the cancel would never be called. since the lock is never acquired - proactively clear it before returning.	2021-04-29 20:55:21 -07:00
Anis Elleuch	9e797532dc	lock: Always cancel the returned Get(R)Lock context (#12162 ) * lock: Always cancel the returned Get(R)Lock context There is a leak with cancel created inside the locking mechanism. The cancel purpose was to cancel operations such as erasure get/put that are holding non-refreshable locks. This PR will ensure the created context.Cancel is passed to the unlock API so it will cleanup and avoid leaks. * locks: Avoid returning nil cancel in local lockers Since there is no Refresh mechanism in the local locking mechanism, we do not generate a new context or cancel. Currently, a nil cancel function is returned but this can cause a crash. Return a dummy function instead.	2021-04-27 16:12:50 -07:00
Harshavardhana	736d8cbac4	deprecate embedded browser (#12163 ) https://github.com/minio/console takes over the functionality for the future object browser development Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-27 10:52:12 -07:00
Poorna Krishnamoorthy	4be0f92067	Fix multipart restore to remove part match (#12161 ) Part ETags are not available after multipart finalizes, removing this check as not useful. Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-26 18:24:06 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	a7acfa6158	fix: pick valid FileInfo additionally based on dataDir (#12116 ) * fix: pick valid FileInfo additionally based on dataDir historically we have always relied on modTime to be consistent and same, we can now add additional reference to look for the same dataDir value. A dataDir is the same for an object at a given point in time for a given version, let's say a `null` version is overwritten in quorum we do not by mistake pick up the fileInfo's incorrectly. * make sure to not preserve fi.Data Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-21 19:06:08 -07:00
Harshavardhana	2ef824bbb2	collapse two distinct calls into single RenameData() call (#12093 ) This is an optimization by reducing one extra system call, and many network operations. This reduction should increase the performance for small file workloads.	2021-04-20 10:44:39 -07:00
Klaus Post	d267d152ba	healing: re-read metadata after lock (#12004 ) Do not use potentially wrong metadata from before acquiring lock. Plus remove unused NoLock option.	2021-04-07 10:39:48 -07:00
Harshavardhana	434e5c0cfe	allow preserving legacyXLv1 with inline data format (#11951 ) current master breaks this important requirement we need to preserve legacyXLv1 format, this is simply ignored and overwritten causing a myriad of issues by leaving stale files on the namespace etc. for now lets still use the two-phase approach of writing to `tmp` and then renaming the content to the actual namespace.	2021-04-01 22:12:03 -07:00
Harshavardhana	f966fbc4a3	make sure to preserve checksumInfo to lookup older hash (#11940 ) upgrading from 2yr old releases is expected to work, the issue was we were missing checksum info to be passed down to newBitrotReader() for whole bitrot calculation	2021-03-31 21:14:08 -07:00
Klaus Post	2623338dc5	Inline small file data in xl.meta file (#11758 )	2021-03-29 17:00:55 -07:00
Harshavardhana	51a8619a79	[feat] Add configurable deadline for writers (#11822 ) This PR adds deadlines per Write() calls, such that slow drives are timed-out appropriately and the overall responsiveness for Writes() is always up to a predefined threshold providing applications sustained latency even if one of the drives is slow to respond.	2021-03-18 14:09:55 -07:00
Klaus Post	fa9cf1251b	Improve healing and reporting (#11312 ) * Provide information on actively healing, buckets healed/queued, objects healed/failed. * Add concurrent healing of multiple sets (typically on startup). * Add bucket level resume, so restarts will only heal non-healed buckets. * Print summary after healing a disk is done.	2021-03-04 14:36:23 -08:00
Anis Elleuch	b3f81e75f6	xl: Make it clear when to create delete marker for a non-existent object (#11423 )	2021-02-03 10:33:43 -08:00
Harshavardhana	09bc49bd51	fix: healBucket across sets should capture results properly (#11341 ) healing `.minio.sys/config` returns incorrect quorum errors across sets, healing of the buckets.	2021-01-25 09:45:09 -08:00
Harshavardhana	d1a8f0b786	fix possible crashes on deleteMarker replication (#11308 ) Delete marker can have `metaSys` set to nil, that can lead to crashes after the delete marker has been healed. Additionally also fix isObjectDangling check for transitioned objects, that do not have parts should be treated similar to Delete marker.	2021-01-20 13:12:12 -08:00
Harshavardhana	f903cae6ff	Support variable server pools (#11256 ) Current implementation requires server pools to have same erasure stripe sizes, to facilitate same SLA and expectations. This PR allows server pools to be variadic, i.e they do not have to be same erasure stripe sizes - instead they should have SLA for parity ratio. If the parity ratio cannot be guaranteed by the new server pool, the deployment is rejected i.e server pool expansion is not allowed.	2021-01-16 12:08:02 -08:00
Harshavardhana	f21d650ed4	fix: readData in bulk call using messagepack byte wrappers (#11228 ) This PR refactors the way we use buffers for O_DIRECT and to re-use those buffers for messagepack reader writer. After some extensive benchmarking found that not all objects have this benefit, and only objects smaller than 64KiB see this benefit overall. Benefits are seen from almost all objects from 1KiB - 32KiB Beyond this no objects see benefit with bulk call approach as the latency of bytes sent over the wire v/s streaming content directly from disk negate each other with no remarkable benefits. All other optimizations include reuse of msgp.Reader, msgp.Writer using sync.Pool's for all internode calls.	2021-01-07 19:27:31 -08:00


73 Commits