minio

mirror of https://github.com/minio/minio.git synced 2024-12-26 15:15:55 -05:00

Author	SHA1	Message	Date
Harshavardhana	6c11dbffd5	add crash protection from backend modifications (#16846 )	2023-03-20 09:08:42 -07:00
ferhat elmas	714283fae2	cleanup ignored static analysis (#16767 )	2023-03-06 08:56:10 -08:00
Klaus Post	9acf1024e4	Remove bloom filter (#16682 ) Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner. Also removes a tiny bit of CPU by each write operation.	2023-02-24 09:03:31 +05:30
Klaus Post	fd6622458b	Add detailed scanner trace output and notifications (#16668 )	2023-02-21 09:33:33 -08:00
Harshavardhana	72daccd468	fix: scanner in healing cycle must use actual size (#16589 )	2023-02-10 06:53:03 -08:00
Poorna	b22b39de96	Avoid dangling deletes if disk not found (#16401 )	2023-01-12 22:20:19 -08:00
Harshavardhana	a15a2556c3	converge listBuckets() as a peer call (#16346 )	2023-01-03 23:39:40 -08:00
Anis Elleuch	acc9c033ed	debug: Add X-Amz-Request-ID to lock/unlock calls (#16309 )	2022-12-23 19:49:07 -08:00
Klaus Post	70986b6e6e	Add version id to healresult (#16193 )	2022-12-08 07:49:10 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Klaus Post	cc1d8f0057	Check for abandoned data when healing (#16122 )	2022-11-28 10:20:55 -08:00
Harshavardhana	fd6f6fc8df	cleanup stale parent multipart directories (#15980 )	2022-11-01 08:00:02 -07:00
Harshavardhana	136d41775f	remove numAvailableDisks check as it doesn't serve any purpose (#15954 )	2022-10-27 09:05:24 -07:00
Harshavardhana	2d9b5a65f1	verify RenameData() versions to be consistent (#15649 ) xl.meta gets written and never rolled back, however we definitely need to validate the state that is persisted on the disk, if there are inconsistencies - more than write quorum we should return an error to the client - if write quorum was achieved however there are inconsistent xl.meta's we should simply trigger an MRF on them	2022-09-05 16:51:37 -07:00
Harshavardhana	bcedc2b0d9	fix: add healing metric type for heal tracing (#15631 ) changes the `heal.checkBucket` to `heal.Bucket` instead since the latter is more meaningful.	2022-08-31 12:28:03 -07:00
Klaus Post	dec942beb6	feat: Add healing trace (#15616 )	2022-08-31 01:56:12 -07:00
Klaus Post	a9f1ad7924	Add extended checksum support (#15433 )	2022-08-29 16:57:16 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00
Harshavardhana	a6e0ec4e6f	Add support converting non-inlined to inlined (#15444 ) This is a feature to allow for inode compaction on large clusters that use a lot of small files spread across a large heirarchy.	2022-08-02 23:10:22 -07:00
Anis Elleuch	b3edb25377	bloom: healObject to mark a path dirty only for dangling objects (#15458 ) The path is marked dirty automatically when healObject() is called, which is wrong. HealObject() is called during self-healing and this will lead to an increase in the false positive result of the bloom filter. Also move NSUpdated() from renameData() and call it directly in CompleteMultipart and PutObject, this is not a functional change but it will make it less prone to errors in the future.	2022-08-02 16:57:39 -07:00
Harshavardhana	ce8397f7d9	use partInfo only for intermediate part.x.meta (#15353 )	2022-07-19 18:56:24 -07:00
Klaus Post	911a17b149	Add compressed file index (#15247 )	2022-07-11 17:30:56 -07:00
Praveen raj Mani	b49fc33cb3	purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148 )	2022-07-11 09:15:54 -07:00
Anis Elleuch	73733a8fb9	heal: Report correctly in multip-pools setup (#15117 ) `mc admin heal -r <alias>` in a multi setup pools returns incorrectly grey objects. The reason is that erasure-server-pools.HealObject() runs HealObject in all pools and returns the result of the first nil error. However, in the lower erasureObject level, HealObject() returns nil if an object does not exist + missing error in each disk of the object in that pool, therefore confusing mc. Make erasureObject.HealObject() to return not found error in the lower level, so at least erasureServerPools will know what pools to ignore.	2022-06-20 08:07:45 -07:00
Harshavardhana	013cc66d8e	add dataErrs for healing debug log (#15092 )	2022-06-16 09:42:45 -07:00
Harshavardhana	dea8220eee	do not heal outdated disks > parityBlocks (#14976 ) this PR also fixes a situation where incorrect partsMetadata slice was used where fi.Data was re-used from a single drive causing duplication of the shards across all drives. This happens for situations where shouldHeal() returns true for all drives > parityBlocks. To avoid this we should never attempt to heal on all drives > parityBlocks, unless we are doing metadata migration from xl.json -> xl.meta	2022-05-25 15:17:10 -07:00
Harshavardhana	eda34423d7	update gofumpt -w - new changes	2022-04-13 12:00:11 -07:00
Anis Elleuch	3fca4055d2	heal: Re-heal an object when a corruption is found during normal scan (#14482 ) When scanning using normal mode, HealObject() can report an error saying that it found a corrupted part. This doesn't have when HealObject() is called with bitrot scan flag. However, when this happens, we can still restart HealObject() with the bitrot scan. This is also important because this means the scanner and the new disks healer will not be able to heal an object that doesn't exist in a specific disk and has corruption in another disk. Also without this PR, mc admin heal command without bitrot will report an error.	2022-03-04 18:24:34 -08:00
Harshavardhana	f19a414e09	fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273 ) Deleting bulk objects had an issue since the relevant versionID is not passed through the layers to ensure that the dangling object purge actually works cleanly. This is a continuation of quorum related error returned by multi-object delete API from #14248 This PR ensures that we pass down correct information as well as extend the scope of dangling object detection.	2022-02-08 20:08:23 -08:00
Harshavardhana	f546636c52	fix: use renameAll instead of deleteObject() for purging temporary files (#14096 ) This PR simplifies few things - Multipart parts are renamed, upon failure are unrenamed() keep this multipart specific behavior it is needed and works fine. - AbortMultipart should blindly delete once lock is acquired instead of re-reading metadata and calculating quorum, abort is a delete() operation and client has no business looking for errors on this. - Skip Access() calls to folders that are operating on `.minio.sys/multipart` folder as well.	2022-01-13 11:07:41 -08:00
Harshavardhana	38ccc4f672	fix: make sure to avoid calling RenameData() on disconnected disks. (#14094 ) Large clusters with multiple sets, or multi-pool setups at times might fail and report unexpected "file not found" errors. This can become a problem during startup sequence when some files need to be created at multiple locations. - This PR ensures that we nil the erasure writers such that they are skipped in RenameData() call. - RenameData() doesn't need to "Access()" calls for `.minio.sys` folders they always exist. - Make sure PutObject() never returns ObjectNotFound{} for any errors, make sure it always returns "WriteQuorum" when renameData() fails with ObjectNotFound{}. Return appropriate errors for all other cases.	2022-01-12 18:49:01 -08:00
Harshavardhana	b7c5e45fff	heal: isObjectDangling should return false when it cannot decide (#14053 ) In a multi-pool setup when disks are coming up, or in a single pool setup let's say with 100's of erasure sets with a slow network. It's possible when healing is attempted on `.minio.sys/config` folder, it can lead to healing unexpectedly deleting some policy files as dangling due to a mistake in understanding when `isObjectDangling` is considered to be 'true'. This issue happened in commit `30135eed86` when we assumed the validMeta with empty ErasureInfo is considered to be fully dangling. This implementation issue gets exposed when the server is starting up. This is most easily seen with multiple-pool setups because of the disconnected fashion pools that come up. The decision to purge the object as dangling is taken incorrectly prior to the correct state being achieved on each pool, when the corresponding drive let's say returns 'errDiskNotFound', a 'delete' is triggered. At this point, the 'drive' comes online because this is part of the startup sequence as drives can come online lazily. This kind of situation exists because we allow (totalDisks/2) number of drives to be online when the server is being restarted. Implementation made an incorrect assumption here leading to policies getting deleted. Added tests to capture the implementation requirements.	2022-01-07 19:11:54 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Harshavardhana	79df2c7ce7	correctly calculate read quorum based on the available fileInfo (#14000 ) The current usage of assuming `default` parity of `4` is not correct for all objects stored on MinIO, objects in .minio.sys have maximum parity, healing won't trigger on these objects due to incorrect verification of quorum.	2021-12-28 15:33:03 -08:00
Harshavardhana	b883803b21	fix: healing across pools removing dangling objects (#13990 ) adds other simplifications to the code when running namespace heals across pools.	2021-12-25 09:01:44 -08:00
Harshavardhana	7e3a7d7044	add healing for invalid shards by skipping the blocks (#13978 ) Built on top of #13945, now we need to simply skip the shards and its automated.	2021-12-23 23:01:46 -08:00
Harshavardhana	0e3037631f	skip inconsistent shards if possible (#13945 ) data shards were wrong due to a healing bug reported in #13803 mainly with unaligned object sizes. This PR is an attempt to automatically avoid these shards, with available information about the `xl.meta` and actually disk mtime.	2021-12-21 10:08:26 -08:00
jiangfucheng	7460fb8349	fix padding error and compatible with uploaded objects (#13803 )	2021-12-03 09:26:30 -08:00
Harshavardhana	28f95f1fbe	quorum calculation getLatestFileInfo should be itself (#13717 ) FileInfo quorum shouldn't be passed down, instead inferred after obtaining a maximally occurring FileInfo. This PR also changes other functions that rely on wrong quorum calculation. Update tests as well to handle the proper requirement. All these changes are needed when migrating from older deployments where we used to set N/2 quorum for reads to EC:4 parity in newer releases.	2021-11-22 09:36:29 -08:00
Harshavardhana	c791de0e1e	re-implement pickValidInfo dataDir, move to quorum calculation (#13681 ) dataDir loosely based on maxima is incorrect and does not work in all situations such as disks in the following order - xl.json migration to xl.meta there may be partial xl.json's leftover if some disks are not yet connected when the disk is yet to come up, since xl.json mtime and xl.meta is same the dataDir maxima doesn't work properly leading to quorum issues. - its also possible that XLV1 might be true among the disks available, make sure to keep FileInfo based on common quorum and skip unexpected disks with the older data format. Also, this PR tests upgrade from older to a newer release if the data is readable and matches the checksum. NOTE: this is just initial work we can build on top of this to do further tests.	2021-11-21 10:41:30 -08:00
Harshavardhana	4545ecad58	ignore swapped drives instead of throwing errors (#13655 ) - add checks such that swapped disks are detected and ignored - never used for normal operations. - implement `unrecognizedDisk` to be ignored with all operations returning `errDiskNotFound`. - also add checks such that we do not load unexpected disks while connecting automatically. - additionally humanize the values when printing the errors. Bonus: fixes handling of non-quorum situations in getLatestFileInfo(), that does not work when 2 drives are down, currently this function would return errors incorrectly.	2021-11-15 09:46:55 -08:00
Harshavardhana	94d587e6fc	fix: delete-markers without quorum were unreadable (#13351 ) DeleteMarkers were unreadable if they had quorum based guarantees, this PR tries to fix this behavior appropriately. DeleteMarkers with sufficient should be allowed and the return error should be accordingly with or without version-id. This also allows for overwrites which may not be possible in a multi-pool setup. fixes #12787	2021-10-04 08:53:38 -07:00
Anis Elleuch	1d9e91e00f	Fix wrong reporting of total disks after restart (#13326 ) A restart of the cluster and a failed disk will wrongly count the number of total disks.	2021-09-29 11:36:19 -07:00
Klaus Post	0e7fdcee30	Healing: Decide healing inlining based on metadata (#13178 ) Don't perform an independent evaluation of inlining, but mirror the decision made when uploading the object. Leads to some objects being inlined or not based on new metrics. Instead respect previous decision.	2021-09-09 08:55:43 -07:00
Harshavardhana	a19e3bc9d9	add more dangling heal related tests (#13140 ) also make sure that HealObject() never returns 'ObjectNotFound' or 'VersionNotFound' errors, as those are meaningless and not useful for the caller.	2021-09-02 20:56:13 -07:00
Krishnan Parthasarathi	db35bcf2ce	heal: Remove transitioned objects' parts from outdated disks (#13018 ) Bonus: check equality for replication and other metadata	2021-08-23 13:14:55 -07:00
Harshavardhana	035882d292	fix: remove parentIsObject() check (#12851 ) we will allow situations such as ``` a/b/1.txt a/b ``` and ``` a/b a/b/1.txt ``` we are going to document that this usecase is not supported and we will never support it, if any application does this users have to delete the top level parent to make sure namespace is accessible at lower level. rest of the situations where the prefixes get created across sets are supported as is.	2021-08-03 13:26:57 -07:00
Harshavardhana	a9d9b520ec	remove short circuited healing optimization (#12796 ) this healing optimization caused multiple regressions in healing - delete-markers incorrectly missing heal and returning incorrect healing results to client. - missing individual 'parts' such as for restored object or simply for all objects just missing few parts. This optimization is not necessary, we should proceed to verify all cases possible not just when metadata is inconsistent.	2021-07-26 16:51:09 -07:00
Harshavardhana	a3f7d575e0	improve delete-marker healing (#12794 ) delete-markers missing on drives were not healed due to few things disksWithAllParts() does not know-how to deal with delete markers, add support for that. fixes #12787	2021-07-26 11:48:09 -07:00
Harshavardhana	f175ff8f66	add healing fixes for delete-marker (#12788 ) - delete-markers are incorrectly reported as corrupt with wrong data sent to client 'mc admin heal -r' on objects with delete marker will report as 'grey' incorrectly. - do not heal delete-markers during HeadObject() this can lead to inconsistent order of heals on the object, although this is not an issue in terms of order of versions it is rather simpler to keep the same order on all drives. - defaultHealResult() should handle 'err == nil' case such that valid cases should be handled as 'drive' status OK.	2021-07-26 08:01:41 -07:00
Harshavardhana	542fe4ea2e	fix: legacy objects with 10MiB blockSize should use right buffers (#12459 ) healing code was using incorrect buffers to heal older objects with 10MiB erasure blockSize, incorrect calculation of such buffers can lead to incorrect premature closure of io.Pipe() during healing. fixes #12410	2021-06-07 10:06:06 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	b5ebfd35b4	fix: always prefer DataBlocks present in FileInfo (#12386 )	2021-05-27 10:11:50 -07:00
Klaus Post	3fff50120b	Revert heal locks (#12365 ) A lot of healing is likely to be on non-existing objects and locks are very expensive and will slow down scanning significantly. In cases where all are valid or, all are broken allow rejection without locking. Keep the existing behavior, but move the check for dangling objects to after the lock has been acquired. ``` _, err = getLatestFileInfo(ctx, partsMetadata, errs) if err != nil { return er.purgeObjectDangling(ctx, bucket, object, versionID, partsMetadata, errs, []error{}, opts) } ``` Revert "heal: Hold lock when reading xl.meta from disks (#12362)" This reverts commit `abd32065aa`	2021-05-25 17:02:06 -07:00
Harshavardhana	4840974d7a	fix: inline data upon overwrites should be readable (#12369 ) This PR fixes two bugs - Remove fi.Data upon overwrite of objects from inlined-data to non-inlined-data - Workaround for an existing bug on disk with latest releases to ignore fi.Data and instead read from the disk for non-inlined-data - Addtionally add a reserved metadata header to indicate data is inlined for a given version.	2021-05-25 16:33:06 -07:00
Harshavardhana	ed4941a5f3	fix: calculate dataBlocks properly in healing (#12364 )	2021-05-25 09:34:27 -07:00
Anis Elleuch	abd32065aa	heal: Hold lock when reading xl.meta from disks (#12362 ) Lock is hold in healObject() after reading xl.meta from disks the first time. This commit will held the lock since the beginning of HealObject() Co-authored-by: Anis Elleuch <anis@min.io>	2021-05-24 13:39:38 -07:00
Klaus Post	cde6469b88	Fix hanging erasure writes (#12253 ) However, this slice is also used for closing the writers, so close is never called on these. Furthermore when an error is returned from a write it is now reported to the reader. bonus: remove unused heal param from `newBitrotWriter`. * Remove copy, now that we don't mutate.	2021-05-17 08:32:28 -07:00
Harshavardhana	f1e479d274	remove more duplicate bloom filter trackers (#12302 ) At some places bloom filter tracker was getting updated for `.minio.sys/tmp` bucket, there is no reason to update bloom filters for those. And add a missing bloom filter update for MakeBucket() Bonus: purge unused function deleteEmptyDir()	2021-05-17 08:25:48 -07:00
Harshavardhana	d84261aa6d	fix: ensure proper usage of DataDir (#12300 ) - GetObject() should always use a common dataDir to read from when it starts reading, this allows the code in erasure decoding to have sane expectations. - Healing should always heal on the common dataDir, this allows the code in dangling object detection to purge dangling content. These both situations can happen under certain types of retries during PUT when server is restarting etc, some namespace entries might be left over.	2021-05-14 16:50:47 -07:00
Krishnan Parthasarathi	0bab1c1895	Heal restored object contents on disk (#12238 )	2021-05-06 16:06:57 -07:00
Harshavardhana	1aa5858543	move madmin to github.com/minio/madmin-go (#12239 )	2021-05-06 08:52:02 -07:00
Harshavardhana	ff36baeaa7	fix: attempt to drain the ReadFileStream for connection pooling (#12208 ) avoid time_wait build up with getObject requests if there are pending callers and they timeout, can lead to time_wait states Bonus share the same buffer pool with erasure healing logic, additionally also fixes a race where parallel readers were never cleanup during Encode() phase, because pipe.Reader end was never closed(). Added closer right away upon an error during Encode to make sure to avoid racy Close() while stream was still being Read().	2021-05-04 10:12:08 -07:00
Krishnan Parthasarathi	860bf1bab2	Add IsRemote method on FileInfo, ObjectInfo (#12209 ) Provides a convenient method to know if an object's contents are in its remote tier.	2021-05-04 08:40:42 -07:00
Harshavardhana	f7a87b30bf	Revert "deprecate embedded browser (#12163 )" This reverts commit `736d8cbac4`. Bring contrib files for older contributions	2021-04-30 08:50:39 -07:00
Harshavardhana	64f6020854	fix: cleanup locking, cancel context upon lock timeout (#12183 ) upon errors to acquire lock context would still leak, since the cancel would never be called. since the lock is never acquired - proactively clear it before returning.	2021-04-29 20:55:21 -07:00
Anis Elleuch	9e797532dc	lock: Always cancel the returned Get(R)Lock context (#12162 ) * lock: Always cancel the returned Get(R)Lock context There is a leak with cancel created inside the locking mechanism. The cancel purpose was to cancel operations such erasure get/put that are holding non-refreshable locks. This PR will ensure the created context.Cancel is passed to the unlock API so it will cleanup and avoid leaks. * locks: Avoid returning nil cancel in local lockers Since there is no Refresh mechanism in the local locking mechanism, we do not generate a new context or cancel. Currently, a nil cancel function is returned but this can cause a crash. Return a dummy function instead.	2021-04-27 16:12:50 -07:00
Harshavardhana	736d8cbac4	deprecate embedded browser (#12163 ) https://github.com/minio/console takes over the functionality for the future object browser development Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-27 10:52:12 -07:00
Poorna Krishnamoorthy	4be0f92067	Fix multipart restore to remove part match (#12161 ) Part ETags are not available after multipart finalizes, removing this check as not useful. Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-26 18:24:06 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	a7acfa6158	fix: pick valid FileInfo additionally based on dataDir (#12116 ) * fix: pick valid FileInfo additionally based on dataDir historically we have always relied on modTime to be consistent and same, we can now add additional reference to look for the same dataDir value. A dataDir is the same for an object at a given point in time for a given version, let's say a `null` version is overwritten in quorum we do not by mistake pick up the fileInfo's incorrectly. * make sure to not preserve fi.Data Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-21 19:06:08 -07:00
Harshavardhana	2ef824bbb2	collapse two distinct calls into single RenameData() call (#12093 ) This is an optimization by reducing one extra system call, and many network operations. This reduction should increase the performance for small file workloads.	2021-04-20 10:44:39 -07:00
Klaus Post	d267d152ba	healing: re-read metadata after lock (#12004 ) Do no use potentially wrong metadata from before acquiring lock. Plus remove unused NoLock option.	2021-04-07 10:39:48 -07:00
Harshavardhana	434e5c0cfe	allow preserving legacyXLv1 with inline data format (#11951 ) current master breaks this important requirement we need to preserve legacyXLv1 format, this is simply ignored and overwritten causing a myriad of issues by leaving stale files on the namespace etc. for now lets still use the two-phase approach of writing to `tmp` and then renaming the content to the actual namespace.	2021-04-01 22:12:03 -07:00
Harshavardhana	f966fbc4a3	make sure to preserve checksumInfo to lookup older hash (#11940 ) upgrading from 2yr old releases is expected to work, the issue was we were missing checksum info to be passed down to newBitrotReader() for whole bitrot calculation	2021-03-31 21:14:08 -07:00
Klaus Post	2623338dc5	Inline small file data in xl.meta file (#11758 )	2021-03-29 17:00:55 -07:00
Harshavardhana	51a8619a79	[feat] Add configurable deadline for writers (#11822 ) This PR adds deadlines per Write() calls, such that slow drives are timed-out appropriately and the overall responsiveness for Writes() is always up to a predefined threshold providing applications sustained latency even if one of the drives is slow to respond.	2021-03-18 14:09:55 -07:00
Klaus Post	fa9cf1251b	Imporve healing and reporting (#11312 ) * Provide information on actively healing, buckets healed/queued, objects healed/failed. * Add concurrent healing of multiple sets (typically on startup). * Add bucket level resume, so restarts will only heal non-healed buckets. * Print summary after healing a disk is done.	2021-03-04 14:36:23 -08:00
Anis Elleuch	b3f81e75f6	xl: Make it clear when to create delete marker for a non existant object (#11423 )	2021-02-03 10:33:43 -08:00
Harshavardhana	09bc49bd51	fix: healBucket across sets should capture results properly (#11341 ) healing `.minio.sys/config` returns incorrect quorum errors across sets, healing of the buckets.	2021-01-25 09:45:09 -08:00
Harshavardhana	d1a8f0b786	fix possible crashes on deleteMarker replication (#11308 ) Delete marker can have `metaSys` set to nil, that can lead to crashes after the delete marker has been healed. Additionally also fix isObjectDangling check for transitioned objects, that do not have parts should be treated similar to Delete marker.	2021-01-20 13:12:12 -08:00
Harshavardhana	f903cae6ff	Support variable server pools (#11256 ) Current implementation requires server pools to have same erasure stripe sizes, to facilitate same SLA and expectations. This PR allows server pools to be variadic, i.e they do not have to be same erasure stripe sizes - instead they should have SLA for parity ratio. If the parity ratio cannot be guaranteed by the new server pool, the deployment is rejected i.e server pool expansion is not allowed.	2021-01-16 12:08:02 -08:00
Harshavardhana	f21d650ed4	fix: readData in bulk call using messagepack byte wrappers (#11228 ) This PR refactors the way we use buffers for O_DIRECT and to re-use those buffers for messagepack reader writer. After some extensive benchmarking found that not all objects have this benefit, and only objects smaller than 64KiB see this benefit overall. Benefits are seen from almost all objects from 1KiB - 32KiB Beyond this no objects see benefit with bulk call approach as the latency of bytes sent over the wire v/s streaming content directly from disk negate each other with no remarkable benefits. All other optimizations include reuse of msgp.Reader, msgp.Writer using sync.Pool's for all internode calls.	2021-01-07 19:27:31 -08:00
Harshavardhana	4ed45ce543	fix: healing buckets during pool expansion (#11224 ) fixes #11209	2021-01-05 13:24:22 -08:00
Harshavardhana	d0027c3c41	do not use large buffers if not necessary (#11220 ) without this change, there is a performance regression for small objects GETs, this makes the overall speed to go back to pre '59d363' commit days.	2021-01-04 18:51:52 -08:00
Harshavardhana	c4131c2798	feat: Small object optimization read data in single bulk call (#11207 )	2021-01-03 11:27:57 -08:00
Harshavardhana	2eb52ca5f4	fix: heal bucket metadata right before healing bucket (#11097 ) optimization mainly to avoid listing the entire `.minio.sys/buckets/.minio.sys` directory, this can get really huge and comes in the way of startup routines, contents inside `.minio.sys/buckets/.minio.sys` are rather transient and not necessary to be healed.	2020-12-13 11:57:08 -08:00
Harshavardhana	6990de9c94	fix: dangling object delete shall return object doesn't exist (#10961 ) dangling object when deleted means object doesn't exist anymore, so we should return appropriate errors, this allows crawler heal to ensure that it removes the tracker for dangling objects.	2020-11-23 18:50:53 -08:00
Harshavardhana	519c0077a9	fix: do not return an error for successfully deleted dangling objects (#10938 ) dangling objects when removed `mc admin heal -r` or crawler auto heal would incorrectly return error - this can interfere with usage calculation as the entry size for this would be returned as `0`, instead upon success use the resultant object size to calculate the final size for the object and avoid reporting this in the log messages Also do not set ObjectSize in healResultItem to be '-1' this has an effect on crawler metrics calculating 1 byte less for objects which seem to be missing their `xl.meta`	2020-11-23 09:12:17 -08:00
Harshavardhana	8f7fe0405e	fix: delete marker replication should support directories (#10878 ) allow directories to be replicated as well, along with their delete markers in replication. Bonus fix to fix bloom filter updates for directories to be preserved.	2020-11-19 18:47:12 -08:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00
Krishna Srinivas	f53c5a020e	fix: heal object shards with ec.index and ec.distribution mismatches (#10773 ) Co-authored-by: Harshavardhana <harsha@minio.io>	2020-10-28 00:10:20 -07:00
Krishna Srinivas	592f2f23a3	fix: heal rejects objects with disk re-ordering issue (#10766 )	2020-10-26 18:48:47 -07:00
Anis Elleuch	db2241066b	heal: Enable removing dangling delete markers (#10688 )	2020-10-15 13:06:40 -07:00
Harshavardhana	2760fc86af	Bump default idleConnsPerHost to control conns in time_wait (#10653 ) This PR fixes a hang which occurs quite commonly at higher concurrency by allowing following changes - allowing lower connections in time_wait allows faster socket open's - lower idle connection timeout to ensure that we let kernel reclaim the time_wait connections quickly - increase somaxconn to 4096 instead of 2048 to allow larger tcp syn backlogs. fixes #10413	2020-10-12 14:19:46 -07:00
Harshavardhana	ca989eb0b3	avoid ListBuckets returning quorum errors when node is down (#10555 ) Also, revamp the way ListBuckets work make few portions of the healing logic parallel - walk objects for healing disks in parallel - collect the list of buckets in parallel across drives - provide consistent view for listBuckets()	2020-09-24 09:53:38 -07:00
Anis Elleuch	4c81201f95	fix: healing delete marker on versioned buckets (#10530 ) Healing was not working correctly in the distributed mode because errFileVersionNotFound was not properly converted in storage rest client. Besides, fixing the healing delete marker is not working as expected.	2020-09-21 15:16:16 -07:00
Klaus Post	493c714663	Remove erasureSets and erasureObjects from ObjectLayer (#10442 )	2020-09-10 09:18:19 -07:00
Klaus Post	2d58a8d861	Add storage layer contexts (#10321 ) Add context to all (non-trivial) calls to the storage layer. Contexts are propagated through the REST client. - `context.TODO()` is left in place for the places where it needs to be added to the caller. - `endWalkCh` could probably be removed from the walkers, but no changes so far. The "dangerous" part is that now a caller disconnecting will propagate down, so a "delete" operation will now be interrupted. In some cases we might want to disconnect this functionality so the operation completes if it has started, leaving the system in a cleaner state.	2020-09-04 09:45:06 -07:00

1 2 3 4

157 Commits