minio

mirror of https://github.com/minio/minio.git synced 2024-12-25 22:55:54 -05:00

Author	SHA1	Message	Date
Harshavardhana	f1abb92f0c	feat: Single drive XL implementation (#14970 ) Main motivation is move towards a common backend format for all different types of modes in MinIO, allowing for a simpler code and predictable behavior across all features. This PR also brings features such as versioning, replication, transitioning to single drive setups.	2022-05-30 10:58:37 -07:00
Klaus Post	90a52a29c5	Fix WalkDir fallback hot loop (#14961 ) Fix fallback hot loop fd was never refreshed, leading to an infinite hot loop if a disk failed and the fallback disk fails as well. Fix & simplify retry loop. Fixes #14960	2022-05-23 06:28:46 -07:00
Harshavardhana	153a612253	fetch bucket retention config once for ILM evalAction (#14727 ) This is mainly an optimization, does not change any existing functionality.	2022-04-11 13:25:32 -07:00
Harshavardhana	e77ad3f9bb	make sure to pass Lifecycle if set for List filtering (#14722 ) PR #14606 never really passed the Lifecycle filter down to the listing callers to ensure skipping the entries.	2022-04-10 11:14:52 -07:00
Klaus Post	901d33b59c	Tweak listing quorum (#14703 ) Always go for 50% quorum, and only use non-healing disks. Fixes #14635	2022-04-06 12:24:21 -07:00
Harshavardhana	5cfedcfe33	askDisks for strict quorum to be equal to read quorum (#14623 )	2022-03-25 16:29:45 -07:00
Harshavardhana	f046f557fa	request only 1 best version for latest version resolution (#14625 ) ListObjects, ListObjectsV2 calls are being heavily taxed when there are many versions on objects left over from a previous release or ILM was never setup to clean them up. Instead of being absolutely correct at resolving the exact latest version of an object, we simply rely on the top most 1 version and resolve the rest. Once we have obtained the top most "1" version for ListObject, ListObjectsV2 call we break out.	2022-03-25 08:50:07 -07:00
Harshavardhana	401958938d	add load balance properly restClientFromHash() bucket/prefix (#14621 ) spread out resuming further to other nodes	2022-03-25 03:41:31 -07:00
Klaus Post	2ac54e5a7b	ListObjects: Filter lifecycle expired objects (#14606 ) For ListObjects and ListObjectsV2 perform lifecycle checks on all objects before returning. This will filter out objects that are pending lifecycle expiration. Bonus: Cheaper server pool conflict resolution by not converting to FileInfo.	2022-03-22 12:39:45 -07:00
Harshavardhana	8eecdc6d1f	odd stripe sizes should choose (odd+1)/2 to get correct quorum (#14610 )	2022-03-22 12:21:14 -07:00
Klaus Post	7bc1f986e8	Do not wait for results when canceled (#14607 ) When canceled nobody may be listening for the results. Prevents memory buildup from cancelled requests.	2022-03-22 09:37:01 -07:00
Klaus Post	61eb9d4e29	Fix listing fallback re-using disks (#14576 ) When more than 2 disks are unavailable for listing, the same disk will be used for fallback. This makes quorum calculations incorrect since the same disk will have multiple entries. This PR keeps track of which fallback disks have been handed out and only every returns a disk once.	2022-03-18 11:35:27 -07:00
Harshavardhana	aaea94a48d	update quorum requirement to list all objects (#14201 ) some upgraded objects might not get listed due to different quorum ratios across objects. make sure to list all objects that satisfy the maximum possible quorum.	2022-01-27 17:00:15 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Harshavardhana	866a95de38	fix: choose appropriate quorum for a given erasure set (#13998 ) multiObject delete should honor expected quorum	2021-12-28 12:41:52 -08:00
Harshavardhana	2f1e8ba612	add more directory marker tests and fix a bug (#13871 ) ListObjects() should never list a delete-marked folder if latest is delete marker and delimiter is not provided. ListObjectVersions() should list a delete-marked folder even if latest is delete marker and delimiter is not provided. Enhance further versioning listing on the buckets	2021-12-09 14:59:23 -08:00
Harshavardhana	dcff6c996d	fix: do not list delete-marked objects (#13864 ) delete marked objects should not be considered for listing when listing is delimited, this issue as introduced in PR #13804 which was mainly to address listing of directories in listing when delimited. This PR fixes this properly and adds tests to ensure that we behave in accordance with how an S3 API behaves for ListObjects() without versions.	2021-12-08 17:34:52 -08:00
Klaus Post	3db931dc0e	Improve listing consistency with version merging (#13723 )	2021-12-02 11:29:16 -08:00
Harshavardhana	c791de0e1e	re-implement pickValidInfo dataDir, move to quorum calculation (#13681 ) dataDir loosely based on maxima is incorrect and does not work in all situations such as disks in the following order - xl.json migration to xl.meta there may be partial xl.json's leftover if some disks are not yet connected when the disk is yet to come up, since xl.json mtime and xl.meta is same the dataDir maxima doesn't work properly leading to quorum issues. - its also possible that XLV1 might be true among the disks available, make sure to keep FileInfo based on common quorum and skip unexpected disks with the older data format. Also, this PR tests upgrade from older to a newer release if the data is readable and matches the checksum. NOTE: this is just initial work we can build on top of this to do further tests.	2021-11-21 10:41:30 -08:00
Harshavardhana	60aad1b717	fix: improve bucket deletes we were leaving behind few files (#13364 ) bucket deletes should purge entire bucket metadata appropriately, use rename() to move the metadata files to trash folder, for dangling buckets instead of doing recursive deletes, rename such buckets to trash folder as well. Bonus: reduce retry duration for listing to 200ms	2021-10-06 09:20:25 -07:00
Harshavardhana	769f0b1e24	fix: fallback listing on drives that are unformatted, disconnected (#13249 )	2021-09-23 17:24:24 -07:00
Harshavardhana	5ed781a330	check for context canceled after competing for locks (#13239 ) once we have competed for locks, verify if the context is still valid - this is to ensure that we do not start readdir() or read() calls on the drives on canceled connections.	2021-09-17 14:11:01 -07:00
Klaus Post	d80826b05d	Clean up metacache saver (#13225 ) Don't report success before the listing has actually finished. This will make stop conditions more clear.	2021-09-16 13:35:25 -07:00
Anis Elleuch	98479d7ffd	Fix deadlock when error during metacache generation (#13201 ) A typo forgot to release a lock after acquiring it.	2021-09-13 09:11:39 -07:00
Klaus Post	3c2efd9cf3	Stop async listing earlier (#13160 ) Stop async listing if we have not heard back from the client for 3 minutes. This will stop spending resources on async listings when they are unlikely to get used. If the client returns a new listing will be started on the second request. Stop saving cache metadata to disk. It is cleared on restarts anyway. Removes all load/save functionality	2021-09-08 11:06:45 -07:00
Klaus Post	556552340a	listing: Don't log errFileNotFound and friends (#13119 )	2021-08-31 09:46:42 -07:00
Klaus Post	ad928f0078	Return list request when canceled (#12977 ) * Return list request when canceled * Cancel list if abandoned	2021-08-16 11:59:16 -07:00
Harshavardhana	54ab3a1d5b	implement putMetacacheObject() optimizing List operations (#12903 ) removes unexpected features from regular putObject() such as - increasing parity when disks are down, avoids a lot of DiskInfo() calls. - triggering MRF for metacache objects if disks are offline - avoiding renames from temporary location to actual namespace, not needed since metacache files are unique.	2021-08-09 06:58:54 -07:00
Harshavardhana	035882d292	fix: remove parentIsObject() check (#12851 ) we will allow situations such as ``` a/b/1.txt a/b ``` and ``` a/b a/b/1.txt ``` we are going to document that this usecase is not supported and we will never support it, if any application does this users have to delete the top level parent to make sure namespace is accessible at lower level. rest of the situations where the prefixes get created across sets are supported as is.	2021-08-03 13:26:57 -07:00
Harshavardhana	f175ff8f66	add healing fixes for delete-marker (#12788 ) - delete-markers are incorrectly reported as corrupt with wrong data sent to client 'mc admin heal -r' on objects with delete marker will report as 'grey' incorrectly. - do not heal delete-markers during HeadObject() this can lead to inconsistent order of heals on the object, although this is not an issue in terms of order of versions it is rather simpler to keep the same order on all drives. - defaultHealResult() should handle 'err == nil' case such that valid cases should be handled as 'drive' status OK.	2021-07-26 08:01:41 -07:00
Harshavardhana	e124d88788	optimize listing operation concurrency (#12728 ) - remove use of getOnlineDisks() instead rely on fallbackDisks() when disk return errors like diskNotFound, unformattedDisk use other fallback disks to list from, instead of paying the price for checking getOnlineDisks() - optimize getDiskID() further to avoid large write locks when looking formatLastCheck time window This new change allows for a more relaxed fallback for listing allowing for more tolerance and also eventually gain more consistency in results even if using '3' disks by default.	2021-07-24 22:03:38 -07:00
Harshavardhana	9516587a6c	update console to master branch with new fixes - improve download behavior - avoid response timeouts	2021-07-15 14:44:18 -07:00
Klaus Post	05aebc52c2	feat: Implement listing version 3.0 (#12605 ) Co-authored-by: Harshavardhana <harsha@minio.io>	2021-07-05 15:34:41 -07:00
Harshavardhana	a6ad965799	fix: use NumVersions for list resolver (#12599 ) also do not incorrectly double count objExists unless its selected and it matches with previous entry. Bonus: change listQuorum to match with AskDisks to ensure that we atleast by default choose all the "drives" that we asked is consistent.	2021-06-30 07:43:19 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	81d5688d56	move the dependency to minio/pkg for common libraries (#12397 )	2021-05-28 15:17:01 -07:00
Harshavardhana	32d8a48d4e	reduce memory usage in metacache reader (#12334 )	2021-05-20 09:00:11 -07:00
Klaus Post	229d83bb75	feat: add dynamic usage cache (#12229 ) A cache structure will be kept with a tree of usages. The cache is a tree structure where each keeps track of its children. An uncompacted branch contains a count of the files only directly at the branch level, and contains link to children branches or leaves. The leaves are "compacted" based on a number of properties. A compacted leaf contains the totals of all files beneath it. A leaf is only scanned once every dataUsageUpdateDirCycles, rarer if the bloom filter for the path is clean and no lifecycles are applied. Skipped leaves have their totals transferred from the previous cycle. A clean leaf will be included once every healFolderIncludeProb for partial heal scans. When selected there is a one in healObjectSelectProb that any object will be chosen for heal scan. Compaction happens when either: - The folder (and subfolders) contains less than dataScannerCompactLeastObject objects. - The folder itself contains more than dataScannerCompactAtFolders folders. - The folder only contains objects and no subfolders. - A bucket root will never be compacted. Furthermore, if a has more than dataScannerCompactAtChildren recursive children (uncompacted folders) the tree will be recursively scanned and the branches with the least number of objects will be compacted until the limit is reached. This ensures that any branch will never contain an unreasonable amount of other branches, and also that small branches with few objects don't take up unreasonable amounts of space. Whenever a branch is scanned, it is assumed that it will be un-compacted before it hits any of the above limits. This will make the branch rebalance itself when scanned if the distribution of objects has changed. TLDR; With current values: No bucket will ever have more than 10000 child nodes recursively. No single folder will have more than 2500 child nodes by itself. All subfolders are compacted if they have less than 500 objects in them recursively. We accumulate the (non-deletemarker) version count for paths as well, since we are changing the structure anyway.	2021-05-11 18:36:15 -07:00
Harshavardhana	e84f533c6c	add missing wait groups for certain io.Pipe() usage (#12264 ) wait groups are necessary with io.Pipes() to avoid races when a blocking function may not be expected and a Write() -> Close() before Read() races on each other. We should avoid such situations.. Co-authored-by: Klaus Post <klauspost@gmail.com>	2021-05-11 09:18:37 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	d46386246f	api: Introduce metadata update APIs to update only metadata (#11962 ) Current implementation heavily relies on readAllFileInfo but with the advent of xl.meta inlined with data, we cannot easily avoid reading data when we are only interested is updating metadata, this leads to invariably write amplification during metadata updates, repeatedly reading data when we are only interested in updating metadata. This PR ensures that we implement a metadata only update API at storage layer, that handles updates to metadata alone for any given version - given the version is valid and present. This helps reduce the chattiness for following calls.. - PutObjectTags - DeleteObjectTags - PutObjectLegalHold - PutObjectRetention - ReplicateObject (updates metadata on replication status)	2021-04-04 13:32:31 -07:00
Klaus Post	9efcb9e15c	Fix listPathRaw/WalkDir cancelation (#11905 ) In #11888 we observe a lot of running, WalkDir calls. There doesn't appear to be any listerners for these calls, so they should be aborted. Ensure that WalkDir aborts when upstream cancels the request. Fixes #11888	2021-03-26 11:18:30 -07:00
Harshavardhana	d971061305	use listPathRaw for HealObjects() instead of expensive WalkVersions() (#11675 )	2021-03-06 09:25:48 -08:00
Klaus Post	10bdb78699	fix: listObjectVersions Include object in marker (#11562 ) ListObjectVersions would skip past the object in the marker when version id is specified. Make `listPath` return the object with the marker and truncate it if not needed. Avoid having to parse unintended objects to find a version marker.	2021-03-01 08:12:02 -08:00
Harshavardhana	a8e4f64ff3	Revert "fix: remove persistence layer for metacache store in memory (#11538 )" This reverts commit `b23659927c`.	2021-02-24 22:24:51 -08:00
Harshavardhana	b23659927c	fix: remove persistence layer for metacache store in memory (#11538 ) store the cache in-memory instead of disks to avoid large write amplifications for list heavy workloads, store in memory instead and let it auto expire.	2021-02-24 15:51:41 -08:00
Andreas Auernhammer	d4b822d697	pkg/etag: add new package for S3 ETag handling (#11577 ) This commit adds a new package `etag` for dealing with S3 ETags. Even though ETag is often viewed as MD5 checksum of an object, handling S3 ETags correctly is a surprisingly complex task. While it is true that the ETag corresponds to the MD5 for the most basic S3 API operations, there are many exceptions in case of multipart uploads or encryption. In worse, some S3 clients expect very specific behavior when it comes to ETags. For example, some clients expect that the ETag is a double-quoted string and fail otherwise. Non-AWS compliant ETag handling has been a source of many bugs in the past. Therefore, this commit adds a dedicated `etag` package that provides functionality for parsing, generating and converting S3 ETags. Further, this commit removes the ETag computation from the `hash` package. Instead, the `hash` package (i.e. `hash.Reader`) should focus only on computing and verifying the content-sha256. One core feature of this commit is to provide a mechanism to communicate a computed ETag from a low-level `io.Reader` to a high-level `io.Reader`. This problem occurs when an S3 server receives a request and has to compute the ETag of the content. However, the server may also wrap the initial body with several other `io.Reader`, e.g. when encrypting or compressing the content: ``` reader := Encrypt(Compress(ETag(content))) ``` In such a case, the ETag should be accessible by the high-level `io.Reader`. The `etag` provides a mechanism to wrap `io.Reader` implementations such that the `ETag` can be accessed by a type-check. This technique is applied to the PUT, COPY and Upload handlers.	2021-02-23 12:31:53 -08:00
Klaus Post	c5b2a8441b	fix: faster healing when disk is replaced. (#11520 )	2021-02-18 11:06:54 -08:00
Krishnan Parthasarathi	b87fae0049	Simplify PutObjReader for plain-text reader usage (#11470 ) This change moves away from a unified constructor for plaintext and encrypted usage. NewPutObjReader is simplified for the plain-text reader use. For encrypted reader use, WithEncryption should be called on an initialized PutObjReader. Plaintext: func NewPutObjReader(rawReader hash.Reader) PutObjReader The hash.Reader is used to provide payload size and md5sum to the downstream consumers. This is different from the previous version in that there is no need to pass nil values for unused parameters. Encrypted: func WithEncryption(encReader hash.Reader, key crypto.ObjectKey) (*PutObjReader, error) This method sets up encrypted reader along with the key to seal the md5sum produced by the plain-text reader (already setup when NewPutObjReader was called). Usage: ``` pReader := NewPutObjReader(rawReader) // ... other object handler code goes here // Prepare the encrypted hashed reader pReader, err = pReader.WithEncryption(encReader, objEncKey) ```	2021-02-10 08:52:50 -08:00
Harshavardhana	a6c146bd00	validate storage class across pools when setting config (#11320 ) ``` mc admin config set alias/ storage_class standard=EC:3 ``` should only succeed if parity ratio is valid for all server pools, if not we should fail proactively. This PR also needs to bring other changes now that we need to cater for variadic drive counts per pool. Bonus fixes also various bugs reproduced with - GetObjectWithPartNumber() - CopyObjectPartWithOffsets() - CopyObjectWithMetadata() - PutObjectPart,PutObject with truncated streams	2021-01-22 12:09:24 -08:00

1 2

85 Commits