minio

mirror of https://github.com/minio/minio.git synced 2024-12-26 07:05:55 -05:00

Author	SHA1	Message	Date
Klaus Post	470553ff5d	Tweak readall allocation and renameData buffer reuse (#13108 ) Use a single allocation for reading the file, not the growing buffer of `io.ReadAll`. Reuse the write buffer if we can when writing metadata in RenameData.	2021-08-30 08:38:11 -07:00
Poorna Krishnamoorthy	27f895cf2c	Check pathlength before reading metadata (#13080 ) fixes bug where the server returns 503 instead of 400 if objectName is longer than 255 characters Fixes regression introduced in #12942	2021-08-26 16:23:12 -07:00
Harshavardhana	c11a2ac396	refactor healing to remove certain structs (#13079 ) - remove sourceCh usage from healing we already have tasks and resp channel - use read locks to lookup globalHealConfig - fix healing resolver to pick candidates quickly that need healing, without this resolver was unexpectedly skipping.	2021-08-26 14:06:04 -07:00
Klaus Post	88d719689c	Synchronize bucket cycle numbers (#13058 ) Synchronize bucket cycles so it is much more likely that the same prefixes will be picked up for scanning. Use the global bloom filter cycle for that. Bump bloom filter versions to clear those.	2021-08-25 08:25:26 -07:00
Krishnan Parthasarathi	db35bcf2ce	heal: Remove transitioned objects' parts from outdated disks (#13018 ) Bonus: check equality for replication and other metadata	2021-08-23 13:14:55 -07:00
Klaus Post	1080609c86	Reuse buffers when writing metadata (#13040 ) Simplify returning buffers. Tested using `warp mixed --duration=1m --obj.size=100K`: ``` Operation: DELETE Operations: 7148 -> 7642 * Average: +6.77% (+8.1) obj/s ------------------- Operation: GET Operations: 32200 -> 34403 * Average: +6.74% (+3.5 MiB/s) throughput, +6.74% (+36.2) obj/s * First Byte: Average: -105.403µs (-3%), Median: -309µs (-11%), Best: -2.7µs (-0%), Worst: +3.5637ms (+3%) ------------------- Operation: PUT Operations: 10741 -> 11475 * Average: +6.78% (+1.2 MiB/s) throughput, +6.78% (+12.1) obj/s ------------------- Operation: STAT Operations: 21465 -> 22927 * Average: +6.71% (+24.0) obj/s ```	2021-08-23 11:17:27 -07:00
Harshavardhana	0f01e7ef0f	fix: check for xl.meta as directory fallback (#13023 ) Objects uploaded in this format for example ``` mc cp /etc/hosts alias/bucket/foo/bar/xl.meta mc ls -r alias/bucket/foo/bar ``` Won't list the object, handle this scenario.	2021-08-21 00:12:29 -07:00
Klaus Post	c25816eabc	xl walk: Limit walk concurrent IO (#12885 ) We are observing heavy system loads, potentially locking the system up for periods when concurrent listing operations are performed. We place a per-disk lock on walk IO operations. This will minimize the impact of concurrent listing operations on the entire system and de-prioritize them compared to other operations. Single list operations should remain largely unaffected.	2021-08-18 18:10:36 -07:00
Klaus Post	24722ddd02	Remove inline data hack (#12946 ) move the code down to the storage layer, this logic decouples the inline data from the size parameter making it flexible and future proof.	2021-08-13 08:25:54 -07:00
Klaus Post	89febdb3d6	Reuse small buffers (#12948 ) When reading metadata allow reuse of buffers in certain cases. Take the low-hanging fruit. Reduce GC overhead when listing.	2021-08-12 14:27:22 -07:00
Klaus Post	3eac02f676	Use metadata reader in ReadVersion (#12942 ) Use `readMetadata` when reading version information without data requested. Reduces IO on inlined data. Bonus: Inline compressed data as well when compression is enabled.	2021-08-12 10:05:24 -07:00
Harshavardhana	40a2fa8e81	fix: add more optimizations to putMetacacheObject() (#12916 ) - avoid extra lookup for 'xl.meta' since we are definitely sure that it doesn't exist. - use this in newMultipartUpload() as well - also additionally do not write with O_DSYNC to avoid loading the drives, instead create 'xl.meta' for listing operations without O_DSYNC since these are ephemeral objects. - do the same with newMultipartUpload() since it gets synced when the PutObjectPart() is attempted, we do not need to tax newMultipartUpload() instead.	2021-08-10 11:12:22 -07:00
Harshavardhana	035882d292	fix: remove parentIsObject() check (#12851 ) we will allow situations such as ``` a/b/1.txt a/b ``` and ``` a/b a/b/1.txt ``` we are going to document that this usecase is not supported and we will never support it, if any application does this users have to delete the top level parent to make sure namespace is accessible at lower level. rest of the situations where the prefixes get created across sets are supported as is.	2021-08-03 13:26:57 -07:00
Harshavardhana	bfbdb8f0a8	fix: incorrect O_DIRECT behavior for reads (#12811 ) O_DIRECT behavior was broken and it was still caching all the reads, this change properly fixes this behavior.	2021-07-28 11:20:16 -07:00
Krishna Srinivas	aa0c28809b	Server side speedtest implementation (#12750 )	2021-07-27 12:55:56 -07:00
Harshavardhana	0c666379fe	fix: avoid removing healed parts on dstDataPath (#12795 ) destination path and old path will be similar when healing occurs, this can lead to healed parts being again purged leading to always an inconsistent state on an object which might further cause reduction in quorum eventually.	2021-07-26 15:15:34 -07:00
Harshavardhana	e124d88788	optimize listing operation concurrency (#12728 ) - remove use of getOnlineDisks() instead rely on fallbackDisks() when disk return errors like diskNotFound, unformattedDisk use other fallback disks to list from, instead of paying the price for checking getOnlineDisks() - optimize getDiskID() further to avoid large write locks when looking formatLastCheck time window This new change allows for a more relaxed fallback for listing allowing for more tolerance and also eventually gain more consistency in results even if using '3' disks by default.	2021-07-24 22:03:38 -07:00
Krishnan Parthasarathi	29eea52e14	Skip transitioning of object versions if inlined (#12705 )	2021-07-16 09:38:27 -07:00
Klaus Post	d6a2fe02d3	Add admin file inspector (#12635 ) Download files from any bucket/path as an encrypted zip file. The key is included in the response but can be separated so zip and the key doesn't have to be sent on the same channel. Requires https://github.com/minio/pkg/pull/6	2021-07-09 11:29:16 -07:00
Krishnan Parthasarathi	a1df230518	Add a 'free' version to track deletion of tiered object content (#12470 )	2021-06-30 19:32:07 -07:00
Harshavardhana	4669d19f2a	fix: simplify diskMap usage to keep certain checks predictable (#12519 ) Bonus: also make sure that we Sanitize() the drives only during startup of the server, but not during disk reconnects.	2021-06-16 14:26:26 -07:00
Anis Elleuch	f30c996d48	trace: Add bucket/prefix to WalkDir() tracing (#12510 ) Bonus, replace os.* API with os-instrumented.go	2021-06-15 14:34:26 -07:00
Harshavardhana	31971906ff	fix: force-delete should just rename to .trash (#12499 ) avoid blocking call for force-delete, instead treat it lazily and delete in background.	2021-06-14 08:04:37 -07:00
Harshavardhana	dd2831c1a0	fix: remove parent dirs in RenameData upon failure (#12452 ) - it is possible that during I/O failures we might leave partially written directories, make sure we purge them after. - rename current data-dir (null) versionId only after the newer xl.meta has been written fully. - attempt removal once for minioMetaTmpBucket/uuid/ as this folder is empty if all previous operations were successful, this allows avoiding recursive os.Remove()	2021-06-07 09:35:08 -07:00
Anis Elleuch	810af07529	xl: Avoid multi-disks node to exit when one disk fails (#12423 ) It makes sense that a node that has multiple disks starts when one disk fails, returning an i/o error for example. This commit will make this fault tolerance available in this specific use case.	2021-06-05 09:10:32 -07:00
Poorna Krishnamoorthy	dbea8d2ee0	Add support for existing object replication. (#12109 ) Also adding an API to allow resyncing replication when existing object replication is enabled and the remote target is entirely lost. With the `mc replicate reset` command, the objects that are eligible for replication as per the replication config will be resynced to target if existing object replication is enabled on the rule.	2021-06-01 19:59:11 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	81d5688d56	move the dependency to minio/pkg for common libraries (#12397 )	2021-05-28 15:17:01 -07:00
Harshavardhana	4840974d7a	fix: inline data upon overwrites should be readable (#12369 ) This PR fixes two bugs - Remove fi.Data upon overwrite of objects from inlined-data to non-inlined-data - Workaround for an existing bug on disk with latest releases to ignore fi.Data and instead read from the disk for non-inlined-data - Additionally add a reserved metadata header to indicate data is inlined for a given version.	2021-05-25 16:33:06 -07:00
Harshavardhana	ebf75ef10d	fix: remove all unused code (#12360 )	2021-05-24 09:28:19 -07:00
Harshavardhana	0287711dc9	fix: implement readMetadata common function for re-use (#12353 ) Previous PR #12351 added functions to read from the reader stream to reduce memory usage, use the same technique in few other places where we are not interested in reading the data part.	2021-05-21 11:41:25 -07:00
Klaus Post	9d1b6fb37d	Add XL reader without data (#12351 ) Add XL metadata reader that reads metadata only on larger files. Use for scanning and listing for now.	2021-05-21 09:10:54 -07:00
Klaus Post	2ca9c533ef	feat: implement in-progress partial bucket updates (#12279 )	2021-05-19 14:38:30 -07:00
Harshavardhana	2daba018d6	reduce allocations on multi-disk clusters (#12311 ) multi-disk clusters initialize buffer pools per disk, this is perhaps expensive and perhaps not useful, for a running server instance. As this may disallow re-use of buffers across sets, this change ensures that buffers across sets can be re-used at drive level, this can reduce quite a lot of memory on large drive setups.	2021-05-17 17:49:48 -07:00
Harshavardhana	2ab9dc7609	do not update bloomFilters for temporary objects	2021-05-15 19:54:07 -07:00
Harshavardhana	4d876d03e8	fix: do not fail upon faulty/non-writable drives; gracefully start the server if there are other drives available - print enough information for the administrator to notice the errors in the console. Bonus: for really large streams use a larger buffer for writes.	2021-05-15 12:57:18 -07:00
Klaus Post	229d83bb75	feat: add dynamic usage cache (#12229 ) A cache structure will be kept with a tree of usages. The cache is a tree structure where each keeps track of its children. An uncompacted branch contains a count of the files only directly at the branch level, and contains link to children branches or leaves. The leaves are "compacted" based on a number of properties. A compacted leaf contains the totals of all files beneath it. A leaf is only scanned once every dataUsageUpdateDirCycles, rarer if the bloom filter for the path is clean and no lifecycles are applied. Skipped leaves have their totals transferred from the previous cycle. A clean leaf will be included once every healFolderIncludeProb for partial heal scans. When selected there is a one in healObjectSelectProb that any object will be chosen for heal scan. Compaction happens when either: - The folder (and subfolders) contains less than dataScannerCompactLeastObject objects. - The folder itself contains more than dataScannerCompactAtFolders folders. - The folder only contains objects and no subfolders. - A bucket root will never be compacted. Furthermore, if a has more than dataScannerCompactAtChildren recursive children (uncompacted folders) the tree will be recursively scanned and the branches with the least number of objects will be compacted until the limit is reached. This ensures that any branch will never contain an unreasonable amount of other branches, and also that small branches with few objects don't take up unreasonable amounts of space. Whenever a branch is scanned, it is assumed that it will be un-compacted before it hits any of the above limits. This will make the branch rebalance itself when scanned if the distribution of objects has changed. TLDR; With current values: No bucket will ever have more than 10000 child nodes recursively. No single folder will have more than 2500 child nodes by itself. All subfolders are compacted if they have less than 500 objects in them recursively. We accumulate the (non-deletemarker) version count for paths as well, since we are changing the structure anyway.	2021-05-11 18:36:15 -07:00
Anis Elleuch	56d4d7b8b1	MRF: Better detection of non stable disks (#12252 ) MRF does not detect when a node is disconnected and reconnected quickly this change will ensure that MRF is alerted by comparing the last disk reconnection timestamp with the last MRF check time. Signed-off-by: Anis Elleuch <anis@min.io> Co-authored-by: Klaus Post <klauspost@gmail.com>	2021-05-11 09:19:15 -07:00
Harshavardhana	764721e2c6	add root_disk threshold detection (#12259 ) as there is no automatic way to detect if there is a root disk mounted on / or /var for the container environments due to how the root disk information is masked inside overlay root inside container. this PR brings an environment variable to set root disk size threshold manually to detect the root disks in such situations.	2021-05-08 15:40:29 -07:00
Klaus Post	254698f126	fix: minor allocation improvements in xlMetaV2 (#12133 )	2021-05-07 09:11:05 -07:00
Nitish Tiwari	776589f0da	Add free inode metric for Prometheus (#12225 )	2021-05-06 12:50:48 -07:00
Harshavardhana	c8050bc079	fix: sleeper behavior in data scanner (#12164 ) do not apply healReplication() for ILM expired, transitioned objects	2021-04-27 08:24:44 -07:00
Poorna Krishnamoorthy	4be0f92067	Fix multipart restore to remove part match (#12161 ) Part ETags are not available after multipart finalizes, removing this check as not useful. Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-26 18:24:06 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	2ef824bbb2	collapse two distinct calls into single RenameData() call (#12093 ) This is an optimization by reducing one extra system call, and many network operations. This reduction should increase the performance for small file workloads.	2021-04-20 10:44:39 -07:00
Harshavardhana	1456f9f090	fix: preserve shared dataDir during suspend overwrites (#12058 ) CopyObject() when shares dataDir needs to be preserved, and upon versioning suspended overwrites should still preserve the dataDir.	2021-04-15 08:44:05 -07:00
Harshavardhana	928ee1a7b2	remove null version dataDir upon overwrites (#12023 )	2021-04-08 19:55:44 -07:00
Klaus Post	d2ac2f758e	odirectReader: handle EOF correctly (#11998 ) EOF may be sent along with data so queue it up and return it when the buffer is empty. Also, when reading data without direct io don't add a buffer that only results in extra memcopy.	2021-04-07 08:32:59 -07:00
Klaus Post	788a8bc254	Fix disk info race (#11984 ) Protect updated members in xlStorage. ``` WARNING: DATA RACE Write at 0x00c004b4ee78 by goroutine 1491: github.com/minio/minio/cmd.(*xlStorage).GetDiskID() d:/minio/minio/cmd/xl-storage.go:590 +0x1078 github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).checkDiskStale() d:/minio/minio/cmd/xl-storage-disk-id-check.go:195 +0x84 github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).StatVol() d:/minio/minio/cmd/xl-storage-disk-id-check.go:284 +0x16a github.com/minio/minio/cmd.erasureObjects.getBucketInfo.func1() d:/minio/minio/cmd/erasure-bucket.go:100 +0x1a5 github.com/minio/minio/pkg/sync/errgroup.(*Group).Go.func1() d:/minio/minio/pkg/sync/errgroup/errgroup.go:122 +0xd7 Previous read at 0x00c004b4ee78 by goroutine 1087: github.com/minio/minio/cmd.(*xlStorage).CheckFile.func1() d:/minio/minio/cmd/xl-storage.go:1699 +0x384 github.com/minio/minio/cmd.(*xlStorage).CheckFile() d:/minio/minio/cmd/xl-storage.go:1726 +0x13c github.com/minio/minio/cmd.(*xlStorageDiskIDCheck).CheckFile() d:/minio/minio/cmd/xl-storage-disk-id-check.go:446 +0x23b github.com/minio/minio/cmd.erasureObjects.parentDirIsObject.func1() d:/minio/minio/cmd/erasure-common.go:173 +0x194 github.com/minio/minio/pkg/sync/errgroup.(*Group).Go.func1() d:/minio/minio/pkg/sync/errgroup/errgroup.go:122 +0xd7 ```	2021-04-06 11:33:42 -07:00

167 Commits