minio

mirror of https://github.com/minio/minio.git synced 2024-12-27 23:55:56 -05:00

Author	SHA1	Message	Date
Harshavardhana	bbb64eaade	skip healing properly in the scanner when a drive is hotplugged (#19939 ) Due to how the state is passed around, SkipHealing might not always reflect the true state of the system, causing a situation where we might be healing from the scanner on the same drive which is being healed. Due to this, competing heals get triggered that slow each other down.	2024-06-17 16:39:11 -07:00
Klaus Post	9667a170de	Add usage cache cleanup and lower forced top compaction (#19719 ) Lower forced compaction to 250K entries. If there are more than 250K entries on the top level, force compact it and log an error.	2024-05-10 07:49:50 -07:00
Harshavardhana	95c65f4e8f	do not panic on rebalance during server restarts (#19563 ) This PR makes a feasible approach to handle all the scenarios that we must face to avoid returning "panic." Instead, we must return "errServerNotInitialized" when a bucketMetadataSys.Get() is called, allowing the caller to retry their operation and wait. Bonus fix the way data-usage-cache stores the object. Instead of storing usage-cache.bin with the bucket as `.minio.sys/buckets`, the `buckets` must be relative to the bucket `.minio.sys` as part of the object name. Otherwise, there is no way to decommission entries at `.minio.sys/buckets` and their final erasure set positions. A bucket must never have a `/` in it. Adds code to read() from existing data-usage.bin upon upgrade.	2024-04-22 10:49:30 -07:00
Anis Eleuch	95bf4a57b6	logging: Add subsystem to log API (#19002 ) Create new code paths for multiple subsystems in the code. This will make maintaining this easier later. Also introduce bugLogIf() for errors that should not happen in the first place.	2024-04-04 05:04:40 -07:00
Klaus Post	aa0eec16ab	Remove empty replication stats when sending update (#19375 ) When sending update and there is no replication stats - remove the struct. Will remove an unneeded alloc on the receiver.	2024-03-28 10:13:07 -07:00
jiuker	ec3a3bb10d	fix: Remove unnecessary loops for searchParent (#19353 )	2024-03-27 08:12:14 -07:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Krishnan Parthasarathi	cba3dd276b	Add more size intervals to obj size histogram (#18772 ) New intervals: [1024B, 64KiB) [64KiB, 256KiB) [256KiB, 512KiB) [512KiB, 1MiB) The new intervals help us see object size distribution with higher resolution for the interval [1024B, 1MiB).	2024-01-12 23:51:08 -08:00
Klaus Post	7472818d94	Fix hanging scanner saves (#18368 ) Fix various regressions from #18029 * If context is canceled the token is never returned. This will lead to scanner being unable to save and deadlocking. * Fix backup not being able to get any data (hr empty) * Reduce backup timeout.	2023-11-01 09:09:28 -07:00
Harshavardhana	877e0cac03	fix: tiering statistics handling a bug in clone() implementation (#18342 ) Tiering statistics have been broken for some time now; a regression was introduced in `6f2406b0b6`. Bonus fixes an issue where the objects are not assumed to be of the 'STANDARD' storage-class for the objects that have not yet tiered; this should be conditional based on the object's metadata, not a default assumption. This PR also does some cleanup in terms of implementation. Fixes #18070	2023-10-30 09:59:51 -07:00
Anis Eleuch	b336e9a79f	fix: loading usage cache to not fail early when reading the backup fails (#18158 ) Currently, the retry is not fully used when there is no backup copy of the data usage; use 5 retry attempts when we don't have any valid data, new or backup, unless we have seen an un-recognized error.	2023-10-02 19:22:35 -07:00
jiuker	9947c01c8e	feat: SSE-KMS use uuid instead of read all data to md5. (#17958 )	2023-09-18 10:00:54 -07:00
Eng Zer Jun	a00db4267c	data-usage-cache: remove redundant nil check (#17970 ) From the Go specification: "3. If the map is nil, the number of iterations is 0." [1] Therefore, an additional nil check before the loop is unnecessary. [1]: https://go.dev/ref/spec#For_range Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2023-09-16 19:09:29 -07:00
Anis Eleuch	37aa5934a1	scanner: Fix loading data usage cache structure (#18037 ) Return an empty data usage cache structure when the data usage cache file does not exist, otherwise, the scanner won't work.	2023-09-15 13:11:08 -07:00
Harshavardhana	a2aabfabd9	add backups for usage-caches to rely on upon error (#18029 ) This allows the scanner to avoid lengthy scans, skip things appropriately and also not lose metrics in any manner. Reduce longer deadlines for usage-cache loads/saves to match the disk timeout, which is now 2 minutes per IOP.	2023-09-14 11:53:52 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Krishnan Parthasarathi	0120ff93bc	admin-info: add DeleteMarkers count (#17659 )	2023-07-18 10:49:40 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Klaus Post	6f2406b0b6	fix: protect ReplicationStats against concurrent map iteration and write crash (#17403 )	2023-06-12 09:17:11 -07:00
Harshavardhana	6825bd7e75	fix: inlined objects don't need to honor long locks (#17039 )	2023-04-17 12:16:37 -07:00
Klaus Post	d85da9236e	Add Object Version count histogram (#16739 )	2023-03-10 08:53:59 -08:00
Klaus Post	a547bf517d	Remove locks on usage cache (#16786 )	2023-03-09 15:15:46 -08:00
Klaus Post	9acf1024e4	Remove bloom filter (#16682 ) Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner. Also removes a tiny bit of CPU by each write operation.	2023-02-24 09:03:31 +05:30
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Klaus Post	3fd9059b4e	opt: Only stream big data usage caches (#16168 )	2022-12-05 13:01:11 -08:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Klaus Post	ac055b09e9	Add detailed scanner metrics (#15161 )	2022-07-05 14:45:49 -07:00
Anis Elleuch	df50eda811	Add number of versions in server info API (#14812 ) The goal is to show the number of versions in the server info API.	2022-04-25 22:04:10 -07:00
Klaus Post	1d1b213f1f	scanner: Consider preselection bias when selecting for Healing (#14492 ) Healing decisions would align with skipped folder counters. This can lead to files never being selected for heal checks on "clean" paths. Use different hashing methods and take objectHealProbDiv into account when calculating the cycle. Found by @vadmeste	2022-03-07 09:25:53 -08:00
Harshavardhana	dbd05d6e82	remove FIFO bucket quota, use ILM expiration instead (#14206 )	2022-01-31 11:07:04 -08:00
Harshavardhana	57118919d2	cached diskIDs are not needed for scanner healing (#14170 ) This PR removes an unnecessary state that gets passed around for DiskIDs, which is not necessary since each disk exactly knows which pool and which set it belongs to on a running system. Currently, cached DiskIDs won't work properly because the scanner always ends up skipping offline disks and never runs healing when disks are offline, as it expects all the cached diskIDs to be present always. This also sort of made things inflexible in terms of, perhaps, a new diskID for `format.json` (however, this is not a big issue). Requiring all drives to be online for healing via the scanner is unnecessary; instead, healing should trigger even when only some nodes and drives are available. This ensures that we keep the SLA intact on the objects when disks are offline for a prolonged period of time.	2022-01-26 08:34:56 -08:00
Harshavardhana	001b77e7e1	use readConfig/saveConfig to simplify I/O on usage/tracker info (#14019 )	2022-01-03 10:22:58 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Harshavardhana	914bfb2d9c	fix: allow compaction on replicated buckets (#13711 ) currently getReplicationConfig() failure incorrectly returns error on unexpected buckets upon upgrade, we should always calculate usage as much as possible.	2021-11-19 14:46:14 -08:00
Krishnan Parthasarathi	939fbb3c38	ilm: Make per-tier stats available via admin-tier-info (#13381 )	2021-10-23 18:38:33 -07:00
Anis Elleuch	20761e053e	replication: Fix replica stats during crawling (#13499 ) Also show replica stats with an ARN in Prometheus output.	2021-10-22 19:13:50 -07:00
Poorna Krishnamoorthy	19ecdc75a8	replication: Simplify metrics calculation (#13274 ) Also doing some code cleanup	2021-09-22 10:48:45 -07:00
Poorna Krishnamoorthy	0b55a0423e	fix: cache usage deserialization from v5 to v6 (#13258 )	2021-09-21 09:01:51 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Klaus Post	7f49c38e2d	Recover corrupted usage files if any (#13179 )	2021-09-09 11:24:22 -07:00
Klaus Post	c8ca055935	Fix concurrent map read/write (#13052 ) Clones were not independent. Fixes race: ``` WARNING: DATA RACE Read at 0x00c002040cc0 by goroutine 50: runtime.mapiterinit() c:/go/src/runtime/map.go:802 +0x0 github.com/minio/minio/cmd.(dataUsageCache).flatten() d:/minio/minio/cmd/data-usage-cache.go:551 +0xad github.com/minio/minio/cmd.(dataUsageCache).dui() d:/minio/minio/cmd/data-usage-cache.go:352 +0x144 github.com/minio/minio/cmd.(erasureServerPools).NSScanner.func3.1() d:/minio/minio/cmd/erasure-server-pool.go:542 +0x2a4 github.com/minio/minio/cmd.(erasureServerPools).NSScanner.func3() d:/minio/minio/cmd/erasure-server-pool.go:561 +0x24b Previous write at 0x00c002040cc0 by goroutine 1391: runtime.mapassign_faststr() c:/go/src/runtime/map_faststr.go:202 +0x0 github.com/minio/minio/cmd.(dataUsageEntry).addChild() d:/minio/minio/cmd/data-usage-cache.go:231 +0x313 github.com/minio/minio/cmd.(dataUsageCache).replace() d:/minio/minio/cmd/data-usage-cache.go:383 +0x293 github.com/minio/minio/cmd.erasureObjects.nsScanner.func1() d:/minio/minio/cmd/erasure.go:428 +0x3a6 ```	2021-08-24 07:11:38 -07:00
Klaus Post	cc60d66909	Fix incremental usage accounting (#12871 ) Remote caches were not returned correctly, so they would not get updated on save. Furthermore make some tweaks for more reliable updates. Invalidate bloom filter to ensure rescan.	2021-08-04 09:14:14 -07:00
Anis Elleuch	aa78505181	Add prefixes usage in Accounting Usage Info (#12687 )	2021-07-13 10:42:11 -07:00
Klaus Post	a6cbfc3600	fs: fix stale bucket counts in data usage (#12521 ) In FS mode bucket count would be incorrect. Children were not removed. Other totals are correct, though. Fixes #12512	2021-06-16 14:22:55 -07:00
Poorna Krishnamoorthy	dbea8d2ee0	Add support for existing object replication. (#12109 ) Also adding an API to allow resyncing replication when existing object replication is enabled and the remote target is entirely lost. With the `mc replicate reset` command, the objects that are eligible for replication as per the replication config will be resynced to target if existing object replication is enabled on the rule.	2021-06-01 19:59:11 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	ebf75ef10d	fix: remove all unused code (#12360 )	2021-05-24 09:28:19 -07:00
Klaus Post	2ca9c533ef	feat: implement in-progress partial bucket updates (#12279 )	2021-05-19 14:38:30 -07:00
Harshavardhana	57aed841dd	do not return error for usage-cache version v4 (#12276 )	2021-05-12 08:07:02 -07:00
Klaus Post	229d83bb75	feat: add dynamic usage cache (#12229 ) A cache structure will be kept with a tree of usages. The cache is a tree structure where each keeps track of its children. An uncompacted branch contains a count of the files only directly at the branch level, and contains link to children branches or leaves. The leaves are "compacted" based on a number of properties. A compacted leaf contains the totals of all files beneath it. A leaf is only scanned once every dataUsageUpdateDirCycles, rarer if the bloom filter for the path is clean and no lifecycles are applied. Skipped leaves have their totals transferred from the previous cycle. A clean leaf will be included once every healFolderIncludeProb for partial heal scans. When selected there is a one in healObjectSelectProb that any object will be chosen for heal scan. Compaction happens when either: - The folder (and subfolders) contains less than dataScannerCompactLeastObject objects. - The folder itself contains more than dataScannerCompactAtFolders folders. - The folder only contains objects and no subfolders. - A bucket root will never be compacted. Furthermore, if a has more than dataScannerCompactAtChildren recursive children (uncompacted folders) the tree will be recursively scanned and the branches with the least number of objects will be compacted until the limit is reached. This ensures that any branch will never contain an unreasonable amount of other branches, and also that small branches with few objects don't take up unreasonable amounts of space. Whenever a branch is scanned, it is assumed that it will be un-compacted before it hits any of the above limits. This will make the branch rebalance itself when scanned if the distribution of objects has changed. TLDR; With current values: No bucket will ever have more than 10000 child nodes recursively. No single folder will have more than 2500 child nodes by itself. All subfolders are compacted if they have less than 500 objects in them recursively. We accumulate the (non-deletemarker) version count for paths as well, since we are changing the structure anyway.	2021-05-11 18:36:15 -07:00
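
The per-folder compaction rules listed in commit 229d83bb75 can be sketched as a small Go predicate. This is only an illustration under stated assumptions, not MinIO's implementation: the `folderStats` type, the `shouldCompact` function and the lowercase constant names are hypothetical, and pairing the values 500 and 2500 with dataScannerCompactLeastObject and dataScannerCompactAtFolders is an assumption read off the "With current values" summary. The recursive dataScannerCompactAtChildren pass is not modelled.

```go
package main

import "fmt"

// Hypothetical stand-ins for the constants named in the commit message.
// The name-to-value pairing is an assumption based on its summary.
const (
	compactLeastObject = 500  // assumed value of dataScannerCompactLeastObject
	compactAtFolders   = 2500 // assumed value of dataScannerCompactAtFolders
)

// folderStats is a hypothetical summary of one scanned folder.
type folderStats struct {
	isBucketRoot     bool
	recursiveObjects int // objects in this folder and all of its subfolders
	directSubfolders int // subfolders directly inside this folder
}

// shouldCompact applies the per-folder rules from the commit message.
func shouldCompact(f folderStats) bool {
	// A bucket root will never be compacted.
	if f.isBucketRoot {
		return false
	}
	switch {
	case f.recursiveObjects < compactLeastObject:
		// The folder (and subfolders) contains too few objects.
		return true
	case f.directSubfolders > compactAtFolders:
		// The folder itself contains too many folders.
		return true
	case f.directSubfolders == 0:
		// The folder only contains objects and no subfolders.
		return true
	}
	return false
}

func main() {
	examples := []folderStats{
		{recursiveObjects: 100, directSubfolders: 3},                  // small folder
		{isBucketRoot: true, recursiveObjects: 100},                   // bucket root
		{recursiveObjects: 100000, directSubfolders: 10},              // large, few folders
		{recursiveObjects: 100000, directSubfolders: 3000},            // too many folders
		{recursiveObjects: 100000, directSubfolders: 0},               // objects only
	}
	for _, f := range examples {
		fmt.Printf("%+v compact=%v\n", f, shouldCompact(f))
	}
}
```

The bucket-root check comes first so that it overrides every other rule, matching the statement that a bucket root is never compacted.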

79 Commits