minio

mirror of https://github.com/minio/minio.git synced 2024-12-26 15:15:55 -05:00

Author	SHA1	Message	Date
Harshavardhana	5d7e8f79ed	fix: remove scanner healing with unnecessary logs (#16260 )	2022-12-14 16:39:18 -08:00
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Klaus Post	a713aee3d5	Run staticcheck on CI (#16170 )	2022-12-05 11:18:50 -08:00
Klaus Post	cc1d8f0057	Check for abandoned data when healing (#16122 )	2022-11-28 10:20:55 -08:00
Krishnan Parthasarathi	6eef9b4a23	lifecycle: simplify Eval and HasActiveRules (#16036 )	2022-11-10 07:17:45 -08:00
Anis Elleuch	3b1a9b9fdf	Use the same lock for the scanner and site replication healing (#15985 )	2022-11-08 08:55:55 -08:00
Harshavardhana	b57fbff7c1	ignore background healInfo in single drive setup (#15968 )	2022-10-31 07:26:10 -07:00
Anis Elleuch	fc6c794972	Audit dangling object removal (#15933 )	2022-10-24 11:35:07 -07:00
Anis Elleuch	ac85c2af76	lifecycle: refactor rules filtering and tagging support (#15914 )	2022-10-21 10:46:53 -07:00
Harshavardhana	41e1654f9a	remove spurious logging for object not found (#15842 )	2022-10-12 04:28:21 -07:00
Harshavardhana	928feb0889	remove unused debug param from evalActionFromLifecycle (#15813 )	2022-10-07 10:24:12 -07:00
Harshavardhana	ae4ee95d25	change default lock retry interval to 50ms (#15560 ) competing calls on the same object on versioned bucket mutating calls on the same object may unexpected have higher delays. This can be reproduced with a replicated bucket overwriting the same object writes, deletes repeatedly. For longer locks like scanner keep the 1sec interval	2022-08-19 16:21:05 -07:00
Poorna	21bf5b4db7	replication: heal proactively upon access (#15501 ) Queue failed/pending replication for healing during listing and GET/HEAD API calls. This includes healing of existing objects that were never replicated or those in the middle of a resync operation. This PR also fixes a bug in ListObjectVersions where lifecycle filtering should be done.	2022-08-09 15:00:24 -07:00
Harshavardhana	0a8b78cb84	fix: simplify passing auditLog eventType (#15278 ) Rename Trigger -> Event to be a more appropriate name for the audit event. Bonus: fixes a bug in AddMRFWorker() it did not cancel the waitgroup, leading to waitgroup leaks.	2022-07-12 10:43:32 -07:00
Klaus Post	37a6b2da67	Allow compaction at bucket top level. (#15266 ) If more than 1M folders (objects or prefixes) are found at the top level in a bucket allow it to be compacted. While very suboptimal structure we should limit memory usage at some point.	2022-07-11 07:59:03 -07:00
Harshavardhana	ae92521310	remove unnecessary nAgreed value in partial() func (#15242 )	2022-07-07 13:45:34 -07:00
Klaus Post	ac055b09e9	Add detailed scanner metrics (#15161 )	2022-07-05 14:45:49 -07:00
Shireesh Anjal	4ce81fd07f	Add periodic callhome functionality (#14918 ) * Add periodic callhome functionality Periodically (every 24hrs by default), fetch callhome information and upload it to SUBNET. New config keys under the `callhome` subsystem: enable - Set to `on` for enabling callhome. Default `off` frequency - Interval between callhome cycles. Default `24h` * Improvements based on review comments - Update `enableCallhome` safely - Rename pctx to ctx - Block during execution of callhome - Store parsed proxy URL in global subnet config - Store callhome URL(s) in constants - Use existing global transport - Pass auth token to subnetPostReq - Use `config.EnableOn` instead of `"on"` * Use atomic package instead of lock * Use uber atomic package * Use `Cancel` instead of `cancel` Co-authored-by: Harshavardhana <harsha@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io> Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2022-06-06 16:14:52 -07:00
Harshavardhana	52221db7ef	fix: for unexpected errors in reading versioning config panic (#14994 ) We need to make sure if we cannot read bucket metadata for some reason, and bucket metadata is not missing and returning corrupted information we should panic such handlers to disallow I/O to protect the overall state on the system. In-case of such corruption we have a mechanism now to force recreate the metadata on the bucket, using `x-minio-force-create` header with `PUT /bucket` API call. Additionally fix the versioning config updated state to be set properly for the site replication healing to trigger correctly.	2022-05-31 02:57:57 -07:00
Harshavardhana	6cfb1cb6fd	fix: timer usage across codebase (#14935 ) it seems in some places we have been wrongly using the timer.Reset() function, nicely exposed by an example shared by @donatello https://go.dev/play/p/qoF71_D1oXD this PR fixes all the usage comprehensively	2022-05-17 22:42:59 -07:00
Krishnan Parthasarathi	88dd83a365	lifecycle: Set opts.VersionSuspended when expiring objects (#14902 )	2022-05-12 06:09:24 -07:00
Krishnan Parthasarathi	ad8e611098	feat: implement prefix-level versioning exclusion (#14828 ) Spark/Hadoop workloads which use Hadoop MR Committer v1/v2 algorithm upload objects to a temporary prefix in a bucket. These objects are 'renamed' to a different prefix on Job commit. Object storage admins are forced to configure separate ILM policies to expire these objects and their versions to reclaim space. Our solution: This can be avoided by simply marking objects under these prefixes to be excluded from versioning, as shown below. Consequently, these objects are excluded from replication, and don't require ILM policies to prune unnecessary versions. - MinIO Extension to Bucket Version Configuration ```xml <VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> <Status>Enabled</Status> <ExcludeFolders>true</ExcludeFolders> <ExcludedPrefixes> <Prefix>app1-jobs//_temporary/</Prefix> </ExcludedPrefixes> <ExcludedPrefixes> <Prefix>app2-jobs//__magic/</Prefix> </ExcludedPrefixes> <!-- .. up to 10 prefixes in all --> </VersioningConfiguration> ``` Note: `ExcludeFolders` excludes all folders in a bucket from versioning. This is required to prevent the parent folders from accumulating delete markers, especially those which are shared across spark workloads spanning projects/teams. - To enable version exclusion on a list of prefixes ``` mc version enable --excluded-prefixes "app1-jobs//_temporary/,app2-jobs//_magic," --exclude-prefix-marker myminio/test ```	2022-05-06 19:05:28 -07:00
Krishnan Parthasarathi	28d3ad3ada	Honor object retention when applying ILM policies (#14732 )	2022-04-11 21:55:56 -07:00
Harshavardhana	153a612253	fetch bucket retention config once for ILM evalAction (#14727 ) This is mainly an optimization, does not change any existing functionality.	2022-04-11 13:25:32 -07:00
Anis Elleuch	16431d222c	heal: Enable periodic bitrot scan configuration (#14464 )	2022-04-07 08:10:40 -07:00
Klaus Post	1d1b213f1f	scanner: Consider preselection bias when selecting for Healing (#14492 ) Healing decisions would align with skipped folder counters. This can lead to files never being selected for heal checks on "clean" paths. Use different hashing methods and take objectHealProbDiv into account when calculating the cycle. Found by @vadmeste	2022-03-07 09:25:53 -08:00
Harshavardhana	aaea94a48d	update quorum requirement to list all objects (#14201 ) some upgraded objects might not get listed due to different quorum ratios across objects. make sure to list all objects that satisfy the maximum possible quorum.	2022-01-27 17:00:15 -08:00
Harshavardhana	57118919d2	cached diskIDs are not needed for scanner healing (#14170 ) This PR removes an unnecessary state that gets passed around for DiskIDs, which is not necessary since each disk exactly knows which pool and which set it belongs to on a running system. Currently cached DiskId's won't work properly because it always ends up skipping offline disks and never runs healing when disks are offline, as it expects all the cached diskIDs to be present always. This also sort of made things in-flexible in terms perhaps a new diskID for `format.json`. (however this is not a big issue) This is an unnecessary requirement that healing via scanner needs all drives to be online, instead healing should trigger even when partial nodes and drives are available this ensures that we keep the SLA in-tact on the objects when disks are offline for a prolonged period of time.	2022-01-26 08:34:56 -08:00
Harshavardhana	001b77e7e1	use readConfig/saveConfig to simplify I/O on usage/tracker info (#14019 )	2022-01-03 10:22:58 -08:00
Harshavardhana	a60ac7ca17	fix: audit log to support object names in multipleObjectNames() handler (#14017 )	2022-01-03 01:28:52 -08:00
Harshavardhana	42ba0da6b0	fix: initialize new drwMutex for each attempt in 'for {' loop. (#14009 ) It is possible that GetLock() call remembers a previously failed releaseAll() when there are networking issues, now this state can have potential side effects. This PR tries to avoid this side affect by making sure to initialize NewNSLock() for each GetLock() attempts made to avoid any prior state in the memory that can interfere with the new lock grants.	2022-01-02 09:15:34 -08:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015 )	2022-01-02 09:15:06 -08:00
Poorna K	e270ab65b3	fix: healing of replication delete markers (#13933 ) A corner case can occur where the delete-marker was propagated but the metadata could not be updated on the primary. Sending a RemoveObject call with the Delete marker version would end up permanently deleting the version on target. Instead, perform a Stat on the delete-marker version on target and redo replication only if the delete-marker is missing on target.	2021-12-16 15:34:55 -08:00
Anis Elleuch	926373f9c1	Run the data scanner routine in a loop (#13928 ) After the introduction of Refresh logic in locks, the data scanner can quit when the data scanner lock is not able to get refreshed. In that case, the context of the data scanner will get canceled and runDataScanner() will quit. Another server would pick the scanning routine but after some time, all nodes can just have all scanning routine aborted, as described above. This fix will just run the data scanner in a loop.	2021-12-16 08:32:15 -08:00
Krishnan Parthasarathi	44a9339c0a	Newer noncurrent versions (#13815 ) - Rename MaxNoncurrentVersions tag to NewerNoncurrentVersions Note: We apply overlapping NewerNoncurrentVersions rules such that we honor the highest among applicable limits. e.g if 2 overlapping rules are configured with 2 and 3 noncurrent versions to be retained, we will retain 3. - Expire newer noncurrent versions after noncurrent days - MinIO extension: allow noncurrent days to be zero, allowing expiry of noncurrent version as soon as more than configured NewerNoncurrentVersions are present. - Allow NewerNoncurrentVersions rules on object-locked buckets - No x-amz-expiration when NewerNoncurrentVersions configured - ComputeAction should skip rules with NewerNoncurrentVersions > 0 - Add unit tests for lifecycle.ComputeAction - Support lifecycle rules with MaxNoncurrentVersions - Extend ExpectedExpiryTime to work with zero days - Fix all-time comparisons to be relative to UTC	2021-12-14 09:41:44 -08:00
Klaus Post	3db931dc0e	Improve listing consistency with version merging (#13723 )	2021-12-02 11:29:16 -08:00
Krishnan Parthasarathi	3da9ee15d3	Add MaxNoncurrentVersions to NoncurrentExpiration action (#13580 ) This unit allows users to limit the maximum number of noncurrent versions of an object. To enable this rule you need the following ilm.json ``` cat >> ilm.json <<EOF { "Rules": [ { "ID": "test-max-noncurrent", "Status": "Enabled", "Filter": { "Prefix": "user-uploads/" }, "NoncurrentVersionExpiration": { "MaxNoncurrentVersions": 5 } } ] } EOF mc ilm import myminio/mybucket < ilm.json ```	2021-11-19 17:54:10 -08:00
Harshavardhana	661b263e77	add gocritic/ruleguard checks back again, cleanup code. (#13665 ) - remove some duplicated code - reported a bug, separately fixed in #13664 - using strings.ReplaceAll() when needed - using filepath.ToSlash() use when needed - remove all non-Go style comments from the codebase Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>	2021-11-16 09:28:29 -08:00
Harshavardhana	acf26c5ab7	re-arrange metacache struct to be optimal (#13609 )	2021-11-08 10:26:08 -08:00
Krishnan Parthasarathi	939fbb3c38	ilm: Make per-tier stats available via admin-tier-info (#13381 )	2021-10-23 18:38:33 -07:00
Klaus Post	75699a3825	Add basic scanner metrics (#13317 ) Add number of objects/versions/folders scanned as well as ILM action outcomes.	2021-10-02 09:31:05 -07:00
Krishnan Parthasarathi	f3aeed77e5	Add immediate inline tiering support (#13298 )	2021-10-01 11:58:17 -07:00
Poorna Krishnamoorthy	19ecdc75a8	replication: Simplify metrics calculation (#13274 ) Also doing some code cleanup	2021-09-22 10:48:45 -07:00
Harshavardhana	8392765213	healObjects() should cancel() context before writing to errCh (#13262 ) also remove HealObjects() code from dataScanner running another listing from the data-scanner is super in-efficient and in-fact this code is redundant since we already attempt to heal all dangling objects anyways.	2021-09-21 14:55:17 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Klaus Post	f98f115ac2	fs: Fix non-progressing scanner (#13218 ) Scanner would keep doing the same cycle in FS mode leading to missed updates. Add a few sanity checks and handle errors better.	2021-09-15 09:24:41 -07:00
Harshavardhana	a19e3bc9d9	add more dangling heal related tests (#13140 ) also make sure that HealObject() never returns 'ObjectNotFound' or 'VersionNotFound' errors, as those are meaningless and not useful for the caller.	2021-09-02 20:56:13 -07:00
Harshavardhana	35f2552fc5	reduce extra getObjectInfo() calls during ILM transition (#13091 ) * reduce extra getObjectInfo() calls during ILM transition This PR also changes expiration logic to be non-blocking, scanner is now free from additional costs incurred due to slower object layer calls and hitting the drives. * move verifying expiration inside locks	2021-08-27 17:06:47 -07:00
Harshavardhana	ed16ce9b73	add healing workers support to parallelize healing (#13081 ) Faster healing as well as making healing more responsive for faster scanner times. also fixes a bug introduced in #13079, newly replaced disks were not healing automatically.	2021-08-26 20:32:58 -07:00
Harshavardhana	c11a2ac396	refactor healing to remove certain structs (#13079 ) - remove sourceCh usage from healing we already have tasks and resp channel - use read locks to lookup globalHealConfig - fix healing resolver to pick candidates quickly that need healing, without this resolver was unexpectedly skipping.	2021-08-26 14:06:04 -07:00

1 2

92 Commits