Commit Graph

112 Commits

Author SHA1 Message Date
Anis Eleuch
ae95384dd8
Revert "heal: Update object parity with the latest configured SC (#17187)" (#17404) 2023-06-12 11:54:51 -07:00
Anis Eleuch
e2b7a08c10
heal: Update object parity with the latest configured SC (#17187) 2023-05-15 21:32:13 -07:00
Praveen raj Mani
72802a5972
Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069) 2023-04-25 22:57:40 -07:00
Krishnan Parthasarathi
fae9000304
heal: Pick maximally occuring modTime in quorum (#17071) 2023-04-25 10:13:57 -07:00
Krishnan Parthasarathi
25f7a8e406
Indicate RenameData is called by healObject (#16997) 2023-04-09 10:25:37 -07:00
Harshavardhana
6c11dbffd5
add crash protection from backend modifications (#16846) 2023-03-20 09:08:42 -07:00
ferhat elmas
714283fae2
cleanup ignored static analysis (#16767) 2023-03-06 08:56:10 -08:00
Klaus Post
9acf1024e4
Remove bloom filter (#16682)
Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner.

Also removes a tiny bit of CPU by each write operation.
2023-02-24 09:03:31 +05:30
Klaus Post
fd6622458b
Add detailed scanner trace output and notifications (#16668) 2023-02-21 09:33:33 -08:00
Harshavardhana
72daccd468
fix: scanner in healing cycle must use actual size (#16589) 2023-02-10 06:53:03 -08:00
Poorna
b22b39de96
Avoid dangling deletes if disk not found (#16401) 2023-01-12 22:20:19 -08:00
Harshavardhana
a15a2556c3
converge listBuckets() as a peer call (#16346) 2023-01-03 23:39:40 -08:00
Anis Elleuch
acc9c033ed
debug: Add X-Amz-Request-ID to lock/unlock calls (#16309) 2022-12-23 19:49:07 -08:00
Klaus Post
70986b6e6e
Add version id to healresult (#16193) 2022-12-08 07:49:10 -08:00
Aditya Manthramurthy
a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Klaus Post
cc1d8f0057
Check for abandoned data when healing (#16122) 2022-11-28 10:20:55 -08:00
Harshavardhana
fd6f6fc8df
cleanup stale parent multipart directories (#15980) 2022-11-01 08:00:02 -07:00
Harshavardhana
136d41775f
remove numAvailableDisks check as it doesn't serve any purpose (#15954) 2022-10-27 09:05:24 -07:00
Harshavardhana
2d9b5a65f1
verify RenameData() versions to be consistent (#15649)
xl.meta gets written and never rolled back, however
we definitely need to validate the state that is
persisted on the disk, if there are inconsistencies

- more than write quorum we should return an error
  to the client

- if write quorum was achieved however there are
  inconsistent xl.meta's we should simply trigger
  an MRF on them
2022-09-05 16:51:37 -07:00
Harshavardhana
bcedc2b0d9
fix: add healing metric type for heal tracing (#15631)
changes the `heal.checkBucket` to `heal.Bucket` instead
since the latter is more meaningful.
2022-08-31 12:28:03 -07:00
Klaus Post
dec942beb6
feat: Add healing trace (#15616) 2022-08-31 01:56:12 -07:00
Klaus Post
a9f1ad7924
Add extended checksum support (#15433) 2022-08-29 16:57:16 -07:00
ebozduman
b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Harshavardhana
a6e0ec4e6f
Add support converting non-inlined to inlined (#15444)
This is a feature to allow for inode compaction on
large clusters that use a lot of small files spread
across a large heirarchy.
2022-08-02 23:10:22 -07:00
Anis Elleuch
b3edb25377
bloom: healObject to mark a path dirty only for dangling objects (#15458)
The path is marked dirty automatically when healObject() is called, which is
wrong. HealObject() is called during self-healing and this will lead to
an increase in the false positive result of the bloom filter.

Also move NSUpdated() from renameData() and call it directly in
CompleteMultipart and PutObject, this is not a functional change but
it will make it less prone to errors in the future.
2022-08-02 16:57:39 -07:00
Harshavardhana
ce8397f7d9
use partInfo only for intermediate part.x.meta (#15353) 2022-07-19 18:56:24 -07:00
Klaus Post
911a17b149
Add compressed file index (#15247) 2022-07-11 17:30:56 -07:00
Praveen raj Mani
b49fc33cb3
purge objects immediately with x-minio-force-delete in DeleteObject and DeleteBucket API (#15148) 2022-07-11 09:15:54 -07:00
Anis Elleuch
73733a8fb9
heal: Report correctly in multip-pools setup (#15117)
`mc admin heal -r <alias>` in a multi setup pools returns incorrectly
grey objects. The reason is that erasure-server-pools.HealObject() runs
HealObject in all pools and returns the result of the first nil
error. However, in the lower erasureObject level, HealObject() returns
nil if an object does not exist + missing error in each disk of the object
in that pool, therefore confusing mc.

Make erasureObject.HealObject() to return not found error in the lower
level, so at least erasureServerPools will know what pools to ignore.
2022-06-20 08:07:45 -07:00
Harshavardhana
013cc66d8e
add dataErrs for healing debug log (#15092) 2022-06-16 09:42:45 -07:00
Harshavardhana
dea8220eee
do not heal outdated disks > parityBlocks (#14976)
this PR also fixes a situation where incorrect
partsMetadata slice was used where fi.Data was
re-used from a single drive causing duplication
of the shards across all drives.

This happens for situations where shouldHeal()
returns true for all drives > parityBlocks.

To avoid this we should never attempt to heal on all
drives > parityBlocks, unless we are doing metadata
migration from xl.json -> xl.meta
2022-05-25 15:17:10 -07:00
Harshavardhana
eda34423d7 update gofumpt -w - new changes 2022-04-13 12:00:11 -07:00
Anis Elleuch
3fca4055d2
heal: Re-heal an object when a corruption is found during normal scan (#14482)
When scanning using normal mode, HealObject() can report an 
error saying that it found a corrupted part. This doesn't have 
when HealObject() is called with bitrot scan flag. However, when 
this happens, we can still restart HealObject() with the bitrot scan.

This is also important because this means the scanner and the 
new disks healer will not be able to heal an object that doesn't 
exist in a specific disk and has corruption in another disk.

Also without this PR, mc admin heal command without bitrot will report
an error.
2022-03-04 18:24:34 -08:00
Harshavardhana
f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Harshavardhana
f546636c52
fix: use renameAll instead of deleteObject() for purging temporary files (#14096)
This PR simplifies few things

- Multipart parts are renamed, upon failure are unrenamed() keep this
  multipart specific behavior it is needed and works fine.

- AbortMultipart should blindly delete once lock is acquired instead
  of re-reading metadata and calculating quorum, abort is a delete()
  operation and client has no business looking for errors on this.

- Skip Access() calls to folders that are operating on
  `.minio.sys/multipart` folder as well.
2022-01-13 11:07:41 -08:00
Harshavardhana
38ccc4f672
fix: make sure to avoid calling RenameData() on disconnected disks. (#14094)
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.

- This PR ensures that we nil the erasure writers such that they
  are skipped in RenameData() call.

- RenameData() doesn't need to "Access()" calls for `.minio.sys`
  folders they always exist.

- Make sure PutObject() never returns ObjectNotFound{} for any
  errors, make sure it always returns "WriteQuorum" when renameData()
  fails with ObjectNotFound{}. Return appropriate errors for all
  other cases.
2022-01-12 18:49:01 -08:00
Harshavardhana
b7c5e45fff
heal: isObjectDangling should return false when it cannot decide (#14053)
In a multi-pool setup when disks are coming up, or in a single pool
setup let's say with 100's of erasure sets with a slow network.

It's possible when healing is attempted on `.minio.sys/config`
folder, it can lead to healing unexpectedly deleting some policy
files as dangling due to a mistake in understanding when `isObjectDangling`
is considered to be 'true'.

This issue happened in commit 30135eed86
when we assumed the validMeta with empty ErasureInfo is considered
to be fully dangling. This implementation issue gets exposed when
the server is starting up.

This is most easily seen with multiple-pool setups because of the
disconnected fashion pools that come up. The decision to purge the
object as dangling is taken incorrectly prior to the correct state
being achieved on each pool, when the corresponding drive let's say
returns 'errDiskNotFound', a 'delete' is triggered. At this point,
the 'drive' comes online because this is part of the startup sequence
as drives can come online lazily.

This kind of situation exists because we allow (totalDisks/2) number
of drives to be online when the server is being restarted.

Implementation made an incorrect assumption here leading to policies
getting deleted.

Added tests to capture the implementation requirements.
2022-01-07 19:11:54 -08:00
Harshavardhana
f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana
79df2c7ce7
correctly calculate read quorum based on the available fileInfo (#14000)
The current usage of assuming `default` parity of `4` is not correct
for all objects stored on MinIO, objects in .minio.sys have maximum
parity, healing won't trigger on these objects due to incorrect
verification of quorum.
2021-12-28 15:33:03 -08:00
Harshavardhana
b883803b21
fix: healing across pools removing dangling objects (#13990)
adds other simplifications to the code when running
namespace heals across pools.
2021-12-25 09:01:44 -08:00
Harshavardhana
7e3a7d7044
add healing for invalid shards by skipping the blocks (#13978)
Built on top of #13945, now we need to simply skip the
shards and its automated.
2021-12-23 23:01:46 -08:00
Harshavardhana
0e3037631f
skip inconsistent shards if possible (#13945)
data shards were wrong due to a healing bug
reported in #13803 mainly with unaligned object
sizes.

This PR is an attempt to automatically avoid
these shards, with available information about
the `xl.meta` and actually disk mtime.
2021-12-21 10:08:26 -08:00
jiangfucheng
7460fb8349
fix padding error and compatible with uploaded objects (#13803) 2021-12-03 09:26:30 -08:00
Harshavardhana
28f95f1fbe
quorum calculation getLatestFileInfo should be itself (#13717)
FileInfo quorum shouldn't be passed down, instead
inferred after obtaining a maximally occurring FileInfo.

This PR also changes other functions that rely on
wrong quorum calculation.

Update tests as well to handle the proper requirement. All
these changes are needed when migrating from older deployments
where we used to set N/2 quorum for reads to EC:4 parity in
newer releases.
2021-11-22 09:36:29 -08:00
Harshavardhana
c791de0e1e
re-implement pickValidInfo dataDir, move to quorum calculation (#13681)
dataDir loosely based on maxima is incorrect and does not
work in all situations such as disks in the following order

- xl.json migration to xl.meta there may be partial xl.json's
  leftover if some disks are not yet connected when the disk
  is yet to come up, since xl.json mtime and xl.meta is
  same the dataDir maxima doesn't work properly leading to
  quorum issues.

- its also possible that XLV1 might be true among the disks
  available, make sure to keep FileInfo based on common quorum
  and skip unexpected disks with the older data format.

Also, this PR tests upgrade from older to a newer release if the 
data is readable and matches the checksum.

NOTE: this is just initial work we can build on top of this to do further tests.
2021-11-21 10:41:30 -08:00
Harshavardhana
4545ecad58
ignore swapped drives instead of throwing errors (#13655)
- add checks such that swapped disks are detected
  and ignored - never used for normal operations.

- implement `unrecognizedDisk` to be ignored with
  all operations returning `errDiskNotFound`.

- also add checks such that we do not load unexpected
  disks while connecting automatically.

- additionally humanize the values when printing the errors.

Bonus: fixes handling of non-quorum situations in
getLatestFileInfo(), that does not work when 2 drives
are down, currently this function would return errors
incorrectly.
2021-11-15 09:46:55 -08:00
Harshavardhana
94d587e6fc
fix: delete-markers without quorum were unreadable (#13351)
DeleteMarkers were unreadable if they had quorum based
guarantees, this PR tries to fix this behavior appropriately.

DeleteMarkers with sufficient should be allowed and the
return error should be accordingly with or without version-id.

This also allows for overwrites which may not be possible
in a multi-pool setup.

fixes #12787
2021-10-04 08:53:38 -07:00
Anis Elleuch
1d9e91e00f
Fix wrong reporting of total disks after restart (#13326)
A restart of the cluster and a failed disk will wrongly count 
the number of total disks.
2021-09-29 11:36:19 -07:00
Klaus Post
0e7fdcee30
Healing: Decide healing inlining based on metadata (#13178)
Don't perform an independent evaluation of inlining, but mirror the decision made when uploading the object.

Leads to some objects being inlined or not based on new metrics. Instead respect previous decision.
2021-09-09 08:55:43 -07:00
Harshavardhana
a19e3bc9d9
add more dangling heal related tests (#13140)
also make sure that HealObject() never returns
'ObjectNotFound' or 'VersionNotFound' errors,
as those are meaningless and not useful for the
caller.
2021-09-02 20:56:13 -07:00