Commit Graph

121 Commits

Author SHA1 Message Date
Aditya Manthramurthy 1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Harshavardhana 18b3655c99
with xlv2 format we never had to fill in checksumInfo() (#17963)
- this PR avoids sending a large ChecksumInfo slice
  when its not needed

- also for a file with XLV2 format there is no reason
  to allocate Checksum slice while reading
2023-09-01 13:45:58 -07:00
Harshavardhana 9458485e43
avoid double logging from healing (#17950) 2023-08-30 18:46:04 -07:00
Harshavardhana 45fb375c41
allow healing to prefer local disks over remote (#17788) 2023-08-03 02:18:18 -07:00
Kaan Kabalak 21fbe88e1f
Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Harshavardhana 1f8b9b4bd5
fix: do not listAndHeal() inline with PutObject() (#17499)
there is a possibility that slow drives can actually add latency
to the overall call, leading to a large spike in latency.

this can happen if there are other parallel listObjects()
calls to the same drive, in-turn causing each other to sort
of serialize.

this potentially improves performance and makes PutObject()
also non-blocking.
2023-06-24 19:31:04 -07:00
Aditya Manthramurthy 5a1612fe32
Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Harshavardhana 1443b5927a
allow quorum fileInfo to pick same parityBlocks (#17454)
Bonus: allow replication to proceed for 503 errors such as
with error code SlowDownRead
2023-06-18 18:20:15 -07:00
Harshavardhana 64de61d15d
fallback on etags if they match when mtime is not same (#17424)
on "unversioned" buckets there are situations
when successive concurrent I/O can lead to
an inconsistent state() with mtime while the
etag might be the same for the object on disk.

in such a scenario it is possible for us to
allow reading of the object since etag matches
and if etag matches we are guaranteed that we
have enough copies the object will be readable
and same.

This PR allows fallback in such scenarios.
2023-06-17 19:18:20 -07:00
Anis Eleuch ae95384dd8
Revert "heal: Update object parity with the latest configured SC (#17187)" (#17404) 2023-06-12 11:54:51 -07:00
Anis Eleuch e2b7a08c10
heal: Update object parity with the latest configured SC (#17187) 2023-05-15 21:32:13 -07:00
Praveen raj Mani 72802a5972
Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069) 2023-04-25 22:57:40 -07:00
Krishnan Parthasarathi fae9000304
heal: Pick maximally occuring modTime in quorum (#17071) 2023-04-25 10:13:57 -07:00
Krishnan Parthasarathi 25f7a8e406
Indicate RenameData is called by healObject (#16997) 2023-04-09 10:25:37 -07:00
Harshavardhana 6c11dbffd5
add crash protection from backend modifications (#16846) 2023-03-20 09:08:42 -07:00
ferhat elmas 714283fae2
cleanup ignored static analysis (#16767) 2023-03-06 08:56:10 -08:00
Klaus Post 9acf1024e4
Remove bloom filter (#16682)
Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner.

Also removes a tiny bit of CPU by each write operation.
2023-02-24 09:03:31 +05:30
Klaus Post fd6622458b
Add detailed scanner trace output and notifications (#16668) 2023-02-21 09:33:33 -08:00
Harshavardhana 72daccd468
fix: scanner in healing cycle must use actual size (#16589) 2023-02-10 06:53:03 -08:00
Poorna b22b39de96
Avoid dangling deletes if disk not found (#16401) 2023-01-12 22:20:19 -08:00
Harshavardhana a15a2556c3
converge listBuckets() as a peer call (#16346) 2023-01-03 23:39:40 -08:00
Anis Elleuch acc9c033ed
debug: Add X-Amz-Request-ID to lock/unlock calls (#16309) 2022-12-23 19:49:07 -08:00
Klaus Post 70986b6e6e
Add version id to healresult (#16193) 2022-12-08 07:49:10 -08:00
Aditya Manthramurthy a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Klaus Post cc1d8f0057
Check for abandoned data when healing (#16122) 2022-11-28 10:20:55 -08:00
Harshavardhana fd6f6fc8df
cleanup stale parent multipart directories (#15980) 2022-11-01 08:00:02 -07:00
Harshavardhana 136d41775f
remove numAvailableDisks check as it doesn't serve any purpose (#15954) 2022-10-27 09:05:24 -07:00
Harshavardhana 2d9b5a65f1
verify RenameData() versions to be consistent (#15649)
xl.meta gets written and never rolled back, however
we definitely need to validate the state that is
persisted on the disk, if there are inconsistencies

- more than write quorum we should return an error
  to the client

- if write quorum was achieved however there are
  inconsistent xl.meta's we should simply trigger
  an MRF on them
2022-09-05 16:51:37 -07:00
Harshavardhana bcedc2b0d9
fix: add healing metric type for heal tracing (#15631)
changes the `heal.checkBucket` to `heal.Bucket` instead
since the latter is more meaningful.
2022-08-31 12:28:03 -07:00
Klaus Post dec942beb6
feat: Add healing trace (#15616) 2022-08-31 01:56:12 -07:00
Klaus Post a9f1ad7924
Add extended checksum support (#15433) 2022-08-29 16:57:16 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Harshavardhana a6e0ec4e6f
Add support converting non-inlined to inlined (#15444)
This is a feature to allow for inode compaction on
large clusters that use a lot of small files spread
across a large heirarchy.
2022-08-02 23:10:22 -07:00
Anis Elleuch b3edb25377
bloom: healObject to mark a path dirty only for dangling objects (#15458)
The path is marked dirty automatically when healObject() is called, which is
wrong. HealObject() is called during self-healing and this will lead to
an increase in the false positive result of the bloom filter.

Also move NSUpdated() from renameData() and call it directly in
CompleteMultipart and PutObject, this is not a functional change but
it will make it less prone to errors in the future.
2022-08-02 16:57:39 -07:00
Harshavardhana ce8397f7d9
use partInfo only for intermediate part.x.meta (#15353) 2022-07-19 18:56:24 -07:00
Klaus Post 911a17b149
Add compressed file index (#15247) 2022-07-11 17:30:56 -07:00
Praveen raj Mani b49fc33cb3
purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148) 2022-07-11 09:15:54 -07:00
Anis Elleuch 73733a8fb9
heal: Report correctly in multip-pools setup (#15117)
`mc admin heal -r <alias>` in a multi setup pools returns incorrectly
grey objects. The reason is that erasure-server-pools.HealObject() runs
HealObject in all pools and returns the result of the first nil
error. However, in the lower erasureObject level, HealObject() returns
nil if an object does not exist + missing error in each disk of the object
in that pool, therefore confusing mc.

Make erasureObject.HealObject() to return not found error in the lower
level, so at least erasureServerPools will know what pools to ignore.
2022-06-20 08:07:45 -07:00
Harshavardhana 013cc66d8e
add dataErrs for healing debug log (#15092) 2022-06-16 09:42:45 -07:00
Harshavardhana dea8220eee
do not heal outdated disks > parityBlocks (#14976)
this PR also fixes a situation where incorrect
partsMetadata slice was used where fi.Data was
re-used from a single drive causing duplication
of the shards across all drives.

This happens for situations where shouldHeal()
returns true for all drives > parityBlocks.

To avoid this we should never attempt to heal on all
drives > parityBlocks, unless we are doing metadata
migration from xl.json -> xl.meta
2022-05-25 15:17:10 -07:00
Harshavardhana eda34423d7 update gofumpt -w - new changes 2022-04-13 12:00:11 -07:00
Anis Elleuch 3fca4055d2
heal: Re-heal an object when a corruption is found during normal scan (#14482)
When scanning using normal mode, HealObject() can report an 
error saying that it found a corrupted part. This doesn't have 
when HealObject() is called with bitrot scan flag. However, when 
this happens, we can still restart HealObject() with the bitrot scan.

This is also important because this means the scanner and the 
new disks healer will not be able to heal an object that doesn't 
exist in a specific disk and has corruption in another disk.

Also without this PR, mc admin heal command without bitrot will report
an error.
2022-03-04 18:24:34 -08:00
Harshavardhana f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Harshavardhana f546636c52
fix: use renameAll instead of deleteObject() for purging temporary files (#14096)
This PR simplifies few things

- Multipart parts are renamed, upon failure are unrenamed() keep this
  multipart specific behavior it is needed and works fine.

- AbortMultipart should blindly delete once lock is acquired instead
  of re-reading metadata and calculating quorum, abort is a delete()
  operation and client has no business looking for errors on this.

- Skip Access() calls to folders that are operating on
  `.minio.sys/multipart` folder as well.
2022-01-13 11:07:41 -08:00
Harshavardhana 38ccc4f672
fix: make sure to avoid calling RenameData() on disconnected disks. (#14094)
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.

- This PR ensures that we nil the erasure writers such that they
  are skipped in RenameData() call.

- RenameData() doesn't need to "Access()" calls for `.minio.sys`
  folders they always exist.

- Make sure PutObject() never returns ObjectNotFound{} for any
  errors, make sure it always returns "WriteQuorum" when renameData()
  fails with ObjectNotFound{}. Return appropriate errors for all
  other cases.
2022-01-12 18:49:01 -08:00
Harshavardhana b7c5e45fff
heal: isObjectDangling should return false when it cannot decide (#14053)
In a multi-pool setup when disks are coming up, or in a single pool
setup let's say with 100's of erasure sets with a slow network.

It's possible when healing is attempted on `.minio.sys/config`
folder, it can lead to healing unexpectedly deleting some policy
files as dangling due to a mistake in understanding when `isObjectDangling`
is considered to be 'true'.

This issue happened in commit 30135eed86
when we assumed the validMeta with empty ErasureInfo is considered
to be fully dangling. This implementation issue gets exposed when
the server is starting up.

This is most easily seen with multiple-pool setups because of the
disconnected fashion pools that come up. The decision to purge the
object as dangling is taken incorrectly prior to the correct state
being achieved on each pool, when the corresponding drive let's say
returns 'errDiskNotFound', a 'delete' is triggered. At this point,
the 'drive' comes online because this is part of the startup sequence
as drives can come online lazily.

This kind of situation exists because we allow (totalDisks/2) number
of drives to be online when the server is being restarted.

Implementation made an incorrect assumption here leading to policies
getting deleted.

Added tests to capture the implementation requirements.
2022-01-07 19:11:54 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana 79df2c7ce7
correctly calculate read quorum based on the available fileInfo (#14000)
The current usage of assuming `default` parity of `4` is not correct
for all objects stored on MinIO, objects in .minio.sys have maximum
parity, healing won't trigger on these objects due to incorrect
verification of quorum.
2021-12-28 15:33:03 -08:00
Harshavardhana b883803b21
fix: healing across pools removing dangling objects (#13990)
adds other simplifications to the code when running
namespace heals across pools.
2021-12-25 09:01:44 -08:00
Harshavardhana 7e3a7d7044
add healing for invalid shards by skipping the blocks (#13978)
Built on top of #13945, now we need to simply skip the
shards and its automated.
2021-12-23 23:01:46 -08:00