Commit Graph

83 Commits

Author SHA1 Message Date
Harshavardhana 013cc66d8e
add dataErrs for healing debug log (#15092) 2022-06-16 09:42:45 -07:00
Harshavardhana dea8220eee
do not heal outdated disks > parityBlocks (#14976)
this PR also fixes a situation where incorrect
partsMetadata slice was used where fi.Data was
re-used from a single drive causing duplication
of the shards across all drives.

This happens for situations where shouldHeal()
returns true for all drives > parityBlocks.

To avoid this we should never attempt to heal on all
drives > parityBlocks, unless we are doing metadata
migration from xl.json -> xl.meta
2022-05-25 15:17:10 -07:00
Harshavardhana eda34423d7 update gofumpt -w - new changes 2022-04-13 12:00:11 -07:00
Anis Elleuch 3fca4055d2
heal: Re-heal an object when a corruption is found during normal scan (#14482)
When scanning using normal mode, HealObject() can report an 
error saying that it found a corrupted part. This doesn't have 
when HealObject() is called with bitrot scan flag. However, when 
this happens, we can still restart HealObject() with the bitrot scan.

This is also important because this means the scanner and the 
new disks healer will not be able to heal an object that doesn't 
exist in a specific disk and has corruption in another disk.

Also without this PR, mc admin heal command without bitrot will report
an error.
2022-03-04 18:24:34 -08:00
Harshavardhana f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Harshavardhana f546636c52
fix: use renameAll instead of deleteObject() for purging temporary files (#14096)
This PR simplifies few things

- Multipart parts are renamed, upon failure are unrenamed() keep this
  multipart specific behavior it is needed and works fine.

- AbortMultipart should blindly delete once lock is acquired instead
  of re-reading metadata and calculating quorum, abort is a delete()
  operation and client has no business looking for errors on this.

- Skip Access() calls to folders that are operating on
  `.minio.sys/multipart` folder as well.
2022-01-13 11:07:41 -08:00
Harshavardhana 38ccc4f672
fix: make sure to avoid calling RenameData() on disconnected disks. (#14094)
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.

- This PR ensures that we nil the erasure writers such that they
  are skipped in RenameData() call.

- RenameData() doesn't need to "Access()" calls for `.minio.sys`
  folders they always exist.

- Make sure PutObject() never returns ObjectNotFound{} for any
  errors, make sure it always returns "WriteQuorum" when renameData()
  fails with ObjectNotFound{}. Return appropriate errors for all
  other cases.
2022-01-12 18:49:01 -08:00
Harshavardhana b7c5e45fff
heal: isObjectDangling should return false when it cannot decide (#14053)
In a multi-pool setup when disks are coming up, or in a single pool
setup let's say with 100's of erasure sets with a slow network.

It's possible when healing is attempted on `.minio.sys/config`
folder, it can lead to healing unexpectedly deleting some policy
files as dangling due to a mistake in understanding when `isObjectDangling`
is considered to be 'true'.

This issue happened in commit 30135eed86
when we assumed the validMeta with empty ErasureInfo is considered
to be fully dangling. This implementation issue gets exposed when
the server is starting up.

This is most easily seen with multiple-pool setups because of the
disconnected fashion pools that come up. The decision to purge the
object as dangling is taken incorrectly prior to the correct state
being achieved on each pool, when the corresponding drive let's say
returns 'errDiskNotFound', a 'delete' is triggered. At this point,
the 'drive' comes online because this is part of the startup sequence
as drives can come online lazily.

This kind of situation exists because we allow (totalDisks/2) number
of drives to be online when the server is being restarted.

Implementation made an incorrect assumption here leading to policies
getting deleted.

Added tests to capture the implementation requirements.
2022-01-07 19:11:54 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana 79df2c7ce7
correctly calculate read quorum based on the available fileInfo (#14000)
The current usage of assuming `default` parity of `4` is not correct
for all objects stored on MinIO, objects in .minio.sys have maximum
parity, healing won't trigger on these objects due to incorrect
verification of quorum.
2021-12-28 15:33:03 -08:00
Harshavardhana b883803b21
fix: healing across pools removing dangling objects (#13990)
adds other simplifications to the code when running
namespace heals across pools.
2021-12-25 09:01:44 -08:00
Harshavardhana 7e3a7d7044
add healing for invalid shards by skipping the blocks (#13978)
Built on top of #13945, now we need to simply skip the
shards and its automated.
2021-12-23 23:01:46 -08:00
Harshavardhana 0e3037631f
skip inconsistent shards if possible (#13945)
data shards were wrong due to a healing bug
reported in #13803 mainly with unaligned object
sizes.

This PR is an attempt to automatically avoid
these shards, with available information about
the `xl.meta` and actually disk mtime.
2021-12-21 10:08:26 -08:00
jiangfucheng 7460fb8349
fix padding error and compatible with uploaded objects (#13803) 2021-12-03 09:26:30 -08:00
Harshavardhana 28f95f1fbe
quorum calculation getLatestFileInfo should be itself (#13717)
FileInfo quorum shouldn't be passed down, instead
inferred after obtaining a maximally occurring FileInfo.

This PR also changes other functions that rely on
wrong quorum calculation.

Update tests as well to handle the proper requirement. All
these changes are needed when migrating from older deployments
where we used to set N/2 quorum for reads to EC:4 parity in
newer releases.
2021-11-22 09:36:29 -08:00
Harshavardhana c791de0e1e
re-implement pickValidInfo dataDir, move to quorum calculation (#13681)
dataDir loosely based on maxima is incorrect and does not
work in all situations such as disks in the following order

- xl.json migration to xl.meta there may be partial xl.json's
  leftover if some disks are not yet connected when the disk
  is yet to come up, since xl.json mtime and xl.meta is
  same the dataDir maxima doesn't work properly leading to
  quorum issues.

- its also possible that XLV1 might be true among the disks
  available, make sure to keep FileInfo based on common quorum
  and skip unexpected disks with the older data format.

Also, this PR tests upgrade from older to a newer release if the 
data is readable and matches the checksum.

NOTE: this is just initial work we can build on top of this to do further tests.
2021-11-21 10:41:30 -08:00
Harshavardhana 4545ecad58
ignore swapped drives instead of throwing errors (#13655)
- add checks such that swapped disks are detected
  and ignored - never used for normal operations.

- implement `unrecognizedDisk` to be ignored with
  all operations returning `errDiskNotFound`.

- also add checks such that we do not load unexpected
  disks while connecting automatically.

- additionally humanize the values when printing the errors.

Bonus: fixes handling of non-quorum situations in
getLatestFileInfo(), that does not work when 2 drives
are down, currently this function would return errors
incorrectly.
2021-11-15 09:46:55 -08:00
Harshavardhana 94d587e6fc
fix: delete-markers without quorum were unreadable (#13351)
DeleteMarkers were unreadable if they had quorum based
guarantees, this PR tries to fix this behavior appropriately.

DeleteMarkers with sufficient should be allowed and the
return error should be accordingly with or without version-id.

This also allows for overwrites which may not be possible
in a multi-pool setup.

fixes #12787
2021-10-04 08:53:38 -07:00
Anis Elleuch 1d9e91e00f
Fix wrong reporting of total disks after restart (#13326)
A restart of the cluster and a failed disk will wrongly count 
the number of total disks.
2021-09-29 11:36:19 -07:00
Klaus Post 0e7fdcee30
Healing: Decide healing inlining based on metadata (#13178)
Don't perform an independent evaluation of inlining, but mirror the decision made when uploading the object.

Leads to some objects being inlined or not based on new metrics. Instead respect previous decision.
2021-09-09 08:55:43 -07:00
Harshavardhana a19e3bc9d9
add more dangling heal related tests (#13140)
also make sure that HealObject() never returns
'ObjectNotFound' or 'VersionNotFound' errors,
as those are meaningless and not useful for the
caller.
2021-09-02 20:56:13 -07:00
Krishnan Parthasarathi db35bcf2ce
heal: Remove transitioned objects' parts from outdated disks (#13018)
Bonus: check equality for replication and other metadata
2021-08-23 13:14:55 -07:00
Harshavardhana 035882d292
fix: remove parentIsObject() check (#12851)
we will allow situations such as

```
a/b/1.txt
a/b
```

and

```
a/b
a/b/1.txt
```

we are going to document that this usecase is
not supported and we will never support it, if
any application does this users have to delete
the top level parent to make sure namespace is
accessible at lower level.

rest of the situations where the prefixes get
created across sets are supported as is.
2021-08-03 13:26:57 -07:00
Harshavardhana a9d9b520ec
remove short circuited healing optimization (#12796)
this healing optimization caused multiple
regressions in healing

- delete-markers incorrectly missing
  heal and returning incorrect healing
  results to client.

- missing individual 'parts' such
  as for restored object or simply
  for all objects just missing few parts.

This optimization is not necessary, we
should proceed to verify all cases possible
not just when metadata is inconsistent.
2021-07-26 16:51:09 -07:00
Harshavardhana a3f7d575e0
improve delete-marker healing (#12794)
delete-markers missing on drives were
not healed due to few things

disksWithAllParts() does not know-how
to deal with delete markers, add support
for that.

fixes #12787
2021-07-26 11:48:09 -07:00
Harshavardhana f175ff8f66
add healing fixes for delete-marker (#12788)
- delete-markers are incorrectly reported
  as corrupt with wrong data sent to client
  'mc admin heal -r' on objects with delete
  marker will report as 'grey' incorrectly.

- do not heal delete-markers during HeadObject()
  this can lead to inconsistent order of heals
  on the object, although this is not an issue
  in terms of order of versions it is rather
  simpler to keep the same order on all drives.

- defaultHealResult() should handle 'err == nil'
  case such that valid cases should be handled
  as 'drive' status OK.
2021-07-26 08:01:41 -07:00
Harshavardhana 542fe4ea2e
fix: legacy objects with 10MiB blockSize should use right buffers (#12459)
healing code was using incorrect buffers to heal older
objects with 10MiB erasure blockSize, incorrect calculation
of such buffers can lead to incorrect premature closure of
io.Pipe() during healing.

fixes #12410
2021-06-07 10:06:06 -07:00
Harshavardhana 1f262daf6f
rename all remaining packages to internal/ (#12418)
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
2021-06-01 14:59:40 -07:00
Harshavardhana b5ebfd35b4
fix: always prefer DataBlocks present in FileInfo (#12386) 2021-05-27 10:11:50 -07:00
Klaus Post 3fff50120b
Revert heal locks (#12365)
A lot of healing is likely to be on non-existing objects and 
locks are very expensive and will slow down scanning 
significantly.

In cases where all are valid or, all are broken allow 
rejection without locking.

Keep the existing behavior, but move the check for 
dangling objects to after the lock has been acquired.

```
	_, err = getLatestFileInfo(ctx, partsMetadata, errs)
	if err != nil {
		return er.purgeObjectDangling(ctx, bucket, object, versionID, partsMetadata, errs, []error{}, opts)
	}
```

Revert "heal: Hold lock when reading xl.meta from disks (#12362)"

This reverts commit abd32065aa
2021-05-25 17:02:06 -07:00
Harshavardhana 4840974d7a
fix: inline data upon overwrites should be readable (#12369)
This PR fixes two bugs

- Remove fi.Data upon overwrite of objects from inlined-data to non-inlined-data
- Workaround for an existing bug on disk with latest releases to ignore fi.Data
  and instead read from the disk for non-inlined-data
- Addtionally add a reserved metadata header to indicate data is inlined for
  a given version.
2021-05-25 16:33:06 -07:00
Harshavardhana ed4941a5f3
fix: calculate dataBlocks properly in healing (#12364) 2021-05-25 09:34:27 -07:00
Anis Elleuch abd32065aa
heal: Hold lock when reading xl.meta from disks (#12362)
Lock is hold in healObject() after reading xl.meta from disks the first
time. This commit will held the lock since the beginning of HealObject()

Co-authored-by: Anis Elleuch <anis@min.io>
2021-05-24 13:39:38 -07:00
Klaus Post cde6469b88
Fix hanging erasure writes (#12253)
However, this slice is also used for closing the writers, so close is never called on these.

Furthermore when an error is returned from a write it is now reported to the reader.

bonus: remove unused heal param from `newBitrotWriter`.

* Remove copy, now that we don't mutate.
2021-05-17 08:32:28 -07:00
Harshavardhana f1e479d274
remove more duplicate bloom filter trackers (#12302)
At some places bloom filter tracker was getting
updated for `.minio.sys/tmp` bucket, there is no
reason to update bloom filters for those.

And add a missing bloom filter update for MakeBucket()

Bonus: purge unused function deleteEmptyDir()
2021-05-17 08:25:48 -07:00
Harshavardhana d84261aa6d
fix: ensure proper usage of DataDir (#12300)
- GetObject() should always use a common dataDir to
  read from when it starts reading, this allows the
  code in erasure decoding to have sane expectations.

- Healing should always heal on the common dataDir, this
  allows the code in dangling object detection to purge
  dangling content.

These both situations can happen under certain types of
retries during PUT when server is restarting etc, some
namespace entries might be left over.
2021-05-14 16:50:47 -07:00
Krishnan Parthasarathi 0bab1c1895
Heal restored object contents on disk (#12238) 2021-05-06 16:06:57 -07:00
Harshavardhana 1aa5858543
move madmin to github.com/minio/madmin-go (#12239) 2021-05-06 08:52:02 -07:00
Harshavardhana ff36baeaa7
fix: attempt to drain the ReadFileStream for connection pooling (#12208)
avoid time_wait build up with getObject requests if there are
pending callers and they timeout, can lead to time_wait states

Bonus share the same buffer pool with erasure healing logic,
additionally also fixes a race where parallel readers were
never cleanup during Encode() phase, because pipe.Reader end
was never closed().

Added closer right away upon an error during Encode to make
sure to avoid racy Close() while stream was still being
Read().
2021-05-04 10:12:08 -07:00
Krishnan Parthasarathi 860bf1bab2
Add IsRemote method on FileInfo, ObjectInfo (#12209)
Provides a convenient method to know if an object's contents are in its remote
tier.
2021-05-04 08:40:42 -07:00
Harshavardhana f7a87b30bf Revert "deprecate embedded browser (#12163)"
This reverts commit 736d8cbac4.

Bring contrib files for older contributions
2021-04-30 08:50:39 -07:00
Harshavardhana 64f6020854
fix: cleanup locking, cancel context upon lock timeout (#12183)
upon errors to acquire lock context would still leak,
since the cancel would never be called. since the lock
is never acquired - proactively clear it before returning.
2021-04-29 20:55:21 -07:00
Anis Elleuch 9e797532dc
lock: Always cancel the returned Get(R)Lock context (#12162)
* lock: Always cancel the returned Get(R)Lock context

There is a leak with cancel created inside the locking mechanism. The
cancel purpose was to cancel operations such erasure get/put that are
holding non-refreshable locks.

This PR will ensure the created context.Cancel is passed to the unlock
API so it will cleanup and avoid leaks.

* locks: Avoid returning nil cancel in local lockers

Since there is no Refresh mechanism in the local locking mechanism, we
do not generate a new context or cancel. Currently, a nil cancel
function is returned but this can cause a crash. Return a dummy function
instead.
2021-04-27 16:12:50 -07:00
Harshavardhana 736d8cbac4
deprecate embedded browser (#12163)
https://github.com/minio/console takes over the functionality for the
future object browser development

Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-27 10:52:12 -07:00
Poorna Krishnamoorthy 4be0f92067
Fix multipart restore to remove part match (#12161)
Part ETags are not available after multipart finalizes, removing this
check as not useful.

Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
2021-04-26 18:24:06 -07:00
Krishnan Parthasarathi c829e3a13b Support for remote tier management (#12090)
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.

Some new additions include:

 - Admin APIs remote tier configuration management

 - Simple journal to track remote objects to be 'collected'
   This is used by object API handlers which 'mutate' object versions by
   overwriting/replacing content (Put/CopyObject) or removing the version
   itself (e.g DeleteObjectVersion).

 - Rework of previous ILM transition to fit the new model
   In the new model, a storage class (a.k.a remote tier) is defined by the
   'remote' object storage type (one of s3, azure, GCS), bucket name and a
   prefix.

* Fixed bugs, review comments, and more unit-tests

- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana 069432566f update license change for MinIO
Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-23 11:58:53 -07:00
Harshavardhana a7acfa6158
fix: pick valid FileInfo additionally based on dataDir (#12116)
* fix: pick valid FileInfo additionally based on dataDir

historically we have always relied on modTime
to be consistent and same, we can now add additional
reference to look for the same dataDir value.

A dataDir is the same for an object at a given point in
time for a given version, let's say a `null` version
is overwritten in quorum we do not by mistake pick
up the fileInfo's incorrectly.

* make sure to not preserve fi.Data

Signed-off-by: Harshavardhana <harsha@minio.io>
2021-04-21 19:06:08 -07:00
Harshavardhana 2ef824bbb2
collapse two distinct calls into single RenameData() call (#12093)
This is an optimization by reducing one extra system call,
and many network operations. This reduction should increase
the performance for small file workloads.
2021-04-20 10:44:39 -07:00
Klaus Post d267d152ba
healing: re-read metadata after lock (#12004)
Do no use potentially wrong metadata from before acquiring lock.

Plus remove unused NoLock option.
2021-04-07 10:39:48 -07:00