For a non-tiered object, MinIO requires that EcM (# of data blocks) of
xl.meta agree, corresponding to the number of data blocks needed to
read this object.
OTOH, tiered objects have metadata in the hot tier and data in the
warm tier. The data and its integrity are offloaded to the warm tier. This
allows us to reduce the read quorum from EcM (typically > N/2, where N -
erasure stripe width) to N/2 + 1. The simple majority of metadata
ensures consensus on what the object is and where it is
located.
Since the object is being permanently deleted, the lack of read quorum should not
matter as long as sufficient disks are online to complete the deletion with parity
requirements.
If several pools have the same object with insufficient read quorum, attempt to
delete object from all the pools where it exists
- Use a shared worker pool for all ILM expiry tasks
- Free version cleanup executes in a separate goroutine
- Add a free version only if removing the remote object fails
- Add ILM expiry metrics to the node namespace
- Move tier journal tasks to expiryState
- Remove unused on-disk journal for tiered objects pending deletion
- Distribute expiry tasks across workers such that the expiry of versions of
the same object serialized
- Ability to resize worker pool without server restart
- Make scaling down of expiryState workers' concurrency safe; Thanks
@klauspost
- Add error logs when expiryState and transition state are not
initialized (yet)
* metrics: Add missed tier journal entry tasks
* Initialize the ILM worker pool after the object layer
on "unversioned" buckets there are situations
when successive concurrent I/O can lead to
an inconsistent state() with mtime while the
etag might be the same for the object on disk.
in such a scenario it is possible for us to
allow reading of the object since etag matches
and if etag matches we are guaranteed that we
have enough copies the object will be readable
and same.
This PR allows fallback in such scenarios.
dataDir loosely based on maxima is incorrect and does not
work in all situations such as disks in the following order
- xl.json migration to xl.meta there may be partial xl.json's
leftover if some disks are not yet connected when the disk
is yet to come up, since xl.json mtime and xl.meta is
same the dataDir maxima doesn't work properly leading to
quorum issues.
- its also possible that XLV1 might be true among the disks
available, make sure to keep FileInfo based on common quorum
and skip unexpected disks with the older data format.
Also, this PR tests upgrade from older to a newer release if the
data is readable and matches the checksum.
NOTE: this is just initial work we can build on top of this to do further tests.
* fix: pick valid FileInfo additionally based on dataDir
historically we have always relied on modTime
to be consistent and same, we can now add additional
reference to look for the same dataDir value.
A dataDir is the same for an object at a given point in
time for a given version, let's say a `null` version
is overwritten in quorum we do not by mistake pick
up the fileInfo's incorrectly.
* make sure to not preserve fi.Data
Signed-off-by: Harshavardhana <harsha@minio.io>
- Implement a new xl.json 2.0.0 format to support,
this moves the entire marshaling logic to POSIX
layer, top layer always consumes a common FileInfo
construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
for object placements.
Fixes#2111