we do not need to hold the read locks at the higher layer before reading the body; instead, hold the read locks properly at the time of renamePart() to protect against racy part overwrites competing with a concurrent completeMultipart().
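A minimal sketch of the intended pattern, assuming a generic RWMutex-style namespace lock; the helper names here (putPart, renamePartExample) are illustrative, not the actual MinIO APIs:

```go
package example

import (
	"context"
	"os"
	"sync"
)

// renamePartExample is an illustrative stand-in for renamePart(): it moves a
// staged part file into its final location.
func renamePartExample(_ context.Context, src, dst string) error {
	return os.Rename(src, dst)
}

// putPart stages the part body without holding any lock, then takes the read
// lock only around the rename, so a concurrent completeMultipart() (which
// would hold the write lock) cannot interleave with a racy part overwrite.
func putPart(ctx context.Context, nsLock *sync.RWMutex, staged, final string) error {
	// ... body already read and staged at `staged`, no lock held ...

	nsLock.RLock()
	defer nsLock.RUnlock()
	return renamePartExample(ctx, staged, final)
}
```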
`mc admin heal ALIAS/bucket/object` does not have any flag to heal an object's noncurrent versions; this commit makes healing of the object's noncurrent versions implicit.
This also fixes `mc admin heal ALIAS/bucket/object` not working correctly when the bucket is versioned. This has been broken since Apr 2023.
The code assigns the corrupted state to a drive for any unexpected error, which is confusing for users. This change makes sure to assign the corrupted state only for corrupted parts or xl.meta, and to use the unknown state, with an explanation, for any other unexpected error such as cancellations, deadline errors, drive timeouts, etc.
Also make sure to return the bucket/object name when the object is not
found or marked not found by the heal dangling code.
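A rough sketch of the error-to-state mapping described above; the state constants and the errFileCorrupt sentinel are stand-ins, not MinIO's real identifiers:

```go
package example

import (
	"context"
	"errors"
)

// Illustrative states; the real constants in MinIO differ.
const (
	stateCorrupt = "corrupted"
	stateUnknown = "unknown"
)

// errFileCorrupt stands in for the error returned on corrupted parts/xl.meta.
var errFileCorrupt = errors.New("part or xl.meta is corrupted")

// classifyHealError assigns "corrupted" only for genuine corruption of parts
// or xl.meta; cancellations, deadline errors, drive timeouts and other
// unexpected errors map to "unknown" with an explanation.
func classifyHealError(err error) (state, reason string) {
	switch {
	case errors.Is(err, errFileCorrupt):
		return stateCorrupt, "part or xl.meta checksum mismatch"
	case errors.Is(err, context.Canceled), errors.Is(err, context.DeadlineExceeded):
		return stateUnknown, "request canceled or timed out: " + err.Error()
	default:
		return stateUnknown, "unexpected error: " + err.Error()
	}
}
```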
- PutObjectMetadata()
- PutObjectTags()
- DeleteObjectTags()
- TransitionObject()
- RestoreTransitionObject()
Also improve the behavior of the multipart code across pool locks, holding locks only once per upload ID for
- CompleteMultipartUpload()
- AbortMultipartUpload()
- ListObjectParts() (read-lock)
- GetMultipartInfo() (read-lock)
- PutObjectPart() (read-lock)
This avoids needless lock attempts across pools, whose cost otherwise grows O(n) with n pools (see the sketch after this list).
- PutObject() for multi-pool setups was holding large region locks, which was not necessary. This affected almost all slow clients and lengthy uploads.
- Re-arrange locks for CompleteMultipart, PutObject
to be close to rename()
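A sketch of the "lock once per upload ID" idea using CompleteMultipartUpload() as the example; the pool and locker interfaces are simplified stand-ins for the multi-pool and namespace-lock machinery:

```go
package example

import "context"

// pool is a minimal stand-in for a server pool.
type pool interface {
	CompleteMultipartUpload(ctx context.Context, bucket, object, uploadID string) error
}

// locker is a minimal stand-in for a namespace locker keyed by upload ID.
type locker interface {
	Lock(name string) (unlock func())
}

// completeAcrossPools takes the upload-ID lock once at the top layer and then
// probes each pool, instead of letting every pool grab its own lock, which
// would cost O(n) lock attempts for n pools.
func completeAcrossPools(ctx context.Context, lk locker, pools []pool, bucket, object, uploadID string) error {
	unlock := lk.Lock(bucket + "/" + object + "/" + uploadID)
	defer unlock()

	var lastErr error
	for _, p := range pools {
		err := p.CompleteMultipartUpload(ctx, bucket, object, uploadID)
		if err == nil {
			return nil
		}
		lastErr = err
	}
	return lastErr
}
```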
this cache will be honored only when `prefix=""` while performing the ListMultipartUploads() operation.
This is mainly to satisfy applications like Alluxio for their underfs implementation and tests.
Replaces https://github.com/minio/minio/pull/20181
locks handed out by different pools would not compete with each other for a multi-object delete request, which is wrong for obvious reasons.
The new locking implementation and revamp will rewrite multi-object locking anyway; this is a workaround for now.
Use Walk(), which is a recursive listing with versioning, to check whether the bucket still has objects before it is removed. This is beneficial because the bucket can contain multiple dangling objects on multiple drives.
Also, this prevents a bug where a bucket is deleted in a deployment that has many erasure sets while the bucket contains only one or a few objects, not spread to enough erasure sets.
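A sketch of the emptiness check built on a Walk-style recursive, version-aware listing; the walker type and the bucketIsEmpty helper are illustrative, not the actual Walk() signature:

```go
package example

import "context"

// objectInfo is a minimal stand-in for a versioned object entry.
type objectInfo struct {
	Name      string
	VersionID string
}

// walker stands in for Walk(): a recursive, versioning-aware listing that
// streams every entry found on any drive/erasure set into a channel.
type walker func(ctx context.Context, bucket string, results chan<- objectInfo) error

// bucketIsEmpty walks the bucket (including versions) and stops at the first
// entry seen; because the walk covers all erasure sets, a bucket holding only
// a few objects, or only dangling objects on some drives, is still detected.
func bucketIsEmpty(ctx context.Context, walk walker, bucket string) (bool, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	results := make(chan objectInfo)
	errCh := make(chan error, 1)
	go func() {
		defer close(results)
		errCh <- walk(ctx, bucket, results)
	}()

	for range results {
		cancel() // found something, stop the walk early
		for range results {
			// drain so the walker goroutine can exit
		}
		<-errCh
		return false, nil
	}
	return true, <-errCh
}
```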
during rebalance stop, it can possibly happen that a Put() races by overwriting the same object again. If that overwrite appears to complete "successfully", the rebalance code can then proceed to delete the object from the pool, causing data loss.
This PR enhances #20233 to handle more scenarios such as these.
Rebalance-stop can race with ongoing rebalance operations. This change prevents these operations from overwriting objects by checking that the source and destination pool indices are different, as sketched below.
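A trivially small sketch of that guard; the names are illustrative:

```go
package example

import "errors"

// errSamePool stands in for the error/skip condition when a rebalance move
// would write back into the pool it is reading from.
var errSamePool = errors.New("rebalance: source and destination pool are the same")

// validateRebalanceMove rejects a move whose source and destination pool
// indices match; otherwise a Put() that already re-wrote the object during
// rebalance-stop could be overwritten and then deleted, losing data.
func validateRebalanceMove(srcPoolIdx, dstPoolIdx int) error {
	if srcPoolIdx == dstPoolIdx {
		return errSamePool
	}
	return nil
}
```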
When a drive is in a failed state while a single-node multiple-drive deployment is started, a fresh replacement disk will not be properly healed unless the user restarts the node.
Fix this by always adding the new fresh disk to globalLocalDrivesMap. Also remove globalLocalDrives for simplification; a map can still be used to store the local node drives, since the order of a node's local drives is not defined.
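A sketch of keeping local drives in a map; the types and the registerLocalDrive helper are illustrative, not the actual globalLocalDrivesMap code:

```go
package example

import "sync"

// endpoint and drive are illustrative stand-ins for MinIO's endpoint and
// storage types.
type endpoint string
type drive struct{ path string }

// localDrives holds the node's local drives keyed by endpoint; since the
// order of a node's local drives is not defined, a map alone is sufficient
// and no separate ordered slice is needed.
var localDrives = struct {
	sync.RWMutex
	m map[endpoint]*drive
}{m: make(map[endpoint]*drive)}

// registerLocalDrive is always called when a drive (re)connects, including a
// fresh replacement for a failed drive, so healing can discover it without a
// node restart.
func registerLocalDrive(ep endpoint, d *drive) {
	localDrives.Lock()
	defer localDrives.Unlock()
	localDrives.m[ep] = d
}
```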
Currently, bucket metadata is being loaded serially inside the ListBuckets Object API. Fix that by loading bucket metadata concurrently, with parallelism equal to the number of erasure sets * 10, which is a good approximation.
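A sketch of the concurrent load with a bounded worker count of numErasureSets*10; the loader type and the helper name are illustrative:

```go
package example

import (
	"context"
	"sync"
)

// bucketMetadataLoader stands in for the per-bucket metadata loader.
type bucketMetadataLoader func(ctx context.Context, bucket string) error

// loadAllBucketMetadata loads bucket metadata concurrently instead of
// serially, capping the parallelism at numErasureSets*10.
func loadAllBucketMetadata(ctx context.Context, buckets []string, numErasureSets int, load bucketMetadataLoader) {
	workers := numErasureSets * 10
	if workers < 1 {
		workers = 1
	}
	if len(buckets) > 0 && workers > len(buckets) {
		workers = len(buckets)
	}

	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for _, bucket := range buckets {
		wg.Add(1)
		sem <- struct{}{}
		go func(bucket string) {
			defer wg.Done()
			defer func() { <-sem }()
			_ = load(ctx, bucket) // per-bucket errors handled by the caller in real code
		}(bucket)
	}
	wg.Wait()
}
```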
skip healing properly in the scanner when a drive is hotplugged;
due to how the state is passed around, SkipHealing might not always reflect the true state of the system, causing a situation where the scanner might trigger healing on the same drive that is already being healed. Due to this, competing heals get triggered that slow each other down.
since #19688 a regression was introduced in drive lookups for single-node multi-drive setups; drive replacement would not work correctly without this PR.
This change uses the updated LDAP library in minio/pkg (bumped up to v3). A new config parameter is added to the LDAP configuration to specify extra user attributes to load from the LDAP server and to store them as additional claims for the user.
A test is added in sts_handlers.go that shows how to access the LDAP
attributes as a claim.
This is in preparation for adding SSH pubkey authentication to MinIO's SFTP
integration.
Recent Veeam versions are very picky about storage class names. Add the `_MINIO_VEEAM_FORCE_SC` env var: it overrides the storage class returned by the storage backend if that class is non-standard and we detect a Veeam client by checking the User Agent.
Applies to HeadObject/GetObject/ListObject*
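A sketch of the override logic; the helper name and the exact set of classes treated as "standard" are assumptions, only the `_MINIO_VEEAM_FORCE_SC` variable and the User-Agent check come from the change itself:

```go
package example

import (
	"net/http"
	"os"
	"strings"
)

// veeamStorageClass returns the storage class to report to the client: if
// _MINIO_VEEAM_FORCE_SC is set, the request comes from a Veeam client
// (User-Agent check), and the backend class is non-standard, the forced
// value is returned instead.
func veeamStorageClass(r *http.Request, backendSC string) string {
	forced := os.Getenv("_MINIO_VEEAM_FORCE_SC")
	if forced == "" || !strings.Contains(r.UserAgent(), "Veeam") {
		return backendSC
	}
	switch backendSC {
	case "", "STANDARD", "REDUCED_REDUNDANCY":
		// Standard classes are left untouched.
		return backendSC
	}
	return forced
}
```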
canceled callers might linger around longer and can potentially overwhelm the system. Instead, provide a caller context so that canceled callers don't hold on to resources.
Bonus: we have no reason to cache errors; we should never cache errors, otherwise quorum errors can creep in unexpectedly. When invalidating, we should let the cache hit the actual resources instead.
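A sketch of both points, a cache that fetches with its own background context and never stores errors; the cache shape is illustrative:

```go
package example

import (
	"context"
	"sync"
)

type cache struct {
	mu      sync.Mutex
	entries map[string]string
	fetch   func(ctx context.Context, key string) (string, error)
}

// get bounds only how long *this* caller waits: the fetch itself runs with a
// background context so a canceled caller neither aborts the refresh nor
// lingers holding resources. Errors are never cached, so a transient quorum
// error cannot be served back to later callers.
func (c *cache) get(ctx context.Context, key string) (string, error) {
	c.mu.Lock()
	if v, ok := c.entries[key]; ok {
		c.mu.Unlock()
		return v, nil
	}
	c.mu.Unlock()

	type result struct {
		v   string
		err error
	}
	resCh := make(chan result, 1)
	go func() {
		v, err := c.fetch(context.Background(), key)
		if err == nil { // cache successful results only
			c.mu.Lock()
			c.entries[key] = v
			c.mu.Unlock()
		}
		resCh <- result{v, err}
	}()

	select {
	case r := <-resCh:
		return r.v, r.err
	case <-ctx.Done():
		// Caller gave up; the fetch continues for other callers.
		return "", ctx.Err()
	}
}
```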
If used, `opts.Marker` will cause many missed entries since results are returned unsorted, and pools are serialized.
Switch to fully concurrent listing and merging across pools to return sorted entries.
Returning errors on listings is impossible with the current API, so document that.
Return an error at once if no drives are found instead of just returning an empty listing and no error.
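A sketch of the concurrent list-and-merge across pools (the per-pool listing is reduced to a sorted slice of names for illustration):

```go
package example

import (
	"sort"
	"sync"
)

// poolLister stands in for listing a single pool; each pool returns its
// entries already sorted by name.
type poolLister func() []string

// listAllPools lists every pool concurrently, then merges the per-pool
// sorted results into one globally sorted, de-duplicated listing, instead of
// serializing pools and relying on opts.Marker.
func listAllPools(pools []poolLister) []string {
	results := make([][]string, len(pools))
	var wg sync.WaitGroup
	for i, list := range pools {
		wg.Add(1)
		go func(i int, list poolLister) {
			defer wg.Done()
			results[i] = list()
		}(i, list)
	}
	wg.Wait()

	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	sort.Strings(merged)

	// Drop duplicates for objects present in more than one pool.
	out := make([]string, 0, len(merged))
	for i, name := range merged {
		if i == 0 || name != merged[i-1] {
			out = append(out, name)
		}
	}
	return out
}
```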
Since the object is being permanently deleted, the lack of read quorum should not
matter as long as sufficient disks are online to complete the deletion with parity
requirements.
If several pools have the same object with insufficient read quorum, attempt to delete the object from all the pools where it exists.
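A sketch of the multi-pool delete described above; the poolDeleter interface and the errReadQuorum sentinel are stand-ins:

```go
package example

import (
	"context"
	"errors"
)

// errReadQuorum stands in for the insufficient-read-quorum error.
var errReadQuorum = errors.New("insufficient read quorum")

// poolDeleter is a minimal stand-in for a pool that can permanently delete
// an object.
type poolDeleter interface {
	DeleteObject(ctx context.Context, bucket, object string) error
}

// deleteFromAllPools attempts the permanent delete on every pool that may
// hold the object; a read-quorum failure is ignored because write/parity
// requirements are enforced by each pool's own delete path.
func deleteFromAllPools(ctx context.Context, pools []poolDeleter, bucket, object string) error {
	var lastErr error
	for _, p := range pools {
		err := p.DeleteObject(ctx, bucket, object)
		if err == nil || errors.Is(err, errReadQuorum) {
			continue
		}
		lastErr = err
	}
	return lastErr
}
```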
Create new code paths for multiple subsystems in the code. This will make maintaining this easier later.
Also introduce bugLogIf() for errors that should not happen in the first
place.
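A rough sketch of what a bugLogIf()-style helper could look like; this is not the actual implementation, only the intent of separating "should never happen" errors from routine ones:

```go
package example

import (
	"context"
	"log"
)

// bugLogIfSketch logs only errors that indicate a programming bug, tagging
// them so they stand out from routine operational failures. The real helper
// uses MinIO's logging subsystem and the request metadata carried in ctx.
func bugLogIfSketch(ctx context.Context, err error) {
	if err == nil {
		return
	}
	log.Printf("BUG: unexpected internal error, please report: %v", err)
}
```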
- Use a shared worker pool for all ILM expiry tasks
- Free version cleanup executes in a separate goroutine
- Add a free version only if removing the remote object fails
- Add ILM expiry metrics to the node namespace
- Move tier journal tasks to expiryState
- Remove unused on-disk journal for tiered objects pending deletion
- Distribute expiry tasks across workers such that the expiry of versions of
  the same object is serialized (see the sketch after this list)
- Ability to resize worker pool without server restart
- Make scaling down of expiryState workers' concurrency safe; Thanks
@klauspost
- Add error logs when expiryState and transition state are not
initialized (yet)
- metrics: Add missed tier journal entry tasks
- Initialize the ILM worker pool after the object layer
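A sketch of the version-serializing distribution mentioned above: hash the bucket/object (not the version) to pick a worker, so all versions of one object land on the same queue. Names and queue sizes are illustrative:

```go
package example

import "hash/fnv"

// expiryTask is a minimal stand-in for a pending ILM expiry of one object version.
type expiryTask struct {
	Bucket, Object, VersionID string
}

// expiryWorkerPool routes every version of the same object to the same
// worker, so expirations of that object's versions are serialized while
// different objects still expire in parallel.
type expiryWorkerPool struct {
	queues []chan expiryTask
}

func newExpiryWorkerPool(workers int, handle func(expiryTask)) *expiryWorkerPool {
	p := &expiryWorkerPool{queues: make([]chan expiryTask, workers)}
	for i := range p.queues {
		q := make(chan expiryTask, 128)
		p.queues[i] = q
		go func() {
			for t := range q {
				handle(t)
			}
		}()
	}
	return p
}

// enqueue hashes bucket/object (not the version) to choose the queue.
func (p *expiryWorkerPool) enqueue(t expiryTask) {
	h := fnv.New32a()
	h.Write([]byte(t.Bucket + "/" + t.Object))
	p.queues[h.Sum32()%uint32(len(p.queues))] <- t
}
```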