ListObjects() should never list a delete-marked folder
if latest is delete marker and delimiter is not provided.
ListObjectVersions() should list a delete-marked folder
even if latest is delete marker and delimiter is not
provided.
Enhance further versioning listing on the buckets
delete marked objects should not be considered
for listing when listing is delimited, this issue
as introduced in PR #13804 which was mainly to
address listing of directories in listing when
delimited.
This PR fixes this properly and adds tests to
ensure that we behave in accordance with how
an S3 API behaves for ListObjects() without
versions.
Following scenario such as objects that exist inside a
prefix say `folder/` must be included in the listObjects()
response.
```
2aa16073-387e-492c-9d59-b4b0b7b6997a v2 DEL folder/
a5b9ce68-7239-4921-90ab-20aed402c7a2 v1 PUT folder/
f2211798-0eeb-4d9e-9184-fcfeae27d069 v1 PUT folder/1.txt
```
Current master does not handle this scenario, because it
ignores the top level delete-marker on folders. This is
however unexpected. It is expected that list-objects returns
the top level prefix in this situation.
```
aws s3api list-objects --bucket harshavardhana --prefix unique/ \
--delimiter / --profile minio --endpoint-url http://localhost:9000
{
"CommonPrefixes": [
{
"Prefix": "unique/folder/"
}
]
}
```
There are applications in the wild such as Hadoop s3a connector
that exploit this behavior and expect the folder to be present
in the response.
This also makes the behavior consistent with AWS S3.
single object delete was not working properly
on a bucket when versioning was suspended,
current version 'null' object was never removed.
added unit tests to cover the behavior
fixes#13783
- add checks such that swapped disks are detected
and ignored - never used for normal operations.
- implement `unrecognizedDisk` to be ignored with
all operations returning `errDiskNotFound`.
- also add checks such that we do not load unexpected
disks while connecting automatically.
- additionally humanize the values when printing the errors.
Bonus: fixes handling of non-quorum situations in
getLatestFileInfo(), that does not work when 2 drives
are down, currently this function would return errors
incorrectly.
- avoids relying in listQuorum from the underlying listObjects()
and potentially missing entries if any.
- avoid the entire merging logic etc, listing raw set by set
and loading whatever is found is cleaner when dealing with
a large cluster for IAM metadata.
In (erasureServerPools).MakeBucketWithLocation deletes the created
buckets if any set returns an error.
Add `NoRecreate` option, which will not recreate the bucket
in `DeleteBucket`, if the operation fails.
Additionally use context.Background() for operations we always want to be performed.
bucket deletes should purge entire bucket metadata
appropriately, use rename() to move the metadata files
to trash folder,
for dangling buckets instead of doing recursive deletes,
rename such buckets to trash folder as well.
Bonus: reduce retry duration for listing to 200ms
This was a regression introduced in '14bb969782'
this has the potential to cause corruption when
there are concurrent overwrites attempting to update
the content on the namespace.
This PR adds a situation where PutObject(), CopyObject()
compete properly for the same locks with NewMultipartUpload()
however it ends up turning off competing locks for the actual
object with GetObject() and DeleteObject() - since they do not
compete due to concurrent I/O on a versioned bucket it can lead
to loss of versions.
This PR fixes this bug with multi-pool setup with replication
that causes corruption of inlined data due to lack of competing
locks in a multi-pool setup.
Instead CompleteMultipartUpload holds the necessary
locks when finishing the transaction, knowing the exact
location of an object to schedule the multipart upload
doesn't need to compete in this manner, a pool id location
for existing object.
also remove HealObjects() code from dataScanner running another
listing from the data-scanner is super in-efficient and in-fact
this code is redundant since we already attempt to heal all
dangling objects anyways.
healObject() should be non-blocking to ensure
that scanner is not blocked for a long time,
this adversely affects performance of the scanner
and also affects the way usage is updated
subsequently.
This PR allows for a non-blocking behavior for
healing, dropping operations that cannot be queued
anymore.
Synchronize bucket cycles so it is much more
likely that the same prefixes will be picked up
for scanning.
Use the global bloom filter cycle for that.
Bump bloom filter versions to clear those.
Ensure that one call will succeed and others will serialize
Example failure without code in place:
```
bucket-policy-handlers_test.go:120: unexpected error: cmd.InsufficientWriteQuorum: Storage resources are insufficient for the write operation doz2wjqaovp5kvlrv11fyacowgcvoziszmkmzzz9nk9au946qwhci4zkane5-1/
bucket-policy-handlers_test.go:120: unexpected error: cmd.InsufficientWriteQuorum: Storage resources are insufficient for the write operation doz2wjqaovp5kvlrv11fyacowgcvoziszmkmzzz9nk9au946qwhci4zkane5-1/
bucket-policy-handlers_test.go:135: want 1 ok, got 0
```
Some applications albeit poorly written rather than using headObject
rely on listObjects to check for existence of object, this unusual
request always has prefix=(to actual object) and max-keys=1
handle this situation specially such that we can avoid readdir()
on the top level parent to avoid sorting and skipping, ensuring
that such type of listObjects() always behaves similar to a
headObject() call.
- deletes should always Sweep() for tiering at the
end and does not need an extra getObjectInfo() call
- puts, copy and multipart writes should conditionally
do getObjectInfo() when tiering targets are configured
- introduce 'TransitionedObject' struct for ease of usage
and understanding.
- multiple-pools optimization deletes don't need to hold
read locks verifying objects across namespace and pools.
improvements include
- skip IPv6 correctly
- do not set default value for
MINIO_SERVER_URL, let it be
configured if not use local IPs
Bonus:
- In healing return error from listPathRaw()
- update console to v0.8.3
Download files from *any* bucket/path as an encrypted zip file.
The key is included in the response but can be separated so zip
and the key doesn't have to be sent on the same channel.
Requires https://github.com/minio/pkg/pull/6
Create write lock on PutObject and CopyObject when on multi-pool setup.
Use the same lock as NewMultipartUpload so all creation calls share the same lock.
its possible that, version might exist on second pool such that
upon deleteBucket() might have deleted the bucket on pool1 successfully
since it doesn't have any objects, undo such operations properly in
all any error scenario.
Also delete bucket metadata from pool layer rather than sets layer.
- for single pool setups usage is not checked.
- for pools, only check the "set" in which it would be placed.
- keep a minimum number of inodes (when we know it).
- ignore for `.minio.sys`.
This is to ensure that there are no projects
that try to import `minio/minio/pkg` into
their own repo. Any such common packages should
go to `https://github.com/minio/pkg`
It is possible in some scenarios that in multiple pools,
two concurrent calls for the same object as a multipart operation
can lead to duplicate entries on two different pools.
This PR fixes this
- hold locks to serialize multiple callers so that we don't race.
- make sure to look for existing objects on the namespace as well
not just for existing uploadIDs
upon errors to acquire lock context would still leak,
since the cancel would never be called. since the lock
is never acquired - proactively clear it before returning.
* lock: Always cancel the returned Get(R)Lock context
There is a leak with cancel created inside the locking mechanism. The
cancel purpose was to cancel operations such erasure get/put that are
holding non-refreshable locks.
This PR will ensure the created context.Cancel is passed to the unlock
API so it will cleanup and avoid leaks.
* locks: Avoid returning nil cancel in local lockers
Since there is no Refresh mechanism in the local locking mechanism, we
do not generate a new context or cancel. Currently, a nil cancel
function is returned but this can cause a crash. Return a dummy function
instead.
With this change, MinIO's ILM supports transitioning objects to a remote tier.
This change includes support for Azure Blob Storage, AWS S3 compatible object
storage incl. MinIO and Google Cloud Storage as remote tier storage backends.
Some new additions include:
- Admin APIs remote tier configuration management
- Simple journal to track remote objects to be 'collected'
This is used by object API handlers which 'mutate' object versions by
overwriting/replacing content (Put/CopyObject) or removing the version
itself (e.g DeleteObjectVersion).
- Rework of previous ILM transition to fit the new model
In the new model, a storage class (a.k.a remote tier) is defined by the
'remote' object storage type (one of s3, azure, GCS), bucket name and a
prefix.
* Fixed bugs, review comments, and more unit-tests
- Leverage inline small object feature
- Migrate legacy objects to the latest object format before transitioning
- Fix restore to particular version if specified
- Extend SharedDataDirCount to handle transitioned and restored objects
- Restore-object should accept version-id for version-suspended bucket (#12091)
- Check if remote tier creds have sufficient permissions
- Bonus minor fixes to existing error messages
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Krishna Srinivas <krishna@minio.io>
Signed-off-by: Harshavardhana <harsha@minio.io>
avoid potential for duplicates under multi-pool
setup, additionally also make sure CompleteMultipart
is using a more optimal API for uploadID lookup
and never delete the object there is a potential
to create a delete marker during complete multipart.
Signed-off-by: Harshavardhana <harsha@minio.io>
Current implementation heavily relies on readAllFileInfo
but with the advent of xl.meta inlined with data, we cannot
easily avoid reading data when we are only interested is
updating metadata, this leads to invariably write
amplification during metadata updates, repeatedly reading
data when we are only interested in updating metadata.
This PR ensures that we implement a metadata only update
API at storage layer, that handles updates to metadata alone
for any given version - given the version is valid and
present.
This helps reduce the chattiness for following calls..
- PutObjectTags
- DeleteObjectTags
- PutObjectLegalHold
- PutObjectRetention
- ReplicateObject (updates metadata on replication status)
- collect real time replication metrics for prometheus.
- add pending_count, failed_count metric for total pending/failed replication operations.
- add API to get replication metrics
- add MRF worker to handle spill-over replication operations
- multiple issues found with replication
- fixes an issue when client sends a bucket
name with `/` at the end from SetRemoteTarget
API call make sure to trim the bucket name to
avoid any extra `/`.
- hold write locks in GetObjectNInfo during replication
to ensure that object version stack is not overwritten
while reading the content.
- add additional protection during WriteMetadata() to
ensure that we always write a valid FileInfo{} and avoid
ever writing empty FileInfo{} to the lowest layers.
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
Co-authored-by: Harshavardhana <harsha@minio.io>
baseDirFromPrefix(prefix) for object names without
parent directory incorrectly uses empty path, leading
to long listing at various paths that are not useful
for healing - avoid this listing completely if "baseDir"
returns empty simple use the "prefix" as is.
this improves startup performance significantly