In some cases a writer could be left behind unclosed, leaking compression blocks.
Always close and set compression concurrency to 2 which should be fine to keep up.
Due to https://github.com/philhofer/fwd/issues/20 when skipping a metadata entry that is >2048 bytes and the buffer is full (2048 bytes) the skip will fail with `io.ErrNoProgress`.
Enlarge the buffer so we temporarily make this much more unlikely.
If it still happens we will have to rewrite the skips to reads.
Fixes#10959
dangling object when deleted means object doesn't exist
anymore, so we should return appropriate errors, this
allows crawler heal to ensure that it removes the tracker
for dangling objects.
AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY are used in
azure CLI to specify the azure blob storage access & secret keys. With this commit,
it is possible to set them if you want the gateway's own credentials to be
different from the Azure blob credentials.
Co-authored-by: Harshavardhana <harsha@minio.io>
dangling objects when removed `mc admin heal -r` or crawler
auto heal would incorrectly return error - this can interfere
with usage calculation as the entry size for this would be
returned as `0`, instead upon success use the resultant
object size to calculate the final size for the object
and avoid reporting this in the log messages
Also do not set ObjectSize in healResultItem to be '-1'
this has an effect on crawler metrics calculating 1 byte
less for objects which seem to be missing their `xl.meta`
X-Minio-Replication-Delete-Status header shows the
status of the replication of a permanent delete of a version.
All GETs are disallowed and return 405 on this object version.
In the case of replicating delete markers.
X-Minio-Replication-DeleteMarker-Status shows the status
of replication, and would similarly return 405.
Additionally, this PR adds reporting of delete marker event completion
and updates documentation
Alternative to #10927
Instead of having an upstream fix, do unwrap when checking network errors.
'As' will also work when destination is an interface as checked by the tests.
This PR adds transition support for ILM
to transition data to another MinIO target
represented by a storage class ARN. Subsequent
GET or HEAD for that object will be streamed from
the transition tier. If PostRestoreObject API is
invoked, the transitioned object can be restored for
duration specified to the source cluster.
allow directories to be replicated as well, along with
their delete markers in replication.
Bonus fix to fix bloom filter updates for directories
to be preserved.
fixes a regression introduced in #10859, due
to the error returned by rest.Client being typed
i.e *rest.NetworkError - IsNetworkHostDown function
didn't work as expected to detect network issues.
This in-turn aggravated the situations when nodes
are disconnected leading to performance loss.
Do listings with prefix filter when bloom filter is dirty.
This will forward the prefix filter to the lister which will make it
only scan the folders/objects with the specified prefix.
If we have a clean bloom filter we try to build a more generally
useful cache so in that case, we will list all objects/folders.
Add shortcut for `APN/1.0 Veeam/1.0 Backup/10.0`
It requests unique blocks with a specific prefix. We skip
scanning the parent directory for more objects matching the prefix.
Allow each crawler operation to sleep up to 10 seconds on very heavily loaded systems.
This will of course make minimum crawler speed less, but should be more effective at stopping.
Delete marker replication is implemented for V2
configuration specified in AWS spec (though AWS
allows it only in the V1 configuration).
This PR also brings in a MinIO only extension of
replicating permanent deletes, i.e. deletes specifying
version id are replicated to target cluster.
This will make the health check clients 'silent'.
Use `IsNetworkOrHostDown` determine if network is ok so it mimics the functionality in the actual client.
this is needed such that we make sure to heal the
users, policies and bucket metadata right away as
we do listing based on list cache which only lists
'3' sufficiently good drives, to avoid possibly
losing access to these users upon upgrade make
sure to heal them.
If a scanning server shuts down unexpectedly we may have "successful" caches that are incomplete on a set.
In this case mark the cache with an error so it will no longer be handed out.
Add `MINIO_API_EXTEND_LIST_CACHE_LIFE` that will extend
the life of generated caches for a while.
This changes caches to remain valid until no updates have been
received for the specified time plus a fixed margin.
This also changes the caches from being invalidated when the *first*
set finishes until the *last* set has finished plus the specified time
has passed.
Similar to #10775 for fewer memory allocations, since we use
getOnlineDisks() extensively for listing we should optimize it
further.
Additionally, remove all unused walkers from the storage layer
A new field called AccessKey is added to the ReqInfo struct and populated.
Because ReqInfo is added to the context, this allows the AccessKey to be
accessed from 3rd-party code, such as a custom ObjectLayer.
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Kaloyan Raev <kaloyan@storj.io>
On extremely long running listings keep the transient list 15 minutes after last update instead of using start time.
Also don't do overlap checks on transient lists.
Add trashcan that keeps recently updated lists after bucket deletion.
All caches were deleted once a bucket was deleted, so caches still running would report errors. Now they are canceled.
Fix `.minio.sys` not being transient.
Bonus fixes, remove package retry it is harder to get it
right, also manage context remove it such that we don't have
to rely on it anymore instead use a simple Jitter retry.
WriteAll saw 127GB allocs in a 5 minute timeframe for 4MiB buffers
used by `io.CopyBuffer` even if they are pooled.
Since all writers appear to write byte buffers, just send those
instead and write directly. The files are opened through the `os`
package so they have no special properties anyway.
This removes the alloc and copy for each operation.
REST sends content length so a precise alloc can be made.
this reduces allocations in order of magnitude
Also, revert "erasure: delete dangling objects automatically (#10765)"
affects list caching should be investigated.
Add store and a forward option for a single part
uploads when an async mode is enabled with env
MINIO_CACHE_COMMIT=writeback
It defaults to `writethrough` if unspecified.
Bonus fixes, we do not need reload format anymore
as the replaced drive is healed locally we only need
to ensure that drive heal reloads the drive properly.
We preserve the UUID of the original order, this means
that the replacement in `format.json` doesn't mean that
the drive needs to be reloaded into memory anymore.
fixes#10791
when server is booting up there is a possibility
that users might see '503' because object layer
when not initialized, then the request is proxied
to neighboring peers first one which is online.
* Fix caches having EOF marked as a failure.
* Simplify cache updates.
* Provide context for checkMetacacheState failures.
* Log 499 when the client disconnects.
`decryptObjectInfo` is a significant bottleneck when listing objects.
Reduce the allocations for a significant speedup.
https://github.com/minio/sio/pull/40
```
λ benchcmp before.txt after.txt
benchmark old ns/op new ns/op delta
Benchmark_decryptObjectInfo-32 24260928 808656 -96.67%
benchmark old MB/s new MB/s speedup
Benchmark_decryptObjectInfo-32 0.04 1.24 31.00x
benchmark old allocs new allocs delta
Benchmark_decryptObjectInfo-32 75112 48996 -34.77%
benchmark old bytes new bytes delta
Benchmark_decryptObjectInfo-32 287694772 4228076 -98.53%
```
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c
Gist of improvements:
* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no
longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.