allow directories to be replicated as well, along with
their delete markers in replication.
Bonus fix to fix bloom filter updates for directories
to be preserved.
fixes a regression introduced in #10859, due
to the error returned by rest.Client being typed
i.e *rest.NetworkError - IsNetworkHostDown function
didn't work as expected to detect network issues.
This in-turn aggravated the situations when nodes
are disconnected leading to performance loss.
Do listings with prefix filter when bloom filter is dirty.
This will forward the prefix filter to the lister which will make it
only scan the folders/objects with the specified prefix.
If we have a clean bloom filter we try to build a more generally
useful cache so in that case, we will list all objects/folders.
Add shortcut for `APN/1.0 Veeam/1.0 Backup/10.0`
It requests unique blocks with a specific prefix. We skip
scanning the parent directory for more objects matching the prefix.
Allow each crawler operation to sleep up to 10 seconds on very heavily loaded systems.
This will of course make minimum crawler speed less, but should be more effective at stopping.
Delete marker replication is implemented for V2
configuration specified in AWS spec (though AWS
allows it only in the V1 configuration).
This PR also brings in a MinIO only extension of
replicating permanent deletes, i.e. deletes specifying
version id are replicated to target cluster.
This will make the health check clients 'silent'.
Use `IsNetworkOrHostDown` determine if network is ok so it mimics the functionality in the actual client.
this is needed such that we make sure to heal the
users, policies and bucket metadata right away as
we do listing based on list cache which only lists
'3' sufficiently good drives, to avoid possibly
losing access to these users upon upgrade make
sure to heal them.
If a scanning server shuts down unexpectedly we may have "successful" caches that are incomplete on a set.
In this case mark the cache with an error so it will no longer be handed out.
Add `MINIO_API_EXTEND_LIST_CACHE_LIFE` that will extend
the life of generated caches for a while.
This changes caches to remain valid until no updates have been
received for the specified time plus a fixed margin.
This also changes the caches from being invalidated when the *first*
set finishes until the *last* set has finished plus the specified time
has passed.
Similar to #10775 for fewer memory allocations, since we use
getOnlineDisks() extensively for listing we should optimize it
further.
Additionally, remove all unused walkers from the storage layer
A new field called AccessKey is added to the ReqInfo struct and populated.
Because ReqInfo is added to the context, this allows the AccessKey to be
accessed from 3rd-party code, such as a custom ObjectLayer.
Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Kaloyan Raev <kaloyan@storj.io>
On extremely long running listings keep the transient list 15 minutes after last update instead of using start time.
Also don't do overlap checks on transient lists.
Add trashcan that keeps recently updated lists after bucket deletion.
All caches were deleted once a bucket was deleted, so caches still running would report errors. Now they are canceled.
Fix `.minio.sys` not being transient.
Bonus fixes, remove package retry it is harder to get it
right, also manage context remove it such that we don't have
to rely on it anymore instead use a simple Jitter retry.
WriteAll saw 127GB allocs in a 5 minute timeframe for 4MiB buffers
used by `io.CopyBuffer` even if they are pooled.
Since all writers appear to write byte buffers, just send those
instead and write directly. The files are opened through the `os`
package so they have no special properties anyway.
This removes the alloc and copy for each operation.
REST sends content length so a precise alloc can be made.
this reduces allocations in order of magnitude
Also, revert "erasure: delete dangling objects automatically (#10765)"
affects list caching should be investigated.
Add store and a forward option for a single part
uploads when an async mode is enabled with env
MINIO_CACHE_COMMIT=writeback
It defaults to `writethrough` if unspecified.
Bonus fixes, we do not need reload format anymore
as the replaced drive is healed locally we only need
to ensure that drive heal reloads the drive properly.
We preserve the UUID of the original order, this means
that the replacement in `format.json` doesn't mean that
the drive needs to be reloaded into memory anymore.
fixes#10791
when server is booting up there is a possibility
that users might see '503' because object layer
when not initialized, then the request is proxied
to neighboring peers first one which is online.
* Fix caches having EOF marked as a failure.
* Simplify cache updates.
* Provide context for checkMetacacheState failures.
* Log 499 when the client disconnects.
`decryptObjectInfo` is a significant bottleneck when listing objects.
Reduce the allocations for a significant speedup.
https://github.com/minio/sio/pull/40
```
λ benchcmp before.txt after.txt
benchmark old ns/op new ns/op delta
Benchmark_decryptObjectInfo-32 24260928 808656 -96.67%
benchmark old MB/s new MB/s speedup
Benchmark_decryptObjectInfo-32 0.04 1.24 31.00x
benchmark old allocs new allocs delta
Benchmark_decryptObjectInfo-32 75112 48996 -34.77%
benchmark old bytes new bytes delta
Benchmark_decryptObjectInfo-32 287694772 4228076 -98.53%
```
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c
Gist of improvements:
* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no
longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
only newly replaced drives get the new `format.json`,
this avoids disks reloading their in-memory reference
format, ensures that drives are online without
reloading the in-memory reference format.
keeping reference format in-tact means UUIDs
never change once they are formatted.
lockers currently might leave stale lockers,
in unknown ways waiting for downed lockers.
locker check interval is high enough to safely
cleanup stale locks.
reference format should be source of truth
for inconsistent drives which reconnect,
add them back to their original position
remove automatic fix for existing offline
disk uuids
Bonus fixes
- logging improvements to ensure that we don't use
`go logger.LogIf` to avoid runtime.Caller missing
the function name. log where necessary.
- remove unused code at erasure sets
Test TestDialContextWithDNSCacheRand was failing sometimes because it depends
on a random selection of addresses when testing random DNS resolution from cache.
Lower addr selection exception to 10%
Allow requests to come in for users as soon as object
layer and config are initialized, this allows users
to be authenticated sooner and would succeed automatically
on servers which are yet to fully initialize.