Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c
Gist of improvements:
* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no
longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
This PR fixes a hang which occurs quite commonly at higher concurrency
by allowing following changes
- allowing lower connections in time_wait allows faster socket open's
- lower idle connection timeout to ensure that we let kernel
reclaim the time_wait connections quickly
- increase somaxconn to 4096 instead of 2048 to allow larger tcp
syn backlogs.
fixes#10413
Also, revamp the way ListBuckets work make few portions
of the healing logic parallel
- walk objects for healing disks in parallel
- collect the list of buckets in parallel across drives
- provide consistent view for listBuckets()
Healing was not working correctly in the distributed mode because
errFileVersionNotFound was not properly converted in storage rest
client.
Besides, fixing the healing delete marker is not working as expected.
Add context to all (non-trivial) calls to the storage layer.
Contexts are propagated through the REST client.
- `context.TODO()` is left in place for the places where it needs to be added to the caller.
- `endWalkCh` could probably be removed from the walkers, but no changes so far.
The "dangerous" part is that now a caller disconnecting *will* propagate down, so a
"delete" operation will now be interrupted. In some cases we might want to disconnect
this functionality so the operation completes if it has started, leaving the system in a cleaner state.
this is to detect situations of corruption disk
format etc errors quickly and keep the disk online
in such scenarios for requests to fail appropriately.
This PR adds support for healing older
content i.e from 2yrs, 1yr. Also handles
other situations where our config was
not encrypted yet.
This PR also ensures that our Listing
is consistent and quorum friendly,
such that we don't list partial objects
healing was not working properly when drives were
replaced, due to the error check in root disk
calculation this PR fixes this behavior
This PR also adds additional fix for missing
metadata entries from .minio.sys as part of
disk healing as well.
Added code to ignore and print more context
sensitive errors for better debugging.
This PR is continuation of fix in 7b14e9b660
Healing an object which has multiple versions was not working because
the healing code forgot to consider errFileVersionNotFound error as a
use case that needs healing
- Implement a new xl.json 2.0.0 format to support,
this moves the entire marshaling logic to POSIX
layer, top layer always consumes a common FileInfo
construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
for object placements.
Fixes#2111