- acquire since leader lock for all background operations
- healing, crawling and applying lifecycle policies.
- simplify lifecyle to avoid network calls, which was a
bug in implementation - we should hold a leader and
do everything from there, we have access to entire
name space.
- make listing, walking not interfere by slowing itself
down like the crawler.
- effectively use global context everywhere to ensure
proper shutdown, in cache, lifecycle, healing
- don't read `format.json` for prometheus metrics in
StorageInfo() call.
Change distributed locking to allow taking bulk locks
across objects, reduces usually 1000 calls to 1.
Also allows for situations where multiple clients sends
delete requests to objects with following names
```
{1,2,3,4,5}
```
```
{5,4,3,2,1}
```
will block and ensure that we do not fail the request
on each other.
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.
In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
This PR refactors object layer handling such
that upon failure in sub-system initialization
server reaches a stage of safe-mode operation
wherein only certain API operations are enabled
and available.
This allows for fixing many scenarios such as
- incorrect configuration in vault, etcd,
notification targets
- missing files, incomplete config migrations
unable to read encrypted content etc
- any other issues related to notification,
policies, lifecycle etc
- This PR allows config KVS to be validated properly
without being affected by ENV overrides, rejects
invalid values during set operation
- Expands unit tests and refactors the error handling
for notification targets, returns error instead of
ignoring targets for invalid KVS
- Does all the prep-work for implementing safe-mode
style operation for MinIO server, introduces a new
global variable to toggle safe mode based operations
NOTE: this PR itself doesn't provide safe mode operations
- ListObjectsHeal should list only objects
which need healing, not the entire namespace.
- DeleteObjects() to be used to delete 1000s of
objects in bulk instead of serially.
Add better dynamic timeouts for locks, also
add jitters before launching daily sweep to ensure
that not all the servers in distributed setup
are not trying to hold locks to begin the sweep
round.
Also, add enough delay for incoming requests based
on totalSetCount*totalDriveCount.
A possible fix for #8071