- Implement a new xl.json 2.0.0 format to support,
this moves the entire marshaling logic to POSIX
layer, top layer always consumes a common FileInfo
construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
for object placements.
Fixes#2111
Manual healing (as background healing) creates a heal task with a
possiblity to override healing options, such as deep or normal mode.
Use a pointer type in heal opts so nil would mean use the default
healing options.
In large setups this avoids unnecessary data transfer
across nodes and potential locks.
This PR also optimizes heal result channel, which should
be avoided for each queueHealTask as its expensive
to create/close channels for large number of objects.
- acquire since leader lock for all background operations
- healing, crawling and applying lifecycle policies.
- simplify lifecyle to avoid network calls, which was a
bug in implementation - we should hold a leader and
do everything from there, we have access to entire
name space.
- make listing, walking not interfere by slowing itself
down like the crawler.
- effectively use global context everywhere to ensure
proper shutdown, in cache, lifecycle, healing
- don't read `format.json` for prometheus metrics in
StorageInfo() call.
- avoid setting last heal activity when starting self-healing
This can be confusing to users thinking that the self healing
cycle was already performed.
- add info about the next background healing round
This PR implements locking from a global entity into
a more localized set level entity, allowing for locks
to be held only on the resources which are writing
to a collection of disks rather than a global level.
In this process this PR also removes the top-level
limit of 32 nodes to an unlimited number of nodes. This
is a precursor change before bring in bucket expansion.
This PR refactors object layer handling such
that upon failure in sub-system initialization
server reaches a stage of safe-mode operation
wherein only certain API operations are enabled
and available.
This allows for fixing many scenarios such as
- incorrect configuration in vault, etcd,
notification targets
- missing files, incomplete config migrations
unable to read encrypted content etc
- any other issues related to notification,
policies, lifecycle etc
- This PR allows config KVS to be validated properly
without being affected by ENV overrides, rejects
invalid values during set operation
- Expands unit tests and refactors the error handling
for notification targets, returns error instead of
ignoring targets for invalid KVS
- Does all the prep-work for implementing safe-mode
style operation for MinIO server, introduces a new
global variable to toggle safe mode based operations
NOTE: this PR itself doesn't provide safe mode operations
The new auto healing model selects one node always responsible
for auto-healing the whole cluster, erasure set by erasure set.
If that node dies, another node will be elected as a leading
operator to perform healing.
This code also adds a goroutine which checks each 10 minutes
if there are any new unformatted disks and performs its healing
in that case, only the erasure set which has the new disk will
be healed.