This is to support deployments migrating from a multi-pooled
wider stripe to lower stripe. MINIO_STORAGE_CLASS_STANDARD
is still expected to be same for all pools. So you can satisfy
adding custom drive count based pools by adjusting the storage
class value.
```
version: v2
address: ':9000'
rootUser: 'minioadmin'
rootPassword: 'minioadmin'
console-address: ':9001'
pools: # Specify the nodes and drives with pools
-
args:
- 'node{11...14}.example.net/data{1...4}'
-
args:
- 'node{15...18}.example.net/data{1...4}'
-
args:
- 'node{19...22}.example.net/data{1...4}'
-
args:
- 'node{23...34}.example.net/data{1...10}'
set-drive-count: 6
```
instead upon any error in renameData(), we still
preserve the existing dataDir in some form for
recoverability in strange situations such as out
of disk space type errors.
Bonus: avoid running list and heal() instead allow
versions disparity to return the actual versions,
uuid to heal. Currently limit this to 100 versions
and lesser disparate objects.
an undo now reverts back the xl.meta from xl.meta.bkp
during overwrites on such flaky setups.
Bonus: Save N depth syscalls via skipping the parents
upon overwrites and versioned updates.
Flaky setup examples are stretch clusters with regular
packet drops etc, we need to add some defensive code
around to avoid dangling objects.
Bonus: allow replication to attempt Deletes/Puts when
the remote returns quorum errors of some kind, this is
to ensure that MinIO can rewrite the namespace with the
latest version that exists on the source.
- remove targetClient for passing around via replicationObjectInfo{}
- remove cloing to object info unnecessarily
- remove objectInfo from replicationObjectInfo{} (only require necessary fields)
versioned buckets were not creating the delete markers
present in the versioned stack of an object, this essentially
would stop decommission to succeed.
This PR fixes creating such delete markers properly during
a decommissioning process, adds tests as well.
Current code incorrectly passed the
config asset object name while decommissioning,
make sure that we pass the right object name
to be hashed on the newer set of pools.
This PR fixes situations after a successful
decommission, the users and policies might go
missing due to wrong hashed set.
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```
```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```
```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```
```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```
```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```
> NOTE: Canceled decommission will not make the pool active again, since we might have
> Potentially partial duplicate content on the other pools, to avoid this scenario be
> very sure to start decommissioning as a planned activity.
```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```
In cases where a cluster is degraded, we do not uphold our consistency
guarantee and we will write fewer erasure codes and rely on healing
to recreate the missing shards.
In some cases replacing known bad disks in practice take days.
We want to change the behavior of a known degraded system to keep
the erasure code promise of the storage class for each object.
This will create the objects with the same confidence as a fully
functional cluster. The tradeoff will be that objects created
during a partial outage will take up slightly more space.
This means that when the storage class is EC:4, there should
always be written 4 parity shards, even if some disks are unavailable.
When an object is created on a set, the disks are immediately
checked. If any disks are unavailable additional parity shards
will be made for each offline disk, up to 50% of the number of disks.
We add an internal metadata field with the actual and intended
erasure code level, this can optionally be picked up later by
the scanner if we decide that data like this should be re-sharded.
In lieu of new changes coming for server command line, this
change is to deprecate strict requirement for distributed setups
to provide root credentials.
Bonus: remove MINIO_WORM warning from April 2020, it is time to
remove this warning.
Current implementation requires server pools to have
same erasure stripe sizes, to facilitate same SLA
and expectations.
This PR allows server pools to be variadic, i.e they
do not have to be same erasure stripe sizes - instead
they should have SLA for parity ratio.
If the parity ratio cannot be guaranteed by the new
server pool, the deployment is rejected i.e server
pool expansion is not allowed.
Too many deployments come up with an odd number
of hosts or drives, to facilitate even distribution
among those setups allow for odd and prime numbers
based packs.