Support variable server pools (#11256)
The current implementation requires server pools to have the same erasure stripe sizes, to facilitate the same SLA and expectations. This PR allows server pools to be variadic, i.e. they do not have to have the same erasure stripe sizes; instead they must satisfy the SLA for the parity ratio. If the parity ratio cannot be guaranteed by the new server pool, the deployment is rejected, i.e. server pool expansion is not allowed.
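
As a rough illustration of the rule described above, a hypothetical check (not the actual MinIO code) might look like the sketch below: a new pool is accepted only if its erasure stripe size can still honor the parity count the original deployment guarantees.

```go
package main

import "fmt"

// validatePoolExpansion is a hypothetical helper illustrating the rule above:
// with parity P, a stripe needs at least 2*P drives to offer the same
// data-redundancy SLA, otherwise the new pool is rejected.
func validatePoolExpansion(existingParity, newPoolStripeSize int) error {
	if newPoolStripeSize < 2*existingParity {
		return fmt.Errorf("expansion rejected: stripe of %d drives cannot guarantee parity %d",
			newPoolStripeSize, existingParity)
	}
	return nil
}

func main() {
	fmt.Println(validatePoolExpansion(4, 12)) // <nil>: a 12-drive stripe satisfies EC:4
	fmt.Println(validatePoolExpansion(4, 6))  // error: expansion rejected
}
```
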
@@ -94,20 +94,22 @@ Input for the key is the object name specified in `PutObject()`, returns a uniqu
- MinIO does erasure coding at the object level not at the volume level, unlike other object storage vendors. This allows applications to choose a different storage class by setting `x-amz-storage-class=STANDARD/REDUCED_REDUNDANCY` for each object upload, effectively utilizing the capacity of the cluster. Additionally this can also be enforced using IAM policies to make sure clients upload with the correct HTTP headers (see the example after this list).
- MinIO also supports expansion of existing clusters in server pools. Each pool is a self-contained entity with the same SLAs (read/write quorum) for each object as the original cluster. By using the existing namespace for lookup validation MinIO ensures conflicting objects are not created. When no such object exists, MinIO simply uses the least used pool.
- MinIO also supports expansion of existing clusters in server pools. Each pool is a self-contained entity with the same SLAs (read/write quorum) for each object as the original cluster. By using the existing namespace for lookup validation MinIO ensures conflicting objects are not created. When no such object exists, MinIO simply uses the least used pool to place new objects.
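
For illustration, uploading an object with a per-object storage class using the minio-go v7 SDK could look like this sketch; the endpoint, credentials, bucket and file path are placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Placeholder endpoint and credentials.
	client, err := minio.New("play.min.io", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS-KEY", "SECRET-KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Sends x-amz-storage-class: REDUCED_REDUNDANCY for this object only.
	_, err = client.FPutObject(context.Background(), "mybucket", "backup.tar.gz",
		"/tmp/backup.tar.gz", minio.PutObjectOptions{StorageClass: "REDUCED_REDUNDANCY"})
	if err != nil {
		log.Fatal(err)
	}
}
```
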
__There are no limits on how many server pools can be combined__
```
minio server http://host{1...32}/export{1...32} http://host{5...6}/export{1...8}
minio server http://host{1...32}/export{1...32} http://host{1...12}/export{1...12}
```
In the above example there are two server pools:

- 32 * 32 = 1024 drives pool1
- 2 * 8 = 16 drives pool2
- 12 * 12 = 144 drives pool2
> Notice the requirement of a common SLA here: the original cluster had 1024 drives with 16 drives per erasure set; the second pool is expected to have a minimum of 16 drives to match the original cluster SLA, or it should be in multiples of 16.
> Notice the requirement of a common SLA here: the original cluster had 1024 drives with 16 drives per erasure set and a default parity of '4'; the second pool is expected to have a minimum of 8 drives per erasure set to match the original cluster SLA (parity count) of '4'. A stripe of '12' drives per erasure set in the second pool satisfies the original pool's parity count.

Refer to the sizing guide [here](https://github.com/minio/minio/blob/master/docs/distributed/SIZING.md) for details on the default parity count chosen for different erasure stripe sizes.

MinIO places new objects in server pools based on proportionate free space, per pool. The following pseudo code demonstrates this behavior.

```go
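// Illustrative sketch only (not the actual MinIO implementation): pick a
// pool with probability proportional to its free space, so emptier pools
// receive proportionally more new objects. Assumes math/rand for rand.Uint64.
func choosePoolIndex(freeSpace []uint64) int {
	var total uint64
	for _, free := range freeSpace {
		total += free
	}
	if total == 0 {
		return 0
	}
	// Each pool owns a slice of [0, total) sized by its free space.
	choose := rand.Uint64() % total
	var accumulated uint64
	for idx, free := range freeSpace {
		accumulated += free
		if choose < accumulated {
			return idx
		}
	}
	return len(freeSpace) - 1
}
```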
@@ -14,9 +14,9 @@ Distributed MinIO provides protection against multiple node/drive failures and [
A stand-alone MinIO server would go down if the server hosting the disks goes offline. In contrast, a distributed MinIO setup with _m_ servers and _n_ disks will keep your data safe as long as _m/2_ or more servers, or _m*n_/2 or more disks, are online.

For example, a 16-server distributed setup with 200 disks per node would continue serving files even if up to 8 servers are offline in the default configuration, i.e. with around 1600 disks down MinIO would continue to serve files. But, you'll need at least 9 servers online to create new objects.
For example, a 16-server distributed setup with 200 disks per node would continue serving files with up to 4 servers offline in the default configuration, i.e. with around 800 disks down MinIO would continue to read and write objects.
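
The arithmetic behind this example, as a small sketch (assuming the default parity of EC:4 and 16-drive erasure sets striped one drive per server):

```go
package main

import "fmt"

func main() {
	servers, disksPerServer := 16, 200
	parity := 4 // default EC:4 for a 16-drive erasure set

	// With one drive per server in each 16-drive set, losing a server
	// removes exactly one drive from every set, so up to `parity` servers
	// may be offline before object access is affected.
	tolerableServers := parity
	tolerableDisks := tolerableServers * disksPerServer

	fmt.Println(tolerableServers, tolerableDisks, servers*disksPerServer) // 4 800 3200
}
```
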
You can also use [storage classes](https://github.com/minio/minio/tree/master/docs/erasure/storage-class) to set custom parity distribution per object.
Refer to the sizing guide [here](https://github.com/minio/minio/blob/master/docs/distributed/SIZING.md) for a better understanding of the default values chosen depending on your erasure stripe size. Parity settings can be changed using [storage classes](https://github.com/minio/minio/tree/master/docs/erasure/storage-class).

### Consistency Guarantees
@@ -38,14 +38,14 @@ __NOTE:__
- All the nodes running distributed MinIO need to have the same access key and secret key for the nodes to connect. To achieve this, it is __recommended__ to export the access key and secret key as environment variables, `MINIO_ROOT_USER` and `MINIO_ROOT_PASSWORD`, on all the nodes before executing the MinIO server command.
- __MinIO creates erasure-coding sets of *4* to *16* drives per set. The number of drives you provide in total must be a multiple of one of those numbers.__
- __MinIO chooses the largest EC set size which divides into the total number of drives or total number of nodes given - making sure to keep a uniform distribution, i.e. each node participates with an equal number of drives per set.
- __MinIO chooses the largest EC set size which divides into the total number of drives or total number of nodes given - making sure to keep a uniform distribution, i.e. each node participates with an equal number of drives per set__ (see the sketch after this list).
- __Each object is written to a single EC set, and therefore is spread over no more than 16 drives.__
- __All the nodes running distributed MinIO setup are recommended to be homogeneous, i.e. same operating system, same number of disks and same network interconnects.__
- MinIO distributed mode requires __fresh directories__. If required, the drives can be shared with other applications. You can do this by using a sub-directory exclusive to MinIO. For example, if you have mounted your volume under `/export`, pass `/export/data` as arguments to MinIO server.
- The IP addresses and drive paths below are for demonstration purposes only, you need to replace these with the actual IP addresses and drive paths/folders.
- Servers running distributed MinIO instances should be less than 15 minutes apart. You can enable [NTP](http://www.ntp.org/) service as a best practice to ensure same times across servers.
- `MINIO_DOMAIN` environment variable should be defined and exported for bucket DNS style support.
- Running Distributed MinIO on __Windows__ operating system is experimental. Please proceed with caution.
- Running Distributed MinIO on __Windows__ operating system is considered **experimental**. Please proceed with caution.
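
A rough sketch of the erasure set size selection described in the list above (illustrative only, not MinIO's exact selection logic):

```go
package main

import "fmt"

// chooseSetSize is an illustrative sketch: pick the largest erasure set size
// between 4 and 16 that evenly divides the total drive count.
func chooseSetSize(totalDrives int) int {
	for size := 16; size >= 4; size-- {
		if totalDrives%size == 0 {
			return size
		}
	}
	return 0 // no valid erasure set size for this drive count
}

func main() {
	fmt.Println(chooseSetSize(1024)) // 16
	fmt.Println(chooseSetSize(30))   // 15
}
```
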
Example 1: Start a distributed MinIO instance on n nodes with m drives each mounted at `/export1` to `/exportm` (pictured below), by running this command on all the n nodes:

@@ -79,8 +79,7 @@ minio server http://host{1...4}/export{1...16} http://host{5...12}/export{1...16
Now the server has expanded total storage by _(newly_added_servers\*m)_ more disks, taking the total count to _(existing_servers\*m)+(newly_added_servers\*m)_ disks. New object upload requests automatically start using the least used cluster. This expansion strategy works endlessly, so you can perpetually expand your clusters as needed. Restarting to apply the new pools is immediate and non-disruptive to the applications. Each group of servers in the command-line is called a pool. There are 2 server pools in this example. New objects are placed in server pools in proportion to the amount of free space in each pool. Within each pool, the location of the erasure-set of drives is determined based on a deterministic hashing algorithm.
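
A sketch of the idea behind that deterministic placement (the exact hash MinIO uses may differ): hashing the object name picks the same erasure set for the same object every time.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// erasureSetIndex deterministically maps an object name to one of the
// pool's erasure sets, so lookups and writes always land on the same set.
func erasureSetIndex(objectName string, setCount int) int {
	return int(crc32.ChecksumIEEE([]byte(objectName)) % uint32(setCount))
}

func main() {
	fmt.Println(erasureSetIndex("photos/2021/beach.jpg", 64))
	fmt.Println(erasureSetIndex("photos/2021/beach.jpg", 64)) // same index again
}
```
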
> __NOTE:__ __Each pool you add must have the same erasure coding set size as the original pool, so the same data redundancy SLA is maintained.__
> For example, if your first pool was 8 drives, you could add further server pools of 16, 32 or 1024 drives each. All you have to make sure is that the deployment SLA is a multiple of the original data redundancy SLA, i.e. 8.
> __NOTE:__ __Each pool you add must have the same erasure coding parity configuration as the original pool, so the same data redundancy SLA is maintained.__
## 3. Test your setup
To test this setup, access the MinIO server via browser or [`mc`](https://docs.min.io/docs/minio-client-quickstart-guide).

docs/distributed/SIZING.md (new file, 34 lines)
@@ -0,0 +1,34 @@
## Erasure code sizing guide
### Toy Setups
In capacity-constrained environments MinIO will work, but it is not recommended for production.

| servers | drives (per node) | stripe_size | parity chosen (default) | tolerance for reads (servers) | tolerance for writes (servers) |
|--------:|------------------:|------------:|------------------------:|------------------------------:|-------------------------------:|
| 1 | 1 | 1 | 0 | 0 | 0 |
| 1 | 4 | 4 | 2 | 0 | 0 |
| 4 | 1 | 4 | 2 | 2 | 1 |
| 5 | 1 | 5 | 2 | 2 | 2 |
| 6 | 1 | 6 | 3 | 3 | 2 |
| 7 | 1 | 7 | 3 | 3 | 3 |
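
The two tolerance columns in these tables can be derived from the stripe size and parity; a sketch, assuming MinIO's documented quorum rules (reads need the data drives, and the write quorum gains one extra drive when data equals parity):

```go
package main

import "fmt"

// tolerance derives the last two table columns. drivesPerServerInSet is
// how many drives of a single erasure set live on one server.
func tolerance(stripe, parity, drivesPerServerInSet int) (readServers, writeServers int) {
	readDrives := parity  // reads survive losing up to `parity` drives
	writeDrives := parity // writes usually survive the same
	if parity > 0 && stripe == 2*parity {
		// With data == parity the write quorum is stripe/2 + 1 (to avoid
		// split-brain writes), so one fewer drive may be lost.
		writeDrives = parity - 1
	}
	return readDrives / drivesPerServerInSet, writeDrives / drivesPerServerInSet
}

func main() {
	fmt.Println(tolerance(4, 2, 1))  // 2 1  (4 servers, 1 drive each)
	fmt.Println(tolerance(16, 4, 2)) // 2 2  (8 servers, 2 drives each)
	fmt.Println(tolerance(8, 4, 2))  // 2 1  (4 servers, 2 drives each)
}
```
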
### Minimum System Configuration for Production
| servers | drives (per node) | stripe_size | parity chosen (default) | tolerance for reads (servers) | tolerance for writes (servers) |
|--------:|------------------:|------------:|------------------------:|------------------------------:|-------------------------------:|
| 4 | 2 | 8 | 4 | 2 | 1 |
| 5 | 2 | 10 | 4 | 2 | 2 |
| 6 | 2 | 12 | 4 | 2 | 2 |
| 7 | 2 | 14 | 4 | 2 | 2 |
| 8 | 1 | 8 | 4 | 4 | 3 |
| 8 | 2 | 16 | 4 | 2 | 2 |
| 9 | 2 | 9 | 4 | 4 | 4 |
| 10 | 2 | 10 | 4 | 4 | 4 |
| 11 | 2 | 11 | 4 | 4 | 4 |
| 12 | 2 | 12 | 4 | 4 | 4 |
| 13 | 2 | 13 | 4 | 4 | 4 |
| 14 | 2 | 14 | 4 | 4 | 4 |
| 15 | 2 | 15 | 4 | 4 | 4 |
| 16 | 2 | 16 | 4 | 4 | 4 |