Add large bucket support for erasure coded backend (#5160)

This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
This commit is contained in:
Harshavardhana
2018-02-15 17:45:57 -08:00
committed by kannappanr
parent dd80256151
commit fb96779a8a
82 changed files with 5046 additions and 4771 deletions

184
docs/large-bucket/DESIGN.md Normal file
View File

@@ -0,0 +1,184 @@
## Command-line
```
NAME:
minio server - Start object storage server.
USAGE:
minio server [FLAGS] DIR1 [DIR2..]
minio server [FLAGS] DIR{1...64}
DIR:
DIR points to a directory on a filesystem. When you want to combine multiple drives
into a single large system, pass one directory per filesystem separated by space.
You may also use a `...` convention to abbreviate the directory arguments. Remote
directories in a distributed setup are encoded as HTTP(s) URIs.
```
## Limitations
- Minimum of 4 disks are needed for distributed erasure coded configuration.
- Maximum of 32 distinct nodes are supported in distributed configuration.
## Common usage
Single disk filesystem export
```
minio server dir1
```
Standalone erasure coded configuration with 4 disks.
```
minio server dir1 dir2 dir3 dir4
```
Standalone erasure coded configuration with 4 sets with 16 disks each.
```
minio server dir{1...64}
```
Distributed erasure coded configuration with 64 sets with 16 disks each.
```
minio server http://host{1...16}/export{1...64} - good
```
## Other usages
### Advanced use cases with multiple ellipses
Standalone erasure coded configuration with 4 sets with 16 disks each, which spawns disks across controllers.
```
minio server /mnt/controller{1...4}/data{1...16}
```
Standalone erasure coded configuration with 16 sets 16 disks per set, across mnts, across controllers.
```
minio server /mnt{1..4}/controller{1...4}/data{1...16}
```
Distributed erasure coded configuration with 2 sets 16 disks per set across hosts.
```
minio server http://host{1...32}/disk1
```
Distributed erasure coded configuration with rack level redundancy 32 sets in total, 16 disks per set.
```
minio server http://rack{1...4}-host{1...8}.example.net/export{1...16}
```
Distributed erasure coded configuration with no rack level redundancy but redundancy with in the rack we split the arguments, 32 sets in total, 16 disks per set.
```
minio server http://rack1-host{1...8}.example.net/export{1...16} http://rack2-host{1...8}.example.net/export{1...16} http://rack3-host{1...8}.example.net/export{1...16} http://rack4-host{1...8}.example.net/export{1...16}
```
### Expected expansion for double ellipses
```
minio server http://host{1...4}/export{1...8}
```
Expected expansion
```
> http://host1/export1
> http://host2/export1
> http://host3/export1
> http://host4/export1
> http://host1/export2
> http://host2/export2
> http://host3/export2
> http://host4/export2
> http://host1/export3
> http://host2/export3
> http://host3/export3
> http://host4/export3
> http://host1/export4
> http://host2/export4
> http://host3/export4
> http://host4/export4
> http://host1/export5
> http://host2/export5
> http://host3/export5
> http://host4/export5
> http://host1/export6
> http://host2/export6
> http://host3/export6
> http://host4/export6
> http://host1/export7
> http://host2/export7
> http://host3/export7
> http://host4/export7
> http://host1/export8
> http://host2/export8
> http://host3/export8
> http://host4/export8
```
## Backend `format.json` changes
New `format.json` has new fields
- `disk` is changed to `this`
- `jbod` is changed to `sets` , along with this change sets is also a two dimensional list representing total sets and disks per set.
A sample `format.json` looks like below
```json
{
"version": "1",
"format": "xl",
"xl": {
"version": "2",
"this": "4ec63786-3dbd-4a9e-96f5-535f6e850fb1",
"sets": [
[
"4ec63786-3dbd-4a9e-96f5-535f6e850fb1",
"1f3cf889-bc90-44ca-be2a-732b53be6c9d",
"4b23eede-1846-482c-b96f-bfb647f058d3",
"e1f17302-a850-419d-8cdb-a9f884a63c92"
], [
"2ca4c5c1-dccb-4198-a840-309fea3b5449",
"6d1e666e-a22c-4db4-a038-2545c2ccb6d5",
"d4fa35ab-710f-4423-a7c2-e1ca33124df0",
"88c65e8b-00cb-4037-a801-2549119c9a33"
]
],
"distributionAlgo": "CRCMOD"
}
}
```
New `format-xl.go` behavior is format structure is used as a opaque type, `Format` field signifies the format of the backend. Once the format has been identified it is now the job of the identified backend to further interpret the next structures and validate them.
```go
type formatType string
const (
formatFS formatType = "fs"
formatXL = "xl"
)
type format struct {
Version string
Format BackendFormat
}
```
### Current format
```go
type formatXLV1 struct{
format
XL struct{
Version string
Disk string
JBOD []string
}
}
```
### New format
```go
type formatXLV2 struct {
Version string `json:"version"`
Format string `json:"format"`
XL struct {
Version string `json:"version"`
This string `json:"this"`
Sets [][]string `json:"sets"`
DistributionAlgo string `json:"distributionAlgo"`
} `json:"xl"`
}
```

View File

@@ -0,0 +1,48 @@
# Large Bucket Support Quickstart Guide [![Slack](https://slack.minio.io/slack?type=svg)](https://slack.minio.io) [![Go Report Card](https://goreportcard.com/badge/minio/minio)](https://goreportcard.com/report/minio/minio) [![Docker Pulls](https://img.shields.io/docker/pulls/minio/minio.svg?maxAge=604800)](https://hub.docker.com/r/minio/minio/) [![codecov](https://codecov.io/gh/minio/minio/branch/master/graph/badge.svg)](https://codecov.io/gh/minio/minio)
Minio large bucket support lets you use more than 16 disks by creating a number of smaller sets of erasure coded units, these units are further combined into a single namespace. Minio large bucket support is developed to solve for several real world use cases, without any special configuration changes. Some of these are
- You already have racks with many disks.
- You are looking for large capacity up-front for your object storage needs.
# Get started
If you're aware of distributed Minio setup, the installation and running remains the same. Newer syntax to use a `...` convention to abbreviate the directory arguments. Remote directories in a distributed setup are encoded as HTTP(s) URIs which can be similarly abbreviated as well.
## 1. Prerequisites
Install Minio - [Minio Quickstart Guide](https://docs.minio.io/docs/minio).
## 2. Run Minio on many disks
To run Minio large bucket instances, you need to start multiple Minio servers pointing to the same disks. We'll see examples on how to do this in the following sections.
*Note*
- All the nodes running distributed Minio need to have same access key and secret key. To achieve this, we export access key and secret key as environment variables on all the nodes before executing Minio server command.
- The drive paths below are for demonstration purposes only, you need to replace these with the actual drive paths/folders.
### Minio large bucket on Ubuntu 16.04 LTS standalone
You'll need the path to the disks e.g. `/export1, /export2 .... /export24`. Then run the following commands on all the nodes you'd like to launch Minio.
```sh
export MINIO_ACCESS_KEY=<ACCESS_KEY>
export MINIO_SECRET_KEY=<SECRET_KEY>
minio server /export{1...24}
```
### Minio large bucket on Ubuntu 16.04 LTS servers
You'll need the path to the disks e.g. `/export1, /export2 .... /export16`. Then run the following commands on all the nodes you'd like to launch Minio.
```sh
export MINIO_ACCESS_KEY=<ACCESS_KEY>
export MINIO_SECRET_KEY=<SECRET_KEY>
minio server http://host{1...4}/export{1...16}
```
## 3. Test your setup
To test this setup, access the Minio server via browser or [`mc`](https://docs.minio.io/docs/minio-client-quickstart-guide). Youll see the uploaded files are accessible from the all the Minio endpoints.
## Explore Further
- [Use `mc` with Minio Server](https://docs.minio.io/docs/minio-client-quickstart-guide)
- [Use `aws-cli` with Minio Server](https://docs.minio.io/docs/aws-cli-with-minio)
- [Use `s3cmd` with Minio Server](https://docs.minio.io/docs/s3cmd-with-minio)
- [Use `minio-go` SDK with Minio Server](https://docs.minio.io/docs/golang-client-quickstart-guide)
- [The Minio documentation website](https://docs.minio.io)