55 Commits

Author SHA1 Message Date
Harshavardhana
53e4887e02 Simplify and cleanup metadata r/w functions (#8146) 2019-09-11 22:52:12 +05:30
Harshavardhana
b52a3e523c Avoid using fastjson parser pool, move back to jsoniter (#8190)
It looks like from implementation point of view fastjson
parser pool doesn't behave the same way as expected
when dealing many `xl.json` from multiple disks.

The fastjson parser pool usage ends up returning incorrect
xl.json entries for checksums, with references pointing
to older entries. This led to the subtle bug where checksum
info is duplicated from a previous xl.json read of a different
file from different disk.
2019-09-06 04:21:27 +05:30
Harshavardhana
9ca7470ccc
Avoid using jsoniter, move to fastjson (#8063)
This is to avoid using unsafe.Pointer type
code dependency for MinIO, this causes
crashes on ARM64 platforms

Refer #8005 collection of runtime crashes due
to unsafe.Pointer usage incorrectly. We have
seen issues like this before when using
jsoniter library in the past.

This PR hopes to fix this using fastjson
2019-08-19 08:35:52 -10:00
Harshavardhana
39b3e4f9b3 Avoid using io.ReadFull() for WriteAll and CreateFile (#7676)
With these changes we are now able to peak performances
for all Write() operations across disks HDD and NVMe.

Also adds readahead for disk reads, which also increases
performance for reads by 3x.
2019-05-22 13:47:15 -07:00
Anis Elleuch
27ef1262bf xl: Use random UUID during complete multipart upload (#7527)
One user has seen this following error log:

API: CompleteMultipartUpload(bucket=vertica, object=perf-dss-v03/cc2/02596813aecd4e476d810148586c2a3300d00000013557ef_0.gt)
Time: 15:44:07 UTC 04/11/2019
RequestID: 159475EFF4DEDFFB
RemoteHost: 172.26.87.184
UserAgent: vertica-v9.1.1-5
Error: open /data/.minio.sys/tmp/100bb3ec-6c0d-4a37-8b36-65241050eb02/xl.json: file exists
       1: cmd/xl-v1-metadata.go:448:cmd.writeXLMetadata()
       2: cmd/xl-v1-metadata.go:501:cmd.writeUniqueXLMetadata.func1()

This can happen when CompleteMultipartUpload fails with write quorum,
the S3 client will retry (since write quorum is 500 http response),
however the second call of CompleteMultipartUpload will fail because
this latter doesn't truly use a random uuid under .minio.sys/tmp/
directory but pick the upload id.

This commit fixes the behavior to choose a random uuid for generating
xl.json
2019-04-25 07:33:26 -07:00
Harshavardhana
f767a2538a
Optimize listing with leaf check offloaded to posix (#7541)
Other listing optimizations include

- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
2019-04-23 14:54:28 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
poornas
2564147ab4 Filter Expires header from user metadata (#7269)
Instead save it as a struct field in ObjectInfo as it is
a standard HTTP header - Fixes minio/mc#2690
2019-02-28 11:01:25 -08:00
Krishna Srinivas
51ec61ee94 Fix healing whole file bitrot (#7123)
* Use 0-byte file for bitrot verification of whole-file-bitrot files

Also pass the right checksum information for bitrot verification

* Copy xlMeta info from latest meta except []checksums and []Parts while healing
2019-01-20 07:58:40 +05:30
Krishna Srinivas
98c950aacd Streaming bitrot verification support (#7004) 2019-01-17 18:28:18 +05:30
poornas
5a80cbec2a Add double encryption at S3 gateway. (#6423)
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).

If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.

When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.

Fixes #6323, #6696
2019-01-05 14:16:42 -08:00
Harshavardhana
f1f23f6f11 Add sync mode for 'xl.json' (#6798)
xl.json is the source of truth for all erasure
coded objects, without which we won't be able to
read the objects properly. This PR enables sync
mode for writing `xl.json` such all writes go hit
the disk and are persistent under situations such
as abrupt power failures on servers running Minio.
2018-11-14 19:48:35 +05:30
Anis Elleuch
5b3090dffc encryption: Fix copy from encrypted multipart to single part (#6604)
CopyObject handler forgot to remove multipart encryption flag in metadata
when source is an encrypted multipart object and the target is also encrypted
but single part object.

This PR also simplifies the code to facilitate review.
2018-10-15 11:07:36 -07:00
Praveen raj Mani
ce9d36d954 Add object compression support (#6292)
Add support for streaming (golang/LZ77/snappy) compression.
2018-09-28 09:06:17 +05:30
Harshavardhana
3de5a3157f Enhance picking valid xlMeta based on quorum (#6297)
This PR borrows the idea from getFormatXLQuorum()
2018-08-17 14:42:04 -07:00
kannappanr
0286e61aee Log disk not found error just once (#6059)
Modified the LogIf function to log only if the error passed
is not on the ignored errors list.

Currently, only disk not found error is added to the list.
Added a new function in logger package called LogAlwaysIf, 
which will print on any error.

Fixes #5997
2018-08-14 13:58:48 -07:00
Krishna Srinivas
ce02ab613d Simplify erasure code by separating bitrot from erasure code (#5959) 2018-08-06 15:14:08 -07:00
Harshavardhana
e5e522fc61
docs: fix all Chinese doc links for the new docs site (#6097)
Additionally fix typos, default to US locale words
2018-06-28 16:02:02 -07:00
Krishna Srinivas
0f746a14a3 Do not use crypto.SHA3_256 as placeholder for HighwayHash256 (#5847) 2018-05-04 10:42:22 -07:00
Krishna Srinivas
9aace6d36d Continue healing other objects even if objects without quorum exist (#5851)
fixes #5815
2018-04-25 11:56:39 -07:00
ebozduman
f16bfda2f2 Remove panic() and handle it appropriately (#5807)
This is an effort to remove panic from the source. 
Add a new call called CriticialIf, that calls LogIf and exits. 
Replace panics with one of CriticalIf, FatalIf and a return of error.
2018-04-19 17:24:43 -07:00
Harshavardhana
4a874dfbc1
Ignore prefix renames when dest directory is not empty (#5798)
Also make sure to not modify the underlying errors from
layers, we should return the error as is and one object
layer should translate the errors.

Fixes #5797
2018-04-11 17:15:42 -07:00
kannappanr
cef992a395
Remove error package and cause functions (#5784) 2018-04-10 09:36:37 -07:00
kannappanr
f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Nitish Tiwari
9eb94fe8c8 Fix StorageClass field in ListObject/ListObjectV2 response (#5766)
Fixes: #5754
2018-04-05 10:56:28 -07:00
Anis Elleuch
120b061966 Add multipart support in SSE-C encryption (#5576)
*) Add Put/Get support of multipart in encryption
*) Add GET Range support for encryption
*) Add CopyPart encrypted support
*) Support decrypting of large single PUT object
2018-03-01 11:37:57 -08:00
Harshavardhana
fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00
Andreas Auernhammer
7f99cc9768 add HighwayHash256 support (#5359)
This change adds the HighwayHash256 PRF as bitrot protection / detection
algorithm. Since HighwayHash256 requires a 256 bit we generate a random
key from the first 100 decimals of π - See nothing-up-my-sleeve-numbers.
This key is fixed forever and tied to the HighwayHash256 bitrot algorithm.

Fixes #5358
2018-01-19 10:18:21 -08:00
Andreas Auernhammer
d0a43af616 replace all "crypto/sha256" with "github.com/minio/sha256-simd" (#5391)
This change replaces all imports of "crypto/sha256" with
"github.com/minio/sha256-simd". The sha256-simd package
is faster on ARM64 (NEON instructions) and can take advantage
of AVX-512 in certain scenarios.

Fixes #5374
2018-01-17 10:54:31 -08:00
Nitish Tiwari
ede504400f
Add validation of xlMeta ErasureInfo field (#5389) 2018-01-12 18:16:30 +05:30
Nitish Tiwari
1e5fb4b79a
Fix storage class related issues (#5338)
- Update startup banner to print storage class in capitals. This
makes it easier to identify different storage classes available.

- Update response metadata to not send STANDARD storage class.
This is in accordance with AWS S3 behaviour.

- Update minio-go library to bring in storage class related
changes. This is needed to make transparent translation of
storage class headers for Minio S3 Gateway.
2018-01-04 11:44:45 +05:30
Nitish Tiwari
1a3dbbc9dd
Add x-amz-storage-class support (#5295)
This adds configurable data and parity options on a per object
basis. To use variable parity

- Users can set environment variables to cofigure variable
parity

- Then add header x-amz-storage-class to putobject requests
with relevant storage class values

Fixes #4997
2017-12-22 16:58:13 +05:30
Harshavardhana
8efa82126b
Convert errors tracer into a separate package (#5221) 2017-11-25 11:58:29 -08:00
Andreas Auernhammer
85fcee1919 erasure: simplify XL backend operations (#4649) (#4758)
This change provides new implementations of the XL backend operations:
 - create file
 - read   file
 - heal   file
Further this change adds table based tests for all three operations.

This affects also the bitrot algorithm integration. Algorithms are now
integrated in an idiomatic way (like crypto.Hash).
Fixes #4696
Fixes #4649
Fixes #4359
2017-08-14 18:08:42 -07:00
Frank Wessels
46897b1100 Name return values to prevent the need (and unnecessary code bloat) (#4576)
This is done to explicitly instantiate objects for every return statement.
2017-06-21 19:53:09 -07:00
Anis Elleuch
af8071c86a xl: Fix rare freeze after many disk/network errors (#4438)
xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in
xl.storageDisks are set to nil when they encounter some errors. This means all
elements in xl.storageDisks will be nil after some time which lead to an unusable XL.
2017-06-14 17:14:27 -07:00
Frank Wessels
9ba57a8df0 Add errCorruptedFormat to list of ignored errors for metadata operations. (#4447)
Fixes listing of objects where xl.json is empty or corrupted to skip to the next disk/server (issue 4354).
2017-05-31 20:03:32 -07:00
Aditya Manthramurthy
8975da4e84 Add new ReadFileWithVerify storage-layer API (#4349)
This is an enhancement to the XL/distributed-XL mode. FS mode is
unaffected.

The ReadFileWithVerify storage-layer call is similar to ReadFile with
the additional functionality of performing bit-rot checking. It
accepts additional parameters for a hashing algorithm to use and the
expected hex-encoded hash string.

This patch provides significant performance improvement because:

1. combines the step of reading the file (during
erasure-decoding/reconstruction) with bit-rot verification;

2. limits the number of file-reads; and

3. avoids transferring the file over the network for bit-rot
verification.

ReadFile API is implemented as ReadFileWithVerify with empty hashing
arguments.

Credits to AB and Harsha for the algorithmic improvement.

Fixes #4236.
2017-05-16 14:21:52 -07:00
Harshavardhana
155a90403a fs/erasure: Rename meta 'md5Sum' as 'etag'. (#4319)
This PR also does backend format change to 1.0.1
from 1.0.0.  Backward compatible changes are still
kept to read the 'md5Sum' key. But all new objects
will be stored with the same details under 'etag'.

Fixes #4312
2017-05-14 12:05:51 -07:00
Harshavardhana
a7afa469e2 xl: Add stat calls to keep track of ignored errors. (#4117)
Such that in a situation where all errors were
ignored we need to reduce the errors using
readQuorum to get a consistent error value.

Without this change errors generated will
never be consistent with for an expected scenario.

For example in a 6 disk setup 1 disk is missing
and 5 do not have the volume (testbucket)

Without this change Stat() would result in different
errors depending on which disk died. Can cause
confusion to S3 client application.

This change addresses need to track type of
errors we ignored and bring readQuorum to
choose the maximally occuring as the value
of truth.
2017-04-14 01:46:16 -07:00
Anis Elleuch
dce0345f8f Set disk to nil after write which needs quorum (#3795)
Ignore a disk which wasn't able to successfully perform an action to
avoid eventual perturbations when the disk comes back in the middle
of write change.
2017-02-26 11:58:32 -08:00
Harshavardhana
6a6c930f5b xl: Abort multipart upload should honor quorum properly. (#3670)
Current implementation didn't honor quorum properly and didn't
handle the errors generated properly. This patch addresses that
and also moves common code `cleanupMultipartUploads` into xl
specific private function.

Fixes #3665
2017-02-01 11:16:17 -08:00
Harshavardhana
62f8343879 Add constants for commonly used values. (#3588)
This is a consolidation effort, avoiding usage
of naked strings in codebase. Whenever possible
use constants which can be repurposed elsewhere.

This also fixes `goconst ./...` reported issues.
2017-01-18 12:24:34 -08:00
Harshavardhana
69559aa101 objAPI: Implement CopyObject API. (#3487)
This is written so that to simplify our handler code
and provide a way to only update metadata instead of
the data when source and destination in CopyObject
request are same.

Fixes #3316
2016-12-26 16:29:26 -08:00
Harshavardhana
5878fcc086 bit-rot: Default to sha256 on ARM64. (#3488)
This is to utilize an optimized version of
sha256 checksum which @fwessels implemented.

blake2b lacks such optimizations on ARM platform,
this can provide us significant boost in performance.

blake2b on ARM64 as expected would be slower.
```
BenchmarkSize1K-4           	   30000	     44015 ns/op	  23.26 MB/s
BenchmarkSize8K-4           	    5000	    335448 ns/op	  24.42 MB/s
BenchmarkSize32K-4          	    1000	   1333960 ns/op	  24.56 MB/s
BenchmarkSize128K-4         	     300	   5328286 ns/op	  24.60 MB/s
```

sha256 on ARM64 is faster by orders of magnitude giving close to
AVX performance of blake2b.
```
BenchmarkHash8Bytes-4	 1000000	      1446 ns/op	   5.53 MB/s
BenchmarkHash1K-4    	  500000	      3229 ns/op	 317.12 MB/s
BenchmarkHash8K-4    	  100000	     14430 ns/op	 567.69 MB/s
BenchmarkHash1M-4    	    1000	   1640126 ns/op	 639.33 MB/s
```
2016-12-22 08:25:03 -08:00
Bala FA
0f2e493c9a Use isErrIgnored() function wherever applicable. (#3343) 2016-11-23 20:05:04 -08:00
Harshavardhana
5197649081 utils: reduceErrs returns and validates quorum errors. (#3300)
This is needed as explained by @krisis

Lets say we have following errors.

```
[]error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccesDenied}
```

Since the last two errors are filtered, the maximum is nil,
depending on map order.

Let's say we get nil from reduceErr. Clearly at this point
we don't have quorum nodes agreeing about the data and since
GetObject only requires N/2 (Read quorum) and isDiskQuorum
would have returned true. This is problematic and can lead to
undersiable consequences.

Fixes #3298
2016-11-21 01:47:26 -08:00
Krishnan Parthasarathi
eed9ab0464 XL: pickValidXLMeta should return error instead of panic'ing (#3277) 2016-11-20 20:56:44 -08:00
Harshavardhana
0b9f0d14a1 auth/rpc: Take remote disk offline after maximum allowed attempts. (#3288)
Disks when are offline for a long period of time, we should
ignore the disk after trying Login upto 5 times.

This is to reduce the network chattiness, this also reduces
the overall time spent on `net.Dial`.

Fixes #3286
2016-11-20 16:57:12 -08:00
Krishnan Parthasarathi
6a57f2c1f0 XL: Add more information to panic msg (#3119) 2016-10-28 08:46:03 -07:00