Too many deployments come up with an odd number
of hosts or drives, to facilitate even distribution
among those setups allow for odd and prime numbers
based packs.
This is a precursor change before versioning,
removes/deprecates the requirement of remembering
partName and partETag which are not useful after
a multipart transaction has finished.
This PR reduces the overall size of the backend
JSON for large file uploads.
This PR adds jsoniter package to replace encoding/json
in places where faster json unmarshal is necessary
whenever input JSON is large enough.
Some benchmarking comparison between jsoniter and enconding/json
benchmark old MB/s new MB/s speedup
BenchmarkParseUnmarshal/N10-4 110.02 331.17 3.01x
BenchmarkParseUnmarshal/N100-4 125.74 524.09 4.17x
BenchmarkParseUnmarshal/N500-4 131.68 542.60 4.12x
BenchmarkParseUnmarshal/N1000-4 133.93 514.88 3.84x
BenchmarkParseUnmarshal/N5000-4 122.10 415.36 3.40x
BenchmarkParseUnmarshal/N10000-4 132.13 403.90 3.06x
It looks like from implementation point of view fastjson
parser pool doesn't behave the same way as expected
when dealing many `xl.json` from multiple disks.
The fastjson parser pool usage ends up returning incorrect
xl.json entries for checksums, with references pointing
to older entries. This led to the subtle bug where checksum
info is duplicated from a previous xl.json read of a different
file from different disk.
This is to avoid using unsafe.Pointer type
code dependency for MinIO, this causes
crashes on ARM64 platforms
Refer #8005 collection of runtime crashes due
to unsafe.Pointer usage incorrectly. We have
seen issues like this before when using
jsoniter library in the past.
This PR hopes to fix this using fastjson
With these changes we are now able to peak performances
for all Write() operations across disks HDD and NVMe.
Also adds readahead for disk reads, which also increases
performance for reads by 3x.
One user has seen this following error log:
API: CompleteMultipartUpload(bucket=vertica, object=perf-dss-v03/cc2/02596813aecd4e476d810148586c2a3300d00000013557ef_0.gt)
Time: 15:44:07 UTC 04/11/2019
RequestID: 159475EFF4DEDFFB
RemoteHost: 172.26.87.184
UserAgent: vertica-v9.1.1-5
Error: open /data/.minio.sys/tmp/100bb3ec-6c0d-4a37-8b36-65241050eb02/xl.json: file exists
1: cmd/xl-v1-metadata.go:448:cmd.writeXLMetadata()
2: cmd/xl-v1-metadata.go:501:cmd.writeUniqueXLMetadata.func1()
This can happen when CompleteMultipartUpload fails with write quorum,
the S3 client will retry (since write quorum is 500 http response),
however the second call of CompleteMultipartUpload will fail because
this latter doesn't truly use a random uuid under .minio.sys/tmp/
directory but pick the upload id.
This commit fixes the behavior to choose a random uuid for generating
xl.json
Other listing optimizations include
- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
* Use 0-byte file for bitrot verification of whole-file-bitrot files
Also pass the right checksum information for bitrot verification
* Copy xlMeta info from latest meta except []checksums and []Parts while healing
This PR adds pass-through, single encryption at gateway and double
encryption support (gateway encryption with pass through of SSE
headers to backend).
If KMS is set up (either with Vault as KMS or using
MINIO_SSE_MASTER_KEY),gateway will automatically perform
single encryption. If MINIO_GATEWAY_SSE is set up in addition to
Vault KMS, double encryption is performed.When neither KMS nor
MINIO_GATEWAY_SSE is set, do a pass through to backend.
When double encryption is specified, MINIO_GATEWAY_SSE can be set to
"C" for SSE-C encryption at gateway and backend, "S3" for SSE-S3
encryption at gateway/backend or both to support more than one option.
Fixes#6323, #6696
xl.json is the source of truth for all erasure
coded objects, without which we won't be able to
read the objects properly. This PR enables sync
mode for writing `xl.json` such all writes go hit
the disk and are persistent under situations such
as abrupt power failures on servers running Minio.
CopyObject handler forgot to remove multipart encryption flag in metadata
when source is an encrypted multipart object and the target is also encrypted
but single part object.
This PR also simplifies the code to facilitate review.
Modified the LogIf function to log only if the error passed
is not on the ignored errors list.
Currently, only disk not found error is added to the list.
Added a new function in logger package called LogAlwaysIf,
which will print on any error.
Fixes#5997
This is an effort to remove panic from the source.
Add a new call called CriticialIf, that calls LogIf and exits.
Replace panics with one of CriticalIf, FatalIf and a return of error.
Also make sure to not modify the underlying errors from
layers, we should return the error as is and one object
layer should translate the errors.
Fixes#5797
*) Add Put/Get support of multipart in encryption
*) Add GET Range support for encryption
*) Add CopyPart encrypted support
*) Support decrypting of large single PUT object
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.
This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.
Some design details and restrictions:
- Objects are distributed using consistent ordering
to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
requirement, you can start with multiple
such sets statically.
- Static sets set of disks and cannot be
changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
slower since List happens on all servers,
and is merged at this sets layer.
Fixes#5465Fixes#5464Fixes#5461Fixes#5460Fixes#5459Fixes#5458Fixes#5460Fixes#5488Fixes#5489Fixes#5497Fixes#5496
This change adds the HighwayHash256 PRF as bitrot protection / detection
algorithm. Since HighwayHash256 requires a 256 bit we generate a random
key from the first 100 decimals of π - See nothing-up-my-sleeve-numbers.
This key is fixed forever and tied to the HighwayHash256 bitrot algorithm.
Fixes#5358
This change replaces all imports of "crypto/sha256" with
"github.com/minio/sha256-simd". The sha256-simd package
is faster on ARM64 (NEON instructions) and can take advantage
of AVX-512 in certain scenarios.
Fixes#5374
- Update startup banner to print storage class in capitals. This
makes it easier to identify different storage classes available.
- Update response metadata to not send STANDARD storage class.
This is in accordance with AWS S3 behaviour.
- Update minio-go library to bring in storage class related
changes. This is needed to make transparent translation of
storage class headers for Minio S3 Gateway.
This adds configurable data and parity options on a per object
basis. To use variable parity
- Users can set environment variables to cofigure variable
parity
- Then add header x-amz-storage-class to putobject requests
with relevant storage class values
Fixes#4997
This change provides new implementations of the XL backend operations:
- create file
- read file
- heal file
Further this change adds table based tests for all three operations.
This affects also the bitrot algorithm integration. Algorithms are now
integrated in an idiomatic way (like crypto.Hash).
Fixes#4696Fixes#4649Fixes#4359
xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in
xl.storageDisks are set to nil when they encounter some errors. This means all
elements in xl.storageDisks will be nil after some time which lead to an unusable XL.
This is an enhancement to the XL/distributed-XL mode. FS mode is
unaffected.
The ReadFileWithVerify storage-layer call is similar to ReadFile with
the additional functionality of performing bit-rot checking. It
accepts additional parameters for a hashing algorithm to use and the
expected hex-encoded hash string.
This patch provides significant performance improvement because:
1. combines the step of reading the file (during
erasure-decoding/reconstruction) with bit-rot verification;
2. limits the number of file-reads; and
3. avoids transferring the file over the network for bit-rot
verification.
ReadFile API is implemented as ReadFileWithVerify with empty hashing
arguments.
Credits to AB and Harsha for the algorithmic improvement.
Fixes#4236.
This PR also does backend format change to 1.0.1
from 1.0.0. Backward compatible changes are still
kept to read the 'md5Sum' key. But all new objects
will be stored with the same details under 'etag'.
Fixes#4312
Such that in a situation where all errors were
ignored we need to reduce the errors using
readQuorum to get a consistent error value.
Without this change errors generated will
never be consistent with for an expected scenario.
For example in a 6 disk setup 1 disk is missing
and 5 do not have the volume (testbucket)
Without this change Stat() would result in different
errors depending on which disk died. Can cause
confusion to S3 client application.
This change addresses need to track type of
errors we ignored and bring readQuorum to
choose the maximally occuring as the value
of truth.