The reedsolomon library now avoids allocations during reconstruction.
This change exploits that to reduce memory allocs and GC preasure during
healing and reading.
This change refactor the ObjectLayer PutObject and PutObjectPart
functions. Instead of passing an io.Reader and a size to PUT operations
ObejectLayer expects an HashReader.
A HashReader verifies the MD5 sum (and SHA256 sum if required) of the object.
This change updates all all PutObject(Part) calls and removes unnecessary code
in all ObjectLayer implementations.
Fixes#4923
We don't need to typecast identifiers from
their base to type to same type again. This
is not a bug and compiler is fine to skip
it but it is better to avoid if not needed.
This change provides new implementations of the XL backend operations:
- create file
- read file
- heal file
Further this change adds table based tests for all three operations.
This affects also the bitrot algorithm integration. Algorithms are now
integrated in an idiomatic way (like crypto.Hash).
Fixes#4696Fixes#4649Fixes#4359
xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in
xl.storageDisks are set to nil when they encounter some errors. This means all
elements in xl.storageDisks will be nil after some time which lead to an unusable XL.
This is an enhancement to the XL/distributed-XL mode. FS mode is
unaffected.
The ReadFileWithVerify storage-layer call is similar to ReadFile with
the additional functionality of performing bit-rot checking. It
accepts additional parameters for a hashing algorithm to use and the
expected hex-encoded hash string.
This patch provides significant performance improvement because:
1. combines the step of reading the file (during
erasure-decoding/reconstruction) with bit-rot verification;
2. limits the number of file-reads; and
3. avoids transferring the file over the network for bit-rot
verification.
ReadFile API is implemented as ReadFileWithVerify with empty hashing
arguments.
Credits to AB and Harsha for the algorithmic improvement.
Fixes#4236.
This PR also does backend format change to 1.0.1
from 1.0.0. Backward compatible changes are still
kept to read the 'md5Sum' key. But all new objects
will be stored with the same details under 'etag'.
Fixes#4312
HEAD Object for FS and XL was returning invalid object name when
an object name has a trailing slash separator, this PR changes the
behavior and will always return 404 object not found, this guarantees
a better compatibility with S3 spec.
It was possible to upload a big file which overcomes the minimal
disk space limit in XL, PrepareFile was actually checking for disk
space but we weren't checking its returned error. This patch fixes
this behavior.
Ignore a disk which wasn't able to successfully perform an action to
avoid eventual perturbations when the disk comes back in the middle
of write change.
This PR is for readability cleanup
- getOrderedDisks as shuffleDisks
- getOrderedPartsMetadata as shufflePartsMetadata
Distribution is now a second argument instead being the
primary input argument for brevity.
Also change the usage of type casted int64(0), instead
rely on direct type reference as `var variable int64` everywhere.
Existing objects before overwrites are renamed to
temp location in completeMultipart. We make sure
that we delete it even if subsequenty calls fail.
Additionally move verifying of parent dir is a
file earlier to fail the entire operation.
Ref #3784
Current implementation didn't honor quorum properly and didn't
handle the errors generated properly. This patch addresses that
and also moves common code `cleanupMultipartUploads` into xl
specific private function.
Fixes#3665
This is written so that to simplify our handler code
and provide a way to only update metadata instead of
the data when source and destination in CopyObject
request are same.
Fixes#3316
ObjectLayer GetObject() now returns the entire object
if starting offset is 0 and length is negative. This
also allows to simplify handler layer code where
we always had to use GetObjectInfo() before proceeding
to read bucket metadata files examples `policy.json`.
This also reduces one additional call overhead.
This is implemented so that the issues like in the
following flow don't affect the behavior of operation.
```
GetObjectInfo()
.... --> Time window for mutation (no lock held)
.... --> Time window for mutation (no lock held)
GetObject()
```
This happens when two simultaneous uploads are made
to the same object the object has returned wrong
info to the client.
Another classic example is "CopyObject" API itself
which reads from a source object and copies to
destination object.
Fixes#3370Fixes#2912
This change brings in changes at multiple places
- Reuse buffers at almost all locations ranging
from rpc, fs, xl, checksum etc.
- Change caching behavior to disable itself
under low memory conditions i.e < 8GB of RAM.
- Only objects cached are of size 1/10th the size
of the cache for example if 4GB is the cache size
the maximum object size which will be cached
is going to be 400MB. This change is an
optimization to cache more objects rather
than few larger objects.
- If object cache is enabled default GC
percent has been reduced to 20% in lieu
with newly found behavior of GC. If the cache
utilization reaches 75% of the maximum value
GC percent is reduced to 10% to make GC
more aggressive.
- Do not use *bytes.Buffer* due to its growth
requirements. For every allocation *bytes.Buffer*
allocates an additional buffer for its internal
purposes. This is undesirable for us, so
implemented a new cappedWriter which is capped to a
desired size, beyond this all writes rejected.
Possible fix for #3403.
This is needed as explained by @krisis
Lets say we have following errors.
```
[]error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccesDenied}
```
Since the last two errors are filtered, the maximum is nil,
depending on map order.
Let's say we get nil from reduceErr. Clearly at this point
we don't have quorum nodes agreeing about the data and since
GetObject only requires N/2 (Read quorum) and isDiskQuorum
would have returned true. This is problematic and can lead to
undersiable consequences.
Fixes#3298
Disks when are offline for a long period of time, we should
ignore the disk after trying Login upto 5 times.
This is to reduce the network chattiness, this also reduces
the overall time spent on `net.Dial`.
Fixes#3286
- abstract out instrumentation information.
- use separate lockInstance type that encapsulates the nsMutex, volume,
path and opsID as the frontend or top-level lock object.
In a multipart upload scenario disks going down and coming backup
can lead to certain parts missing on the disk/server which was
going down. This is a valid case since these blocks can be
missing and should be healed through heal operation. But we are
not supposed to fail prematurely since we have enough data on
the other disks as well within read-quorum.
This fix relaxes previous assumption, fixes a major corruption
issue reproduced by @vadmeste.
Fixes#2976
These messages based on our prep stage during XL
and prints more informative message regarding
drive information.
This change also does a much needed refactoring.
- Using gjson for constructing xlMetaV1{} in realXLMeta.
- Test for parsing constructing xlMetaV1{} using gjson.
- Changes made since benchmarks showed 30-40% improvement in speed.
- Follow up comments in issue https://github.com/minio/minio/issues/2208
for more details.
- gjson parsing of parts from xl.json for listParts.
- gjson parsing of statInfo from xl.json for getObjectInfo.
- Vendorizing gjson dependency.
From the S3 layer after PutObject we were calling GetObjectInfo for bucket notification. This can
be avoided if PutObjectInfo returns ObjectInfo.
fixes#2567
- Instrumentation for locks.
- Detailed test coverage.
- Adding RPC control handler to fetch lock instrumentation.
- RPC control handlers suite tests with a test RPC server.