Commit Graph

82 Commits

Author SHA1 Message Date
Anis Elleuch
000a60f238 xl: Heal empty parts (#7860)
posix.VerifyFile() doesn't know how to check if a file
is corrupted if that file is empty. We do have the part
size in xl.json so we pass it to VerifyFile to return
an error so healing empty parts can work properly.
2019-07-13 00:29:44 +01:00
Krishna Srinivas
58d90ed73c Avoid network transfer for bitrot verification during healing (#7375) 2019-07-08 13:51:18 -07:00
Krishna Srinivas
338e9a9be9 Put object client disconnect (#7824)
Fail putObject  and postpolicy in case client prematurely disconnects
Use request's context to cancel lock requests on client disconnects
2019-06-28 22:09:17 -07:00
Anis Elleuch
27ef1262bf xl: Use random UUID during complete multipart upload (#7527)
One user has seen this following error log:

API: CompleteMultipartUpload(bucket=vertica, object=perf-dss-v03/cc2/02596813aecd4e476d810148586c2a3300d00000013557ef_0.gt)
Time: 15:44:07 UTC 04/11/2019
RequestID: 159475EFF4DEDFFB
RemoteHost: 172.26.87.184
UserAgent: vertica-v9.1.1-5
Error: open /data/.minio.sys/tmp/100bb3ec-6c0d-4a37-8b36-65241050eb02/xl.json: file exists
       1: cmd/xl-v1-metadata.go:448:cmd.writeXLMetadata()
       2: cmd/xl-v1-metadata.go:501:cmd.writeUniqueXLMetadata.func1()

This can happen when CompleteMultipartUpload fails with write quorum,
the S3 client will retry (since write quorum is 500 http response),
however the second call of CompleteMultipartUpload will fail because
this latter doesn't truly use a random uuid under .minio.sys/tmp/
directory but pick the upload id.

This commit fixes the behavior to choose a random uuid for generating
xl.json
2019-04-25 07:33:26 -07:00
Harshavardhana
f767a2538a
Optimize listing with leaf check offloaded to posix (#7541)
Other listing optimizations include

- remove double sorting while filtering object entries
- improve error message when upload-id is not in quorum
- use jsoniter for full unmarshal json, instead of gjson
- remove unused code
2019-04-23 14:54:28 -07:00
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Harshavardhana
c90999df98 Valid if bucket names are internal (#7476)
This commit fixes a privilege escalation issue against
the S3 and web handlers. An authenticated IAM user
can:

- Read from or write to the internal '.minio.sys'
bucket by simply sending a properly signed
S3 GET or PUT request. Further, the user can
- Read from or write to the internal '.minio.sys'
bucket using the 'Upload'/'Download'/'DownloadZIP'
API by sending a "browser" request authenticated
with its JWT token.
2019-04-03 23:10:37 -07:00
Harshavardhana
4a698c731b HealObjects should remove objects without quorum (#7407)
This PR adds a way to list objects without quorum
such that they can purged by `mc admin heal --remove`
2019-03-26 14:57:44 -07:00
Anis Elleuch
facbd653ba Add normal/deep type of heal scanning (#7251)
Healing scan used to read all objects parts to check for bitrot
checksum. This commit will add a quicker way of healing scan
by only checking if parts are actually present in disks or not.
2019-03-14 13:08:51 -07:00
Harshavardhana
082f777281 Revamp bucket metadata healing (#7208)
Bucket metadata healing in the current code was executed multiple
times each time for a given set. Bucket metadata just like
objects are hashed in accordance with its name on any given set,
to allow hashing to play a role we should let the top level
code decide where to navigate.

Current code also had 3 bucket metadata files hardcoded, whereas
we should make it generic by listing and navigating the .minio.sys
to heal such objects.

We also had another bug where due to isObjectDangling changes
without pre-existing bucket metadata files, we were erroneously
reporting it as grey/corrupted objects.

This PR fixes all of the above items.
2019-02-11 09:23:13 +05:30
Harshavardhana
30135eed86 Redo how to handle stale dangling files (#7171)
foo.CORRUPTED should never be created because when
multiple sets are involved we would hash the file
to wrong a location, this PR removes the code.

But allows DeleteBucket() to work properly to delete
dangling buckets/objects. Also adds another option
to Healing where a user needs to specify `--remove`
such that all dangling objects will be deleted with
user confirmation.
2019-02-05 17:58:48 -08:00
Krishna Srinivas
b18c0478e7 Only heal on disks where we are sure that healing is needed (#7148) 2019-01-30 10:53:57 -08:00
Anis Elleuch
2d9860e875 heal: Fix healing empty directories (#7154)
This commit fixes the computation of Before/After healing state
for empty directories.

Issues before the commit:
- Before state doesn't reflect the real status (no StatVol() called)
- For any MakeVol() error, healObjectDir is exited directly, which is
  wrong.
2019-01-30 10:51:56 -08:00
kannappanr
d3553f8dfc
Bucket Heal: Do not add empty endpoint entry (#7172)
Currently during a heal of a bucket, if one disk is offline an empty endpoint entry is added.
Then another entry with the missing endpoint is also added.

This results in more entries than disks being added.

Code that adds empty endpoint has been removed.
2019-01-30 10:40:43 -08:00
Krishna Srinivas
51ec61ee94 Fix healing whole file bitrot (#7123)
* Use 0-byte file for bitrot verification of whole-file-bitrot files

Also pass the right checksum information for bitrot verification

* Copy xlMeta info from latest meta except []checksums and []Parts while healing
2019-01-20 07:58:40 +05:30
Krishna Srinivas
98c950aacd Streaming bitrot verification support (#7004) 2019-01-17 18:28:18 +05:30
Harshavardhana
bfb505aa8e Refactor logging in more Go idiomatic style (#6816)
This refactor brings a change which allows
targets to be added in a cleaner way and also
audit is now moved out.

This PR also simplifies logger dependency for auditing
2018-11-19 14:47:03 -08:00
Harshavardhana
8491a29ec3 Fix healing bucket properly (#6716)
Bucket should be healed properly if it partially exists
on only one set, since bucket is common for all sets.

Fixes #6710
2018-10-28 14:13:17 -07:00
Harshavardhana
223967fd32 Return always a default heal item upon unexpected error (#6556)
Never return an empty result item even upon error,
choose all the default values and based on the errors
make sure to send right result reply.
2018-10-02 17:13:51 -07:00
Harshavardhana
b4772849f9 Heal recursively all entries in config/ prefix (#6545)
This to ensure that we heal all entries in config/
prefix, we will have IAM and STS related files which
are being introduced in #6168 PR

This is a change to ensure that we heal all of them
properly, not just `config.json`
2018-10-01 22:24:26 +05:30
Harshavardhana
aebfceeafb Heal backend configuration file (#6532)
Fixes #6461
2018-09-29 13:47:01 +05:30
Krishna Srinivas
52f6d5aafc Rename of structs and methods (#6230)
Rename of ErasureStorage to Erasure (and rename of related variables and methods)
2018-08-23 23:35:37 -07:00
kannappanr
2d84b02bc4 Check for absence of checksum field and attributes. (#6298)
Fixes #6295
2018-08-20 16:58:47 -07:00
Harshavardhana
3de5a3157f Enhance picking valid xlMeta based on quorum (#6297)
This PR borrows the idea from getFormatXLQuorum()
2018-08-17 14:42:04 -07:00
kannappanr
0286e61aee Log disk not found error just once (#6059)
Modified the LogIf function to log only if the error passed
is not on the ignored errors list.

Currently, only disk not found error is added to the list.
Added a new function in logger package called LogAlwaysIf, 
which will print on any error.

Fixes #5997
2018-08-14 13:58:48 -07:00
Krishna Srinivas
ce02ab613d Simplify erasure code by separating bitrot from erasure code (#5959) 2018-08-06 15:14:08 -07:00
kannappanr
264cc4020f Return 503 instead of 404 if more than half of disks are not found (#6207)
Fixes #6163
2018-07-31 00:23:29 -07:00
Harshavardhana
c872c30ea3 fix: introduce isLeafDir in healing to fix the crash (#5920)
This PR also supports healing directories.

Fixes #5917
2018-05-10 16:53:42 -07:00
Anis Elleuch
6d5f2a4391 Better support of empty directories (#5890)
Better support of HEAD and listing of zero sized objects with trailing
slash (a.k.a empty directory). For that, isLeafDir function is added
to indicate if the specified object is an empty directory or not. Each
backend (xl, fs) has the responsibility to store that information.
Currently, in both of XL & FS, an empty directory is represented by
an empty directory in the backend.

isLeafDir() checks if the given path is an empty directory or not,
since dir listing is costly if the latter contains too many objects,
readDirN() is added in this PR to list only N number of entries.
In isLeadDir(), we will only list one entry to check if a directory
is empty or not.
2018-05-09 01:38:21 -07:00
Krishna Srinivas
9aace6d36d Continue healing other objects even if objects without quorum exist (#5851)
fixes #5815
2018-04-25 11:56:39 -07:00
kannappanr
cef992a395
Remove error package and cause functions (#5784) 2018-04-10 09:36:37 -07:00
Harshavardhana
1d31ad499f Make sure to re-load reference format after HealFormat (#5772)
This PR introduces ReloadFormat API call at objectlayer
to facilitate this. Previously we repurposed HealFormat
but we never ended up updating our reference format on
peers.

Fixes #5700
2018-04-09 22:55:41 +05:30
kannappanr
f8a3fd0c2a
Create logger package and rename errorIf to LogIf (#5678)
Removing message from error logging
Replace errors.Trace with LogIf
2018-04-05 15:04:40 -07:00
Harshavardhana
6e9c853312 After healing re-load disks with the new format (#5718)
This PR also fixes correct calculation of drive states
before and after healing of objects.

Fixes #5700
Fixes #5708
2018-03-28 06:41:39 +05:30
Harshavardhana
de44be86d0 Use readQuorum instead of writeQuorum to check bucket exists (#5715)
Fixes #5708
Fixes #5700
2018-03-26 16:36:57 -07:00
Harshavardhana
f23944aed7 Fix heal bucket deadlock after replacing disks (#5661)
Fixes #5659
2018-03-16 15:09:31 -07:00
Krishna Srinivas
9ede179a21 Use context.Background() instead of nil
Rename Context[Get|Set] -> [Get|Set]Context
2018-03-15 16:28:25 -07:00
Krishna Srinivas
e452377b24 Add context to the object-interface methods.
Make necessary changes to xl fs azure sia
2018-03-15 16:28:25 -07:00
Harshavardhana
fb96779a8a Add large bucket support for erasure coded backend (#5160)
This PR implements an object layer which
combines input erasure sets of XL layers
into a unified namespace.

This object layer extends the existing
erasure coded implementation, it is assumed
in this design that providing > 16 disks is
a static configuration as well i.e if you started
the setup with 32 disks with 4 sets 8 disks per
pack then you would need to provide 4 sets always.

Some design details and restrictions:

- Objects are distributed using consistent ordering
  to a unique erasure coded layer.
- Each pack has its own dsync so locks are synchronized
  properly at pack (erasure layer).
- Each pack still has a maximum of 16 disks
  requirement, you can start with multiple
  such sets statically.
- Static sets set of disks and cannot be
  changed, there is no elastic expansion allowed.
- Static sets set of disks and cannot be
  changed, there is no elastic removal allowed.
- ListObjects() across sets can be noticeably
  slower since List happens on all servers,
  and is merged at this sets layer.

Fixes #5465
Fixes #5464
Fixes #5461
Fixes #5460
Fixes #5459
Fixes #5458
Fixes #5460
Fixes #5488
Fixes #5489
Fixes #5497
Fixes #5496
2018-02-15 17:45:57 -08:00
Harshavardhana
994fe53669 Avoid shadowing ignored errors listAllBuckets() (#5524)
It can happen such that one of the disks that was down would
return 'errDiskNotFound' but the err is preserved due to
loop shadowing which leads to issues when healing the bucket.
2018-02-13 17:03:50 -08:00
Aditya Manthramurthy
a337ea4d11 Move admin APIs to new path and add redesigned heal APIs (#5351)
- Changes related to moving admin APIs
   - admin APIs now have an endpoint under /minio/admin
   - admin APIs are now versioned - a new API to server the version is
     added at "GET /minio/admin/version" and all API operations have the
     path prefix /minio/admin/v1/<operation>
   - new service stop API added
   - credentials change API is moved to /minio/admin/v1/config/credential
   - credentials change API and configuration get/set API now require TLS
     so that credentials are protected
   - all API requests now receive JSON
   - heal APIs are disabled as they will be changed substantially

- Heal API changes
   Heal API is now provided at a single endpoint with the ability for a
   client to start a heal sequence on all the data in the server, a
   single bucket, or under a prefix within a bucket.

   When a heal sequence is started, the server returns a unique token
   that needs to be used for subsequent 'status' requests to fetch heal
   results.

   On each status request from the client, the server returns heal result
   records that it has accumulated since the previous status request. The
   server accumulates upto 1000 records and pauses healing further
   objects until the client requests for status. If the client does not
   request any further records for a long time, the server aborts the
   heal sequence automatically.

   A heal result record is returned for each entity healed on the server,
   such as system metadata, object metadata, buckets and objects, and has
   information about the before and after states on each disk.

   A client may request to force restart a heal sequence - this causes
   the running heal sequence to be aborted at the next safe spot and
   starts a new heal sequence.
2018-01-22 14:54:55 -08:00
poornas
0bb6247056 Move nslocking from s3 layer to object layer (#5382)
Fixes #5350
2018-01-13 10:04:52 +05:30
Harshavardhana
c0721164be Automatically set goroutines based on shardSize (#5346)
Update reedsolomon library to enable feature to automatically
set number of go-routines based on the input shard size,
since shard size is sort of a constant in Minio for
objects > 10MiB (default blocksize)

klauspost reported around 15-20% improvement in performance
numbers on older systems such as AVX and SSE3

```
name                  old speed      new speed      delta
Encode10x2x10000-8    5.45GB/s ± 1%  6.22GB/s ± 1%  +14.20%    (p=0.000 n=9+9)
Encode100x20x10000-8  1.44GB/s ± 1%  1.64GB/s ± 1%  +13.77%  (p=0.000 n=10+10)
Encode17x3x1M-8       10.0GB/s ± 5%  12.0GB/s ± 1%  +19.88%  (p=0.000 n=10+10)
Encode10x4x16M-8      7.81GB/s ± 5%  8.56GB/s ± 5%   +9.58%   (p=0.000 n=10+9)
Encode5x2x1M-8        15.3GB/s ± 2%  19.6GB/s ± 2%  +28.57%   (p=0.000 n=9+10)
Encode10x2x1M-8       12.2GB/s ± 5%  15.0GB/s ± 5%  +22.45%  (p=0.000 n=10+10)
Encode10x4x1M-8       7.84GB/s ± 1%  9.03GB/s ± 1%  +15.19%    (p=0.000 n=9+9)
Encode50x20x1M-8      1.73GB/s ± 4%  2.09GB/s ± 4%  +20.59%   (p=0.000 n=10+9)
Encode17x3x16M-8      10.6GB/s ± 1%  11.7GB/s ± 4%  +10.12%   (p=0.000 n=8+10)
```
2018-01-03 13:47:22 -08:00
Nitish Tiwari
1a3dbbc9dd
Add x-amz-storage-class support (#5295)
This adds configurable data and parity options on a per object
basis. To use variable parity

- Users can set environment variables to cofigure variable
parity

- Then add header x-amz-storage-class to putobject requests
with relevant storage class values

Fixes #4997
2017-12-22 16:58:13 +05:30
Harshavardhana
8efa82126b
Convert errors tracer into a separate package (#5221) 2017-11-25 11:58:29 -08:00
Aditya Manthramurthy
4c9fae90ff Optimize healObject by eliminating extra data passes (#4949) 2017-09-28 15:57:19 -07:00
Frank Wessels
61e0b1454a Add support for timeouts for locks (#4377) 2017-08-31 14:43:59 -07:00
Andreas Auernhammer
85fcee1919 erasure: simplify XL backend operations (#4649) (#4758)
This change provides new implementations of the XL backend operations:
 - create file
 - read   file
 - heal   file
Further this change adds table based tests for all three operations.

This affects also the bitrot algorithm integration. Algorithms are now
integrated in an idiomatic way (like crypto.Hash).
Fixes #4696
Fixes #4649
Fixes #4359
2017-08-14 18:08:42 -07:00
Aditya Manthramurthy
32da1aa9d6 XL: Simplify heal-format operations
This is in preparation for updated admin heal API.

* Improve case analysis of healFormatXL() - fixes a case where disks
  could have unhandled errors.

* Simplify healFormatXLFreshDisks() and healFormatXLCorruptedDisks()
  to share more code and handle fewer cases for improved simplicity
  and reduced code repetition.

* Fix test cases.
2017-08-08 17:14:24 -07:00
Anis Elleuch
af8071c86a xl: Fix rare freeze after many disk/network errors (#4438)
xl.storageDisks is sometimes passed to some low-level XL functions. Some disks in
xl.storageDisks are set to nil when they encounter some errors. This means all
elements in xl.storageDisks will be nil after some time which lead to an unusable XL.
2017-06-14 17:14:27 -07:00