19 Commits

Author SHA1 Message Date
Harshavardhana
2d6f8153fa format: Check properly for disks in valid formats. (#3427)
There was an error in how we validated disk formats,
if one of the disk was formatted and was formatted with
FS would cause confusion and object layer would never
initialize essentially go into an infinite loop.

Validate pre-emptively and also check for FS format
properly.
2016-12-11 15:18:55 -08:00
Harshavardhana
4daa0d2cee lock: Moving locking to handler layer. (#3381)
This is implemented so that the issues like in the
following flow don't affect the behavior of operation.

```
GetObjectInfo()
.... --> Time window for mutation (no lock held)
.... --> Time window for mutation (no lock held)
GetObject()
```

This happens when two simultaneous uploads are made
to the same object the object has returned wrong
info to the client.

Another classic example is "CopyObject" API itself
which reads from a source object and copies to
destination object.

Fixes #3370
Fixes #2912
2016-12-10 16:15:12 -08:00
Harshavardhana
ff4ce0ee14 fs/xl: Combine input checks into re-usable functions. (#3383)
Repeated code around both object layers are moved
and combined into simple re-usable functions.
2016-12-01 23:15:17 -08:00
Bala FA
0f2e493c9a Use isErrIgnored() function wherever applicable. (#3343) 2016-11-23 20:05:04 -08:00
Bala FA
1d4ac4b084 Rename getUUID() into mustGetUUID() (#3320)
In case of UUID generation failure mustGetUUID() will panic than
infinitely trying in for loop.
2016-11-22 16:52:37 -08:00
Harshavardhana
5197649081 utils: reduceErrs returns and validates quorum errors. (#3300)
This is needed as explained by @krisis

Lets say we have following errors.

```
[]error{nil, errFileNotFound, errDiskAccessDenied, errDiskAccesDenied}
```

Since the last two errors are filtered, the maximum is nil,
depending on map order.

Let's say we get nil from reduceErr. Clearly at this point
we don't have quorum nodes agreeing about the data and since
GetObject only requires N/2 (Read quorum) and isDiskQuorum
would have returned true. This is problematic and can lead to
undersiable consequences.

Fixes #3298
2016-11-21 01:47:26 -08:00
Krishnan Parthasarathi
eed9ab0464 XL: pickValidXLMeta should return error instead of panic'ing (#3277) 2016-11-20 20:56:44 -08:00
Harshavardhana
0b9f0d14a1 auth/rpc: Take remote disk offline after maximum allowed attempts. (#3288)
Disks when are offline for a long period of time, we should
ignore the disk after trying Login upto 5 times.

This is to reduce the network chattiness, this also reduces
the overall time spent on `net.Dial`.

Fixes #3286
2016-11-20 16:57:12 -08:00
Anis Elleuch
ffbee70e04 Avoid removing 'tmp' directory inside '.minio.sys' (#3294) 2016-11-20 14:25:43 -08:00
Harshavardhana
1c47365445 xl/bootup: Upon bootup handle errors loading bucket and event configs. (#3287)
In a situation when we have lots of buckets the bootup time
might have slowed down a bit but during this situation the
servers quickly going up and down would be an in-transit state.

Certain calls which do not use quorum like `readXLMetaStat`
might return an error saying `errDiskNotFound` this is returned
in place of expected `errFileNotFound` which leads to an issue
where server doesn't start.

To avoid this situation we need to ignore them as safe values
to be ignored, for the most part these are network related errors.

Fixes #3275
2016-11-19 17:37:57 -08:00
Harshavardhana
c91d3791f9 heal: Add healing support for bucket, bucket metadata files. (#3252)
This patch implements healing in general but it is only used
as part of quickHeal().

Fixes #3237
2016-11-16 16:42:23 -08:00
Aditya Manthramurthy
dd0698d14c Improve namespace lock API: (#3203)
- abstract out instrumentation information.
- use separate lockInstance type that encapsulates the nsMutex, volume,
  path and opsID as the frontend or top-level lock object.
2016-11-09 10:58:41 -08:00
Harshavardhana
39331b6b4e xl: GetCheckSumInfo() shouldn't fail if hash not available. (#2984)
In a multipart upload scenario disks going down and coming backup
can lead to certain parts missing on the disk/server which was
going down. This is a valid case since these blocks can be
missing and should be healed through heal operation. But we are
not supposed to fail prematurely since we have enough data on
the other disks as well within read-quorum.

This fix relaxes previous assumption, fixes a major corruption
issue reproduced by @vadmeste.

Fixes #2976
2016-10-18 11:13:25 -07:00
Harshavardhana
fee3f99a6e xl: heal bucket should validate if bucket exists first. (#2953)
Fixes #2944
2016-10-17 02:10:23 -07:00
Krishna Srinivas
f5f007e183 Test: Add test case for xl.HealObject() (#2884)
fixes #2842
2016-10-08 17:08:17 -07:00
Harshavardhana
1e6d67b16d server: Remove deadcode. (#2699) 2016-09-14 13:43:08 -07:00
Krishna Srinivas
7cc77eba45 XL/Healing: errDiskNotFound is the only pardonable error in xlShouldHeal. (#2586)
This is so that we try to heal a file for all the "bad" cases except when the disk is down.
2016-09-13 21:18:30 -07:00
Anis Elleuch
200d327737 List only objects that need healing (#2546) 2016-09-13 21:18:30 -07:00
Harshavardhana
bccf549463 server: Move all the top level files into cmd folder. (#2490)
This change brings a change which was done for the 'mc'
package to allow for clean repo and have a cleaner
github drop in experience.
2016-08-18 16:23:42 -07:00