Commit Graph

3091 Commits

Author SHA1 Message Date
Krishna Srinivas
c49a80db41
fix: use meta.Erasure.Index for GetObject() to reconstruct object (#10764) 2020-10-26 16:19:42 -07:00
Poorna Krishnamoorthy
46275c6547
cache: rename function declarations (#10763) 2020-10-26 15:41:24 -07:00
Poorna Krishnamoorthy
0994ed9783
cache: fix call in GetObjectNInfo (#10762)
Fixes: #10751
2020-10-26 12:30:40 -07:00
Anis Elleuch
eb95353cb1
fix: Get/HeadObject return 404 on non quorum objects (#10753) 2020-10-26 10:30:46 -07:00
Harshavardhana
029758cb20
fix: retain the previous UUID for newly replaced drives (#10759)
only newly replaced drives get the new `format.json`,
this avoids disks reloading their in-memory reference
format, ensures that drives are online without
reloading the in-memory reference format.

keeping reference format in-tact means UUIDs
never change once they are formatted.
2020-10-26 10:29:29 -07:00
Harshavardhana
646d6917ed
turn-off checking for updates completely if MINIO_UPDATE=off (#10752) 2020-10-24 22:39:44 -07:00
Harshavardhana
d9db7f3308
expire lockers if lockers are offline (#10749)
lockers currently might leave stale lockers,
in unknown ways waiting for downed lockers.

locker check interval is high enough to safely
cleanup stale locks.
2020-10-24 13:23:16 -07:00
Harshavardhana
6a8c62f9fd
make sure to preserve UUID from reference format (#10748)
reference format should be source of truth
for inconsistent drives which reconnect,
add them back to their original position

remove automatic fix for existing offline
disk uuids
2020-10-24 13:23:08 -07:00
Anis Elleuch
00124c56d9
erasure: Commit data before xl.meta in RenameData() (#10734)
This will reduce the chance to have updated xl.meta without data.
2020-10-23 21:54:58 -07:00
Anis Elleuch
2c32c2149e
tests: Avoid running TestNSRace in short test mode (#10735) 2020-10-23 21:23:12 -07:00
Harshavardhana
734f258878
fix: slow down auto healing more aggressively (#10730)
Bonus fixes

- logging improvements to ensure that we don't use
  `go logger.LogIf` to avoid runtime.Caller missing
  the function name. log where necessary.
- remove unused code at erasure sets
2020-10-22 13:36:24 -07:00
Anis Elleuch
0e0c53bba4
tests: Lower expectation in addr selection in rand cache dialer (#10739)
Test TestDialContextWithDNSCacheRand was failing sometimes because it depends
on a random selection of addresses when testing random DNS resolution from cache.

Lower addr selection exception to 10%
2020-10-22 09:35:32 -07:00
Poorna Krishnamoorthy
5cc23ae052
validate if iam store is initialized (#10719)
Fixes panic - regression from d6d770c1b1
2020-10-20 21:28:24 -07:00
Harshavardhana
d6d770c1b1 initialize object layer right after config has loaded 2020-10-19 22:04:59 -07:00
Harshavardhana
b07df5cae1
initialize IAM as soon as object layer is initialized (#10700)
Allow requests to come in for users as soon as object
layer and config are initialized, this allows users
to be authenticated sooner and would succeed automatically
on servers which are yet to fully initialize.
2020-10-19 09:54:40 -07:00
Harshavardhana
c107728676
fix: s3 gateway DNS cache initialization (#10706)
fixes #10705
2020-10-19 01:34:23 -07:00
Anis Elleuch
284a2b9021
ilm: Send delete marker creation event when appropriate (#10696)
Before this commit, the crawler ILM will always send object delete event
notification though this is wrong.
2020-10-16 21:22:12 -07:00
Ritesh H Shukla
0b53e30ecb
Clean up monitor on delete bucket (#10698) 2020-10-16 17:59:31 -07:00
Harshavardhana
bd2131ba34
add DNS cache support to avoid DNS flooding (#10693)
Go stdlib resolver doesn't support caching DNS
resolutions, since we compile with CGO disabled
we are more probe to DNS flooding for all network
calls to resolve for DNS from the DNS server.

Under various containerized environments such as
VMWare this becomes a problem because there are
no DNS caches available and we may end up overloading
the kube-dns resolver under concurrent I/O.

To circumvent this issue implement a DNSCache resolver
which resolves DNS and caches them for around 10secs
with every 3sec invalidation attempted.
2020-10-16 14:49:05 -07:00
ebozduman
1aec168c84
fix: azure gateway should reject bucket names with "." (#10635) 2020-10-16 09:30:18 -07:00
Klaus Post
21a549a83b
fix: keep MRF channel open to avoid random CI crash (#10686)
There doesn't seem to be any benefit to closing the channel, so just keep 
it open and let it die with the server.
2020-10-16 09:08:51 -07:00
Ritesh H Shukla
8a16a1a1a9
fix: misc fixes for bandwidth reporting amd monitoring (#10683)
* Set peer for fetch bandwidth
* Fix the limit for bandwidth that is reported.
* Reduce CPU burn from bandwidth management.
2020-10-16 09:07:50 -07:00
Harshavardhana
ad726b49b4
rename zones to serverSets to avoid terminology conflict (#10679)
we are bringing in availability zones, we should avoid
zones as per server expansion concept.
2020-10-15 14:28:50 -07:00
Anis Elleuch
db2241066b
heal: Enable removing dangling delete markers (#10688) 2020-10-15 13:06:40 -07:00
Harshavardhana
f1cc16e788
fix: background heal rely on getOnlineDisks() (#10687) 2020-10-15 13:06:23 -07:00
Klaus Post
3820a905e0
in getOnlineDisks wait for disks to be populated (#10685) 2020-10-15 06:37:10 -07:00
Harshavardhana
2042d4873c
rename crawler config option to heal (#10678) 2020-10-14 13:51:51 -07:00
Harshavardhana
f9be783f3e
fix: allow crawler to crawl on disks without usage constraints (#10677)
additionally also change the resolution usage wise
return of disks, allows to small byte level differences
to be masked.
2020-10-14 12:12:10 -07:00
Harshavardhana
71b97fd3ac
fix: connect disks pre-emptively during startup (#10669)
connect disks pre-emptively upon startup, to ensure we have
enough disks are connected at startup rather than wait
for them.

we need to do this to avoid long wait times for server to
be online when we have servers come up in rolling upgrade
fashion
2020-10-13 18:28:42 -07:00
Klaus Post
03991c5d41
crawler: Remove waitForLowActiveIO (#10667)
Only use dynamic delays for the crawler. Even though the max wait was 1 second the number 
of waits could severely impact crawler speed.

Instead of relying on a global metric, we use the stateless local delays to keep the crawler 
running at a speed more adjusted to current conditions.

The only case we keep it is before bitrot checks when enabled.
2020-10-13 13:45:08 -07:00
飞雪无情
614060764d
fix: use the correct Action type for policy.Args and iampolicy.Args (#10650) 2020-10-12 15:18:22 -07:00
Harshavardhana
a3ba8188d7 fix: allow locker to be niladic 2020-10-12 14:23:44 -07:00
Harshavardhana
2760fc86af
Bump default idleConnsPerHost to control conns in time_wait (#10653)
This PR fixes a hang which occurs quite commonly at higher concurrency
by allowing following changes

- allowing lower connections in time_wait allows faster socket open's
- lower idle connection timeout to ensure that we let kernel
  reclaim the time_wait connections quickly
- increase somaxconn to 4096 instead of 2048 to allow larger tcp
  syn backlogs.

fixes #10413
2020-10-12 14:19:46 -07:00
Ritesh H Shukla
8ceb2a93fd
fix: peer replication bandwidth monitoring in distributed setup (#10652) 2020-10-12 09:04:55 -07:00
Ritesh H Shukla
c2f16ee846
Add basic bandwidth monitoring for replication. (#10501)
This change tracks bandwidth for a bucket and object

- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
2020-10-09 20:36:00 -07:00
Harshavardhana
6484453fc6
optionally allow strict quorum listing (#10649)
```
export MINIO_API_LIST_STRICT_QUORUM=on
```

would enable listing in quorum if necessary
2020-10-09 15:40:46 -07:00
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana
253194e491
do not hold write locks - if objects don't exist (#10644) 2020-10-08 17:47:21 -07:00
Harshavardhana
736e58dd68
fix: handle concurrent lockers with multiple optimizations (#10640)
- select lockers which are non-local and online to have
  affinity towards remote servers for lock contention

- optimize lock retry interval to avoid sending too many
  messages during lock contention, reduces average CPU
  usage as well

- if bucket is not set, when deleteObject fails make sure
  setPutObjHeaders() honors lifecycle only if bucket name
  is set.

- fix top locks to list out always the oldest lockers always,
  avoid getting bogged down into map's unordered nature.
2020-10-08 12:32:32 -07:00
Poorna Krishnamoorthy
907a171edd
Generalize error messages for remote targets (#10638)
This is to allow remote targets to be generalized
for replication/ILM transition

Also adding a field in BucketTarget to identify
a remote target with a label.
2020-10-08 10:54:11 -07:00
Andreas Auernhammer
ed6d2a100f
logger: avoid writing audit log response header twice (#10642)
This commit fixes a misuse of the `http.ResponseWriter.WriteHeader`.
A caller should **either** call `WriteHeader` exactly once **or**
write to the response writer and causing an implicit 200 OK.

Writing the response headers more than once causes a `http: superfluous
response.WriteHeader call` log message. This commit fixes this
by preventing a 2nd `WriteHeader` call being forwarded to the underlying
`ResponseWriter`.

Updates #10587
2020-10-08 09:29:10 -07:00
Harshavardhana
effe131090
fix: allow read unlocks to be defensive about split brains (#10637) 2020-10-07 09:15:01 -07:00
Harshavardhana
18063bf25c
fix: cleanup old directory handling code (#10633)
we don't need them anymore, remove legacy code.
2020-10-06 12:03:57 -07:00
Poorna Krishnamoorthy
dbbed6f7f0
update minio-go dependency (#10634) 2020-10-06 08:37:09 -07:00
Poorna Krishnamoorthy
7fbfdceba3
Fix replication slowness (#10632)
- Increase channel buffer length
- Avoid blocking wait on replicaCh
2020-10-05 14:45:42 -07:00
Shireesh Anjal
f1418a50f0
add NVMe drive info [model num, serial num, drive temp. etc.] (#10613)
* add NVMe drive info [model num, serial num, drive temp. etc.]
* Ignore fuse partitions
* Add the nvme logic only for linux
* Move smart/nvme structs to a separate file

Co-authored-by: wlan0 <sidharthamn@gmail.com>
2020-10-04 10:18:46 -07:00
Krishna Srinivas
045e30f2c1
Set LastModified time from source for bucket replication (#10627) 2020-10-02 18:32:22 -07:00
Harshavardhana
c6a9a94f94
fix: optimize ServerInfo() handler to avoid reading config (#10626)
fixes #10620
2020-10-02 16:19:44 -07:00
Harshavardhana
8e7c00f3d4
add missing request-id from DeleteObject events (#10623)
fixes #10621
2020-10-02 13:36:13 -07:00
Harshavardhana
23e8390997
fix: Allow Walk to honor load balanced drives (#10610) 2020-10-01 20:24:34 -07:00
Anis Elleuch
71403be912
fix: consider partNumber in GET/HEAD requests (#10618) 2020-10-01 15:41:12 -07:00
Harshavardhana
f28d02b7f2
fix: simplify obd how we calculate transferred bytes (#10617) 2020-10-01 14:34:51 -07:00
Harshavardhana
e0cb814f3f
fail if port is not accessible (#10616)
throw proper error when port is not accessible
for the regular user, this is possibly a regression.

```
ERROR Unable to start the server: Insufficient permissions to use specified port
   > Please ensure MinIO binary has 'cap_net_bind_service=+ep' permissions
   HINT:
     Use 'sudo setcap cap_net_bind_service=+ep /path/to/minio' to provide sufficient permissions
```
2020-10-01 13:23:31 -07:00
Harshavardhana
98a08e1644
fix: protect updating latencies/throughput slices in obd (#10611)
Additionally close the transferChan upon function exit.
2020-10-01 09:50:08 -07:00
Klaus Post
3047121255
dataupdate: Bump to force rescan (#10609)
After #10594 let's invalidate the bloom filters to force the next cycles to go through all data.

There is a small chance that the linked PR could have caused missing bloom filter data.

This will invalidate the current bloom filters and make the crawler go through everything.
2020-09-30 16:10:40 -07:00
Ritesh H Shukla
5a7f92481e
fix: client errors for DNS service creation errors (#10584) 2020-09-30 14:09:41 -07:00
Anis Elleuch
0d45c38782
List v1/versions routes based on source IP if found (#10603)
Routing using on source IP if found. This should distribute
the listing load for V1 and versioning on multiple nodes
evenly between different clients.

If source IP is not found from the http request header, then falls back
to bucket name instead.
2020-09-30 13:38:27 -07:00
Poorna Krishnamoorthy
56d1b227cf
Handle changes to versioning config for replication (#10598)
Disallow versioning suspension on a bucket with
pre-existing replication configuration

If versioning is suspended on the target,replication
should fail.
2020-09-30 13:36:37 -07:00
Lenin Alevski
bea87a5a20
fix: reading multiple TLS certificates when deployed in K8S (#10601)
Ignore all regular files, CAs directory and any 
directory that starts with `..` inside the
`.minio/certs` folder
2020-09-30 08:21:30 -07:00
Harshavardhana
2b4eb87d77
pick disks which are common maximally used (#10600)
further optimization to ensure that good disks
are always used for listing, other than healing
we only use disks that are maximally used.
2020-09-29 22:54:02 -07:00
Harshavardhana
1f9abbee4d
make sure to release locks upon timeout (#10596)
fixes #10418
2020-09-29 15:18:34 -07:00
Klaus Post
fdf0ae9167
exit data update tracker only upon context completion (#10594)
The data update tracker saver would exit if data wasn't updated for between cycles.
2020-09-29 13:23:53 -07:00
Harshavardhana
00eb6f6bc9
cache DiskInfo at storage layer for performance (#10586)
`mc admin info` on busy setups will not move HDD
heads unnecessarily for repeated calls, provides
a better responsiveness for the call overall.

Bonus change allow listTolerancePerSet be N-1
for good entries, to avoid skipping entries
for some reason one of the disk went offline.
2020-09-29 09:54:41 -07:00
Harshavardhana
66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
飞雪无情
209680e89f
Remove redundant http.HandlerFunc type conversion. (#10576) 2020-09-28 13:33:49 -07:00
飞雪无情
27d9bd04e5
Handling unhandled errors in the InfoCannedPolicy method. (#10575) 2020-09-27 10:24:04 -07:00
Harshavardhana
bebcf4f004 unlock() only if locking was successful 2020-09-25 19:36:47 -07:00
Harshavardhana
eafa775952
fix: add lock ownership to expire locks (#10571)
- Add owner information for expiry, locking, unlocking a resource
- TopLocks returns now locks in quorum by default, provides
  a way to capture stale locks as well with `?stale=true`
- Simplify the quorum handling for locks to avoid from storage
  class, because there were challenges to make it consistent
  across all situations.
- And other tiny simplifications to reset locks.
2020-09-25 19:21:52 -07:00
Harshavardhana
66b4a862e0
fix: network failure err check should ignore context canceled errors (#10567)
context canceled errors bubbling up from the network
layer has the potential to be misconstrued as network
errors, taking prematurely a server offline and triggering
a health check routine avoid this potential occurrence.
2020-09-25 14:35:47 -07:00
Anis Elleuch
9603489dd3
federation: Honor range with UploadObjectPart to a different cluster (#10570)
Use gr & length instead of srcInfo.Reader & srcInfo.Size because 
they don't honor range header
2020-09-25 12:06:42 -07:00
Anis Elleuch
b302c8a5f4
heal: Fix periodic healing cleanup (#10569)
isEnded() was incorrectly calculating if the current healing sequence is
ended or not. h.currentStatus.Items could be empty if healing is very
slow and mc admin heal consumed all items.
2020-09-25 10:29:00 -07:00
Praveen raj Mani
b880796aef
Set the maximum open connections limit in PG and MySQL target configs (#10558)
As the bulk/recursive delete will require multiple connections to open at an instance,
The default open connections limit will be reached which results in the following error

```FATAL:  sorry, too many clients already```

By setting the open connections to a reasonable value - `2`, We ensure that the max open connections
will not be exhausted and lie under bounds.

The queries are simple inserts/updates/deletes which is operational and sufficient with the
the maximum open connection limit is 2.

Fixes #10553

Allow user configuration for MaxOpenConnections
2020-09-24 22:20:30 -07:00
Harshavardhana
37a5d5d7a0
reduce timeouts between servers for faster disconnects (#10562) 2020-09-24 20:10:07 -07:00
Harshavardhana
3cac262dd1
report heal drives properly, also from global state (#10561)
It is possible the heal drives are not reported from
the maintenance check because the background heal
state simply relied on the `format.json` for capturing
unformatted drives. It is possible that drives might
be still healing - make sure that applications which
rely on cluster health check respond back this detail.
2020-09-24 15:36:47 -07:00
poornas
e6ab4db6b8
Fix minimum replication workers started (#10560)
This PR also fixes GetReplicationConfiguration permission
in web-handlers.go to use bucket as resource
2020-09-24 12:25:41 -07:00
Harshavardhana
ca989eb0b3
avoid ListBuckets returning quorum errors when node is down (#10555)
Also, revamp the way ListBuckets work make few portions
of the healing logic parallel

- walk objects for healing disks in parallel
- collect the list of buckets in parallel across drives
- provide consistent view for listBuckets()
2020-09-24 09:53:38 -07:00
飞雪无情
d778d034e7
Remove redundant mgmtQueryKey type. (#10557)
Remove redundant type conversion.
2020-09-24 08:40:21 -07:00
Harshavardhana
f7f9517b6a fix: host extraction without port 2020-09-23 12:10:14 -07:00
Harshavardhana
90cff10e2b avoid crash if disks are not initialized 2020-09-23 12:00:29 -07:00
Harshavardhana
81caf35926
fix: reduce healthcheck interval for storage rest client (#10544) 2020-09-23 10:43:42 -07:00
poornas
5726cef3ca
validate bucket exists in ListRemoteTargets api (#10552) 2020-09-23 10:37:54 -07:00
Harshavardhana
8b74a72b21
fix: rename READY deadline to CLUSTER deadline ENV (#10535) 2020-09-23 09:14:33 -07:00
Klaus Post
eec69d6796
Fix stale context for bucket retrieval (#10551)
The provided context gets captured by the closure making all subsequent calls fail.
2020-09-23 08:30:31 -07:00
Harshavardhana
0537a21b79
avoid concurrenct use of rand.NewSource (#10543) 2020-09-22 15:34:27 -07:00
poornas
4c54ed8748
Close replica channel only once (#10542)
Also enforce s3:GetReplicationConfiguration permission check as a
bucket level resource.
2020-09-22 12:47:24 -07:00
Anis Elleuch
4c81201f95
fix: healing delete marker on versioned buckets (#10530)
Healing was not working correctly in the distributed mode because
errFileVersionNotFound was not properly converted in storage rest
client.

Besides, fixing the healing delete marker is not working as expected.
2020-09-21 15:16:16 -07:00
Harshavardhana
cd8d511d3d move versionsOrder struct to xl-storage-utils 2020-09-21 14:24:42 -07:00
Harshavardhana
17e17da00d
add parallel workers to perform replication in parallel (#10525)
set the concurrency for replication be to runtime.NumCPU()/2
2020-09-21 13:43:29 -07:00
Harshavardhana
a5da9120f3
fix: [fs] an error upon rwPool.Write() just attempt rwPool.Create() (#10533)
On some NFS clients looks like errno is incorrectly set,
which leads to incorrect errors thrown upwards.
2020-09-21 12:54:23 -07:00
poornas
aa12d75d75
fix crawler to detect lifecycle on bucket even if filter nil (#10532) 2020-09-21 11:41:07 -07:00
Harshavardhana
6fcbdd5607
remove unused putObjectDir code (#10528) 2020-09-21 09:41:39 -07:00
Harshavardhana
3831cc9e3b
fix: [fs] CompleteMultipart use trie structure for partMatch (#10522)
performance improves by around 100x or more

```
go test -v -run NONE -bench BenchmarkGetPartFile
goos: linux
goarch: amd64
pkg: github.com/minio/minio/cmd
BenchmarkGetPartFileWithTrie
BenchmarkGetPartFileWithTrie-4          1000000000               0.140 ns/op           0 B/op          0 allocs/op
PASS
ok      github.com/minio/minio/cmd      1.737s
```

fixes #10520
2020-09-21 01:18:13 -07:00
Krishna Srinivas
230fc0d186
Support for "directory" objects (#10499) 2020-09-19 08:39:41 -07:00
Harshavardhana
7f9498f43f
fix: ignore faulty drives and continue (#10511)
drives might return different types of errors
handle them individually, and for some errors
just log an error and continue
2020-09-18 12:09:05 -07:00
Harshavardhana
1cf322b7d4
change leader locker only for crawler (#10509) 2020-09-18 11:15:54 -07:00
Klaus Post
0b1c824618
Fix incorrect request start time (#10516)
Log request start time BEFORE starting processing the request
2020-09-18 09:30:52 -07:00
Klaus Post
c851e022b7
Tweaks to dynamic locks (#10508)
* Fix cases where minimum timeout > default timeout.
* Add defensive code for too small/negative timeouts.
* Never set timeout below the maximum value of a request.
* Protect against (unlikely) int64 wraps.
* Decrease timeout slower.
* Don't re-lock before copying.
2020-09-18 09:18:18 -07:00
Klaus Post
5ad032826a
Add a reasonable if unable to get total RAM (#10506)
Though unlikely we shouldn't skip initializing the API if we cannot get RAM.

Add 16GiB as a default and log the error.
2020-09-18 02:03:02 -07:00
Harshavardhana
84bf4624a4
fix: make sure to preserve metadata during overwrite in FS mode (#10512)
This bug was introduced in 14f0047295
almost 3yrs ago, as a side affect of removing stale `fs.json`
but we in-fact end up removing existing good `fs.json` for an
existing object, leading to some form of a data loss.

fixes #10496
2020-09-18 00:16:16 -07:00
Harshavardhana
4a36cd7035
fix: improve performance ListObjectParts in FS mode (#10510)
from 20s for 10000 parts to less than 1sec

Without the patch
```
~ time aws --endpoint-url=http://localhost:9000 --profile minio s3api \
       list-parts --bucket testbucket --key test \
       --upload-id c1cd1f50-ea9a-4824-881c-63b5de95315a

real    0m20.394s
user    0m0.589s
sys     0m0.174s
```

With the patch
```
~ time aws --endpoint-url=http://localhost:9000 --profile minio s3api \
       list-parts --bucket testbucket --key test \
       --upload-id c1cd1f50-ea9a-4824-881c-63b5de95315a

real    0m0.891s
user    0m0.624s
sys     0m0.182s
```

fixes #10503
2020-09-17 18:51:16 -07:00
Klaus Post
03490c811b
Fix obd goroutine leak (#10504)
The gouroutine collecting transfer stats never exits. Add missing channel close.
2020-09-17 10:10:20 -07:00
Harshavardhana
ed78854cea fix: list across all drives to avoid stale disks 2020-09-16 21:17:10 -07:00
Harshavardhana
e60834838f
fix: background disk heal, to reload format consistently (#10502)
It was observed in VMware vsphere environment during a
pod replacement, `mc admin info` might report incorrect
offline nodes for the replaced drive. This issue eventually
goes away but requires quite a lot of time for all servers
to be in sync.

This PR fixes this behavior properly.
2020-09-16 21:14:35 -07:00
Harshavardhana
d616d8a857
serialize replication and feed it through task model (#10500)
this allows for eventually controlling the concurrency
of replication and overally control of throughput
2020-09-16 16:04:55 -07:00
Anis Elleuch
24cab7f9df
ilm: Remove a 'null' version if not latest (#10494)
If the ILM document requires removing noncurrent versions, the 
the server should be able to remove 'null' versions as well. 
'null' versions are created when versioning is not enabled 
or suspended.
2020-09-16 10:21:50 -07:00
Harshavardhana
02c1a08a5b
fix: make sure to lock CopyObject for in-place updates (#10492) 2020-09-15 20:44:48 -07:00
Ritesh H Shukla
5c47ce456e
Run replication in the background (#10491) 2020-09-15 18:44:58 -07:00
Anis Elleuch
8ea55f9dba
obd: Add console log to OBD output (#10372) 2020-09-15 18:02:54 -07:00
poornas
80e3dce631
azure: update content-md5 to metadata after upload (#10482)
Fixes #10453
2020-09-15 16:31:47 -07:00
Harshavardhana
80fab03b63
fix: S3 gateway doesn't support full passthrough for encryption (#10484)
The entire encryption layer is dependent on the fact that
KMS should be configured for S3 encryption to work properly
and we only support passing the headers as is to the backend
for encryption only if KMS is configured.

Make sure that this predictability is maintained, currently
the code was allowing encryption to go through and fail
at later to indicate that KMS was not configured. We should
simply reply "NotImplemented" if KMS is not configured, this
allows clients to simply proceed with their tests.
2020-09-15 13:57:15 -07:00
Harshavardhana
730d2dc7be
fix: allow CopyObject/PutObjecTags on pre-existing content (#10485)
fixes #10475
2020-09-15 09:18:41 -07:00
Harshavardhana
0ee9678190
fix: add missing delete marker created filter (#10481) 2020-09-14 21:32:52 -07:00
Klaus Post
34859c6d4b
Preallocate (safe) slices when we know the size (#10459) 2020-09-14 20:44:18 -07:00
Klaus Post
b1c99e88ac
reduce CPU usage upto 50% in readdir (#10466) 2020-09-14 17:19:54 -07:00
Harshavardhana
0104af6bcc
delayed locks until we have started reading the body (#10474)
This is to ensure that Go contexts work properly, after some
interesting experiments I found that Go net/http doesn't
cancel the context when Body is non-zero and hasn't been
read till EOF.

The following gist explains this, this can lead to pile up
of go-routines on the server which will never be canceled
and will die at a really later point in time, which can
simply overwhelm the server.

https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150

To avoid this refactor the locking such that we take locks after we
have started reading from the body and only take locks when needed.

Also, remove contextReader as it's not useful, doesn't work as expected
context is not canceled until the body reaches EOF so there is no point
in wrapping it with context and putting a `select {` on it which
can unnecessarily increase the CPU overhead.

We will still use the context to cancel the lockers etc.
Additional simplification in the locker code to avoid timers
as re-using them is a complicated ordeal avoid them in
the hot path, since locking is very common this may avoid
lots of allocations.
2020-09-14 15:57:13 -07:00
Harshavardhana
34ea1d2167
fix: return correct error code for MetadataTooLarge (#10470)
fixes #10469
2020-09-13 21:26:35 -07:00
Harshavardhana
9d95937018 update KMS docs indicating deprecation of AUTO_ENCRYPTION env 2020-09-13 16:23:28 -07:00
Klaus Post
fa01e640f5
Continous healing: add optional bitrot check (#10417) 2020-09-12 00:08:12 -07:00
Harshavardhana
f355374962
add support for configurable remote transport deadline (#10447)
configurable remote transport timeouts for some special cases
where this value needs to be bumped to a higher value when
transferring large data between federated instances.
2020-09-11 23:03:08 -07:00
Harshavardhana
bda0fe3150
fix: allow LDAP identity to support form body POST (#10468)
similar to other STS APIs
2020-09-11 23:02:32 -07:00
Harshavardhana
b70995dd60 Revert "ilm: Remove null version if not latest with proper config (#10467)"
This reverts commit 4b6264da7d.
2020-09-11 18:15:49 -07:00
Anis Elleuch
4b6264da7d
ilm: Remove null version if not latest with proper config (#10467) 2020-09-11 14:20:09 -07:00
Harshavardhana
48919de301
fix: for defer'ed deleteObject use internal context (#10463) 2020-09-11 06:39:19 -07:00
Harshavardhana
eb2934f0c1
simplify webhook DNS further generalize for gateway (#10448)
continuation of the changes from eaaf05a7cc
this further simplifies, enables this for gateway deployments as well
2020-09-10 14:19:32 -07:00
Klaus Post
b7438fe4e6
Copy metadata before spawning goroutine + prealloc maps (#10458)
In `(*cacheObjects).GetObjectNInfo` copy the metadata before spawning a goroutine.

Clean up a few map[string]string copies as well, reducing allocs and simplifying the code.

Fixes #10426
2020-09-10 11:37:22 -07:00
Anis Elleuch
ce6cef6855
erasure: Call Walk() from all disks (#10445)
It does not make sense to call Walk() in only N/2 disks and then
requires N/2 quorum, just keep it N/2+1 

The commit fixes this behavior.
2020-09-10 09:27:52 -07:00
Klaus Post
493c714663
Remove erasureSets and erasureObjects from ObjectLayer (#10442) 2020-09-10 09:18:19 -07:00
Harshavardhana
e959c5d71c
fix: server panic in FS mode (#10455)
fixes #10454
2020-09-10 09:16:26 -07:00
Harshavardhana
4a2928eb49
generate missing object delete bucket notifications (#10449)
fixes #10381
2020-09-09 18:23:08 -07:00
Anis Elleuch
af88772a78
lifecycle: NoncurrentVersionExpiration considers noncurrent version age (#10444)
From https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html#intro-lifecycle-rules-actions

```
When specifying the number of days in the NoncurrentVersionTransition
and NoncurrentVersionExpiration actions in a Lifecycle configuration,
note the following:

It is the number of days from when the version of the object becomes
noncurrent (that is, when the object is overwritten or deleted), that
Amazon S3 will perform the action on the specified object or objects.

Amazon S3 calculates the time by adding the number of days specified in
the rule to the time when the new successor version of the object is
created and rounding the resulting time to the next day midnight UTC.
For example, in your bucket, suppose that you have a current version of
an object that was created at 1/1/2014 10:30 AM UTC. If the new version
of the object that replaces the current version is created at 1/15/2014
10:30 AM UTC, and you specify 3 days in a transition rule, the
transition date of the object is calculated as 1/19/2014 00:00 UTC.
```
2020-09-09 18:11:24 -07:00
Harshavardhana
9109148474
add support for new UA values for update an check (#10451) 2020-09-09 17:21:39 -07:00
Nitish Tiwari
eaaf05a7cc
Add Kubernetes operator webook server as DNS target (#10404)
This PR adds a DNS target that ensures to update an entry
into Kubernetes operator when a bucket is created or deleted.

See minio/operator#264 for details.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-09 12:20:49 -07:00
Harshavardhana
958661cbb5
skip subdomain from bucket DNS which start with minio.domain (#10390)
extend host matcher to reject the host match
2020-09-09 09:57:37 -07:00
Harshavardhana
6a0372be6c
cleanup tmpDir any older entries automatically just like multipart (#10439)
also consider multipart uploads, temporary files in `.minio.sys/tmp`
as stale beyond 24hrs and clean them up automatically
2020-09-08 15:55:40 -07:00
Harshavardhana
c13afd56e8
Remove MaxConnsPerHost settings to avoid potential hangs (#10438)
MaxConnsPerHost can potentially hang a call without any
way to timeout, we do not need this setting for our proxy
and gateway implementations instead IdleConn settings are
good enough.

Also ensure to use NewRequestWithContext and make sure to
take the disks offline only for network errors.

Fixes #10304
2020-09-08 14:22:04 -07:00
Harshavardhana
96997d2b21
allow ctrl+c to be consistent at early startup (#10435)
fixes #10431
2020-09-08 09:10:55 -07:00
Klaus Post
86a3319d41
Ignore config values from unknown subsystems (#10432) 2020-09-08 08:57:04 -07:00
Harshavardhana
9f60e84ce1
always copy UserDefined metadata map (#10427)
fixes #10426
2020-09-07 09:25:28 -07:00
Harshavardhana
572b1721b2
set max API requests automatically based on RAM (#10421) 2020-09-04 19:37:37 -07:00
Harshavardhana
b0e1d4ce78
re-attach offline drive after new drive replacement (#10416)
inconsistent drive healing when one of the drive is offline
while a new drive was replaced, this change is to ensure
that we can add the offline drive back into the mix by
healing it again.
2020-09-04 17:09:02 -07:00
Harshavardhana
eb19c8af40
Bump response header timeout for proxying list request (#10420) 2020-09-04 16:07:40 -07:00
Klaus Post
2d58a8d861
Add storage layer contexts (#10321)
Add context to all (non-trivial) calls to the storage layer. 

Contexts are propagated through the REST client.

- `context.TODO()` is left in place for the places where it needs to be added to the caller.
- `endWalkCh` could probably be removed from the walkers, but no changes so far.

The "dangerous" part is that now a caller disconnecting *will* propagate down,  so a 
"delete" operation will now be interrupted. In some cases we might want to disconnect 
this functionality so the operation completes if it has started, leaving the system in a cleaner state.
2020-09-04 09:45:06 -07:00
poornas
0037951b6e
improve error message when remote target missing (#10412) 2020-09-04 08:48:38 -07:00
Andreas Auernhammer
fbd1c5f51a
certs: refactor cert manager to support multiple certificates (#10207)
This commit refactors the certificate management implementation
in the `certs` package such that multiple certificates can be
specified at the same time. Therefore, the following layout of
the `certs/` directory is expected:
```
certs/
 │
 ├─ public.crt
 ├─ private.key
 ├─ CAs/          // CAs directory is ignored
 │   │
 │    ...
 │
 ├─ example.com/
 │   │
 │   ├─ public.crt
 │   └─ private.key
 └─ foobar.org/
     │
     ├─ public.crt
     └─ private.key
   ...
```

However, directory names like `example.com` are just for human
readability/organization and don't have any meaning w.r.t whether
a particular certificate is served or not. This decision is made based
on the SNI sent by the client and the SAN of the certificate.

***

The `Manager` will pick a certificate based on the client trying
to establish a TLS connection. In particular, it looks at the client
hello (i.e. SNI) to determine which host the client tries to access.
If the manager can find a certificate that matches the SNI it
returns this certificate to the client.

However, the client may choose to not send an SNI or tries to access
a server directly via IP (`https://<ip>:<port>`). In this case, we
cannot use the SNI to determine which certificate to serve. However,
we also should not pick "the first" certificate that would be accepted
by the client (based on crypto. parameters - like a signature algorithm)
because it may be an internal certificate that contains internal hostnames. 
We would disclose internal infrastructure details doing so.

Therefore, the `Manager` returns the "default" certificate when the
client does not specify an SNI. The default certificate the top-level
`public.crt` - i.e. `certs/public.crt`.

This approach has some consequences:
 - It's the operator's responsibility to ensure that the top-level
   `public.crt` does not disclose any information (i.e. hostnames)
   that are not publicly visible. However, this was the case in the
   past already.
 - Any other `public.crt` - except for the top-level one - must not
   contain any IP SAN. The reason for this restriction is that the
   Manager cannot match a SNI to an IP b/c the SNI is the server host
   name. The entire purpose of SNI is to indicate which host the client
   tries to connect to when multiple hosts run on the same IP. So, a
   client will not set the SNI to an IP.
   If we would allow IP SANs in a lower-level `public.crt` a user would
   expect that it is possible to connect to MinIO directly via IP address
   and that the MinIO server would pick "the right" certificate. However,
   the MinIO server cannot determine which certificate to serve, and
   therefore always picks the "default" one. This may lead to all sorts
   of confusing errors like:
   "It works if I use `https:instance.minio.local` but not when I use
   `https://10.0.2.1`.

These consequences/limitations should be pointed out / explained in our
docs in an appropriate way. However, the support for multiple
certificates should not have any impact on how deployment with a single
certificate function today.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-03 23:33:37 -07:00
Harshavardhana
1c6781757c
add missing ListBucketVersions from policy actions (#10414) 2020-09-03 18:25:06 -07:00
Harshavardhana
b4e3956e69
update KES docs to talk about 'mc encrypt' command (#10400)
add a deprecation notice for KMS_AUTO_ENCRYPTION
2020-09-03 12:43:45 -07:00
Harshavardhana
8a291e1dc0
Cluster healthcheck improvements (#10408)
- do not fail the healthcheck if heal status
  was not obtained from one of the nodes,
  if many nodes fail then report this as a
  catastrophic error.
- add "x-minio-write-quorum" value to match
  the write tolerance supported by server.
- admin info now states if a drive is healing
  where madmin.Disk.Healing is set to true
  and madmin.Disk.State is "ok"
2020-09-02 22:54:56 -07:00
Klaus Post
650dccfa9e
cache: Only start at high watermark (#10403)
Currently, cache purges are triggered as soon as the low watermark is exceeded.
To reduce IO this should only be done when reaching the high watermark.
This simplifies checks and reduces all calls for a GC to go through
`dcache.diskSpaceAvailable(size)`. While a comment claims that 
`dcache.triggerGC <- struct{}{}` was non-blocking I don't see how 
that was possible. Instead, we add a 1 size to the queue channel 
and use channel  semantics to avoid blocking when a GC has 
already been requested.

`bytesToClear` now takes the high watermark into account to it will 
not request any bytes to be cleared until that is reached.
2020-09-02 17:48:44 -07:00
Andreas Auernhammer
9a703befe6
crypto: reduce retry delay when retrying KES requests (#10394)
This commit reduces the retry delay when retrying a request
to a KES server by:
 - reducing the max. jitter delay from 3s to 1.5s
 - skipping the random delay when there are more KES endpoints
   available.

If there are more KES endpoints we can directly retry to the request
by sending it to the next endpoint - as pointed out by @krishnasrinivas
2020-09-02 11:04:10 -07:00
Klaus Post
9a1615768d
Fix flaky TestXLStorageVerifyFile (#10398)
`TestXLStorageVerifyFile` would fail 1 in 256 if the first random character was 'a'.

Instead write 256 bytes which has 1 in 256^256 probability.
2020-09-02 09:42:24 -07:00