Commit Graph

3054 Commits

Author SHA1 Message Date
Klaus Post
6135f072d2
Fix invalidated metacaches (#10784)
* Fix caches having EOF marked as a failure.
* Simplify cache updates.
* Provide context for checkMetacacheState failures.
* Log 499 when the client disconnects.
2020-10-30 09:33:16 -07:00
Klaus Post
e63a44b734
rest client: Expect context timeouts for locks (#10782)
Add option for rest clients to not mark a remote offline for context timeouts.

This can be used if context timeouts are expected on the call.
2020-10-29 09:52:11 -07:00
Klaus Post
6b14c4ab1e
Optimize decryptObjectInfo (#10726)
`decryptObjectInfo` is a significant bottleneck when listing objects.

Reduce the allocations for a significant speedup.

https://github.com/minio/sio/pull/40

```
λ benchcmp before.txt after.txt
benchmark                          old ns/op     new ns/op     delta
Benchmark_decryptObjectInfo-32     24260928      808656        -96.67%

benchmark                          old MB/s     new MB/s     speedup
Benchmark_decryptObjectInfo-32     0.04         1.24         31.00x

benchmark                          old allocs     new allocs     delta
Benchmark_decryptObjectInfo-32     75112          48996          -34.77%

benchmark                          old bytes     new bytes     delta
Benchmark_decryptObjectInfo-32     287694772     4228076       -98.53%
```
2020-10-29 09:34:20 -07:00
Harshavardhana
4bf90ca67f
fix: handle a crash when AskDisks is set to -1 (#10777) 2020-10-29 09:25:43 -07:00
Harshavardhana
e0655e24f2
fix: A possible crash when fi.Erasure.Distribution is empty (#10779) 2020-10-28 19:24:01 -07:00
Klaus Post
bfc36aed89
Add update retry limit and compare error by string instead (#10776) 2020-10-28 13:19:53 -07:00
Kaloyan Raev
be7f67268d
fix: Do not cleanup range files in cache SaveMetadata when total hits are false (#10728) 2020-10-28 09:23:17 -07:00
Klaus Post
a982baff27
ListObjects Metadata Caching (#10648)
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c

Gist of improvements:

* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no 
  longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
2020-10-28 09:18:35 -07:00
Krishna Srinivas
f53c5a020e
fix: heal object shards with ec.index and ec.distribution mismatches (#10773)
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-10-28 00:10:20 -07:00
Harshavardhana
5b30bbda92
fix: add more protection distribution to match EcIndex (#10772)
allows for more stricter validation in picking up the right
set of disks for reconstruction.
2020-10-28 00:09:15 -07:00
Shireesh Anjal
858e2a43df
Remove logging info from OBDInfoHandler (#10727)
A lot of logging data is counterproductive. A better implementation with
precise useful log data can be introduced later.
2020-10-27 17:41:48 -07:00
Kaloyan Raev
df9894e275
avoid caching http ranges in background goroutine (#10724) 2020-10-26 23:04:48 -07:00
Krishna Srinivas
592f2f23a3
fix: heal rejects objects with disk re-ordering issue (#10766) 2020-10-26 18:48:47 -07:00
Krishna Srinivas
c49a80db41
fix: use meta.Erasure.Index for GetObject() to reconstruct object (#10764) 2020-10-26 16:19:42 -07:00
Poorna Krishnamoorthy
46275c6547
cache: rename function declarations (#10763) 2020-10-26 15:41:24 -07:00
Poorna Krishnamoorthy
0994ed9783
cache: fix call in GetObjectNInfo (#10762)
Fixes: #10751
2020-10-26 12:30:40 -07:00
Anis Elleuch
eb95353cb1
fix: Get/HeadObject return 404 on non quorum objects (#10753) 2020-10-26 10:30:46 -07:00
Harshavardhana
029758cb20
fix: retain the previous UUID for newly replaced drives (#10759)
only newly replaced drives get the new `format.json`,
this avoids disks reloading their in-memory reference
format, ensures that drives are online without
reloading the in-memory reference format.

keeping reference format in-tact means UUIDs
never change once they are formatted.
2020-10-26 10:29:29 -07:00
Harshavardhana
646d6917ed
turn-off checking for updates completely if MINIO_UPDATE=off (#10752) 2020-10-24 22:39:44 -07:00
Harshavardhana
d9db7f3308
expire lockers if lockers are offline (#10749)
lockers currently might leave stale lockers,
in unknown ways waiting for downed lockers.

locker check interval is high enough to safely
cleanup stale locks.
2020-10-24 13:23:16 -07:00
Harshavardhana
6a8c62f9fd
make sure to preserve UUID from reference format (#10748)
reference format should be source of truth
for inconsistent drives which reconnect,
add them back to their original position

remove automatic fix for existing offline
disk uuids
2020-10-24 13:23:08 -07:00
Anis Elleuch
00124c56d9
erasure: Commit data before xl.meta in RenameData() (#10734)
This will reduce the chance to have updated xl.meta without data.
2020-10-23 21:54:58 -07:00
Anis Elleuch
2c32c2149e
tests: Avoid running TestNSRace in short test mode (#10735) 2020-10-23 21:23:12 -07:00
Harshavardhana
734f258878
fix: slow down auto healing more aggressively (#10730)
Bonus fixes

- logging improvements to ensure that we don't use
  `go logger.LogIf` to avoid runtime.Caller missing
  the function name. log where necessary.
- remove unused code at erasure sets
2020-10-22 13:36:24 -07:00
Anis Elleuch
0e0c53bba4
tests: Lower expectation in addr selection in rand cache dialer (#10739)
Test TestDialContextWithDNSCacheRand was failing sometimes because it depends
on a random selection of addresses when testing random DNS resolution from cache.

Lower addr selection exception to 10%
2020-10-22 09:35:32 -07:00
Poorna Krishnamoorthy
5cc23ae052
validate if iam store is initialized (#10719)
Fixes panic - regression from d6d770c1b1
2020-10-20 21:28:24 -07:00
Harshavardhana
d6d770c1b1 initialize object layer right after config has loaded 2020-10-19 22:04:59 -07:00
Harshavardhana
b07df5cae1
initialize IAM as soon as object layer is initialized (#10700)
Allow requests to come in for users as soon as object
layer and config are initialized, this allows users
to be authenticated sooner and would succeed automatically
on servers which are yet to fully initialize.
2020-10-19 09:54:40 -07:00
Harshavardhana
c107728676
fix: s3 gateway DNS cache initialization (#10706)
fixes #10705
2020-10-19 01:34:23 -07:00
Anis Elleuch
284a2b9021
ilm: Send delete marker creation event when appropriate (#10696)
Before this commit, the crawler ILM will always send object delete event
notification though this is wrong.
2020-10-16 21:22:12 -07:00
Ritesh H Shukla
0b53e30ecb
Clean up monitor on delete bucket (#10698) 2020-10-16 17:59:31 -07:00
Harshavardhana
bd2131ba34
add DNS cache support to avoid DNS flooding (#10693)
Go stdlib resolver doesn't support caching DNS
resolutions, since we compile with CGO disabled
we are more probe to DNS flooding for all network
calls to resolve for DNS from the DNS server.

Under various containerized environments such as
VMWare this becomes a problem because there are
no DNS caches available and we may end up overloading
the kube-dns resolver under concurrent I/O.

To circumvent this issue implement a DNSCache resolver
which resolves DNS and caches them for around 10secs
with every 3sec invalidation attempted.
2020-10-16 14:49:05 -07:00
ebozduman
1aec168c84
fix: azure gateway should reject bucket names with "." (#10635) 2020-10-16 09:30:18 -07:00
Klaus Post
21a549a83b
fix: keep MRF channel open to avoid random CI crash (#10686)
There doesn't seem to be any benefit to closing the channel, so just keep 
it open and let it die with the server.
2020-10-16 09:08:51 -07:00
Ritesh H Shukla
8a16a1a1a9
fix: misc fixes for bandwidth reporting amd monitoring (#10683)
* Set peer for fetch bandwidth
* Fix the limit for bandwidth that is reported.
* Reduce CPU burn from bandwidth management.
2020-10-16 09:07:50 -07:00
Harshavardhana
ad726b49b4
rename zones to serverSets to avoid terminology conflict (#10679)
we are bringing in availability zones, we should avoid
zones as per server expansion concept.
2020-10-15 14:28:50 -07:00
Anis Elleuch
db2241066b
heal: Enable removing dangling delete markers (#10688) 2020-10-15 13:06:40 -07:00
Harshavardhana
f1cc16e788
fix: background heal rely on getOnlineDisks() (#10687) 2020-10-15 13:06:23 -07:00
Klaus Post
3820a905e0
in getOnlineDisks wait for disks to be populated (#10685) 2020-10-15 06:37:10 -07:00
Harshavardhana
2042d4873c
rename crawler config option to heal (#10678) 2020-10-14 13:51:51 -07:00
Harshavardhana
f9be783f3e
fix: allow crawler to crawl on disks without usage constraints (#10677)
additionally also change the resolution usage wise
return of disks, allows to small byte level differences
to be masked.
2020-10-14 12:12:10 -07:00
Harshavardhana
71b97fd3ac
fix: connect disks pre-emptively during startup (#10669)
connect disks pre-emptively upon startup, to ensure we have
enough disks are connected at startup rather than wait
for them.

we need to do this to avoid long wait times for server to
be online when we have servers come up in rolling upgrade
fashion
2020-10-13 18:28:42 -07:00
Klaus Post
03991c5d41
crawler: Remove waitForLowActiveIO (#10667)
Only use dynamic delays for the crawler. Even though the max wait was 1 second the number 
of waits could severely impact crawler speed.

Instead of relying on a global metric, we use the stateless local delays to keep the crawler 
running at a speed more adjusted to current conditions.

The only case we keep it is before bitrot checks when enabled.
2020-10-13 13:45:08 -07:00
飞雪无情
614060764d
fix: use the correct Action type for policy.Args and iampolicy.Args (#10650) 2020-10-12 15:18:22 -07:00
Harshavardhana
a3ba8188d7 fix: allow locker to be niladic 2020-10-12 14:23:44 -07:00
Harshavardhana
2760fc86af
Bump default idleConnsPerHost to control conns in time_wait (#10653)
This PR fixes a hang which occurs quite commonly at higher concurrency
by allowing following changes

- allowing lower connections in time_wait allows faster socket open's
- lower idle connection timeout to ensure that we let kernel
  reclaim the time_wait connections quickly
- increase somaxconn to 4096 instead of 2048 to allow larger tcp
  syn backlogs.

fixes #10413
2020-10-12 14:19:46 -07:00
Ritesh H Shukla
8ceb2a93fd
fix: peer replication bandwidth monitoring in distributed setup (#10652) 2020-10-12 09:04:55 -07:00
Ritesh H Shukla
c2f16ee846
Add basic bandwidth monitoring for replication. (#10501)
This change tracks bandwidth for a bucket and object

- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
2020-10-09 20:36:00 -07:00
Harshavardhana
6484453fc6
optionally allow strict quorum listing (#10649)
```
export MINIO_API_LIST_STRICT_QUORUM=on
```

would enable listing in quorum if necessary
2020-10-09 15:40:46 -07:00
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana
253194e491
do not hold write locks - if objects don't exist (#10644) 2020-10-08 17:47:21 -07:00
Harshavardhana
736e58dd68
fix: handle concurrent lockers with multiple optimizations (#10640)
- select lockers which are non-local and online to have
  affinity towards remote servers for lock contention

- optimize lock retry interval to avoid sending too many
  messages during lock contention, reduces average CPU
  usage as well

- if bucket is not set, when deleteObject fails make sure
  setPutObjHeaders() honors lifecycle only if bucket name
  is set.

- fix top locks to list out always the oldest lockers always,
  avoid getting bogged down into map's unordered nature.
2020-10-08 12:32:32 -07:00
Poorna Krishnamoorthy
907a171edd
Generalize error messages for remote targets (#10638)
This is to allow remote targets to be generalized
for replication/ILM transition

Also adding a field in BucketTarget to identify
a remote target with a label.
2020-10-08 10:54:11 -07:00
Andreas Auernhammer
ed6d2a100f
logger: avoid writing audit log response header twice (#10642)
This commit fixes a misuse of the `http.ResponseWriter.WriteHeader`.
A caller should **either** call `WriteHeader` exactly once **or**
write to the response writer and causing an implicit 200 OK.

Writing the response headers more than once causes a `http: superfluous
response.WriteHeader call` log message. This commit fixes this
by preventing a 2nd `WriteHeader` call being forwarded to the underlying
`ResponseWriter`.

Updates #10587
2020-10-08 09:29:10 -07:00
Harshavardhana
effe131090
fix: allow read unlocks to be defensive about split brains (#10637) 2020-10-07 09:15:01 -07:00
Harshavardhana
18063bf25c
fix: cleanup old directory handling code (#10633)
we don't need them anymore, remove legacy code.
2020-10-06 12:03:57 -07:00
Poorna Krishnamoorthy
dbbed6f7f0
update minio-go dependency (#10634) 2020-10-06 08:37:09 -07:00
Poorna Krishnamoorthy
7fbfdceba3
Fix replication slowness (#10632)
- Increase channel buffer length
- Avoid blocking wait on replicaCh
2020-10-05 14:45:42 -07:00
Shireesh Anjal
f1418a50f0
add NVMe drive info [model num, serial num, drive temp. etc.] (#10613)
* add NVMe drive info [model num, serial num, drive temp. etc.]
* Ignore fuse partitions
* Add the nvme logic only for linux
* Move smart/nvme structs to a separate file

Co-authored-by: wlan0 <sidharthamn@gmail.com>
2020-10-04 10:18:46 -07:00
Krishna Srinivas
045e30f2c1
Set LastModified time from source for bucket replication (#10627) 2020-10-02 18:32:22 -07:00
Harshavardhana
c6a9a94f94
fix: optimize ServerInfo() handler to avoid reading config (#10626)
fixes #10620
2020-10-02 16:19:44 -07:00
Harshavardhana
8e7c00f3d4
add missing request-id from DeleteObject events (#10623)
fixes #10621
2020-10-02 13:36:13 -07:00
Harshavardhana
23e8390997
fix: Allow Walk to honor load balanced drives (#10610) 2020-10-01 20:24:34 -07:00
Anis Elleuch
71403be912
fix: consider partNumber in GET/HEAD requests (#10618) 2020-10-01 15:41:12 -07:00
Harshavardhana
f28d02b7f2
fix: simplify obd how we calculate transferred bytes (#10617) 2020-10-01 14:34:51 -07:00
Harshavardhana
e0cb814f3f
fail if port is not accessible (#10616)
throw proper error when port is not accessible
for the regular user, this is possibly a regression.

```
ERROR Unable to start the server: Insufficient permissions to use specified port
   > Please ensure MinIO binary has 'cap_net_bind_service=+ep' permissions
   HINT:
     Use 'sudo setcap cap_net_bind_service=+ep /path/to/minio' to provide sufficient permissions
```
2020-10-01 13:23:31 -07:00
Harshavardhana
98a08e1644
fix: protect updating latencies/throughput slices in obd (#10611)
Additionally close the transferChan upon function exit.
2020-10-01 09:50:08 -07:00
Klaus Post
3047121255
dataupdate: Bump to force rescan (#10609)
After #10594 let's invalidate the bloom filters to force the next cycles to go through all data.

There is a small chance that the linked PR could have caused missing bloom filter data.

This will invalidate the current bloom filters and make the crawler go through everything.
2020-09-30 16:10:40 -07:00
Ritesh H Shukla
5a7f92481e
fix: client errors for DNS service creation errors (#10584) 2020-09-30 14:09:41 -07:00
Anis Elleuch
0d45c38782
List v1/versions routes based on source IP if found (#10603)
Routing using on source IP if found. This should distribute
the listing load for V1 and versioning on multiple nodes
evenly between different clients.

If source IP is not found from the http request header, then falls back
to bucket name instead.
2020-09-30 13:38:27 -07:00
Poorna Krishnamoorthy
56d1b227cf
Handle changes to versioning config for replication (#10598)
Disallow versioning suspension on a bucket with
pre-existing replication configuration

If versioning is suspended on the target,replication
should fail.
2020-09-30 13:36:37 -07:00
Lenin Alevski
bea87a5a20
fix: reading multiple TLS certificates when deployed in K8S (#10601)
Ignore all regular files, CAs directory and any 
directory that starts with `..` inside the
`.minio/certs` folder
2020-09-30 08:21:30 -07:00
Harshavardhana
2b4eb87d77
pick disks which are common maximally used (#10600)
further optimization to ensure that good disks
are always used for listing, other than healing
we only use disks that are maximally used.
2020-09-29 22:54:02 -07:00
Harshavardhana
1f9abbee4d
make sure to release locks upon timeout (#10596)
fixes #10418
2020-09-29 15:18:34 -07:00
Klaus Post
fdf0ae9167
exit data update tracker only upon context completion (#10594)
The data update tracker saver would exit if data wasn't updated for between cycles.
2020-09-29 13:23:53 -07:00
Harshavardhana
00eb6f6bc9
cache DiskInfo at storage layer for performance (#10586)
`mc admin info` on busy setups will not move HDD
heads unnecessarily for repeated calls, provides
a better responsiveness for the call overall.

Bonus change allow listTolerancePerSet be N-1
for good entries, to avoid skipping entries
for some reason one of the disk went offline.
2020-09-29 09:54:41 -07:00
Harshavardhana
66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
飞雪无情
209680e89f
Remove redundant http.HandlerFunc type conversion. (#10576) 2020-09-28 13:33:49 -07:00
飞雪无情
27d9bd04e5
Handling unhandled errors in the InfoCannedPolicy method. (#10575) 2020-09-27 10:24:04 -07:00
Harshavardhana
bebcf4f004 unlock() only if locking was successful 2020-09-25 19:36:47 -07:00
Harshavardhana
eafa775952
fix: add lock ownership to expire locks (#10571)
- Add owner information for expiry, locking, unlocking a resource
- TopLocks returns now locks in quorum by default, provides
  a way to capture stale locks as well with `?stale=true`
- Simplify the quorum handling for locks to avoid from storage
  class, because there were challenges to make it consistent
  across all situations.
- And other tiny simplifications to reset locks.
2020-09-25 19:21:52 -07:00
Harshavardhana
66b4a862e0
fix: network failure err check should ignore context canceled errors (#10567)
context canceled errors bubbling up from the network
layer has the potential to be misconstrued as network
errors, taking prematurely a server offline and triggering
a health check routine avoid this potential occurrence.
2020-09-25 14:35:47 -07:00
Anis Elleuch
9603489dd3
federation: Honor range with UploadObjectPart to a different cluster (#10570)
Use gr & length instead of srcInfo.Reader & srcInfo.Size because 
they don't honor range header
2020-09-25 12:06:42 -07:00
Anis Elleuch
b302c8a5f4
heal: Fix periodic healing cleanup (#10569)
isEnded() was incorrectly calculating if the current healing sequence is
ended or not. h.currentStatus.Items could be empty if healing is very
slow and mc admin heal consumed all items.
2020-09-25 10:29:00 -07:00
Praveen raj Mani
b880796aef
Set the maximum open connections limit in PG and MySQL target configs (#10558)
As the bulk/recursive delete will require multiple connections to open at an instance,
The default open connections limit will be reached which results in the following error

```FATAL:  sorry, too many clients already```

By setting the open connections to a reasonable value - `2`, We ensure that the max open connections
will not be exhausted and lie under bounds.

The queries are simple inserts/updates/deletes which is operational and sufficient with the
the maximum open connection limit is 2.

Fixes #10553

Allow user configuration for MaxOpenConnections
2020-09-24 22:20:30 -07:00
Harshavardhana
37a5d5d7a0
reduce timeouts between servers for faster disconnects (#10562) 2020-09-24 20:10:07 -07:00
Harshavardhana
3cac262dd1
report heal drives properly, also from global state (#10561)
It is possible the heal drives are not reported from
the maintenance check because the background heal
state simply relied on the `format.json` for capturing
unformatted drives. It is possible that drives might
be still healing - make sure that applications which
rely on cluster health check respond back this detail.
2020-09-24 15:36:47 -07:00
poornas
e6ab4db6b8
Fix minimum replication workers started (#10560)
This PR also fixes GetReplicationConfiguration permission
in web-handlers.go to use bucket as resource
2020-09-24 12:25:41 -07:00
Harshavardhana
ca989eb0b3
avoid ListBuckets returning quorum errors when node is down (#10555)
Also, revamp the way ListBuckets work make few portions
of the healing logic parallel

- walk objects for healing disks in parallel
- collect the list of buckets in parallel across drives
- provide consistent view for listBuckets()
2020-09-24 09:53:38 -07:00
飞雪无情
d778d034e7
Remove redundant mgmtQueryKey type. (#10557)
Remove redundant type conversion.
2020-09-24 08:40:21 -07:00
Harshavardhana
f7f9517b6a fix: host extraction without port 2020-09-23 12:10:14 -07:00
Harshavardhana
90cff10e2b avoid crash if disks are not initialized 2020-09-23 12:00:29 -07:00
Harshavardhana
81caf35926
fix: reduce healthcheck interval for storage rest client (#10544) 2020-09-23 10:43:42 -07:00
poornas
5726cef3ca
validate bucket exists in ListRemoteTargets api (#10552) 2020-09-23 10:37:54 -07:00
Harshavardhana
8b74a72b21
fix: rename READY deadline to CLUSTER deadline ENV (#10535) 2020-09-23 09:14:33 -07:00
Klaus Post
eec69d6796
Fix stale context for bucket retrieval (#10551)
The provided context gets captured by the closure making all subsequent calls fail.
2020-09-23 08:30:31 -07:00
Harshavardhana
0537a21b79
avoid concurrenct use of rand.NewSource (#10543) 2020-09-22 15:34:27 -07:00
poornas
4c54ed8748
Close replica channel only once (#10542)
Also enforce s3:GetReplicationConfiguration permission check as a
bucket level resource.
2020-09-22 12:47:24 -07:00
Anis Elleuch
4c81201f95
fix: healing delete marker on versioned buckets (#10530)
Healing was not working correctly in the distributed mode because
errFileVersionNotFound was not properly converted in storage rest
client.

Besides, fixing the healing delete marker is not working as expected.
2020-09-21 15:16:16 -07:00
Harshavardhana
cd8d511d3d move versionsOrder struct to xl-storage-utils 2020-09-21 14:24:42 -07:00