Commit Graph

7318 Commits

Author SHA1 Message Date
Jorge Israel Peña
4752323e1c
Use hdfs.Readdir() to optimize HDFS directory listings (#10121)
Currently, listing directories on HDFS incurs a per-entry remote Stat() call
penalty, the cost of which can really blow up on directories with many
entries (+1,000) especially when considered in addition to peripheral
calls (such as validation) and the fact that minio is an intermediary to the
client (whereas other clients listed below can query HDFS directly).

Because listing directories this way is expensive, the Golang HDFS library
provides the [`Client.Open()`] function which creates a [`FileReader`] that is
able to batch multiple calls together through the [`Readdir()`] function.

This is substantially more efficient for very large directories.

In one case we were witnessing about +20 seconds to list a directory with 1,500
entries, admittedly large, but the Java hdfs ls utility as well as the HDFS
library sample ls utility were much faster.

Hadoop HDFS DFS (4.02s):

    λ ~/code/minio → use-readdir
    » time hdfs dfs -ls /directory/with/1500/entries/
    …
    hdfs dfs -ls   5.81s user 0.49s system 156% cpu 4.020 total

Golang HDFS library (0.47s):

    λ ~/code/hdfs → master
    » time ./hdfs ls -lh /directory/with/1500/entries/
    …
    ./hdfs ls -lh   0.13s user 0.14s system 56% cpu 0.478 total

mc and minio **without** optimization (16.96s):

    λ ~/code/minio → master
    » time mc ls myhdfs/directory/with/1500/entries/
    …
    ./mc ls   0.22s user 0.29s system 3% cpu 16.968 total

mc and minio **with** optimization (0.40s):

    λ ~/code/minio → use-readdir
    » time mc ls myhdfs/directory/with/1500/entries/
    …
    ./mc ls   0.13s user 0.28s system 102% cpu 0.403 total

[`Client.Open()`]: https://godoc.org/github.com/colinmarc/hdfs#Client.Open
[`FileReader`]: https://godoc.org/github.com/colinmarc/hdfs#FileReader
[`Readdir()`]: https://godoc.org/github.com/colinmarc/hdfs#FileReader.Readdir
2020-07-24 11:31:51 -07:00
Klaus Post
11593c6cc4
Usage: Reset merged info when updating (#10126)
When merging multiple buckets reset between each update.

Avoids merging the same usage metrics multiple times resulting 
in duplicate data entries.
2020-07-24 11:02:10 -07:00
Harshavardhana
10025bda45
fix: add missing response headers to CORS handler (#10124) 2020-07-24 00:46:51 -07:00
Praveen raj Mani
b800541fbe
fix: a type in NSQ notification target environment key (#10118)
fixes #10100
2020-07-23 12:19:36 -07:00
Harshavardhana
3a73f1ead5
refactor server update behavior (#10107) 2020-07-23 08:03:31 -07:00
Anis Elleuch
1340281cb8
Fix marshaling expiration field in lifecycle (#10117) 2020-07-23 08:01:25 -07:00
poornas
b9be841fd2
Add missing validation for replication API conditions (#10114) 2020-07-22 17:39:40 -07:00
Harshavardhana
73890f31af
add minisign verification for container builds (#10115) 2020-07-22 17:09:31 -07:00
Anis Elleuch
456b2ef6eb
Avoid healing to be stuck with many concurrent event listeners (#10111)
If there are many listeners to bucket notifications or to the trace
subsystem, healing fails to work properly since it suspends itself when
the number of concurrent connections is above a certain threshold.

These connections are also continuous and not costly (*no disk access*),
it is okay to just ignore them in waitForLowHTTPReq().
2020-07-22 13:16:55 -07:00
Harshavardhana
ad8b53e6d4
add mips64 support for cross compilation (#10106) 2020-07-21 23:56:14 -07:00
Harshavardhana
0b5d1bc91d
fix: bucket replication docs (#10104)
* fix: bucket replication docs

* Update docs/bucket/replication/README.md

Co-authored-by: kannappanr <30541348+kannappanr@users.noreply.github.com>

Co-authored-by: kannappanr <30541348+kannappanr@users.noreply.github.com>
2020-07-21 22:19:30 -07:00
poornas
c43da3005a
Add support for server side bucket replication (#9882) 2020-07-21 17:49:56 -07:00
Minio Trusted
ca4c15bc63 Update yaml files to latest version RELEASE.2020-07-22T00-26-33Z 2020-07-22 00:44:03 +00:00
Harshavardhana
a880283593
Send the lower level error directly from GetDiskID() (#10095)
this is to detect situations of corruption disk
format etc errors quickly and keep the disk online
in such scenarios for requests to fail appropriately.
2020-07-21 13:54:06 -07:00
Bruce Wang
e464a5bfbc
Fix bug with fields that contain trimming spaces (#10079)
String x might contain trimming spaces. And it needs to be trimmed. For
example, in csv files, there might be trimming spaces in a field that
ought to meet a query condition that contains the value without
trimming spaces. This applies to both intCast and floatCast functions.
2020-07-21 12:57:09 -07:00
Harshavardhana
eb6bf454f1
fix: copyObject encryption from unencrypted object (#10102)
This is a continuation of #10085
2020-07-21 12:25:01 -07:00
Harshavardhana
ec06089eda
fix: re-implement cluster healthcheck (#10101) 2020-07-20 18:31:22 -07:00
Harshavardhana
0c4be55936
fix: fix lockup in merge-walk pool (#10098)
Fixes two different types of problems

- continuation of the problem seen in FS #9992
  as not fixed for erasure coded deployments,
  reproduced this issue with spark and its fixed now

- another issue was leaking walk go-routines which
  would lead to high memory usage and crash the system
  this is simply because all the walks which were purged
  at the top limit had leaking end walkers which would
  consume memory endlessly.

closes #9966
closes #10088
2020-07-20 17:28:26 -07:00
Harshavardhana
11d21d5d1b
fix: pass around the correct drives per set (#10097)
this is a precursor change before adding parity
based SLA across zones instead of same stripe size
2020-07-20 16:38:40 -07:00
findmyname666
f9648d3976
add tests lifecycle rules with empty prefix (#10093) 2020-07-20 13:12:50 -07:00
Harshavardhana
2955aae8e4
feat: Add notification support for bucketCreates and removal (#10075) 2020-07-20 12:52:49 -07:00
Harshavardhana
9fd836e51f
add dnsStore interface for upcoming operator webhook (#10077) 2020-07-20 12:28:48 -07:00
Anis Elleuch
518f44908c
fs: Close object fs.json before deletion (#10092)
NFS fails when deleting a file while it is already opened. The reason is
that the object fs.json meta file is opened but not closed before
removal.
2020-07-20 08:52:24 -07:00
Minio Trusted
38f60b3c1d Update yaml files to latest version RELEASE.2020-07-20T02-25-16Z 2020-07-20 02:41:52 +00:00
Harshavardhana
e2c71717f8
add different TCP timeouts for internal and incoming (#10090)
closes #10086
2020-07-19 17:16:12 -07:00
Harshavardhana
7764c542f2
allow claims to be optional in STS (#10078)
not all claims need to be present for
the JWT claim, let the policies not
exist and only apply which are present
when generating the credentials

once credentials are generated then
those policies should exist, otherwise
the request will fail.
2020-07-19 15:34:01 -07:00
findmyname666
aa6468932b
make sure lifecycle rule ID is present (#10084) 2020-07-19 15:10:05 -07:00
Harshavardhana
30104cb12b docs: fix veeam document formatting 2020-07-18 18:38:12 -07:00
Harshavardhana
d53e560ce0
fix: copyObject key rotation issue (#10085)
- copyObject in-place decryption failed
  due to incorrect verification of headers
- do not decode ETag when object is encrypted
  with SSE-C, so that pre-conditions don't fail
  prematurely.
2020-07-18 17:36:32 -07:00
Anis Elleuch
44c8af66ad
fs: Fix expiry regression after versioning refactor (#10083)
Do not ignore non-versioned objects in lifecycle compute 
action function.
2020-07-18 15:43:13 -07:00
Minio Trusted
68aaa5bbc3 Update yaml files to latest version RELEASE.2020-07-18T18-48-16Z 2020-07-18 19:04:01 +00:00
Harshavardhana
17747db93f
fix: support healing older content (#10076)
This PR adds support for healing older
content i.e from 2yrs, 1yr. Also handles
other situations where our config was
not encrypted yet.

This PR also ensures that our Listing
is consistent and quorum friendly,
such that we don't list partial objects
2020-07-17 17:41:29 -07:00
Harshavardhana
3fe27c8411
fix: In federated setup dial all hosts to figure out online host (#10074)
In federated NAS gateway setups, multiple hosts in srvRecords
was picked at random which could mean that if one of the
host was down the request can indeed fail and if client
retries it would succeed. Instead allow server to figure
out the current online host quickly such that we can
exclude the host which is down.

At the max the attempt to look for a downed node is to
300 millisecond, if the node is taking longer to respond
than this value we simply ignore and move to the node,
total attempts are equal to number of srvRecords if no
server is online we simply fallback to last dialed host.
2020-07-17 14:25:47 -07:00
Harshavardhana
14b1c9f8e4
fix: return Range errors after If-Matches (#10045)
closes #7292
2020-07-17 13:01:22 -07:00
Klaus Post
d84fc58cac
fix: CheckParts endpoint call to correct API (#10073)
CheckParts is calling the wrong endpoint, so instead of 
checking parts, it is writing metadata.
2020-07-17 10:17:59 -07:00
Harshavardhana
187c3f62df
fix: heal replaced drives properly (#10069)
healing was not working properly when drives were
replaced, due to the error check in root disk
calculation this PR fixes this behavior

This PR also adds additional fix for missing
metadata entries from .minio.sys as part of
disk healing as well.

Added code to ignore and print more context
sensitive errors for better debugging.

This PR is continuation of fix in 7b14e9b660
2020-07-17 10:08:04 -07:00
Anis Elleuch
4a447a439a
Fix lifecycle rules not applied in some cases (#10072)
HasActiveRules was not behaving as expected, this commit fixes it
and adds more unit tests.
2020-07-17 09:48:00 -07:00
Harshavardhana
4bfc50411c
fix: return versionId in tagging APIs (#10068) 2020-07-16 22:38:58 -07:00
Eco
5e8392c8ef
Update Veeam integration doc with immutability references (#10067) 2020-07-16 17:16:53 -07:00
Harshavardhana
d3c81a6e93
add missing available space from metrics (#10065) 2020-07-16 14:43:48 -07:00
Harshavardhana
7342b5355f
fix: obtain correct location string with DNS style buckets (#10060)
closes #10054
2020-07-16 13:28:29 -07:00
鸿则
1341bf5a9e
add preview width constraint (#10062)
* fix: add preview width constraint

* fix: set object item dropdown menu nowrap style
2020-07-16 10:46:27 -07:00
findmyname666
48aebf2d9d
allow lifecycle rules with overlapping prefixes (#10053) 2020-07-16 07:39:41 -07:00
Harshavardhana
7b14e9b660
fix: diskInfo should check diskID only if disk is online (#10058)
closes #10057
2020-07-16 07:30:05 -07:00
Harshavardhana
07eb24b775
add absolute path for images (#10056) 2020-07-16 00:06:14 -07:00
Harshavardhana
cd849bc2ff
update STS docs with new values (#10055)
Co-authored-by: Poorna <poornas@users.noreply.github.com>
2020-07-15 14:36:14 -07:00
Harshavardhana
9c66812b99
Add missing action stringer for DeleteVersionAction (#10049) 2020-07-15 17:28:49 +05:30
Harshavardhana
ec91fa55db
docs: Add more STS docs with dex and python example (#10047) 2020-07-15 17:25:55 +05:30
Klaus Post
00d3cc4b69
Enforce quota checks after crawl (#10036)
Enforce bucket quotas when crawling has finished. 
This ensures that we will not do quota enforcement on old data.

Additionally, delete less if we are closer to quota than we thought.
2020-07-14 18:59:05 -07:00
Harshavardhana
14ff7f5fcf
add hdfs sub-path support (#10046)
for users who don't have access to HDFS rootPath '/'
can optionally specify `minio gateway hdfs hdfs://namenode:8200/path`
for which they have access to, allowing all writes to be
performed at `/path`.

NOTE: once configured in this manner you need to make
sure command line is correctly specified, otherwise
your data might not be visible

closes #10011
2020-07-14 15:49:10 -07:00