Commit Graph

4024 Commits

Author SHA1 Message Date
Harshavardhana
c885777ac6
Add support for TCP_QUICKACK (#11369)
TCP_QUICKACK is a setting that allows TCP endpoints
to acknowledge the receipt of data instantly in situations
where they would normally wait to see if more data
would be arriving.

https://assets.extrahop.com/whitepapers/TCP-Optimization-Guide-by-ExtraHop.pdf
2021-02-02 09:44:18 -08:00
Poorna Krishnamoorthy
fe3aca70c3
Make number of replication workers configurable. (#11379)
MINIO_API_REPLICATION_WORKERS env.var and
`mc admin config set api` allow number of replication
workers to be configurable. Defaults to half the number
of cpus available.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-02-02 16:45:06 +05:30
Ritesh H Shukla
c4848f9b4f
Add process start time to cluster metrics. (#11405) 2021-02-01 23:02:18 -08:00
Andreas Auernhammer
838d4dafbd
gateway: don't use encrypted ETags for If-Match (#11400)
This commit fixes a bug in the S3 gateway that causes
GET requests to fail when the object is encrypted by the
gateway itself.

The gateway was not able to GET the object since it always
specified a `If-Match` pre-condition checking that the object
ETag matches an expected ETag - even for encrypted ETags.

The problem is that an encrypted ETag will never match the ETag
computed by the backend causing the `If-Match` pre-condition
to fail.

This commit fixes this by not sending an `If-Match` header when
the ETag is encrypted. This is acceptable because:
  1. A gateway-encrypted object consists of two objects at the backend
     and there is no way to provide a concurrency-safe implementation
     of two consecutive S3 GETs in the deployment model of the S3
     gateway.
     Ref: S3 gateways are self-contained and isolated - and there may
          be multiple instances at the same time (no lock across
          instances).
  2. Even if the data object changes (concurrent PUT) while gateway
     A has download the metadata object (but not issued the GET to
     the data object => data race) then we don't return invalid data
     to the client since the decryption (of the currently uploaded data)
     will fail - given the metadata of the previous object.
2021-02-01 23:02:08 -08:00
Anis Elleuch
e96fdcd5ec
tagging: Add event notif for PUT object tagging (#11366)
An optimization to avoid double calling for during PutObject tagging
2021-02-01 13:52:51 -08:00
Anis Elleuch
6ef678663e
xl: Create a delete-marker when no other version exists (#11362)
Currently, it is not possible to create a delete-marker when xl.meta
does not exist (no version is created for that object yet). This makes a
problem for replication and mc mirroring with versioning enabled.

This also follows S3 specification.
2021-02-01 13:23:50 -08:00
Harshavardhana
f737a027cf fix: regression introduced in federated listing buckets
regression was introduced in
6cd255d516 fix it properly.
2021-02-01 12:06:58 -08:00
Anis Elleuch
65aa2bc614
ilm: Remove object in HEAD/GET if having an applicable ILM rule (#11296)
Remove an object on the fly if there is a lifecycle rule with delete
expiry action for the corresponding object.
2021-02-01 09:52:11 -08:00
Andreas Auernhammer
33554651e9
crypto: deprecate native Hashicorp Vault support (#11352)
This commit deprecates the native Hashicorp Vault
support and removes the legacy Vault documentation.

The native Hashicorp Vault documentation is marked as
outdated and deprecated for over a year now. We give
another 6 months before we start removing Hashicorp Vault
support and show a deprecation warning when a MinIO server
starts with a native Vault configuration.
2021-01-29 17:55:37 -08:00
Poorna Krishnamoorthy
c82aef0a56
fix ObjectInfo returned by CopyObject (#11377)
erasure CopyObject was returning old metadata
2021-01-29 14:49:18 -08:00
Harshavardhana
1e53bf2789
fix: allow expansion with newer constraints for older setups (#11372)
currently we had a restriction where older setups would
need to follow previous style of "stripe" count being same
expansion, we can relax that instead newer pools can be
expanded for older setups with newer constraints of
common parity ratio.
2021-01-29 11:40:55 -08:00
Ritesh H Shukla
c8489a8f0c
fix: log notification errors only once (#11350) 2021-01-28 13:40:31 -08:00
Klaus Post
2680772d4b
Don't mark remotes online when shutting down (#11368)
Shutting down will mark remotes online when the shutdown has 
started since the context is canceled.

For example:

```
API: SYSTEM()
Time: 16:21:31 CET 01/28/2021
DeploymentID: 313b0065-c5a1-4aa3-9233-07223e77a730
Error: Storage resources are insufficient for the write operation .minio.sys/tmp/ced455c4-3d27-4bdd-95fc-b4707a179b8a/fd934ef3-8fc8-4330-abc1-f039fbbb9700/part.1 (cmd.InsufficientWriteQuorum)
       1: d:\minio\minio\cmd\data-usage.go:56:cmd.storeDataUsageInBackend()
Exiting on signal: INTERRUPT
Client http://127.0.0.1:9002/minio/lock/v5 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d3/v24 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d2/v24 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d1/v24 online
Client http://127.0.0.1:9002/minio/peer/v12 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d4/v24 online
```

Use a fresh context for health checks.
2021-01-28 13:38:12 -08:00
Harshavardhana
567f7bdd05 fix: verify overlapping domains when > 1 2021-01-28 13:08:53 -08:00
Harshavardhana
6cd255d516
fix: allow updated domain names in federation (#11365)
additionally also disallow overlapping domain names
2021-01-28 11:44:48 -08:00
Aditya Manthramurthy
e79829b5b3
Bind to lookup user after user auth to lookup ldap groups (#11357) 2021-01-27 17:31:21 -08:00
Poorna Krishnamoorthy
fd3f02637a
fix: replication regression due to proxying requests (#11356)
In PR #11165 due to incorrect proxying for 2 
way replication even when the object was not 
yet replicated

Additionally, fix metadata comparisons when
deciding to do full replication vs metadata copy.

fixes #11340
2021-01-27 11:22:34 -08:00
Harshavardhana
e019f21bda
fix: trigger heal if one of the parts are not found (#11358)
Previously we added heal trigger when bit-rot checks
failed, now extend that to support heal when parts
are not found either. This healing gets only triggered
if we can successfully decode the object i.e read
quorum is still satisfied for the object.
2021-01-27 10:21:14 -08:00
Anis Elleuch
e9ac7b0fb7
heal: Remove empty directories (#11354)
Since the introduction of __XLDIR__, an empty directory does not have a
meaning anymore in erasure mode. Make healing removes it wherever it
finds it.
2021-01-27 02:19:28 -08:00
Harshavardhana
1debd722b5 rename last remaining Zone->Pool 2021-01-26 20:47:42 -08:00
massintha azamoum
e7f6051f19
Send bucket name to peers when bucket notification is enabled (#11351) 2021-01-26 13:48:28 -08:00
Harshavardhana
6717295e18 fix: rename audit log docs and datastructure 2021-01-26 13:39:55 -08:00
Anis Elleuch
00cff1aac5
audit: per object send pool number, set number and servers per operation (#11233) 2021-01-26 13:21:51 -08:00
Harshavardhana
9722531817 fix: purge LDAP deprecated keys 2021-01-26 09:53:29 -08:00
Harshavardhana
5c6bfae4c7
fix: load credentials from etcd directly when possible (#11339)
under large deployments loading credentials might be
time consuming, while this is okay and we will not
respond quickly for `mc admin user list` like queries
but it is possible to support `mc admin user info`

just like how we handle authentication by fetching
the user directly from persistent store.

additionally support service accounts properly,
reloaded from etcd during watch() - this was missing

This PR is also half way remedy for #11305
2021-01-25 20:01:49 -08:00
Aditya Manthramurthy
5f51ef0b40
Add LDAP Lookup-Bind mode (#11318)
This change allows the MinIO server to be configured with a special (read-only)
LDAP account to perform user DN lookups.

The following configuration parameters are added (along with corresponding
environment variables) to LDAP identity configuration (under `identity_ldap`):

- lookup_bind_dn / MINIO_IDENTITY_LDAP_LOOKUP_BIND_DN
- lookup_bind_password / MINIO_IDENTITY_LDAP_LOOKUP_BIND_PASSWORD
- user_dn_search_base_dn / MINIO_IDENTITY_LDAP_USER_DN_SEARCH_BASE_DN
- user_dn_search_filter / MINIO_IDENTITY_LDAP_USER_DN_SEARCH_FILTER

This lookup-bind account is a service account that is used to lookup the user's
DN from their username provided in the STS API. When configured, searching for
the user DN is enabled and configuration of the base DN and filter for search is
required. In this "lookup-bind" mode, the username format is not checked and must
not be specified. This feature is to support Active Directory setups where the
DN cannot be simply derived from the username.

When the lookup-bind is not configured, the old behavior is enabled: the minio
server performs LDAP lookups as the LDAP user making the STS API request and the
username format is checked and configuring it is required.
2021-01-25 14:26:10 -08:00
Harshavardhana
7e266293e6
fix: notify bucket replication after replication/ilm (#11343) 2021-01-25 14:04:41 -08:00
Harshavardhana
eb6871ecd9
fix: LoginSTS should be an inline implementation (#11337)
STS tokens can be obtained by using local APIs
once the remote JWT token is presented, current
code was not validating the incoming token in the
first place and was incorrectly making a network
operation using that token.

For the most part this always works without issues,
but under adversarial scenarios it exposes client
to hand-craft a request that can reach internal
services without authentication.

This kind of proxying should be avoided before
validating the incoming token.
2021-01-25 10:15:03 -08:00
Harshavardhana
9cdd981ce7
fix: expire locks only on participating lockers (#11335)
additionally also add a new ForceUnlock API, to
allow forcibly unlocking locks if possible.
2021-01-25 10:01:27 -08:00
Anis Elleuch
bd8020aba8
heal: Decode object name in healing result (#11348)
The user can see __XLDIR__ prefix in mc admin heal when the command
heals an empty object with a trailing slash. This commit decodes the
name of the object before sending it back to the upper level.
2021-01-25 09:53:37 -08:00
Harshavardhana
09bc49bd51
fix: healBucket across sets should capture results properly (#11341)
healing `.minio.sys/config` returns incorrect quorum errors
across sets, healing of the buckets.
2021-01-25 09:45:09 -08:00
Harshavardhana
82f0471d1b
honor maxWait heal config when maxIO hits (#11338) 2021-01-25 07:53:12 -08:00
Harshavardhana
6a95f412c9
avoid double CORS headers in federation (#11334)
CORS proxying adds double headers one
by the receiving server, one by proxied
server. Remove them before proxying
when 'Origin' header is found.
2021-01-23 18:27:23 -08:00
Ritesh H Shukla
7575c24037
Add open FD and FD limit to cluster metrics (#11328) 2021-01-22 18:30:16 -08:00
Harshavardhana
43f973c4cf
fix: check for O_DIRECT support for reads and writes (#11331)
In-case user enables O_DIRECT for reads and backend does
not support it we shall proceed to turn it off instead
and print a warning. This validation avoids any unexpected
downtimes that users may incur.
2021-01-22 15:38:21 -08:00
Harshavardhana
1b453728a3
initialize forwarder after init() to avoid crashes (#11330)
DNSCache dialer is a global value initialized in
init(), whereas `go` keeps `var =` before `init()`
, also we don't need to keep proxy routers as
global entities - register the forwarder as
necessary to avoid crashes.
2021-01-22 15:37:41 -08:00
Harshavardhana
a6c146bd00
validate storage class across pools when setting config (#11320)
```
mc admin config set alias/ storage_class standard=EC:3
```

should only succeed if parity ratio is valid for all
server pools, if not we should fail proactively.

This PR also needs to bring other changes now that
we need to cater for variadic drive counts per pool.

Bonus fixes also various bugs reproduced with

- GetObjectWithPartNumber()
- CopyObjectPartWithOffsets()
- CopyObjectWithMetadata()
- PutObjectPart,PutObject with truncated streams
2021-01-22 12:09:24 -08:00
Klaus Post
2167ba0111
Feed correct part number to sio (#11326)
When offsets were specified we relied on the first part number to be correct.

Recalculate based on offset.
2021-01-21 08:43:03 -08:00
Klaus Post
4e6d717f39
Compress profiling data (#11313)
Trace data can be rather large and compresses fine.

Compress profile data in zip files:

```
277.895.314 before.profiles.zip
152.800.318 after.profiles.zip
```
2021-01-20 15:49:53 -08:00
Poorna Krishnamoorthy
845e251fa9
fix: crash in notificationsys when peers online is 0 (#11307)
Check if the number of peers online > 0 before using peerClient
2021-01-20 13:13:05 -08:00
Harshavardhana
d1a8f0b786
fix possible crashes on deleteMarker replication (#11308)
Delete marker can have `metaSys` set to nil, that
can lead to crashes after the delete marker has
been healed.

Additionally also fix isObjectDangling check
for transitioned objects, that do not have parts
should be treated similar to Delete marker.
2021-01-20 13:12:12 -08:00
Klaus Post
dac19d7272
Clarify root disk error (#11314)
Make it clearer what the problem is and how to resolve it.
2021-01-20 13:11:42 -08:00
Harshavardhana
7624c8b9bb
fix: honor storage class uniformity for multiple pools (#11309) 2021-01-20 01:41:18 -08:00
Klaus Post
19fb1086b2
select: Fix leak on compressed files (#11302)
Properly close gzip reader when done reading

fixes #11300
2021-01-19 17:51:46 -08:00
Harshavardhana
a5e23a40ff
fix: allow delayed etcd updates to have fallbacks (#11151)
fixes #11149
2021-01-19 10:05:41 -08:00
Harshavardhana
1ad2b7b699
fix: add stricter validation for erasure server pools (#11299)
During expansion we need to validate if

- new deployment is expanded with newer constraints
- existing deployment is expanded with older constraints
- multiple server pools rejected if they have different
  deploymentID and distribution algo
2021-01-19 10:01:31 -08:00
Harshavardhana
b5049d541f
fix: reduce an extra readdir() attempted on non-legacy setups (#11301)
to verify moving content and preserving legacy content,
we have way to detect the objects through readdir()
this path is not necessary for most common cases on
newer setups, avoid readdir() to save multiple system
calls.

also fix the CheckFile behavior for most common
use case i.e without legacy format.
2021-01-19 10:01:06 -08:00
Harshavardhana
e0055609bb
fix: crawler to skip healing the drives in a set being healed (#11274)
If an erasure set had a drive replacement recently, we don't
need to attempt healing on another drive with in the same erasure
set - this would ensure we do not double heal the same content
and also prioritizes usage for such an erasure set to be calculated
sooner.
2021-01-19 02:40:52 -08:00
Klaus Post
e8ce348da1
crypto: Escape JSON text (#10794)
Escape the JSON keys+values from the context.

We do not add the HTML escapes, since that is an extra escape level not mandatory for JSON.
2021-01-19 01:39:04 -08:00
Ritesh H Shukla
b4add82bb6
Updated Prometheus metrics (#11141)
* Add metrics for nodes online and offline
* Add cluster capacity metrics
* Introduce v2 metrics
2021-01-18 20:35:38 -08:00
Harshavardhana
3ca6330661
fix: optimize parentDirIsObject by moving isObject to storage layer (#11291)
For objects with `N` prefix depth, this PR reduces `N` such network
operations by converting `CheckFile` into a single bulk operation.

Reduction in chattiness here would allow disks to be utilized more
cleanly, while maintaining the same functionality along with one
extra volume check stat() call is removed.

Update tests to test multiple sets scenario
2021-01-18 12:25:22 -08:00
Aditya Manthramurthy
3163a660aa
Fix support for multiple LDAP user formats (#11276)
Fixes support for using multiple base DNs for user search in the LDAP directory
allowing users from different subtrees in the LDAP hierarchy to request
credentials.

- The username in the produced credentials is now the full DN of the LDAP user
to disambiguate users in different base DNs.
2021-01-17 21:54:32 -08:00
Harshavardhana
0dadfd1b3d
fix: do not compute usage for not found lifecycle operations (#11288)
Currently we would proceed to apply incorrect lifecycle policies
for non-existent objects, this PR handles them appropriately.
2021-01-17 13:58:41 -08:00
Harshavardhana
4315f93421
fix: make sure parentDirIsObject is used at set level (#11280)
parentDirIsObject is not using set level understanding
to check for parent objects, without this it can lead to
objects that can actually reside on a separate set as
objects and would conflict.
2021-01-17 01:11:48 -08:00
Harshavardhana
ddb5d7043a fix: standard storage class is allowed to be '0' 2021-01-16 17:32:25 -08:00
Harshavardhana
f903cae6ff
Support variable server pools (#11256)
Current implementation requires server pools to have
same erasure stripe sizes, to facilitate same SLA
and expectations.

This PR allows server pools to be variadic, i.e they
do not have to be same erasure stripe sizes - instead
they should have SLA for parity ratio.

If the parity ratio cannot be guaranteed by the new
server pool, the deployment is rejected i.e server
pool expansion is not allowed.
2021-01-16 12:08:02 -08:00
Poorna Krishnamoorthy
7090bcc8e0
fix: doc links and delete replication permissions enforcement (#11285) 2021-01-15 15:22:55 -08:00
Harshavardhana
c222bde14b
fix: use common logging implementation for DNSCache (#11284) 2021-01-15 14:04:56 -08:00
Poorna Krishnamoorthy
feaf8dfb9a
Fix replication status reported on completion (#11273)
Fixes: #11272
2021-01-13 11:52:28 -08:00
Harshavardhana
628ef081d1
fix: preserve cache calculated previously while moving from v2 to v3 (#11269)
This ensures that all the prometheus monitoring and usage
trackers to avoid alerts configured, although we cannot
support v1 to v2 here - we can v2 to v3.
2021-01-13 09:58:08 -08:00
Harshavardhana
44dff36ff7
listing with prefix prefixed with '/' should be ignored (#11268)
fixes #11265
2021-01-13 09:44:11 -08:00
Poorna Krishnamoorthy
b97d53b29c
fix remote target healthcheck (#11267) 2021-01-12 20:48:04 -08:00
Harshavardhana
1a5775e2e8
enable small and large file optimization (#11260)
- for large objects we found that 1MiB block for
  r/w respectively.
- for small objects we found that 128KiB block for
  r/w respectively.
2021-01-12 10:20:39 -08:00
Anis Elleuch
e2579b1f5a
azure: Use default upload parameters to avoid consuming too much memory (#11251)
A lot of memory is consumed when uploading small files in parallel, use
the default upload parameters and add MINIO_AZURE_UPLOAD_CONCURRENCY for
users to tweak.
2021-01-11 22:48:09 -08:00
Poorna Krishnamoorthy
7824e19d20
Allow synchronous replication if enabled. (#11165)
Synchronous replication can be enabled by setting the --sync
flag while adding a remote replication target.

This PR also adds proxying on GET/HEAD to another node in a
active-active replication setup in the event of a 404 on the current node.
2021-01-11 22:36:51 -08:00
Harshavardhana
317305d5f9
fix: regression in adding new replication targets (#11257) 2021-01-11 09:08:42 -08:00
Harshavardhana
e4e117faab
fix: enable xl.json to xl.meta only if legacy drive is found (#11255)
another optimization is renameLegacyMetadata() never needs
to validate bucket with os.Stat() again, leading to reduction
in one extra syscall.
2021-01-11 02:27:04 -08:00
Klaus Post
51dad1d130
Fix missing GetObjectNInfo Closure (#11243)
Review for missing Close of returned value from `GetObjectNInfo`.

This was often obscured by the stuff that auto-unlocks when reaching EOF.
2021-01-08 10:12:26 -08:00
Harshavardhana
4593b146be
fix: print errors only when metacache status has errors (#11248) 2021-01-08 16:52:19 +05:30
Harshavardhana
f21d650ed4
fix: readData in bulk call using messagepack byte wrappers (#11228)
This PR refactors the way we use buffers for O_DIRECT and
to re-use those buffers for messagepack reader writer.

After some extensive benchmarking found that not all objects
have this benefit, and only objects smaller than 64KiB see
this benefit overall.

Benefits are seen from almost all objects from

1KiB - 32KiB

Beyond this no objects see benefit with bulk call approach
as the latency of bytes sent over the wire v/s streaming
content directly from disk negate each other with no
remarkable benefits.

All other optimizations include reuse of msgp.Reader,
msgp.Writer using sync.Pool's for all internode calls.
2021-01-07 19:27:31 -08:00
Harshavardhana
a4f6705874
expire stale locks when owner is down (#11247)
fixes #11246
2021-01-07 19:16:18 -08:00
Poorna Krishnamoorthy
b35b537e3f
Pass versionID to checkReplicateDelete in web handler (#11244) 2021-01-07 15:28:27 -08:00
Harshavardhana
5c52d5ffc7
fix: treat errVolumeNotFound as EOF error in listPathRaw (#11238) 2021-01-07 09:52:53 -08:00
Harshavardhana
f0808bb2e5
fix: getObject fd leaks in transition and replication code (#11237) 2021-01-06 16:13:10 -08:00
Harshavardhana
a6dee21092
initialize IAM store before Init() to avoid any crash (#11236) 2021-01-06 13:40:20 -08:00
Anis Elleuch
6f781c5e7a
heal: Reduce whitespace ticker to 5 seconds (#11234)
30 seconds white spaces is long for some setups which time out when no
read activity in short time, reduce the subnet health white space ticker
to 5 seconds, since it has no cost at all.
2021-01-06 13:29:50 -08:00
Harshavardhana
f8ca859790
fix: server/gateway banner formatting (#11230) 2021-01-06 10:38:07 -08:00
Harshavardhana
76e2713ffe
fix: use buffers only when necessary for io.Copy() (#11229)
Use separate sync.Pool for writes/reads

Avoid passing buffers for io.CopyBuffer()
if the writer or reader implement io.WriteTo or io.ReadFrom
respectively then its useless for sync.Pool to allocate
buffers on its own since that will be completely ignored
by the io.CopyBuffer Go implementation.

Improve this wherever we see this to be optimal.

This allows us to be more efficient on memory usage.
```
   385  // copyBuffer is the actual implementation of Copy and CopyBuffer.
   386  // if buf is nil, one is allocated.
   387  func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
   388  	// If the reader has a WriteTo method, use it to do the copy.
   389  	// Avoids an allocation and a copy.
   390  	if wt, ok := src.(WriterTo); ok {
   391  		return wt.WriteTo(dst)
   392  	}
   393  	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
   394  	if rt, ok := dst.(ReaderFrom); ok {
   395  		return rt.ReadFrom(src)
   396  	}
```

From readahead package
```
// WriteTo writes data to w until there's no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
func (a *reader) WriteTo(w io.Writer) (n int64, err error) {
	if a.err != nil {
		return 0, a.err
	}
	n = 0
	for {
		err = a.fill()
		if err != nil {
			return n, err
		}
		n2, err := w.Write(a.cur.buffer())
		a.cur.inc(n2)
		n += int64(n2)
		if err != nil {
			return n, err
		}
```
2021-01-06 09:36:55 -08:00
Harshavardhana
b5d291ea88
fix: rename remaining zone -> pool (#11231) 2021-01-06 09:35:47 -08:00
Klaus Post
eb9172eecb
Allow Compression + encryption (#11103) 2021-01-05 20:08:35 -08:00
Poorna Krishnamoorthy
64bddf47d8
Pass deletemarker correctly to replicate opts (#11227)
fixes: #11180
2021-01-05 14:12:37 -08:00
Harshavardhana
4ed45ce543
fix: healing buckets during pool expansion (#11224)
fixes #11209
2021-01-05 13:24:22 -08:00
Klaus Post
ad511b0eb8
tests: Fix occasional data race (#11223)
CI tests could trigger a data race.

Servers are generally not expected to reinitialize, so tests could trigger data races when reinitializing and async operations are running.

We add the option to safely reset global vars instead of overwriting.

Fixes races like:

```
WARNING: DATA RACE
Read at 0x00000477ab18 by goroutine 1159:
  github.com/minio/minio/cmd.FileInfo.ToObjectInfo()
      /home/runner/work/minio/minio/cmd/erasure-metadata.go:105 +0x16d
  github.com/minio/minio/cmd.erasureObjects.putObject()
      /home/runner/work/minio/minio/cmd/erasure-object.go:748 +0x13f8
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3.2()
      /home/runner/work/minio/minio/cmd/metacache-set.go:682 +0x7d3
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1.2()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:777 +0x1c4
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:806 +0x614

Previous write at 0x00000477ab18 by goroutine 1269:
  [failed to restore the stack]

Goroutine 1159 (running) created at:
  github.com/minio/minio/cmd.newMetacacheBlockWriter()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:760 +0x112
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3()
      /home/runner/work/minio/minio/cmd/metacache-set.go:672 +0xe22

Goroutine 1269 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1095 +0x537
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1339 +0xa6
  testing.tRunner()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1050 +0x1eb
  testing.runTests()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1337 +0x594
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1252 +0x2ff
  github.com/minio/minio/cmd.TestMain()
      /home/runner/work/minio/minio/cmd/test-utils_test.go:120 +0x44e
  main.main()
      _testmain.go:1408 +0x223
==================
==================
WARNING: DATA RACE
Read at 0x00000477aae8 by goroutine 1159:
  github.com/minio/minio/cmd.(*BucketVersioningSys).Enabled()
      /home/runner/work/minio/minio/cmd/bucket-versioning.go:26 +0x52
  github.com/minio/minio/cmd.FileInfo.ToObjectInfo()
      /home/runner/work/minio/minio/cmd/erasure-metadata.go:105 +0x197
  github.com/minio/minio/cmd.erasureObjects.putObject()
      /home/runner/work/minio/minio/cmd/erasure-object.go:748 +0x13f8
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3.2()
      /home/runner/work/minio/minio/cmd/metacache-set.go:682 +0x7d3
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1.2()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:777 +0x1c4
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:806 +0x614

Previous write at 0x00000477aae8 by goroutine 1269:
  [failed to restore the stack]

Goroutine 1159 (running) created at:
  github.com/minio/minio/cmd.newMetacacheBlockWriter()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:760 +0x112
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3()
      /home/runner/work/minio/minio/cmd/metacache-set.go:672 +0xe22

Goroutine 1269 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1095 +0x537
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1339 +0xa6
  testing.tRunner()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1050 +0x1eb
  testing.runTests()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1337 +0x594
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1252 +0x2ff
  github.com/minio/minio/cmd.TestMain()
      /home/runner/work/minio/minio/cmd/test-utils_test.go:120 +0x44e
  main.main()
      _testmain.go:1408 +0x223
==================
```
2021-01-05 10:45:26 -08:00
Harshavardhana
cb0eaeaad8
feat: migrate to ROOT_USER/PASSWORD from ACCESS/SECRET_KEY (#11185) 2021-01-05 10:22:57 -08:00
Harshavardhana
d0027c3c41
do not use large buffers if not necessary (#11220)
without this change, there is a performance
regression for small objects GETs, this makes
the overall speed to go back to pre '59d363'
commit days.
2021-01-04 18:51:52 -08:00
Anis Elleuch
cb7fc99368
handlers: Avoid initializing a struct in each handler call (#11217) 2021-01-04 09:54:22 -08:00
Harshavardhana
a4383051d9
remove/deprecate crawler disable environment (#11214)
with changes present to automatically throttle crawler
at runtime, there is no need to have an environment
value to disable crawling. crawling is a fundamental
piece for healing, lifecycle and many other features
there is no good reason anyone would need to disable
this on a production system.

* Apply suggestions from code review
2021-01-04 09:43:31 -08:00
Harshavardhana
e7ae49f9c9
fix: calculate prometheus disks_offline/disks_total correctly (#11215)
fixes #11196
2021-01-04 09:42:09 -08:00
Anis Elleuch
153d4be032
tracing: NumSubscribers() to use atomic instead of mutex (#11219)
globalSubscribers.NumSubscribers() is heavily used in S3 requests and it
uses mutex, use atomic.Load instead since it is faster

Co-authored-by: Anis Elleuch <anis@min.io>
2021-01-04 09:40:30 -08:00
Anis Elleuch
dfd99b6d8f
handlers: Little bit more optimizations (#11211) 2021-01-04 00:01:06 -08:00
Harshavardhana
c4b1d394d6
erasure: avoid io.Copy in hotpaths to reduce allocation (#11213) 2021-01-03 16:27:34 -08:00
Harshavardhana
c4131c2798
feat: Small object optimization read data in single bulk call (#11207) 2021-01-03 11:27:57 -08:00
Anis Elleuch
c9d502e6fa
parentDirIsObject() to return quickly with inexistant parent (#11204)
Rewrite parentIsObject() function. Currently if a client uploads
a/b/c/d, we always check if c, b, a are actual objects or not.

The new code will check with the reverse order and quickly quit if 
the segment doesn't exist.

So if a, b, c in 'a/b/c' does not exist in the first place, then returns
false quickly.
2021-01-02 12:01:29 -08:00
Anis Elleuch
677e80c0f8
xl: Remove check-dir in ReadVersion (#11200)
The only purpose of check-dir flag in
ReadVersion is to return 404 when
an object has xl.meta but without data.

This is causing an extract call to the disk 
which can be penalizing in case of busy system
where disks receive many concurrent access.
2021-01-02 10:35:57 -08:00
Harshavardhana
aa85af4d1a fix: missing CopyObjectPart maxClients reorder 2021-01-01 23:07:37 -08:00
Anis Elleuch
ae731d232f
trace: Reorder http/trace maxClients wrapping for correct tracing (#11202)
mc admin trace does not show the correct handler name in the output: it
is printing `maxClients' for all handlers. The reason is that the wrong
order of handler wrapping.
2021-01-01 23:06:07 -08:00
Anis Elleuch
a317d220ed
xl-storage: Do not stat bucket assuming the object exists (#11201)
In HEAD/GET, only STAT the bucket if the 
object does not exist to return the correct
error response.
2021-01-01 09:44:36 -08:00
Harshavardhana
3e1221a01c
fix: log once updating dataUsageCache versions (#11190)
also reduce usage of *bytes.Buffer for
reading `usage-cache.bin`
2020-12-31 09:45:09 -08:00
Ritesh H Shukla
36fc2f98ed
fix: admin trace throttled requests (#11192) 2020-12-30 21:04:55 -08:00
Ritesh H Shukla
556524c715
Reduce logging when peer is offline (#11184) 2020-12-30 14:38:54 -08:00
Harshavardhana
cc457f1798
fix: enhance logging in crawler use console.Debug instead of logger.Info (#11179) 2020-12-29 01:57:28 -08:00
Harshavardhana
ca0d31b09a
fix: re-arrange handlers to handle requests on /minio (#11177)
fixes #11175
2020-12-28 17:10:33 -08:00
Harshavardhana
445a9bd827
fix: heal optimizations in crawler to avoid multiple healing attempts (#11173)
Fixes two problems

- Double healing when bitrot is enabled, instead heal attempt
  once in applyActions() before lifecycle is applied.

- If applyActions() is successful and getSize() returns proper
  value, then object is accounted for and should be removed
  from the oldCache namespace map to avoid double heal attempts.
2020-12-28 10:31:00 -08:00
Harshavardhana
d8d25a308f
fix: use HealObject for cleaning up dangling objects (#11171)
main reason is that HealObjects starts a recursive listing
for each object, this can be a really really long time on
large namespaces instead avoid recursive listing just
perform HealObject() instead at the prefix.

delete's already handle purging dangling content, we
don't need to achieve this by doing recursive listing,
this in-turn can delay crawling significantly.
2020-12-27 15:42:20 -08:00
Harshavardhana
c19e6ce773
avoid a crash in crawler when lifecycle is not initialized (#11170)
Bonus for static buffers use bytes.NewReader instead of
bytes.NewBuffer, to use a more reader friendly implementation
2020-12-26 22:58:06 -08:00
Harshavardhana
59d3639396
fix: inherit heal opts globally, including bitrot settings (#11166)
Bonus re-use ReadFileStream internal io.Copy buffers, fixes
lots of chatty allocations when reading metacache readers
with many sustained concurrent listing operations

```
   17.30GB  1.27% 84.80%    35.26GB  2.58%  io.copyBuffer
```
2020-12-24 23:04:03 -08:00
Harshavardhana
027e17468a
fix: discarding results do not attempt in-memory metacache writer (#11163)
Optimizations include

- do not write the metacache block if the size of the
  block is '0' and it is the first block - where listing
  is attempted for a transient prefix, this helps to
  avoid creating lots of empty metacache entries for
  `minioMetaBucket`

- avoid the entire initialization sequence of cacheCh
  , metacacheBlockWriter if we are simply going to skip
  them when discardResults is set to true.

- No need to hold write locks while writing metacache
  blocks - each block is unique, per bucket, per prefix
  and also is written by a single node.
2020-12-24 15:02:02 -08:00
Harshavardhana
45ea161f8d
webUI: change listing to 1000 keys from browser UI (#11159)
gateway implementations do not handle maxKeys being
`-1` properly unlike MinIO implementation, handle it
by setting an appropriate value.

fixes #11158
2020-12-23 19:58:15 -08:00
Harshavardhana
6a66f142d4
fix: strict quorum in list should list on all drives (#11157)
current implementation was incorrect, it in-fact
assumed only read quorum number of disks. in-fact
that value is only meant for read quorum good entries
from all online disks.

This PR fixes this behavior properly.
2020-12-23 09:26:40 -08:00
Harshavardhana
5982965839
fix: re-use bytes.Buffer using sync.Pool (#11156) 2020-12-22 23:22:37 -08:00
Harshavardhana
8565cefe4e fix: allow HTTP2.0 to be always configured 2020-12-22 16:32:58 -08:00
Andreas Auernhammer
8cdf2106b0
refactor cmd/crypto code for SSE handling and parsing (#11045)
This commit refactors the code in `cmd/crypto`
and separates SSE-S3, SSE-C and SSE-KMS.

This commit should not cause any behavior change
except for:
  - `IsRequested(http.Header)`

which now returns the requested type {SSE-C, SSE-S3,
SSE-KMS} and does not consider SSE-C copy headers.

However, SSE-C copy headers alone are anyway not valid.
2020-12-22 09:19:32 -08:00
Harshavardhana
35fafb837b
fix: issues with handling delete markers in metacache (#11150)
Additional cases handled

- fix address situations where healing is not
  triggered on failed writes and deletes.

- consider object exists during listing when
  metadata can be successfully decoded.
2020-12-22 09:16:43 -08:00
Harshavardhana
274bbad5cb
fix: select always online peers for remote listing (#11153)
always find the right set of online peers for remote listing,
this may have an effect on listing if the server is down - we
should do this to avoid always performing transient operations
on bucket->peerClient that is permanently or down for a long
period.
2020-12-22 09:16:07 -08:00
Harshavardhana
5c451d1690
update x/net/http2 to address few bugs (#11144)
additionally also configure http2 healthcheck
values to quickly detect unstable connections
and let them timeout.

also use single transport for proxying requests
2020-12-21 21:42:38 -08:00
Poorna Krishnamoorthy
c987313431
Encrypt remote target if kms is configured (#11034)
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2020-12-21 16:21:33 -08:00
Anis Elleuch
2ecaab55a6
admin: ServerInfo returns info without object layer initialized (#11142) 2020-12-21 09:35:19 -08:00
Harshavardhana
3e792ae2a2
fix: change defaults for DNS cache dialer (#11145) 2020-12-21 09:33:29 -08:00
Harshavardhana
4cc500a041
normalize users with double // in accessKeys (#11143)
Bonus fix, use constant time compare for secret keys  in web-handlers.go:SetAuth()
2020-12-20 10:09:51 -08:00
Harshavardhana
d8e28830cf
fix: allow STS creds for admin accounts to add users (#11138)
Allow rotating creds with privileges to add users

fixes https://github.com/minio/console/issues/529
2020-12-19 13:24:21 -08:00
Harshavardhana
3e16ec457a
fix: support user/groups with '/' character (#11127)
NOTE: user/groups with `//` shall be normalized to `/`

fixes #11126
2020-12-19 09:36:37 -08:00
Harshavardhana
e5d378931d
fix: delimiter based listing was broken without marker (#11136)
with missing nextMarker with delimiter based listing,
top level prefixes beyond 4500 or max-keys value
wouldn't be sent back for client to ask for the next
batch.

reproduced at a customer deployment, create prefixes
as shown below

```
for year in $(seq 2017 2020)
do
    for month in {01..12}
    do for day in {01..31}
       do
           mc -q cp file myminio/testbucket/dir/day_id=$year-$month-$day/;
       done
    done
done
```

Then perform

```
aws s3api --profile minio --endpoint-url http://localhost:9000 list-objects \
    --bucket testbucket --prefix dir/ --delimiter / --max-keys 1000
```

You shall see missing NextMarker, this would disallow listing beyond max-keys
requested and also disallow beyond 4500 (maxKeyObjectList) prefixes being listed
because client wouldn't know the NextMarker available.

This PR addresses this situation properly by making the implementation
more spec compatible. i.e NextMarker in-fact can be either an object, a prefix
with delimiter depending on the input operation.

This issue was introduced after the list caching changes and has been present
for a while.
2020-12-19 09:36:04 -08:00
Anis Elleuch
e63a10e505
Profiling does not required object layer to be initialized (#11133) 2020-12-18 11:51:15 -08:00
Anis Elleuch
5434088c51
replication: Ensure to always use nano precision source modtime (#11135) 2020-12-18 11:37:28 -08:00
Harshavardhana
a773cf48d8
fix: overlapping object and prefix rejected (#11130)
fixes #11129
2020-12-18 08:51:09 -08:00
Harshavardhana
f714840da7
add _MINIO_SERVER_DEBUG env for enabling debug messages (#11128) 2020-12-17 16:52:47 -08:00
Harshavardhana
7c9ef76f66
fix: timer deadlock on expired timers (#11124)
issue was introduced in #11106 the following
pattern

<-t.C // timer fired

if !t.Stop() {
   <-t.C // timer hangs
}

Seems to hang at the last `t.C` line, this
issue happens because a fired timer cannot be
Stopped() anymore and t.Stop() returns `false`
leading to confusing state of usage.

Refactor the code such that use timers appropriately
with exact requirements in place.
2020-12-17 12:35:02 -08:00
Anis Elleuch
cffdb01279
azure/s3 gateways: Pass ETag during GET call to avoid data corruption (#11024)
Both Azure & S3 gateways call for object information before returning
the stream of the object, however, the object content/length could be
modified meanwhile, which means it can return a corrupted object.

Use ETag to ensure that the object was not modified during the GET call
2020-12-17 09:11:14 -08:00
Harshavardhana
b390a2a0b9
fix: reuser timers in erasure set hotpaths (#11106)
reuser timers in

 - connectDisks() monitoring
 - healMRFRoutine() channel timeouts
2020-12-16 14:33:05 -08:00
Harshavardhana
90158f1e33
fix: avoid logging for Heal APIs in FS mode (#11121)
fixes #11120
2020-12-16 09:46:13 -08:00
Harshavardhana
c606c76323
fix: prioritized latest buckets for crawler to finish the scans faster (#11115)
crawler should only ListBuckets once not for each serverPool,
buckets are same across all pools, across sets and ListBuckets
always returns an unified view, once list buckets returns
sort it by create time to scan the latest buckets earlier
with the assumption that latest buckets would have lesser
content than older buckets allowing them to be scanned faster
and also to be able to provide more closer to latest view.
2020-12-15 17:34:54 -08:00
Klaus Post
e7d3b49a20
metacache: Make very small requests transient (#11109) 2020-12-15 11:25:36 -08:00
Harshavardhana
5df61ab96b
fix: remove gorilla/rpc/ deps fully after our fork (#11108) 2020-12-15 11:18:06 -08:00
Poorna Krishnamoorthy
3456b03b12
Ignore ObjectNotFound errors in delete api while enforcing locking (#11114)
AWS does not report this or version not found as errors in the response.
2020-12-15 11:15:49 -08:00
Klaus Post
f6fb27e8f0
Don't copy interesting ids, clean up logging (#11102)
When searching the caches don't copy the ids, instead inline the loop.

```
Benchmark_bucketMetacache_findCache-32    	   19200	     63490 ns/op	    8303 B/op	       5 allocs/op
Benchmark_bucketMetacache_findCache-32    	   20338	     58609 ns/op	     111 B/op	       4 allocs/op
```

Add a reasonable, but still the simplistic benchmark.

Bonus - make nicer zero alloc logging
2020-12-14 13:13:33 -08:00
Harshavardhana
8368ab76aa
fix: remove the requirement for healing buckets in ListBucketsHeal (#11098)
With new refactor of bucket healing, healing bucket happens
automatically including its metadata, there is no need to
redundant heal buckets also in ListBucketsHeal remove
it.
2020-12-14 12:07:07 -08:00
Harshavardhana
3e83643320
lifecycle improvements and additional debug logging (#11096)
Bonus change fix browser assets
2020-12-13 12:05:54 -08:00
Harshavardhana
2eb52ca5f4
fix: heal bucket metadata right before healing bucket (#11097)
optimization mainly to avoid listing the entire
`.minio.sys/buckets/.minio.sys` directory, this
can get really huge and comes in the way of startup
routines, contents inside `.minio.sys/buckets/.minio.sys`
are rather transient and not necessary to be healed.
2020-12-13 11:57:08 -08:00
Anis Elleuch
f164085227
xl: Always set root disk to true in test environment (#11094)
Tests environments (go test or manual testing) should always consider
the passed disks are root disks and should not rely on disk.IsRootDisk()
function. The reason is that this latter can return a false negative
when called in a busy system. However, returning a false negative will
only occur in a testing environment and not in a production, so we can
accept this trade-off for now.
2020-12-12 16:10:07 -08:00
Harshavardhana
48191dd748
return NoSuchVersion if invalid version-id is specified (#11091) 2020-12-11 20:44:08 -08:00
Anis Elleuch
c4f29d24da
metacache: Ask all disks when drive count is 4 (#11087) 2020-12-11 17:54:31 -08:00
Harshavardhana
db7890660e
fix: a crash when disk is nil, safe access on erasureDisks (#11089)
fixes #11088
2020-12-11 16:58:36 -08:00
Poorna Krishnamoorthy
9adc33efbb
Return version-id header in DeleteObject response (#11090)
even when the object version is non-existent

To make this consistent with aws behavior.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2020-12-11 16:58:15 -08:00
Poorna Krishnamoorthy
8f65aba04b
ignore NoSuchVersion error in DeleteObjects API (#11086)
Currently, the error response reports NoSuchVersion
for a non-existent version-id, whereas AWS ignores it.
2020-12-11 12:39:09 -08:00
Harshavardhana
3a0082f0f1
fix: TTFB prometheus metrics calculation (#11082)
until now metrics was reporting entire call
duration instead of ttfb's this PR fixes it
2020-12-10 23:02:25 -08:00
Klaus Post
4bca62a0bd
crawler: Stream bucket usage cache data (#11068)
Stream bucket caches to storage and through RPC calls.
2020-12-10 13:03:22 -08:00
Klaus Post
82e2be4239
metacache: Speed up cleanup operation (#11078)
Perform cleanup operations on copied data. Avoids read locking
data while determining which caches to keep.

Also, reduce the log(N*N) operation to log(N*M) where M caches 
with the same root or below when checking potential replacements.
2020-12-10 12:30:28 -08:00
Harshavardhana
4550ac6fff
fix: refactor locks to apply them uniquely per node (#11052)
This refactor is done for few reasons below

- to avoid deadlocks in scenarios when number
  of nodes are smaller < actual erasure stripe
  count where in N participating local lockers
  can lead to deadlocks across systems.

- avoids expiry routines to run 1000 of separate
  network operations and routes per disk where
  as each of them are still accessing one single
  local entity.

- it is ideal to have since globalLockServer
  per instance.

- In a 32node deployment however, each server
  group is still concentrated towards the
  same set of lockers that partipicate during
  the write/read phase, unlike previous minio/dsync
  implementation - this potentially avoids send
  32 requests instead we will still send at max
  requests of unique nodes participating in a
  write/read phase.

- reduces overall chattiness on smaller setups.
2020-12-10 07:28:37 -08:00
Klaus Post
e65ed2e44f
listcache: Add path index (#11063)
Add a root path index.

```
Before:
Benchmark_bucketMetacache_findCache-32    	   10000	    730737 ns/op

With excluded prints:
Benchmark_bucketMetacache_findCache-32    	   10000	    207100 ns/op

With the root path:
Benchmark_bucketMetacache_findCache-32    	  705765	      1943 ns/op
```

Benchmark used (not linear):

```Go
func Benchmark_bucketMetacache_findCache(b *testing.B) {
	bm := newBucketMetacache("", false)
	for i := 0; i < b.N; i++ {
		bm.findCache(listPathOptions{
			ID:           mustGetUUID(),
			Bucket:       "",
			BaseDir:      "prefix/" + mustGetUUID(),
			Prefix:       "",
			FilterPrefix: "",
			Marker:       "",
			Limit:        0,
			AskDisks:     0,
			Recursive:    false,
			Separator:    slashSeparator,
			Create:       true,
			CurrentCycle: 0,
			OldestCycle:  0,
		})
	}
}
```

Replaces #11058
2020-12-09 08:37:43 -08:00
Anis Elleuch
d90044b847
federation: Redirect Lifecycle PUT request by bucket name (#11062)
The bucket forwarder handler considers MakeBucket to be always local but
it mistakenly thinks that PUT bucket lifecycle to be a MakeBucket call.

Fix the check of the MakeBucket call by ensuring that the query is empty
in the PUT url.
2020-12-09 07:25:26 -08:00
Harshavardhana
d8c1f93de6
reject mixed drive situations with drives on root disks (#11057)
till now we used to match the inode number of the root
drive and the drive path minio would use, if they match
we knew that its a root disk.

this may not be true in all situations such as running
inside a container environment where the container might
be mounted from a different partition altogether, root
disk detection might fail.
2020-12-09 00:27:02 -08:00
Anis Elleuch
a51488cbaa
s3: Fix reading GET with partNumber specified (#11032)
partNumber was miscalculting the start and end of parts when partNumber
query is specified in the GET request. This commit fixes it and also
fixes the ContentRange header in that case.
2020-12-08 13:12:42 -08:00
Harshavardhana
dc819afa44 fix: auto update crawler meta version
PR 038bcd9079 introduced
version '3', we need to make sure that we do not
print an unexpected error instead log a message to
indicate we will auto update the version.
2020-12-08 10:40:51 -08:00
Harshavardhana
4a564336fe Revert "Add metrics for nodes online and offline (#11050)"
This reverts commit f60bbdf86b.
2020-12-08 09:23:35 -08:00
Ritesh H Shukla
f60bbdf86b
Add metrics for nodes online and offline (#11050) 2020-12-08 01:06:27 -08:00
Poorna Krishnamoorthy
f3beb1236a
Add cache usage, total capacity to prometheus metrics (#11026) 2020-12-07 16:35:11 -08:00
Poorna Krishnamoorthy
934bed47fa
Add transition event notification (#11047)
This is a MinIO specific extension to allow monitoring of transition events.
2020-12-07 13:53:28 -08:00
Ritesh H Shukla
038bcd9079
Add replication capacity metrics support in crawler (#10786) 2020-12-07 13:47:48 -08:00
Harshavardhana
ce93b2681b
fix: re-use er.getDisks() properly in certain calls (#11043) 2020-12-07 10:04:07 -08:00
Harshavardhana
8d036ed6d8
fix: allow sub-admin to modify password for other users (#11039)
fixes #11037
2020-12-06 20:36:34 -08:00
Harshavardhana
9c53cc1b83
fix: heal multiple buckets in bulk (#11029)
makes server startup, orders of magnitude
faster with large number of buckets
2020-12-05 13:00:44 -08:00
Harshavardhana
3514e89eb3
support envs as well for new crawler sub-system (#11033) 2020-12-04 21:54:24 -08:00
Klaus Post
a896125490
Add crawler delay config + dynamic config values (#11018) 2020-12-04 09:32:35 -08:00
Harshavardhana
e083471ec4
use argon2 with sync.Pool for better memory management (#11019) 2020-12-03 19:23:19 -08:00
Harshavardhana
80d31113e5
fix: etcd import paths again depend on v3.4.14 release (#11020)
Due to botched upstream renames of project repositories
and incomplete migration to go.mod support, our current
dependency version of `go.mod` had bugs i.e it was
using commits from master branch which didn't have
the required fixes present in release-3.4 branches

which leads to some rare bugs

https://github.com/etcd-io/etcd/pull/11477 provides
a workaround for now and we should migrate to this.

release-3.5 eventually claims to fix all of this
properly until then we cannot use /v3 import right now
2020-12-03 11:35:18 -08:00
Ritesh H Shukla
7e2b79984e
Stream bucket bandwidth measurements (#11014) 2020-12-03 11:34:42 -08:00
Harshavardhana
951b6b203b skip metacache entries healing to speed up startup 2020-12-02 21:30:54 -08:00
Harshavardhana
44e23b7f4f fix: startup being slow - wait only if IOCount > 0 2020-12-02 21:06:17 -08:00
Harshavardhana
96c0ce1f0c
add support for tuning healing to make healing more aggressive (#11003)
supports `mc admin config set <alias> heal sleep=100ms` to
enable more aggressive healing under certain times.

also optimize some areas that were doing extra checks than
necessary when bitrotscan was enabled, avoid double sleeps
make healing more predictable.

fixes #10497
2020-12-02 11:12:00 -08:00
ebozduman
303be1866d
Adds "x-amz-usr-agent" and "x-id" params to be used in authentication of presignedURL (#10792) 2020-12-02 02:02:49 -08:00
Harshavardhana
4ec45753e6 rename server sets to server pools 2020-12-01 13:50:33 -08:00
Klaus Post
e6ea5c2703
crawler: Missing folder heal check per set (#10876) 2020-12-01 12:07:39 -08:00
Harshavardhana
790833f3b2 Revert "Support variable server sets (#10314)"
This reverts commit aabf053d2f.
2020-12-01 12:02:29 -08:00
Harshavardhana
7cbca43eb1
fix: allow admins to create users (#11005)
PR #10978 introduced a regression, root
credential should be allowed to create users
2020-11-30 21:53:23 -08:00
Poorna Krishnamoorthy
2f564437ae
Disallow writeback caching with cache_after (#11002)
fixes #10974
2020-11-30 20:53:27 -08:00
Harshavardhana
bdd094bc39
fix: avoid sending errors on missing objects on locked buckets (#10994)
make sure multi-object delete returned errors that are AWS S3 compatible
2020-11-28 21:15:45 -08:00
Harshavardhana
e6fa410778
fix: allow accountInfo, addUser and getUserInfo implicit (#10978)
- accountInfo API that returns information about
  user, access to buckets and the size per bucket
- addUser - user is allowed to change their secretKey
- getUserInfo - returns user info if the incoming
  is the same user requesting their information
2020-11-27 17:23:57 -08:00
Harshavardhana
aabf053d2f
Support variable server sets (#10314) 2020-11-25 16:28:47 -08:00
Anis Elleuch
91130e884b
Avoid sending errors in gob in storage requests (#10977) 2020-11-25 12:42:48 -08:00
Poorna Krishnamoorthy
2ff655a745
Refactor replication, ILM handling in DELETE API (#10945) 2020-11-25 11:24:50 -08:00
Klaus Post
0422eda6a2
metacache: Always close block writer (#10973)
In some cases a writer could be left behind unclosed, leaking compression blocks.

Always close and set compression concurrency to 2 which should be fine to keep up.
2020-11-25 09:37:30 -08:00
Harshavardhana
31e6f60847
fix: improve error handling in metacache (#10965) 2020-11-25 01:11:22 -08:00
Poorna Krishnamoorthy
3ad41fe89d
Add admin API to edit remote bucket target credentials (#10848) 2020-11-24 19:09:05 -08:00
Klaus Post
a75fafdbe2
Remove msgp workaround (#10964)
The error in `github.com/philhofer/fwd` was quickly fixed through 
https://github.com/philhofer/fwd/pull/22 - update the dependency 
and remove the workaround.
2020-11-24 11:58:10 -08:00
Klaus Post
a58b7874ef
Temporary workaround for msgp skipping (#10960)
Due to https://github.com/philhofer/fwd/issues/20 when skipping a metadata entry that is >2048 bytes and the buffer is full (2048 bytes) the skip will fail with `io.ErrNoProgress`.

Enlarge the buffer so we temporarily make this much more unlikely.

If it still happens we will have to rewrite the skips to reads.

Fixes #10959
2020-11-23 18:51:59 -08:00
Harshavardhana
6990de9c94
fix: dangling object delete shall return object doesn't exist (#10961)
dangling object when deleted means object doesn't exist
anymore, so we should return appropriate errors, this
allows crawler heal to ensure that it removes the tracker
for dangling objects.
2020-11-23 18:50:53 -08:00
Anis Elleuch
75a8e81f8f
azure: Specify different Azure storage in the shell env (#10943)
AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_KEY are used in 
azure CLI to specify the azure blob storage access & secret keys. With this commit, 
it is possible to set them if you want the gateway's own credentials to be
different from the Azure blob credentials.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-11-23 16:45:56 -08:00
Harshavardhana
519c0077a9
fix: do not return an error for successfully deleted dangling objects (#10938)
dangling objects when removed `mc admin heal -r` or crawler
auto heal would incorrectly return error - this can interfere
with usage calculation as the entry size for this would be
returned as `0`, instead upon success use the resultant
object size to calculate the final size for the object
and avoid reporting this in the log messages

Also do not set ObjectSize in healResultItem to be '-1'
this has an effect on crawler metrics calculating 1 byte
less for objects which seem to be missing their `xl.meta`
2020-11-23 09:12:17 -08:00
Harshavardhana
734d07a532
fix: all hosts local and port same should be local erasure setup (#10951)
this is needed to avoid initializing notification peers
that can lead to races in many sub-systems

fixes #10950
2020-11-23 09:07:50 -08:00
Harshavardhana
df93102235
fix: unwrapping issues with os.Is* functions (#10949)
reduces  3 stat calls, reducing the
overall startup time significantly.
2020-11-23 08:36:49 -08:00
Poorna Krishnamoorthy
39f3d5493b
Show Delete replication status header (#10946)
X-Minio-Replication-Delete-Status header shows the
status of the replication of a permanent delete of a version.

All GETs are disallowed and return 405 on this object version.
In the case of replicating delete markers.

X-Minio-Replication-DeleteMarker-Status shows the status 
of replication, and would similarly return 405.

Additionally, this PR adds reporting of delete marker event completion
and updates documentation
2020-11-21 23:48:50 -08:00
Klaus Post
692ff41ef7
Unwrap network errors (#10934)
Alternative to #10927

Instead of having an upstream fix, do unwrap when checking network errors.

'As' will also work when destination is an interface as checked by the tests.
2020-11-20 22:55:35 -08:00
Harshavardhana
86409fa93d
add audit/admin trace support for browser requests (#10947)
To support this functionality we had to fork
the gorilla/rpc package with relevant changes
2020-11-20 22:52:17 -08:00
Shireesh Anjal
7bc47a14cc
Rename OBD to Health (#10842)
Also, Remove thread stats and openfds from the health report 
as we already have process stats and numfds
2020-11-20 12:52:53 -08:00
Harshavardhana
73e308079a
fix: handle errors appropriately as they are wrapped (#10917) 2020-11-20 10:43:07 -08:00
Poorna Krishnamoorthy
08b24620c0 Display storage-class of transitioned object in HEAD 2020-11-20 09:17:31 -08:00
Harshavardhana
95675b0c9a
fix: do not crash PutObjectTags when node is down (#10940)
fixes #10939
2020-11-20 09:10:48 -08:00
Poorna Krishnamoorthy
251c1ef6da Add support for replication of object tags, retention metadata (#10880) 2020-11-19 18:56:09 -08:00
Poorna Krishnamoorthy
0fa430c1da validate service type of target in replication/ilm transition config (#10928) 2020-11-19 18:47:33 -08:00
Poorna Krishnamoorthy
f60b6eb82e fix validation for deletemarker replication on object locked bucket (#10892) 2020-11-19 18:47:19 -08:00
Poorna Krishnamoorthy
1ebf6f146a Add support for ILM transition (#10565)
This PR adds transition support for ILM
to transition data to another MinIO target
represented by a storage class ARN. Subsequent
GET or HEAD for that object will be streamed from
the transition tier. If PostRestoreObject API is
invoked, the transitioned object can be restored for
duration specified to the source cluster.
2020-11-19 18:47:17 -08:00
Harshavardhana
8f7fe0405e fix: delete marker replication should support directories (#10878)
allow directories to be replicated as well, along with
their delete markers in replication.

Bonus fix to fix bloom filter updates for directories
to be preserved.
2020-11-19 18:47:12 -08:00
Harshavardhana
9a34fd5c4a Revert "Revert "Add delete marker replication support (#10396)""
This reverts commit 267d7bf0a9.
2020-11-19 18:43:58 -08:00
Harshavardhana
f794fe79e3
fix: network shutdown was not handle properly (#10927)
fixes a regression introduced in #10859, due
to the error returned by rest.Client being typed
i.e *rest.NetworkError - IsNetworkHostDown function
didn't work as expected to detect network issues.

This in-turn aggravated the situations when nodes
are disconnected leading to performance loss.
2020-11-19 13:53:49 -08:00
Harshavardhana
0f9e125cf3
fix: check for gateway backend online without http request (#10924)
fixes #10921
2020-11-19 10:38:02 -08:00
Harshavardhana
d778d9493f
remove MinIO release tag as part of HTTP Server string (#10929) 2020-11-19 09:16:02 -08:00
Harshavardhana
70d2c2ccc9
skip files that are not erasure objects or directories (#10926)
without this change WalkDir reports errors while
trying to read `format.json/xl.meta` which is a
replicated file
2020-11-19 09:15:09 -08:00
Harshavardhana
9dea7020f0
allow prefix filtering for WalkDir to be optional (#10923) 2020-11-18 12:03:16 -08:00
Klaus Post
990d074f7d
metacache: Allow prefix filtering (#10920)
Do listings with prefix filter when bloom filter is dirty.

This will forward the prefix filter to the lister which will make it 
only scan the folders/objects with the specified prefix.

If we have a clean bloom filter we try to build a more generally 
useful cache so in that case, we will list all objects/folders.
2020-11-18 10:44:18 -08:00
Klaus Post
e413f05397
Save listing error async (#10922)
Since the RPC call may have to time out save an error state async 
to not hold up the listing returning.

Fixes #10919
2020-11-18 10:28:22 -08:00
Harshavardhana
d1b1fee080
fix: save healing tracker right before healing (#10915)
this change avoids a situation where accidentally
if the user deleted the healing tracker or drives
were replaced again within the 10sec window.
2020-11-18 09:34:46 -08:00
Harshavardhana
9738d605e4
increase readdir per block memory to facilitate faster WalkDir (#10908) 2020-11-18 09:21:02 -08:00
Klaus Post
10099357b6
listcache: Wrap returned errors (#10882)
To give an indication of where they happen
2020-11-17 09:11:59 -08:00
Harshavardhana
80b8ce89a4
remove context deadline from Delete calls (#10901) 2020-11-17 09:09:45 -08:00
Poorna Krishnamoorthy
0b766288ef
fix: send replication completed event notification (#10902) 2020-11-15 22:16:41 -08:00
Rafael Bodill
598ca0569c
fix: global in-place update boolean check (#10900) 2020-11-15 13:34:12 -08:00
Poorna Krishnamoorthy
d295ce5708
Fix disk cache usage percent for prometheus (#10898)
Fixes: #10895

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2020-11-14 19:18:00 -08:00
Klaus Post
b5a3d79bce
listobjectversions: Add shortcut for Veeam blocks (#10893)
Add shortcut for `APN/1.0 Veeam/1.0 Backup/10.0`

It requests unique blocks with a specific prefix. We skip 
scanning the parent directory for more objects matching the prefix.
2020-11-13 16:58:20 -08:00
Harshavardhana
17a5ff51ff
fix: move context timeout closer to network for Delete calls (#10897)
allowing for disconnects to be limited to the drive
themselves instead of disconnecting all drives.
2020-11-13 16:56:45 -08:00
Harshavardhana
0bcb1b679d
fix: disallow update if dates are same (#10890)
fixes #10889
2020-11-12 14:18:59 -08:00
Klaus Post
a3017c724e
Sort directory objects correctly (#10886)
Decode dir objects when listing and sort them correctly.
2020-11-12 13:09:34 -08:00
Harshavardhana
267d7bf0a9 Revert "Add delete marker replication support (#10396)"
This reverts commit 50c10a5087.

PR is moved to origin/dev branch
2020-11-12 11:43:14 -08:00
cksac
be83dfc52a
fix: HDFS list bucket when subpath is provided (#10884) 2020-11-12 11:26:51 -08:00
Harshavardhana
ca88ca753c
ignore typed errors correctly in list cache layer (#10879)
bonus write bucket metadata cache with enough quorum

Possible fix for #10868
2020-11-12 09:28:56 -08:00
Klaus Post
f86d3538f6
Allow deeper sleep (#10883)
Allow each crawler operation to sleep up to 10 seconds on very heavily loaded systems.

This will of course make minimum crawler speed less, but should be more effective at stopping.
2020-11-12 09:17:56 -08:00
Klaus Post
1c3590078d
Skip 0 byte stream writes (#10875)
Don't send a packet when receiving 0 bytes or there is an error recorded
2020-11-11 18:07:40 -08:00
Harshavardhana
aa158228f9
fix: simplify healing metadata objects per set (#10867) 2020-11-11 10:58:16 -08:00
Klaus Post
8747834c69
DeletedObjects: Return objects on lock failure (#10874)
Return objects when locking fails.

<details>
<summary>Panic</summary>

```
: 2020/11/10 04:15:55 http: panic serving 10.10.62.153:44858: runtime error: index out of range [0] with length 0
: goroutine 363537270 [running]:
: net/http.(*conn).serve.func1(0xc019232780)
:         net/http/server.go:1801 +0x147
: panic(0x1cadd60, 0xc001719260)
:         runtime/panic.go:975 +0x47a
: github.com/minio/minio/cmd.criticalErrorHandler.ServeHTTP.func1(0xc0121d1200, 0x210cda0, 0xc0141940e0)
:         github.com/minio/minio/cmd/generic-handlers.go:781 +0x1a8
: panic(0x1cadd60, 0xc001719260)
:         runtime/panic.go:969 +0x1b9
: github.com/minio/minio/cmd.objectAPIHandlers.DeleteMultipleObjectsHandler(0x1e71ce8, 0x1e71cc8, 0x2108420, 0xc0192328c0, 0xc0121d1400)
:         github.com/minio/minio/cmd/bucket-handlers.go:465 +0x2490
: net/http.HandlerFunc.ServeHTTP(...)
:         net/http/server.go:2042
: github.com/minio/minio/cmd.httpTraceAll.func1(0x2108420, 0xc0192328c0, 0xc0121d1400)
:         github.com/minio/minio/cmd/handler-utils.go:353 +0x158
: net/http.HandlerFunc.ServeHTTP(...)
:         net/http/server.go:2042
: github.com/minio/minio/cmd.collectAPIStats.func1(0x2108420, 0xc019232820, 0xc0121d1400)
:         github.com/minio/minio/cmd/handler-utils.go:380 +0xed
: net/http.HandlerFunc.ServeHTTP(...)
:         net/http/server.go:2042
: github.com/minio/minio/cmd.maxClients.func1(0x2108420, 0xc019232820, 0xc0121d1400)
:         github.com/minio/minio/cmd/handler-api.go:132 +0x33b
: net/http.HandlerFunc.ServeHTTP(0xc00271d590, 0x2108420, 0xc019232820, 0xc0121d1400)
:         net/http/server.go:2042 +0x44
: github.com/minio/minio/cmd.redirectHandler.ServeHTTP(0x20e2180, 0xc00271d590, 0x2108420, 0xc019232820, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:192 +0x156
: github.com/minio/minio/cmd.customHeaderHandler.ServeHTTP(0x20e1060, 0xc0141a22b0, 0x21083e0, 0xc01814d2e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:751 +0x162
: github.com/minio/minio/cmd.securityHeaderHandler.ServeHTTP(0x20e0fc0, 0xc0141a22c0, 0x21083e0, 0xc01814d2e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:766 +0x1d6
: github.com/minio/minio/cmd.bucketForwardingHandler.ServeHTTP(0xc0121c7a40, 0x20e1120, 0xc0141a22d0, 0x21083e0, 0xc01814d2e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:624 +0xbf
: github.com/minio/minio/cmd.requestValidityHandler.ServeHTTP(0x20e0f20, 0xc01814d280, 0x21083e0, 0xc01814d2e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:608 +0x42a
: github.com/minio/minio/cmd.httpStatsHandler.ServeHTTP(0x20e10c0, 0xc0141a2300, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:536 +0xe4
: github.com/minio/minio/cmd.requestSizeLimitHandler.ServeHTTP(0x20e0fe0, 0xc0141a2310, 0x50004000000, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:68 +0xd4
: github.com/minio/minio/cmd.requestHeaderSizeLimitHandler.ServeHTTP(0x20e10a0, 0xc01814d2a0, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:93 +0x1b7
: github.com/minio/minio/cmd.crossDomainPolicy.ServeHTTP(0x20e1080, 0xc0141a2320, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/crossdomain-xml-handler.go:51 +0x82
: github.com/minio/minio/cmd.browserRedirectHandler.ServeHTTP(0x20e0fa0, 0xc0141a2330, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:276 +0x68
: github.com/minio/minio/cmd.minioReservedBucketHandler.ServeHTTP(0x20e0f00, 0xc0141a2340, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:344 +0xb8
: github.com/minio/minio/cmd.cacheControlHandler.ServeHTTP(0x20e1020, 0xc0141a2350, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:303 +0x1ce
: github.com/minio/minio/cmd.timeValidityHandler.ServeHTTP(0x20e0f40, 0xc0141a2360, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:414 +0x3ca
: github.com/minio/minio/cmd.resourceHandler.ServeHTTP(0x20e1160, 0xc0141a2370, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:516 +0xab
: github.com/minio/minio/cmd.authHandler.ServeHTTP(0x20e1100, 0xc0141a2380, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/auth-handler.go:502 +0x2e7
: github.com/minio/minio/cmd.sseTLSHandler.ServeHTTP(0x20e0ee0, 0xc0141a2390, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:802 +0x79
: github.com/minio/minio/cmd.reservedMetadataHandler.ServeHTTP(0x20e1140, 0xc0141a23a0, 0x210cda0, 0xc0141940e0, 0xc0121d1400)
:         github.com/minio/minio/cmd/generic-handlers.go:139 +0x1b7
: github.com/gorilla/mux.(*Router).ServeHTTP(0xc00073fb00, 0x210cda0, 0xc0141940e0, 0xc0121d1200)
:         github.com/gorilla/mux@v1.8.0/mux.go:210 +0xd3
: github.com/rs/cors.(*Cors).Handler.func1(0x210cda0, 0xc0141940e0, 0xc0121d1200)
:         github.com/rs/cors@v1.7.0/cors.go:219 +0x1b9
: net/http.HandlerFunc.ServeHTTP(0xc0009aece0, 0x210cda0, 0xc0141940e0, 0xc0121d1200)
:         net/http/server.go:2042 +0x44
: github.com/minio/minio/cmd.criticalErrorHandler.ServeHTTP(0x20e2180, 0xc0009aece0, 0x210cda0, 0xc0141940e0, 0xc0121d1200)
:         github.com/minio/minio/cmd/generic-handlers.go:784 +0x85
: github.com/minio/minio/cmd/http.(*Server).Start.func1(0x210cda0, 0xc0141940e0, 0xc0121d1200)
:         github.com/minio/minio/cmd/http/server.go:101 +0x258
: net/http.HandlerFunc.ServeHTTP(0xc000dc4080, 0x210cda0, 0xc0141940e0, 0xc0121d1200)
:         net/http/server.go:2042 +0x44
: net/http.serverHandler.ServeHTTP(0xc000764c60, 0x210cda0, 0xc0141940e0, 0xc0121d1200)
:         net/http/server.go:2843 +0xa3
: net/http.(*conn).serve(0xc019232780, 0x2114720, 0xc03381f6c0)
:         net/http/server.go:1925 +0x8ad
: created by net/http.(*Server).Serve
:         net/http/server.go:2969 +0x36c
```
</details>
2020-11-11 09:14:32 -08:00
Poorna Krishnamoorthy
50c10a5087
Add delete marker replication support (#10396)
Delete marker replication is implemented for V2
configuration specified in AWS spec (though AWS
allows it only in the V1 configuration).

This PR also brings in a MinIO only extension of
replicating permanent deletes, i.e. deletes specifying
version id are replicated to target cluster.
2020-11-10 15:24:14 -08:00
Steven Reitsma
4683a623dc
fix: negative STS IAM token TTL value (#10866) 2020-11-10 12:24:01 -08:00
Klaus Post
06899210a7
Reduce health check output (#10859)
This will make the health check clients 'silent'.
Use `IsNetworkOrHostDown` determine if network is ok so it mimics the functionality in the actual client.
2020-11-10 09:28:23 -08:00
Harshavardhana
cbdab62c1e
fix: heal user/metadata right away upon server startup (#10863)
this is needed such that we make sure to heal the
users, policies and bucket metadata right away as
we do listing based on list cache which only lists
'3' sufficiently good drives, to avoid possibly
losing access to these users upon upgrade make
sure to heal them.
2020-11-10 09:02:06 -08:00
Harshavardhana
8df6112204
fix: avoid divide by zero error single node distributed setup (#10862) 2020-11-09 20:40:39 -08:00
Harshavardhana
97692bc772
re-route requests if IAM is not initialized (#10850) 2020-11-07 21:03:06 -08:00
Steven Reitsma
54120107ce
fix: infinite loop in cleanupStaleUploads of encrypted MPUs (#10845)
fixes #10588
2020-11-06 11:53:42 -08:00
Klaus Post
9bf5990ea9
metadata: Invalidate cache if unreadable and not updating (#10844)
If a scanning server shuts down unexpectedly we may have "successful" caches that are incomplete on a set.

In this case mark the cache with an error so it will no longer be handed out.
2020-11-06 08:54:09 -08:00
Steven Reitsma
74f7cf24ae
fix: s3 gateway SSE pagination (#10840)
Fixes #10838
2020-11-05 15:04:03 -08:00
Harshavardhana
fb28aa847b
fix: add missing deleted key element in multiObjectDelete (#10839)
fixes #10832
2020-11-05 12:47:46 -08:00
Klaus Post
0724205f35
metacache: Add option for life extension (#10837)
Add `MINIO_API_EXTEND_LIST_CACHE_LIFE` that will extend 
the life of generated caches for a while.

This changes caches to remain valid until no updates have been 
received for the specified time plus a fixed margin.

This also changes the caches from being invalidated when the *first* 
set finishes until the *last* set has finished plus the specified time 
has passed.
2020-11-05 11:49:56 -08:00
Harshavardhana
b72cac4cf3
fix: dangling objects on actual namespace (#10822) 2020-11-05 11:48:55 -08:00
Klaus Post
bd77f29fc4
Don't replace caches that are receiving updates (#10834)
Keep caches while they are receiving updates.
Move update code to separate function.
2020-11-05 07:34:08 -08:00
Klaus Post
d1e1205036
metacache: Always close the s2 writer (#10836)
The s2 writer could be leaked if there was an error.

Make sure it is always closed.
2020-11-05 07:30:14 -08:00
Harshavardhana
71753e21e0
add missing TTL for STS credentials on etcd (#10828) 2020-11-04 13:06:05 -08:00
Harshavardhana
fde3299bf3
re-use optimized readdir for isDirEmpty() (#10829)
reduces effective memory usage by an order
of magnitude, also increases performance for
small objects
2020-11-04 13:05:21 -08:00
Harshavardhana
1a1f00fa15
fix: use internode data for DisksInfo, VolsInfo in message pack (#10821)
Similar to #10775 for fewer memory allocations, since we use
getOnlineDisks() extensively for listing we should optimize it
further.

Additionally, remove all unused walkers from the storage layer
2020-11-04 10:10:54 -08:00
Bill Thorp
4a1efabda4
Context based AccessKey passing (#10615)
A new field called AccessKey is added to the ReqInfo struct and populated.
Because ReqInfo is added to the context, this allows the AccessKey to be
accessed from 3rd-party code, such as a custom ObjectLayer.

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Kaloyan Raev <kaloyan@storj.io>
2020-11-04 09:13:34 -08:00
Klaus Post
3b88a646ec
Add remote online/offline information (#10825)
Log information about remote clients being marked offline.

This will help to identify root causes of failures.
2020-11-04 08:27:32 -08:00
Klaus Post
2294e53a0b
Don't retain context in locker (#10515)
Use the context for internal timeouts, but disconnect it from outgoing 
calls so we always receive the results and cancel it remotely.
2020-11-04 08:25:42 -08:00
Klaus Post
f0819cce75
Keep transient lists while they are updating (#10826)
On extremely long running listings keep the transient list 15 minutes after last update instead of using start time.

Also don't do overlap checks on transient lists.
2020-11-04 08:01:33 -08:00
Klaus Post
1e11b4629f
Add remote Diskinfo caching (#10824)
Add 1 second remote disk info cache.

Should decrease need for remote calls a great deal due to how actively it is used now.
2020-11-04 08:00:18 -08:00
Harshavardhana
5c72a34fa8
fix: honor delimiter as per AWS S3 spec (#10823) 2020-11-04 07:56:58 -08:00
Klaus Post
b9277c8030
metacache: Add trashcan (#10820)
Add trashcan that keeps recently updated lists after bucket deletion.
All caches were deleted once a bucket was deleted, so caches still running would report errors. Now they are canceled.
Fix `.minio.sys` not being transient.
2020-11-03 12:47:52 -08:00
Harshavardhana
8c76e1353e
initialize IAM after etcd has initialized (#10819) 2020-11-03 12:12:30 -08:00
Harshavardhana
ad382799b1
use list cache for Walk() with webUI and quota (#10814)
bring list cache optimizations for web UI
object listing, also FIFO quota enforcement
through list cache as well.
2020-11-03 08:53:48 -08:00
Harshavardhana
68de5a6f6a
fix: IAM store fallback to list users and policies from disk (#10787)
Bonus fixes, remove package retry it is harder to get it
right, also manage context remove it such that we don't have
to rely on it anymore instead use a simple Jitter retry.
2020-11-02 17:52:13 -08:00
Harshavardhana
4ea31da889
fix: move list quorum ENV to config (#10804) 2020-11-02 17:21:56 -08:00
Klaus Post
0a796505c1
metacache: Check only one disk for updates (#10809)
Check only one disk for updates.

This will reduce IO while waiting for lists to finish.
2020-11-02 17:20:27 -08:00
Klaus Post
37749f4623
Optimize FileInfo(Version) transfer (#10775)
File Info decoding, in particular, is showing up as a major 
allocator and time consumer for internode data transfers

Switch to message pack for cross-server transfers:

```
MSGP:

Size: 945 bytes

BenchmarkEncodeFileInfoMsgp-32    	 1558444	       866 ns/op	   1.16 MB/s	       0 B/op	       0 allocs/op
BenchmarkDecodeFileInfoMsgp-32    	  479968	      2487 ns/op	   0.40 MB/s	     848 B/op	      18 allocs/op

GOB:

Size: 1409 bytes

BenchmarkEncodeFileInfoGOB-32    	  333339	      3237 ns/op	   0.31 MB/s	     576 B/op	      19 allocs/op
BenchmarkDecodeFileInfoGOB-32    	   20869	     57837 ns/op	   0.02 MB/s	   16439 B/op	     428 allocs/op
```
2020-11-02 17:07:52 -08:00
Klaus Post
86e0d272f3
Reduce WriteAll allocs (#10810)
WriteAll saw 127GB allocs in a 5 minute timeframe for 4MiB buffers 
used by `io.CopyBuffer` even if they are pooled.

Since all writers appear to write byte buffers, just send those 
instead and write directly. The files are opened through the `os` 
package so they have no special properties anyway.

This removes the alloc and copy for each operation.

REST sends content length so a precise alloc can be made.
2020-11-02 16:14:31 -08:00
Harshavardhana
8527f22df1
optimize request URL encoding for internode (#10811)
this reduces allocations in order of magnitude

Also, revert "erasure: delete dangling objects automatically (#10765)" 
affects list caching should be investigated.
2020-11-02 15:15:12 -08:00
Anis Elleuch
b456292295
erasure: delete dangling objects automatically (#10765) 2020-11-02 10:49:30 -08:00
Poorna Krishnamoorthy
03fdbc3ec2
Add async caching commit option in diskcache (#10742)
Add store and a forward option for a single part
uploads when an async mode is enabled with env
MINIO_CACHE_COMMIT=writeback 

It defaults to `writethrough` if unspecified.
2020-11-02 10:00:45 -08:00
Harshavardhana
4c773f7068
re-use remote transports in Peer,Storage,Locker clients (#10788)
use one transport for internode communication
2020-11-02 07:43:11 -08:00
Harshavardhana
5412d730c1
simplify monitoring doesn't need to be canceled (#10803)
connect disks monitoring doesn't need to be canceled
upon drive replacement, since we only need to replace
the newly replaced drive.
2020-10-31 14:10:12 -07:00
Klaus Post
fe9f23e632
Recreate bucket metacache if corrupted (#10800)
If bucket metadata cannot be read, clean up existing and create a new.
2020-10-31 10:26:16 -07:00
Klaus Post
422898d9b3
Clean up metadata cache when deleting bucket (#10802)
Metadata caches were left behind when deleting a bucket.
2020-10-31 09:46:18 -07:00
Harshavardhana
b686bb9c83
fix: replaced drive properly by healing the entire drive (#10799)
Bonus fixes, we do not need reload format anymore
as the replaced drive is healed locally we only need
to ensure that drive heal reloads the drive properly.

We preserve the UUID of the original order, this means
that the replacement in `format.json` doesn't mean that
the drive needs to be reloaded into memory anymore.

fixes #10791
2020-10-31 01:34:48 -07:00
Harshavardhana
5e5cdc581d
remove unnecessary logging and move to log once (#10798)
the current master logs way too much when a node
is down, instead log once and move on.
2020-10-30 14:55:50 -07:00
Harshavardhana
02cfa774be
allow requests to be proxied when server is booting up (#10790)
when server is booting up there is a possibility
that users might see '503' because object layer
when not initialized, then the request is proxied
to neighboring peers first one which is online.
2020-10-30 12:20:28 -07:00
Krishna Srinivas
3a2f89b3c0
fix: add support for O_DIRECT reads for erasure backends (#10718) 2020-10-30 11:04:29 -07:00
Klaus Post
6135f072d2
Fix invalidated metacaches (#10784)
* Fix caches having EOF marked as a failure.
* Simplify cache updates.
* Provide context for checkMetacacheState failures.
* Log 499 when the client disconnects.
2020-10-30 09:33:16 -07:00
Klaus Post
e63a44b734
rest client: Expect context timeouts for locks (#10782)
Add option for rest clients to not mark a remote offline for context timeouts.

This can be used if context timeouts are expected on the call.
2020-10-29 09:52:11 -07:00
Klaus Post
6b14c4ab1e
Optimize decryptObjectInfo (#10726)
`decryptObjectInfo` is a significant bottleneck when listing objects.

Reduce the allocations for a significant speedup.

https://github.com/minio/sio/pull/40

```
λ benchcmp before.txt after.txt
benchmark                          old ns/op     new ns/op     delta
Benchmark_decryptObjectInfo-32     24260928      808656        -96.67%

benchmark                          old MB/s     new MB/s     speedup
Benchmark_decryptObjectInfo-32     0.04         1.24         31.00x

benchmark                          old allocs     new allocs     delta
Benchmark_decryptObjectInfo-32     75112          48996          -34.77%

benchmark                          old bytes     new bytes     delta
Benchmark_decryptObjectInfo-32     287694772     4228076       -98.53%
```
2020-10-29 09:34:20 -07:00
Harshavardhana
4bf90ca67f
fix: handle a crash when AskDisks is set to -1 (#10777) 2020-10-29 09:25:43 -07:00
Harshavardhana
e0655e24f2
fix: A possible crash when fi.Erasure.Distribution is empty (#10779) 2020-10-28 19:24:01 -07:00
Klaus Post
bfc36aed89
Add update retry limit and compare error by string instead (#10776) 2020-10-28 13:19:53 -07:00
Kaloyan Raev
be7f67268d
fix: Do not cleanup range files in cache SaveMetadata when total hits are false (#10728) 2020-10-28 09:23:17 -07:00
Klaus Post
a982baff27
ListObjects Metadata Caching (#10648)
Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c

Gist of improvements:

* Cross-server caching and listing will use the same data across servers and requests.
* Lists can be arbitrarily resumed at a constant speed.
* Metadata for all files scanned is stored for streaming retrieval.
* The existing bloom filters controlled by the crawler is used for validating caches.
* Concurrent requests for the same data (or parts of it) will not spawn additional walkers.
* Listing a subdirectory of an existing recursive cache will use the cache.
* All listing operations are fully streamable so the number of objects in a bucket no 
  longer dictates the amount of memory.
* Listings can be handled by any server within the cluster.
* Caches are cleaned up when out of date or superseded by a more recent one.
2020-10-28 09:18:35 -07:00
Krishna Srinivas
f53c5a020e
fix: heal object shards with ec.index and ec.distribution mismatches (#10773)
Co-authored-by: Harshavardhana <harsha@minio.io>
2020-10-28 00:10:20 -07:00
Harshavardhana
5b30bbda92
fix: add more protection distribution to match EcIndex (#10772)
allows for more stricter validation in picking up the right
set of disks for reconstruction.
2020-10-28 00:09:15 -07:00
Shireesh Anjal
858e2a43df
Remove logging info from OBDInfoHandler (#10727)
A lot of logging data is counterproductive. A better implementation with
precise useful log data can be introduced later.
2020-10-27 17:41:48 -07:00
Kaloyan Raev
df9894e275
avoid caching http ranges in background goroutine (#10724) 2020-10-26 23:04:48 -07:00
Krishna Srinivas
592f2f23a3
fix: heal rejects objects with disk re-ordering issue (#10766) 2020-10-26 18:48:47 -07:00
Krishna Srinivas
c49a80db41
fix: use meta.Erasure.Index for GetObject() to reconstruct object (#10764) 2020-10-26 16:19:42 -07:00
Poorna Krishnamoorthy
46275c6547
cache: rename function declarations (#10763) 2020-10-26 15:41:24 -07:00
Poorna Krishnamoorthy
0994ed9783
cache: fix call in GetObjectNInfo (#10762)
Fixes: #10751
2020-10-26 12:30:40 -07:00
Anis Elleuch
eb95353cb1
fix: Get/HeadObject return 404 on non quorum objects (#10753) 2020-10-26 10:30:46 -07:00
Harshavardhana
029758cb20
fix: retain the previous UUID for newly replaced drives (#10759)
only newly replaced drives get the new `format.json`,
this avoids disks reloading their in-memory reference
format, ensures that drives are online without
reloading the in-memory reference format.

keeping reference format in-tact means UUIDs
never change once they are formatted.
2020-10-26 10:29:29 -07:00
Harshavardhana
646d6917ed
turn-off checking for updates completely if MINIO_UPDATE=off (#10752) 2020-10-24 22:39:44 -07:00
Harshavardhana
d9db7f3308
expire lockers if lockers are offline (#10749)
lockers currently might leave stale lockers,
in unknown ways waiting for downed lockers.

locker check interval is high enough to safely
cleanup stale locks.
2020-10-24 13:23:16 -07:00
Harshavardhana
6a8c62f9fd
make sure to preserve UUID from reference format (#10748)
reference format should be source of truth
for inconsistent drives which reconnect,
add them back to their original position

remove automatic fix for existing offline
disk uuids
2020-10-24 13:23:08 -07:00
Anis Elleuch
00124c56d9
erasure: Commit data before xl.meta in RenameData() (#10734)
This will reduce the chance to have updated xl.meta without data.
2020-10-23 21:54:58 -07:00
Anis Elleuch
2c32c2149e
tests: Avoid running TestNSRace in short test mode (#10735) 2020-10-23 21:23:12 -07:00
Harshavardhana
734f258878
fix: slow down auto healing more aggressively (#10730)
Bonus fixes

- logging improvements to ensure that we don't use
  `go logger.LogIf` to avoid runtime.Caller missing
  the function name. log where necessary.
- remove unused code at erasure sets
2020-10-22 13:36:24 -07:00
Anis Elleuch
0e0c53bba4
tests: Lower expectation in addr selection in rand cache dialer (#10739)
Test TestDialContextWithDNSCacheRand was failing sometimes because it depends
on a random selection of addresses when testing random DNS resolution from cache.

Lower addr selection exception to 10%
2020-10-22 09:35:32 -07:00
Poorna Krishnamoorthy
5cc23ae052
validate if iam store is initialized (#10719)
Fixes panic - regression from d6d770c1b1
2020-10-20 21:28:24 -07:00
Harshavardhana
d6d770c1b1 initialize object layer right after config has loaded 2020-10-19 22:04:59 -07:00
Harshavardhana
b07df5cae1
initialize IAM as soon as object layer is initialized (#10700)
Allow requests to come in for users as soon as object
layer and config are initialized, this allows users
to be authenticated sooner and would succeed automatically
on servers which are yet to fully initialize.
2020-10-19 09:54:40 -07:00
Harshavardhana
c107728676
fix: s3 gateway DNS cache initialization (#10706)
fixes #10705
2020-10-19 01:34:23 -07:00
Anis Elleuch
284a2b9021
ilm: Send delete marker creation event when appropriate (#10696)
Before this commit, the crawler ILM will always send object delete event
notification though this is wrong.
2020-10-16 21:22:12 -07:00
Ritesh H Shukla
0b53e30ecb
Clean up monitor on delete bucket (#10698) 2020-10-16 17:59:31 -07:00
Harshavardhana
bd2131ba34
add DNS cache support to avoid DNS flooding (#10693)
Go stdlib resolver doesn't support caching DNS
resolutions, since we compile with CGO disabled
we are more probe to DNS flooding for all network
calls to resolve for DNS from the DNS server.

Under various containerized environments such as
VMWare this becomes a problem because there are
no DNS caches available and we may end up overloading
the kube-dns resolver under concurrent I/O.

To circumvent this issue implement a DNSCache resolver
which resolves DNS and caches them for around 10secs
with every 3sec invalidation attempted.
2020-10-16 14:49:05 -07:00
ebozduman
1aec168c84
fix: azure gateway should reject bucket names with "." (#10635) 2020-10-16 09:30:18 -07:00
Klaus Post
21a549a83b
fix: keep MRF channel open to avoid random CI crash (#10686)
There doesn't seem to be any benefit to closing the channel, so just keep 
it open and let it die with the server.
2020-10-16 09:08:51 -07:00
Ritesh H Shukla
8a16a1a1a9
fix: misc fixes for bandwidth reporting amd monitoring (#10683)
* Set peer for fetch bandwidth
* Fix the limit for bandwidth that is reported.
* Reduce CPU burn from bandwidth management.
2020-10-16 09:07:50 -07:00
Harshavardhana
ad726b49b4
rename zones to serverSets to avoid terminology conflict (#10679)
we are bringing in availability zones, we should avoid
zones as per server expansion concept.
2020-10-15 14:28:50 -07:00
Anis Elleuch
db2241066b
heal: Enable removing dangling delete markers (#10688) 2020-10-15 13:06:40 -07:00
Harshavardhana
f1cc16e788
fix: background heal rely on getOnlineDisks() (#10687) 2020-10-15 13:06:23 -07:00
Klaus Post
3820a905e0
in getOnlineDisks wait for disks to be populated (#10685) 2020-10-15 06:37:10 -07:00
Harshavardhana
2042d4873c
rename crawler config option to heal (#10678) 2020-10-14 13:51:51 -07:00
Harshavardhana
f9be783f3e
fix: allow crawler to crawl on disks without usage constraints (#10677)
additionally also change the resolution usage wise
return of disks, allows to small byte level differences
to be masked.
2020-10-14 12:12:10 -07:00
Harshavardhana
71b97fd3ac
fix: connect disks pre-emptively during startup (#10669)
connect disks pre-emptively upon startup, to ensure we have
enough disks are connected at startup rather than wait
for them.

we need to do this to avoid long wait times for server to
be online when we have servers come up in rolling upgrade
fashion
2020-10-13 18:28:42 -07:00
Klaus Post
03991c5d41
crawler: Remove waitForLowActiveIO (#10667)
Only use dynamic delays for the crawler. Even though the max wait was 1 second the number 
of waits could severely impact crawler speed.

Instead of relying on a global metric, we use the stateless local delays to keep the crawler 
running at a speed more adjusted to current conditions.

The only case we keep it is before bitrot checks when enabled.
2020-10-13 13:45:08 -07:00
飞雪无情
614060764d
fix: use the correct Action type for policy.Args and iampolicy.Args (#10650) 2020-10-12 15:18:22 -07:00
Harshavardhana
a3ba8188d7 fix: allow locker to be niladic 2020-10-12 14:23:44 -07:00
Harshavardhana
2760fc86af
Bump default idleConnsPerHost to control conns in time_wait (#10653)
This PR fixes a hang which occurs quite commonly at higher concurrency
by allowing following changes

- allowing lower connections in time_wait allows faster socket open's
- lower idle connection timeout to ensure that we let kernel
  reclaim the time_wait connections quickly
- increase somaxconn to 4096 instead of 2048 to allow larger tcp
  syn backlogs.

fixes #10413
2020-10-12 14:19:46 -07:00
Ritesh H Shukla
8ceb2a93fd
fix: peer replication bandwidth monitoring in distributed setup (#10652) 2020-10-12 09:04:55 -07:00
Ritesh H Shukla
c2f16ee846
Add basic bandwidth monitoring for replication. (#10501)
This change tracks bandwidth for a bucket and object

- [x] Add Admin API
- [x] Add Peer API
- [x] Add BW throttling
- [x] Admin APIs to set replication limit
- [x] Admin APIs for fetch bandwidth
2020-10-09 20:36:00 -07:00
Harshavardhana
6484453fc6
optionally allow strict quorum listing (#10649)
```
export MINIO_API_LIST_STRICT_QUORUM=on
```

would enable listing in quorum if necessary
2020-10-09 15:40:46 -07:00
Harshavardhana
a0d0645128
remove safeMode behavior in startup (#10645)
In almost all scenarios MinIO now is
mostly ready for all sub-systems
independently, safe-mode is not useful
anymore and do not serve its original
intended purpose.

allow server to be fully functional
even with config partially configured,
this is to cater for availability of actual
I/O v/s manually fixing the server.

In k8s like environments it will never make
sense to take pod into safe-mode state,
because there is no real access to perform
any remote operation on them.
2020-10-09 09:59:52 -07:00
Harshavardhana
253194e491
do not hold write locks - if objects don't exist (#10644) 2020-10-08 17:47:21 -07:00
Harshavardhana
736e58dd68
fix: handle concurrent lockers with multiple optimizations (#10640)
- select lockers which are non-local and online to have
  affinity towards remote servers for lock contention

- optimize lock retry interval to avoid sending too many
  messages during lock contention, reduces average CPU
  usage as well

- if bucket is not set, when deleteObject fails make sure
  setPutObjHeaders() honors lifecycle only if bucket name
  is set.

- fix top locks to list out always the oldest lockers always,
  avoid getting bogged down into map's unordered nature.
2020-10-08 12:32:32 -07:00
Poorna Krishnamoorthy
907a171edd
Generalize error messages for remote targets (#10638)
This is to allow remote targets to be generalized
for replication/ILM transition

Also adding a field in BucketTarget to identify
a remote target with a label.
2020-10-08 10:54:11 -07:00
Andreas Auernhammer
ed6d2a100f
logger: avoid writing audit log response header twice (#10642)
This commit fixes a misuse of the `http.ResponseWriter.WriteHeader`.
A caller should **either** call `WriteHeader` exactly once **or**
write to the response writer and causing an implicit 200 OK.

Writing the response headers more than once causes a `http: superfluous
response.WriteHeader call` log message. This commit fixes this
by preventing a 2nd `WriteHeader` call being forwarded to the underlying
`ResponseWriter`.

Updates #10587
2020-10-08 09:29:10 -07:00
Harshavardhana
effe131090
fix: allow read unlocks to be defensive about split brains (#10637) 2020-10-07 09:15:01 -07:00
Harshavardhana
18063bf25c
fix: cleanup old directory handling code (#10633)
we don't need them anymore, remove legacy code.
2020-10-06 12:03:57 -07:00
Poorna Krishnamoorthy
dbbed6f7f0
update minio-go dependency (#10634) 2020-10-06 08:37:09 -07:00
Poorna Krishnamoorthy
7fbfdceba3
Fix replication slowness (#10632)
- Increase channel buffer length
- Avoid blocking wait on replicaCh
2020-10-05 14:45:42 -07:00
Shireesh Anjal
f1418a50f0
add NVMe drive info [model num, serial num, drive temp. etc.] (#10613)
* add NVMe drive info [model num, serial num, drive temp. etc.]
* Ignore fuse partitions
* Add the nvme logic only for linux
* Move smart/nvme structs to a separate file

Co-authored-by: wlan0 <sidharthamn@gmail.com>
2020-10-04 10:18:46 -07:00
Krishna Srinivas
045e30f2c1
Set LastModified time from source for bucket replication (#10627) 2020-10-02 18:32:22 -07:00
Harshavardhana
c6a9a94f94
fix: optimize ServerInfo() handler to avoid reading config (#10626)
fixes #10620
2020-10-02 16:19:44 -07:00
Harshavardhana
8e7c00f3d4
add missing request-id from DeleteObject events (#10623)
fixes #10621
2020-10-02 13:36:13 -07:00
Harshavardhana
23e8390997
fix: Allow Walk to honor load balanced drives (#10610) 2020-10-01 20:24:34 -07:00
Anis Elleuch
71403be912
fix: consider partNumber in GET/HEAD requests (#10618) 2020-10-01 15:41:12 -07:00
Harshavardhana
f28d02b7f2
fix: simplify obd how we calculate transferred bytes (#10617) 2020-10-01 14:34:51 -07:00
Harshavardhana
e0cb814f3f
fail if port is not accessible (#10616)
throw proper error when port is not accessible
for the regular user, this is possibly a regression.

```
ERROR Unable to start the server: Insufficient permissions to use specified port
   > Please ensure MinIO binary has 'cap_net_bind_service=+ep' permissions
   HINT:
     Use 'sudo setcap cap_net_bind_service=+ep /path/to/minio' to provide sufficient permissions
```
2020-10-01 13:23:31 -07:00
Harshavardhana
98a08e1644
fix: protect updating latencies/throughput slices in obd (#10611)
Additionally close the transferChan upon function exit.
2020-10-01 09:50:08 -07:00
Klaus Post
3047121255
dataupdate: Bump to force rescan (#10609)
After #10594 let's invalidate the bloom filters to force the next cycles to go through all data.

There is a small chance that the linked PR could have caused missing bloom filter data.

This will invalidate the current bloom filters and make the crawler go through everything.
2020-09-30 16:10:40 -07:00
Ritesh H Shukla
5a7f92481e
fix: client errors for DNS service creation errors (#10584) 2020-09-30 14:09:41 -07:00
Anis Elleuch
0d45c38782
List v1/versions routes based on source IP if found (#10603)
Routing using on source IP if found. This should distribute
the listing load for V1 and versioning on multiple nodes
evenly between different clients.

If source IP is not found from the http request header, then falls back
to bucket name instead.
2020-09-30 13:38:27 -07:00
Poorna Krishnamoorthy
56d1b227cf
Handle changes to versioning config for replication (#10598)
Disallow versioning suspension on a bucket with
pre-existing replication configuration

If versioning is suspended on the target,replication
should fail.
2020-09-30 13:36:37 -07:00
Lenin Alevski
bea87a5a20
fix: reading multiple TLS certificates when deployed in K8S (#10601)
Ignore all regular files, CAs directory and any 
directory that starts with `..` inside the
`.minio/certs` folder
2020-09-30 08:21:30 -07:00
Harshavardhana
2b4eb87d77
pick disks which are common maximally used (#10600)
further optimization to ensure that good disks
are always used for listing, other than healing
we only use disks that are maximally used.
2020-09-29 22:54:02 -07:00
Harshavardhana
1f9abbee4d
make sure to release locks upon timeout (#10596)
fixes #10418
2020-09-29 15:18:34 -07:00
Klaus Post
fdf0ae9167
exit data update tracker only upon context completion (#10594)
The data update tracker saver would exit if data wasn't updated for between cycles.
2020-09-29 13:23:53 -07:00
Harshavardhana
00eb6f6bc9
cache DiskInfo at storage layer for performance (#10586)
`mc admin info` on busy setups will not move HDD
heads unnecessarily for repeated calls, provides
a better responsiveness for the call overall.

Bonus change allow listTolerancePerSet be N-1
for good entries, to avoid skipping entries
for some reason one of the disk went offline.
2020-09-29 09:54:41 -07:00
Harshavardhana
66174692a2
add '.healing.bin' for tracking currently healing disk (#10573)
add a hint on the disk to allow for tracking fresh disk
being healed, to allow for restartable heals, and also
use this as a way to track and remove disks.

There are more pending changes where we should move
all the disk formatting logic to backend drives, this
PR doesn't deal with this refactor instead makes it
easier to track healing in the future.
2020-09-28 19:39:32 -07:00
飞雪无情
209680e89f
Remove redundant http.HandlerFunc type conversion. (#10576) 2020-09-28 13:33:49 -07:00
飞雪无情
27d9bd04e5
Handling unhandled errors in the InfoCannedPolicy method. (#10575) 2020-09-27 10:24:04 -07:00
Harshavardhana
bebcf4f004 unlock() only if locking was successful 2020-09-25 19:36:47 -07:00
Harshavardhana
eafa775952
fix: add lock ownership to expire locks (#10571)
- Add owner information for expiry, locking, unlocking a resource
- TopLocks returns now locks in quorum by default, provides
  a way to capture stale locks as well with `?stale=true`
- Simplify the quorum handling for locks to avoid from storage
  class, because there were challenges to make it consistent
  across all situations.
- And other tiny simplifications to reset locks.
2020-09-25 19:21:52 -07:00
Harshavardhana
66b4a862e0
fix: network failure err check should ignore context canceled errors (#10567)
context canceled errors bubbling up from the network
layer has the potential to be misconstrued as network
errors, taking prematurely a server offline and triggering
a health check routine avoid this potential occurrence.
2020-09-25 14:35:47 -07:00
Anis Elleuch
9603489dd3
federation: Honor range with UploadObjectPart to a different cluster (#10570)
Use gr & length instead of srcInfo.Reader & srcInfo.Size because 
they don't honor range header
2020-09-25 12:06:42 -07:00
Anis Elleuch
b302c8a5f4
heal: Fix periodic healing cleanup (#10569)
isEnded() was incorrectly calculating if the current healing sequence is
ended or not. h.currentStatus.Items could be empty if healing is very
slow and mc admin heal consumed all items.
2020-09-25 10:29:00 -07:00
Praveen raj Mani
b880796aef
Set the maximum open connections limit in PG and MySQL target configs (#10558)
As the bulk/recursive delete will require multiple connections to open at an instance,
The default open connections limit will be reached which results in the following error

```FATAL:  sorry, too many clients already```

By setting the open connections to a reasonable value - `2`, We ensure that the max open connections
will not be exhausted and lie under bounds.

The queries are simple inserts/updates/deletes which is operational and sufficient with the
the maximum open connection limit is 2.

Fixes #10553

Allow user configuration for MaxOpenConnections
2020-09-24 22:20:30 -07:00
Harshavardhana
37a5d5d7a0
reduce timeouts between servers for faster disconnects (#10562) 2020-09-24 20:10:07 -07:00
Harshavardhana
3cac262dd1
report heal drives properly, also from global state (#10561)
It is possible the heal drives are not reported from
the maintenance check because the background heal
state simply relied on the `format.json` for capturing
unformatted drives. It is possible that drives might
be still healing - make sure that applications which
rely on cluster health check respond back this detail.
2020-09-24 15:36:47 -07:00
poornas
e6ab4db6b8
Fix minimum replication workers started (#10560)
This PR also fixes GetReplicationConfiguration permission
in web-handlers.go to use bucket as resource
2020-09-24 12:25:41 -07:00
Harshavardhana
ca989eb0b3
avoid ListBuckets returning quorum errors when node is down (#10555)
Also, revamp the way ListBuckets work make few portions
of the healing logic parallel

- walk objects for healing disks in parallel
- collect the list of buckets in parallel across drives
- provide consistent view for listBuckets()
2020-09-24 09:53:38 -07:00
飞雪无情
d778d034e7
Remove redundant mgmtQueryKey type. (#10557)
Remove redundant type conversion.
2020-09-24 08:40:21 -07:00
Harshavardhana
f7f9517b6a fix: host extraction without port 2020-09-23 12:10:14 -07:00
Harshavardhana
90cff10e2b avoid crash if disks are not initialized 2020-09-23 12:00:29 -07:00
Harshavardhana
81caf35926
fix: reduce healthcheck interval for storage rest client (#10544) 2020-09-23 10:43:42 -07:00
poornas
5726cef3ca
validate bucket exists in ListRemoteTargets api (#10552) 2020-09-23 10:37:54 -07:00
Harshavardhana
8b74a72b21
fix: rename READY deadline to CLUSTER deadline ENV (#10535) 2020-09-23 09:14:33 -07:00
Klaus Post
eec69d6796
Fix stale context for bucket retrieval (#10551)
The provided context gets captured by the closure making all subsequent calls fail.
2020-09-23 08:30:31 -07:00
Harshavardhana
0537a21b79
avoid concurrenct use of rand.NewSource (#10543) 2020-09-22 15:34:27 -07:00
poornas
4c54ed8748
Close replica channel only once (#10542)
Also enforce s3:GetReplicationConfiguration permission check as a
bucket level resource.
2020-09-22 12:47:24 -07:00
Anis Elleuch
4c81201f95
fix: healing delete marker on versioned buckets (#10530)
Healing was not working correctly in the distributed mode because
errFileVersionNotFound was not properly converted in storage rest
client.

Besides, fixing the healing delete marker is not working as expected.
2020-09-21 15:16:16 -07:00
Harshavardhana
cd8d511d3d move versionsOrder struct to xl-storage-utils 2020-09-21 14:24:42 -07:00
Harshavardhana
17e17da00d
add parallel workers to perform replication in parallel (#10525)
set the concurrency for replication be to runtime.NumCPU()/2
2020-09-21 13:43:29 -07:00
Harshavardhana
a5da9120f3
fix: [fs] an error upon rwPool.Write() just attempt rwPool.Create() (#10533)
On some NFS clients looks like errno is incorrectly set,
which leads to incorrect errors thrown upwards.
2020-09-21 12:54:23 -07:00
poornas
aa12d75d75
fix crawler to detect lifecycle on bucket even if filter nil (#10532) 2020-09-21 11:41:07 -07:00
Harshavardhana
6fcbdd5607
remove unused putObjectDir code (#10528) 2020-09-21 09:41:39 -07:00
Harshavardhana
3831cc9e3b
fix: [fs] CompleteMultipart use trie structure for partMatch (#10522)
performance improves by around 100x or more

```
go test -v -run NONE -bench BenchmarkGetPartFile
goos: linux
goarch: amd64
pkg: github.com/minio/minio/cmd
BenchmarkGetPartFileWithTrie
BenchmarkGetPartFileWithTrie-4          1000000000               0.140 ns/op           0 B/op          0 allocs/op
PASS
ok      github.com/minio/minio/cmd      1.737s
```

fixes #10520
2020-09-21 01:18:13 -07:00
Krishna Srinivas
230fc0d186
Support for "directory" objects (#10499) 2020-09-19 08:39:41 -07:00
Harshavardhana
7f9498f43f
fix: ignore faulty drives and continue (#10511)
drives might return different types of errors
handle them individually, and for some errors
just log an error and continue
2020-09-18 12:09:05 -07:00
Harshavardhana
1cf322b7d4
change leader locker only for crawler (#10509) 2020-09-18 11:15:54 -07:00
Klaus Post
0b1c824618
Fix incorrect request start time (#10516)
Log request start time BEFORE starting processing the request
2020-09-18 09:30:52 -07:00
Klaus Post
c851e022b7
Tweaks to dynamic locks (#10508)
* Fix cases where minimum timeout > default timeout.
* Add defensive code for too small/negative timeouts.
* Never set timeout below the maximum value of a request.
* Protect against (unlikely) int64 wraps.
* Decrease timeout slower.
* Don't re-lock before copying.
2020-09-18 09:18:18 -07:00
Klaus Post
5ad032826a
Add a reasonable if unable to get total RAM (#10506)
Though unlikely we shouldn't skip initializing the API if we cannot get RAM.

Add 16GiB as a default and log the error.
2020-09-18 02:03:02 -07:00
Harshavardhana
84bf4624a4
fix: make sure to preserve metadata during overwrite in FS mode (#10512)
This bug was introduced in 14f0047295
almost 3yrs ago, as a side affect of removing stale `fs.json`
but we in-fact end up removing existing good `fs.json` for an
existing object, leading to some form of a data loss.

fixes #10496
2020-09-18 00:16:16 -07:00
Harshavardhana
4a36cd7035
fix: improve performance ListObjectParts in FS mode (#10510)
from 20s for 10000 parts to less than 1sec

Without the patch
```
~ time aws --endpoint-url=http://localhost:9000 --profile minio s3api \
       list-parts --bucket testbucket --key test \
       --upload-id c1cd1f50-ea9a-4824-881c-63b5de95315a

real    0m20.394s
user    0m0.589s
sys     0m0.174s
```

With the patch
```
~ time aws --endpoint-url=http://localhost:9000 --profile minio s3api \
       list-parts --bucket testbucket --key test \
       --upload-id c1cd1f50-ea9a-4824-881c-63b5de95315a

real    0m0.891s
user    0m0.624s
sys     0m0.182s
```

fixes #10503
2020-09-17 18:51:16 -07:00
Klaus Post
03490c811b
Fix obd goroutine leak (#10504)
The gouroutine collecting transfer stats never exits. Add missing channel close.
2020-09-17 10:10:20 -07:00
Harshavardhana
ed78854cea fix: list across all drives to avoid stale disks 2020-09-16 21:17:10 -07:00
Harshavardhana
e60834838f
fix: background disk heal, to reload format consistently (#10502)
It was observed in VMware vsphere environment during a
pod replacement, `mc admin info` might report incorrect
offline nodes for the replaced drive. This issue eventually
goes away but requires quite a lot of time for all servers
to be in sync.

This PR fixes this behavior properly.
2020-09-16 21:14:35 -07:00
Harshavardhana
d616d8a857
serialize replication and feed it through task model (#10500)
this allows for eventually controlling the concurrency
of replication and overally control of throughput
2020-09-16 16:04:55 -07:00
Anis Elleuch
24cab7f9df
ilm: Remove a 'null' version if not latest (#10494)
If the ILM document requires removing noncurrent versions, the 
the server should be able to remove 'null' versions as well. 
'null' versions are created when versioning is not enabled 
or suspended.
2020-09-16 10:21:50 -07:00
Harshavardhana
02c1a08a5b
fix: make sure to lock CopyObject for in-place updates (#10492) 2020-09-15 20:44:48 -07:00
Ritesh H Shukla
5c47ce456e
Run replication in the background (#10491) 2020-09-15 18:44:58 -07:00
Anis Elleuch
8ea55f9dba
obd: Add console log to OBD output (#10372) 2020-09-15 18:02:54 -07:00
poornas
80e3dce631
azure: update content-md5 to metadata after upload (#10482)
Fixes #10453
2020-09-15 16:31:47 -07:00
Harshavardhana
80fab03b63
fix: S3 gateway doesn't support full passthrough for encryption (#10484)
The entire encryption layer is dependent on the fact that
KMS should be configured for S3 encryption to work properly
and we only support passing the headers as is to the backend
for encryption only if KMS is configured.

Make sure that this predictability is maintained, currently
the code was allowing encryption to go through and fail
at later to indicate that KMS was not configured. We should
simply reply "NotImplemented" if KMS is not configured, this
allows clients to simply proceed with their tests.
2020-09-15 13:57:15 -07:00
Harshavardhana
730d2dc7be
fix: allow CopyObject/PutObjecTags on pre-existing content (#10485)
fixes #10475
2020-09-15 09:18:41 -07:00
Harshavardhana
0ee9678190
fix: add missing delete marker created filter (#10481) 2020-09-14 21:32:52 -07:00
Klaus Post
34859c6d4b
Preallocate (safe) slices when we know the size (#10459) 2020-09-14 20:44:18 -07:00
Klaus Post
b1c99e88ac
reduce CPU usage upto 50% in readdir (#10466) 2020-09-14 17:19:54 -07:00
Harshavardhana
0104af6bcc
delayed locks until we have started reading the body (#10474)
This is to ensure that Go contexts work properly, after some
interesting experiments I found that Go net/http doesn't
cancel the context when Body is non-zero and hasn't been
read till EOF.

The following gist explains this, this can lead to pile up
of go-routines on the server which will never be canceled
and will die at a really later point in time, which can
simply overwhelm the server.

https://gist.github.com/harshavardhana/c51dcfd055780eaeb71db54f9c589150

To avoid this refactor the locking such that we take locks after we
have started reading from the body and only take locks when needed.

Also, remove contextReader as it's not useful, doesn't work as expected
context is not canceled until the body reaches EOF so there is no point
in wrapping it with context and putting a `select {` on it which
can unnecessarily increase the CPU overhead.

We will still use the context to cancel the lockers etc.
Additional simplification in the locker code to avoid timers
as re-using them is a complicated ordeal avoid them in
the hot path, since locking is very common this may avoid
lots of allocations.
2020-09-14 15:57:13 -07:00
Harshavardhana
34ea1d2167
fix: return correct error code for MetadataTooLarge (#10470)
fixes #10469
2020-09-13 21:26:35 -07:00
Harshavardhana
9d95937018 update KMS docs indicating deprecation of AUTO_ENCRYPTION env 2020-09-13 16:23:28 -07:00
Klaus Post
fa01e640f5
Continous healing: add optional bitrot check (#10417) 2020-09-12 00:08:12 -07:00
Harshavardhana
f355374962
add support for configurable remote transport deadline (#10447)
configurable remote transport timeouts for some special cases
where this value needs to be bumped to a higher value when
transferring large data between federated instances.
2020-09-11 23:03:08 -07:00
Harshavardhana
bda0fe3150
fix: allow LDAP identity to support form body POST (#10468)
similar to other STS APIs
2020-09-11 23:02:32 -07:00
Harshavardhana
b70995dd60 Revert "ilm: Remove null version if not latest with proper config (#10467)"
This reverts commit 4b6264da7d.
2020-09-11 18:15:49 -07:00
Anis Elleuch
4b6264da7d
ilm: Remove null version if not latest with proper config (#10467) 2020-09-11 14:20:09 -07:00
Harshavardhana
48919de301
fix: for defer'ed deleteObject use internal context (#10463) 2020-09-11 06:39:19 -07:00
Harshavardhana
eb2934f0c1
simplify webhook DNS further generalize for gateway (#10448)
continuation of the changes from eaaf05a7cc
this further simplifies, enables this for gateway deployments as well
2020-09-10 14:19:32 -07:00
Klaus Post
b7438fe4e6
Copy metadata before spawning goroutine + prealloc maps (#10458)
In `(*cacheObjects).GetObjectNInfo` copy the metadata before spawning a goroutine.

Clean up a few map[string]string copies as well, reducing allocs and simplifying the code.

Fixes #10426
2020-09-10 11:37:22 -07:00
Anis Elleuch
ce6cef6855
erasure: Call Walk() from all disks (#10445)
It does not make sense to call Walk() in only N/2 disks and then
requires N/2 quorum, just keep it N/2+1 

The commit fixes this behavior.
2020-09-10 09:27:52 -07:00
Klaus Post
493c714663
Remove erasureSets and erasureObjects from ObjectLayer (#10442) 2020-09-10 09:18:19 -07:00
Harshavardhana
e959c5d71c
fix: server panic in FS mode (#10455)
fixes #10454
2020-09-10 09:16:26 -07:00
Harshavardhana
4a2928eb49
generate missing object delete bucket notifications (#10449)
fixes #10381
2020-09-09 18:23:08 -07:00
Anis Elleuch
af88772a78
lifecycle: NoncurrentVersionExpiration considers noncurrent version age (#10444)
From https://docs.aws.amazon.com/AmazonS3/latest/dev/intro-lifecycle-rules.html#intro-lifecycle-rules-actions

```
When specifying the number of days in the NoncurrentVersionTransition
and NoncurrentVersionExpiration actions in a Lifecycle configuration,
note the following:

It is the number of days from when the version of the object becomes
noncurrent (that is, when the object is overwritten or deleted), that
Amazon S3 will perform the action on the specified object or objects.

Amazon S3 calculates the time by adding the number of days specified in
the rule to the time when the new successor version of the object is
created and rounding the resulting time to the next day midnight UTC.
For example, in your bucket, suppose that you have a current version of
an object that was created at 1/1/2014 10:30 AM UTC. If the new version
of the object that replaces the current version is created at 1/15/2014
10:30 AM UTC, and you specify 3 days in a transition rule, the
transition date of the object is calculated as 1/19/2014 00:00 UTC.
```
2020-09-09 18:11:24 -07:00
Harshavardhana
9109148474
add support for new UA values for update an check (#10451) 2020-09-09 17:21:39 -07:00
Nitish Tiwari
eaaf05a7cc
Add Kubernetes operator webook server as DNS target (#10404)
This PR adds a DNS target that ensures to update an entry
into Kubernetes operator when a bucket is created or deleted.

See minio/operator#264 for details.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-09 12:20:49 -07:00
Harshavardhana
958661cbb5
skip subdomain from bucket DNS which start with minio.domain (#10390)
extend host matcher to reject the host match
2020-09-09 09:57:37 -07:00
Harshavardhana
6a0372be6c
cleanup tmpDir any older entries automatically just like multipart (#10439)
also consider multipart uploads, temporary files in `.minio.sys/tmp`
as stale beyond 24hrs and clean them up automatically
2020-09-08 15:55:40 -07:00
Harshavardhana
c13afd56e8
Remove MaxConnsPerHost settings to avoid potential hangs (#10438)
MaxConnsPerHost can potentially hang a call without any
way to timeout, we do not need this setting for our proxy
and gateway implementations instead IdleConn settings are
good enough.

Also ensure to use NewRequestWithContext and make sure to
take the disks offline only for network errors.

Fixes #10304
2020-09-08 14:22:04 -07:00
Harshavardhana
96997d2b21
allow ctrl+c to be consistent at early startup (#10435)
fixes #10431
2020-09-08 09:10:55 -07:00
Klaus Post
86a3319d41
Ignore config values from unknown subsystems (#10432) 2020-09-08 08:57:04 -07:00
Harshavardhana
9f60e84ce1
always copy UserDefined metadata map (#10427)
fixes #10426
2020-09-07 09:25:28 -07:00
Harshavardhana
572b1721b2
set max API requests automatically based on RAM (#10421) 2020-09-04 19:37:37 -07:00
Harshavardhana
b0e1d4ce78
re-attach offline drive after new drive replacement (#10416)
inconsistent drive healing when one of the drive is offline
while a new drive was replaced, this change is to ensure
that we can add the offline drive back into the mix by
healing it again.
2020-09-04 17:09:02 -07:00
Harshavardhana
eb19c8af40
Bump response header timeout for proxying list request (#10420) 2020-09-04 16:07:40 -07:00
Klaus Post
2d58a8d861
Add storage layer contexts (#10321)
Add context to all (non-trivial) calls to the storage layer. 

Contexts are propagated through the REST client.

- `context.TODO()` is left in place for the places where it needs to be added to the caller.
- `endWalkCh` could probably be removed from the walkers, but no changes so far.

The "dangerous" part is that now a caller disconnecting *will* propagate down,  so a 
"delete" operation will now be interrupted. In some cases we might want to disconnect 
this functionality so the operation completes if it has started, leaving the system in a cleaner state.
2020-09-04 09:45:06 -07:00
poornas
0037951b6e
improve error message when remote target missing (#10412) 2020-09-04 08:48:38 -07:00
Andreas Auernhammer
fbd1c5f51a
certs: refactor cert manager to support multiple certificates (#10207)
This commit refactors the certificate management implementation
in the `certs` package such that multiple certificates can be
specified at the same time. Therefore, the following layout of
the `certs/` directory is expected:
```
certs/
 │
 ├─ public.crt
 ├─ private.key
 ├─ CAs/          // CAs directory is ignored
 │   │
 │    ...
 │
 ├─ example.com/
 │   │
 │   ├─ public.crt
 │   └─ private.key
 └─ foobar.org/
     │
     ├─ public.crt
     └─ private.key
   ...
```

However, directory names like `example.com` are just for human
readability/organization and don't have any meaning w.r.t whether
a particular certificate is served or not. This decision is made based
on the SNI sent by the client and the SAN of the certificate.

***

The `Manager` will pick a certificate based on the client trying
to establish a TLS connection. In particular, it looks at the client
hello (i.e. SNI) to determine which host the client tries to access.
If the manager can find a certificate that matches the SNI it
returns this certificate to the client.

However, the client may choose to not send an SNI or tries to access
a server directly via IP (`https://<ip>:<port>`). In this case, we
cannot use the SNI to determine which certificate to serve. However,
we also should not pick "the first" certificate that would be accepted
by the client (based on crypto. parameters - like a signature algorithm)
because it may be an internal certificate that contains internal hostnames. 
We would disclose internal infrastructure details doing so.

Therefore, the `Manager` returns the "default" certificate when the
client does not specify an SNI. The default certificate the top-level
`public.crt` - i.e. `certs/public.crt`.

This approach has some consequences:
 - It's the operator's responsibility to ensure that the top-level
   `public.crt` does not disclose any information (i.e. hostnames)
   that are not publicly visible. However, this was the case in the
   past already.
 - Any other `public.crt` - except for the top-level one - must not
   contain any IP SAN. The reason for this restriction is that the
   Manager cannot match a SNI to an IP b/c the SNI is the server host
   name. The entire purpose of SNI is to indicate which host the client
   tries to connect to when multiple hosts run on the same IP. So, a
   client will not set the SNI to an IP.
   If we would allow IP SANs in a lower-level `public.crt` a user would
   expect that it is possible to connect to MinIO directly via IP address
   and that the MinIO server would pick "the right" certificate. However,
   the MinIO server cannot determine which certificate to serve, and
   therefore always picks the "default" one. This may lead to all sorts
   of confusing errors like:
   "It works if I use `https:instance.minio.local` but not when I use
   `https://10.0.2.1`.

These consequences/limitations should be pointed out / explained in our
docs in an appropriate way. However, the support for multiple
certificates should not have any impact on how deployment with a single
certificate function today.

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-09-03 23:33:37 -07:00
Harshavardhana
1c6781757c
add missing ListBucketVersions from policy actions (#10414) 2020-09-03 18:25:06 -07:00
Harshavardhana
b4e3956e69
update KES docs to talk about 'mc encrypt' command (#10400)
add a deprecation notice for KMS_AUTO_ENCRYPTION
2020-09-03 12:43:45 -07:00
Harshavardhana
8a291e1dc0
Cluster healthcheck improvements (#10408)
- do not fail the healthcheck if heal status
  was not obtained from one of the nodes,
  if many nodes fail then report this as a
  catastrophic error.
- add "x-minio-write-quorum" value to match
  the write tolerance supported by server.
- admin info now states if a drive is healing
  where madmin.Disk.Healing is set to true
  and madmin.Disk.State is "ok"
2020-09-02 22:54:56 -07:00
Klaus Post
650dccfa9e
cache: Only start at high watermark (#10403)
Currently, cache purges are triggered as soon as the low watermark is exceeded.
To reduce IO this should only be done when reaching the high watermark.
This simplifies checks and reduces all calls for a GC to go through
`dcache.diskSpaceAvailable(size)`. While a comment claims that 
`dcache.triggerGC <- struct{}{}` was non-blocking I don't see how 
that was possible. Instead, we add a 1 size to the queue channel 
and use channel  semantics to avoid blocking when a GC has 
already been requested.

`bytesToClear` now takes the high watermark into account to it will 
not request any bytes to be cleared until that is reached.
2020-09-02 17:48:44 -07:00
Andreas Auernhammer
9a703befe6
crypto: reduce retry delay when retrying KES requests (#10394)
This commit reduces the retry delay when retrying a request
to a KES server by:
 - reducing the max. jitter delay from 3s to 1.5s
 - skipping the random delay when there are more KES endpoints
   available.

If there are more KES endpoints we can directly retry to the request
by sending it to the next endpoint - as pointed out by @krishnasrinivas
2020-09-02 11:04:10 -07:00
Klaus Post
9a1615768d
Fix flaky TestXLStorageVerifyFile (#10398)
`TestXLStorageVerifyFile` would fail 1 in 256 if the first random character was 'a'.

Instead write 256 bytes which has 1 in 256^256 probability.
2020-09-02 09:42:24 -07:00
Harshavardhana
37da0c647e
fix: delete marker compatibility behavior for suspended bucket (#10395)
- delete-marker should be created on a suspended bucket as `null`
- delete-marker should delete any pre-existing `null` versioned
  object and create an entry `null`
2020-09-02 00:19:03 -07:00
Harshavardhana
2acb530ccd
update rulesguard with new rules (#10392)
Co-authored-by: Nitish Tiwari <nitish@minio.io>
Co-authored-by: Praveen raj Mani <praveen@minio.io>
2020-09-01 16:58:13 -07:00
Klaus Post
3e1fb17b70
heal: Check for truncated files (#10399)
When checking parts we already do a stat for each part.

Since we have the on disk size check if it is at least what we expect.

When checking metadata check if metadata is 0 bytes.
2020-09-01 12:06:45 -07:00
Klaus Post
a89d6b8e3d
Fix common Windows failure (#10397)
The `getNonLoopBackIP` may grab an IP from an interface that
doesn't allow binding (on Windows), so this test consistently fails.

We exclude that specific error.
2020-09-01 10:11:15 -07:00
Klaus Post
1c085f7d1a
Fix crash on Windows when crawling (#10385)
* readDirN: Check if file is directory

`syscall.FindNextFile` crashes if the handle is a file.

`errFileNotFound` matches 'unix' functionality: d19b434ffc/cmd/os-readdir_unix.go (L106)

Fixes #10384
2020-09-01 09:33:16 -07:00
Harshavardhana
4b6585d249
support 'ldap:user' variable replacement properly (#10391)
also update `ldap.go` examples with latest
minio-go changes

Fixes #10367
2020-09-01 12:26:22 +05:30
Harshavardhana
9ffad7fceb discard empty endpoint in crypto kes
introduced in 18725679c4
2020-08-31 19:35:43 -07:00
Andreas Auernhammer
18725679c4
crypto: allow multiple KES endpoints (#10383)
This commit addresses a maintenance / automation problem when MinIO-KES
is deployed on bare-metal. In orchestrated env. the orchestrator (K8S)
will make sure that `n` KES servers (IPs) are available via the same DNS
name. There it is sufficient to provide just one endpoint.
2020-08-31 18:10:52 -07:00
Anis Elleuch
ba8a8ad818
ListObjectsV1 requests unnecessarily fail with offline nodes (#10386)
ListObjectsV1 requests are actually redirected to a specific node, 
depending on the bucket name. The purpose of this behavior was
to optimize listing.

However, the current code sends a Bad Gateway error if the
target node is offline, which is a bad behavior because it means
that the list request will fail, although this is unnecessary since
we can still use the current node to list as well (the default behavior
without using proxying optimization)

Currently, you can see mint fails when there is one offline node, after
this PR, mint will always succeed.
2020-08-31 12:37:31 -07:00
Harshavardhana
102ad60dee
simplify removing temporary files (#10389) 2020-08-31 12:35:40 -07:00
Gaige B Paulsen
859ef52886
update for smartos build (solaris too) (#10378) 2020-08-31 10:19:25 -07:00
Harshavardhana
e730da1438
fix: referesh JWKS public keys upon failure (#10368)
fixes #10359
2020-08-28 08:15:12 -07:00
Anis Elleuch
46ee8659b4
fix write quorum calculation for bucket operations (#10364)
When the number of disks is odd, the calculation of quorum 
for bucket operations were not correct, fix it.
2020-08-27 12:55:32 -07:00
Harshavardhana
a359e36e35
tolerate listing with only readQuorum disks (#10357)
We can reduce this further in the future, but this is a good
value to keep around. With the advent of continuous healing,
we can be assured that namespace will eventually be
consistent so we are okay to avoid the necessity to
a list across all drives on all sets.

Bonus Pop()'s in parallel seem to have the potential to
wait too on large drive setups and cause more slowness
instead of gaining any performance remove it for now.

Also, implement load balanced reply for local disks,
ensuring that local disks have an affinity for

- cleanupStaleMultipartUploads()
2020-08-26 19:29:35 -07:00
Jorge Israel Peña
0a2e6d58a5
hdfs gateway handle listing single files (#10362) 2020-08-26 16:03:53 -07:00
Klaus Post
1b119557c2
getDisksInfo: Attribute failed disks to correct endpoint (#10360)
If DiskInfo calls failed the information returned was used anyway 
resulting in no endpoint being set.

This would make the drive be attributed to the local system since 
`disk.Endpoint == disk.DrivePath` in that case.

Instead, if the call fails record the endpoint and the error only.
2020-08-26 10:11:26 -07:00
Harshavardhana
7778fef6bb
update continous heal metrics appropriately for scanned items (#10352)
bonus make sure to ignore objectNotFound, and versionNotFound
errors properly at all layers, since HealObjects() returns
objectNotFound error if the bucket or prefix is empty.
2020-08-26 08:53:33 -07:00
飞雪无情
ea1803417f
Use constants for gateway names to avoid bugs caused by spelling. (#10355) 2020-08-26 08:52:46 -07:00
Harshavardhana
d19b434ffc
fix: bring back delayed leaf detection in listing (#10346) 2020-08-25 12:26:48 -07:00
Klaus Post
17a1eda702
Disregard healing disks in crawling (#10349)
When crawling never use a disk we know is healing.

Most of the change involves keeping track of the original endpoint on xlStorage
and this also fixes DiskInfo.Endpoint never being populated.

Heal master will print `data-crawl: Disk "http://localhost:9001/data/mindev/data2/xl1" is 
Healing, skipping` once on a cycle (no more often than every 5m).
2020-08-25 10:55:15 -07:00
Daniel Valdivia
7d1734d033
indicate through HTTP header cluster healing in progress (#10342) 2020-08-24 15:20:50 -07:00
Harshavardhana
03ec6adfd0
fix: KES http2.0 communication support (#10341) 2020-08-24 14:37:53 -07:00
Harshavardhana
309b10f201 keep crawler cycle at 5 minutes 2020-08-24 14:05:16 -07:00
Klaus Post
c097ce9c32
continous healing based on crawler (#10103)
Design: https://gist.github.com/klauspost/792fe25c315caf1dd15c8e79df124914
2020-08-24 13:47:01 -07:00
Harshavardhana
caad314faa
add ruleguard support, fix all the reported issues (#10335) 2020-08-24 12:11:20 -07:00
Klaus Post
bc2ebe0021
Only enforce quota on success (#10339)
We should only enforce quotas if no error has been returned.

firstErr is safe to access since all goroutines have exited at this point.

If `firstErr` hasn't been set by something else return the context error if cancelled.
2020-08-24 10:15:46 -07:00
Harshavardhana
11aa393ba7
Allow region errors to be dynamic (#10323)
remove other FIXMEs as we are not planning to fix these, 
instead we will add dynamism case by case basis.

fixes #10250
2020-08-23 22:06:22 -07:00
Praveen raj Mani
d0c910a6f3
Support https and basic-auth for elasticsearch notification target (#10332) 2020-08-23 09:43:48 -07:00
kannappanr
d15a5ad4cc
S3 Gateway: Check for encryption headers properly (#10309) 2020-08-22 11:41:49 -07:00
Harshavardhana
95411228db
add missing cleanupStaleMultipartUploads (#10325)
fixes #10319
2020-08-21 21:39:54 -07:00
ebozduman
23774353b7
get_object() returns NoSuchKey error when object is a prefix (#10315) 2020-08-21 13:08:01 -07:00
poornas
a2a5ec93d3
fix: use global context for filling cache in the background (#10308) 2020-08-20 14:23:24 -07:00
Harshavardhana
27a774cbe9
fix: FS mode should reject putBucketVersioning (#10307) 2020-08-20 13:18:06 -07:00
Klaus Post
8e6787a302
Fix TestDataUpdateTracker hanging (#10302)
Keep dataUpdateTracker while goroutine is starting.

This will ensure the object is updated one `start` returns

Tested with

```
λ go test -cpu=1,2,4,8 -test.run TestDataUpdateTracker -count=1000
PASS
ok      github.com/minio/minio/cmd      8.913s
```

Fixes #10295
2020-08-20 13:17:42 -07:00
Harshavardhana
59352d0ac2
load all blocking metadata in background (#10298)
most of this metadata already has fallbacks
and there is no good reason to load them
in blocking fashion
2020-08-20 10:38:53 -07:00
Harshavardhana
75d44b3bae
add disk for more context in bitrot errors (#10296) 2020-08-20 09:41:15 -07:00
Klaus Post
95ae6c4b49
Fix missing unlock in *healSequence.hasEnded() (#10305)
The background healing sequence would always hang when this function is called.
2020-08-20 08:48:09 -07:00
KevinSmile
0ebb73ee2e
use const instead of literals (#10292) 2020-08-19 16:43:52 -07:00
Harshavardhana
c8b84a0e9e
Add nancy vulnerability scanner (#10289) 2020-08-19 14:25:21 -07:00
Ritesh H Shukla
3acb5cff45
Update code comment (#10287) 2020-08-19 14:24:58 -07:00
Harshavardhana
74116204ce
handle fresh setup with mixed drives (#10273)
fresh drive setups when one of the drive is
a root drive, we should ignore such a root
drive and not proceed to format.

This PR handles this properly by marking
the disks which are root disk and they are
taken offline.
2020-08-18 14:37:26 -07:00
Harshavardhana
e4a44f6224
fix: commonPrefixes behavior in ListObjectVersions (#10286)
```
$ aws s3api --profile minio --endpoint-url http://localhost:9003 \
    list-object-versions --bucket testbucket \
    --delimiter / --prefix Veeam/Archive/

{
    "CommonPrefixes": [
        {
            "Prefix": "Veeam/Archive/003/"
        }
    ]
}
```

Also add coverage tests similar to ListObjects to
catch errors in future, skip these tests in FS mode
2020-08-18 12:19:44 -07:00
poornas
0272973175
Fix regression in web ui for retention (#10285)
Fixes: #10283 regression from PR #9259
2020-08-18 12:09:42 -07:00
Harshavardhana
d2a3f92452
fix: health handler for lockers (#10280) 2020-08-18 07:27:41 -07:00
Harshavardhana
ede86845e5
docs: Add policy variables for resource and conditions (#10278)
Bonus fix adds LDAP policy variable and clarifies the
usage of policy variables for temporary credentials.

fixes #10197
2020-08-17 17:39:55 -07:00
Harshavardhana
e57c742674
use single dynamic timeout for most locked API/heal ops (#10275)
newDynamicTimeout should be allocated once, in-case
of temporary locks in config and IAM we should
have allocated timeout once before the `for loop`

This PR doesn't fix any issue as such, but provides
enough dynamism for the timeout as per expectation.
2020-08-17 11:29:58 -07:00
Klaus Post
bb5976d727
healbucket: Send object version ID (#10263)
Based on our previous conversations I assume we should send the version
 id when healing an object.

Maybe we should even list object versions and heal all?
2020-08-17 08:25:44 -07:00
Harshavardhana
f7c1a59de1
add validation logs for configured Logger/Audit HTTP targets (#10274)
extra logs in-case of misconfiguration of audit/logger targets
2020-08-16 10:25:00 -07:00
Anis Elleuch
51ba1dac49
listing: Fix result when prefix is an object with a slash (#10267)
In a non recursive mode, issuing a list request where prefix
is an existing object with a slash and delimiter is a slash will
return entries in the object directory (data dir IDs)

```
$ aws s3api --profile minioadmin --endpoint-url http://localhost:9000 \
        list-objects-v2 --bucket testbucket --prefix code_of_conduct.md/ --delimiter '/'
{
    "CommonPrefixes": [
        {
            "Prefix":
"code_of_conduct.md/ec750fe0-ea7e-4b87-bbec-1e32407e5e47/"
        }
    ]
}
```

This commit adds a fast exit track in Walk() in this specific case.
2020-08-14 20:13:24 -07:00
Harshavardhana
a4463dd40f
fix: storageClass shouldn't set the value upon failure (#10271) 2020-08-14 19:48:04 -07:00
Harshavardhana
83a82d818e
allow lock tolerance to match storage-class drive tolerance (#10270) 2020-08-14 18:17:14 -07:00
Harshavardhana
1d1c4430b2
decrypt ETags in parallel around 500 at a time (#10261)
Listing speed-up gained from 10secs for
just 400 entries to 2secs for 400 entries
2020-08-14 11:56:35 -07:00
Harshavardhana
43e6d1ce2d
fix: missing proxy request by bucket for ListVersions (#10260) 2020-08-13 16:31:58 -07:00
Harshavardhana
30da442a85
rootDisk on containers can have different device Id (#10259)
use `/etc/hosts` instead of `/` to check for common
device id, if the device is same for `/etc/hosts`
and the --bind mount to detect root disks.

Bonus enhance healthcheck logging by adding maintenance
tags, for all messages.
2020-08-13 15:21:20 -07:00
Harshavardhana
038d91feaa
fix: add public certs automatically as part of global CAs (#10256) 2020-08-13 09:46:50 -07:00
Harshavardhana
e7ba78beee
use GlobalContext instead of context.Background when possible (#10254) 2020-08-13 09:16:01 -07:00
Harshavardhana
b32d0a5b60 use the correct endpoints for offline drives 2020-08-12 19:17:49 -07:00
poornas
79e21601b0
fix: web handlers to enforce replication (#10249)
This PR also preserves source ETag for replication
2020-08-12 17:32:24 -07:00
Harshavardhana
34253aa595
feat: cache env value in-case network is not reachable (#10251) 2020-08-12 16:53:15 -07:00
Harshavardhana
79ed7ce451
fs: listObjects shouldn't take FS locks while listing (#10248) 2020-08-12 15:23:14 +05:30
Harshavardhana
0dd3a08169
move the certPool loader function into pkg/certs (#10239) 2020-08-11 08:29:50 -07:00
Klaus Post
f8f290e848
security: Remove insecure custom headers (#10244)
Background: https://github.com/google/security-research/security/advisories/GHSA-76wf-9vgp-pj7w

Remove these custom headers from incoming and outgoing requests.
2020-08-11 08:29:29 -07:00
Harshavardhana
1e2ebc9945
feat: time to bring back http2.0 support (#10230)
Bonus move our CI/CD to go1.14
2020-08-10 09:02:29 -07:00
Harshavardhana
2a9819aff8
fix: refactor background heal for cluster health (#10225) 2020-08-07 19:43:06 -07:00
Harshavardhana
6c6137b2e7
add cluster maintenance healthcheck drive heal affinity (#10218) 2020-08-07 13:22:53 -07:00
Anis Elleuch
9138b2b503
Avoid duplicate headers when proxying S3 listing requests (#10220) 2020-08-07 04:10:16 -07:00
Harshavardhana
77509ce391
Support looking up environment remotely (#10215)
adds a feature where we can fetch the MinIO
command-line remotely, this
is primarily meant to add some stateless
nature to the MinIO deployment in k8s
environments, MinIO operator would run a
webhook service endpoint
which can be used to fetch any environment
value in a generalized approach.
2020-08-06 18:03:16 -07:00
poornas
adcaa6f9de
fix: Change ListBucketTargets handler (#10217)
to list all targets across a tenant.
Also fixing some validations.
2020-08-06 17:10:21 -07:00
poornas
121164db56
fix: relax some replication validations (#10210)
Also inherit storage class from source object
if replication configuration does not have a storage
class specified for destination bucket.
2020-08-05 20:01:20 -07:00
Harshavardhana
a20d4568a2
fix: make sure to use uniform drive count calculation (#10208)
It is possible in situations when server was deployed
in asymmetric configuration in the past such as

```
minio server ~/fs{1...4}/disk{1...5}
```

Results in setDriveCount of 10 in older releases
but with fairly recent releases we have moved to
having server affinity which means that a set drive
count ascertained from above config will be now '4'

While the object layer make sure that we honor
`format.json` the storageClass configuration however
was by mistake was using the global value obtained
by heuristics. Which leads to prematurely using
lower parity without being requested by the an
administrator.

This PR fixes this behavior.
2020-08-05 13:31:12 -07:00
Harshavardhana
e656beb915
feat: allow service accounts to be generated with OpenID STS (#10184)
Bonus also fix a bug where we did not purge relevant
service accounts generated by rotating credentials
appropriately, service accounts should become invalid
as soon as its corresponding parent user becomes invalid.

Since service account themselves carry parent claim always
we would never reach this problem, as the access get
rejected at IAM policy layer.
2020-08-05 13:08:40 -07:00
poornas
88daaef76b
Validate object lock when setting replication config. (#10200)
Check if object lock is enabled on
destination bucket while setting replication
configuration on a object lock enabled bucket.
2020-08-04 23:02:27 -07:00
Harshavardhana
0b8255529a
fix: proxies set keep-alive timeouts to be system dependent (#10199)
Split the DialContext's one for internode and another
for all other external communications especially
proxy forwarders, gateway transport etc.
2020-08-04 14:55:53 -07:00
Harshavardhana
019fe69a57
fix: reduce an extra system call for writes instead fail later (#10187) 2020-08-04 12:09:41 -07:00
Anis Elleuch
6ae30b21c9
fix ILM should not remove a protected version (#10189) 2020-08-03 23:04:40 -07:00
Harshavardhana
b16781846e
allow server to start even with corrupted/faulty disks (#10175) 2020-08-03 18:17:48 -07:00
Harshavardhana
5ce82b45da
add CopyObject optimization when source and destination are same (#10170)
when source and destination are same and versioning is enabled
on the destination bucket - we do not need to re-create the entire
object once again to optimize on space utilization.

Cases this PR is not supporting

- any pre-existing legacy object will not
  be preserved in this manner, meaning a new
  dataDir will be created.

- key-rotation and storage class changes
  of course will never re-use the dataDir
2020-08-03 16:21:10 -07:00
Harshavardhana
e99bc177c0
fix: allow FS mode situations when conflicting files exist (#10185)
conflicting files can exist on FS at
`.minio.sys/buckets/testbucket/policy.json/`, this is an
expected valid scenario for FS mode allow it to work,
i.e ignore and move forward
2020-08-03 13:20:49 -07:00
Harshavardhana
b68bc75dad
fix: quorum calculation mistake with reduced parity (#10186)
With reduced parity our write quorum should be same
as read quorum, but code was still assuming

```
readQuorum+1
```

In all situations which is not necessary.
2020-08-03 12:15:08 -07:00
Harshavardhana
d61eac080b
fix: connection_string should override other params (#10180)
closes #9965
2020-08-03 09:16:00 -07:00
poornas
a8dd7b3eda
Refactor replication target management. (#10154)
Generalize replication target management so
that remote targets for a bucket can be
managed with ARNs. `mc admin bucket remote`
command will be used to manage targets.
2020-07-30 19:55:22 -07:00
Harshavardhana
25a55bae6f
fix: avoid buffering of server sent events by proxies (#10164) 2020-07-30 19:45:12 -07:00
Harshavardhana
fe157166ca
fix: Pass context all the way down to the network call in lockers (#10161)
Context timeout might race on each other when timeouts are lower
i.e when two lock attempts happened very quickly on the same resource
and the servers were yet trying to establish quorum.

This situation can lead to locks held which wouldn't be unlocked
and subsequent lock attempts would fail.

This would require a complete server restart. A potential of this
issue happening is when server is booting up and we are trying
to hold a 'transaction.lock' in quick bursts of timeout.
2020-07-29 23:15:34 -07:00
Adam Brown
f7259adf83
Update LastUpdate timestamp before save (#10152) 2020-07-28 13:20:50 -07:00
Harshavardhana
6669560cb9
turn-off bucket usage metrics in gateway mode (#10150)
closes #10147
2020-07-28 13:04:26 -07:00
poornas
b46ab7e921
Rename replication target handler (#10142)
Rename replication target handler to a generic bucket target handler
2020-07-28 11:50:47 -07:00
Harshavardhana
27266f8a54
fix: if OPA set do not enforce policy claim (#10149) 2020-07-28 11:47:57 -07:00
poornas
1b6ba0d062
Add validation in cache for offline drives (#10146)
closes #10144
2020-07-28 10:06:52 -07:00
Harshavardhana
f200a7fb6a
fix: speed up OBD tests avoid unnecessary memory allocation (#10141)
replace dummy buffer with nullReader{} instead,
to avoid large memory allocations in memory
constrainted environments. allows running
obd tests in such environments.
2020-07-27 14:51:59 -07:00
Harshavardhana
47e304d03c
fix: add missing content-disposition from CORS handler (#10137) 2020-07-27 09:03:38 -07:00
Harshavardhana
9108abf204
fix: allow shareable URLs with rotating creds (#10135)
closes #8935
2020-07-27 09:02:53 -07:00
Harshavardhana
6529dcb3b5
fix: gateway Walk() implementation to list correct contents (#10131)
closes #10122
2020-07-26 22:56:05 -07:00
Harshavardhana
abbf6ce6cc
simplify JWKS decoding in OpenID and more tests (#10119)
add tests for non-compliant Azure AD behavior
with "nonce" to fail properly and treat it as
expected behavior for non-standard JWT tokens.
2020-07-25 08:42:41 -07:00
Harshavardhana
5ffc733eec
fix: enforce bucket quota from browser uploads (#10129) 2020-07-24 21:16:54 -07:00
Harshavardhana
35212b673e
add unformatted disk as part of the error list (#10128)
these errors should be ignored for quorum
error calculation to ensure that we don't
prematurely return unformatted disk error
as part of API calls
2020-07-24 13:16:11 -07:00
Harshavardhana
57ff9abca2
Apply quota usage cache invalidation per second (#10127)
Allow faster lookups for quota check enforcement
2020-07-24 12:24:21 -07:00
Jorge Israel Peña
4752323e1c
Use hdfs.Readdir() to optimize HDFS directory listings (#10121)
Currently, listing directories on HDFS incurs a per-entry remote Stat() call
penalty, the cost of which can really blow up on directories with many
entries (+1,000) especially when considered in addition to peripheral
calls (such as validation) and the fact that minio is an intermediary to the
client (whereas other clients listed below can query HDFS directly).

Because listing directories this way is expensive, the Golang HDFS library
provides the [`Client.Open()`] function which creates a [`FileReader`] that is
able to batch multiple calls together through the [`Readdir()`] function.

This is substantially more efficient for very large directories.

In one case we were witnessing about +20 seconds to list a directory with 1,500
entries, admittedly large, but the Java hdfs ls utility as well as the HDFS
library sample ls utility were much faster.

Hadoop HDFS DFS (4.02s):

    λ ~/code/minio → use-readdir
    » time hdfs dfs -ls /directory/with/1500/entries/
    …
    hdfs dfs -ls   5.81s user 0.49s system 156% cpu 4.020 total

Golang HDFS library (0.47s):

    λ ~/code/hdfs → master
    » time ./hdfs ls -lh /directory/with/1500/entries/
    …
    ./hdfs ls -lh   0.13s user 0.14s system 56% cpu 0.478 total

mc and minio **without** optimization (16.96s):

    λ ~/code/minio → master
    » time mc ls myhdfs/directory/with/1500/entries/
    …
    ./mc ls   0.22s user 0.29s system 3% cpu 16.968 total

mc and minio **with** optimization (0.40s):

    λ ~/code/minio → use-readdir
    » time mc ls myhdfs/directory/with/1500/entries/
    …
    ./mc ls   0.13s user 0.28s system 102% cpu 0.403 total

[`Client.Open()`]: https://godoc.org/github.com/colinmarc/hdfs#Client.Open
[`FileReader`]: https://godoc.org/github.com/colinmarc/hdfs#FileReader
[`Readdir()`]: https://godoc.org/github.com/colinmarc/hdfs#FileReader.Readdir
2020-07-24 11:31:51 -07:00
Klaus Post
11593c6cc4
Usage: Reset merged info when updating (#10126)
When merging multiple buckets reset between each update.

Avoids merging the same usage metrics multiple times resulting 
in duplicate data entries.
2020-07-24 11:02:10 -07:00
Harshavardhana
10025bda45
fix: add missing response headers to CORS handler (#10124) 2020-07-24 00:46:51 -07:00
Harshavardhana
3a73f1ead5
refactor server update behavior (#10107) 2020-07-23 08:03:31 -07:00
poornas
b9be841fd2
Add missing validation for replication API conditions (#10114) 2020-07-22 17:39:40 -07:00
Anis Elleuch
456b2ef6eb
Avoid healing to be stuck with many concurrent event listeners (#10111)
If there are many listeners to bucket notifications or to the trace
subsystem, healing fails to work properly since it suspends itself when
the number of concurrent connections is above a certain threshold.

These connections are also continuous and not costly (*no disk access*),
it is okay to just ignore them in waitForLowHTTPReq().
2020-07-22 13:16:55 -07:00
poornas
c43da3005a
Add support for server side bucket replication (#9882) 2020-07-21 17:49:56 -07:00
Harshavardhana
a880283593
Send the lower level error directly from GetDiskID() (#10095)
this is to detect situations of corruption disk
format etc errors quickly and keep the disk online
in such scenarios for requests to fail appropriately.
2020-07-21 13:54:06 -07:00
Harshavardhana
eb6bf454f1
fix: copyObject encryption from unencrypted object (#10102)
This is a continuation of #10085
2020-07-21 12:25:01 -07:00
Harshavardhana
ec06089eda
fix: re-implement cluster healthcheck (#10101) 2020-07-20 18:31:22 -07:00
Harshavardhana
0c4be55936
fix: fix lockup in merge-walk pool (#10098)
Fixes two different types of problems

- continuation of the problem seen in FS #9992
  as not fixed for erasure coded deployments,
  reproduced this issue with spark and its fixed now

- another issue was leaking walk go-routines which
  would lead to high memory usage and crash the system
  this is simply because all the walks which were purged
  at the top limit had leaking end walkers which would
  consume memory endlessly.

closes #9966
closes #10088
2020-07-20 17:28:26 -07:00
Harshavardhana
11d21d5d1b
fix: pass around the correct drives per set (#10097)
this is a precursor change before adding parity
based SLA across zones instead of same stripe size
2020-07-20 16:38:40 -07:00
Harshavardhana
2955aae8e4
feat: Add notification support for bucketCreates and removal (#10075) 2020-07-20 12:52:49 -07:00
Harshavardhana
9fd836e51f
add dnsStore interface for upcoming operator webhook (#10077) 2020-07-20 12:28:48 -07:00
Anis Elleuch
518f44908c
fs: Close object fs.json before deletion (#10092)
NFS fails when deleting a file while it is already opened. The reason is
that the object fs.json meta file is opened but not closed before
removal.
2020-07-20 08:52:24 -07:00
Harshavardhana
e2c71717f8
add different TCP timeouts for internal and incoming (#10090)
closes #10086
2020-07-19 17:16:12 -07:00
Harshavardhana
7764c542f2
allow claims to be optional in STS (#10078)
not all claims need to be present for
the JWT claim, let the policies not
exist and only apply which are present
when generating the credentials

once credentials are generated then
those policies should exist, otherwise
the request will fail.
2020-07-19 15:34:01 -07:00
Harshavardhana
d53e560ce0
fix: copyObject key rotation issue (#10085)
- copyObject in-place decryption failed
  due to incorrect verification of headers
- do not decode ETag when object is encrypted
  with SSE-C, so that pre-conditions don't fail
  prematurely.
2020-07-18 17:36:32 -07:00
Harshavardhana
17747db93f
fix: support healing older content (#10076)
This PR adds support for healing older
content i.e from 2yrs, 1yr. Also handles
other situations where our config was
not encrypted yet.

This PR also ensures that our Listing
is consistent and quorum friendly,
such that we don't list partial objects
2020-07-17 17:41:29 -07:00
Harshavardhana
3fe27c8411
fix: In federated setup dial all hosts to figure out online host (#10074)
In federated NAS gateway setups, multiple hosts in srvRecords
was picked at random which could mean that if one of the
host was down the request can indeed fail and if client
retries it would succeed. Instead allow server to figure
out the current online host quickly such that we can
exclude the host which is down.

At the max the attempt to look for a downed node is to
300 millisecond, if the node is taking longer to respond
than this value we simply ignore and move to the node,
total attempts are equal to number of srvRecords if no
server is online we simply fallback to last dialed host.
2020-07-17 14:25:47 -07:00
Harshavardhana
14b1c9f8e4
fix: return Range errors after If-Matches (#10045)
closes #7292
2020-07-17 13:01:22 -07:00
Klaus Post
d84fc58cac
fix: CheckParts endpoint call to correct API (#10073)
CheckParts is calling the wrong endpoint, so instead of 
checking parts, it is writing metadata.
2020-07-17 10:17:59 -07:00
Harshavardhana
187c3f62df
fix: heal replaced drives properly (#10069)
healing was not working properly when drives were
replaced, due to the error check in root disk
calculation this PR fixes this behavior

This PR also adds additional fix for missing
metadata entries from .minio.sys as part of
disk healing as well.

Added code to ignore and print more context
sensitive errors for better debugging.

This PR is continuation of fix in 7b14e9b660
2020-07-17 10:08:04 -07:00
Harshavardhana
4bfc50411c
fix: return versionId in tagging APIs (#10068) 2020-07-16 22:38:58 -07:00
Harshavardhana
d3c81a6e93
add missing available space from metrics (#10065) 2020-07-16 14:43:48 -07:00
Harshavardhana
7342b5355f
fix: obtain correct location string with DNS style buckets (#10060)
closes #10054
2020-07-16 13:28:29 -07:00
Harshavardhana
7b14e9b660
fix: diskInfo should check diskID only if disk is online (#10058)
closes #10057
2020-07-16 07:30:05 -07:00
Harshavardhana
cd849bc2ff
update STS docs with new values (#10055)
Co-authored-by: Poorna <poornas@users.noreply.github.com>
2020-07-15 14:36:14 -07:00
Klaus Post
00d3cc4b69
Enforce quota checks after crawl (#10036)
Enforce bucket quotas when crawling has finished. 
This ensures that we will not do quota enforcement on old data.

Additionally, delete less if we are closer to quota than we thought.
2020-07-14 18:59:05 -07:00
Harshavardhana
14ff7f5fcf
add hdfs sub-path support (#10046)
for users who don't have access to HDFS rootPath '/'
can optionally specify `minio gateway hdfs hdfs://namenode:8200/path`
for which they have access to, allowing all writes to be
performed at `/path`.

NOTE: once configured in this manner you need to make
sure command line is correctly specified, otherwise
your data might not be visible

closes #10011
2020-07-14 15:49:10 -07:00
Harshavardhana
369a876ebe
fix: handle array policies in JWT claim (#10041)
PR #10014 was not complete as only handled
policy claims partially.
2020-07-14 10:26:47 -07:00
Anis Elleuch
778e9c864f
Move dependency from minio-go v6 to v7 (#10042) 2020-07-14 09:38:05 -07:00
Harshavardhana
a2616b8227
allow turning off secure ciphers (#10038)
this PR to allow legacy support for big-data
applications which run older Java versions
which do not support the secure ciphers
currently defaulted in MinIO. This option
allows optionally to turn them off such
that client and server can negotiate the
best ciphers themselves.

This env is purposefully not documented,
meant as a last resort when client
application cannot be changed easily.
2020-07-13 14:20:21 -07:00
Harshavardhana
e7d7d5232c
fix: admin info output and improve overall performance (#10015)
- admin info node offline check is now quicker
- admin info now doesn't duplicate the code
  across doing the same checks for disks
- rely on StorageInfo to return appropriate errors
  instead of calling locally.
- diskID checks now return proper errors when
  disk not found v/s format.json missing.
- add more disk states for more clarity on the
  underlying disk errors.
2020-07-13 09:51:07 -07:00
Harshavardhana
1d65ef3201
fix: deletes on older format properly (#10029)
while we handle all situations for writes and reads
on older format, what we didn't cater for properly
yet was delete where we only ended up deleting
just `xl.meta` - instead we should allow all the
deletes to go through for older format without
versioning enabled buckets.
2020-07-13 09:01:17 -07:00
Harshavardhana
37c14207d6
fix: cors handling again for not just OPTIONS request (#10025)
CORS is notorious requires specific headers to be
handled appropriately in request and response,
using cors package as part of handlerFunc() for
options method lacks the necessary control this
package needs to add headers.
2020-07-12 10:56:57 -07:00
Harshavardhana
3b9fbf80ad
fix: make sure to use new restClient for healthcheck (#10026)
Without instantiating a new rest client we can
have a recursive error which can lead to
healthcheck returning always offline, this can
prematurely take the servers offline.
2020-07-11 22:19:38 -07:00
Harshavardhana
143f9371c6 fix: loading users regression
additionally also move to latest gorilla/mux master
to fix the DNS style bucket routing regression

resolves #10022
resolves #10023
2020-07-11 14:03:27 -07:00
Harshavardhana
3f1902face
fix: cors should be available on all paths (#10020) 2020-07-11 13:49:24 -07:00
Harshavardhana
c0adb52213
sync to disk only upon successful legacy metadata rename (#10018) 2020-07-11 09:37:34 -07:00
Harshavardhana
2d17c16d93
fix: make sure to honor versioning from browser UI deletes (#10016) 2020-07-10 22:21:04 -07:00
Harshavardhana
36d36fab0b
fix: add virtual host style workaround for gorilla/mux issue (#10010)
gorilla/mux broke their recent release 1.7.4 which we
upgraded to, we need the current workaround to ensure
that our regex matches appropriately.

An upstream PR is sent, we should remove the
workaround once we have a new release.
2020-07-10 15:21:32 -07:00
Harshavardhana
ba756cf366
fix: extract array type for policy claim if present (#10014) 2020-07-10 14:48:44 -07:00
Benjamin Sodenkamp
c00d410e61
Added bucket name param to ToJSONError call (#9961)
when called with InvalidBucketName error.
The user is shown a more specific error
when the param is present.
2020-07-10 12:10:39 -07:00
Klaus Post
968342c732
Remove usage of go-ieproxy for windows (#10009)
There is a potential for deadlock on Windows 10
refer https://github.com/mattn/go-ieproxy/issues/17 

remove this dependency for now.
2020-07-10 12:08:14 -07:00
Harshavardhana
5c15656c55
support bootstrap client to use healthcheck restClient (#10004)
- reduce locker timeout for early transaction lock
  for more eagerness to timeout
- reduce leader lock timeout to range from 30sec to 1minute
- add additional log message during bootstrap phase
2020-07-10 09:26:21 -07:00
kannappanr
efe9fe6124
azure: Return success when deleting non-existent object (#9981) 2020-07-10 08:30:23 -07:00
Klaus Post
c850905e43
fix: threadwalk lockup under high load (#9992)
Main issue is that `t.pool[params]` should be `t.pool[oldest]`.

We add a bit more safety features for the code.

* Make writes to the endTimerCh non-blocking in all cases
   so multiple releases cannot lock up.
* Double check expectations.
* Shift down deletes with copy instead of truncating slice.
* Actually delete the oldest if we are above total limit.
* Actually delete the oldest found and not the current.
* Unexport the mutex so nobody from the outside can meddle with it.
2020-07-09 07:02:18 -07:00
Andreas Auernhammer
a317a2531c
admin: new API for creating KMS master keys (#9982)
This commit adds a new admin API for creating master keys.
An admin client can send a POST request to:
```
/minio/admin/v3/kms/key/create?key-id=<keyID>
```

The name / ID of the new key is specified as request
query parameter `key-id=<ID>`.

Creating new master keys requires KES - it does not work with
the native Vault KMS (deprecated) nor with a static master key
(deprecated).

Further, this commit removes the `UpdateKey` method from the `KMS`
interface. This method is not needed and not used anymore.
2020-07-08 18:50:43 -07:00
Harshavardhana
2743d4ca87
fix: Add support for preserving mtime for replication (#9995)
This PR is needed for bucket replication support
2020-07-08 17:36:56 -07:00
Harshavardhana
6136a963c8
fix: bump the response header timeout for forwarder as well (#9994)
continuation of #9986, add more place where the lower timeout
comes into effect.
2020-07-08 10:55:24 -07:00
Anis Elleuch
fa211f6a10
heal: Fix healing delete markers (#9989) 2020-07-07 20:54:09 -07:00
Harshavardhana
72e0745e2f
fix: migrate to go.etcd.io import path (#9987)
with the merge of https://github.com/etcd-io/etcd/pull/11823
etcd v3.5.0 will now have a properly imported versioned path

this fixes our pending migration to newer repo
2020-07-07 19:04:29 -07:00
Klaus Post
aa4d1021eb
Remove timeout from putobject and listobjects (#9986)
Use a separate client for these calls that can take a long time.

Add request context to these so they are canceled when the client 
disconnects instead except for ListObject which doesn't have any equivalent.
2020-07-07 12:19:57 -07:00
Harshavardhana
93e7e4a0e5
fix: cors handling after gorilla mux update (#9980)
fixes #9979
2020-07-06 20:55:19 -07:00
Anis Elleuch
c2f7cd1104
Consider errFileVersionNotFound during healing assessment (#9977)
Healing an object which has multiple versions was not working because
the healing code forgot to consider errFileVersionNotFound error as a
use case that needs healing
2020-07-06 08:09:48 -07:00
Anis Elleuch
4cf80f96ad
fix: lifecycle XML parsing errors with Versioning (#9974) 2020-07-05 09:08:42 -07:00
Anis Elleuch
d4af132fc4
lifecycle: Expiry should not delete versions (#9972)
Currently, lifecycle expiry is deleting all object versions which is not
correct, unless noncurrent versions field is specified.

Also, only delete the delete marker if it is the only version of the
given object.
2020-07-04 20:56:02 -07:00
Harshavardhana
c087a05b43
fix: simplify data structure before release (#9968)
- additionally upgrade to msgp@v1.1.2
- change StatModTime,StatSize fields as
  simple Size/ModTime
- reduce 50000 entries per List batch to 10000
  as client needs to wait too long to see the
  first batch some times which is not desired
  and it is worth we write the data as soon
  as we have it.
2020-07-04 12:25:53 -07:00
Harshavardhana
cdb0e6ffed
support proper values for listMultipartUploads/listParts (#9970)
object KMS is configured with auto-encryption,
there were issues when using docker registry -
this has been left unnoticed for a while.

This PR fixes an issue with compatibility.

Additionally also fix the continuation-token implementation
infinite loop issue which was missed as part of #9939

Also fix the heal token to be generated as a client
facing value instead of what is remembered by the
server, this allows for the server to be stateless 
regarding the token's behavior.
2020-07-03 19:27:13 -07:00
Harshavardhana
03b84091fc
auto enable versioning with object locking (#9967)
this is to preserve versioning for object-locked
buckets from current release code.
2020-07-03 15:30:06 -07:00
Anis Elleuch
2be20588bf
Reroute requests based token heal/listing (#9939)
When manual healing is triggered, one node in a cluster will 
become the authority to heal. mc regularly sends new requests 
to fetch the status of the ongoing healing process, but a load 
balancer could land the healing request to a node that is not 
doing the healing request.

This PR will redirect a request to the node based on the node 
index found described as part of the client token. A similar
technique is also used to proxy ListObjectsV2 requests
by encoding this information in continuation-token
2020-07-03 11:53:03 -07:00
Harshavardhana
e59ee14f40
Tune tcp keep-alives with new kernel timeout options (#9963)
For more deeper understanding https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
2020-07-03 10:03:41 -07:00
Anis Elleuch
21a37e3393
fix: ListObjectVersions should return ordered Version & DeleteMarker (#9959)
The S3 specification says that versions are ordered in the response of
list object versions.

mc snapshot needs this to know which version comes first especially when
two versions have the same exact last-modified field.
2020-07-03 09:15:44 -07:00
Harshavardhana
810a4f0723
fix: return proper errors Get/HeadObject for deleteMarkers (#9957) 2020-07-02 16:17:27 -07:00
Krishna Srinivas
4c266df863
fix: proxy ListObjects request to one of the server based on hash(bucket) (#9881) 2020-07-02 10:56:22 -07:00
Klaus Post
abd999f64a
fix: list object versions in distributed setup (#9958)
Remove calls to `WalkVersions` was calling the wrong endpoint, 
so unless quorum could be reached with local disks no results 
would ever be returned.
2020-07-02 10:29:50 -07:00
Benjamin Sodenkamp
648cb13e02
Added 'close' to results channel in Walk() (#9956) 2020-07-01 14:29:45 -07:00
Harshavardhana
174f428571
add additional fdatasync before close() on writes (#9947) 2020-07-01 10:57:23 -07:00
Harshavardhana
5388ae4acb
make sure to delete data-usage cache upon bucket deletes (#9952) 2020-07-01 10:55:28 -07:00
kannappanr
5089a7167d
Handle empty retention in get/put object retention (#9948)
Fixes #9943
2020-06-30 16:44:24 -07:00
Harshavardhana
c0ac25bfff
fix: readiness needs to be like liveness (#9941)
Readiness as no reasoning to be cluster scope
because that is not how the k8s networking works
for pods, all the pods to a deployment are not
sharing the network in a singleton. Instead they
are run as local scopes to themselves, with
readiness failures the pod is potentially taken
out of the network to be resolvable - this
affects the distributed setup in myriad of
different ways.

Instead readiness should behave like liveness
with local scope alone, and should be a dummy
implementation.

This PR all the startup times and overal k8s
startup time dramatically improves.

Added another handler called as `/minio/health/cluster`
to understand the cluster scope health.
2020-06-30 11:28:27 -07:00
Klaus Post
27a1f3ed2b
fs: Check if cache root was added (#9945)
Fixes #9942
2020-06-30 09:32:36 -07:00
Harshavardhana
91817d0d1a
fix: implement generic Walk for gateway (#9938)
Walk() functionality was missing on gateway
implementations leading to missing functionality
for the browser UI such as remove multiple objects,
download as zip file etc.

This PR brings a generic implementation across
all gateway's, it is not required to repeat the
same code in all gateway's
2020-06-29 17:07:23 -07:00
poornas
55a3b071ea
Allow optionally to disable range caching. (#9908)
The default behavior is to cache each range requested
to cache drive. Add an environment variable
`MINIO_RANGE_CACHE` - when set to off, it disables
range caching and instead downloads entire object
in the background.

Fixes #9870
2020-06-29 13:25:29 -07:00
Harshavardhana
a38ce29137
fix: simplify background heal and trigger heal items early (#9928)
Bonus fix during versioning merge one of the PR was missing
the offline/online disk count fix from #9801 port it correctly
over to the master branch from release.

Additionally, add versionID support for MRF

Fixes #9910
Fixes #9931
2020-06-29 13:07:26 -07:00
Harshavardhana
4bba2cd034
fix: disallow versioning to be suspended with object lock (#9930) 2020-06-28 08:15:15 -07:00
Harshavardhana
f7f12b8604
fix: crash in storage rest client due to spurious query params (#9924)
regression got introduced in dee3cf2d7f
when the DeleteVersion API was changed, but the corresponding query
params were left in-tact.
2020-06-26 16:49:49 -07:00
Praveen raj Mani
cf5d051afc
update notification rulesMap when reloading bucketMetadata (#9917) 2020-06-26 13:17:31 -07:00
Harshavardhana
2f681bed57
fix: pop entries from each drives in parallel (#9918) 2020-06-25 23:20:12 -07:00
Praveen raj Mani
b1705599e1
Fix config leaks and deprecate file-based config setters in NAS gateway (#9884)
This PR has the following changes

- Removing duplicate lookupConfigs() calls.
- Deprecate admin config APIs for NAS gateways. This will avoid repeated reloads of the config from the disk.
- WatchConfigNASDisk will be removed
- Migration guide for NAS gateways users to migrate to ENV settings.

NOTE: THIS PR HAS A BREAKING CHANGE

Fixes #9875

Co-authored-by: Harshavardhana <harsha@minio.io>
2020-06-25 15:59:28 +05:30
Harshavardhana
f4b2ed2a92
fix: filter list buckets operation with ListObjects perm (#9907)
fix regression introduced in #9305
2020-06-23 23:21:11 -07:00
Harshavardhana
dee3cf2d7f
fix: preserve modTime for DeleteMarker on remote disks (#9905) 2020-06-23 10:20:31 -07:00
Harshavardhana
21058c34d0
add some description of xl.meta (#9901) 2020-06-22 17:27:54 -07:00
Harshavardhana
5b1e6c7dbc
Add check for object statTime non-negative (#9899) 2020-06-22 14:33:58 -07:00
Harshavardhana
e92434c2e7
fix: support client customized scopes for OpenID (#9880)
Fixes #9238
2020-06-22 12:08:50 -07:00
Klaus Post
cae09d8b84
crawler: Wait max 1 second (#9894)
Add 1-second timeout to crawler wait.

This will make the crawler able to run, albeit very, 
very slowly on high load servers.
2020-06-22 11:57:22 -07:00
Harshavardhana
c54e3b4ea3
Add support for minioreleaser a fork for goreleaser (#9890)
This is to support building containers for multiple
platforms, rpms and debs all in a single build process

https://github.com/harshavardhana/minioreleaser
2020-06-22 08:26:40 -07:00
Klaus Post
972d876ca9
Do not select zones with <5% free after upload (#9877)
Looking into full disk errors on zoned setup. We don't take the
5% space requirement into account when selecting a zone.

The interesting part is that even considering this we don't
know the size of the object the user wants to upload when
they do multipart uploads.

It seems quite defensive to always upload multiparts to
the zone where there is the most space since all load will
be directed to a part of the cluster.

In these cases we make sure it can at least hold a 1GiB file
and we disadvantage fuller zones more by subtracting the
expected size before weighing.
2020-06-20 06:36:44 -07:00
Harshavardhana
b8cb21c954
allow more than N number of locks in TopLocks (#9883) 2020-06-20 06:33:01 -07:00
Harshavardhana
67062840c1
fix: perform CopyObject under more conditions (#9879)
- x-amz-storage-class specified CopyObject
  should proceed regardless, its not a precondition
- sourceVersionID is specified CopyObject should
  proceed regardless, its not a precondition
2020-06-19 13:53:45 -07:00
Harshavardhana
9626a981bc
fix: Preserve old data appropriately (#9873)
This PR fixes all the below scenarios
and handles them correctly.

- existing data/bucket is replaced with
  new content, no versioning enabled old
  structure vanishes.

- existing data/bucket - enable versioning
  before uploading any data, once versioning
  enabled upload new content, old content
  is preserved.

- suspend versioning on the bucket again, now
  upload content again the old content is purged
  since that is the default "null" version.

Additionally sync data after xl.json -> xl.meta
rename(), to avoid any surprises if there is a
crash during this rename operation.
2020-06-19 10:58:17 -07:00
Harshavardhana
b912c8f035
fix: generate new version when replacing metadata in CopyObject (#9871) 2020-06-19 08:44:51 -07:00
Harshavardhana
fa13fe2184
allow loading some from config and some values from ENVs (#9872)
A regression perhaps introduced in #9851
2020-06-18 17:31:56 -07:00
Harshavardhana
85a1956e5c
Avoid duplicate object holding locks (#9867)
Fixes #9866
2020-06-18 10:25:07 -07:00
Harshavardhana
7ed1077879
Add a custom healthcheck function for online status (#9858)
- Add changes to ensure remote disks are not
  incorrectly taken online if their order has
  changed or are incorrect disks.
- Bring changes to peer to detect disconnection
  with separate Health handler, to avoid a
  rather expensive call GetLocakDiskIDs()
- Follow up on the same changes for Lockers
  as well
2020-06-17 14:49:26 -07:00
Harshavardhana
94424e14d7
fix: rename legacy xl.json to xl.meta properly in ListDir() (#9863) 2020-06-17 13:58:38 -07:00
Harshavardhana
e79874f58e
[feat] Preserve version supplied by client (#9854)
Just like GET/DELETE APIs it is possible to preserve
client supplied versionId's, of course the versionIds
have to be uuid, if an existing versionId is found
it is overwritten if no object locking policies
are found.

- PUT /bucketname/objectname?versionId=<id>
- POST /bucketname/objectname?uploads=&versionId=<id>
- PUT /bucketname/objectname?verisonId=<id> (with x-amz-copy-source)
2020-06-17 11:13:41 -07:00
Klaus Post
8aae8b1d27
Put an upper limit on walk pool sizes (#9848)
Fixes potentially infinite allocations, especially in FS mode, 
since lookups live up to 30 minutes. Limit walk pool sizes to 50 
max parameter entries and 4 concurrent operations with the same
parameters.

Fixes #9835
2020-06-17 09:52:07 -07:00
Klaus Post
1813ff9dfa
Re-add missing bucket bloom filters (#9861) 2020-06-17 08:54:41 -07:00
Harshavardhana
4ac31ea82b
fix: find current location of object multi-zones (#9840)
PutObject on multiple-zone with versioning would not
overwrite the correct location of the object if the
object has delete marker, leading to duplicate objects
on two zones.

This PR fixes by adding affinity towards delete marker
when GetObjectInfo() returns error, use the zone index
which has the delete marker.
2020-06-17 08:33:14 -07:00
Harshavardhana
67ca157329
fix: content-md5 is not mandatory for PutBucketVersioning (#9852) 2020-06-17 07:59:08 -07:00
Harshavardhana
f5e1b3d09e
fix: initialize config once per startup (#9851) 2020-06-16 20:15:21 -07:00
Klaus Post
3ba4804d6c
Move online status to REST client (#9808) 2020-06-16 18:59:32 -07:00
Harshavardhana
216de230e2
remove unnecessary log for setMaxResources (#9856)
fixes #9855
2020-06-16 18:57:29 -07:00
ebozduman
a91cfa03e7
extend the HINT on backend ownership and its contents (#9846) 2020-06-16 15:32:29 -07:00
Harshavardhana
087aaaf894
fix: save deleteMarker properly, precision upto UnixNano() (#9843) 2020-06-16 07:54:27 -07:00
Harshavardhana
cbb7a09376
Allow etcd, cache setup to exit when starting gateway mode (#9842)
- Initialize etcd once per call
- Fail etcd, cache setup pro-actively for gateway setups
- Support deleting/updating bucket notification,
  tagging, lifecycle, sse-encryption
2020-06-15 22:09:39 -07:00
Harshavardhana
1a956424e0 Add logs when quorum is lost during readiness checks (#9839) 2020-06-15 13:11:22 -07:00
Harshavardhana
f9aa239973
fix: export prometheus metrics for cache GC triggers (#9815)
Bonus change to use channel to serialize triggers,
instead of using atomic variables. More efficient
mechanism for synchronization.

Co-authored-by: Nitish Tiwari <nitish@minio.io>
2020-06-15 09:05:35 -07:00
Anis Elleuch
2073b79633
fix: Remove unnecessary debug log line (#9834) 2020-06-15 08:55:33 -07:00
Anis Elleuch
63e9005f01 fix: Avoid updating object tags on failed disks (#9819) 2020-06-14 10:53:07 -07:00
Harshavardhana
d55f4336ae
preserve context per request for local locks (#9828)
In the Current bug we were re-using the context
from previously granted lockers, this would
lead to lock timeouts for existing valid
read or write locks, leading to premature
timeout of locks.

This bug affects only local lockers in FS
or standalone erasure coded mode. This issue
is rather historical as well and was present
in lsync for some time but we were lucky to
not see it.

Similar changes are done in dsync as well
to keep the code more familiar

Fixes #9827
2020-06-14 07:43:10 -07:00
ethan ho
535efd34a0
Fix peer server update failure (#9824)
When updating all servers following the constructions of mc update,
only the endpoint server will be updated successfully.
All the other peer servers' updating failed due to the error below:
--------------------------------------------------------------------------
parsing time "2006-01-02T15:04:05Z07:00" as "<release version>": cannot parse "-01-02T15:04:05Z07:00" as "0-" 
--------------------------------------------------------------------------
2020-06-13 07:12:49 -07:00
Harshavardhana
4915433bd2
Support bucket versioning (#9377)
- Implement a new xl.json 2.0.0 format to support,
  this moves the entire marshaling logic to POSIX
  layer, top layer always consumes a common FileInfo
  construct which simplifies the metadata reads.
- Implement list object versions
- Migrate to siphash from crchash for new deployments
  for object placements.

Fixes #2111
2020-06-12 20:04:01 -07:00
Klaus Post
43d6e3ae06
merge object lifecycle checks into usage crawler (#9579) 2020-06-12 10:28:21 -07:00
kannappanr
225b812b5e
Update minio-go library to latest (#9813) 2020-06-12 10:18:42 -07:00
Harshavardhana
96ed0991b5
fix: optimize IAM users load, add fallback (#9809)
Bonus fix, load service accounts properly
when service accounts were generated with
LDAP
2020-06-11 14:11:30 -07:00
Harshavardhana
a42df3d364
Allow idiomatic usage of middlewares in gorilla/mux (#9802)
Historically due to lack of support for middlewares
we ended up writing wrapped handlers for all
middlewares on top of the gorilla/mux, this causes
multiple issues when we want to let's say

- Overload r.Body with some custom implementation
  to track the incoming Reads()
- Add other sort of top level checks to avoid
  DDOSing the server with large incoming HTTP
  bodies.

Since 1.7.x release gorilla/mux provides proper
use of middlewares, which are honored by the muxer
directly. This makes sure that Go can honor its
own internal ServeHTTP(w, r) implementation where
Go net/http can wrap into its own customer readers.

This PR as a side-affect fixes rare issues of client
hangs which were reported in the wild but never really
understood or fixed in our codebase.

Fixes #9759
Fixes #7266
Fixes #6540
Fixes #5455
Fixes #5150

Refer https://github.com/boto/botocore/pull/1328 for
one variation of the same issue in #9759
2020-06-11 08:19:55 -07:00
Harshavardhana
ff94b1b0a9
isEndpointConnected should take local disk inputs (#9803)
PR #9801 while it is correct, the loop isEndpointConnected()
was changed to rely on endpoint.String() which has the host
information as well, which is not correct value as input to
detect if the disk is down or up, if endpoint is local use
its local path value instead.
2020-06-11 08:05:25 -07:00
Andreas Auernhammer
b1845c6c83
kes: try to auto. create master key if not present (#9790)
This commit changes the data key generation such that
if a MinIO server/nodes tries to generate a new DEK
but the particular master key does not exist - then
MinIO asks KES to create a new master key and then
requests the DEK again.

From now on, a SSE-S3 master key must not be created
explicitly via: `kes key create <key-name>`.
Instead, it is sufficient to just set the env. var.
```
export MINIO_KMS_KES_KEY_NAME=<key-name>
```

However, the MinIO identity (mTLS client certificate)
must have the permission to access the `/v1/key/create/`
API. Therefore, KES policy for MinIO must look similar to:
```
[
  /v1/key/create/<key-name-pattern>
  /v1/key/generate/<key-name-pattern>
  /v1/key/decrypt/<key-name-pattern>
]
```
However, in our guides we already suggest that.
See e.g.: https://github.com/minio/kes/wiki/MinIO-Object-Storage#kes-server-setup

***

The ability to create master keys on request may also be
necessary / useful in case of SSE-KMS.
2020-06-11 02:00:47 -07:00
Harshavardhana
62b1da3e2c
fix offline disk calculation (#9801)
Current code was relying on globalEndpoints as
the source of secondary truth to obtain
the missing endpoints list when the disk
is offline, this is problematic

- there is no way to know if the getDisks()
  returned endpoints total is same as the
  ones list of globalEndpoints and it
  belongs to a particular set.
- there is no order guarantee as getDisks()
  is ordered as per format.json, globalEndpoints
  may not be, so potentially end up including
  incorrect endpoints.

To fix this bring getEndpoints() just like getDisks()
to ensure that consistently ordered endpoints are
always available for us to ensure that returned values
are consistent with what each erasure set would observe.
2020-06-10 17:10:31 -07:00
poornas
d26b24f670
avoid storing X-Amz-Tagging-Directive in metadata (#9800) 2020-06-10 14:29:24 -07:00
kannappanr
2c372a9894
Send Partscount only when partnumber is specified (#9793)
Fixes #9789
2020-06-10 09:22:15 -07:00
poornas
3d3b75fb8d
Avoid overwriting object tags when changing lock (#9794) 2020-06-10 08:16:30 -07:00
Klaus Post
142b057be8
Check object names on windows (#9798)
Uploading files with names that could not be written to disk 
would result in "reduce your request" errors returned.

Instead check explicitly for disallowed characters and reject 
files with `Object name contains unsupported characters.`
2020-06-10 08:14:22 -07:00
Harshavardhana
4790868878
allow background IAM load to speed up startup (#9796)
Also fix healthcheck handler to run success
only if object layer has initialized fully
for S3 API access call.
2020-06-09 19:19:03 -07:00
Harshavardhana
342ade03f6
deprecate listDir usage for healing (#9792)
listDir was incorrectly used for healing which
is slower, instead use Walk() to heal the entire
set.
2020-06-09 17:09:19 -07:00
P R
9407dbf387
display proper used space based on disk usage (#9551)
Fixes #9346
2020-06-09 15:05:39 -07:00
Harshavardhana
423aeb0d81
allow large buffer to list more entries per directory (#9785) 2020-06-09 09:44:50 -07:00
Anis Elleuch
790323ac37
lifecycle: Fix object expiration date (#9791)
re-use PredictExpiryTime() in ComputeAction()
2020-06-09 09:40:53 -07:00
Harshavardhana
febe9cc26a
fix: avoid timer leaks in dsync/lsync (#9781)
At a customer setup with lots of concurrent calls
it can be observed that in newRetryTimer there
were lots of tiny alloations which are not
relinquished upon retries, in this codepath
we were only interested in re-using the timer
and use it wisely for each locker.

```
(pprof) top
Showing nodes accounting for 8.68TB, 97.02% of 8.95TB total
Dropped 1198 nodes (cum <= 0.04TB)
Showing top 10 nodes out of 79
      flat  flat%   sum%        cum   cum%
    5.95TB 66.50% 66.50%     5.95TB 66.50%  time.NewTimer
    1.16TB 13.02% 79.51%     1.16TB 13.02%  github.com/ncw/directio.AlignedBlock
    0.67TB  7.53% 87.04%     0.70TB  7.78%  github.com/minio/minio/cmd.xlObjects.putObject
    0.21TB  2.36% 89.40%     0.21TB  2.36%  github.com/minio/minio/cmd.(*posix).Walk
    0.19TB  2.08% 91.49%     0.27TB  2.99%  os.statNolog
    0.14TB  1.59% 93.08%     0.14TB  1.60%  os.(*File).readdirnames
    0.10TB  1.09% 94.17%     0.11TB  1.25%  github.com/minio/minio/cmd.readDirN
    0.10TB  1.07% 95.23%     0.10TB  1.07%  syscall.ByteSliceFromString
    0.09TB  1.03% 96.27%     0.09TB  1.03%  strings.(*Builder).grow
    0.07TB  0.75% 97.02%     0.07TB  0.75%  path.(*lazybuf).append
```
2020-06-08 11:28:40 -07:00
Praveen raj Mani
2ce2e88adf
Support mTLS Authentication in Webhooks (#9777) 2020-06-08 05:55:44 -07:00
Harshavardhana
c7599d323b
fix: throw error if symmetry cannot be obtained (#9780)
For example `{1...17}/{1...52}` symmetrical
distribution of drives cannot be obtained

- Because 17 is a prime number
- Is not divisible by any pre-defined setCounts i.e
  from 1 to 16
2020-06-06 22:13:48 -07:00
Harshavardhana
d93bdea433
fix remove LDAPPassword from audit logs (#9773)
the previous fix for #9707 was not correct,
fix this properly passing the right filter
keys to be filtered from the audit
log output.

Fixes #9767
2020-06-04 22:07:55 -07:00
Harshavardhana
5e529a1c96
simplify context timeout for readiness (#9772)
additionally also add CORS support to restrict
for specific origin, adds a new config and
updated the documentation as well
2020-06-04 14:58:34 -07:00
Harshavardhana
5686a7e273
fix NAS gateway support for policy/notification (#9765)
Fixes #9764
2020-06-03 13:18:54 -07:00
Harshavardhana
566e0e2048
allow deleting of dropped multiparts (#9753)
bonus change trigger MRF heal when single
offline disk is found, break out early.
2020-06-02 15:27:03 -07:00
Anis Elleuch
3aad09be28
heal: Fix passing healing opts (#9756)
Manual healing (as background healing) creates a heal task with a
possiblity to override healing options, such as deep or normal mode.

Use a pointer type in heal opts so nil would mean use the default
healing options.
2020-06-02 09:07:16 -07:00
Harshavardhana
f0358acb32
concurrently load bucket metadata (#9749) 2020-06-01 22:32:53 -07:00
Anis Elleuch
fd0de4ab32
azure: Show better message when credentials are wrong (#9748) 2020-06-01 18:23:48 -07:00
Anis Elleuch
73a308502f
Relax content-md5 requirement in set encryption handler (#9750)
aws cli fails to set a bucket encryption configuration to MinIO server.
The reason is that aws cli does not send MD5-Content header. It seems
that MD5-Content is not required anymore.

This commit also returns Not Implemented header early to help mint tests
to ignore testing this API in gateway modes.
2020-06-01 18:08:19 -07:00
Anis Elleuch
bd59f150b8
azure: Implement CopyPart API (#9747) 2020-06-01 11:12:18 -07:00
Harshavardhana
f90422a890
fix prometheus calculation of offline disks per instance (#9744)
This was a regression introduced in 9baeda7 for prometheus
calculation of offline disks which should be local to
an instance.

fixes #9742
2020-06-01 07:35:40 -07:00
Harshavardhana
8befedef14
simplify FS multipart cleanup (#9740)
fixes #9671
2020-05-30 13:56:31 -07:00
Nathan Brown
2af3004409
Use registry to check Atime support on Windows (#9741) 2020-05-30 09:47:42 -07:00
Harshavardhana
38ee40d59c
move to upstream code colinmarc/hdfs (#9738)
- supports SASL based authentication now
- upgrades to new changes in gokrb library
- implement force delete feature

Fixes #8206
2020-05-29 18:38:50 -07:00
kannappanr
d583f1ac0e
check if container is empty before invoking DeleteContainer (#9733) 2020-05-29 13:24:39 -07:00
Harshavardhana
2bcb02f628
Avoid '\n' from constant strings (#9737)
Fixes #9736
2020-05-29 11:40:57 -07:00
Klaus Post
167ddf9c9c
Workaround for Windows Docker Engine 19.03.8 (#9735)
Add workaround for issue preventing servers from starting on 
Windows Docker Engine 19.03.8

Fixes #9726
2020-05-29 07:05:19 -07:00
Anton Huck
f833e41e69
IAM: Fix nil panic due to uninit. iamGroupPolicyMap. Fixes #9730 (#9734) 2020-05-29 06:13:54 -07:00
Harshavardhana
41688a936b
fix: CopyObject behavior on expanded zones (#9729)
CopyObject was not correctly figuring out the correct
destination object location and would end up creating
duplicate objects on two different zones, reproduced
by doing encryption based key rotation.
2020-05-28 14:36:38 -07:00
Harshavardhana
b2db8123ec
Preserve errors returned by diskInfo to detect disk errors (#9727)
This PR basically reverts #9720 and re-implements it differently
2020-05-28 13:03:04 -07:00
Harshavardhana
b330c2c57e
Introduce simpler GetMultipartInfo call for performance (#9722)
Advantages avoids 100's of stats which are needed for each
upload operation in FS/NAS gateway mode when uploading a large
multipart object, dramatically increases performance for
multipart uploads by avoiding recursive calls.

For other gateway's simplifies the approach since
azure, gcs, hdfs gateway's don't capture any specific
metadata during upload which needs handler validation
for encryption/compression.

Erasure coding was already optimized, additionally
just avoids small allocations of large data structure.

Fixes #7206
2020-05-28 12:36:20 -07:00
kannappanr
7214a0160a
allow bucket policy to set/removed in NAS gateway (#9706) 2020-05-28 08:31:16 -07:00
Anis Elleuch
375b79f11b
storage: Implement GetDiskID request in REST server side (#9720)
GetDiskID() in storage rest client does not really issue a REST request
to the remote disk, but returns an in-memory value instead.

However, GetDiskID() should return an error when format.json is not
found or for other similar issues (unmounted disks, etc..)

GetDiskID() is only called when formatting disks and getting storage
informatio, hence this commit should not have a performance degradation.
2020-05-28 08:17:42 -07:00
Harshavardhana
3da1869d5e
Avoid double reads on metadata during GetObject() (#9719)
Overall TTFB can see a dramatic improvement with
this change - did not do any benchmark as such
but the change itself is self-explanatory
2020-05-27 16:14:26 -07:00
Harshavardhana
7cedc5369d
fix: send valid claims in AuditLogs for browser requests (#9713)
Additionally also fix STS logs to filter out LDAP
password to be sent out in audit logs.

Bonus fix handle the reload of users properly by
making sure to preserve the newer users during the
reload to be not invalidated.

Fixes #9707
Fixes #9644
Fixes #9651
2020-05-27 12:38:44 -07:00
Harshavardhana
53aaa5d2a5
Export bucket usage counts as part of bucket metrics (#9710)
Bonus fixes in quota enforcement to use the
new datastructure and use timedValue to cache
a value/reload automatically avoids one less
global variable.
2020-05-27 06:45:43 -07:00
P R
9d39fb3604
add copyobject tagging replace directive for gateway (#9711) 2020-05-26 17:32:53 -07:00
Klaus Post
4a007e3767
Prefer local disks when fetching data blocks (#9563)
If the requested server is part of the set this will always read 
from the local disk, even if the disk contains a parity shard. 
In default setup there is a 50% chance that at least 
one shard that otherwise would have been fetched remotely 
will be read locally instead.

It basically trades RPC call overhead for reed-solomon. 
On distributed localhost this seems to be fairly break-even, 
with a very small gain in throughput and latency. 
However on networked servers this should be a bigger

1MB objects, before:

```
Operation: GET. Concurrency: 32. Hosts: 4.

Requests considered: 76257:
 * Avg: 25ms 50%: 24ms 90%: 32ms 99%: 42ms Fastest: 7ms Slowest: 67ms
 * First Byte: Average: 23ms, Median: 22ms, Best: 5ms, Worst: 65ms

Throughput:
* Average: 1213.68 MiB/s, 1272.63 obj/s (59.948s, starting 14:45:44 CEST)
```

After:
```
Operation: GET. Concurrency: 32. Hosts: 4.

Requests considered: 78845:
 * Avg: 24ms 50%: 24ms 90%: 31ms 99%: 39ms Fastest: 8ms Slowest: 62ms
 * First Byte: Average: 22ms, Median: 21ms, Best: 6ms, Worst: 57ms

Throughput:
* Average: 1255.11 MiB/s, 1316.08 obj/s (59.938s, starting 14:43:58 CEST)
```

Bonus fix: Only ask for heal once on an object.
2020-05-26 16:47:23 -07:00
Klaus Post
95814359bd
cache disk info to avoid repeated calls (#9682)
This value is requested on every upload when there are multiple zones.

Since this will result in an RPC call to every remote disk this scales 
quite badly in a distributed setup. Load every 1second interval.

2 servers, localhost only. In large distributed setups much bigger 
gains can be expected.

```
Operations: 21743 -> 22454
* Average: +3.28% (+0.0 MiB/s) throughput, +3.28% (+11.9) obj/s
* Fastest: +3.37% (+0.0 MiB/s) throughput, +3.37% (+13.0) obj/s
* 50% Median: +3.03% (+0.0 MiB/s) throughput, +3.03% (+11.2) obj/s
* Slowest: +8.03% (+0.0 MiB/s) throughput, +8.03% (+22.8) obj/s
```

For easy management of this a generic helper has been added.
2020-05-26 12:52:24 -07:00
Harshavardhana
d0ae69087c
fix: add proper errors for disks with preexisting content (#9703) 2020-05-26 09:32:33 -07:00
Harshavardhana
7ea026ff1d
fix: reply back user-metadata in lower case form (#9697)
some clients such as veeam expect the x-amz-meta to
be sent in lower cased form, while this does indeed
defeats the HTTP protocol contract it is harder to
change these applications, while these applications
get fixed appropriately in future.

x-amz-meta is usually sent in lowercased form
by AWS S3 and some applications like veeam
incorrectly end up relying on the case sensitivity
of the HTTP headers.

Bonus fixes

 - Fix the iso8601 time format to keep it same as
   AWS S3 response
 - Increase maxObjectList to 50,000 and use
   maxDeleteList as 10,000 whenever multi-object
   deletes are needed.
2020-05-25 16:51:32 -07:00
Harshavardhana
6e0575a53d
Revert "Disable crawler in FS/NAS gateway mode (#9695)" (#9702)
This reverts commit eba423bb9d.

Additionally also address the FS crawler to properly
calculate the sizes for encrypted/compressed content.
2020-05-25 11:32:53 -07:00
Harshavardhana
eba423bb9d
Disable crawler in FS/NAS gateway mode (#9695)
No one really uses FS for large scale accounting
usage, neither we crawl in NAS gateway mode. It is
worthwhile to simply disable this feature as its
not useful for anyone.

Bonus disable bucket quota ops as well in, FS
and gateway mode
2020-05-25 00:17:52 -07:00
Erkki Eilonen
301de169e9
in cache build ranges metadata as needed (#9698) 2020-05-25 00:17:03 -07:00
Harshavardhana
0c71ce3398
fix size accounting for encrypted/compressed objects (#9690)
size calculation in crawler was using the real size
of the object instead of its actual size i.e either
a decrypted or uncompressed size.

this is needed to make sure all other accounting
such as bucket quota and mcs UI to display the
correct values.
2020-05-24 11:19:17 -07:00
Krishna Srinivas
7d19ab9f62
readiness returns error quickly if any of the set is down (#9662)
This PR adds a new configuration parameter which allows readiness
check to respond within 10secs, this can be reduced to a lower value
if necessary using 

```
mc admin config set api ready_deadline=5s
```

 or

```
export MINIO_API_READY_DEADLINE=5s
```
2020-05-23 17:38:39 -07:00
P R
3f6d624c7b
add gateway object tagging support (#9124) 2020-05-23 11:09:35 -07:00
Harshavardhana
c138272d63
reject object lock requests on existing buckets (#9684)
a regression was introduced fix it to ensure that we
do not allow object locking settings on existing buckets
without object locking
2020-05-23 10:01:01 -07:00
Harshavardhana
7dbfea1353
avoid net/http ErrorLog for consistent logging experience (#9672)
net/http exposes ErrorLog but it is log.Logger
instance not an interface which can be overridden,
because of this reason the logging is interleaved
sometimes with TLS with messages like this on the
server

```
http: TLS handshake error from 139.178.70.188:63760: EOF
```

This is bit problematic for us as we need to have
consistent logging view for allow --json or --quiet
flags.

With this PR we ensure that this format is adhered to.
2020-05-22 21:59:18 -07:00
Sidhartha Mani
c121d27f31
progressively report obd results (#9639) 2020-05-22 17:56:45 -07:00
Anis Elleuch
43c19a6b82
nas: ensure loading of bucket notifications during startup (#9681) 2020-05-22 11:55:30 -07:00
Harshavardhana
e45c90060f
remove references for deprecated dockerfiles and deployment styles (#9675) 2020-05-22 08:40:59 -07:00
Harshavardhana
d15042470e
add missing signature v2 query params (#9670) 2020-05-21 18:51:23 -07:00
Anis Elleuch
cdf4815a6b
Add x-amz-expiration header in some S3 responses (#9667)
x-amz-expiration is described in the S3 specification as a header which
indicates if the object in question will expire any time in the future.
2020-05-21 14:12:52 -07:00
kannappanr
fade056244
filter all encryption headers in gateway (#9661)
fixes #9655
2020-05-21 11:07:50 -07:00
Harshavardhana
a546047c95
keep bucket metadata fields to be consistent (#9660)
added bonus reload bucket metadata always after
a successful MakeBucket, current we were only
doing it with object locking enabled.
2020-05-21 11:03:59 -07:00
ebozduman
2896e780ae
fixes misleading assume role error msgs (#9642) 2020-05-21 09:09:18 -07:00
Harshavardhana
baa30f4289
reload bucket metadata outside the locker (#9659) 2020-05-20 14:11:13 -07:00
Harshavardhana
189c861835
fix: remove LDAP groups claim and store them on server (#9637)
Groups information shall be now stored as part of the
credential data structure, this is a more idiomatic
way to support large LDAP groups.

Avoids the complication of setups where LDAP groups
can be in the range of 150+ which may lead to excess
HTTP header size > 8KiB, to reduce such an occurrence
we shall save the group information on the server as
part of the credential data structure.

Bonus change support multiple mapped policies, across
all types of users.
2020-05-20 11:33:35 -07:00
Harshavardhana
6656fa3066
simplify further bucket configuration properly (#9650)
This PR is a continuation from #9586, now the
entire parsing logic is fully merged into
bucket metadata sub-system, simplify the
quota API further by reducing the remove
quota handler implementation.
2020-05-20 10:18:15 -07:00
Praveen raj Mani
0cc2ed04f5
humanize timeToFirstByte and timeToResponse upto nanoseconds (#9641) 2020-05-19 18:34:02 -07:00
Anis Elleuch
9baeda781a
fix storage info output with unordered endpoints arguments (#9610)
Shuffling arguments that we pass to MinIO server are supported. However,
when that happens, Prometheus returns wrong information about disks usage
and online/offline status.

The commit fixes the issue by avoiding relying on xl.endpoints since
it is not ordered.
2020-05-19 14:27:20 -07:00
Harshavardhana
bd032d13ff
migrate all bucket metadata into a single file (#9586)
this is a major overhaul by migrating off all
bucket metadata related configs into a single
object '.metadata.bin' this allows us for faster
bootups across 1000's of buckets and as well
as keeps the code simple enough for future
work and additions.

Additionally also fixes #9396, #9394
2020-05-19 13:53:54 -07:00
Harshavardhana
d31eaddba3
fix: avoid double body reads in SelectObject call (#9638)
Bonus fix handle encryption headers in response
properly for both notification and response to
the client.
2020-05-19 02:01:08 -07:00
poornas
3202f78f0f
Fix cache metadata update for range GET (#9636)
This was inadvertently deleting cached ranges
because HTTPRangeSpec was not being passed down

fixes #9597
2020-05-18 18:33:43 -07:00
Harshavardhana
6de410a0aa
fix: possiblity of double write lockers on same resource (#9616)
To avoid this issue with refCounter refactor the code
such that

- locker() always increases refCount upon success
- unlocker() always decrements refCount upon success
  (as a special case removes the resource if the
  refCount is zero)

By these two assumptions we are able to see that we
are never granted two write lockers in any situation.

Thanks to @vcabbage for writing a nice reproducer.
2020-05-18 17:33:35 -07:00
Klaus Post
1847f17f50
Set Deployment ID before starting handlers (#9635)
Global handler ID is added to response headers, so initialize it before the server starts.

Fixes #9634
2020-05-18 11:35:05 -07:00
Harshavardhana
1bc32215b9
enable full linter across the codebase (#9620)
enable linter using golangci-lint across
codebase to run a bunch of linters together,
we shall enable new linters as we fix more
things the codebase.

This PR fixes the first stage of this
cleanup.
2020-05-18 09:59:45 -07:00
Anis Elleuch
96009975d6
relax validation when loading lifecycle document from the backend (#9612) 2020-05-18 08:33:43 -07:00
Harshavardhana
de9b391db3
fix: Disable presigned without appropriate policy (#9621)
Fixes #9590
2020-05-17 23:38:52 -07:00
kannappanr
a62572fb86
Check for address flags in all positions (#9615)
Fixes #9599
2020-05-17 08:46:23 -07:00
poornas
011a2c0b78
Add docs for bucket quota feature (#9503)
This PR also adds a check to not enforce
bucket quota for server-side metadata copy
of an object onto itself.
2020-05-16 19:27:33 -07:00
Harshavardhana
814ddc0923
add missing admin actions, enhance AccountUsageInfo (#9607) 2020-05-15 18:16:45 -07:00
Harshavardhana
d348ec0f6c
avoid double listObjectParts calls improves performance (#9606)
this PR is to avoid double calls across multiple calls
in APIs

- CopyObjectPart
- PutObjectPart
2020-05-15 08:06:45 -07:00
Harshavardhana
b730bd1396
fix: possible race in FS local lockMap (#9598) 2020-05-14 23:59:07 -07:00
Klaus Post
56e0c6adf8
Track if bloom filter is dirty (#9601)
Only save bloom filter on cycles and updates.

Fixes #9600
2020-05-14 21:46:36 -07:00
Anis Elleuch
f44a960dcd
tests: Fix one multi-delete test failure in Windows CI (#9602)
There is a disparency of behavior under Linux & Windows about
the returned error when trying to rename a non existant path.

err := os.Rename("/path/does/not/exist", "/tmp/copy")

Linux:
  isSysErrNotDir(err) = false
  os.IsNotExist(err) = true

Windows:
  isSysErrNotDir(err) = true
  os.IsNotExist(err) = true

ENOTDIR in Linux is returned when the destination path
of the rename call contains a file in one of the middle
segments of the path (e.g. /tmp/file/dst, where /tmp/file
is an actual file not a directory)

However, as shown above, Windows has more scenarios when
it returns ENOTDIR. For example, when the source path contains
an inexistant directory in its path.

In that case, we want errFileNotFound returned and not
errFileAccessDenied, so this commit will add a further check to close
the disparency between Windows & Linux.
2020-05-14 18:09:30 -07:00
kannappanr
6c1bbf918d
do not add quotes around etag, if already present (#9603) 2020-05-14 17:43:54 -07:00
Anis Elleuch
48e614b167
honor lifecycle expiration with tag rule (#9604) 2020-05-14 16:21:03 -07:00
poornas
fe8d33452b
Allow writes for bucket exceeding FIFO quota (#9575)
the quota will be enforced while
deleting oldest entries in FIFO manner.
2020-05-14 15:18:24 -07:00
Klaus Post
216fa57b88
merge nested hash readers (#9582)
The `ioutil.NopCloser(reader)` was hiding nested hash readers.

We make it an `io.Closer` so it can be attached without wrapping 
and allows for nesting, by merging the requests.
2020-05-14 14:01:31 -07:00
Klaus Post
ee9077db7d
fix: windows tests for all cases (#9594)
Replaces #9299
2020-05-13 23:55:38 -07:00
Harshavardhana
9c85928740
add formatting message for zones in ordinals (#9596)
Unlike the message
> Formatting 2 zone, 1 set(s), 6 drives per set.

It is more readable as ordinal
> Formatting 2nd zone, 1 set(s), 6 drives per set.
2020-05-13 20:25:29 -07:00
Harshavardhana
6ac48a65cb
fix: use unused cacheMetrics code in prometheus (#9588)
remove all other unusued/deadcode
2020-05-13 08:15:26 -07:00
Krishna Srinivas
94f1a1dea3
add option for O_SYNC writes for standalone FS backend (#9581) 2020-05-12 19:24:59 -07:00
Anis Elleuch
c045ae15e7
fix: avoid undoing bucket creation and return the first err instead (#9578) 2020-05-12 15:20:42 -07:00
Harshavardhana
1756b7c6ff
fix: LDAP derivative accounts parentUser validation is not needed (#9573)
* fix: LDAP derivative accounts parentUser validation is not needed

fixes #9435

* Update cmd/iam.go

Co-authored-by: Lenin Alevski <alevsk.8772@gmail.com>

Co-authored-by: Lenin Alevski <alevsk.8772@gmail.com>
2020-05-12 09:21:08 -07:00
Klaus Post
e25ace2151
Forward RPC errors from crawler (#9569)
The `keepHTTPResponseAlive` would cause errors to be 
returned with status OK.

- Add '32' as a filler byte until a response is ready
- '0' to indicate the response is ready to be consumed
- '1' to indicate response has an error which needs
to be returned to the caller

Clear out 'file not found' errors from dir walker, since it may be 
in a folder that has been deleted since it was scanned.
2020-05-11 20:41:38 -07:00
poornas
a8e5a86fa0
Remove brittle tests for cache (#9570) 2020-05-11 15:41:10 -07:00
Harshavardhana
f8edc233ab
support multiple policies for temporary users (#9550) 2020-05-11 13:04:11 -07:00
Harshavardhana
337c2a7cb4
add audit logging for all admin calls (#9568)
- add ServiceRestart/ServiceStop actions
- audit log appropriately in all admin handlers

fixes #9522
2020-05-11 10:34:08 -07:00
Harshavardhana
b5ed42c845
ignore policy/group missing errors appropriately (#9559) 2020-05-09 13:59:12 -07:00
Klaus Post
d9e7cadacf
Update reed+solomon (#9562)
Only create encoder when strictly needed.
2020-05-09 09:54:20 -07:00
Anis Elleuch
6d76efb9bb
Add support of TCP fast open in internode calls (#9486) 2020-05-08 14:33:23 -07:00
Harshavardhana
a1de9cec58
cleanup object-lock/bucket tagging for gateways (#9548)
This PR is to ensure that we call the relevant object
layer APIs for necessary S3 API level functionalities
allowing gateway implementations to return proper
errors as NotImplemented{}

This allows for all our tests in mint to behave
appropriately and can be handled appropriately as
well.
2020-05-08 13:44:44 -07:00
Anis Elleuch
6885c72f32
disable check for DirectIO in standalone FS mode (#9558) 2020-05-08 12:07:51 -07:00
poornas
0f1389e992
Fix azure gateway handling of ETag for CopyObject (#9544)
fixes #9428
2020-05-08 11:30:35 -07:00
Harshavardhana
9dda1fd624
Remove B2 gateway implementation (#9547)
S3 is now natively supported by B2 cloud storage provider
there is no reason to use specialized gateway for B2 anymore,
our current S3 gateway with caching would work with B2.

Resolves #8584
2020-05-07 19:00:30 -07:00
Harshavardhana
2dc46cb153
Report correct error when O_DIRECT is not supported (#9545)
fixes #9537
2020-05-07 16:12:16 -07:00
remche
0674c0075e
add LDAP StartTLS support (#9472) 2020-05-07 15:08:33 -07:00
Harshavardhana
0dd626ec67
fix: requests without bucket should route to the original router (#9541)
requests in federated setups for STS type calls which are
performed at '/' resource should be routed by the muxer,
the assumption is simply such that requests without a bucket
in a federated setup cannot be proxied, so serve them at
current server.
2020-05-07 11:49:04 -07:00
P R
7e3ea77fdf
Checking for access denied in web browser request. (#9523)
Fixes #9485
2020-05-06 21:31:44 -07:00
Harshavardhana
7290d23b26
Apply partNumber checks only on multipart objects (#9528) 2020-05-06 16:58:09 -07:00
Harshavardhana
4c9de098b0
heal buckets during init and make sure to wait on quorum (#9526)
heal buckets properly during expansion, and make sure
to wait for the quorum properly such that healing can
be retried.
2020-05-06 14:25:05 -07:00
Harshavardhana
a2ccba69e5
add kes retries upto two times with jitter backoff (#9527)
KES calls are not retried and under certain situations
when KES is under high load, the request should be
retried automatically.
2020-05-06 11:44:06 -07:00
Harshavardhana
8eb99d3a87
fix: complete multipart upload respond with ETag quoted (#9525)
Fixes #9517
2020-05-05 17:47:54 -07:00
Bala FA
3773874cd3
add bucket tagging support (#9389)
This patch also simplifies object tagging support
2020-05-05 14:18:13 -07:00
Harshavardhana
6c62b1a2ea fix broken retry tests 2020-05-04 22:01:39 -07:00
Harshavardhana
b768645fde
fix: unexpected logging with bucket metadata conversions (#9519) 2020-05-04 20:04:06 -07:00
Harshavardhana
7b58dcb28c
fix: return context error from context reader (#9507) 2020-05-04 14:33:49 -07:00
Harshavardhana
fea4a1e68e
fix logical error in path length handling for windows (#9520)
fixes #9515
2020-05-04 13:11:56 -07:00
Andreas Auernhammer
a9e83dd42c
crypto: remove dead code (#9516)
This commit removes some crypto-related code
that is not used anywhere anymore.
2020-05-04 11:41:18 -07:00
Andreas Auernhammer
145f501a21
use HTTP/2 when connecting to KES (#9514)
This commit makes the KES client use HTTP/2
when establishing a connection to the KES server.

This is necessary since the next KES server release
will require HTTP/2.
2020-05-04 10:17:13 -07:00
Harshavardhana
9b3b04ecec
allow retries for bucket encryption/policy quorum reloads (#9513)
We should allow quorum errors to be send upwards
such that caller can retry while reading bucket
encryption/policy configs when server is starting
up, this allows distributed setups to load the
configuration properly.

Current code didn't facilitate this and would have
never loaded the actual configs during rolling,
server restarts.
2020-05-04 09:42:58 -07:00
Anis Elleuch
3e063cca5c
Show the cause error in startup when directio is not supported (#9497)
This commit tries to create a file using direct i/o in the startup
so the server returns quickly and avoid cryptic other errors.
2020-05-04 08:48:03 -07:00
Harshavardhana
27d716c663
simplify usage of mutexes and atomic constants (#9501) 2020-05-03 22:35:40 -07:00
ebozduman
fbd15cb7b7
Fixes browser delete issue for anon and authorized users (#9440) 2020-05-03 14:01:28 -07:00
Egor Rudinsky
f7c91eff54
Share button for public objects (#9162) 2020-05-01 23:55:53 -07:00
Dmitry Gadeev
a6bdc086a2
fix: use source scheme retrieved from X-Forwarded headers (#9483) 2020-05-01 23:53:01 -07:00
Bala FA
83ccae6c8b
Store bucket created time as a metadata (#9465)
Fixes #9459
2020-05-01 09:53:14 -07:00
Harshavardhana
28f9c477a8
fix: assume parentUser correctly for serviceAccounts (#9504)
ListServiceAccounts/DeleteServiceAccount didn't work properly
with STS credentials yet due to incorrect Parent user.
2020-05-01 08:05:14 -07:00
Harshavardhana
09571d03a5
avoid unnecessary logging in IAM (#9502) 2020-05-01 18:11:17 +05:30
Harshavardhana
71ce63f79c
fix: background heal to call HealFormat only if needed (#9491)
In large setups this avoids unnecessary data transfer
across nodes and potential locks.

This PR also optimizes heal result channel, which should
be avoided for each queueHealTask as its expensive
to create/close channels for large number of objects.
2020-04-30 20:23:00 -07:00
Harshavardhana
5205c9591f
print proper certinfo on console when starting up (#9479)
also potentially fix a race in certs.go implementation
while accessing tls.Certificate concurrently.
2020-04-30 16:15:29 -07:00
poornas
9a547dcbfb
Add API's for managing bucket quota (#9379)
This PR allows setting a "hard" or "fifo" quota
restriction at the bucket level. Buckets that
have reached the FIFO quota configured, will
automatically be cleaned up in FIFO manner until
bucket usage drops to configured quota.
If a bucket is configured with a "hard" quota
ceiling, all further writes are disallowed.
2020-04-30 15:55:54 -07:00
Anis Elleuch
27632ca6ec
audit: Merge ResponseWriter with RecordAPIStats (#9496)
ResponseWriter & RecordAPIStats has similar role, merge them.

This commit will also fix wrong auditing for STS and Web and others
since they are using ResponseWriter instead of the RecordAPIStats.
2020-04-30 11:27:19 -07:00
Anis Elleuch
d090a17ed0
fix: Audit tests on the correct response writer type (#9445) 2020-04-29 22:17:36 -07:00
Harshavardhana
c2529260e7
fix: crash observed when position of drives different (#9490)
allocate the disk slice properly before populating
disk by its ID and its position.

Fixes #9416
2020-04-29 13:42:37 -07:00
P R
5dd9cf4398
fix: CopyObject with REPLACE directive deletes existing tags (#9478)
Fixes #9477
2020-04-29 10:26:37 +05:30
Harshavardhana
ab77b216d1
fix: remove restrictions on windows for NAME_MAX (#9469)
Fixes #9393
2020-04-28 17:32:46 -07:00
Anis Elleuch
c3c3e9087b
config: More fixes in parsing Audit & Logger env variables (#9474)
- Add support of missed legacy Logger webhook
- Disable enabling Audit or logger if _ENABLE
  if not explicitly set to "on".
2020-04-28 15:20:40 -07:00
Anis Elleuch
7ad6bc955f
show a notice when mixed rootfs & mounted disks is detected (#9471)
A user can incorrectly mounts a newly fresh disk. MinIO will detect
that it is writing with a rootfs disk and will mark it down. However,
it is hard for the user to understand what's going on.

This commit will just print a notice so it will be easy to spot
such use case.
2020-04-28 14:55:01 -07:00
Harshavardhana
7a5271ad96
fix: re-use connections in webhook/elasticsearch (#9461)
- elasticsearch client should rely on the SDK helpers
  instead of pure HTTP calls.
- webhook shouldn't need to check for IsActive() for
  all notifications, failure should be delayed.
- Remove DialHTTP as its never used properly

Fixes #9460
2020-04-28 13:57:56 -07:00
Harshavardhana
1b122526aa
fix: add service account support for AssumeRole/LDAPIdentity creds (#9451)
allow generating service accounts for temporary credentials
which have a designated parent, currently OpenID is not yet
supported.

added checks to ensure that service account cannot generate
further service accounts for itself, service accounts can
never be a parent to any credential.
2020-04-28 12:49:56 -07:00
Anis Elleuch
a3b266761e
Fix audit loading from the env and consider enable env variable (#9467)
Audit was not working properly when enabled from the environment
caused by a typo in the code.

This commit fixes that but also consider the following variables:
  `MINIO_LOGGER_WEBHOOK_ENABLE_*` and 
`MINIO_AUDIT_WEBHOOK_ENABLE_*` so the user can use 
this latter to temporarily disable a logger or audit configuration.
2020-04-28 16:10:51 +05:30
Harshavardhana
498389123e
avoid unnecessary logging on fresh/newly replaced drives (#9470)
data usage tracker and crawler seem to be logging
non-actionable information on console, which is not
useful and is fixed on its own in almost all deployments,
lets keep this logging to minimal.
2020-04-28 01:16:57 -07:00
Harshavardhana
bc61417284
calculate automatic node based symmetry (#9446)
it is possible in many screnarios that even
if the divisible value is optimal, we may
end up with uneven distribution due to number
of nodes present in the configuration.

added code allow for affinity towards various
ellipses to figure out optimal value across
ellipses such that we can always reach a
symmetric value automatically.

Fixes #9416
2020-04-27 14:39:57 -07:00
Harshavardhana
97d952e61c
fix: ensure buckets are preserved if one set returns error (#9468)
the bucket should be deleted if it can be successfully
deleted on all sets, if not we should ensure to
restore those buckets properly.
2020-04-27 14:18:02 -07:00
Klaus Post
073aac3d92
add data update tracking using bloom filter (#9208)
By monitoring PUT/DELETE and heal operations it is possible
to track changed paths and keep a bloom filter for this data. 

This can help prioritize paths to scan. The bloom filter can identify
paths that have not changed, and the few collisions will only result
in a marginal extra workload. This can be implemented on either a
bucket+(1 prefix level) with reasonable performance.

The bloom filter is set to have a false positive rate at 1% at 1M 
entries. A bloom table of this size is about ~2500 bytes when serialized.

To not force a full scan of all paths that have changed cycle bloom
filters would need to be kept, so we guarantee that dirty paths have
been scanned within cycle runs. Until cycle bloom filters have been
collected all paths are considered dirty.
2020-04-27 10:06:21 -07:00
Harshavardhana
eff4127efd Revert "Write files in O_SYNC for fs backend to protect against machine crashes (#9434)"
This reverts commit 4843affd0e.
2020-04-27 09:22:05 -07:00
Harshavardhana
b1c0c32ba6
fix: ignore symlinks in backend filesystems (#9457)
fixes #9419
2020-04-27 06:30:12 -07:00
Harshavardhana
f14bf25cb9
optimize Listen bucket notification implementation (#9444)
this commit avoids lots of tiny allocations, repeated
channel creates which are performed when filtering
the incoming events, unescaping a key just for matching.

also remove deprecated code which is not needed
anymore, avoids unexpected data structure transformations
from the map to slice.
2020-04-27 06:25:05 -07:00
Harshavardhana
f216670814
use context specific to the etcd call (#9458) 2020-04-26 21:42:41 -07:00
Harshavardhana
6ecc98fddb
fix: crash in metrics handler when some disks are offline (#9450)
Fixes #9449
2020-04-25 19:48:07 -07:00
Krishna Srinivas
4843affd0e
Write files in O_SYNC for fs backend to protect against machine crashes (#9434) 2020-04-25 01:18:54 -07:00
Harshavardhana
558785a4bb
fix: config Set/Get decrypt/encrypt using authenticated credentials (#9447)
we have policy available for sub-admin users to set/get/delete
config, but we incorrectly decrypt the content using admin secret
key which in-fact should be the credential authenticating the
request.
2020-04-24 22:36:48 -07:00
Harshavardhana
60d415bb8a
deprecate/remove global WORM mode (#9436)
global WORM mode is a complex piece for which
the time has passed, with the advent of S3 compatible
object locking and retention implementation global
WORM is sort of deprecated, this has been mentioned
in our documentation for some time, now the time
has come for this to go.
2020-04-24 16:37:05 -07:00
BigUstad
45e22cf8aa
fix: selectObject to return error when object does not exist (#9423) 2020-04-24 13:51:48 -07:00