Commit Graph

3504 Commits

Author SHA1 Message Date
Petr Tichý 14aef52004
remove Content-MD5 on Range requests (#11611)
This removes the Content-MD5 response header on Range requests in Azure
Gateway mode. The partial content MD5 doesn't match the full object MD5
in metadata.
2021-02-23 19:32:56 -08:00
Andreas Auernhammer d4b822d697
pkg/etag: add new package for S3 ETag handling (#11577)
This commit adds a new package `etag` for dealing
with S3 ETags.

Even though ETag is often viewed as MD5 checksum of
an object, handling S3 ETags correctly is a surprisingly
complex task. While it is true that the ETag corresponds
to the MD5 for the most basic S3 API operations, there are
many exceptions in case of multipart uploads or encryption.

In worse, some S3 clients expect very specific behavior when
it comes to ETags. For example, some clients expect that the
ETag is a double-quoted string and fail otherwise.
Non-AWS compliant ETag handling has been a source of many bugs
in the past.

Therefore, this commit adds a dedicated `etag` package that provides
functionality for parsing, generating and converting S3 ETags.
Further, this commit removes the ETag computation from the `hash`
package. Instead, the `hash` package (i.e. `hash.Reader`) should
focus only on computing and verifying the content-sha256.

One core feature of this commit is to provide a mechanism to
communicate a computed ETag from a low-level `io.Reader` to
a high-level `io.Reader`.

This problem occurs when an S3 server receives a request and
has to compute the ETag of the content. However, the server
may also wrap the initial body with several other `io.Reader`,
e.g. when encrypting or compressing the content:
```
   reader := Encrypt(Compress(ETag(content)))
```
In such a case, the ETag should be accessible by the high-level
`io.Reader`.

The `etag` provides a mechanism to wrap `io.Reader` implementations
such that the `ETag` can be accessed by a type-check.
This technique is applied to the PUT, COPY and Upload handlers.
2021-02-23 12:31:53 -08:00
Harshavardhana aa7244a9a4
fix: make sure to convert the error properly in HealBucket() (#11610)
server startup code expects the object layer to properly
convert error into a proper type, so that in situations when
servers are coming up and quorum is not available servers
wait on each other.
2021-02-23 09:23:11 -08:00
Harshavardhana 2a79ea0332
isServerResolvable its sufficient to check server is reachable (#11609)
using isServerResolvable for expiration can lead to chicken
and egg problems, a lock might expire knowingly when server
is booting up causing perpetual locks getting expired.
2021-02-22 16:29:53 -08:00
Aditya Manthramurthy 02e7de6367
LDAP config: fix substitution variables (#11586)
- In username search filter and username format variables we support %s for
replacing with the username.

- In group search filter we support %s for username and %d for the full DN of
the username.
2021-02-22 13:20:36 -08:00
Harshavardhana da676ac298
remove network calls for getLocalDisks (#11603) 2021-02-22 13:19:44 -08:00
Harshavardhana 18ec933085
fix: for containers use root-disk detection cleverly (#11593)
root-disk implemented currently had issues where root
disk partitions getting modified might race and provide
incorrect results, to avoid this lets rely again back on
DeviceID and match it instead.

In-case of containers `/data` is one such extra entity that
needs to be verified for root disk, due to how 'overlay'
filesystem works and the 'overlay' presents a completely
different 'device' id - using `/data` as another entity
for fallback helps because our containers describe 'VOLUME'
parameter that allows containers to automatically have a
virtual `/data` that points to the container root path this
can either be at `/` or `/var/lib/` (on different partition)
2021-02-22 10:32:21 -08:00
Harshavardhana c31d2c3fdc
fix: CrawlAndGetDataUsage close pipe() before using a new one (#11600)
also additionally make sure errors during deserializer closes
the reader with right error type such that Write() end
actually see the final error, this avoids a waitGroup usage
and waiting.
2021-02-22 10:04:32 -08:00
Harshavardhana 8778828a03
fix: read metadata in O_DIRECT if configured and supported (#11594)
reduce the page-cache pressure completely by moving
the entire read-phase of our operations to O_DIRECT,
primarily this is going to be very useful for chatty
metadata operations such as listing, scanner, ilm, healing
like operations to avoid filling up the page-cache upon
repeated runs.
2021-02-22 01:36:17 -08:00
Sarasa Kisaragi 48b212dd8e
Fix HDFS wrong filepath if subpath provided (#11574) 2021-02-20 15:32:18 -08:00
Harshavardhana be7de911c4
fix: update minio-go to fix an issue with S3 gateway (#11591)
since we have changed our default envs to MINIO_ROOT_USER,
MINIO_ROOT_PASSWORD this was not supported by minio-go
credentials package, update minio-go to v7.0.10 for this
support. This also addresses few bugs related to users
had to specify AWS_ACCESS_KEY_ID as well to authenticate
with their S3 backend if they only used MINIO_ROOT_USER.
2021-02-20 11:10:21 -08:00
Harshavardhana 8cad407e0b
fix: Bring support for symlink on regular files on NAS (#11383)
fixes #11203
2021-02-20 00:30:12 -08:00
Poorna Krishnamoorthy 85d2187c20
fix: ETag mismatch for large upload in replica (#11587) 2021-02-20 00:22:17 -08:00
Anis Elleuch 98d3f94996
metrics: Add the number of requests in the waiting queue (#11580)
We can use this metric to check if there are too many S3 clients in the
queue and could explain why some of those S3 clients are timing out.

```
minio_s3_requests_waiting_total{server="127.0.0.1:9000"} 9981
```

If max_requests is 10000 then there is a strong possibility that clients
are timing out because of the queue deadline.
2021-02-20 00:21:55 -08:00
mailsmail 173284903b
fix incorrect http range in SelectObjectContentHandler (#11585) 2021-02-19 17:55:28 -08:00
Poorna Krishnamoorthy 2dce5d9442
fix: delete marker permanent delete replication (#11581) 2021-02-18 16:35:37 -08:00
Anis Elleuch f28b063091
heal: Use healDeleteDangling global const in self healing (#11579)
A small fix, use healDeleteDangling constant instead of 'true' in the
self-healing code.
2021-02-18 15:16:20 -08:00
Klaus Post c5b2a8441b
fix: faster healing when disk is replaced. (#11520) 2021-02-18 11:06:54 -08:00
Klaus Post 8a6b13c239
Avoid synchronizing usage writes (#11560)
If the periodic `case <-t.C:` save gets held up for a long time it will end up 
synchronize all disk writes for saving the caches.

We add jitter to per set writes so they don't sync up and don't hold a 
lock for the write, since it isn't needed anyway.

If an outage prevents writes for a long while we also add individual 
waits for each disk in case there was a queue.

Furthermore limit the number of buffers kept to 2GiB, since this could get 
huge in large clusters. This will not act as a hard limit but should be enough 
for normal operation.
2021-02-18 00:38:37 -08:00
Poorna Krishnamoorthy 8e8a792d9d
Allow delete marker replication from replica (#11566)
in the case of active-active replication.

This PR also has the following changes:

- add docs on replication design
- fix corner case of completing versioned delete on a delete marker
  when the target is down and `mc rm --vid` is performed repeatedly. Instead
  the version should still be retained in the `PENDING|FAILED` state until
  replication sync completes.
- remove `s3:Replication:OperationCompletedReplication` and
   `s3:Replication:OperationFailedReplication` from ObjectCreated 
  events type
2021-02-18 00:33:51 -08:00
Harshavardhana 95e0acbb26
fix: allow accountInfo with creds with parentUsers (#11568) 2021-02-17 20:57:17 -08:00
Poorna Krishnamoorthy 55037e6e54
lifecycle:Fix args passed to determine expiry header (#11567) 2021-02-17 19:25:19 -08:00
Harshavardhana 289e1d8b2a
fix: reduce crawler memory usage by orders of magnitude (#11556)
currently crawler waits for an entire readdir call to
return until it processes usage, lifecycle, replication
and healing - instead we should pass the applicator all
the way down to avoid building any special stack for all
the contents in a single directory.

This allows for

- no need to remember the entire list of entries per directory
  before applying the required functions
- no need to wait for entire readdir() call to finish before
  applying the required functions
2021-02-17 15:34:42 -08:00
Harshavardhana ffea6fcf09
fix: rename crawler as scanner in config (#11549) 2021-02-17 12:04:11 -08:00
Klaus Post 11b2220696
Don't autoheal if disks are healing (#11558)
Don't spawn automatic healing ops if a disk is healing.
2021-02-17 10:18:12 -08:00
Harshavardhana aa8450a2a1
fix: parallelize getPoolIdx() for object lookup (#11547) 2021-02-16 19:36:15 -08:00
Harshavardhana 7d4a2d2b68
fix: multiple pool reads parallelize when possible (#11537) 2021-02-16 02:43:47 -08:00
Anis Elleuch c4e12dc846
fix: in MultiDelete API return MalformedXML upon empty input (#11532)
To follow S3 spec
2021-02-13 09:48:25 -08:00
Harshavardhana a94a9c37fa
fix: support IAM policy handling for wildcard actions (#11530)
This PR fixes

- allow 's3:versionid` as a valid conditional for
  Get,Put,Tags,Object locking APIs
- allow additional headers missing for object APIs
- allow wildcard based action matching
2021-02-12 23:05:09 -08:00
Harshavardhana 79b6a43467
fix: avoid timed value for network calls (#11531)
additionally simply timedValue to have RWMutex
to avoid concurrent calls to DiskInfo() getting
serialized, this has an effect on all calls that
use GetDiskInfo() on the same disks.

Such as getOnlineDisks, getOnlineDisksWithoutHealing
2021-02-12 18:17:52 -08:00
Shireesh Anjal 928de04f7a
fix: osinfos incomplete in case of warnings (#11505)
The function used for getting host information
(host.SensorsTemperaturesWithContext) returns warnings in some cases.

Returning with error in such cases means we miss out on the other useful
information already fetched (os info).

If the OS info has been succesfully fetched, it should always be
included in the output irrespective of whether the other data (CPU
sensors, users) could be fetched or not.
2021-02-12 17:57:57 -08:00
Poorna Krishnamoorthy 93fd248b52
fix: save ModTime properly in disk cache (#11522)
fix #11414
2021-02-11 19:25:47 -08:00
Harshavardhana 2a7b123895
turn off http2 for TLS setups for now (#11523)
due to lots of issues with x/net/http2, as
well as the bundled h2_bundle.go in the go
runtime should be avoided for now.

https://github.com/golang/go/issues/23559
https://github.com/golang/go/issues/42534
https://github.com/golang/go/issues/43989
https://github.com/golang/go/issues/33425
https://github.com/golang/go/issues/29246

With collection of such issues present, it
make sense to remove HTTP2 support for now
2021-02-11 15:53:04 -08:00
Harshavardhana b3c56b53fb
fix: metacache should only rename entries during cleanup (#11503)
To avoid large delays in metacache cleanup, use rename
instead of recursive delete calls, renames are cheaper
move the content to minioMetaTmpBucket and then cleanup
this folder once in 24hrs instead.

If the new cache can replace an existing one, we should
let it replace since that is currently being saved anyways,
this avoids pile up of 1000's of metacache entires for
same listing calls that are not necessary to be stored
on disk.
2021-02-11 10:22:03 -08:00
Poorna Krishnamoorthy f24d8127ab
fix: DeleteMultipleObjectsHandler to process deleted objects correctly (#11515)
DeleteMarkerVersionID which is returned by the lower layer should 
not be used in the key to lookup ObjectToDelete map
2021-02-10 23:41:41 -08:00
Harshavardhana 7875d472bc
avoid notification for non-existent delete objects (#11514)
Skip notifications on objects that might have had
an error during deletion, this also avoids unnecessary
replication attempt on such objects.

Refactor some places to make sure that we have notified
the client before we

- notify
- schedule for replication
- lifecycle etc.
2021-02-10 22:00:42 -08:00
Harshavardhana 711adb9652 remove ipv6 fallbackdelay leave it as default 2021-02-10 17:35:09 -08:00
Poorna Krishnamoorthy e6b4ea7618
More fixes for delete marker replication (#11504)
continuation of PR#11491 for multiple server pools and
bi-directional replication.

Moving proxying for GET/HEAD to handler level rather than
server pool layer as this was also causing incorrect proxying 
of HEAD.

Also fixing metadata update on CopyObject - minio-go was not passing
source version ID in X-Amz-Copy-Source header
2021-02-10 17:25:04 -08:00
Aditya Manthramurthy 466e95bb59
Return group DN instead of group name in LDAP STS (#11501)
- Additionally, check if the user or their groups has a policy attached during
the STS call.

- Remove the group name attribute configuration value.
2021-02-10 16:52:49 -08:00
Harshavardhana 881f98e511
fix: use getPoolIdx in DeleteObjects() (#11513)
filter out relevant objects for each pool to
avoid calling, further delete operations on
subsequent pools where some of these objects
might not exist.

This is mainly useful to avoid situations
during bi-directional bucket replication.
2021-02-10 14:25:43 -08:00
Harshavardhana cbf4bb62e0
fix: getPoolIdx decouple from top level options (#11512)
top-level options shouldn't be passed down for
GetObjectInfo() while verifying the objects in
different pools, this is to make sure that
we always get the value from the pool where
the object exists.
2021-02-10 11:45:02 -08:00
Anis Elleuch 682482459d
Change the default object content-type to binary/octet-stream (#11508) 2021-02-10 08:56:37 -08:00
Krishnan Parthasarathi b87fae0049
Simplify PutObjReader for plain-text reader usage (#11470)
This change moves away from a unified constructor for plaintext and encrypted
usage. NewPutObjReader is simplified for the plain-text reader use. For
encrypted reader use, WithEncryption should be called on an initialized PutObjReader.

Plaintext:
func NewPutObjReader(rawReader *hash.Reader) *PutObjReader

The hash.Reader is used to provide payload size and md5sum to the downstream
consumers. This is different from the previous version in that there is no need
to pass nil values for unused parameters.

Encrypted:
func WithEncryption(encReader *hash.Reader,
key *crypto.ObjectKey) (*PutObjReader, error)

This method sets up encrypted reader along with the key to seal the md5sum
produced by the plain-text reader (already setup when NewPutObjReader was
called).

Usage:
```
  pReader := NewPutObjReader(rawReader)
  // ... other object handler code goes here

  // Prepare the encrypted hashed reader
  pReader, err = pReader.WithEncryption(encReader, objEncKey)

```
2021-02-10 08:52:50 -08:00
Shireesh Anjal 5a18d437ce
fix: drive hw info incomplete when smartinfo fails (#11509)
Collection of SMART information doesn't work in certain scenarios e.g.
in a container based setup. In such cases, instead of returning an error
(without any data), we should only set the error on the smartinfo
struct, so that other important drive hw info like device, mountpoint,
etc is retained in the output.
2021-02-10 08:48:14 -08:00
Poorna Krishnamoorthy 93eb549a83
fix: duplicate delete marker attempts in bi-directional replication (#11491) 2021-02-09 15:11:43 -08:00
Harshavardhana fe3c39b583
use the new errgroup API whereever applicable (#11466)
start using the new errgroup concurrency control
API introduced in #11457
2021-02-09 12:08:25 -08:00
Harshavardhana 84d400487f
fix: accountInfo API to cater for federated setups (#11484)
when MinIO is deployed in a federated setup, use etcd 
based listing of buckets to provide appropriate filtering 
of buckets per user.
2021-02-09 09:53:07 -08:00
Shireesh Anjal 3afa499885
fix: empty buckets/objects nodes in new setup (#11493) 2021-02-09 09:52:38 -08:00
Krishna Srinivas 876b79b8d8
read-health check endpoint returns success if cluster can serve read requests (#11310) 2021-02-09 01:00:44 -08:00
Ritesh H Shukla 3d74efa6b1
fux: copy object for encrypted objects (#11490) 2021-02-08 19:58:17 -08:00
Harshavardhana 68d299e719
fix: case-insensitive lookups for metadata (#11489)
continuation of #11487, with more changes
2021-02-08 18:12:28 -08:00
Poorna Krishnamoorthy f9c5636c2d
fix: lookup metdata case insensitively (#11487)
while setting replication options
2021-02-08 16:19:05 -08:00
Klaus Post 9b10118d34
Metacache add abs entry limit (#11483)
Add an absolute limit to the number of metacaches for a bucket.

Delete excess caches if they haven't been handed out in an hour.
2021-02-08 11:36:16 -08:00
Harshavardhana 0e3211f4ad
fix: server upgrades should have more descriptive error messages (#11476)
during rolling upgrade, provide a more descriptive error
message and discourage rolling upgrade in such situations,
allowing users to take action.

additionally also rename `slashpath -> pathutil` to avoid
a slighly mis-pronounced usage of `path` package.
2021-02-08 10:15:12 -08:00
Harshavardhana 2e4d9124ad
honor region specified for remote targets (#11480)
fixes #11472
2021-02-08 08:54:27 -08:00
Harshavardhana 6fef4c21b9
fix: align atomic variables for 32bit arch (#11475)
fixes #11474
2021-02-08 08:51:12 -08:00
Poorna Krishnamoorthy 8e1bbd989a
replication:alloc UserDefined map before use (#11478) 2021-02-07 22:01:10 -08:00
Sarasa Kisaragi 152d7cd95b
HDFS support keytab (#11473) 2021-02-07 17:29:47 -08:00
Harshavardhana 0d057c777a remove restriction for multi pool distribution algo 2021-02-06 16:19:05 -08:00
Anis Elleuch 275f7a63e8
lc: Apply DeleteAction correctly to objects (#11471)
When lifecycle decides to Delete an object and not a version in a
versioned bucket, the code should create a delete marker and not
removing the scanned version.

This commit fixes the issue.
2021-02-06 16:10:33 -08:00
Shireesh Anjal 97fe57bba9
Remove Connections from SysProcess struct (#11373)
The connections info of the processes takes up a huge amount of space,
and is not important for adding any useful health checks. Removing it
will significantly reduce the size of the subnet health report.
2021-02-05 21:32:28 -08:00
Harshavardhana 88c1bb0720
fix: improper ticker usage in goroutines (#11468)
- lock maintenance loop was incorrectly sleeping
  as well as using ticker badly, leading to
  extra expiration routines getting triggered
  that could flood the network.

- multipart upload cleanup should be based on
  timer instead of ticker, to ensure that long
  running jobs don't get triggered twice.

- make sure to get right lockers for object name
2021-02-05 19:23:48 -08:00
Harshavardhana 1fdafaf72f
fix: listing for directory object when delimiter is present (#11463)
When you have heirarchy of prefixes with directory objects
our current master would list directory objects as prefixes
when delimiter is present, this is inconsistent with AWS S3

```
aws s3api list-objects --endpoint-url http://localhost:9000 \
    --profile minio --bucket testbucket-v --prefix new/ --delimiter /
{
    "CommonPrefixes": [
        {
            "Prefix": "new/"
        },
        {
            "Prefix": "new/new/"
        }
    ]
}
```

Instead this PR fixes this to behave like AWS S3

```
aws s3api list-objects --endpoint-url http://localhost:9000 \
      --profile minio --bucket testbucket-v --prefix new/ --delimiter /
{
    "Contents": [
        {
            "Key": "new/",
            "LastModified": "2021-02-05T06:27:42.660Z",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "",
                "ID": "02d6176db174dc93cb1b899f7c6078f08654445fe8cf1b6ce98d8855f66bdbf4"
            }
        }
    ],
    "CommonPrefixes": [
        {
            "Prefix": "new/new/"
        }
    ]
}
```
2021-02-05 16:24:40 -08:00
Ritesh H Shukla 5fe4bb6b36
Reduce redundant crawler logging (#11448) 2021-02-05 15:51:11 -08:00
Harshavardhana 99b733d44c
fix: deletion of delete marker regression (#11465)
fixes #11440
fixes #11451
fixes #11454
2021-02-05 15:06:23 -08:00
Klaus Post b4ac05523b
Add parallel bucket healing during startup (#11457)
Replaces #11449

Does concurrent healing but limits concurrency to 50 buckets.

Aborts on first error.

`errgroup.Group` is extended to facilitate this in a generic way.
2021-02-05 13:04:26 -08:00
Anis Elleuch c7eacba41c
health-info: Add tags to errors (#11412)
We use multiple libraries in health info, but the returned error does
not indicate exactly what library call is failing, hence adding named
tags to returned errors whenever applicable.
2021-02-05 12:37:15 -08:00
Anis Elleuch 1887c25279
xl: Fix feeding NumVersions & SuccessorModTime to lifecycle (#11462)
After recent refactor where lifecycle started to rely on ObjectInfo to
make decisions, it turned out there are some issues calculating
Successor Modtime and NumVersions, hence the lifecycle is not working as
expected in a versioning bucket in some cases.

This commit fixes the behavior.
2021-02-05 11:59:08 -08:00
Harshavardhana c9b0f595b9
support directory objects in listing in certain scenarios (#11452)
When a directory object is presented as a `prefix`
param our implementation tend to only list objects
present common to the `prefix` than the `prefix` itself,
to mimic AWS S3 like flat key behavior this PR ensures
that if `prefix` is directory object, it should be
automatically considered to be part of the eventual
listing result.

fixes #11370
2021-02-05 10:12:25 -08:00
Harshavardhana 8bb580abfc
fix: use getObjectNInfo to avoid bytes.Buffer usage (#11428)
few places were still using legacy call GetObject()
which was mainly designed for client response writer,
use GetObjectNInfo() for internal calls instead.
2021-02-05 09:57:30 -08:00
Harshavardhana da55a05587
fix aggressive expiration detection (#11446)
for some flaky networks this may be too fast of a value
choose a defensive value, and let this be addressed
properly in a new refactor of dsync with renewal logic.

Also enable faster fallback delay to cater for misconfigured
IPv6 servers

refer
 - https://golang.org/pkg/net/#Dialer
 - https://tools.ietf.org/html/rfc6555
2021-02-04 16:56:40 -08:00
Harshavardhana 3fc4d6f620
update dependenices for relevant projects (#11445)
- minio-go -> v7.0.8
- ldap/v3 -> v3.2.4
- reedsolomon -> v1.9.11
- sio-go -> v0.3.1
- msgp -> v1.1.5
- simdjson-go, md5-simd, highwayhash
2021-02-04 13:49:52 -08:00
Ritesh H Shukla 67a8f37df0
fix: disk usage capacity metric reporting (#11435) 2021-02-04 12:26:58 -08:00
ArthurMa df0c678167
fix: ldap config parsing issue for UserDNSearchFilter (#11437) 2021-02-04 11:07:29 -08:00
Harshavardhana f108873c48
fix: replication metadata comparsion and other fixes (#11410)
- using miniogo.ObjectInfo.UserMetadata is not correct
- using UserTags from Map->String() can change order
- ContentType comparison needs to be removed.
- Compare both lowercase and uppercase key names.
- do not silently error out constructing PutObjectOptions
  if tag parsing fails
- avoid notification for empty object info, failed operations
  should rely on valid objInfo for notification in all
  situations
- optimize copyObject implementation, also introduce a new 
  replication event
- clone ObjectInfo() before scheduling for replication
- add additional headers for comparison
- remove strings.EqualFold comparison avoid unexpected bugs
- fix pool based proxying with multiple pools
- compare only specific metadata

Co-authored-by: Poorna Krishnamoorthy <poornas@users.noreply.github.com>
2021-02-03 20:41:33 -08:00
Andreas Auernhammer 871b450dbd
crypto: add support for decrypting SSE-KMS metadata (#11415)
This commit refactors the SSE implementation and add
S3-compatible SSE-KMS context handling.

SSE-KMS differs from SSE-S3 in two main aspects:
 1. The client can request a particular key and
    specify a KMS context as part of the request.
 2. The ETag of an SSE-KMS encrypted object is not
    the MD5 sum of the object content.

This commit only focuses on the 1st aspect.

A client can send an optional SSE context when using
SSE-KMS. This context is remembered by the S3 server
such that the client does not have to specify the
context again (during multipart PUT / GET / HEAD ...).
The crypto. context also includes the bucket/object
name to prevent renaming objects at the backend.

Now, AWS S3 behaves as following:
 - If the user does not provide a SSE-KMS context
   it does not store one - resp. does not include
   the SSE-KMS context header in the response (e.g. HEAD).
 - If the user specifies a SSE-KMS context without
   the bucket/object name then AWS stores the exact
   context the client provided but adds the bucket/object
   name internally. The response contains the KMS context
   without the bucket/object name.
 - If the user specifies a SSE-KMS context with
   the bucket/object name then AWS again stores the exact
   context provided by the client. The response contains
   the KMS context with the bucket/object name.

This commit implements this behavior w.r.t. SSE-KMS.
However, as of now, no such object can be created since
the server rejects SSE-KMS encryption requests.

This commit is one stepping stone for SSE-KMS support.

Co-authored-by: Harshavardhana <harsha@minio.io>
2021-02-03 15:19:08 -08:00
Harshavardhana f71e192343
avoid listing an empty dir without __XLDIR__ (#11427)
```
minio server /tmp/disk{1...4}
mc mb myminio/testbucket/
mkdir -p /tmp/disk{1..4}/testbucket/test-prefix/
```

This would end up being listed in the current
master, this PR fixes this situation.

If a directory is a leaf dir we should it
being listed, since it cannot be deleted anymore
with DeleteObject, DeleteObjects() API calls
because we natively support directories now.

Avoid listing it and let healing purge this folder
eventually in the background.
2021-02-03 14:06:54 -08:00
Anis Elleuch b3f81e75f6
xl: Make it clear when to create delete marker for a non existant object (#11423) 2021-02-03 10:33:43 -08:00
Klaus Post a71e0483c9
Fix nil disks in getOnlineDisksWithHealing (#11419)
If a disk is skipped when nil it is still returned.
2021-02-02 17:04:37 -08:00
Klaus Post 4a9d9c8585
Update colinmarc/hdfs (#11417)
Updates needed dependency as well.

Fixes #11416
2021-02-02 15:37:30 -08:00
Harshavardhana c885777ac6
Add support for TCP_QUICKACK (#11369)
TCP_QUICKACK is a setting that allows TCP endpoints
to acknowledge the receipt of data instantly in situations
where they would normally wait to see if more data
would be arriving.

https://assets.extrahop.com/whitepapers/TCP-Optimization-Guide-by-ExtraHop.pdf
2021-02-02 09:44:18 -08:00
Poorna Krishnamoorthy fe3aca70c3
Make number of replication workers configurable. (#11379)
MINIO_API_REPLICATION_WORKERS env.var and
`mc admin config set api` allow number of replication
workers to be configurable. Defaults to half the number
of cpus available.

Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2021-02-02 16:45:06 +05:30
Ritesh H Shukla c4848f9b4f
Add process start time to cluster metrics. (#11405) 2021-02-01 23:02:18 -08:00
Andreas Auernhammer 838d4dafbd
gateway: don't use encrypted ETags for If-Match (#11400)
This commit fixes a bug in the S3 gateway that causes
GET requests to fail when the object is encrypted by the
gateway itself.

The gateway was not able to GET the object since it always
specified a `If-Match` pre-condition checking that the object
ETag matches an expected ETag - even for encrypted ETags.

The problem is that an encrypted ETag will never match the ETag
computed by the backend causing the `If-Match` pre-condition
to fail.

This commit fixes this by not sending an `If-Match` header when
the ETag is encrypted. This is acceptable because:
  1. A gateway-encrypted object consists of two objects at the backend
     and there is no way to provide a concurrency-safe implementation
     of two consecutive S3 GETs in the deployment model of the S3
     gateway.
     Ref: S3 gateways are self-contained and isolated - and there may
          be multiple instances at the same time (no lock across
          instances).
  2. Even if the data object changes (concurrent PUT) while gateway
     A has download the metadata object (but not issued the GET to
     the data object => data race) then we don't return invalid data
     to the client since the decryption (of the currently uploaded data)
     will fail - given the metadata of the previous object.
2021-02-01 23:02:08 -08:00
Anis Elleuch e96fdcd5ec
tagging: Add event notif for PUT object tagging (#11366)
An optimization to avoid double calling for during PutObject tagging
2021-02-01 13:52:51 -08:00
Anis Elleuch 6ef678663e
xl: Create a delete-marker when no other version exists (#11362)
Currently, it is not possible to create a delete-marker when xl.meta
does not exist (no version is created for that object yet). This makes a
problem for replication and mc mirroring with versioning enabled.

This also follows S3 specification.
2021-02-01 13:23:50 -08:00
Harshavardhana f737a027cf fix: regression introduced in federated listing buckets
regression was introduced in
6cd255d516 fix it properly.
2021-02-01 12:06:58 -08:00
Anis Elleuch 65aa2bc614
ilm: Remove object in HEAD/GET if having an applicable ILM rule (#11296)
Remove an object on the fly if there is a lifecycle rule with delete
expiry action for the corresponding object.
2021-02-01 09:52:11 -08:00
Andreas Auernhammer 33554651e9
crypto: deprecate native Hashicorp Vault support (#11352)
This commit deprecates the native Hashicorp Vault
support and removes the legacy Vault documentation.

The native Hashicorp Vault documentation is marked as
outdated and deprecated for over a year now. We give
another 6 months before we start removing Hashicorp Vault
support and show a deprecation warning when a MinIO server
starts with a native Vault configuration.
2021-01-29 17:55:37 -08:00
Poorna Krishnamoorthy c82aef0a56
fix ObjectInfo returned by CopyObject (#11377)
erasure CopyObject was returning old metadata
2021-01-29 14:49:18 -08:00
Harshavardhana 1e53bf2789
fix: allow expansion with newer constraints for older setups (#11372)
currently we had a restriction where older setups would
need to follow previous style of "stripe" count being same
expansion, we can relax that instead newer pools can be
expanded for older setups with newer constraints of
common parity ratio.
2021-01-29 11:40:55 -08:00
Ritesh H Shukla c8489a8f0c
fix: log notification errors only once (#11350) 2021-01-28 13:40:31 -08:00
Klaus Post 2680772d4b
Don't mark remotes online when shutting down (#11368)
Shutting down will mark remotes online when the shutdown has 
started since the context is canceled.

For example:

```
API: SYSTEM()
Time: 16:21:31 CET 01/28/2021
DeploymentID: 313b0065-c5a1-4aa3-9233-07223e77a730
Error: Storage resources are insufficient for the write operation .minio.sys/tmp/ced455c4-3d27-4bdd-95fc-b4707a179b8a/fd934ef3-8fc8-4330-abc1-f039fbbb9700/part.1 (cmd.InsufficientWriteQuorum)
       1: d:\minio\minio\cmd\data-usage.go:56:cmd.storeDataUsageInBackend()
Exiting on signal: INTERRUPT
Client http://127.0.0.1:9002/minio/lock/v5 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d3/v24 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d2/v24 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d1/v24 online
Client http://127.0.0.1:9002/minio/peer/v12 online
Client http://127.0.0.1:9002/minio/storage/data/distxl/s2/d4/v24 online
```

Use a fresh context for health checks.
2021-01-28 13:38:12 -08:00
Harshavardhana 567f7bdd05 fix: verify overlapping domains when > 1 2021-01-28 13:08:53 -08:00
Harshavardhana 6cd255d516
fix: allow updated domain names in federation (#11365)
additionally also disallow overlapping domain names
2021-01-28 11:44:48 -08:00
Aditya Manthramurthy e79829b5b3
Bind to lookup user after user auth to lookup ldap groups (#11357) 2021-01-27 17:31:21 -08:00
Poorna Krishnamoorthy fd3f02637a
fix: replication regression due to proxying requests (#11356)
In PR #11165 due to incorrect proxying for 2 
way replication even when the object was not 
yet replicated

Additionally, fix metadata comparisons when
deciding to do full replication vs metadata copy.

fixes #11340
2021-01-27 11:22:34 -08:00
Harshavardhana e019f21bda
fix: trigger heal if one of the parts are not found (#11358)
Previously we added heal trigger when bit-rot checks
failed, now extend that to support heal when parts
are not found either. This healing gets only triggered
if we can successfully decode the object i.e read
quorum is still satisfied for the object.
2021-01-27 10:21:14 -08:00
Anis Elleuch e9ac7b0fb7
heal: Remove empty directories (#11354)
Since the introduction of __XLDIR__, an empty directory does not have a
meaning anymore in erasure mode. Make healing removes it wherever it
finds it.
2021-01-27 02:19:28 -08:00
Harshavardhana 1debd722b5 rename last remaining Zone->Pool 2021-01-26 20:47:42 -08:00
massintha azamoum e7f6051f19
Send bucket name to peers when bucket notification is enabled (#11351) 2021-01-26 13:48:28 -08:00
Harshavardhana 6717295e18 fix: rename audit log docs and datastructure 2021-01-26 13:39:55 -08:00
Anis Elleuch 00cff1aac5
audit: per object send pool number, set number and servers per operation (#11233) 2021-01-26 13:21:51 -08:00
Harshavardhana 9722531817 fix: purge LDAP deprecated keys 2021-01-26 09:53:29 -08:00
Harshavardhana 5c6bfae4c7
fix: load credentials from etcd directly when possible (#11339)
under large deployments loading credentials might be
time consuming, while this is okay and we will not
respond quickly for `mc admin user list` like queries
but it is possible to support `mc admin user info`

just like how we handle authentication by fetching
the user directly from persistent store.

additionally support service accounts properly,
reloaded from etcd during watch() - this was missing

This PR is also half way remedy for #11305
2021-01-25 20:01:49 -08:00
Aditya Manthramurthy 5f51ef0b40
Add LDAP Lookup-Bind mode (#11318)
This change allows the MinIO server to be configured with a special (read-only)
LDAP account to perform user DN lookups.

The following configuration parameters are added (along with corresponding
environment variables) to LDAP identity configuration (under `identity_ldap`):

- lookup_bind_dn / MINIO_IDENTITY_LDAP_LOOKUP_BIND_DN
- lookup_bind_password / MINIO_IDENTITY_LDAP_LOOKUP_BIND_PASSWORD
- user_dn_search_base_dn / MINIO_IDENTITY_LDAP_USER_DN_SEARCH_BASE_DN
- user_dn_search_filter / MINIO_IDENTITY_LDAP_USER_DN_SEARCH_FILTER

This lookup-bind account is a service account that is used to lookup the user's
DN from their username provided in the STS API. When configured, searching for
the user DN is enabled and configuration of the base DN and filter for search is
required. In this "lookup-bind" mode, the username format is not checked and must
not be specified. This feature is to support Active Directory setups where the
DN cannot be simply derived from the username.

When the lookup-bind is not configured, the old behavior is enabled: the minio
server performs LDAP lookups as the LDAP user making the STS API request and the
username format is checked and configuring it is required.
2021-01-25 14:26:10 -08:00
Harshavardhana 7e266293e6
fix: notify bucket replication after replication/ilm (#11343) 2021-01-25 14:04:41 -08:00
Harshavardhana eb6871ecd9
fix: LoginSTS should be an inline implementation (#11337)
STS tokens can be obtained by using local APIs
once the remote JWT token is presented, current
code was not validating the incoming token in the
first place and was incorrectly making a network
operation using that token.

For the most part this always works without issues,
but under adversarial scenarios it exposes client
to hand-craft a request that can reach internal
services without authentication.

This kind of proxying should be avoided before
validating the incoming token.
2021-01-25 10:15:03 -08:00
Harshavardhana 9cdd981ce7
fix: expire locks only on participating lockers (#11335)
additionally also add a new ForceUnlock API, to
allow forcibly unlocking locks if possible.
2021-01-25 10:01:27 -08:00
Anis Elleuch bd8020aba8
heal: Decode object name in healing result (#11348)
The user can see __XLDIR__ prefix in mc admin heal when the command
heals an empty object with a trailing slash. This commit decodes the
name of the object before sending it back to the upper level.
2021-01-25 09:53:37 -08:00
Harshavardhana 09bc49bd51
fix: healBucket across sets should capture results properly (#11341)
healing `.minio.sys/config` returns incorrect quorum errors
across sets, healing of the buckets.
2021-01-25 09:45:09 -08:00
Harshavardhana 82f0471d1b
honor maxWait heal config when maxIO hits (#11338) 2021-01-25 07:53:12 -08:00
Harshavardhana 6a95f412c9
avoid double CORS headers in federation (#11334)
CORS proxying adds double headers one
by the receiving server, one by proxied
server. Remove them before proxying
when 'Origin' header is found.
2021-01-23 18:27:23 -08:00
Ritesh H Shukla 7575c24037
Add open FD and FD limit to cluster metrics (#11328) 2021-01-22 18:30:16 -08:00
Harshavardhana 43f973c4cf
fix: check for O_DIRECT support for reads and writes (#11331)
In-case user enables O_DIRECT for reads and backend does
not support it we shall proceed to turn it off instead
and print a warning. This validation avoids any unexpected
downtimes that users may incur.
2021-01-22 15:38:21 -08:00
Harshavardhana 1b453728a3
initialize forwarder after init() to avoid crashes (#11330)
DNSCache dialer is a global value initialized in
init(), whereas `go` keeps `var =` before `init()`
, also we don't need to keep proxy routers as
global entities - register the forwarder as
necessary to avoid crashes.
2021-01-22 15:37:41 -08:00
Harshavardhana a6c146bd00
validate storage class across pools when setting config (#11320)
```
mc admin config set alias/ storage_class standard=EC:3
```

should only succeed if parity ratio is valid for all
server pools, if not we should fail proactively.

This PR also needs to bring other changes now that
we need to cater for variadic drive counts per pool.

Bonus fixes also various bugs reproduced with

- GetObjectWithPartNumber()
- CopyObjectPartWithOffsets()
- CopyObjectWithMetadata()
- PutObjectPart,PutObject with truncated streams
2021-01-22 12:09:24 -08:00
Klaus Post 2167ba0111
Feed correct part number to sio (#11326)
When offsets were specified we relied on the first part number to be correct.

Recalculate based on offset.
2021-01-21 08:43:03 -08:00
Klaus Post 4e6d717f39
Compress profiling data (#11313)
Trace data can be rather large and compresses fine.

Compress profile data in zip files:

```
277.895.314 before.profiles.zip
152.800.318 after.profiles.zip
```
2021-01-20 15:49:53 -08:00
Poorna Krishnamoorthy 845e251fa9
fix: crash in notificationsys when peers online is 0 (#11307)
Check if the number of peers online > 0 before using peerClient
2021-01-20 13:13:05 -08:00
Harshavardhana d1a8f0b786
fix possible crashes on deleteMarker replication (#11308)
Delete marker can have `metaSys` set to nil, that
can lead to crashes after the delete marker has
been healed.

Additionally also fix isObjectDangling check
for transitioned objects, that do not have parts
should be treated similar to Delete marker.
2021-01-20 13:12:12 -08:00
Klaus Post dac19d7272
Clarify root disk error (#11314)
Make it clearer what the problem is and how to resolve it.
2021-01-20 13:11:42 -08:00
Harshavardhana 7624c8b9bb
fix: honor storage class uniformity for multiple pools (#11309) 2021-01-20 01:41:18 -08:00
Klaus Post 19fb1086b2
select: Fix leak on compressed files (#11302)
Properly close gzip reader when done reading

fixes #11300
2021-01-19 17:51:46 -08:00
Harshavardhana a5e23a40ff
fix: allow delayed etcd updates to have fallbacks (#11151)
fixes #11149
2021-01-19 10:05:41 -08:00
Harshavardhana 1ad2b7b699
fix: add stricter validation for erasure server pools (#11299)
During expansion we need to validate if

- new deployment is expanded with newer constraints
- existing deployment is expanded with older constraints
- multiple server pools rejected if they have different
  deploymentID and distribution algo
2021-01-19 10:01:31 -08:00
Harshavardhana b5049d541f
fix: reduce an extra readdir() attempted on non-legacy setups (#11301)
to verify moving content and preserving legacy content,
we have way to detect the objects through readdir()
this path is not necessary for most common cases on
newer setups, avoid readdir() to save multiple system
calls.

also fix the CheckFile behavior for most common
use case i.e without legacy format.
2021-01-19 10:01:06 -08:00
Harshavardhana e0055609bb
fix: crawler to skip healing the drives in a set being healed (#11274)
If an erasure set had a drive replacement recently, we don't
need to attempt healing on another drive with in the same erasure
set - this would ensure we do not double heal the same content
and also prioritizes usage for such an erasure set to be calculated
sooner.
2021-01-19 02:40:52 -08:00
Klaus Post e8ce348da1
crypto: Escape JSON text (#10794)
Escape the JSON keys+values from the context.

We do not add the HTML escapes, since that is an extra escape level not mandatory for JSON.
2021-01-19 01:39:04 -08:00
Ritesh H Shukla b4add82bb6
Updated Prometheus metrics (#11141)
* Add metrics for nodes online and offline
* Add cluster capacity metrics
* Introduce v2 metrics
2021-01-18 20:35:38 -08:00
Harshavardhana 3ca6330661
fix: optimize parentDirIsObject by moving isObject to storage layer (#11291)
For objects with `N` prefix depth, this PR reduces `N` such network
operations by converting `CheckFile` into a single bulk operation.

Reduction in chattiness here would allow disks to be utilized more
cleanly, while maintaining the same functionality along with one
extra volume check stat() call is removed.

Update tests to test multiple sets scenario
2021-01-18 12:25:22 -08:00
Aditya Manthramurthy 3163a660aa
Fix support for multiple LDAP user formats (#11276)
Fixes support for using multiple base DNs for user search in the LDAP directory
allowing users from different subtrees in the LDAP hierarchy to request
credentials.

- The username in the produced credentials is now the full DN of the LDAP user
to disambiguate users in different base DNs.
2021-01-17 21:54:32 -08:00
Harshavardhana 0dadfd1b3d
fix: do not compute usage for not found lifecycle operations (#11288)
Currently we would proceed to apply incorrect lifecycle policies
for non-existent objects, this PR handles them appropriately.
2021-01-17 13:58:41 -08:00
Harshavardhana 4315f93421
fix: make sure parentDirIsObject is used at set level (#11280)
parentDirIsObject is not using set level understanding
to check for parent objects, without this it can lead to
objects that can actually reside on a separate set as
objects and would conflict.
2021-01-17 01:11:48 -08:00
Harshavardhana ddb5d7043a fix: standard storage class is allowed to be '0' 2021-01-16 17:32:25 -08:00
Harshavardhana f903cae6ff
Support variable server pools (#11256)
Current implementation requires server pools to have
same erasure stripe sizes, to facilitate same SLA
and expectations.

This PR allows server pools to be variadic, i.e they
do not have to be same erasure stripe sizes - instead
they should have SLA for parity ratio.

If the parity ratio cannot be guaranteed by the new
server pool, the deployment is rejected i.e server
pool expansion is not allowed.
2021-01-16 12:08:02 -08:00
Poorna Krishnamoorthy 7090bcc8e0
fix: doc links and delete replication permissions enforcement (#11285) 2021-01-15 15:22:55 -08:00
Harshavardhana c222bde14b
fix: use common logging implementation for DNSCache (#11284) 2021-01-15 14:04:56 -08:00
Poorna Krishnamoorthy feaf8dfb9a
Fix replication status reported on completion (#11273)
Fixes: #11272
2021-01-13 11:52:28 -08:00
Harshavardhana 628ef081d1
fix: preserve cache calculated previously while moving from v2 to v3 (#11269)
This ensures that all the prometheus monitoring and usage
trackers to avoid alerts configured, although we cannot
support v1 to v2 here - we can v2 to v3.
2021-01-13 09:58:08 -08:00
Harshavardhana 44dff36ff7
listing with prefix prefixed with '/' should be ignored (#11268)
fixes #11265
2021-01-13 09:44:11 -08:00
Poorna Krishnamoorthy b97d53b29c
fix remote target healthcheck (#11267) 2021-01-12 20:48:04 -08:00
Harshavardhana 1a5775e2e8
enable small and large file optimization (#11260)
- for large objects we found that 1MiB block for
  r/w respectively.
- for small objects we found that 128KiB block for
  r/w respectively.
2021-01-12 10:20:39 -08:00
Anis Elleuch e2579b1f5a
azure: Use default upload parameters to avoid consuming too much memory (#11251)
A lot of memory is consumed when uploading small files in parallel, use
the default upload parameters and add MINIO_AZURE_UPLOAD_CONCURRENCY for
users to tweak.
2021-01-11 22:48:09 -08:00
Poorna Krishnamoorthy 7824e19d20
Allow synchronous replication if enabled. (#11165)
Synchronous replication can be enabled by setting the --sync
flag while adding a remote replication target.

This PR also adds proxying on GET/HEAD to another node in a
active-active replication setup in the event of a 404 on the current node.
2021-01-11 22:36:51 -08:00
Harshavardhana 317305d5f9
fix: regression in adding new replication targets (#11257) 2021-01-11 09:08:42 -08:00
Harshavardhana e4e117faab
fix: enable xl.json to xl.meta only if legacy drive is found (#11255)
another optimization is renameLegacyMetadata() never needs
to validate bucket with os.Stat() again, leading to reduction
in one extra syscall.
2021-01-11 02:27:04 -08:00
Klaus Post 51dad1d130
Fix missing GetObjectNInfo Closure (#11243)
Review for missing Close of returned value from `GetObjectNInfo`.

This was often obscured by the stuff that auto-unlocks when reaching EOF.
2021-01-08 10:12:26 -08:00
Harshavardhana 4593b146be
fix: print errors only when metacache status has errors (#11248) 2021-01-08 16:52:19 +05:30
Harshavardhana f21d650ed4
fix: readData in bulk call using messagepack byte wrappers (#11228)
This PR refactors the way we use buffers for O_DIRECT and
to re-use those buffers for messagepack reader writer.

After some extensive benchmarking found that not all objects
have this benefit, and only objects smaller than 64KiB see
this benefit overall.

Benefits are seen from almost all objects from

1KiB - 32KiB

Beyond this no objects see benefit with bulk call approach
as the latency of bytes sent over the wire v/s streaming
content directly from disk negate each other with no
remarkable benefits.

All other optimizations include reuse of msgp.Reader,
msgp.Writer using sync.Pool's for all internode calls.
2021-01-07 19:27:31 -08:00
Harshavardhana a4f6705874
expire stale locks when owner is down (#11247)
fixes #11246
2021-01-07 19:16:18 -08:00
Poorna Krishnamoorthy b35b537e3f
Pass versionID to checkReplicateDelete in web handler (#11244) 2021-01-07 15:28:27 -08:00
Harshavardhana 5c52d5ffc7
fix: treat errVolumeNotFound as EOF error in listPathRaw (#11238) 2021-01-07 09:52:53 -08:00
Harshavardhana f0808bb2e5
fix: getObject fd leaks in transition and replication code (#11237) 2021-01-06 16:13:10 -08:00
Harshavardhana a6dee21092
initialize IAM store before Init() to avoid any crash (#11236) 2021-01-06 13:40:20 -08:00
Anis Elleuch 6f781c5e7a
heal: Reduce whitespace ticker to 5 seconds (#11234)
30 seconds white spaces is long for some setups which time out when no
read activity in short time, reduce the subnet health white space ticker
to 5 seconds, since it has no cost at all.
2021-01-06 13:29:50 -08:00
Harshavardhana f8ca859790
fix: server/gateway banner formatting (#11230) 2021-01-06 10:38:07 -08:00
Harshavardhana 76e2713ffe
fix: use buffers only when necessary for io.Copy() (#11229)
Use separate sync.Pool for writes/reads

Avoid passing buffers for io.CopyBuffer()
if the writer or reader implement io.WriteTo or io.ReadFrom
respectively then its useless for sync.Pool to allocate
buffers on its own since that will be completely ignored
by the io.CopyBuffer Go implementation.

Improve this wherever we see this to be optimal.

This allows us to be more efficient on memory usage.
```
   385  // copyBuffer is the actual implementation of Copy and CopyBuffer.
   386  // if buf is nil, one is allocated.
   387  func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
   388  	// If the reader has a WriteTo method, use it to do the copy.
   389  	// Avoids an allocation and a copy.
   390  	if wt, ok := src.(WriterTo); ok {
   391  		return wt.WriteTo(dst)
   392  	}
   393  	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
   394  	if rt, ok := dst.(ReaderFrom); ok {
   395  		return rt.ReadFrom(src)
   396  	}
```

From readahead package
```
// WriteTo writes data to w until there's no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
func (a *reader) WriteTo(w io.Writer) (n int64, err error) {
	if a.err != nil {
		return 0, a.err
	}
	n = 0
	for {
		err = a.fill()
		if err != nil {
			return n, err
		}
		n2, err := w.Write(a.cur.buffer())
		a.cur.inc(n2)
		n += int64(n2)
		if err != nil {
			return n, err
		}
```
2021-01-06 09:36:55 -08:00
Harshavardhana b5d291ea88
fix: rename remaining zone -> pool (#11231) 2021-01-06 09:35:47 -08:00
Klaus Post eb9172eecb
Allow Compression + encryption (#11103) 2021-01-05 20:08:35 -08:00
Poorna Krishnamoorthy 64bddf47d8
Pass deletemarker correctly to replicate opts (#11227)
fixes: #11180
2021-01-05 14:12:37 -08:00
Harshavardhana 4ed45ce543
fix: healing buckets during pool expansion (#11224)
fixes #11209
2021-01-05 13:24:22 -08:00
Klaus Post ad511b0eb8
tests: Fix occasional data race (#11223)
CI tests could trigger a data race.

Servers are generally not expected to reinitialize, so tests could trigger data races when reinitializing and async operations are running.

We add the option to safely reset global vars instead of overwriting.

Fixes races like:

```
WARNING: DATA RACE
Read at 0x00000477ab18 by goroutine 1159:
  github.com/minio/minio/cmd.FileInfo.ToObjectInfo()
      /home/runner/work/minio/minio/cmd/erasure-metadata.go:105 +0x16d
  github.com/minio/minio/cmd.erasureObjects.putObject()
      /home/runner/work/minio/minio/cmd/erasure-object.go:748 +0x13f8
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3.2()
      /home/runner/work/minio/minio/cmd/metacache-set.go:682 +0x7d3
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1.2()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:777 +0x1c4
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:806 +0x614

Previous write at 0x00000477ab18 by goroutine 1269:
  [failed to restore the stack]

Goroutine 1159 (running) created at:
  github.com/minio/minio/cmd.newMetacacheBlockWriter()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:760 +0x112
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3()
      /home/runner/work/minio/minio/cmd/metacache-set.go:672 +0xe22

Goroutine 1269 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1095 +0x537
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1339 +0xa6
  testing.tRunner()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1050 +0x1eb
  testing.runTests()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1337 +0x594
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1252 +0x2ff
  github.com/minio/minio/cmd.TestMain()
      /home/runner/work/minio/minio/cmd/test-utils_test.go:120 +0x44e
  main.main()
      _testmain.go:1408 +0x223
==================
==================
WARNING: DATA RACE
Read at 0x00000477aae8 by goroutine 1159:
  github.com/minio/minio/cmd.(*BucketVersioningSys).Enabled()
      /home/runner/work/minio/minio/cmd/bucket-versioning.go:26 +0x52
  github.com/minio/minio/cmd.FileInfo.ToObjectInfo()
      /home/runner/work/minio/minio/cmd/erasure-metadata.go:105 +0x197
  github.com/minio/minio/cmd.erasureObjects.putObject()
      /home/runner/work/minio/minio/cmd/erasure-object.go:748 +0x13f8
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3.2()
      /home/runner/work/minio/minio/cmd/metacache-set.go:682 +0x7d3
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1.2()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:777 +0x1c4
  github.com/minio/minio/cmd.newMetacacheBlockWriter.func1()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:806 +0x614

Previous write at 0x00000477aae8 by goroutine 1269:
  [failed to restore the stack]

Goroutine 1159 (running) created at:
  github.com/minio/minio/cmd.newMetacacheBlockWriter()
      /home/runner/work/minio/minio/cmd/metacache-stream.go:760 +0x112
  github.com/minio/minio/cmd.(*erasureObjects).listPath.func3()
      /home/runner/work/minio/minio/cmd/metacache-set.go:672 +0xe22

Goroutine 1269 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1095 +0x537
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1339 +0xa6
  testing.tRunner()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1050 +0x1eb
  testing.runTests()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1337 +0x594
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.14.13/x64/src/testing/testing.go:1252 +0x2ff
  github.com/minio/minio/cmd.TestMain()
      /home/runner/work/minio/minio/cmd/test-utils_test.go:120 +0x44e
  main.main()
      _testmain.go:1408 +0x223
==================
```
2021-01-05 10:45:26 -08:00
Harshavardhana cb0eaeaad8
feat: migrate to ROOT_USER/PASSWORD from ACCESS/SECRET_KEY (#11185) 2021-01-05 10:22:57 -08:00
Harshavardhana d0027c3c41
do not use large buffers if not necessary (#11220)
without this change, there is a performance
regression for small objects GETs, this makes
the overall speed to go back to pre '59d363'
commit days.
2021-01-04 18:51:52 -08:00
Anis Elleuch cb7fc99368
handlers: Avoid initializing a struct in each handler call (#11217) 2021-01-04 09:54:22 -08:00
Harshavardhana a4383051d9
remove/deprecate crawler disable environment (#11214)
with changes present to automatically throttle crawler
at runtime, there is no need to have an environment
value to disable crawling. crawling is a fundamental
piece for healing, lifecycle and many other features
there is no good reason anyone would need to disable
this on a production system.

* Apply suggestions from code review
2021-01-04 09:43:31 -08:00
Harshavardhana e7ae49f9c9
fix: calculate prometheus disks_offline/disks_total correctly (#11215)
fixes #11196
2021-01-04 09:42:09 -08:00
Anis Elleuch 153d4be032
tracing: NumSubscribers() to use atomic instead of mutex (#11219)
globalSubscribers.NumSubscribers() is heavily used in S3 requests and it
uses mutex, use atomic.Load instead since it is faster

Co-authored-by: Anis Elleuch <anis@min.io>
2021-01-04 09:40:30 -08:00
Anis Elleuch dfd99b6d8f
handlers: Little bit more optimizations (#11211) 2021-01-04 00:01:06 -08:00
Harshavardhana c4b1d394d6
erasure: avoid io.Copy in hotpaths to reduce allocation (#11213) 2021-01-03 16:27:34 -08:00
Harshavardhana c4131c2798
feat: Small object optimization read data in single bulk call (#11207) 2021-01-03 11:27:57 -08:00
Anis Elleuch c9d502e6fa
parentDirIsObject() to return quickly with inexistant parent (#11204)
Rewrite parentIsObject() function. Currently if a client uploads
a/b/c/d, we always check if c, b, a are actual objects or not.

The new code will check with the reverse order and quickly quit if 
the segment doesn't exist.

So if a, b, c in 'a/b/c' does not exist in the first place, then returns
false quickly.
2021-01-02 12:01:29 -08:00
Anis Elleuch 677e80c0f8
xl: Remove check-dir in ReadVersion (#11200)
The only purpose of check-dir flag in
ReadVersion is to return 404 when
an object has xl.meta but without data.

This is causing an extract call to the disk 
which can be penalizing in case of busy system
where disks receive many concurrent access.
2021-01-02 10:35:57 -08:00
Harshavardhana aa85af4d1a fix: missing CopyObjectPart maxClients reorder 2021-01-01 23:07:37 -08:00
Anis Elleuch ae731d232f
trace: Reorder http/trace maxClients wrapping for correct tracing (#11202)
mc admin trace does not show the correct handler name in the output: it
is printing `maxClients' for all handlers. The reason is that the wrong
order of handler wrapping.
2021-01-01 23:06:07 -08:00
Anis Elleuch a317d220ed
xl-storage: Do not stat bucket assuming the object exists (#11201)
In HEAD/GET, only STAT the bucket if the 
object does not exist to return the correct
error response.
2021-01-01 09:44:36 -08:00
Harshavardhana 3e1221a01c
fix: log once updating dataUsageCache versions (#11190)
also reduce usage of *bytes.Buffer for
reading `usage-cache.bin`
2020-12-31 09:45:09 -08:00
Ritesh H Shukla 36fc2f98ed
fix: admin trace throttled requests (#11192) 2020-12-30 21:04:55 -08:00
Ritesh H Shukla 556524c715
Reduce logging when peer is offline (#11184) 2020-12-30 14:38:54 -08:00
Harshavardhana cc457f1798
fix: enhance logging in crawler use console.Debug instead of logger.Info (#11179) 2020-12-29 01:57:28 -08:00
Harshavardhana ca0d31b09a
fix: re-arrange handlers to handle requests on /minio (#11177)
fixes #11175
2020-12-28 17:10:33 -08:00
Harshavardhana 445a9bd827
fix: heal optimizations in crawler to avoid multiple healing attempts (#11173)
Fixes two problems

- Double healing when bitrot is enabled, instead heal attempt
  once in applyActions() before lifecycle is applied.

- If applyActions() is successful and getSize() returns proper
  value, then object is accounted for and should be removed
  from the oldCache namespace map to avoid double heal attempts.
2020-12-28 10:31:00 -08:00
Harshavardhana d8d25a308f
fix: use HealObject for cleaning up dangling objects (#11171)
main reason is that HealObjects starts a recursive listing
for each object, this can be a really really long time on
large namespaces instead avoid recursive listing just
perform HealObject() instead at the prefix.

delete's already handle purging dangling content, we
don't need to achieve this by doing recursive listing,
this in-turn can delay crawling significantly.
2020-12-27 15:42:20 -08:00
Harshavardhana c19e6ce773
avoid a crash in crawler when lifecycle is not initialized (#11170)
Bonus for static buffers use bytes.NewReader instead of
bytes.NewBuffer, to use a more reader friendly implementation
2020-12-26 22:58:06 -08:00
Harshavardhana 59d3639396
fix: inherit heal opts globally, including bitrot settings (#11166)
Bonus re-use ReadFileStream internal io.Copy buffers, fixes
lots of chatty allocations when reading metacache readers
with many sustained concurrent listing operations

```
   17.30GB  1.27% 84.80%    35.26GB  2.58%  io.copyBuffer
```
2020-12-24 23:04:03 -08:00
Harshavardhana 027e17468a
fix: discarding results do not attempt in-memory metacache writer (#11163)
Optimizations include

- do not write the metacache block if the size of the
  block is '0' and it is the first block - where listing
  is attempted for a transient prefix, this helps to
  avoid creating lots of empty metacache entries for
  `minioMetaBucket`

- avoid the entire initialization sequence of cacheCh
  , metacacheBlockWriter if we are simply going to skip
  them when discardResults is set to true.

- No need to hold write locks while writing metacache
  blocks - each block is unique, per bucket, per prefix
  and also is written by a single node.
2020-12-24 15:02:02 -08:00
Harshavardhana 45ea161f8d
webUI: change listing to 1000 keys from browser UI (#11159)
gateway implementations do not handle maxKeys being
`-1` properly unlike MinIO implementation, handle it
by setting an appropriate value.

fixes #11158
2020-12-23 19:58:15 -08:00
Harshavardhana 6a66f142d4
fix: strict quorum in list should list on all drives (#11157)
current implementation was incorrect, it in-fact
assumed only read quorum number of disks. in-fact
that value is only meant for read quorum good entries
from all online disks.

This PR fixes this behavior properly.
2020-12-23 09:26:40 -08:00
Harshavardhana 5982965839
fix: re-use bytes.Buffer using sync.Pool (#11156) 2020-12-22 23:22:37 -08:00
Harshavardhana 8565cefe4e fix: allow HTTP2.0 to be always configured 2020-12-22 16:32:58 -08:00
Andreas Auernhammer 8cdf2106b0
refactor cmd/crypto code for SSE handling and parsing (#11045)
This commit refactors the code in `cmd/crypto`
and separates SSE-S3, SSE-C and SSE-KMS.

This commit should not cause any behavior change
except for:
  - `IsRequested(http.Header)`

which now returns the requested type {SSE-C, SSE-S3,
SSE-KMS} and does not consider SSE-C copy headers.

However, SSE-C copy headers alone are anyway not valid.
2020-12-22 09:19:32 -08:00
Harshavardhana 35fafb837b
fix: issues with handling delete markers in metacache (#11150)
Additional cases handled

- fix address situations where healing is not
  triggered on failed writes and deletes.

- consider object exists during listing when
  metadata can be successfully decoded.
2020-12-22 09:16:43 -08:00
Harshavardhana 274bbad5cb
fix: select always online peers for remote listing (#11153)
always find the right set of online peers for remote listing,
this may have an effect on listing if the server is down - we
should do this to avoid always performing transient operations
on bucket->peerClient that is permanently or down for a long
period.
2020-12-22 09:16:07 -08:00
Harshavardhana 5c451d1690
update x/net/http2 to address few bugs (#11144)
additionally also configure http2 healthcheck
values to quickly detect unstable connections
and let them timeout.

also use single transport for proxying requests
2020-12-21 21:42:38 -08:00
Poorna Krishnamoorthy c987313431
Encrypt remote target if kms is configured (#11034)
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2020-12-21 16:21:33 -08:00
Anis Elleuch 2ecaab55a6
admin: ServerInfo returns info without object layer initialized (#11142) 2020-12-21 09:35:19 -08:00
Harshavardhana 3e792ae2a2
fix: change defaults for DNS cache dialer (#11145) 2020-12-21 09:33:29 -08:00
Harshavardhana 4cc500a041
normalize users with double // in accessKeys (#11143)
Bonus fix, use constant time compare for secret keys  in web-handlers.go:SetAuth()
2020-12-20 10:09:51 -08:00
Harshavardhana d8e28830cf
fix: allow STS creds for admin accounts to add users (#11138)
Allow rotating creds with privileges to add users

fixes https://github.com/minio/console/issues/529
2020-12-19 13:24:21 -08:00