Commit Graph

5133 Commits

Author SHA1 Message Date
Poorna
21bf5b4db7
replication: heal proactively upon access (#15501)
Queue failed/pending replication for healing during listing and GET/HEAD
API calls. This includes healing of existing objects that were never
replicated or those in the middle of a resync operation.

This PR also fixes a bug in ListObjectVersions where lifecycle filtering
should be done.
2022-08-09 15:00:24 -07:00
Harshavardhana
a406bb0288
restrict number of disks used for scanning buckets upto GOMAXPROCS (#15492)
control scanner parallelism to avoid higher CPU
usage on nodes that have more drives but an old CPU.
2022-08-08 16:16:44 -07:00
Harshavardhana
1823ab6808
LDAP/OpenID must be initialized IAM Init() (#15491)
This allows for LDAP/OpenID to be non-blocking,
allowing for unreachable Identity targets to be
initialized in IAM.
2022-08-08 16:16:27 -07:00
Harshavardhana
8eec49304d use logger.Info instead of logger.LogIf 2022-08-08 16:13:58 -07:00
Harshavardhana
ecdc2f2f5f
fix: maxConcurrent '0' is an invalid value (#15500)
log and continue with defaults instead of
crashing the service.
2022-08-08 15:18:45 -07:00
Harshavardhana
e178c55bc3
remove non-working GetRawData() from FS mode (#15498) 2022-08-08 11:34:09 -07:00
Poorna
2c137c0d04
fix: handle invalid endpoint errors in site replication(#15499)
fixes #15497
2022-08-08 11:12:05 -07:00
Harshavardhana
638c57e466 revert changes in FS implementation for umask
fixes #15494
2022-08-08 09:48:24 -07:00
Harshavardhana
5e4213b3be
fix: keep writing previous speedtest result (#15484)
when object speedtest is running keep writing
previous speedtest result back to client until
we have a new result - this avoids sending back
blank entries in between the speedtest when it
is running in 'autotune' mode.
2022-08-07 23:04:03 -07:00
Harshavardhana
e0b0a351c6
remove IAM old migration code (#15476)
```
commit 7bdaf9bc50
Author: Aditya Manthramurthy <donatello@users.noreply.github.com>
Date:   Wed Jul 24 17:34:23 2019 -0700

    Update on-disk storage format for users system (#7949)
```

Bonus: fixes a bug when etcd keys were being re-encrypted.
2022-08-05 17:53:23 -07:00
Anis Elleuch
1d2ff46a89
Ensure lock/versioning permissions when creating a bucket (#15432)
Currently, the code doesn't check if the user creating a bucket with
locking feature has bucket locking and versioning permissions enabled,
adding it in accordance with S3 spec.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateBucket.html

Object Lock - If ObjectLockEnabledForBucket is set to true in your CreateBucket request,
s3:PutBucketObjectLockConfiguration and s3:PutBucketVersioning permissions are required.
2022-08-05 16:27:09 -07:00
Harshavardhana
8f7c739328
feat: add SpeedTest ResponseTimes and TTFB (#15479)
Capture average, p50, p99, p999 response times
and ttfb values. These are needed for latency
measurements and overall understanding of our
speedtest results.
2022-08-05 09:40:03 -07:00
Poorna
1beea3daba
fix: import bucket metadata import to return a summary (#15462) 2022-08-05 01:52:50 -07:00
Aditya Manthramurthy
3d94c38ec4
Add env variables to configuration APIs output (#15465)
Config export and config get APIs now include environment 
variables set on the server
2022-08-04 22:21:52 -07:00
Harshavardhana
f4af2d3cdc
fix: decodeDirObject() in single drive DeleteObjects() call (#15477)
Thanks to @bh4t for reproducing this issue.
2022-08-04 18:57:43 -07:00
ebozduman
b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Anis Elleuch
e93867488b
actively cancel listIAMConfigItems to avoid goroutine leak (#15471)
listConfigItems creates a goroutine but sometimes callers will
exit without properly asking listAllIAMConfigItems() to stop sending
results, hence a goroutine leak.

Create a new context and cancel it for each listAllIAMConfigItems
call.
2022-08-04 13:20:43 -07:00
Harshavardhana
3bd9615d0e
fix: log if there is readDir() failure with ListBuckets (#15461)
This is actionable and must be logged.

Bonus: also honor umask by using 0o666 for all Open() syscalls.
2022-08-04 07:23:05 -07:00
Harshavardhana
a6e0ec4e6f
Add support converting non-inlined to inlined (#15444)
This is a feature to allow for inode compaction on
large clusters that use a lot of small files spread
across a large heirarchy.
2022-08-02 23:10:22 -07:00
Andreas Auernhammer
d774a3309b
kes: automatically reload KES client certificate (#15450)
This commit adds support for automatically reloading
the MinIO client certificate for authentication to KES.

The client certificate will now be reloaded:
 - when the private key / certificate file changes
 - when a SIGHUP signal is received
 - every 15 minutes

Fixes #14869

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-08-02 16:58:09 -07:00
Anis Elleuch
b3edb25377
bloom: healObject to mark a path dirty only for dangling objects (#15458)
The path is marked dirty automatically when healObject() is called, which is
wrong. HealObject() is called during self-healing and this will lead to
an increase in the false positive result of the bloom filter.

Also move NSUpdated() from renameData() and call it directly in
CompleteMultipart and PutObject, this is not a functional change but
it will make it less prone to errors in the future.
2022-08-02 16:57:39 -07:00
Harshavardhana
53a816b17a
fix: readdir fallback on root of the drive (#15457)
fixes #15452
2022-08-02 14:57:36 -07:00
Harshavardhana
043aaa792d
fix: intrument os.OpenFile differently for Reads and Writes (#15449)
allows us to trace latency for READs or WRITEs
2022-08-01 13:22:43 -07:00
Shireesh Anjal
e6eab2091f
fix: Incorrect ServersCount in cluster.info (#15431)
The `ServersCount` field in cluster.info is expected to contain the
number of nodes, and not number of endpoints.
2022-07-29 22:21:40 -07:00
Harshavardhana
3cdb609cca
allow root users to return appropriate policy in AccountInfo (#15437)
fixes #15436

This fixes a regression caused after the removal of "consoleAdmin"
policy usage for 'root users' in PR #15402
2022-07-29 20:58:03 -07:00
Harshavardhana
aa874010e2
fix: regression in resolving the right versions (#15430)
fix: regression in resolving right versions

commit d480022711 caused a regression in real
resolver, by picking up incorrect versionID.
2022-07-29 10:03:53 -07:00
Cesar Celis Hernandez
8ec888d13d
feat: update binary once and push it to other servers (#15407) 2022-07-29 08:34:30 -07:00
Harshavardhana
916f274c83
choose starting concurrency based on number of local disks (#15428)
smaller setups may have less drives per server choosing
the concurrency based on number of local drives, and let
the MinIO server change the overall concurrency as
necessary.
2022-07-29 00:00:06 -07:00
Aditya Manthramurthy
7ac53c07af
fix: passing application configuration to console (#15409)
This is an update to MinIO server after swagger codegen related build
fixes added after issues introduced in 39fd7b0b3b
2022-07-28 18:30:24 -07:00
Harshavardhana
bc72e4226e
do not allow filesystem fallback in server download (#15429)
It is possible for anyone with admin access to relatively
to get any content of any random OS location by simply
providing the file with 'mc admin update alias/ /etc/passwd`.

Workaround is to disable 'admin:ServiceUpdate' action. Everyone
is advised to upgrade to this patch.

Thanks to @alevsk for finding this bug.
2022-07-28 17:44:21 -07:00
Poorna
5e0776e96a
replication: Include replica object versions for resync (#15427) 2022-07-28 13:43:02 -07:00
Anis Elleuch
2f1ef02d35
Do not update directory access time (#15426)
Most setups will have relatime it only updates the access time 
following a change in the directory.
2022-07-28 12:40:48 -07:00
Harshavardhana
aff236e20e
fix: cluster healthcheck for single drive setups (#15415)
single drive setups must return '200 OK' if
drive is accessible, current master returns '503'
2022-07-27 16:46:34 -07:00
Harshavardhana
cbd70d26b5
optimize speedtest for smaller setups (#15414)
this has been observed in multiple environments
where the setups are small `speedtest` naturally
fails with default '10s' and the concurrency
of '32' is big for such clusters.

choose a smaller value i.e equal to number of
drives in such clusters and let 'autotune'
increase the concurrency instead.
2022-07-27 14:41:59 -07:00
Harshavardhana
5e763b71dc
use logger.LogOnce to reduce printing disconnection logs (#15408)
fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
2022-07-27 09:44:59 -07:00
Aditya Manthramurthy
7e4e7a66af
Remove internal usage of consoleAdmin (#15402)
"consoleAdmin" was used as the policy for root derived accounts, but this
lead to unexpected bugs when an administrator modified the consoleAdmin
policy

This change avoids evaluating a policy for root derived accounts as by
default no policy is mapped to the root user. If a session policy is
attached to a root derived account, it will be evaluated as expected.
2022-07-26 19:06:55 -07:00
Shireesh Anjal
906947a285
fix: typo in json key ClusterInfo DeploymentID (#15406)
deployement_id -> deployment_id
2022-07-26 19:05:33 -07:00
Poorna
426c902b87
site replication: fix healing of bucket deletes. (#15377)
This PR changes the handling of bucket deletes for site 
replicated setups to hold on to deleted bucket state until 
it syncs to all the clusters participating in site replication.
2022-07-25 17:51:32 -07:00
Anis Elleuch
e4b51235f8
upgrade: Split in two steps to ensure a stable retry (#15396)
Currently, if one server in a distributed setup fails to upgrade 
due to any reasons, it is not possible to upgrade again unless 
nodes are restarted.

To fix this, split the upgrade process into two steps :

- download the new binary on all servers
- If successful, overwrite the old binary with the new one
2022-07-25 17:49:47 -07:00
Eng Zer Jun
0a3b1ad4eb
test: use T.TempDir to create temporary test directory (#15400)
This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, temporary directory created using `ioutil.TempDir`
needs to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-07-25 12:37:26 -07:00
Anis Elleuch
f23f442d33
Add cluster info to inspect/profiling archive (#15360)
Add cluster info to inspect and profiling archive.

In addition to the existing data generation for both inspect and profiling,
cluster.info file is added. This latter contains some info of the cluster.
The generation of cluster.info is is done as the last step and it can fail
if it exceed 10 seconds.
2022-07-25 09:11:35 -07:00
Klaus Post
3795b2c8ba
Add compression scheme to header (#15395)
For easier debugging. We still do not return compressed size for security reasons.
2022-07-24 07:15:49 -07:00
Harshavardhana
7725425e05
fix: fork os.MkdirAll to optimize cases where parent exists (#15379)
a/b/c/d/ where `a/b/c/` exists results in additional syscalls
such as an Lstat() call to verify if the `a/b/c/` exists
and its a directory.

We do not need to do this on MinIO since the parent prefixes
if exist, we can simply return success without spending
additional syscalls.

Also this implementation attempts to simply use Access() calls
to avoid os.Stat() calls since the latter does memory allocation
for things we do not need to use.

Access() is simpler since we have a predictable structure on
the backend and we know exactly how our path structures are.
2022-07-24 00:43:11 -07:00
Aditya Manthramurthy
39fd7b0b3b
Pass multiple IDP config to console (#15270)
This change passes multiple IDP config via a struct 
rather than env variables.
2022-07-22 15:28:02 -07:00
Harshavardhana
b0d70a0e5e
support additional claim info in Auditing STS calls (#15381)
Bonus: Adds a missing AuditLog from AssumeRoleWithCertificate API

Fixes #9529
2022-07-22 11:12:03 -07:00
Poorna
7d8c8de827
single drive: Remove bucket metadata on DeleteBucket (#15378)
from disk and in-memory map
2022-07-21 19:51:53 -07:00
jiuker
3faef829c5
expect full quorum for writing 'format.json' everywhere (#15362) 2022-07-21 18:04:17 -07:00
Poorna
7560fb6f9a
save IAM export assets relative at a folder prefix (#15355) 2022-07-21 17:51:33 -07:00
Klaus Post
69bf39f42e
fix: make complete multipart uploads faster encrypted/compressed backends (#15375)
- Only fetch the parts we need and abort as soon as one is missing.
- Only fetch the number of parts requested by "ListObjectParts".
2022-07-21 16:47:58 -07:00
Minio Trusted
564a0afae1 Revert "tests: Add context cancelation (#15374)"
This reverts commit 1e332f0eb1.

Reverting this as tests are failing randomly.
2022-07-21 13:58:56 -07:00
Klaus Post
1e332f0eb1
tests: Add context cancelation (#15374)
A huge number of goroutines would build up from various monitors

When creating test filesystems provide a context so they can shut down when no longer needed.
2022-07-21 11:52:18 -07:00
Poorna
cab8d3d568
feat: add API to return list of objects waiting to be replicated (#15091) 2022-07-21 11:05:44 -07:00
Klaus Post
be8c4cb24a
fix: support multiple validateAdminReq actions (#15372)
handle multiple validateAdminReq actions and remove duplicate error responses.
2022-07-21 10:26:59 -07:00
Harshavardhana
65166e4ce4
fix: readQuorum calculation when defaultParityCount is 0 (#15363)
when parity is '0' the readQuorum must be equal
to the number of data disks.
2022-07-21 07:25:54 -07:00
Harshavardhana
d3f89fa6e3
remove unnecessary logs in IAM store (#15356) 2022-07-20 08:19:12 -07:00
Harshavardhana
ce8397f7d9
use partInfo only for intermediate part.x.meta (#15353) 2022-07-19 18:56:24 -07:00
Klaus Post
cae9aeca00
fix: reused field crash in PartIndices (#15351)
`PartIndices` may be set if xlMetaV2Version is reused.

Clear before unmarshaling and add sanity check when reading.
2022-07-19 16:49:46 -07:00
Klaus Post
f939d1c183
Independent Multipart Uploads (#15346)
Do completely independent multipart uploads.

In distributed mode, a lock was held to merge each multipart 
upload as it was added. This lock was highly contested and 
retries are expensive (timewise) in distributed mode.

Instead, each part adds its metadata information uniquely. 
This eliminates the per object lock required for each to merge.
The metadata is read back and merged by "CompleteMultipartUpload" 
without locks when constructing final object.

Co-authored-by: Harshavardhana <harsha@minio.io>
2022-07-19 08:35:29 -07:00
Andreas Auernhammer
242d06274a
kms: add context.Context to KMS API calls (#15327)
This commit adds a `context.Context` to the
the KMS `{Stat, CreateKey, GenerateKey}` API
calls.

The context will be used to terminate external calls
as soon as the client requests gets canceled.

A follow-up PR will add a `context.Context` to
the remaining `DecryptKey` API call.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-07-18 18:54:27 -07:00
Poorna
957e3ed729
export IAM: include site replicator svcacct (#15339) 2022-07-18 17:38:53 -07:00
Harshavardhana
b6eb8dff64
Add decommission compression+encryption enabled tests (#15322)
update compression environment variables to follow
the expected sub-system style, however support fallback
mode.
2022-07-17 08:43:14 -07:00
Harshavardhana
7da9e3a6f8
support encrypted/compressed objects properly during decommission (#15320)
fixes #15314
2022-07-16 19:35:24 -07:00
Anis Elleuch
876970baea
Exclude upload-ids with incomplete part upload in multipart listing (#15318)
Uploading a part object can leave an inconsistent state inside
.minio.sys/multipart where data are uploaded but xl.meta is not
committed yet.

Do not list upload-ids that have this state in the multipart listing.
2022-07-16 13:25:58 -07:00
LHHDZ
e68e76e143
fix: data race, which caused tests execution to fail (#15313) 2022-07-16 07:57:55 -07:00
Harshavardhana
e7ac1ea54c
allow decommission to continue when healing (#15312)
Bonus:

- heal buckets in-case during startup the new
  pools have bucket missing.
2022-07-15 21:03:23 -07:00
Harshavardhana
5ac6d91525
support 'admin update' for hotfix versions (#15308)
hotfixed versions are rejected as invalid,
allow `mc admin update` from hotfix repos.
2022-07-15 16:00:34 -07:00
Harshavardhana
1cd6713e24
copy query values before update to preserve the expected keys (#15310)
in success_action_redirect we were missing required
query params as per S3 spec - updated tests.
2022-07-15 15:04:48 -07:00
Harshavardhana
1b339ea062
allow force delete on decom pool (#15302)
Bonus:

- skip suspended pool from being
  considered for multipart uploads

- add more context for decomErrors()
2022-07-14 20:44:22 -07:00
Harshavardhana
236ef03dbd
fix: skip objects expired via lifecycle rules during decommission (#15300) 2022-07-14 16:47:09 -07:00
Poorna
7e32a17742
fix: site replication healing of missing buckets (#15298)
fixes a regression from #15186

- Adding tests to cover healing of buckets.
- Also dereference quota in SiteReplicationStatus only when non-nil
2022-07-14 14:27:47 -07:00
Krishnan Parthasarathi
1d42133d44
listing: Expire object versions past expiry (#15287)
We skip object versions which are past their ILM expiry. This change schedules
them for expiry while at it.
2022-07-14 07:21:26 -07:00
Poorna
b4f6901903
resync: Avoid concurrent access/write on map (#15286)
fixes a crash

```
fatal error: concurrent map iteration and map write
minio[19309]: goroutine 18640 [running]:
minio[19309]: runtime.throw({0x27a3399?, 0x1785?})
minio[19309]: runtime/panic.go:992 +0x71 fp=0xc0062f1c80 sp=0xc0062f1c50 pc=0x438671
minio[19309]: runtime.mapiternext(0xc0062f1e90?)
minio[19309]: runtime/map.go:871 +0x4eb fp=0xc0062f1cf0 sp=0xc0062f1c80 pc=0x41002b
minio[19309]: github.com/minio/minio/cmd.(*ReplicationPool).periodicResyncMetaSave(0xc0056c00c0, {0x4d06a48, 0xc0005b2480}, {0x4d22fc0, 0xc0015ea0
```
2022-07-13 16:29:10 -07:00
Klaus Post
0149382cdc
Add padding to compressed+encrypted files (#15282)
Add up to 256 bytes of padding for compressed+encrypted files.

This will obscure the obvious cases of extremely compressible content 
and leave a similar output size for a very wide variety of inputs.

This does *not* mean the compression ratio doesn't leak information 
about the content, but the outcome space is much smaller, 
so often *less* information is leaked.
2022-07-13 07:52:15 -07:00
Klaus Post
697c9973a7
Upgrade compression package (#15284)
Includes mitigation for CVE-2022-30631 (Go should still be updated)

Remove functions now available upstream.
2022-07-13 07:48:14 -07:00
Harshavardhana
788fd3df81
preserve incoming query params in success_action_redirect (#15280)
fixes #15274
2022-07-13 07:46:44 -07:00
Anis Elleuch
996cac5fed
Avoid listing buckets from a suspended pool (#15283)
Make bucket requests sent after decommissioning is started are not
created in a suspended pool. Therefore listing buckets should avoid
suspended pools as well.
2022-07-13 07:44:50 -07:00
Harshavardhana
0a8b78cb84
fix: simplify passing auditLog eventType (#15278)
Rename Trigger -> Event to be a more appropriate
name for the audit event.

Bonus: fixes a bug in AddMRFWorker() it did not
cancel the waitgroup, leading to waitgroup leaks.
2022-07-12 10:43:32 -07:00
Harshavardhana
b4eb74f5ff
allow custom speedtest bucket (#15271)
this allows for specifying existing buckets with

- object replication enabled
- object encryption enabled
- object versioning enabled
- object locking enabled
2022-07-12 10:12:47 -07:00
Anis Elleuch
57d1f31054
Do not log erasure read failure when disk goes offline (#15277)
Avoid printing the following log

```
API: SYSTEM
Time: Fri Jul 08 2022 11:48:40 GMT+0100
Error: Error(disk not found) reading erasure shards at...

Backtrace:
0: internal/logger/logger.go:278:logger.LogIf()
1: cmd/bitrot-streaming.go:156:cmd.(*streamingBitrotReader).ReadAt()
2: cmd/erasure-decode.go:165:cmd.(*parallelReader).Read.func1()
```
2022-07-12 09:56:56 -07:00
Klaus Post
9f02f51b87
Add 4K minimum compressed size (#15273)
There is no point in compressing very small files.

Typically the effective size on disk will be the same due to disk blocks.

So don't waste resources on extremely small files.

We don't check on multipart. 1) because we don't know and 2) this is very likely a big object anyway.
2022-07-12 07:42:04 -07:00
Klaus Post
911a17b149
Add compressed file index (#15247) 2022-07-11 17:30:56 -07:00
Poorna
3d969bd2b4
fix: ignore missing targets/replication config during site removal (#15269) 2022-07-11 14:11:46 -07:00
Andreas Auernhammer
f800cee4fa
metric: add KMS-related metrics (#15258)
This commit adds a minimal set of KMS-related metrics:
```
 # HELP minio_cluster_kms_online Reports whether the KMS is online (1) or offline (0)
 # TYPE minio_cluster_kms_online gauge
 minio_cluster_kms_online{server="127.0.0.1:9000"} 1
 # HELP minio_cluster_kms_request_error Number of KMS requests that failed with a well-defined error
 # TYPE minio_cluster_kms_request_error counter
 minio_cluster_kms_request_error{server="127.0.0.1:9000"} 16790
 # HELP minio_cluster_kms_request_success Number of KMS requests that succeeded
 # TYPE minio_cluster_kms_request_success counter
 minio_cluster_kms_request_success{server="127.0.0.1:9000"} 348031
```

Currently, we report whether the KMS is available and how many requests
succeeded/failed. However, KES exposes much more metrics that can be
exposed if necessary. See: https://pkg.go.dev/github.com/minio/kes#Metric

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-07-11 09:17:28 -07:00
Praveen raj Mani
b49fc33cb3
purge objects immediately with x-minio-force-delete in DeleteObject and DeleteBucket API (#15148) 2022-07-11 09:15:54 -07:00
Klaus Post
37a6b2da67
Allow compaction at bucket top level. (#15266)
If more than 1M folders (objects or prefixes) are found at the top level in a bucket allow it to be compacted.

While very suboptimal structure we should limit memory usage at some point.
2022-07-11 07:59:03 -07:00
Harshavardhana
913e977c8d
remove auto-port warning for console-address (#15260) 2022-07-08 13:36:41 -07:00
Harshavardhana
c2ddcb3b40
do not recreate deprecated delete-journal.bin, only read it (#15185)
simplify deprecated code, re-enable hot-swap replace disks
2022-07-08 12:17:02 -07:00
Anis Elleuch
ed0cbfb31e
fix: rootdisk detection by not using cached value when GetDiskInfo() errors out (#15249)
GetDiskInfo() uses timedValue to cache the disk info for one second.

timedValue behavior was recently changed to return an old cached value
when calculating a new value returns an error.

When a mount point is empty, GetDiskInfo() will return errUnformattedDisk,
timedValue will return cached disk info with unexpected IsRootDisk value,
e.g. false if the mount point belongs to a root disk. Therefore, the mount
point will be considered a valid disk and will be formatted as well.

This commit will also add more defensive code when marking root disks:
always mark a disk offline for any GetDiskInfo() error except
errUnformattedDisk. The server will try anyway to reconnect to those
disks every 10 seconds.
2022-07-07 17:05:23 -07:00
Harshavardhana
32b2f6117e
fix: do not pass around sync.Map (#15250)
it is not safe to pass around sync.Map
through pointers, as it may be concurrently
updated by different callers.

this PR simplifies by avoiding sync.Map
altogether, we do not need sync.Map
to keep object->erasureMap association.

This PR fixes a crash when concurrently
using this value when audit logs are
configured.

```
fatal error: concurrent map iteration and map write

goroutine 247651580 [running]:
runtime.throw({0x277a6c1?, 0xc002381400?})
        runtime/panic.go:992 +0x71 fp=0xc004d29b20 sp=0xc004d29af0 pc=0x438671
runtime.mapiternext(0xc0d6e87f18?)
        runtime/map.go:871 +0x4eb fp=0xc004d29b90 sp=0xc004d29b20 pc=0x41002b
```
2022-07-07 17:04:25 -07:00
Harshavardhana
ae92521310
remove unnecessary nAgreed value in partial() func (#15242) 2022-07-07 13:45:34 -07:00
Harshavardhana
5802df4365
retry and resume decom operation upon retriable failures (#15244)
it is possible in a k8s-like system reading pool.bin
might not have quorum during startup, however, add
a way to retry after this failure.
2022-07-07 12:31:44 -07:00
Anis Elleuch
8d98282afd
Better reporting of total/free usable capacity of the cluster (#15230)
The current code uses approximation using a ratio. The approximation 
can skew if we have multiple pools with different disk capacities.

Replace the algorithm with a simpler one which counts data 
disks and ignore parity disks.
2022-07-06 13:29:49 -07:00
Harshavardhana
3af6073576
no 'replicate status' without replication config (#15233)
'replicate status' shouldn't be displaying historic
values unless replication config is present on the
relevant bucket.
2022-07-06 09:53:33 -07:00
Harshavardhana
2518af5f9e
fix: allow certain mutations on objects during decommissioning (#15231)
fix: allow certain mutation on objects during decommission

currently by mistake deletion of objects was skipped,
if the object resided on the pool being decommissioned.

delete's are okay to be allowed since decommission is
designed to run on a cluster with active I/O.
2022-07-06 09:53:16 -07:00
Harshavardhana
7b793d84c8
fix: calculate scanner metric paths for single drive (#15232)
Additionally use pathJoin() to avoid double `//`
in path names.
2022-07-06 07:48:38 -07:00
Aditya Manthramurthy
af9bc7ea7d
Add external IDP management Admin API for OpenID (#15152) 2022-07-05 18:18:04 -07:00
Klaus Post
ac055b09e9
Add detailed scanner metrics (#15161) 2022-07-05 14:45:49 -07:00
haslersn
df42914da6
Fix missing whitespace in error message for IncompleteBody (#15227) 2022-07-05 12:19:57 -07:00
Klaus Post
2471bdda00
fix: for DiskInfo call cache disk metrics (#15229)
Small uploads spend a significant amount of time (~5%) fetching disk info metrics. Also maps are allocated for each call.

Add a 100ms cache to disk metrics.
2022-07-05 11:02:30 -07:00
Harshavardhana
9d80ff5a05
fix: decommission delete markers for non-current objects (#15225)
versioned buckets were not creating the delete markers
present in the versioned stack of an object, this essentially
would stop decommission to succeed.

This PR fixes creating such delete markers properly during
a decommissioning process, adds tests as well.
2022-07-05 07:37:24 -07:00
Harshavardhana
b311abed31
decom IAM, Bucket metadata properly (#15220)
Current code incorrectly passed the
config asset object name while decommissioning,
make sure that we pass the right object name
to be hashed on the newer set of pools.

This PR fixes situations after a successful
decommission, the users and policies might go
missing due to wrong hashed set.
2022-07-04 14:02:54 -07:00
Harshavardhana
ce667ddae0
do not print errFileNotFound in entries.resolve() (#15216) 2022-07-04 06:40:46 -07:00
Harshavardhana
0fee993a4b
return appropriate error under 'decom status' (#15213)
fixes #15208
2022-07-01 16:21:23 -07:00
Poorna
0ea5c9d8e8
site healing: Skip stale iam asset updates from peer. (#15203)
Allow healing to apply IAM change only when peer
gave the most recent update.
2022-07-01 13:19:13 -07:00
Harshavardhana
63ac260bd5
Simplify Prometheus metrics gather (#15210) 2022-07-01 13:18:39 -07:00
Harshavardhana
f9a4ad7904
update banner with version+runtime (#15206) 2022-06-30 13:58:09 -07:00
Minio Trusted
e60b67d246 Revert "Tighten enforcement of object retention (#14993)"
This reverts commit 5e3010d455.

This commit causes regression on object locked buckets causine
delete-markers to be not created.
2022-06-30 13:06:32 -07:00
Klaus Post
9004d69c6f
Make ReqInfo concurrency safe (#15204)
Some read/writes of ReqInfo did not get appropriate locks, leading to races.

Make sure reading and writing holds appropriate locks.
2022-06-30 10:48:50 -07:00
Harshavardhana
8856a2d77b
finalize startup-banner and remove unnecessary logs (#15202) 2022-06-29 16:32:04 -07:00
Anis Elleuch
54a061bdda
Save minio version information centrally (#15181) 2022-06-29 14:45:49 -07:00
Poorna
7cc9286e0f
site healing: Skip stale bucket metadata updates from peer (#15186)
Allow healing to apply bucket metadata change only when peer
gave the most recent update.
2022-06-28 18:09:20 -07:00
Harshavardhana
2f25639ea0
update banner to reflect the final agreed UI (#15192) 2022-06-28 16:37:40 -07:00
Harshavardhana
2070c215a2
handle missing funcNames for handlers (#15188)
also use designated names for internal
calls

- storageREST calls are storageR
- lockREST calls are lockR
- peerREST calls are just peer

Named in this fashion to facilitate wildcard matches
by having prefixes of the same name.

Additionally, also enable funcNames for generic handlers
that return errors, currently we disable '<unknown>'
2022-06-28 05:04:10 -07:00
Harshavardhana
9c605ad153
allow support for parity '0', '1' enabling support for 2,3 drive setups (#15171)
allows for further granular setups

- 2 drives (1 parity, 1 data)
- 3 drives (1 parity, 2 data)

Bonus: allows '0' parity as well.
2022-06-27 20:22:18 -07:00
Anis Elleuch
b7c7e59dac
Revert proxying requests with precondition errors (#15180)
In a replicated setup, when an object is updated in one cluster but
still waiting to be replicated to the other cluster, GET requests with
if-match, and range headers will likely fail. It is better to proxy
requests instead.

Also, this commit avoids printing verbose logs about precondition &
range errors.
2022-06-27 14:03:44 -07:00
Harshavardhana
699cf6ff45
perform object sweep after equeue the latest CopyObject() (#15183)
keep it similar to PutObject/CompleteMultipart
2022-06-27 12:11:33 -07:00
Anis Elleuch
9201870f6c
Remove unnecessary code in WalkDir() (#15168)
Recalculating forward is useless. It is never used and it will be
computed again when calling scanDir() again.
2022-06-27 10:26:56 -07:00
Harshavardhana
6722f58668
save MinIO version with each version (8-bytes extra) (#15170)
store MinIO version along with each version in 'xl.meta'
for future purposes, can be used as ways to add specific
code for bug fixes if any.
2022-06-27 03:59:41 -07:00
Harshavardhana
7b9b7cef11
add license banner for GNU AGPLv3 (#15178)
Bonus: rewrite subnet re-use of Transport
2022-06-27 03:58:25 -07:00
Harshavardhana
bd099f5e71
fix: change timedValue to return the previously cached value (#15169)
fix: change timedvalue to return previous cached value

caller can interpret the underlying error and decide
accordingly, places where we do not interpret the
errors upon timedValue.Get() - we should simply use
the previously cached value instead of returning "empty".

Bonus: remove some unused code
2022-06-25 08:50:16 -07:00
Klaus Post
baf257adcb
fix: health client leak when calling UpdateAllTargets (#15167)
When `LoadBucketMetadataHandler` is called and `UpdateAllTargets` gets called.

Since targets are rebuilt we cancel all.
2022-06-24 11:12:52 -07:00
Anis Elleuch
4fd1986885
Trace all http requests (#15064)
Add a generic handler that adds a new tracing context to the request if
tracing is enabled. Other handlers are free to modify the tracing
context to update information on the fly, such as, func name, enable
body logging etc..

With this commit, requests like this 

```
curl -H "Host: ::1:3000" http://localhost:9000/
```

will be traced as well.
2022-06-23 23:19:24 -07:00
Harshavardhana
e1afac9439
reduce sha256 CPU usage by turning it off for speedtests (#15154)
continuation of the PR #15151, keeping signature v4 for
the headers however avoiding sha256 for the body.
2022-06-23 11:26:53 -07:00
Poorna
580d9db85e
Add APIs to import/export IAM data (#15014) 2022-06-23 09:25:15 -07:00
Anis Elleuch
42e2fd35d8
heal: Include dir markers when healing a fresh disk (#15158)
Directories markers are not healed when healing a new fresh disk. A
a proper fix would be moving object names encoding/decoding to erasure
object level but it is too late now since the object to set distribution is
calculated at a higher level.
2022-06-23 06:47:33 -07:00
Harshavardhana
1a40c7c27c
use signature-v2 for 'object perf' tests to avoid CPU using sha256 (#15151)
It is observed in a local 8 drive system the CPU seems to be
bottlenecked at

```
(pprof) top
Showing nodes accounting for 1385.31s, 88.47% of 1565.88s total
Dropped 1304 nodes (cum <= 7.83s)
Showing top 10 nodes out of 159
      flat  flat%   sum%        cum   cum%
      724s 46.24% 46.24%       724s 46.24%  crypto/sha256.block
   219.04s 13.99% 60.22%    226.63s 14.47%  syscall.Syscall
   158.04s 10.09% 70.32%    158.04s 10.09%  runtime.memmove
   127.58s  8.15% 78.46%    127.58s  8.15%  crypto/md5.block
    58.67s  3.75% 82.21%     58.67s  3.75%  github.com/minio/highwayhash.updateAVX2
    40.07s  2.56% 84.77%     40.07s  2.56%  runtime.epollwait
    33.76s  2.16% 86.93%     33.76s  2.16%  github.com/klauspost/reedsolomon._galMulAVX512Parallel84
     8.88s  0.57% 87.49%     11.56s  0.74%  runtime.step
     7.84s   0.5% 87.99%      7.84s   0.5%  runtime.memclrNoHeapPointers
     7.43s  0.47% 88.47%     22.18s  1.42%  runtime.pcvalue
```

Bonus changes:

- re-use transport for bucket replication clients, also site replication clients.
- use 32KiB buffer for all read and writes at transport layer seems to help
  TLS read connections.
- Do not have 'MaxConnsPerHost' this is problematic to be used with net/http
  connection pooling 'MaxIdleConnsPerHost' is enough.
2022-06-22 16:28:25 -07:00
Poorna
cb097e6b0a
CopyObject: fix read/write err on closed pipe (#15135)
Fixes: #15128
Regression from PR#14971
2022-06-21 19:20:11 -07:00
Poorna
1cfb03fb74
replication: Avoid proxying when precondition failed (#15134)
Proxying is not required when content is on this cluster and
does not meet pre-conditions specified in the request.

Fixes #15124
2022-06-21 14:11:35 -07:00
Harshavardhana
f293df647c
s3/zip: extract metadata properly for Zipped objects (#15123)
s3/zip: extra metadata properly for Zipped objects

fixes #15121
2022-06-21 14:11:12 -07:00
sota
e2e5bd6f19
fix: cant parse comment without '=' in environment file (#15130) 2022-06-21 10:37:15 -07:00
Andreas Auernhammer
cd7a0a9757
fips: simplify TLS configuration (#15127)
This commit simplifies the TLS configuration.
It inlines the FIPS / non-FIPS code.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-06-21 07:54:48 -07:00
Anis Elleuch
b3eda248a3
Parallelize new disks healing of different erasure sets (#15112)
- Always reformat all disks when a new disk is detected, this will
  ensure new uploads to be written in new fresh disks
- Always heal all buckets first when an erasure set started to be healed
- Use a lock to prevent two disks belonging to different nodes but in
  the same erasure set to be healed in parallel
- Heal different sets in parallel

Bonus:
- Avoid logging errUnformattedDisk when a new fresh disk is inserted but
  not detected by healing mechanism yet (10 seconds lag)
2022-06-21 07:53:55 -07:00
Harshavardhana
486888f595
remove gateway banner and some other TODO loggers (#15125) 2022-06-21 05:25:40 -07:00
Poorna
b3ebc69034
improve error message for bucket metadata export/import API (#15120) 2022-06-20 16:13:45 -07:00
Harshavardhana
761dde2f1b
fix: add 'mc support inspect' support for single drive deployment (#15122) 2022-06-20 16:11:19 -07:00
Harshavardhana
2bb6a3f4d0
cleanup site replication error handling (#15113)
site replication errors were printed at
various random locations, repeatedly - this
PR attempts to remove double logging and
capture all of them at a common place.

This PR also enhances the code to show
partial success and errors as well.
2022-06-20 10:48:11 -07:00
Anis Elleuch
73733a8fb9
heal: Report correctly in multip-pools setup (#15117)
`mc admin heal -r <alias>` in a multi setup pools returns incorrectly
grey objects. The reason is that erasure-server-pools.HealObject() runs
HealObject in all pools and returns the result of the first nil
error. However, in the lower erasureObject level, HealObject() returns
nil if an object does not exist + missing error in each disk of the object
in that pool, therefore confusing mc.

Make erasureObject.HealObject() to return not found error in the lower
level, so at least erasureServerPools will know what pools to ignore.
2022-06-20 08:07:45 -07:00
Poorna
2fa1d8ac48
Add import/export APIs to migrate bucket metadata (#14929) 2022-06-18 06:55:39 -07:00
Poorna
8b9a19eef1
fix: typo in site replication version healing (#15103) 2022-06-17 16:43:24 -07:00
Aditya Manthramurthy
7f629df4d5
Add generic function to retrieve config value with metadata (#15083)
`config.ResolveConfigParam` returns the value of a configuration for any
subsystem based on checking env, config store, and default value. Also returns info
about which config source returned the value.

This is useful to return info about config params overridden via env in the user
APIs. Currently implemented only for OpenID subsystem, but will be extended for
others subsequently.
2022-06-17 11:39:21 -07:00
Anis Elleuch
98ddc3596c
Avoid CompleteMultipart freeze with unexpected network issue (#15102)
If sending a white space during a long S3 handler call fails,
the whitespace goroutine forgets to return a result to the caller.
Therefore, the complete multipart handler will be blocked.

Remember to send the header written result to the caller 
or/and close the channel.
2022-06-17 10:41:25 -07:00
Harshavardhana
5d23be6242
fix: ignore printing io.EOF during WalkDir() on concurrently modified objects (#15100)
fix: ignore print io.EOF during WalkDir() on concurrently modified objects
2022-06-17 08:23:47 -07:00
Poorna
55ee94bed0
initialize site replication subsys after loading metadata (#15099) 2022-06-16 19:00:35 -07:00
Harshavardhana
d228d29944
update '-v' flag behavior to include copyRight and license (#15097)
```
~ minio -v
minio version DEVELOPMENT.2022-06-16T20-40-14Z (commit-id=e083228e2a06bfdcd006fee28d449cd2b47c542a)
Runtime: go1.18.3 linux/amd64
Copyright (c) 2015-2022 MinIO, Inc.
Licence AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
```
2022-06-16 16:10:48 -07:00
Harshavardhana
013cc66d8e
add dataErrs for healing debug log (#15092) 2022-06-16 09:42:45 -07:00
Harshavardhana
c7ed6eee5e
fix: background local test also via channel (#15086)
current implementation for `standalone` setups
was blocking the `perf drive`.

Bonus: remove all old unused complicated code.
2022-06-15 14:51:42 -07:00
Harshavardhana
8082d1fed6
add bucket level S3 received/sent bytes (#15084)
adds bucket level metrics for bytes received and sent bytes on all S3 API calls.
2022-06-14 15:14:24 -07:00
Harshavardhana
d2a10dbe69
fix: simplify healthcheck code to freeze calls only once (#15082)
- currently subnet health check was freezing and calling
  locks at multiple locations, avoid them.

- throw errors if first attempt itself fails with no results
2022-06-14 11:22:07 -07:00
Anis Elleuch
14645142db
erasure-sd: Evaluate versioning Prefix in multi-delete objects (#15081)
Erasure SD DeleteObjects() is only inheriting bucket versioning status
from the handler layer.

Add the missing versioning prefix evaluation for each object that will
deleted.
2022-06-14 10:05:12 -07:00
Anis Elleuch
0d00f3a55b
kms: initialize after cli parsing (#15076)
KMS depends on the --certs-dir flag. 

Ensure KMS is initialized after loading the flag.
2022-06-13 13:06:13 -07:00
Anis Elleuch
dd53b287f2
sts: Avoid printing all STS errors (#15065)
Limit printing STS errors to 

- STS internal error
- STS not initialized
- STS upstream error
2022-06-11 12:55:32 -07:00
Harshavardhana
7413045f0e
fix: add missing minio_s3_requests_total (#15070)
PR #15052 caused a regression, add the missing metrics back.

Bonus:

- internode information should be only for distributed setups 
- update the dashboard to include 4xx and 5xx error panels.
2022-06-11 00:50:31 -07:00
Harshavardhana
af1944f28d
support reading systemctl config automatically on baremetal setups (#15066)
this allows for customers to use `mc admin service restart`
directly even when performing RPM, DEB upgrades. Upon such 'restart'
after upgrade MinIO will re-read the /etc/default/minio for any
newer environment variables.

As long as `MINIO_CONFIG_ENV_FILE=/etc/default/minio` is set, this
is honored.
2022-06-10 09:59:15 -07:00
Harshavardhana
214ea14f29
fix: for frozen calls return if client disconnects (#15062) 2022-06-09 05:06:47 -07:00
Anis Elleuch
5fb420c703
prometheus: Add S3 4xx and 5xx S3 monitoring (#15052)
Currently minio_s3_requests_errors_total covers 4xx and 
5xx S3 responses which can be confusing when s3 applications 
sent a lot of HEAD requests with obvious 404 responses or 
when the replication is enabled.

Add 
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total

to help users monitor 4xx and 5xx HTTP status codes separately.
2022-06-08 11:22:34 -07:00
Harshavardhana
2420f6c000
fix: make metrics endpoint responsive by reducing the chatter (#15055)
peerOnlineCounter was making NxN calls to many peers, this
can be really long and tedious if there are random servers
that are going down.

Instead we should calculate online peers from the point of
view of "self" and return those online and offline appropriately
by performing a healthcheck.
2022-06-08 02:43:13 -07:00
Harshavardhana
b0d7332a0c
healthcheck cluster endpoint should honor write/readQuorum per pool (#15053) 2022-06-07 19:08:21 -07:00
Harshavardhana
d55efc791f
relax O_DIRECT in single drive mode if unsupported (#15045) 2022-06-07 06:44:01 -07:00
Minio Trusted
e2d4d097e7 do not print errors upon 'nil' err 2022-06-06 17:33:41 -07:00
Shireesh Anjal
4ce81fd07f
Add periodic callhome functionality (#14918)
* Add periodic callhome functionality

Periodically (every 24hrs by default), fetch callhome information and
upload it to SUBNET.

New config keys under the `callhome` subsystem:

enable - Set to `on` for enabling callhome. Default `off`
frequency - Interval between callhome cycles. Default `24h`

* Improvements based on review comments

- Update `enableCallhome` safely
- Rename pctx to ctx
- Block during execution of callhome
- Store parsed proxy URL in global subnet config
- Store callhome URL(s) in constants
- Use existing global transport
- Pass auth token to subnetPostReq
- Use `config.EnableOn` instead of `"on"`

* Use atomic package instead of lock

* Use uber atomic package

* Use `Cancel` instead of `cancel`

Co-authored-by: Harshavardhana <harsha@minio.io>

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2022-06-06 16:14:52 -07:00
Harshavardhana
df9eeb7f8f
fix: do not log concurrently when multiple disks return errors (#15044)
since the values inside 'context' are mutated internally by
logger, make sure to log serially upon errors not concurrently.
2022-06-06 15:15:11 -07:00
Harshavardhana
31c4fdbf79
fix: resyncing 'null' version on pre-existing content (#15043)
PR #15041 fixed replicating 'null' version however
due to a regression from #14994 caused the target
versions for these 'null' versioned objects to have
different 'versions', this may cause confusion with
bi-directional replication and cause double replication.

This PR fixes this properly by making sure we replicate
the correct versions on the objects.
2022-06-06 15:14:56 -07:00
Harshavardhana
48e367ff7d
reject resync start on misconfigured replication rules (#15041)
we expect resync to start on buckets with replication
rule ExistingObjects enabled, if not we reject such
calls.
2022-06-06 02:54:39 -07:00
Anis Elleuch
fd02492cb7
avoid limits on the number of parallel trace/bucket notifications listeners (#14799)
Simplifies overall limits on the incoming listeners for notifications.

Fixes #14566
2022-06-05 14:29:12 -07:00
Harshavardhana
5afdc56796
allow single drive mode to run on root disk (#15037)
for practical reasons, allow root disk based installs for single drive mode.
2022-06-03 12:53:42 -07:00
Harshavardhana
c3e1da8e04
honor canceled context and do not leak on mergeChannels (#15034)
mergeEntryChannels has the potential to perpetually
wait on the results channel, context might be closed
and we did not honor the caller context canceling.
2022-06-03 05:59:02 -07:00
Anis Elleuch
20a753e2e5
Fix a possible service freeze after perf object (#15036)
The S3 service can be frozen indefinitely if a client or mc asks for object
perf API but quits early or has some networking issues. The reason is
that partialWrite() can block indefinitely.

This commit makes partialWrite() listens to context cancellation as well. It
also renames deadlinedCtx to healthCtx since it covers handler context
cancellation and not only not only the speedtest deadline.
2022-06-03 05:58:45 -07:00
Aditya Manthramurthy
61a7434379
Update --version option behavior (#15032)
- Add git commit ID
- Add go version
2022-06-02 18:40:53 -07:00
Poorna
29edb4ccfe
fix: site replication bucket heal to not panic if replication config is missing (#15025) 2022-06-02 12:34:03 -07:00
Anis Elleuch
d4e565e595
Add defensive check for one stream message size (#15029)
In a streaming response, the client knows the size of a streamed
message but never checks the message size. Add the check to error 
out if the response message is truncated.
2022-06-02 09:16:26 -07:00
Klaus Post
f7cecf0945
Make isIndexedMetaV2 return errors (#15012)
Indexed streams would be decoded by the legacy loader if there 
was an error loading it. Return an error when the stream is indexed 
and it cannot be loaded.

Fixes "unknown minor metadata version" on corrupted xl.meta files and 
returns an actual error.
2022-05-31 19:06:57 -07:00
Harshavardhana
52221db7ef
fix: for unexpected errors in reading versioning config panic (#14994)
We need to make sure if we cannot read bucket metadata
for some reason, and bucket metadata is not missing and
returning corrupted information we should panic such
handlers to disallow I/O to protect the overall state
on the system.

In-case of such corruption we have a mechanism now
to force recreate the metadata on the bucket, using
`x-minio-force-create` header with `PUT /bucket` API
call.

Additionally fix the versioning config updated state
to be set properly for the site replication healing
to trigger correctly.
2022-05-31 02:57:57 -07:00
Anis Elleuch
56a61bab56
test: Add GetObjectNInfo test with some outdated disks (#15004)
Add a test reading an object which has some old data in some outdated
disks, in a versioned and non-versioned bucket.
2022-05-30 17:52:59 -07:00
Harshavardhana
d480022711
fix: invalidate outdated disks appropriately during readAllXL (#15002)
readAllXL would return inlined data for outdated disks
causing "read" to return incorrect content to the client,

this PR fixes this behavior by making sure we skip such
outdated disks appropriately based on the latest ModTime
on the disk.
2022-05-30 12:43:54 -07:00
Harshavardhana
f1abb92f0c
feat: Single drive XL implementation (#14970)
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.

This PR also brings features such as versioning, replication,
transitioning to single drive setups.
2022-05-30 10:58:37 -07:00
Harshavardhana
5792be71fa
fix: add timeouts to avoid goroutine leaks in net/http (#14995)
Following code can reproduce an unending go-routine buildup,
while keeping connections established due to lack of client
not closing the connections.

https://gist.github.com/harshavardhana/2d00e6f909054d2d2524c71485ad02e1

Without this PR all MinIO deployments can be put into
denial of service attacks, causing entire service to be
unavailable.

We bring in two timeouts at this stage to control such
go-routine build ups, new change

- IdleTimeout (to kill off idle connections)
- ReadHeaderTimeout (to kill off connections that are too slow)

This new change also brings two hidden options to make any
additional relevant changes if desired in some setups.
2022-05-30 06:24:51 -07:00
Poorna
5e3010d455
Tighten enforcement of object retention (#14993)
Ref issue#14991 - in the rare case that object in bucket under
retention has null version, make sure to enforce retention
rules.
2022-05-28 02:21:19 -07:00
Anis Elleuch
ccbf65c8e8
site-repl: Fix deadlock after an IAM loading error (#14990)
Fix forgotten IAM cache lock releases when reading some data from
disk/etcd

Co-authored-by: Anis Elleuch <anis@min.io>
2022-05-27 10:26:38 -07:00
Harshavardhana
9d07cde385
use crypto/sha256 only for FIPS 140-2 compliance (#14983)
It would seem like the PR #11623 had chewed more
than it wanted to, non-fips build shouldn't really
be forced to use slower crypto/sha256 even for
presumed "non-performance" codepaths. In MinIO
there are really no "non-performance" codepaths.
This assumption seems to have had an adverse
effect in certain areas of CPU usage.

This PR ensures that we stick to sha256-simd
on all non-FIPS builds, our most common build
to ensure we get the best out of the CPU at
any given point in time.
2022-05-27 06:00:19 -07:00
Aditya Manthramurthy
464b9d7c80
Add support for Identity Management Plugin (#14913)
- Adds an STS API `AssumeRoleWithCustomToken` that can be used to 
  authenticate via the Id. Mgmt. Plugin.
- Adds a sample identity manager plugin implementation
- Add doc for plugin and STS API
- Add an example program using go SDK for AssumeRoleWithCustomToken
2022-05-26 17:58:09 -07:00
Poorna
5c81d0d89a
site replication: heal missing/invalid replication config (#14979)
Validate remote target ARNs and heal any stale rules in
the replication config
2022-05-26 17:57:23 -07:00
Klaus Post
c0bf02b8b2
Ignore disks with 0 total space (#14981)
Ignore disks with 0 total

Mainly defensive to ensure no `/0` in percent calculation.
2022-05-26 06:01:50 -07:00
Harshavardhana
fd46a1c3b3
fix: some races when accessing ldap/openid config globally (#14978) 2022-05-25 18:32:53 -07:00
Aditya Manthramurthy
5aae7178ad
Fix listing of service and sts accounts (#14977)
Now returns user does not exist error if the user is not known to the system
2022-05-25 15:28:54 -07:00
Harshavardhana
dea8220eee
do not heal outdated disks > parityBlocks (#14976)
this PR also fixes a situation where incorrect
partsMetadata slice was used where fi.Data was
re-used from a single drive causing duplication
of the shards across all drives.

This happens for situations where shouldHeal()
returns true for all drives > parityBlocks.

To avoid this we should never attempt to heal on all
drives > parityBlocks, unless we are doing metadata
migration from xl.json -> xl.meta
2022-05-25 15:17:10 -07:00
Klaus Post
a4be0b88f6
Add server pool reserved space (#14974)
If one or more pools reach 85% usage in a set, we will only 
use pools that have more free space.

In case all pools are above 85% we allow all of them to be used 
with the regular distribution.
2022-05-25 13:20:20 -07:00
Poorna
d8101573be
Disallow deletion of ARN when under active replication (#14972)
fixes a regression from #12880
2022-05-24 19:40:45 -07:00
Klaus Post
41cdb357bb
Compensate for different server pool sizes (#14968)
When a server pool with a different number of sets is added they are 
not compensated when choosing a destination pool for new objects. 
This leads to the unbalanced placement of objects with smaller pools 
getting a bigger number of objects since we only compare the destination 
sets directly.

This change will compensate for differences in set sizes when choosing
the destination pool.

Different set sizes are already compensated by fewer disks.
2022-05-24 18:57:14 -07:00
Harshavardhana
38caddffe7
fix: copyObject on versioned bucket when updating metadata (#14971)
updating metadata with CopyObject on a versioned bucket
causes the latest version to be not readable, this PR fixes
this properly by handling the inline data bug fix introduced
in PR #14780.

This bug affects only inlined data.
2022-05-24 17:27:45 -07:00
Poorna
0e26f983d6
site replication: Allow replication rule edit (#14969)
Revert commit b42cfcea60 as too
restrictive
2022-05-24 13:27:33 -07:00
Anis Elleuch
77dc99e71d
Do not use inline data size in xl.meta quorum calculation (#14831)
* Do not use inline data size in xl.meta quorum calculation

Data shards of one object can different inline/not-inline decision
in multiple disks. This happens with outdated disks when inline
decision changes. For example, enabling bucket versioning configuration
will change the small file threshold.

When the parity of an object becomes low, GET object can return 503
because it is not unable to calculate the xl.meta quorum, just because
some xl.meta has inline data and other are not.

So this commit will be disable taking the size of the inline data into
consideration when calculating the xl.meta quorum.

* Add tests for simulatenous inline/notinline object

Co-authored-by: Anis Elleuch <anis@min.io>
2022-05-24 06:26:38 -07:00
Anis Elleuch
5041bfcb5c
replication healing: Fix typo when healing bucket quota info (#14966)
A typo is found in the replication healing code where an empty quota
configuration is sent to peer sites instead of the correct one.
.io>
2022-05-24 06:26:13 -07:00
Harshavardhana
f8650a3493
fetch bucket replication stats across peers in single call (#14956)
current implementation relied on recursively calling one bucket
at a time across all peers, this would be very slow and chatty
when there are 100's of buckets which would mean 100*peerCount
amount of network operations.

This PR attempts to reduce this entire call into `peerCount`
amount of network calls only. This functionality addresses also a
concern where the Prometheus metrics would significantly slow
down when one of the peers is offline.
2022-05-23 09:15:30 -07:00
Klaus Post
90a52a29c5
Fix WalkDir fallback hot loop (#14961)
Fix fallback hot loop

fd was never refreshed, leading to an infinite hot loop if a disk failed and the fallback disk fails as well.

Fix & simplify retry loop.

Fixes #14960
2022-05-23 06:28:46 -07:00
Poorna
8859c92f80
Relax site replication syncing of service accounts (#14955)
Synchronous replication of service/sts accounts can be relaxed
as site replication healing should catch up when peer clusters
are back online.
2022-05-20 19:09:11 -07:00
Anis Elleuch
01e5632949
mrf: Fix stale MRF data showed in heal info (#14953)
One usee reported having mc admin heal status output ETA increasing
by time. It turned out it is MRF that is not clearing its data due to a
bug in the code.

pendingItems is increased when an object is queued to be healed but
never decreasd when there is a healing error. This commit will decrease
pendingItems and pendingBytes even when there is an error to give
accurate reporting.
2022-05-20 07:33:18 -07:00
Anis Elleuch
95a6b2c991
Merge LDAP STS policy evaluation with the generic STS code (#14944)
If LDAP is enabled, STS security token policy is evaluated using a
different code path and expects ldapUser claim to exist in the security
token. This means other STS temporary accounts generated by any Assume
Role function, such as AssumeRoleWithCertificate, won't be allowed to do any
operation as these accounts do not have LDAP user claim.

Since IsAllowedLDAPSTS() is similar to IsAllowedSTS(), this commit will
merge both.

Non harmful changes:
- IsAllowed for LDAP will start supporting RoleARN claim
- IsAllowed for LDAP will not check for parent claim anymore. This check doesn't
  seem to be useful since all STS login compare access/secret/security-token
  with the one saved in the disk.
- LDAP will support $username condition in policy documents.

Co-authored-by: Anis Elleuch <anis@min.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2022-05-19 11:06:55 -07:00
Harshavardhana
30c9e50701
make sure to ignore expected errors and dirname deletes (#14945) 2022-05-18 17:58:19 -07:00
Aditya Manthramurthy
9aadd725d2
Avoid calling .Reset() on active timer (#14941)
.Reset() documentation states:

    For a Timer created with NewTimer, Reset should be invoked only on stopped
    or expired timers with drained channels.

This change is just to comply with this requirement as there might be some
runtime dependent situation that might lead to unexpected behavior.
2022-05-18 15:37:58 -07:00
Harshavardhana
6cfb1cb6fd
fix: timer usage across codebase (#14935)
it seems in some places we have been wrongly using the
timer.Reset() function, nicely exposed by an example
shared by @donatello https://go.dev/play/p/qoF71_D1oXD

this PR fixes all the usage comprehensively
2022-05-17 22:42:59 -07:00
Harshavardhana
2dc8ac1e62
allow IAM cache load to be granular and capture missed state (#14930)
anything that is stuck on the disk today can cause latency
spikes for all incoming S3 I/O, we need to have this
de-coupled so that we can make sure that latency in loading
credentials are not reflected back to the S3 API calls.

The approach this PR takes is by checking if the calls were
updated just in case when the IAM load was in progress,
so that we can use merge instead of "replacement" to avoid
missing state.
2022-05-17 19:58:47 -07:00
Harshavardhana
040ac5cad8
fix: when logger queue is full exit quickly upon doneCh (#14928)
Additionally only reload requested sub-system not everything
2022-05-16 16:10:51 -07:00
Harshavardhana
03f8b25b50
disable connectDisks loop under testing (#14920)
avoids races during tests, keeps tests predictable
2022-05-16 05:36:00 -07:00
Aditya Manthramurthy
f28a8eca91
Add Access Management Plugin tests with OpenID (#14919) 2022-05-13 12:48:02 -07:00
Anis Elleuch
ca69e54cb6
tests: Fix sporadic failure of TestXLStorageDeleteFile (#14911)
The test expects from DeleteFile to return errDiskNotFound when the disk
is not available. It calls os.RemoveAll() to remove one disk after XL storage
initialization. However, this latter contains some goroutines which can
race with os.RemoveAll() and then the test fails sporadically with
returning random errors.

The commit will tweak the initialization routine of the XL storage to
only run deletion of temporary and metacache data in the  background,
so TestXLStorageDeleteFile won't fail anymore.
2022-05-12 15:24:58 -07:00
Aditya Manthramurthy
4629abd5a2
Add tests for Access Management Plugin (#14909) 2022-05-12 15:24:19 -07:00
Harshavardhana
dc99f4a7a3
allow bucket to be listed when GetBucketLocation is enabled (#14903)
currently, we allowed buckets to be listed from the
API call if and when the user has ListObject()
permission at the global level, this is okay to be
extended to GetBucketLocation() as well since

GetBucketLocation() is a "read" call and allowing "reads"
on a bucket has an implicit assumption that ListBuckets()
should be allowed.

This makes discoverability of access for read-only users
becomes easier or users with specific restrictions on their
policies.
2022-05-12 10:46:20 -07:00
Harshavardhana
9341201132
logger lock should be more granular (#14901)
This PR simplifies few things by splitting
the locks between audit, logger targets to
avoid potential contention between them.

any failures inside audit/logger HTTP
targets must only log to console instead
of other targets to avoid cyclical dependency.

avoids unneeded atomic variables instead
uses RWLock to differentiate a more common
read phase v/s lock phase.
2022-05-12 07:20:58 -07:00
Krishnan Parthasarathi
88dd83a365
lifecycle: Set opts.VersionSuspended when expiring objects (#14902) 2022-05-12 06:09:24 -07:00
Harshavardhana
60d0611ac2
use BadRequest HTTP status instead of Conflict for certain errors (#14900)
PutBucketVersioning API should return BadRequest for errors
instead of Conflict, Conflict is used for "AlreadyExists"
resource situations.
2022-05-11 13:44:16 -07:00
Harshavardhana
f939222942
add support for extra prometheus labels (#14899)
fixes #14353
2022-05-11 13:04:53 -07:00
Krishna Srinivas
e34ca9acd1
retry each object decom upto 3 times, in-case of failure (#14861) 2022-05-11 11:37:32 -07:00
Aditya Manthramurthy
83071a3459
Add support for Access Management Plugin (#14875)
- This change renames the OPA integration as Access Management Plugin - there is
nothing specific to OPA in the integration, it is just a webhook.

- OPA configuration is automatically migrated to Access Management Plugin and
OPA specific configuration is marked as deprecated.

- OPA doc is updated and moved.
2022-05-10 17:14:55 -07:00
Anis Elleuch
edf364bf21
tracing: Add disk path to storage tracing (#14883)
Example:

2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/2 / 227.834µs
2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/4 / 236.042µs
2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/3 / 130.958µs
2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/1 / 102.875µs
2022-05-10 07:48:07 -07:00
Anis Elleuch
1e037883b0
pools: GetObjectNInfo should cover locking during object read (#14887)
In case of multi-pools setup, GetObjectNInfo returns a GetObjectReader
but it unlocks the read lock when quitting GetObjectNInfo. This should
not happen, unlock should only happen when GetObjectReader is closed.
2022-05-10 07:47:40 -07:00
Klaus Post
d909f167ff
tests: Add localLocker RUnlock test (#14882) 2022-05-09 09:55:52 -07:00
Harshavardhana
62aa42cccf
avoid replication proxy on version excluded paths (#14878)
no need to attempt proxying objects that were
never replicated, but do have local `null`
versions on them.
2022-05-08 16:50:31 -07:00
Harshavardhana
5cffd3780a
fix: multiple fixes in prefix exclude implementation (#14877)
- do not need to restrict prefix exclusions that do not
  have `/` as suffix, relax this requirement as spark may
  have staging folders with other autogenerated characters
  , so we are better off doing full prefix March and skip. 

- multiple delete objects was incorrectly creating a
  null delete marker on a versioned bucket instead of
  creating a proper versioned delete marker.

- do not suspend paths on the excluded prefixes during
  delete operations to avoid creating `null` delete markers,
  honor suspension of versioning only at bucket level for
  delete markers.
2022-05-07 22:06:44 -07:00
Harshavardhana
def75ffcfe
allow versioning config changes under site replication (#14876)
PR #14828 introduced prefix-level exclusion of versioning
and replication - however our site replication implementation
since it defaults versioning on all buckets did not allow
changing versioning configuration once the bucket was created.

This PR changes this and ensures that such changes are honored
and also propagated/healed across sites appropriately.
2022-05-07 18:39:40 -07:00
Krishnan Parthasarathi
ad8e611098
feat: implement prefix-level versioning exclusion (#14828)
Spark/Hadoop workloads which use Hadoop MR 
Committer v1/v2 algorithm upload objects to a 
temporary prefix in a bucket. These objects are 
'renamed' to a different prefix on Job commit. 
Object storage admins are forced to configure 
separate ILM policies to expire these objects 
and their versions to reclaim space.

Our solution:

This can be avoided by simply marking objects 
under these prefixes to be excluded from versioning, 
as shown below. Consequently, these objects are 
excluded from replication, and don't require ILM 
policies to prune unnecessary versions.

-  MinIO Extension to Bucket Version Configuration
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 
        <Status>Enabled</Status>
        <ExcludeFolders>true</ExcludeFolders>
        <ExcludedPrefixes>
          <Prefix>app1-jobs/*/_temporary/</Prefix>
        </ExcludedPrefixes>
        <ExcludedPrefixes>
          <Prefix>app2-jobs/*/__magic/</Prefix>
        </ExcludedPrefixes>

        <!-- .. up to 10 prefixes in all -->     
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket 
from versioning. This is required to prevent the parent 
folders from accumulating delete markers, especially
those which are shared across spark workloads 
spanning projects/teams.

- To enable version exclusion on a list of prefixes

```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/_magic," --exclude-prefix-marker myminio/test
```
2022-05-06 19:05:28 -07:00
Shireesh Anjal
3ec1844e4a
return kubernetes info in health report (#14865) 2022-05-06 12:41:07 -07:00
Poorna
523670ba0d
fix: site removal API error handling (#14870)
when the site is being removed is missing replication config. This can
happen when a new deployment is brought in place of a site that
is lost/destroyed and needs to delink old deployment from site
replication.
2022-05-06 12:40:34 -07:00
Harshavardhana
35dea24ffd
fix: console log peer API from its broken implementation (#14873)
console logging peer API was broken as it would
timeout after 15minutes, this never really worked
beyond this value and basically failed to provide
the streaming "log" functionality that was expected
from this implementation.

also fix convoluted channel handling by keeping things
simple, this is rewritten.
2022-05-06 12:39:58 -07:00
Harshavardhana
c7df1ffc6f
avoid concurrent reads and writes to opts.UserDefined (#14862)
do not modify opts.UserDefined after object-handler
has set all the necessary values, any mutation needed
should be done on a copy of this value not directly.

As there are other pieces of code that access opts.UserDefined
concurrently this becomes challenging.

fixes #14856
2022-05-05 04:14:41 -07:00
Aditya Manthramurthy
2b7e75e079
Add OPA doc and remove deprecation marking (#14863) 2022-05-04 23:53:42 -07:00
Anis Elleuch
44a3b58e52
Add audit log for decommissioning (#14858) 2022-05-04 00:45:27 -07:00
Anis Elleuch
46de9ac03e
Decom: Easily restart decommission when it is done (#14855)
When a decommission task is successfully completed, failed, or canceled,
this commit allows restarting the decommission again. Restarting is not
allowed when there is an ongoing decommission task.
2022-05-03 13:36:08 -07:00
Harshavardhana
f0462322fd
fix: remove embedded-policy as requested by the user (#14847)
this PR introduces a few changes such as

- sessionPolicyName is not reused in an extracted manner
  to apply policies for incoming authenticated calls,
  instead uses a different key to designate this
  information for the callers.

- this differentiation is needed to ensure that service
  account updates do not accidentally store JSON representation
  instead of base64 equivalent on the disk.

- relax requirements for Deleting a service account, allow
  deleting a service account that might be unreadable, i.e
  a situation where the user might have removed session policy 
  which now carries a JSON representation, making it unparsable.

- introduce some constants to reuse instead of strings.

fixes #14784
2022-05-02 17:56:19 -07:00
Klaus Post
c59d2a6288
Log Range Header if present in the request (#14851)
Add Range header as param to easier debug of Range requests.
2022-05-02 10:37:26 -07:00
Klaus Post
3e3ff2a70b
Check error status codes (#14850)
If an invalid status code is generated from an error we risk panicking. Even if there 
are no potential problems at the moment we should prevent this in the future.

Add safeguards against this.

Sample trace:

```
May 02 06:41:39   minio[52806]: panic: "GET /20180401230655.PDF": invalid WriteHeader code 0
May 02 06:41:39   minio[52806]: goroutine 16040430822 [running]:
May 02 06:41:39   minio[52806]: runtime/debug.Stack(0xc01fff7c20, 0x25c4b00, 0xc0490e4080)
May 02 06:41:39   minio[52806]:         runtime/debug/stack.go:24 +0x9f
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1(0xc022048800, 0x4f38ab0, 0xc0406e0fc0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/generic-handlers.go:469 +0x85
May 02 06:41:39   minio[52806]: panic(0x25c4b00, 0xc0490e4080)
May 02 06:41:39   minio[52806]:         runtime/panic.go:965 +0x1b9
May 02 06:41:39   minio[52806]: net/http.checkWriteHeaderCode(...)
May 02 06:41:39   minio[52806]:         net/http/server.go:1092
May 02 06:41:39   minio[52806]: net/http.(*response).WriteHeader(0xc0406e0fc0, 0x0)
May 02 06:41:39   minio[52806]:         net/http/server.go:1126 +0x718
May 02 06:41:39   minio[52806]: github.com/minio/minio/internal/logger.(*ResponseWriter).WriteHeader(0xc032fa3ea0, 0x0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/internal/logger/audit.go:116 +0xb1
May 02 06:41:39   minio[52806]: github.com/minio/minio/internal/logger.(*ResponseWriter).WriteHeader(0xc032fa3f40, 0x0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/internal/logger/audit.go:116 +0xb1
May 02 06:41:39   minio[52806]: github.com/minio/minio/internal/logger.(*ResponseWriter).WriteHeader(0xc002ce8000, 0x0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/internal/logger/audit.go:116 +0xb1
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.writeResponse(0x4f364a0, 0xc002ce8000, 0x0, 0xc0443b86c0, 0x1cb, 0x224, 0x2a9651e, 0xf)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/api-response.go:736 +0x18d
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.writeErrorResponse(0x4f44218, 0xc069086ae0, 0x4f364a0, 0xc002ce8000, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc00656afc0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/api-response.go:798 +0x306
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.objectAPIHandlers.getObjectHandler(0x4b73768, 0x4b73730, 0x4f44218, 0xc069086ae0, 0x4f82090, 0xc002d80620, 0xc040e03885, 0xe, 0xc040e03894, 0x61, ...)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/object-handlers.go:456 +0x252c
```
2022-05-02 10:36:29 -07:00
Harshavardhana
16bc11e72e
fix: disallow newer policies, users & groups with space characters (#14845)
space characters at the beginning or at the end can lead to
confusion under various UI elements in differentiating the
actual name of "policy, user or group" - to avoid this behavior
this PR onwards we shall reject such inputs for newer entries.

existing saved entries will behave as is and are going to be
operable until they are removed/renamed to something more
meaningful.
2022-05-02 09:27:35 -07:00
Harshavardhana
2719f1efaa
fix: reject invalid r.Host headers (#14846)
r.Host headers can come in unparsed that may contain
invalid hostnames, reject such requests as invalid.

This is a continuation fix from #14844
2022-05-02 04:42:41 -07:00
Harshavardhana
39ac62a1a1
fix: panic in browser redirect handler for unexpected r.Host (#14844)
```
panic: "GET /": invalid hostname
goroutine 148 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
	github.com/minio/minio/cmd/generic-handlers.go:469 +0x8e
panic({0x2201f00, 0xc001f1ddd0})
	runtime/panic.go:1038 +0x215
github.com/minio/pkg/net.URL.String({{0x25aa417, 0x5}, {0x0, 0x0}, 0x0, {0xc000174380, 0xd7}, {0x0, 0x0}, {0x0, ...}, ...})
	github.com/minio/pkg@v1.1.23/net/url.go:97 +0xfe
github.com/minio/minio/cmd.setBrowserRedirectHandler.func1({0x49af080, 0xc0003c20e0}, 0xc00002ea00)
	github.com/minio/minio/cmd/generic-handlers.go:136 +0x118
net/http.HandlerFunc.ServeHTTP(0xc00002ea00, {0x49af080, 0xc0003c20e0}, 0xa)
	net/http/server.go:2047 +0x2f
github.com/minio/minio/cmd.setAuthHandler.func1({0x49af080, 0xc0003c20e0}, 0xc00002ea00)
	github.com/minio/minio/cmd/auth-handler.go:525 +0x3d8
net/http.HandlerFunc.ServeHTTP(0xc00002e900, {0x49af080, 0xc0003c20e0}, 0xc001f33701)
	net/http/server.go:2047 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0025d0780, {0x49af080, 0xc0003c20e0}, 0xc00002e800)
	github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
github.com/rs/cors.(*Cors).Handler.func1({0x49af080, 0xc0003c20e0}, 0xc00002e800)
	github.com/rs/cors@v1.7.0/cors.go:219 +0x1bd
net/http.HandlerFunc.ServeHTTP(0x0, {0x49af080, 0xc0003c20e0}, 0xc00068d9f8)
	net/http/server.go:2047 +0x2f
github.com/minio/minio/cmd.setCriticalErrorHandler.func1({0x49af080, 0xc0003c20e0}, 0x4a5cd3)
	github.com/minio/minio/cmd/generic-handlers.go:476 +0x83
net/http.HandlerFunc.ServeHTTP(0x72, {0x49af080, 0xc0003c20e0}, 0x0)
	net/http/server.go:2047 +0x2f
github.com/minio/minio/internal/http.(*Server).Start.func1({0x49af080, 0xc0003c20e0}, 0x10000c001f1dda0)
	github.com/minio/minio/internal/http/server.go:105 +0x1b6
net/http.HandlerFunc.ServeHTTP(0x0, {0x49af080, 0xc0003c20e0}, 0x46982e)
	net/http/server.go:2047 +0x2f
net/http.serverHandler.ServeHTTP({0xc003dc1950}, {0x49af080, 0xc0003c20e0}, 0xc00002e800)
	net/http/server.go:2879 +0x43b
net/http.(*conn).serve(0xc000514d20, {0x49cfc38, 0xc0010c0e70})
	net/http/server.go:1930 +0xb08
created by net/http.(*Server).Serve
	net/http/server.go:3034 +0x4e8
```
2022-05-01 13:45:45 -07:00
Harshavardhana
85f3a9f3b0 Remove Azure gateway implementation (#14418)
refer #14331
2022-04-29 12:51:23 -07:00
Klaus Post
13ba4b433d
Clean up cpuio profiling (#14838)
Don't start regular cpu profile as well. Use bed madmin const.
2022-04-29 09:35:42 -07:00
Aditya Manthramurthy
0e502899a8
Add support for multiple OpenID providers with role policies (#14223)
- When using multiple providers, claim-based providers are not allowed. All
providers must use role policies.

- Update markdown config to allow `details` HTML element
2022-04-28 18:27:09 -07:00
Harshavardhana
424b44c247
allow changing server command line from http->https (#14832)
this is allowed as long as order is preserved as is
on an existing setup, the new command line is updated
in `pool.bin` to facilitate future decommission's on
these pools.
2022-04-28 16:27:53 -07:00
Harshavardhana
01a71c366d
allow service accounts and temp credentials site-level healing (#14829)
This PR introduces support for site level

- service account healing
- temporary credentials healing
2022-04-28 02:39:00 -07:00
Harshavardhana
5a9a898ba2
allow forcibly creating metadata on buckets (#14820)
introduce x-minio-force-create environment variable
to force create a bucket and its metadata as required,
it is useful in some situations when bucket metadata
needs recovery.
2022-04-27 04:44:07 -07:00
Harshavardhana
c56a139fdc
fix: support decommissioning directory objects (#14822)
improvements in this PR include

- decommission objects that have __XLDIR__ suffix
- decommission objects that have `null` version on
  a versioned bucket.
- make sure to look for any "decom" failures to ensure
  that we do not wrong conclude decom as complete without
  all files getting copied over.
- break out eagerly upon first error for objects with
  multiple versions, leave the object as is for support
  debugging and analysis.
2022-04-26 20:06:41 -07:00
Anis Elleuch
df50eda811
Add number of versions in server info API (#14812)
The goal is to show the number of versions in the server info API.
2022-04-25 22:04:10 -07:00
Aditya Manthramurthy
f5d3313210
Increase context timeout for IAM concurrency test (#14817)
- This should reduce failures in Windows CI
2022-04-25 20:14:20 -07:00
Daniel Valdivia
b7dd61f6bc
Fix double slash subpath for console (#14815)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2022-04-25 13:05:56 -07:00
Harshavardhana
0cc993f403 Remove GCS, HDFS gateway implementations #14418
refer #14331
2022-04-24 10:19:17 -07:00
Poorna
3a64580663
Add support for site replication healing (#14572)
heal bucket metadata and IAM entries for
sites participating in site replication from
the site with the most updated entry.

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <aditya@minio.io>
2022-04-24 02:36:31 -07:00
Harshavardhana
d087e28dce
start using t.SetEnv instead of os.Setenv (#14787) 2022-04-23 15:33:45 -07:00
Klaus Post
96adfaebe1
Make storage class config dynamic (#14791)
Updating the storage class is already thread safe, so we can do this safely.
2022-04-21 12:07:33 -07:00
Aditya Manthramurthy
ddf84f8257
fix: concurrency bug in site-replication (#14786)
The site replication status call was using a loop iteration variable sent
directly into go-routines instead of being passed as an argument. As the
variable is being updated in the loop, previously launched go routines do not
necessarily use the value at the time they were launched.
2022-04-20 16:20:07 -07:00
Harshavardhana
507f993075
attempt to real resolve when there is a quorum failure on reads (#14613) 2022-04-20 12:49:05 -07:00
Harshavardhana
73a6a60785
fix: replication deleteObject() regression and CopyObject() behavior (#14780)
This PR fixes two issues

- The first fix is a regression from #14555, the fix itself in #14555
  is correct but the interpretation of that information by the
  object layer code for "replication" was not correct. This PR
  tries to fix this situation by making sure the "Delete" replication
  works as expected when "VersionPurgeStatus" is already set.

  Without this fix, there is a DELETE marker created incorrectly on
  the source where the "DELETE" was triggered.

- The second fix is perhaps an older problem started since we inlined-data
  on the disk for small objects, CopyObject() incorrectly inline's
  a non-inlined data. This is due to the fact that we have code where
  we read the `part.1` under certain conditions where the size of the
  `part.1` is less than the specific "threshold".

  This eventually causes problems when we are "deleting" the data that
  is only inlined, which means dataDir is ignored leaving such
  dataDir on the disk, that looks like an inconsistent content on
  the namespace.

fixes #14767
2022-04-20 10:22:05 -07:00
Anis Elleuch
cf4cf58faf
Do not allow parallel upgrade in one server (#14782)
It is wasteful to allow parallel upgrades of MinIO server. This also generates
 weird error invoked by selfupdate module when it happens such as:

'rename /opt/bin/.minio.old /opt/bin/..minio.old.old'
2022-04-20 06:18:21 -07:00
polaris-megrez
6bc3c74c0c
honor client context in IAM user/policy listing calls (#14682) 2022-04-19 09:00:19 -07:00
Harshavardhana
598ce1e354
supply prefix filtering when necessary (#14772)
currently filterPefix was never used and set
that would filter out entries when needed
when `prefix` doesn't end with `/` - this
often leads to objects getting Walked(), Healed()
that were never requested by the caller.
2022-04-19 08:20:48 -07:00
Harshavardhana
7e248fc0ba
wait on parallel decom to complete before returning (#14764)
without this wait there is a potential for some objects
that are in actively being decommissioned would cancel,
however the decommission status might wrongly conclude
this as "Complete".

To avoid this make sure to add waitgroups on the parallel
workers, allowing parallel copies to complete fully before
we return.
2022-04-18 13:26:29 -07:00
Daniel Valdivia
c526fa9119
Support console UI access at a subpath on a subdomain (#14761)
fixes #14285 

Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2022-04-17 16:01:49 -07:00
Anis Elleuch
a5b3548ede
Bring back listing LDAP users temporarly (#14760)
In previous releases, mc admin user list would return the list of users
that have policies mapped in IAM database. However, this was removed but
this commit will bring it back until we revamp this.
2022-04-15 21:26:02 -07:00
Harshavardhana
8318aa0113
cancel active routine only after metadata has been saved (#14757)
currently updated pool.bin was not saved properly, that would
lead to unable to remove a pool upon a successful decommission.

fixes #14756
2022-04-15 13:16:15 -07:00
Harshavardhana
e69c42956b
fix: IAM reload should only list at config/iam/ precisely (#14753) 2022-04-15 12:12:45 -07:00
Aditya Manthramurthy
e8e48e4c4a
S3 select switch to new parquet library and reduce locking (#14731)
- This change switches to a new parquet library
- SelectObjectContent now takes a single lock at the beginning and holds it
during the operation. Previously the operation took a lock every time the
parquet library performed a Seek on the underlying object stream.
- Add basic support for LogicalType annotations for timestamps.
2022-04-14 06:54:47 -07:00
Harshavardhana
2a6a40e93b
enable go1.18.x builds (#14746) 2022-04-13 14:21:55 -07:00
Harshavardhana
eda34423d7 update gofumpt -w - new changes 2022-04-13 12:00:11 -07:00
Shireesh Anjal
5c53620a72
Include speedtest as part of healthinfo api (#14696)
Execute the object, drive and net speedtests as part of the healthinfo
(if requested by the client), and include their result in the response.

The options for the speedtests have been picked from the default values
used by `mc support perf` command.
2022-04-12 13:17:44 -07:00
Krishna Srinivas
5f94cec1e2
Allow parallel decom migration threads to be more than erasure sets (#14733) 2022-04-12 10:49:53 -07:00
Krishnan Parthasarathi
28d3ad3ada
Honor object retention when applying ILM policies (#14732) 2022-04-11 21:55:56 -07:00
Aditya Manthramurthy
66b14a0d32
Fix service account privilege escalation (#14729)
Ensure that a regular unprivileged user is unable to create service accounts for other users/root.
2022-04-11 15:30:28 -07:00
Harshavardhana
153a612253
fetch bucket retention config once for ILM evalAction (#14727)
This is mainly an optimization, does not change any
existing functionality.
2022-04-11 13:25:32 -07:00
Krishnan Parthasarathi
1a1b55e133
Add support for minio tier type (#14468) 2022-04-11 13:24:40 -07:00
Harshavardhana
e77ad3f9bb
make sure to pass Lifecycle if set for List filtering (#14722)
PR #14606 never really passed the Lifecycle filter
down to the listing callers to ensure skipping the
entries.
2022-04-10 11:14:52 -07:00
Harshavardhana
4ce86ff5fa
align atomic variables once more for 32bit (#14721) 2022-04-09 22:19:44 -07:00
Harshavardhana
601a744159
pass the necessary query params for remote NSSCanner (#14719)
fixes a regression from #14464
2022-04-09 08:09:52 -07:00
Poorna
a1b01e6d5f
Combine profiling start/stop APIs into one (#14662)
Take profile duration as a query parameter for profile API
2022-04-08 12:44:35 -07:00
Krishna Srinivas
48594617b5
Parallelize decommissioning process (#14704) 2022-04-07 23:19:13 -07:00
Krishna Srinivas
b35b9dcff7
Use S3 client for uplooads/downloads during perf test (#14570) 2022-04-07 21:20:40 -07:00
Lenin Alevski
a3e317773a
Skip commented lines when parsing MinIO configuration file (#14710)
Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-04-07 16:02:51 -07:00
Anis Elleuch
16431d222c
heal: Enable periodic bitrot scan configuration (#14464) 2022-04-07 08:10:40 -07:00
Harshavardhana
ee49a23220
resume/start decommission on the first node of the pool under decommission (#14705)
Additionally fixes

- IsSuspended() can use read locks
- Avoid double cancels panic on canceler
2022-04-06 23:42:05 -07:00
Harshavardhana
a9eef521ec skip config/history/ during IAM load (#14698) 2022-04-06 21:03:41 -07:00
Klaus Post
901d33b59c
Tweak listing quorum (#14703)
Always go for 50% quorum, and only use non-healing disks.

Fixes #14635
2022-04-06 12:24:21 -07:00
Harshavardhana
00ebea2536
skip config/history/ during IAM load (#14698) 2022-04-05 19:00:59 -07:00
Klaus Post
dedf9774c7
Set inspect-input.txt modtime (#14688)
If no time given, use current time.
2022-04-05 13:06:10 -07:00
Andreas Auernhammer
6b1c62133d
listing: improve listing of encrypted objects (#14667)
This commit improves the listing of encrypted objects:
 - Use `etag.Format` and `etag.Decrypt`
 - Detect SSE-S3 single-part objects in a single iteration
 - Fix batch size to `250`
 - Pass request context to `DecryptAll` to not waste resources
   when a client cancels the operation.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-04-04 11:42:03 -07:00
Anis Elleuch
d4251b2545
Remove unnecessary log printing (#14685)
Co-authored-by: Anis Elleuch <anis@min.io>
2022-04-04 11:10:06 -07:00
Andreas Auernhammer
b9d1698d74
etag: add Format and Decrypt functions (#14659)
This commit adds two new functions to the
internal `etag` package:
 - `ETag.Format`
 - `Decrypt`

The `Decrypt` function decrypts an encrypted
ETag using a decryption key. It returns not
encrypted / multipart ETags unmodified.

The `Decrypt` function is mainly used when
handling SSE-S3 encrypted single-part objects.
In particular, the ETag of an SSE-S3 encrypted
single-part object needs to be decrypted since
S3 clients expect that this ETag is equal to the
content MD5.

The `ETag.Format` method also covers SSE ETag handling.
MinIO encrypts all ETags of SSE single part objects.
However, only the ETag of SSE-S3 encrypted single part
objects needs to be decrypted.
The ETag of an SSE-C or SSE-KMS single part object
does not correspond to its content MD5 and can be
a random value.
The `ETag.Format` function formats an ETag such that
it is an AWS S3 compliant ETag. In particular, it
returns non-encrypted ETags (single / multipart)
unmodified. However, for encrypted ETags it returns
the trailing 16 bytes as ETag. For encrypted ETags
the last 16 bytes will be a random value.

The main purpose of `Format` is to format ETags
such that clients accept them as well-formed AWS S3
ETags.
It differs from the `String` method since `String`
will return string representations for encrypted
ETags that are not AWS S3 compliant.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-04-03 13:29:13 -07:00
Shireesh Anjal
7c696e1cb6
Write deployment id to health report at the start (#14673)
The deployment id was being written to the health report towards the end
of the handler. Because of this, if there was a timeout in any of the
data fetching, the deployment id was not getting written at all. Upload
of such reports fails on SUBNET as deployment id is the unique
identifier for a cluster in subnet.

Fixed by writing the deployment id at the beginning of the processing.
2022-04-03 13:15:02 -07:00
Aditya Manthramurthy
165d60421d
Add metrics for observing IAM sync operations (#14680) 2022-04-03 13:08:59 -07:00
Poorna
0e6aedc7ed
Capture cmdline args for inspect API (#14668)
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2022-03-31 16:05:43 -07:00
Aditya Manthramurthy
fc9668baa5
Increase IAM refresh rate to every 10 mins (#14661)
Add timing information for IAM init and refresh
2022-03-30 17:02:59 -07:00
Andreas Auernhammer
ba17d46f15
ListObjectParts: simplify ETag decryption and size adjustment (#14653)
This commit simplifies the ETag decryption and size adjustment
when listing object parts.

When listing object parts, MinIO has to decrypt the ETag of all
parts if and only if the object resp. the parts is encrypted using
SSE-S3.
In case of SSE-KMS and SSE-C, MinIO returns a pseudo-random ETag.
This is inline with AWS S3 behavior.

Further, MinIO has to adjust the size of all encrypted parts due to
the encryption overhead.

The ListObjectParts does specifically not use the KMS bulk decryption
API (4d2fc530d0) since the ETags of all
parts are encrypted using the same object encryption key. Therefore,
MinIO only has to connect to the KMS once, even if there are multiple
parts resp. ETags. It can simply reuse the same object encryption key.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-30 15:23:25 -07:00
Krishna Srinivas
bdd816488d
Get the BackendInfo to fill the apporpriate struct fields (#14660) 2022-03-30 10:48:35 -07:00
Krishna Srinivas
36dcfee2f7
Allow decomission of pool even if a drive in it is down (#14656) 2022-03-29 22:51:31 -07:00
Poorna
4d13ddf6b3
Avoid shadowing error during replication proxy check (#14655)
Fixes #14652
2022-03-29 10:53:09 -07:00
Poorna
9e25475475
Validate tier manager is initialized in tier Empty() check (#14646)
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2022-03-29 10:10:06 -07:00
Andreas Auernhammer
e955aa7f2a
kes: add support for encrypted private keys (#14650)
This commit adds support for encrypted KES
client private keys.

Now, it is possible to encrypt the KES client
private key (`MINIO_KMS_KES_KEY_FILE`) with
a password.

For example, KES CLI already supports the
creation of encrypted private keys:
```
kes identity new --encrypt --key client.key --cert client.crt MinIO
```

To decrypt an encrypted private key, the password
needs to be provided:
```
MINIO_KMS_KES_KEY_PASSWORD=<password>
```

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-29 09:53:33 -07:00
Harshavardhana
7956ff0313
fix: multiple pool setup return incorrect DeleteMarker metadata (#14642) 2022-03-27 23:39:50 -07:00
Aditya Manthramurthy
9ff25fb64b
Load IAM in-memory cache using only a single list call (#14640)
- Increase global IAM refresh interval to 30 minutes
- Also print a log after loading IAM subsystem
2022-03-27 18:48:01 -07:00
Andreas Auernhammer
04df69f633
listing: decrypt only SSE-S3 single-part ETags (#14638)
This commit optimises the ETag decryption when
listing objects.

When MinIO lists objects, it has to decrypt the
ETags of single-part SSE-S3 objects.

It does not need to decrypt ETags of
 - plaintext objects => Their ETag is not encrypted
 - SSE-C objects     => Their ETag is not the content MD5
 - SSE-KMS objects   => Their ETag is not the content MD5
 - multipart objects => Their ETag is not encrypted

Hence, MinIO only needs to make a call to the KMS
when it needs to decrypt a single-part SSE-S3 object.
It can resolve the ETags off all other object types
locally.

This commit implements the above semantics by
processing an object listing in batches.
If the batch contains no single-part SSE-S3 object,
then no KMS calls will be made.

If the batch contains at least one single-part
SSE-S3 object we have to make at least one KMS call.
No we first filter all single-part SSE-S3 objects
such that we only request the decryption keys for
these objects.
Once we know which objects resp. ETags require a
decryption key, MinIO either uses the KES bulk
decryption API (if supported) or decrypts each
ETag serially.

This commit is a significant improvement compared
to the previous listing code. Before, a single
non-SSE-S3 object caused MinIO to fall-back to
a serial ETag decryption.
For example, if a batch consisted of 249 SSE-S3
objects and one single SSE-KMS object, MinIO would
send 249 requests to the KMS.
Now, MinIO will send a single request for exactly
those 249 objects and skip the one SSE-KMS object
since it can handle its ETag locally.

Further, MinIO would request decryption keys
for SSE-S3 multipart objects in the past - even
though multipart ETags are not encrypted.
So, if a bucket contained only multipart SSE-S3
objects, MinIO would make totally unnecessary
requests to the KMS.
Now, MinIO simply skips these multipart objects
since it can handle the ETags locally.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-27 18:34:11 -07:00
Anis Elleuch
908eb57795
Always get the actual object size (#14637)
In bulk ETag decryption, do not rely on the etag to check if it is
encrypted or not to decide if we should set the actual object size in
ObjectInfo. The reason is that multipart objects ETags are not
encrypted.

Always get the actual object size in that case.
2022-03-27 08:54:25 -07:00
Harshavardhana
5cfedcfe33
askDisks for strict quorum to be equal to read quorum (#14623) 2022-03-25 16:29:45 -07:00
Andreas Auernhammer
4d2fc530d0
add support for SSE-S3 bulk ETag decryption (#14627)
This commit adds support for bulk ETag
decryption for SSE-S3 encrypted objects.

If KES supports a bulk decryption API, then
MinIO will check whether its policy grants
access to this API. If so, MinIO will use
a bulk API call instead of sending encrypted
ETags serially to KES.

Note that MinIO will not use the KES bulk API
if its client certificate is an admin identity.

MinIO will process object listings in batches.
A batch has a configurable size that can be set
via `MINIO_KMS_KES_BULK_API_BATCH_SIZE=N`.
It defaults to `500`.

This env. variable is experimental and may be
renamed / removed in the future.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-25 15:01:41 -07:00
Harshavardhana
f046f557fa
request only 1 best version for latest version resolution (#14625)
ListObjects, ListObjectsV2 calls are being heavily taxed when
there are many versions on objects left over from a previous
release or ILM was never setup to clean them up. Instead
of being absolutely correct at resolving the exact latest
version of an object, we simply rely on the top most 1
version and resolve the rest.

Once we have obtained the top most "1" version for
ListObject, ListObjectsV2 call we break out.
2022-03-25 08:50:07 -07:00
Harshavardhana
401958938d
add load balance properly restClientFromHash() bucket/prefix (#14621)
spread out resuming further to other nodes
2022-03-25 03:41:31 -07:00
Poorna
566cffe53d
save format.json by default for inspect API (#14620) 2022-03-25 02:02:17 -07:00
Minio Trusted
a42b576382 keep maximum concurrent operations to 512 (to sustain upto 1024 open fds) 2022-03-23 17:02:04 -07:00
Klaus Post
2ac54e5a7b
ListObjects: Filter lifecycle expired objects (#14606)
For ListObjects and ListObjectsV2 perform lifecycle checks on 
all objects before returning. This will filter out objects that are 
pending lifecycle expiration.

Bonus: Cheaper server pool conflict resolution by not converting to FileInfo.
2022-03-22 12:39:45 -07:00
Harshavardhana
8eecdc6d1f
odd stripe sizes should choose (odd+1)/2 to get correct quorum (#14610) 2022-03-22 12:21:14 -07:00
Klaus Post
50577e2bd2
Allow adjusting request pool both ways (#14609)
When reloading a dynamic config allow the request pool to scale both ways.

Existing requests hold on to the previous pool, so they will pop the elements from that.
2022-03-22 11:28:54 -07:00
Klaus Post
7bc1f986e8
Do not wait for results when canceled (#14607)
When canceled nobody may be listening for the results.

Prevents memory buildup from cancelled requests.
2022-03-22 09:37:01 -07:00
Harshavardhana
d796621ccc
choose smaller default deadline for diagnostics without --full (#14599) 2022-03-21 23:25:24 -07:00
Harshavardhana
f6113264f4 add detection for GOMAXPROCS < NumCPU 2022-03-21 19:05:10 -07:00
Harshavardhana
a3534a730b
fallback quorum should be "strict" globally if config is not loaded (#14589) 2022-03-20 17:39:06 -07:00
Harshavardhana
bd6f7b6d83
fix: make decommission restart non-blocking (#14591)
currently an on-going decommission, during a server
restart might block the startup sequence for relatively
longer periods, instead start the decommission in
background lazily.
2022-03-20 14:46:43 -07:00
Andreas Auernhammer
b0a4beb66a
PutObjectPart: set SSE-KMS headers and truncate ETags. (#14578)
This commit fixes two bugs in the `PutObjectPartHandler`.
First, `PutObjectPart` should return SSE-KMS headers
when the object is encrypted using SSE-KMS.
Before, this was not the case.

Second, the ETag should always be a 16 byte hex string,
perhaps followed by a `-X` (where `X` is the number of parts).
However, `PutObjectPart` used to return the encrypted ETag
in case of SSE-KMS. This leaks MinIO internal etag details
through the S3 API.

The combination of both bugs causes clients that use SSE-KMS
to fail when trying to validate the ETag. Since `PutObjectPart`
did not send the SSE-KMS response headers, the response looked
like a plaintext `PutObjectPart` response. Hence, the client
tries to verify that the ETag is the content-md5 of the part.
This could never be the case, since MinIO used to return the
encrypted ETag.

Therefore, clients behaving as specified by the S3 protocol
tried to verify the ETag in a situation they should not.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-19 10:15:12 -07:00
Harshavardhana
01ee49045e
fix: handle race in server setup global CI/CD variable (#14579) 2022-03-18 18:21:09 -07:00
Harshavardhana
7bd9f821dd
return correct context errors for locking operations (#14569)
if a context is canceled do not need to return a timeout error
instead, return the appropriate error for context canceled.
2022-03-18 15:32:45 -07:00
Klaus Post
61eb9d4e29
Fix listing fallback re-using disks (#14576)
When more than 2 disks are unavailable for listing, the same disk will be used for fallback.

This makes quorum calculations incorrect since the same disk will have multiple entries.

This PR keeps track of which fallback disks have been handed out and only every returns a disk once.
2022-03-18 11:35:27 -07:00
Harshavardhana
43eb5a001c
re-use transport for AdminInfo() call (#14571)
avoids creating new transport for each `isServerResolvable`
request, instead re-use the available global transport and do
not try to forcibly close connections to avoid TIME_WAIT
build upon large clusters.

Never use httpClient.CloseIdleConnections() since that can have
a drastic effect on existing connections on the transport pool.

Remove it everywhere.
2022-03-17 16:20:10 -07:00
Klaus Post
c1760fb764
Move apiCalls to front for field alignment (#14568)
Fixes #14565
2022-03-17 10:57:52 -07:00
Minio Trusted
ffcadcd99e Revert "Use S3 client for uplooads/downloads during perf test (#14553)"
This reverts commit ff811f594b.

Speedtest is broken need to fix this more cleanly.
2022-03-16 23:34:49 -07:00
Krishnan Parthasarathi
7b81967a3c
Fix handling of object versions pending purge (#14555)
- GetObject() with vid should return 405
- GetObject() without vid should return 404
- ListObjects() should ignore this object if this is the "latest" version of the object
- ListObjectVersions() should list this object as "DELETE marker"
- Remove data parts before sync'ing the version pending purge
2022-03-16 16:59:43 -07:00
Krishna Srinivas
ff811f594b
Use S3 client for uplooads/downloads during perf test (#14553) 2022-03-16 16:58:46 -07:00
Harshavardhana
e3071157f0
allow MakeBucketLocation to work for metaBucket (#14548)
decommission would fail to start due to failure
in MakeBucketLocation() error on .minio.sys/ bucket
creation.

Allow these special buckets.
2022-03-14 11:25:24 -07:00
Klaus Post
c07af89e48
select: Add ScanRange to CSV&JSON (#14546)
Implements https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ScanRange

Fixes #14539
2022-03-14 09:48:36 -07:00
Harshavardhana
9c846106fa
decouple service accounts from root credentials (#14534)
changing root credentials makes service accounts
in-operable, this PR changes the way sessionToken
is generated for service accounts.

It changes service account behavior to generate
sessionToken claims from its own secret instead
of using global root credential.

Existing credentials will be supported by
falling back to verify using root credential.

fixes #14530
2022-03-14 09:09:22 -07:00
Harshavardhana
cf94d1f1f1
do not crash readXLMetaNoData - if the xl.meta has incorrect content (#14538)
```
tmp = buf[want:]
```

Would potentially crash when `buf` is truncated for some reason
and does not have the expected bytes, this is of course considered
not normal and is an odd situation. But we do not need to crash
here instead allow for errors to be returned and let callers handle
the errors.
2022-03-14 09:07:46 -07:00
Poorna
f8d6eaaa96
fix: regression from range GET proxy on replicated buckets #14345 (#14532)
Fixes: #14531
2022-03-11 15:56:49 -08:00
Poorna
75b925c326
Deprecate root disk for disk caching (#14527)
This PR modifies #14513 to issue a deprecation
warning rather than reject settings on startup.
2022-03-10 18:42:44 -08:00
Harshavardhana
91d419ee6c
warn issues about large block I/O performance for Linux older than 4.0.0 (#14524)
This PR simply adds a warning message when it detects older kernel
versions and warn's them about potential performance issues on this
kernel.

The issue can be seen only with parallel I/O across all drives
on denser setups such as 90 drives or 45 drives per server configurations.
2022-03-10 17:36:13 -08:00
Harshavardhana
41079f1015
heal: remove blocking healDiskMeta upon startup (#14514)
This type of code is not necessary, read's of all
metadata content at `.minio.sys/config` automatically
triggers healing when necessary in the GetObjectNInfo()
call-path.

Having this code is not useful and this also adds to
the overall startup time of MinIO when there are lots
of users and policies.
2022-03-10 02:45:14 -08:00
Poorna
712dfa40cd
Add missing site replication hook for clearing sse config (#14512) 2022-03-10 00:04:34 -08:00
Klaus Post
b890bbfa63
Add local disk health checks (#14447)
The main goal of this PR is to solve the situation where disks stop 
responding to operations. This generally causes an FD build-up and 
eventually will crash the server.

This adds detection of hung disks, where calls on disk get stuck.

We add functionality to `xlStorageDiskIDCheck` where it keeps 
track of the number of concurrent requests on a given disk.

A total number of 100 operations are allowed. If this limit is reached 
we will block (but not reject) new requests, but we will monitor the 
state of the disk.

If no requests have been completed or updated within a 15-second 
window, we mark the disk as offline. Requests that are blocked will be 
unblocked and return an error as "faulty disk".

New requests will be rejected until the disk is marked OK again.

Once a disk has been marked faulty, a check will run every 5 seconds that 
will attempt to write and read back a file. As long as this fails the disk will 
remain faulty.

To prevent lots of long-running requests to mark the disk faulty we 
implement a callback feature that allows updating the status as parts 
of these operations are running.

We add a reader and writer wrapper that will update the status of each 
successful read/write operation. This should allow fine enough granularity 
that a slow, but still operational disk will not reach 15 seconds where 
50 operations have not progressed.

Note that errors themselves are not enough to mark a disk faulty. 
A nil (or io.EOF) error will mark a disk as "good".

* Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`.

* de-couple IsOnline() from disk health tracker

The purpose of IsOnline() is to ensure that we
reconnect the drive only when the "drive" was

- disconnected from network we need to validate
  if the drive is "correct" and is the same drive
  which belongs to this server.

- drive was replaced we have to format it - we
  support hot swapping of the drives.

IsOnline() is not meant for taking the drive offline
when it is hung, it is not useful we can let the
drive be online instead "return" errors for relevant
calls.

* return errFaultyDisk for DiskInfo() call

Co-authored-by: Harshavardhana <harsha@minio.io>

Possible future Improvements:

* Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly.
* Allow reads/writes to be aborted by the context.
* Add usage stats, concurrent count, blocked operations, etc.
2022-03-09 11:38:54 -08:00
Poorna
46ba15ab03
Return MethodNotAllowed if force del on replicated bucket (#14505) 2022-03-08 14:28:51 -08:00
Poorna
1e39ca39c3
fix: consistent replies for incorrect range requests on replicated buckets (#14345)
Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.
2022-03-08 13:58:55 -08:00
Krishnan Parthasarathi
80ef1ae51c
Simplify assembling of tierStats from data-usage (#14504) 2022-03-08 12:08:29 -08:00
Krishna Srinivas
4d0715d226
Implement netperf for "mc support perf net" (#14397)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2022-03-08 09:54:38 -08:00
Klaus Post
8a274169da
heal: Fix first entry on dangling (#14495)
Instead of the first, the last entry was returned
pointerizing the range value.
2022-03-08 09:04:20 -08:00
Harshavardhana
5d6f6d8d5b
create missing .minio.sys/config, .minio.sys/buckets during decommission (#14497) 2022-03-07 16:18:57 -08:00
Anis Elleuch
bacf6156c1
metrics: Avoid crash when fetching tier metrics (#14493)
Data usage does not always contain tiering info even if the data usage
information is valid. Avoid a crash in that case.

(e.g. the scanner scanned the namespace, the user enables tiering,
prometheus scrapes the server before the scanner gets a chance to
update the data usage with new tiering information)
2022-03-07 10:59:32 -08:00
Klaus Post
1d1b213f1f
scanner: Consider preselection bias when selecting for Healing (#14492)
Healing decisions would align with skipped folder counters. This can lead to files 
never being selected for heal checks on "clean" paths.

Use different hashing methods and take objectHealProbDiv into account when 
calculating the cycle.

Found by @vadmeste
2022-03-07 09:25:53 -08:00
Harshavardhana
92a77cc78e
update pkg v1.1.20 to reload certs in k8s always (#14470) 2022-03-04 20:34:39 -08:00
Harshavardhana
b0c84e3de7
fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483)
This is a side-affect of the optimization done in PR #13544 which
causes a certain type of delete operations on given object versions
can cause lastVersion indication to be skipped, which leads to
an `xl.meta` where Versions[] slice is empty while the entire
file is intact by itself.

This PR tries to ensure that such files are visible and deletable
by regular means of listing as null 'delete-marker' and also
avoid the situation where this potential issue might arise.
2022-03-04 20:01:26 -08:00
Anis Elleuch
bbc914e174
heal: Do not override heal scan mode mode if it is set (#14476)
mc admin heal has --scan=deep flag which enforces bitrot checking 
when doing the healing.

Do not force override an existing heal scan option.
2022-03-04 18:25:06 -08:00
Anis Elleuch
3fca4055d2
heal: Re-heal an object when a corruption is found during normal scan (#14482)
When scanning using normal mode, HealObject() can report an 
error saying that it found a corrupted part. This doesn't have 
when HealObject() is called with bitrot scan flag. However, when 
this happens, we can still restart HealObject() with the bitrot scan.

This is also important because this means the scanner and the 
new disks healer will not be able to heal an object that doesn't 
exist in a specific disk and has corruption in another disk.

Also without this PR, mc admin heal command without bitrot will report
an error.
2022-03-04 18:24:34 -08:00
Harshavardhana
66afa16aed
canceled PUTs throw frivolous logs (#14475)
remote drives might throw frivolous logs,
if the caller canceled the PUT operation
in such scenarios there is no reason to log.
2022-03-04 10:31:33 -08:00
Harshavardhana
0e3bafcc54
improve logs, fix banner formatting (#14456) 2022-03-03 13:21:16 -08:00
Andreas Auernhammer
b48f719b8e
kes: remove unnecessary error conversion (#14459)
This commit removes some duplicate code that
converts KES API errors.

This code was added since KES `0.18.0` changed
some exported API errors. However, the KES SDK
handles this error conversion itself.
Therefore, it is not necessary to duplicate this
behavior in MinIO.

See: 21555fa624/error.go (L94)

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-03 09:42:37 -08:00
Lenin Alevski
289fcbd08c
KES dependency upgrade (#14454)
- Updating KES dependency to v.0.18.0
- Fixing incompatibility issue when checking for errors during KES key creation

Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-03-02 23:03:40 -08:00
Harshavardhana
7e803adf13
do not attempt force delete on bucket (#14452)
caller needs to ask explicitly for force delete
otherwise, the force delete might end up deleting
an existing bucket with data.

fixes #14445
2022-03-02 20:47:53 -08:00
Anis Elleuch
4a15bd8ff8
Return info for DiskInfo when the disk is unformatted (#14427)
In a distributed setup, a DiskInfo REST call to an unformatted disk
returns an error with no disk information, such as the disk endpoint
URL, which is unexpected.
2022-03-01 15:06:47 -08:00
Klaus Post
b030ef1aca
tests: Clean up dsync package (#14415)
Add non-constant timeouts to dsync package.

Reduce test runtime by minutes. Hopefully not too aggressive.
2022-03-01 11:14:28 -08:00
Harshavardhana
cc46a99f97
skip object-lock headers without values (#14430)
metadata headers can have headers without values
as per AWS S3 spec however, we need to skip some
headers that do not have values that potentially
can have empty values set.
2022-03-01 11:04:47 -08:00
Xuehan Xu
becec6cb6b
correct mrf.newSetReconnected invocation's param order (#14426)
Signed-off-by: xuxuehan <xuxuehan@qianxin.com>
2022-02-28 09:13:19 -08:00
Harshavardhana
b7c90751b0 allow drive tests to respond only drive paths 2022-02-25 18:54:46 -08:00
Harshavardhana
e43cc316ff
remove errCh usage from HealObjects() simplify it (#14414)
errCh is not needed instead, rely on errs slice to
capture and return errors instead.

most probably fixes #14247
2022-02-25 12:20:41 -08:00
hellivan
03b35ecdd0
collect correct parentUser for OIDC creds auto expiration (#14400) 2022-02-24 11:43:15 -08:00
Harshavardhana
c08540c7b7
reject speedtest when there isn't enough disk space available (#14402)
small setups do not return appropriate errors when speedtest
cannot run on small tiny setups, allow the tests to fail
appropriately more pro-actively.

many users bring toy setups, this PR simply returns an error
in such situations.
2022-02-24 09:06:18 -08:00
Shireesh Anjal
3934700a08
Make audit webhook and kafka config dynamic (#14390) 2022-02-24 09:05:33 -08:00
Harshavardhana
2d78e20120
enable CI environment additionally for MINIO_CI_CD (#14395)
all CI/CD environments set CI=true this is enough
for MinIO to be run inside CI environments, support
it.
2022-02-23 16:01:59 -08:00
Harshavardhana
2e6f8bdf19
do not skip healing disks during deletes (#14394)
healing disks take active I/O it is possible
that deleted objects might stay in .trash
folder for a really long time until the drive
is fully healed.

this PR changes it such that we are making sure
we purge the active content written to these
disks as well.
2022-02-23 14:30:46 -08:00
Shireesh Anjal
25144fedd5
Send deployment id and minio version in http header (#14378) 2022-02-23 13:36:01 -08:00
Krishnan Parthasarathi
27f64dd9a4
Add support for tier-remove and tier-verify (#14382)
* Add tier remove support only if it's empty
* Add support for tier verify
2022-02-23 13:34:25 -08:00
Harshavardhana
9d7648f02f
reduce unnecessary logging during speedtest (#14387)
- speedtest logs calls that were canceled
  spuriously, in situations where it should
  be ignored.

- all errors of interest are always sent back
  to the client there is no need to log them
  on the server console.

- PUT failures should negate the increments
  such that GET is not attempted on unsuccessful
  calls.

- do not attempt MRF on speedtest objects.
2022-02-23 11:59:13 -08:00
Poorna
1ef8babfef
cache: improve error reported for atime check (#14384) 2022-02-23 11:57:06 -08:00
Poorna
4ea7bf0510
Use custom transport for site replication (#14391)
Also, ensure that tiering uses a different instance of custom transport
2022-02-23 11:50:40 -08:00
Anis Elleuch
5dcf1d13a9
ci: Always set disks as non root disks (#14389)
In the testing mode, reformatting disks will fail because the healing
code will complain if one disk is in root mode. This commit will
automatically set all disks as non-root if MINIO_CI_CD is set.
2022-02-23 10:11:33 -08:00
Shireesh Anjal
94d37d05e5
Apply dynamic config at sub-system level (#14369)
Currently, when applying any dynamic config, the system reloads and
re-applies the config of all the dynamic sub-systems.

This PR refactors the code in such a way that changing config of a given
dynamic sub-system will work on only that sub-system.
2022-02-22 10:59:28 -08:00
Harshavardhana
0cbdc458c5
fix: do not reload disk format.json on a reconnected disk (#14351)
An onlineDisk means its a valid disk but it may be a
re-connected disk, this PR verifies that based on LastConn()
to only trigger MRF. Current code would again re-load the
disk 'format.json' which is not necessary and perhaps an
unnecessary call.

A potential side affect of this is closing perfectly online
disks and getting re-replaced by reloading 'format.json'.

This PR tries to avoid this situation by making sure MRF
is triggered but not reloading 'format.json' because of MRF.
2022-02-21 15:51:54 -08:00
Harshavardhana
65b1a4282e
fix: console logger regression with dynamic logger webhook registration (#14346)
fixes a regression from #14289
2022-02-17 17:50:10 -08:00
Harshavardhana
af3dc25dfe
align 32bit integers with atomic values in structs (#14344)
fixes #14341
2022-02-17 15:22:26 -08:00
Krishnan Parthasarathi
5a0c0079a1
Don't add free-version on restore-object (#14340) 2022-02-17 15:05:19 -08:00
Harshavardhana
af8f563ed3
allow clearing FIFO config as fallback (#14338)
FIFO is already removed, for users who upgrade are allowed to clear their configs.
2022-02-17 12:49:46 -08:00
Poorna
93af4a4864
Handle non existent kms key correctly (#14329)
- in PutBucketEncryption API
- admin APIs for  `mc admin KMS key [create|info]`
- PutObject API when invalid KMS key is specified
2022-02-17 11:36:14 -08:00
Shireesh Anjal
28f188e3ef
Make logger webhook config dynamic (#14289)
It should not be required to restart the 
server after setting the logger webhook config.
2022-02-17 11:11:15 -08:00
Harshavardhana
d756da41b9 fix: print gateway banner on removal notice 2022-02-16 20:34:47 -08:00
Krishnan Parthasarathi
cdab4a3b85
Update hourly tier-stats only on succesful tiering (#14330) 2022-02-16 17:29:12 -08:00
Klaus Post
b88c57ba93
Add fgprof profiles (#14321)
https://github.com/felixge/fgprof#rocket-fgprof---the-full-go-profiler
2022-02-16 12:00:10 -08:00
Klaus Post
60cd513a33
Fix leaked healing goroutines (#14322)
Only the first `listAndHeal` would ever be able to write on errCh, blocking all others infinitely.

Instead read all errors but return the first non-nil, if any.

The intention appears to be that this should cancel on any error, 
so that part is kept. 

Regression from #13990
2022-02-16 08:40:18 -08:00
Harshavardhana
03a6e8aee2
fix: creating steep directory structure on trash folder (#14314)
weird directory structures get created on the '.trash'
folder upon server restarts, this PR fixes this.
2022-02-15 16:34:03 -08:00
Anis Elleuch
4afbb89774
nas: Clean stale background appended files (#14295)
When more than one gateway reads and writes from the same mount point
and there is a load balancer pointing to those gateways. Each gateway 
will try to create its own temporary append file but fails to clear it later 
when not needed.

This commit creates a routine that checks all upload IDs saved in
multipart directory and remove any stale entry with the same upload id
in the memory and in the temporary background append folder as well.
2022-02-15 09:25:47 -08:00
Klaus Post
5ec57a9533
Add GetObject gzip option (#14226)
Enabled with `mc admin config set alias/ api gzip_objects=on`

Standard filtering applies (1K response minimum, not compressed content 
type, not range request, gzip accepted by client).
2022-02-14 09:19:01 -08:00
Anis Elleuch
1f92fc3fc0
Always check for root disks unless MINIO_CI_CD is set (#14232)
The current code considers a pool with all root disks to be as part
of a testing environment even if there are other pools with mounted
disks. This will result to illegitimate writing in root disks.

Fix this by simplifing the logic: require MINIO_CI_CD in order to skip
root disk check.
2022-02-13 15:42:07 -08:00
Harshavardhana
fad3d66093
parallelize background cleanup on local disks across sets (#14290) 2022-02-11 14:22:48 -08:00
Poorna
ed3418c046
Refactor replication resync to be an active process (#14266)
When resync is triggered, walk the bucket namespace and
resync objects that are unreplicated. This PR also adds
an API to report resync progress.
2022-02-10 10:16:52 -08:00
Anis Elleuch
71bab74148
Fix adding bucket forwarder handler in server mode (#14288)
MinIO configuration is loaded after the initialization of the server
handlers, which will miss the initialization of the bucket forwarder
handler.

Though the federation is deprecated, let's fix this for the time being.
2022-02-10 08:49:36 -08:00
Anis Elleuch
661ea57907
restore: Add quotes some fields in x-amz-restore header (#14281)
S3 spec returns x-amz-restore header in HEAD/GET object with the
following format:

```
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012
00:00:00 GMT"
```

This commit adds quotes as the current code does not support it. It will
also supports the old format saved in the disk (in xl.meta) for backward
compatibility.
2022-02-09 13:17:41 -08:00
Anis Elleuch
1f18efb0ba
gateway: Active bucket forwarding handler (#14277)
A regression removed support of federation in the gateway mode. 
Enable it again.

Federation is deprecated for a while but let's fix this for the time being.
2022-02-09 09:31:47 -08:00
Daniel
8ae46bce93
fix the error logs have been omitted because of retryCount never exceed 10 (#14268) 2022-02-09 03:14:22 -08:00
Harshavardhana
f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Krishnan Parthasarathi
0ee2933234
Export tier metrics via Prometheus (#13413)
e.g
```
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"}
1.3631488e+08

minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1

minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1
```
2022-02-08 12:45:28 -08:00
Shireesh Anjal
9890f579f8
Add subsystem level validation on config set (#14269)
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
2022-02-08 10:36:41 -08:00
Anis Elleuch
2ee337ead5
prometheus: Add incoming requests metrics since last scrape (#14261)
Some users running MinIO claim that their system became slow. One 
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow, 
due to disk issues for example.
2022-02-07 16:30:14 -08:00
Harshavardhana
3c87e1e60d
fix: rename some function names to avoid confusion (#14262) 2022-02-07 11:49:07 -08:00
Harshavardhana
0cac868a36
speed-up startup time, do not block on ListBuckets() (#14240)
Bonus fixes #13816
2022-02-07 10:39:57 -08:00
Harshavardhana
186c477f3c init console server after server config is initialized
fixes #14259
2022-02-07 00:17:33 -08:00
Harshavardhana
6123377e66
speedup getFormatErasureInQuorum use driveCount (#14239)
startup speed-up, currently getFormatErasureInQuorum()
would spend up to 2-3secs when there are 3000+ drives
for example in a setup, simplify this implementation
to use drive counts.
2022-02-04 12:21:21 -08:00
Harshavardhana
0256dae657
fix: quorum requirement for DeleteMarkers and parity upgraded objects (#14248)
DeleteMarkers do not have a default quorum, i.e it is possible that
DeleteMarkers were created with n/2+1 quorum as well to make sure
that we satisfy situations such as those we need to make sure delete
markers only expect n/2 read quorum.

Additionally we should also look at additional metadata on the
actual objects that might have been "erasure" upgraded with new
parity when disks are down.

In such a scenario do not default to the standard storage class
parity, instead use the parityBlocks present on the FileInfo to
ensure that we are dealing with the correct quorum for READs and
DELETEs.
2022-02-04 02:47:36 -08:00
Harshavardhana
84b121bbe1
return error with empty x-amz-copy-source-range headers (#14249)
fixes #14246
2022-02-03 16:58:27 -08:00
Harshavardhana
01e550a9be
ignore unreadable metrics on certain closed systems (#14234)
fixes #14233
2022-02-03 09:45:12 -08:00
Poorna
63a2e0bab6
Remove notification from NotificationSys on bucket deletion (#14236) 2022-02-02 17:11:56 -08:00
Harshavardhana
24657859a8
when o_direct is disabled do not attempt fadvise call (#14230) 2022-02-02 08:54:52 -08:00
Sidhartha Mani
d7df6bc738
add support for speedtest drive (#14182) 2022-02-01 22:38:05 -08:00
Poorna
a4e1de93a7
Add API for removing site(s) from site replication (#14104) 2022-02-01 17:26:09 -08:00
Klaus Post
067d21d0f2
fs: Retry listing if no marker (#14221)
Retry listings, when no next marker is returned and the result isn't truncated.

This can happen when an object is queued, but no info can be fetched.

Fixes #14190
2022-02-01 10:00:14 -08:00
Shireesh Anjal
3882da6ac5
Add subnet proxy config (#14225)
Will store the HTTP(S) proxy URL to use for connecting to SUBNET.
2022-02-01 09:52:38 -08:00
Anis Elleuch
127e8bf3b6
heal: Avoid printing repetitive error to heal a root disk (#14220)
The healing code repeatedly tries to heal a root disk when it is empty
the reason is that connectEndpoint() returns errUnformattedDisk even
if the disk is a root disk. Changing that to returning another error
will avoid queueing the disk to the healing code in each connect disks
iteration.
2022-01-31 17:28:20 -08:00
Harshavardhana
74faed166a
Add quota usage as part of prometheus metrics (#14222)
Bonus: pass caller context when needed to all bucket metadata handling calls.
2022-01-31 17:27:43 -08:00
Harshavardhana
dbd05d6e82
remove FIFO bucket quota, use ILM expiration instead (#14206) 2022-01-31 11:07:04 -08:00
Harshavardhana
b5d35c7e09
ignore disk metrics for single drive mode (#14212)
fixes #14211
2022-01-31 00:44:26 -08:00
Poorna
0f88cdc80e
Return all stats in SiteReplicationStatus API if options unset (#14207) 2022-01-28 21:19:38 -08:00
Poorna
38e3c7a8f7
Added filters for SiteReplicationStatus API to support new UI changes (#14177) 2022-01-28 15:37:55 -08:00
Poorna
a4be47d7ad
Validate config before saving changes after config reset (#14203) 2022-01-27 18:28:16 -08:00
Harshavardhana
aaea94a48d
update quorum requirement to list all objects (#14201)
some upgraded objects might not get listed due
to different quorum ratios across objects.

make sure to list all objects that satisfy the
maximum possible quorum.
2022-01-27 17:00:15 -08:00
Aditya Manthramurthy
c3d9c45f58
Ensure that AssumeRole calls are sent to Audit log (#14202)
When authentication fails MinIO was not sending out an Audit log 
event for this STS call
2022-01-27 16:17:11 -08:00
Klaus Post
a2a48cc065
Optimize read locker cleanup (#14200)
When objects hold a lot of read locks cleanup time grows exponentially.

```
BEFORE:

Unable to complete tests.

AFTER:

=== RUN   Test_localLocker_expireOldLocksExpire/100-locks/1-read
    local-locker_test.go:298: Scan Took: 0s. Left: 100/100
    local-locker_test.go:317: Expire 50% took: 0s. Left: 44/44
    local-locker_test.go:331: Expire rest took: 0s. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/100-locks/100-read
    local-locker_test.go:298: Scan Took: 0s. Left: 10000/100
    local-locker_test.go:317: Expire 50% took: 1ms. Left: 5000/100
    local-locker_test.go:331: Expire rest took: 1ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/100-locks/1000-read
    local-locker_test.go:298: Scan Took: 2ms. Left: 100000/100
    local-locker_test.go:317: Expire 50% took: 55ms. Left: 50038/100
    local-locker_test.go:331: Expire rest took: 29ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/10000-locks/1-read
    local-locker_test.go:298: Scan Took: 1ms. Left: 10000/10000
    local-locker_test.go:317: Expire 50% took: 2ms. Left: 5019/5019
    local-locker_test.go:331: Expire rest took: 2ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/10000-locks/100-read
    local-locker_test.go:298: Scan Took: 23ms. Left: 1000000/10000
    local-locker_test.go:317: Expire 50% took: 160ms. Left: 499798/10000
    local-locker_test.go:331: Expire rest took: 138ms. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/10000-locks/1000-read
    local-locker_test.go:298: Scan Took: 200ms. Left: 10000000/10000
    local-locker_test.go:317: Expire 50% took: 5.888s. Left: 5000196/10000
    local-locker_test.go:331: Expire rest took: 3.417s. Left: 0/0
=== RUN   Test_localLocker_expireOldLocksExpire/1000000-locks/1-read
    local-locker_test.go:298: Scan Took: 133ms. Left: 1000000/1000000
    local-locker_test.go:317: Expire 50% took: 348ms. Left: 500255/500255
    local-locker_test.go:331: Expire rest took: 307ms. Left: 0/0
```
2022-01-27 14:10:57 -08:00
Harshavardhana
cf407f7176
do not expect 'speedtest' to be a bucket (#14199)
fixes #14196
2022-01-27 08:13:03 -08:00
Harshavardhana
d6dd17a483
make sure to pass groups for all credentials while verifying policies (#14193)
fixes #14180
2022-01-26 21:53:36 -08:00
Aditya Manthramurthy
7dfa565d00
Identity LDAP: Allow multiple search base DNs (#14191)
This change allows the MinIO server to lookup users in different directory
sub-trees by allowing specification of multiple search bases separated by
semicolons.
2022-01-26 15:05:59 -08:00
Krishnan Parthasarathi
d2e5f01542
feat: maintain in-memory tier stats for the last 24hrs (#13782) 2022-01-26 14:33:10 -08:00
yfanswer
f4e373e0d2
de-couple cache completeMultipartUpload with caller context (#14181) 2022-01-26 11:55:58 -08:00
Harshavardhana
57118919d2
cached diskIDs are not needed for scanner healing (#14170)
This PR removes an unnecessary state that gets
passed around for DiskIDs, which is not necessary
since each disk exactly knows which pool and which
set it belongs to on a running system.

Currently cached DiskId's won't work properly
because it always ends up skipping offline disks
and never runs healing when disks are offline, as
it expects all the cached diskIDs to be present
always. This also sort of made things in-flexible
in terms perhaps a new diskID for `format.json`.
(however this is not a big issue)

This is an unnecessary requirement that healing
via scanner needs all drives to be online, instead
healing should trigger even when partial nodes
and drives are available this ensures that we
keep the SLA in-tact on the objects when disks
are offline for a prolonged period of time.
2022-01-26 08:34:56 -08:00
Klaus Post
7db05a80dd
locking: Fix wrong map id (#14184)
Wrong resource is being fetched, since idx is incremented, but mapID is reused.

Regression caused by #13454 - that part didn't optimize anything anyway.
2022-01-26 08:34:09 -08:00
Anis Elleuch
45a99c3fd3
publish storage API latency through node metrics (#14117)
Publish storage functions latency to help compare the performance 
of different disks in a single deployment.

e.g.:
```
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/1",server="localhost:9001"} 226
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/2",server="localhost:9002"} 1180
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/3",server="localhost:9003"} 1183
minio_node_disk_latency_us{api="storage.WalkDir",disk="/tmp/xl/4",server="localhost:9004"} 1625
```
2022-01-25 16:31:44 -08:00
Harshavardhana
b68f0cbde4
ignore remote disks with diskID empty as offline (#14168)
concurrent loading of erasure sets can now expose a
situation in a distributed setup that might return
diskID as empty, treat such disks as offline.
2022-01-24 19:40:02 -08:00
Krishnan Parthasarathi
ebc3627c73
further improvements to newXLStorage (#14166)
- create internal erasure volumes only if the disk is unformatted
- return a copy of format data in xlStorage.ReadAll
- parse env vars only once, to be re-used by xl-storage
2022-01-24 17:09:12 -08:00
Harshavardhana
5a9f133491
speed up startup sequence for all operations (#14148)
This speed-up is intended for faster startup times
for almost all MinIO operations. Changes here are

- Drives are not re-read for 'format.json' on a regular
  basis once read during init is remembered and refreshed
  at 5 second intervals.

- Do not do O_DIRECT tests on drives with existing 'format.json'
  only fresh setups need this check.

- Parallelize initializing erasureSets for multiple sets.

- Avoid re-reading format.json when migrating 'format.json'
  from really old V1->V2->V3

- Keep a copy of local drives for any given server in memory
  for a quick lookup.
2022-01-24 11:28:45 -08:00
Harshavardhana
f6d13f57bb
fix: correct parentUser lookup for OIDC auto expiration (#14154)
fixes #14026

This is a regression from #13884
2022-01-22 16:36:11 -08:00
Poorna
48da4aeee0
Add API for removing site(s) from site replication (#14022) 2022-01-21 08:48:21 -08:00
Harshavardhana
7f214a0e46
use dnscache resolver for resolving command line endpoints (#14135)
this helps in caching the resolved values early on, avoids
causing further resolution for individual nodes when
object layer comes online.

this can speed up our startup time during, upgrades etc by
an order of magnitude.

additional changes in connectLoadInitFormats() and parallelize
all calls that might be potentially blocking.
2022-01-20 13:03:15 -08:00
Klaus Post
e1a0a1e73c
fs: Return prefix as listing marker if no objects (#14143)
Fixes #14132
2022-01-20 10:55:18 -08:00
Harshavardhana
9d588319dd
support site replication to replicate IAM users,groups (#14128)
- Site replication was missing replicating users,
  groups when an empty site was added.

- Add site replication for groups and users when they
  are disabled and enabled.

- Add support for replicating bucket quota config.
2022-01-19 20:02:24 -08:00
Klaus Post
0012ca8ca5
Fix inconsistent metadata after healing (#14125)
When calculating signatures empty part ETags were not discarded, leading 
to a different signature compared to freshly created ones.

This would mean that after a heal signature of the healed metadata would be 
different. Fixing the calculation of signature will make these consistent.

Furthermore when inconsistent entries, with zero version ID, with the same 
mod times but different signatures, the one with the lowest signature would 
be picked for quorum check. Since this is 50/50, we fall back to a simple 
quorum count on all signatures.

Each of these fixes by themselves will lead to quorum. Tests were added 
for regressions and expected outcomes.
2022-01-19 10:48:00 -08:00
Poorna
288e276abe
Specify tags in options while selecting replication targets (#14126)
When the replication rule is based on tag matches, the replication process
should pick up targets matching the tags specified in the replication
rule.

Fixing regression due to #12880
2022-01-19 10:45:42 -08:00
Jarbitz
f22e745514
fix: ListBucketUsers comment doc (#14129) 2022-01-19 10:45:13 -08:00
Krishnan Parthasarathi
070c31eac5
Wait for updates collector when disk.NSScanner returns error (#14127) 2022-01-19 00:46:43 -08:00
Harshavardhana
70e1cbda21
allow disabling O_DIRECT in certain environments for reads (#14115)
repeated reads on single large objects in HPC like
workloads, need the following option to disable
O_DIRECT for a more effective usage of the kernel
page-cache.

However this optional should be used in very specific
situations only, and shouldn't be enabled on all
servers.

NVMe servers benefit always from keeping O_DIRECT on.
2022-01-17 08:34:14 -08:00
Harshavardhana
60f2df54e0
Add envVars for CLI arguments (#14114)
fixes #14107
2022-01-15 16:20:02 -08:00
Harshavardhana
ba708f51f2
fix: copyMetrics to avoid map references elsewhere (#14113)
map labels might have been referenced else, this
can lead to concurrent access at lower layers.

avoid this by copying the information while
concurrently serving the metrics.
2022-01-14 16:48:19 -08:00
Harshavardhana
0df31f63ab
reject changing pools when there are pending decommissions in-progress (#14102)
do not allow mutation to pool command line when there are
unfinished decommissions in place, disallow such scenarios
to avoid user mistakes.

also add testcases to cover all relevant scenarios.
2022-01-14 10:32:35 -08:00
Klaus Post
64d4da5a37
Add Put input readahead (#14084)
When reading input for PutObject or PutObjectPart add a readahead buffer for big inputs.

This will make network reads+hashing separate run async with erasure coding and writes. This will reduce overall latency in distributed setups where the input is from upstream and writes go to other servers.

We will read at 2 buffers ahead, meaning one will always be ready/waiting and one is currently being read from.

This improves PutObject and PutObjectParts for these cases.
2022-01-14 10:01:25 -08:00
Harshavardhana
7aec38a73e
Simplify the messaging for internode versions (#14103)
provide a cleaner message instead of cryptic
logs, also provide the relevant link on how to do
recommended way to upgrade.
2022-01-13 17:25:08 -08:00
Klaus Post
a2fd8caa69
Ignore version not found in deleteVersions (#14093)
When deleting multiple versions it "gives" up with an errFileVersionNotFound if 
a version cannot be found. This effectively skips deleting other versions 
sent in the same request. 

This can happen on inconsistent objects. We should ignore errFileVersionNotFound 
and continue with others.

We already ignore these at the caller level, this PR is continuation of 54a9877
2022-01-13 14:28:07 -08:00
Harshavardhana
f546636c52
fix: use renameAll instead of deleteObject() for purging temporary files (#14096)
This PR simplifies few things

- Multipart parts are renamed, upon failure are unrenamed() keep this
  multipart specific behavior it is needed and works fine.

- AbortMultipart should blindly delete once lock is acquired instead
  of re-reading metadata and calculating quorum, abort is a delete()
  operation and client has no business looking for errors on this.

- Skip Access() calls to folders that are operating on
  `.minio.sys/multipart` folder as well.
2022-01-13 11:07:41 -08:00
Harshavardhana
38ccc4f672
fix: make sure to avoid calling RenameData() on disconnected disks. (#14094)
Large clusters with multiple sets, or multi-pool setups at times might
fail and report unexpected "file not found" errors. This can become
a problem during startup sequence when some files need to be created
at multiple locations.

- This PR ensures that we nil the erasure writers such that they
  are skipped in RenameData() call.

- RenameData() doesn't need to "Access()" calls for `.minio.sys`
  folders they always exist.

- Make sure PutObject() never returns ObjectNotFound{} for any
  errors, make sure it always returns "WriteQuorum" when renameData()
  fails with ObjectNotFound{}. Return appropriate errors for all
  other cases.
2022-01-12 18:49:01 -08:00
Harshavardhana
cc3f139d1f
replication: attempt abort multipart-upload at max 3 times on remote (#14087)
this is mainly an attempt to relinquish space on the remote
site, if this still doesn't do it we give and let the admin
know with a log message.
2022-01-11 22:32:29 -08:00
Harshavardhana
d50442da01
fix: simplify usage calculation and progress (#14086) 2022-01-11 18:48:43 -08:00
Harshavardhana
404b05a44c
fix: ignore drained pool in Healing, hold lock additionally (#14080) 2022-01-11 12:27:47 -08:00
Harshavardhana
3d7c1ad31d
ignore configNotFound error in AccountInfo() (#14082)
fixes #14081
2022-01-11 08:43:18 -08:00
yinhen
d300e775a6
Avoid reconnect of disk during startup sequence (#14070) 2022-01-10 23:33:58 -08:00
Harshavardhana
7ee2d1c339
fix: when healing log path when we give up (#14079) 2022-01-10 21:22:17 -08:00
Poorna
54a98773f8
fix: replication of tag removal (#14056)
Currently tag removal leaves replication state as `PENDING` 
because the `HEAD` api returns just a tag count but not the 
actual tags, and this is treated as a no-op
2022-01-10 19:06:10 -08:00
Harshavardhana
737a3f0bad
fix: decommission bugfixes found during migration of .minio.sys/config (#14078) 2022-01-10 17:26:00 -08:00
Harshavardhana
3bd9636a5b
do not remove Sid from svcaccount policies (#14064)
fixes #13905
2022-01-10 14:26:26 -08:00
Harshavardhana
76b21de0c6
feat: decommission feature for pools (#14012)
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```

```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID  │ Pools                           │ Capacity                         │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```

```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID  │ Pools                           │ Capacity                         │ Status   │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```

> NOTE: Canceled decommission will not make the pool active again, since we might have
> Potentially partial duplicate content on the other pools, to avoid this scenario be
> very sure to start decommissioning as a planned activity.

```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID  │ Pools                           │ Capacity                         │ Status             │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```
2022-01-10 09:07:49 -08:00
Harshavardhana
b7c5e45fff
heal: isObjectDangling should return false when it cannot decide (#14053)
In a multi-pool setup when disks are coming up, or in a single pool
setup let's say with 100's of erasure sets with a slow network.

It's possible when healing is attempted on `.minio.sys/config`
folder, it can lead to healing unexpectedly deleting some policy
files as dangling due to a mistake in understanding when `isObjectDangling`
is considered to be 'true'.

This issue happened in commit 30135eed86
when we assumed the validMeta with empty ErasureInfo is considered
to be fully dangling. This implementation issue gets exposed when
the server is starting up.

This is most easily seen with multiple-pool setups because of the
disconnected fashion pools that come up. The decision to purge the
object as dangling is taken incorrectly prior to the correct state
being achieved on each pool, when the corresponding drive let's say
returns 'errDiskNotFound', a 'delete' is triggered. At this point,
the 'drive' comes online because this is part of the startup sequence
as drives can come online lazily.

This kind of situation exists because we allow (totalDisks/2) number
of drives to be online when the server is being restarted.

Implementation made an incorrect assumption here leading to policies
getting deleted.

Added tests to capture the implementation requirements.
2022-01-07 19:11:54 -08:00
Aditya Manthramurthy
0a224654c2
fix: progagation of service accounts for site replication (#14054)
- Only non-root-owned service accounts are replicated for now.
- Add integration tests for OIDC with site replication
2022-01-07 17:41:43 -08:00
Aditya Manthramurthy
1981fe2072
Add internal IDP and OIDC users support for site-replication (#14041)
- This allows site-replication to be configured when using OpenID or the
  internal IDentity Provider.

- Internal IDP IAM users and groups will now be replicated to all members of the
  set of replicated sites.

- When using OpenID as the external identity provider, STS and service accounts
  are replicated.

- Currently this change dis-allows root service accounts from being
  replicated (TODO: discuss security implications).
2022-01-06 15:52:43 -08:00
Minio Trusted
76877eb6fa move gofumpt to golang-ci 2022-01-06 13:08:21 -08:00
Klaus Post
3d66d053c7
Add small client TLS PSK cache (#14039) 2022-01-06 11:34:02 -08:00
Klaus Post
0e31cff762
fix: DeleteMultipleObjects to finish even if cancelled + concurrent sets (#14038)
* Process sets concurrently.
* Disconnect context from request.
* Insert context cancellation checks.
* errFileNotFound and errFileVersionNotFound are ok, unless creating delete markers.
2022-01-06 10:47:49 -08:00
Shireesh Anjal
c27110e37d
Add timeinfo to health data (#14013)
Capture RoundtripDuration to figure out 
NTP issues in subnet health analyzer.
2022-01-06 01:51:10 -08:00
Harshavardhana
89441a22aa
enforceRetentionForDeletion should return false early for delete-marker (#14033) 2022-01-05 17:05:28 -08:00
Poorna
4d39fd4165
Add API for cluster replication status visibility (#13885) 2022-01-05 02:44:08 -08:00
Harshavardhana
001b77e7e1
use readConfig/saveConfig to simplify I/O on usage/tracker info (#14019) 2022-01-03 10:22:58 -08:00
Harshavardhana
a60ac7ca17
fix: audit log to support object names in multipleObjectNames() handler (#14017) 2022-01-03 01:28:52 -08:00
Harshavardhana
42ba0da6b0
fix: initialize new drwMutex for each attempt in 'for {' loop. (#14009)
It is possible that GetLock() call remembers a previously
failed releaseAll() when there are networking issues, now
this state can have potential side effects.

This PR tries to avoid this side affect by making sure
to initialize NewNSLock() for each GetLock() attempts
made to avoid any prior state in the memory that can
interfere with the new lock grants.
2022-01-02 09:15:34 -08:00
Harshavardhana
f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Harshavardhana
79df2c7ce7
correctly calculate read quorum based on the available fileInfo (#14000)
The current usage of assuming `default` parity of `4` is not correct
for all objects stored on MinIO, objects in .minio.sys have maximum
parity, healing won't trigger on these objects due to incorrect
verification of quorum.
2021-12-28 15:33:03 -08:00
Harshavardhana
866a95de38
fix: choose appropriate quorum for a given erasure set (#13998)
multiObject delete should honor expected quorum
2021-12-28 12:41:52 -08:00
Minio Trusted
bb97eafa82 madmin-go v1.1.23 and pkg v1.1.11 2021-12-26 23:23:18 -08:00
Harshavardhana
c980804514
trim values from envrionment files (#13991)
trim values to remove any spaces, newlines
from the files while importing credentials
and other values.
2021-12-25 22:02:54 -08:00
Harshavardhana
b883803b21
fix: healing across pools removing dangling objects (#13990)
adds other simplifications to the code when running
namespace heals across pools.
2021-12-25 09:01:44 -08:00
Harshavardhana
7e3a7d7044
add healing for invalid shards by skipping the blocks (#13978)
Built on top of #13945, now we need to simply skip the
shards and its automated.
2021-12-23 23:01:46 -08:00
Aditya Manthramurthy
5a96cbbeaa
Fix user privilege escalation bug (#13976)
The AddUser() API endpoint was accepting a policy field. 
This API is used to update a user's secret key and account 
status, and allows a regular user to update their own secret key. 

The policy update is also applied though does not appear to 
be used by any existing client-side functionality.

This fix changes the accepted request body type and removes 
the ability to apply policy changes as that is possible via the 
policy set API.

NOTE: Changing passwords can be disabled as a workaround
for this issue by adding an explicit "Deny" rule to disable the API
for users.
2021-12-23 09:21:21 -08:00
Harshavardhana
54ec0a1308
add configurable delta for skipping shards (#13967)
This PR is an attempt to make this configurable
as not all situations have same level of tolerable
delta, i.e disks are replaced days apart or even
hours.

There is also a possibility that nodes have drifted
in time, when NTP is not configured on the system.
2021-12-22 11:43:01 -08:00
Harshavardhana
1cf726348f
return meaningful error for disabled users (#13968)
fixes #13958
2021-12-22 11:40:21 -08:00
Harshavardhana
0e3037631f
skip inconsistent shards if possible (#13945)
data shards were wrong due to a healing bug
reported in #13803 mainly with unaligned object
sizes.

This PR is an attempt to automatically avoid
these shards, with available information about
the `xl.meta` and actually disk mtime.
2021-12-21 10:08:26 -08:00
Aditya Manthramurthy
6fbf4f96b6
Move last remaining IAM notification calls into IAMSys methods (#13941) 2021-12-21 02:16:50 -08:00
Aditya Manthramurthy
526e10a2e0
Fix regression in STS permissions via group in internal IDP (#13955)
- When using MinIO's internal IDP, STS credential permissions did not check the
groups of a user.

- Also fix bug in policy checking in AccountInfo call
2021-12-20 14:07:16 -08:00
Harshavardhana
499872f31d
Add configurable channel queue_size for audit/logger webhook targets (#13819)
Also log all the missed events and logs instead of silently
swallowing the events.

Bonus: Extend the logger webhook to support mTLS
similar to audit webhook target.
2021-12-20 13:16:53 -08:00
Anis Elleuch
5cc16e098c
env: Remove quotes when parsing a config env file (#13953)
The code parsing the config environment file does not remove 
quotes of environment variables values. This commit adds this 
capability.
2021-12-20 13:13:06 -08:00
Aditya Manthramurthy
1f4e0bd17c
fix: access for root user's STS credential (#13947)
add a test to cover this case
2021-12-19 23:05:20 -08:00
Aditya Manthramurthy
997e808088
fix; race in bucket replication stats (#13942)
- r.ulock was not locked when r.UsageCache was being modified

Bonus:

- simplify code by removing some unnecessary clone methods - we can 
do this because go arrays are values (not pointers/references) that are 
automatically copied on assignment.

- remove some unnecessary map allocation calls
2021-12-17 15:33:13 -08:00
Shireesh Anjal
13441ad0f8
Add IsKubernetes and IsDocker to health data (#13936) 2021-12-17 14:46:54 -08:00
Harshavardhana
aa508591c1
cache only metrics served from the disks (#13940)
do not need to cache in-memory instant metrics
2021-12-17 11:40:09 -08:00
Harshavardhana
818f0201fc
re-implement prometheus metrics endpoint to be simpler (#13922)
data-structures were repeatedly initialized
this causes GC pressure, instead re-use the
collectors.

Initialize collectors in `init()`, also make
sure to honor the cache semantics for performance
requirements.

Avoid a global map and a global lock for metrics
lookup instead let them all be lock-free unless
the cache is being invalidated.
2021-12-17 10:11:04 -08:00
Aditya Manthramurthy
890f43ffa5
Map policy to parent for STS (#13884)
When STS credentials are created for a user, a unique (hopefully stable) parent
user value exists for the credential, which corresponds to the user for whom the
credentials are created. The access policy is mapped to this parent-user and is
persisted. This helps ensure that all STS credentials of a user have the same
policy assignment at all times.

Before this change, for an OIDC STS credential, when the policy claim changes in
the provider (when not using RoleARNs), the change would not take effect on
existing credentials, but only on new ones.

To support existing STS credentials without parent-user policy mappings, we
lookup the policy in the policy claim value. This behavior should be deprecated
when such support is no longer required, as it can still lead to stale
policy mappings.

Additionally this change also simplifies the implementation for all non-RoleARN
STS credentials. Specifically, for AssumeRole (internal IDP) STS credentials,
policies are picked up from the parent user's policies; for
AssumeRoleWithCertificate STS credentials, policies are picked up from the
parent user mapping created when the STS credential is generated.
AssumeRoleWithLDAP already picks up policies mapped to the virtual parent user.
2021-12-17 00:46:30 -08:00
Poorna K
e270ab65b3
fix: healing of replication delete markers (#13933)
A corner case can occur where the delete-marker was propagated 
but the metadata could not be updated on the primary. Sending 
a RemoveObject call with the Delete marker version would end 
up permanently deleting the version on target. Instead, perform 
a Stat on the delete-marker version on target and redo replication 
only if the delete-marker is missing on target.
2021-12-16 15:34:55 -08:00
Anis Elleuch
926373f9c1
Run the data scanner routine in a loop (#13928)
After the introduction of Refresh logic in locks, the data scanner can
quit when the data scanner lock is not able to get refreshed. In that
case, the context of the data scanner will get canceled and
runDataScanner() will quit. Another server would pick the scanning
routine but after some time, all nodes can just have all scanning
routine aborted, as described above.

This fix will just run the data scanner in a loop.
2021-12-16 08:32:15 -08:00
Poorna K
111c6177d2
Deprecate caching for erasure/distributed mode (#13909)
Fixes: #13907

Also removing default value of `writethrough` for cache commit
which was interfering with cache_after setting
2021-12-15 16:48:34 -08:00
Poorna K
b42cfcea60
Disallow versioning/replication change in cluster replication setup (#13910) 2021-12-15 10:37:08 -08:00
Klaus Post
aca6dfbd60
Check for nil RPC in listing (#13917)
Fixes #13915
2021-12-15 09:19:11 -08:00
Harshavardhana
5f7e6d03ff
copy bucket slice to avoid skipping .minio.sys/buckets (#13912)
healing was skipping `.minio.sys/buckets` path so
essentially not healing `.usage.json` - fix this
by making a copy of `buckets` slice.
2021-12-15 09:18:09 -08:00
Harshavardhana
88ad742da0
fix: error handling cases in site-replication (#13901)
- Allow proper SRError to be propagated to
  handlers and converted appropriately.

- Make sure to enable object locking on buckets
  when requested in MakeBucketHook.

- When DNSConfig is enabled attempt to delete it
  first before deleting buckets locally.
2021-12-14 14:09:57 -08:00
Krishnan Parthasarathi
44a9339c0a
Newer noncurrent versions (#13815)
- Rename MaxNoncurrentVersions tag to NewerNoncurrentVersions

Note: We apply overlapping NewerNoncurrentVersions rules such that 
we honor the highest among applicable limits. e.g if 2 overlapping rules 
are configured with 2 and 3 noncurrent versions to be retained, we 
will retain 3.

- Expire newer noncurrent versions after noncurrent days
- MinIO extension: allow noncurrent days to be zero, allowing expiry 
  of noncurrent version as soon as more than configured 
  NewerNoncurrentVersions are present.
- Allow NewerNoncurrentVersions rules on object-locked buckets
- No x-amz-expiration when NewerNoncurrentVersions configured
- ComputeAction should skip rules with NewerNoncurrentVersions > 0
- Add unit tests for lifecycle.ComputeAction
- Support lifecycle rules with MaxNoncurrentVersions
- Extend ExpectedExpiryTime to work with zero days
- Fix all-time comparisons to be relative to UTC
2021-12-14 09:41:44 -08:00
Harshavardhana
113c7ff49a
add code to parse secrets natively instead of shell scripts (#13883) 2021-12-13 18:23:31 -08:00
Poorna K
d422d24278
replication: warn if insufficient workers (#13899)
This should give an early warning if configured replication 
workers are insufficient to meet application workload.
2021-12-13 18:22:56 -08:00
Aditya Manthramurthy
de400f3473
Allow setting non-existent policy on a user/group (#13898) 2021-12-13 15:55:52 -08:00
Harshavardhana
8144a125ce
check for update in background (#13889) 2021-12-13 09:43:03 -08:00
jiangfucheng
88c0d0120c
update heal object unit test (#13886) 2021-12-11 09:04:07 -08:00
Aditya Manthramurthy
44fefe5b9f
Add option to policy info API to return create/mod timestamps (#13796)
- This introduces a new admin API with a query parameter (v=2) to return a
response with the timestamps

- Older API still works for compatibility/smooth transition in console
2021-12-11 09:03:39 -08:00
Aditya Manthramurthy
f2bd026d0e
Allow OIDC user to query user info if policies permit (#13882) 2021-12-10 15:03:39 -08:00
Klaus Post
81e43b87c2
Don't zero buffer if big enough (#13877)
Only append zeroed bytes when we don't have enough space anyway.
2021-12-10 13:08:10 -08:00
Aditya Manthramurthy
a02e17f15c
Add tests to ensure that OIDC user can create IAM users (#13881) 2021-12-10 13:04:21 -08:00
Harshavardhana
5b7c00ff52
add more tests to cover areas for weird object names (#13873)
continuation of #13858 to add more tests and also validate the 
written object data.
2021-12-09 17:52:53 -08:00
Aditya Manthramurthy
b9f0046ee7
Allow STS credentials to create users (#13874)
- allow any regular user to change their own password
- allow STS credentials to create users if permissions allow

Bonus: do not allow changes to sts/service account credentials (via add user API)
2021-12-09 17:48:51 -08:00
Harshavardhana
3b79f7e4ae
ignore if volume exists in MakeVolBulk, return other errors (#13866) 2021-12-09 15:55:42 -08:00
Aditya Manthramurthy
85d2df02b9
fix: user listing with LDAP (#13872)
Users listing was showing just a weird policy 
mapping output which does not make sense here.
2021-12-09 15:55:28 -08:00
Harshavardhana
2f1e8ba612
add more directory marker tests and fix a bug (#13871)
ListObjects() should never list a delete-marked folder
if latest is delete marker and delimiter is not provided.

ListObjectVersions() should list a delete-marked folder
even if latest is delete marker and delimiter is not
provided.

Enhance further versioning listing on the buckets
2021-12-09 14:59:23 -08:00
Anis Elleuch
84c690cb07
storage: Use request.Form and avoid mux matching (#13858)
request.Form uses less memory allocation and avoids gorilla mux matching
with weird characters in parameters such as '\n'

- Remove Queries() to avoid matching
- Ensure r.ParseForm is called to populate fields
- Add a unit test for object names with '\n'
2021-12-09 08:38:46 -08:00
Harshavardhana
239bbad7ab
add test to expect prefix without a directory object (#13865)
Motivation is to cover more areas
2021-12-09 08:36:54 -08:00
Harshavardhana
dcff6c996d
fix: do not list delete-marked objects (#13864)
delete marked objects should not be considered
for listing when listing is delimited, this issue
as introduced in PR #13804 which was mainly to
address listing of directories in listing when
delimited.

This PR fixes this properly and adds tests to
ensure that we behave in accordance with how
an S3 API behaves for ListObjects() without
versions.
2021-12-08 17:34:52 -08:00
Poorna K
0a66a6f1e5
Avoid cache GC of writebacks before commit syncs (#13860)
Save part.1 for writebacks in a separate folder
and move it to cache dir atomically while saving
the cache metadata. This is to avoid GC mistaking
part.1 as orphaned cache entries and purging them.

This PR also fixes object size being overwritten during
retries for write-back mode.
2021-12-08 14:52:31 -08:00
Harshavardhana
e82a5c5c54
fix: site replication issues and add tests (#13861)
- deleting policies was deleting all LDAP
  user mapping, this was a regression introduced
  in #13567

- deleting of policies is properly sent across
  all sites.

- remove unexpected errors instead embed the real
  errors as part of the 500 error response.
2021-12-08 11:50:15 -08:00
Harshavardhana
b9aae1aaae
fix: speedtest should exit upon errors cleanly (#13851)
- deleteBucket() should be called for cleanup
  if client abruptly disconnects

- out of disk errors should be sent to client
  properly and also cancel the calls

- limit concurrency to available MAXPROCS not
  32 for auto-tuned setup, if procs are beyond
  32 then continue normally. this is to handle
  smaller setups.

fixes #13834
2021-12-06 16:36:14 -08:00
Harshavardhana
7d70afc937
fix: potential crash in diskCache when fileScorer is empty (#13850)
```
goroutine 115 [running]:
github.com/minio/minio/cmd.(*diskCache).purge.func3({0xc007a10a40, 0x40}, 0x40)
   github.com/minio/minio/cmd/disk-cache-backend.go:430 +0x90d
```
2021-12-06 15:55:29 -08:00
Aditya Manthramurthy
12b63061c2
Fix LDAP service account creation (#13849)
- when a user has only group permissions
- fixes regression from ac74237f0 (#13657)
- fixes https://github.com/minio/console/issues/1291
2021-12-06 15:55:11 -08:00
Klaus Post
038fdeea83
snowball: return errors on failures (#13836)
Return errors when untar fails at once.

Current error handling was quite a mess. Errors are written 
to the stream, but processing continues.

Instead, return errors when they occur and transform 
internal errors to bad request errors, since it is likely a 
problem with the input.

Fixes #13832
2021-12-06 09:45:23 -08:00
Anis Elleuch
0b6225bcc3
Better error msg when version mismatch of internode API (#13845)
Sometimes, we see an error message like "Server expects 'storage' API
version 'v41', instead found 'v41'" shows a more generic error message
with the path of the REST call.
2021-12-06 09:44:48 -08:00
Anis Elleuch
f286ef8e17
isMultipart to test on parts sizes only if object is encrypted (#13839)
ObjectInfo.isMultipart() is testing if parts sizes are compatible with
encrypted parts but this only can be done if the object is encrypted.
2021-12-06 09:43:43 -08:00
Harshavardhana
b120bcb60a
validate if cached value is empty before use (#13830)
fixes a crash reproduced while running hadoop tests

```
goroutine 201564 [running]:
github.com/minio/minio/cmd.metaCacheEntries.resolve({0xc0206ab7a0, 0x4, 0xc0015b1908}, 0xc0212a7040)
	github.com/minio/minio/cmd/metacache-entries.go:352 +0x58a
```

Bonus: HeadBucket() should always provide content-type
2021-12-06 02:59:51 -08:00
Harshavardhana
be34fc9134
fix: kms-id header should have arn:aws:kms: prefix (#13833)
arn:aws:kms: is a must for KMS keyID.
2021-12-06 00:39:32 -08:00
Harshavardhana
8591d17d82
return appropriate errors upon parseErrors (#13831) 2021-12-05 11:36:26 -08:00
Harshavardhana
f6190d6751
Add single drive support for directory prefixes in Listing (#13829)
This fixes the compatibility issue with Hadoop 3.3.1

fixes #13710
2021-12-03 18:08:40 -08:00
Aditya Manthramurthy
4f35054d29
Ensure that role ARNs don't collide (#13817)
This is to prepare for multiple providers enhancement.
2021-12-03 13:15:56 -08:00
Shireesh Anjal
d29df6714a
Introduce new config subnet api_key (#13793)
The earlier approach of using a license token for 
communicating with SUBNET is being replaced 
with a simpler mechanism of API keys. Unlike the 
license which is a JWT token, these API keys will 
be simple UUID tokens and don't have any embedded 
information in them. SUBNET would generate the 
API key on cluster registration, and then it would 
be saved in this config, to be used for subsequent 
communication with SUBNET.
2021-12-03 09:32:11 -08:00
jiangfucheng
7460fb8349
fix padding error and compatible with uploaded objects (#13803) 2021-12-03 09:26:30 -08:00
Harshavardhana
a7c430355a
fix: throw appropriate errors when all disks fail (#13820)
when all disks fail with same error, fail server
startup anyways - we cannot proceed.

fixes #13818
2021-12-03 09:25:17 -08:00
Aditya Manthramurthy
b14527b7af
If role policy is configured, require that role ARN be set in STS (#13814) 2021-12-02 15:43:39 -08:00
Klaus Post
3db931dc0e
Improve listing consistency with version merging (#13723) 2021-12-02 11:29:16 -08:00
Klaus Post
8309ddd486
Fix panic (not fatal) on connection drops (#13811)
Fix more regressions from #13597 with double closed channels.

```
panic: "POST /minio/storage/data/distxl-plain/s1/d2/v42/createfile?disk-id=c789f7e1-2b52-442a-b518-aa2dac03f3a1&file-path=f6161668-b939-4543-9873-91b9da4cdff6%2F5eafa986-a3bf-4b1c-8bc0-03a37de390a3%2Fpart.1&length=2621760&volume=.minio.sys%2Ftmp": send on closed channel
goroutine 1977 [running]:
runtime/debug.Stack()
        c:/go/src/runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
        d:/minio/minio/cmd/generic-handlers.go:468 +0x8e
panic({0x2928860, 0x4fb17e0})
        c:/go/src/runtime/panic.go:1038 +0x215
github.com/minio/minio/cmd.keepHTTPReqResponseAlive.func2({0x4fe4ea0, 0xc02737d8a0})
        d:/minio/minio/cmd/storage-rest-server.go:818 +0x48
github.com/minio/minio/cmd.(*storageRESTServer).CreateFileHandler(0xc0015a8510, {0x50073e0, 0xc0273ec460}, 0xc029b9a400)
        d:/minio/minio/cmd/storage-rest-server.go:334 +0x1d2
net/http.HandlerFunc.ServeHTTP(...)
        c:/go/src/net/http/server.go:2046
github.com/minio/minio/cmd.httpTraceHdrs.func1({0x50073e0, 0xc0273ec460}, 0x0)
        d:/minio/minio/cmd/handler-utils.go:372 +0x53
net/http.HandlerFunc.ServeHTTP(0x5007380, {0x50073e0, 0xc0273ec460}, 0x10)
        c:/go/src/net/http/server.go:2046 +0x2f
github.com/minio/minio/cmd.addCustomHeaders.func1({0x5007380, 0xc0273dcf00}, 0xc0273f7340)
```

Reverts but adds write checks.
2021-12-02 11:22:32 -08:00
Harshavardhana
21c868a646
fix: do not ignore delete-marker directories in ListObjects() (#13804)
Following scenario such as objects that exist inside a
prefix say `folder/` must be included in the listObjects()
response.

```
2aa16073-387e-492c-9d59-b4b0b7b6997a v2 DEL folder/
a5b9ce68-7239-4921-90ab-20aed402c7a2 v1 PUT folder/
f2211798-0eeb-4d9e-9184-fcfeae27d069 v1 PUT folder/1.txt
```

Current master does not handle this scenario, because it
ignores the top level delete-marker on folders. This is
however unexpected. It is expected that list-objects returns
the top level prefix in this situation.

```
aws s3api list-objects --bucket harshavardhana --prefix unique/ \
     --delimiter / --profile minio --endpoint-url http://localhost:9000
{
    "CommonPrefixes": [
        {
            "Prefix": "unique/folder/"
        }
    ]
}
```

There are applications in the wild such as Hadoop s3a connector
that exploit this behavior and expect the folder to be present
in the response.

This also makes the behavior consistent with AWS S3.
2021-12-02 08:46:33 -08:00
Harshavardhana
24d904d194
reload certs from disk upon SIGHUP (#13792) 2021-12-01 00:38:32 -08:00
Harshavardhana
b280a37c4d
add delete-marker proactively in DeleteObject() (#13795)
single object delete was not working properly
on a bucket when versioning was suspended,
current version 'null' object was never removed.

added unit tests to cover the behavior

fixes #13783
2021-11-30 18:30:06 -08:00
Poorna K
9ec197f2e8
Add support for adding new site(s) to site replication (#13696)
Currently, the new site is expected to be empty
2021-11-30 13:16:37 -08:00
Poorna K
d21466f595
cache: in writeback mode skip etag verification (#13781)
if the commit is still in pending or failed status

This PR also does some minor code cleanup
2021-11-30 10:22:42 -08:00
Aditya Manthramurthy
42d11d9e7d
Move IAM notifications into IAM system functions (#13780) 2021-11-29 14:38:57 -08:00
Harshavardhana
e49c184595
add configurable 'shutdown-timeout' for HTTP server (#13771)
fixes #12317
2021-11-29 09:06:56 -08:00
Harshavardhana
99d87c5ca2
fix: totalDrives reported in speedTest for multiple-pools (#13770)
totalDrives reported in speedTest result were wrong
for multiple pools, this PR fixes this.

Bonus: add support for configurable storage-class, this
allows us to test REDUCED_REDUNDANCY to see further
maximum throughputs across the cluster.
2021-11-29 09:05:46 -08:00
Aditya Manthramurthy
4c0f48c548
Add role ARN support for OIDC identity provider (#13651)
- Allows setting a role policy parameter when configuring OIDC provider

- When role policy is set, the server prints a role ARN usable in STS API requests

- The given role policy is applied to STS API requests when the roleARN parameter is provided.

- Service accounts for role policy are also possible and work as expected.
2021-11-26 19:22:40 -08:00
Aditya Manthramurthy
4ce6d35e30
Add new site config sub-system intended to replace region (#13672)
- New sub-system has "region" and "name" fields.

- `region` subsystem is marked as deprecated, however still works, unless the
new region parameter under `site` is set - in this case, the region subsystem is
ignored. `region` subsystem is hidden from top-level help (i.e. from `mc admin
config set myminio`), but appears when specifically requested (i.e. with `mc
admin config set myminio region`).

- MINIO_REGION, MINIO_REGION_NAME are supported as legacy environment variables for server region.

- Adds MINIO_SITE_REGION as the current environment variable to configure the
server region and MINIO_SITE_NAME for the site name.
2021-11-25 13:06:25 -08:00
Klaus Post
34dc725d26
fix: s3zip in fs mode (#13758)
The index was converted directly from bytes to binary. This would fail a roundtrip through json.

This would result in `Error: invalid input: magic number mismatch` when reading back.

On non-erasure backends store index as base64.
2021-11-25 09:11:25 -08:00
Aditya Manthramurthy
61029fe20b
fix: returning invalid account-not-exists error for LDAP svc acc (#13756) 2021-11-24 15:19:33 -08:00
Anis Elleuch
55d4cdd464
multi-delete: Avoid empty Delete tag in the response (#13725)
When removing an object fails, such as when it is WORM protected, a
wrong <Delete> will still be in the response. This commit fixes it.
2021-11-24 10:01:07 -08:00
Klaus Post
fe3e47b1e8
Fix "send on closed channel" panic (#13745)
The httpStreamResponse should not return until CloseWithError has been called.

Instead keep track of write state and skip writing/flushing if an error has occurred.

Fixes #13743

Regression from #13597 (not released)
2021-11-24 09:42:42 -08:00
Harshavardhana
9ca25bd48f
fix: atomic.Value should be a concrete type to avoid panics (#13740)
Go's atomic.Value does not support `nil` type,
concrete type is necessary to avoid any panics with
the current implementation.

Also remove boolean to turn-off tracking of freezeCount.
2021-11-23 16:09:28 -08:00
Harshavardhana
91e0823ff0
allow service freeze/unfreeze on a setup (#13707)
an active running speedTest will reject all
new S3 requests to the server, until speedTest
is complete.

this is to ensure that speedTest results are
accurate and trusted.

Co-authored-by: Klaus Post <klauspost@gmail.com>
2021-11-23 12:02:16 -08:00
Klaus Post
142c6b11b3
Reduce JWT overhead for internode tokens (#13738)
Since JWT tokens remain valid for up to 15 minutes, we 
don't have to regenerate tokens for every call.

Cache tokens for matching access+secret+audience 
for up to 15 seconds.

```
BenchmarkAuthenticateNode/uncached-32         	  270567	      4179 ns/op	    2961 B/op	      33 allocs/op
BenchmarkAuthenticateNode/cached-32           	 7684824	       157.5 ns/op	      48 B/op	       1 allocs/op
```

Reduces internode call allocations a great deal.
2021-11-23 09:51:53 -08:00
Anis Elleuch
d1bfb4d2c0
policy: Fix a typo when validating the list of policies (#13735)
When assigning two policies to a user using mc command, the server code
wrongly validates due to a typo in the code, the commit fixes it.
2021-11-23 08:57:29 -08:00
Harshavardhana
26c457860b
remove "expires" header from presign v2 as metadata (#13718)
fixes #13704
2021-11-22 16:07:23 -08:00
Harshavardhana
28f95f1fbe
quorum calculation getLatestFileInfo should be itself (#13717)
FileInfo quorum shouldn't be passed down, instead
inferred after obtaining a maximally occurring FileInfo.

This PR also changes other functions that rely on
wrong quorum calculation.

Update tests as well to handle the proper requirement. All
these changes are needed when migrating from older deployments
where we used to set N/2 quorum for reads to EC:4 parity in
newer releases.
2021-11-22 09:36:29 -08:00
Harshavardhana
c791de0e1e
re-implement pickValidInfo dataDir, move to quorum calculation (#13681)
dataDir loosely based on maxima is incorrect and does not
work in all situations such as disks in the following order

- xl.json migration to xl.meta there may be partial xl.json's
  leftover if some disks are not yet connected when the disk
  is yet to come up, since xl.json mtime and xl.meta is
  same the dataDir maxima doesn't work properly leading to
  quorum issues.

- its also possible that XLV1 might be true among the disks
  available, make sure to keep FileInfo based on common quorum
  and skip unexpected disks with the older data format.

Also, this PR tests upgrade from older to a newer release if the 
data is readable and matches the checksum.

NOTE: this is just initial work we can build on top of this to do further tests.
2021-11-21 10:41:30 -08:00