Commit Graph

5119 Commits

Author SHA1 Message Date
Klaus Post b7bb122be8
fix: replication auto-scaling deadlock (#16084) 2022-11-17 07:35:02 -08:00
Krishnan Parthasarathi 8441a3bf5f
fix: update metacache entry only once (#16072) 2022-11-16 11:25:00 -08:00
Harshavardhana 853c4de75a
allow changing endpoints in distributed setups (#16071) 2022-11-16 07:59:10 -08:00
jiuker 3597af789e
allow resultCh to be closed() after clusterMetaHealthInfo() (#16073) 2022-11-16 03:04:36 -08:00
Shireesh Anjal 5246e3be84
Send health diagnostics data as part of callhome (#16006) 2022-11-15 13:53:05 -08:00
Klaus Post 8a07000e58
fix: refactor getReplicationDiff for safe use (#16051) 2022-11-15 07:59:21 -08:00
Krishnan Parthasarathi 3bb82ef60d
top-locks: Include lock-held duration (#16061) 2022-11-15 07:57:52 -08:00
Harshavardhana 91f45c4aa6
avoid inconsistent versions healing when versions are large (#16066) 2022-11-14 18:35:26 -08:00
Poorna d6bc141bd1
feat: Add support for site level resync (#15753) 2022-11-14 07:16:40 -08:00
jiuker 7ac64ad24a
fix: use errors.Is for wrapped returns (#16062) 2022-11-14 07:15:46 -08:00
Harshavardhana 6d76db9d6c
improve server startup error when pools are incorrect (#16056) 2022-11-11 19:40:45 -08:00
jiuker bdcb485740
netPerfRX Reset() should use write Lock() (#16043) 2022-11-10 19:44:20 -08:00
Poorna e32b948a49
fix: parsing multipart uploadID under site replicated setup (#16048)
continue the fix from #16034
2022-11-10 16:17:45 -08:00
Klaus Post 5b242f1d11
Add Audit target metrics (#16044) 2022-11-10 10:20:21 -08:00
Poorna 34d28dd79f
replication: Avoid blocking on mrf save (#16045) 2022-11-10 10:20:02 -08:00
Krishnan Parthasarathi 6eef9b4a23
lifecycle: simplify Eval and HasActiveRules (#16036) 2022-11-10 07:17:45 -08:00
Aditya Manthramurthy 5f1999cc71
fix: avoid URL unsafe chars in multipart upload ID (#16034) 2022-11-09 16:41:16 -08:00
Krishnan Parthasarathi 40a2c6b882
Return remote tier as StorageClass for transitioned objects (#16035) 2022-11-09 15:57:34 -08:00
jiuker 7b7356f04c
close the reader under disk cache bitrot verification (#16024) 2022-11-09 04:20:11 -08:00
Klaus Post bbc312fce6
Add notification queue metrics (#16026) 2022-11-08 16:36:47 -08:00
Anis Elleuch 7260241511
Remove some logs caused by external apps (#16027) 2022-11-08 13:29:05 -08:00
Anis Elleuch 3b1a9b9fdf
Use the same lock for the scanner and site replication healing (#15985) 2022-11-08 08:55:55 -08:00
Harshavardhana 72afc2727a
rebalance status must return appropriate error initially (#16022) 2022-11-08 07:56:45 -08:00
Aditya Manthramurthy 76d822bf1e
Add LDAP policy entities API (#15908) 2022-11-07 14:35:09 -08:00
Klaus Post ddeca9f12a
fix: filter rest errors and logs returned (#16019) 2022-11-07 10:38:08 -08:00
Harshavardhana 1f3db03bf0
allow changing argument for path for SNSD setup (#16013) 2022-11-07 00:11:58 -08:00
Harshavardhana 944c62daf4
skip flaky tests on windows OS (#16015) 2022-11-07 00:11:21 -08:00
Harshavardhana 9547b7d0e9
add deadlineConnections on remoteTransport (#16010) 2022-11-05 11:09:21 -07:00
Klaus Post 808ecfe0f2
merge versions across sets when listing (#16003) 2022-11-04 11:33:22 -07:00
Klaus Post 2894dd4d1a
fix: hold lock while serializing replication stats (#16007) 2022-11-04 09:59:14 -07:00
jiuker fd8750e959
fix: http body must be drained in downloadBinary() (#16001) 2022-11-04 08:22:38 -07:00
Poorna 4f5d38a4b1
site replication edit: validate endpoint belongs to deployment (#16000) 2022-11-03 16:23:45 -07:00
Anis Elleuch 7e73fc2870
Implement inspect data API v2 (#15474)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2022-11-02 13:36:38 -07:00
Harshavardhana 0d49b365ff
converge SNSD deployments into single code (#15988) 2022-11-01 16:41:01 -07:00
Anis Elleuch 7721595aa9
config: Deprecated delay/max_wait/scanner and introduce speed (#15941) 2022-11-01 08:04:07 -07:00
Harshavardhana fd6f6fc8df
cleanup stale parent multipart directories (#15980) 2022-11-01 08:00:02 -07:00
Aditya Manthramurthy 4fb47cd568
fix: update admin IDP APIs to be more RESTful (#15896) 2022-10-31 14:52:26 -07:00
Klaus Post ecc932d5dd
Clean entire tmp-old on restart (#15979) 2022-10-31 07:27:50 -07:00
Harshavardhana b57fbff7c1
ignore background healInfo in single drive setup (#15968) 2022-10-31 07:26:10 -07:00
Poorna d765b89a63
improve validation for replication resync API (#15964) 2022-10-28 23:21:33 -07:00
Harshavardhana 6e4acf0504
add a message of removal for gateway and hide the command (#15965) 2022-10-28 14:11:20 -07:00
Klaus Post 71954faa3a
mark pubsub type safe via generics (#15961) 2022-10-28 10:55:42 -07:00
Klaus Post 0f0e154315
fix: inconsistent replication delete marker timestamps (#15956) 2022-10-27 09:46:52 -07:00
Harshavardhana 136d41775f
remove numAvailableDisks check as it doesn't serve any purpose (#15954) 2022-10-27 09:05:24 -07:00
Harshavardhana ec77d28e62
make subnet subsys dynamic and simplify callhome (#15927) 2022-10-27 00:20:01 -07:00
Klaus Post 86420a1f46
Store multipart checksums (#15953) 2022-10-26 18:14:58 -07:00
Poorna 7dd8b6c8ed
ensure ILM expiry creates non null deleteMarker for versioned bucket (#15947) 2022-10-26 16:09:27 -07:00
Anis Elleuch 533c9d4fe3
fix: lockName to disallow parallel same erasure set healing (#15951) 2022-10-26 12:43:54 -07:00
Anis Elleuch a35ef155fc
return appropriate error status code in the lock handler (#15950) 2022-10-26 09:51:26 -07:00
Poorna 8dd3c41b2a
allow MakeBucket errors to be handled lazily (#15945)
remote error is not required to be passed back to the 
client - this is mostly because we have healing that should 
eventually, catch up on this and heal the bucket.
2022-10-25 23:32:37 -07:00
Krishnan Parthasarathi 4523da6543
feat: introduce pool-level rebalance (#15483) 2022-10-25 12:36:57 -07:00
Poorna ce8456a1a9
proxy multipart to peers via multipart uploadID (#15926) 2022-10-25 10:52:29 -07:00
Poorna 9ce1884732
reject editing bucket replication config when site replication is enabled (#15937) 2022-10-24 20:24:32 -07:00
Harshavardhana 23b329b9df
remove gateway completely (#15929) 2022-10-24 17:44:15 -07:00
Krishnan Parthasarathi 0c34e51a75
Filter out tiering metadata during CopyObject (#15936) 2022-10-24 16:32:31 -07:00
Anis Elleuch fc6c794972
Audit dangling object removal (#15933) 2022-10-24 11:35:07 -07:00
Klaus Post 86d543d0f6
Check for s3zip content offset (#15924) 2022-10-21 15:37:48 -07:00
Poorna e4e90b53c1
fix: delete-marker replication check properly (#15923) 2022-10-21 14:45:06 -07:00
Anis Elleuch 58d776daa0
Set CONSOLE_MINIO_SERVER to 127.0.0.1 by default (#15887) 2022-10-21 14:42:28 -07:00
Krishnan Parthasarathi f6b2e89109
Pass encrypted etag as is for immediate tiering (#15925) 2022-10-21 14:40:50 -07:00
Anis Elleuch ac85c2af76
lifecycle: refactor rules filtering and tagging support (#15914) 2022-10-21 10:46:53 -07:00
Shireesh Anjal 5aba2aedb3
Do not freeze s3 traffic in healthinfo api (#15912) 2022-10-21 00:34:32 -07:00
Harshavardhana a8332efa94
fix: Delete() of bucket metadata should not parse the config (#15904) 2022-10-19 17:55:09 -07:00
Aditya Manthramurthy 3dbef72dc7
fix: AccountInfo API for roleARN based accounts (#15907) 2022-10-19 17:54:41 -07:00
Aditya Manthramurthy 2d16e74f38
Add LDAP IDP Configuration APIs (#15840) 2022-10-19 11:00:10 -07:00
Anis Elleuch de5070446d
Deprecate --listeners flag (#15900) 2022-10-19 08:45:50 -07:00
Harshavardhana 374abd1e7d
add filter support for tags and metadata in batch replication (#15885) 2022-10-18 21:22:21 -07:00
Anis Elleuch 0506d9e83d
storage: Return errDiskNotFound when a peer is during shutdown (#15868) 2022-10-18 13:50:46 -07:00
Klaus Post bd3dfad8b9
Add concurrent Snowball extraction + options (#15836) 2022-10-18 13:50:21 -07:00
Harshavardhana 9fff315555
do not need to trace ignored objects (#15894) 2022-10-18 13:47:55 -07:00
Anis Elleuch 18fb86b7be
convert context.DeadlineExceed to offline disk in DiskInfo() (#15886) 2022-10-18 03:01:16 -07:00
Harshavardhana 58a8275e84
do not assume invalid buf to be non-xl.meta (#15843) 2022-10-17 09:39:21 -07:00
Aditya Manthramurthy 85fc7cea97
Pass role ARN for OIDC providers to console (#15862) 2022-10-15 12:57:03 -07:00
Harshavardhana 328d660106
support CRC32 Checksums on single drive setup (#15873) 2022-10-15 11:58:47 -07:00
Harshavardhana c68910005b
validate bucket before attempting batch replication (#15861) 2022-10-15 11:58:31 -07:00
Harshavardhana c79bcc8838 Revert "convert context.DeadlineExceed to offline disk in DiskInfo() (#15869)"
This reverts commit 0fe58dbb34.
2022-10-14 20:37:50 -07:00
Anis Elleuch 0fe58dbb34
convert context.DeadlineExceed to offline disk in DiskInfo() (#15869) 2022-10-14 19:32:13 -07:00
Harshavardhana 6cb2f56395 Revert "Revert "tests: Add context cancelation (#15374)""
This reverts commit 564a0afae1.
2022-10-14 03:08:40 -07:00
Harshavardhana 59e33b3b21
validate setBucketTarget properly as per BucketExists() call (#15860) 2022-10-13 17:46:49 -07:00
Poorna 0e3c92c027 attempt delete marker replication after object is replicated (#15857)
Ensure delete marker replication success, especially since the
recent optimizations to heal on HEAD, LIST and GET can force
replication attempts on delete marker before underlying object
version could have synced.
2022-10-13 17:45:23 -07:00
Anis Elleuch db7a9b2c37
heal-info: Return the endpoint of a disk with unknown state (#15854) 2022-10-13 16:41:44 -07:00
Harshavardhana 44097faec1
support deleteMarkers and all versions in batch replication (#15858) 2022-10-13 14:42:10 -07:00
Klaus Post bf3da5081f
Omit empty checksums in responses (#15850) 2022-10-13 00:49:46 -07:00
Harshavardhana 5532982857
do not disable IsKubernetes(), IsDocker() checks with MINIO_CI_CD (#15852) 2022-10-12 23:40:48 -07:00
Anis Elleuch 783dd875f7
refactor objectQuorumFromMeta() to search for parity quorum (#15844) 2022-10-12 16:42:45 -07:00
Harshavardhana 97112c69be
fix: replication stats() to not crash under any situation (#15851)
Co-authored-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2022-10-12 15:47:41 -07:00
Javier Adriel 2939000342
Add metrics, version and apis handlers (#15839) 2022-10-12 12:08:03 -07:00
Harshavardhana 41e1654f9a
remove spurious logging for object not found (#15842) 2022-10-12 04:28:21 -07:00
Harshavardhana e3cb0278ce
honor specified target prefix under batch replication (#15834) 2022-10-11 14:36:06 -07:00
Harshavardhana 0c81f1bdb3
indicate how long it took to bring the drive online (#15835) 2022-10-11 11:33:56 -07:00
Klaus Post 6220875803
Add missing server info fields (#15826) 2022-10-11 11:31:26 -07:00
Aditya Manthramurthy 64cf887b28
use LDAP config from minio/pkg to share with console (#15810) 2022-10-07 22:12:36 -07:00
Harshavardhana 927a879052
authenticate the request first for headObject() (#15820) 2022-10-07 21:45:53 -07:00
Anis Elleuch dfe0c96b87
preserve Version and DeleteMarker sort order in the list XML response (#15819) 2022-10-07 16:12:36 -07:00
Anis Elleuch e856e10ac2
ignore VersionNotFound in addition to ObjectNotFound while replicating (#15814) 2022-10-07 16:11:41 -07:00
Harshavardhana 928feb0889
remove unused debug param from evalActionFromLifecycle (#15813) 2022-10-07 10:24:12 -07:00
Anis Elleuch 158d0e26a2
decom: Ignore object/version error during deletion (#15806) 2022-10-06 09:41:58 -07:00
Harshavardhana 78385bfbeb
set bucket creation timestamp properly for legacy FS backend (#15800) 2022-10-06 02:46:31 -07:00
Harshavardhana 2a13cc28f2 feat: implement support batch replication (#15554) 2022-10-05 23:00:43 -07:00
Lenin Alevski 4bdf41a6c7
Removing unused getUpdateReaderFromFile function (#15794)
Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-10-05 07:58:27 -07:00
Klaus Post 3c605c93fe
warn when 0 parity has been set as default parity (#15790) 2022-10-04 22:41:42 -07:00
Anis Elleuch 121f18a443
Use admin request check for ReplicationDiff handler (#15793) 2022-10-04 17:47:31 -07:00
Harshavardhana 538aeef27a
fix: heal service accounts for LDAP users in site replication (#15785) 2022-10-04 10:41:47 -07:00
Poorna be0d2537b7
site replication: fix typo in meta collection (#15792) 2022-10-04 10:19:17 -07:00
Javier Adriel 3307aa1260
Implement KMS handlers (#15737) 2022-10-04 10:05:09 -07:00
Harshavardhana 57cfdfd8fb
remove 'perf' tests from health diagnostics (#15780) 2022-10-03 00:18:41 -07:00
Harshavardhana f696a221af
allow tagging policy condition for GetObject (#15777) 2022-10-02 12:29:29 -07:00
Harshavardhana 2aac50571d
fix: de-duplicate conflicting object names on namespace (#15772) 2022-09-30 15:44:21 -07:00
Shireesh Anjal 45edd27ad7
Re-load config after 'mc admin config reset' (#15771) 2022-09-30 10:55:53 -07:00
Daryl White d44f3526dc
Update links to documentation site (#15750) 2022-09-28 21:28:45 -07:00
Harshavardhana 41b633f5ea
support tagging based policy conditions (#15763) 2022-09-28 11:25:46 -07:00
Anis Elleuch 86bb48792c
non-blocking initialization of bucket target notifications (#15571) 2022-09-27 17:23:28 -07:00
Harshavardhana 94dbb4a427
fix: generalize SC config and also skip healing sub-sys under SD (#15757) 2022-09-26 09:04:54 -07:00
Anis Elleuch 048a46ec2a
Add RPC tcp timeout/errs and AVG duration to prometheus (#15747) 2022-09-26 09:04:26 -07:00
Poorna 8ea6fb368d
Add auto configuration of replication workers (#15636) 2022-09-24 16:20:28 -07:00
Harshavardhana b04c0697e1
validate correct ETag for the parts sent during CompleteMultipart (#15751) 2022-09-23 21:17:08 -07:00
Harshavardhana 50a8ba6a6f
fix: parse and save retainUntilDate in correct time format (#15741) 2022-09-23 08:49:27 -07:00
Anis Elleuch 20c89ebbb3
freeze before exit when _MINIO_DEBUG_NO_EXIT is defined (#15709)
this is to ensure keep k8s pods running, when they reach a "crashloop" stage
2022-09-22 11:57:27 -07:00
Krishnan Parthasarathi 6f56ba80b3
lifecycle: Assign unique id to rules with empty id (#15731) 2022-09-22 10:51:54 -07:00
Anis Elleuch 6e84283c66
fix: ignoring O_DIRECT in case of erasure single disk (#15734)
fixes #15733 
fixes #15735
2022-09-22 10:41:06 -07:00
Harshavardhana 9d6fddcfdf
persist the non-default creds in config (#15711) 2022-09-21 16:14:47 -07:00
jiuker 749ce107ee
fix: context leak with replication endpoint hearbeat (#15721) 2022-09-21 03:08:45 -07:00
Poorna aec2aa3497
site replication: clear config if remove --all specified (#15716) 2022-09-20 14:32:23 -07:00
Klaus Post ff12080ff5
Remove deprecated io/ioutil (#15707) 2022-09-19 11:05:16 -07:00
Minio Trusted d89f6af6c4 avoid replication stats crash in Prometheus 2022-09-16 17:09:45 -07:00
Harshavardhana 2c68a19dfd
upgrade all deps and update CREDITS (#15650) 2022-09-16 01:59:45 -07:00
Harshavardhana 9e5853ecc0
optimize double reads by reusing results from checkUploadIDExists() (#15692)
Move to using `xl.meta` data structure to keep temporary partInfo,
this allows for a future change where we move to different parts to
different drives.
2022-09-15 12:43:49 -07:00
Harshavardhana 124544d834
add pre-conditions support for PUT calls during replication (#15674)
PUT shall only proceed if pre-conditions are met, the new
code uses

- x-minio-source-mtime
- x-minio-source-etag

to verify if the object indeed needs to be replicated
or not, allowing us to avoid StatObject() call.
2022-09-14 18:44:04 -07:00
Poorna b910904fa6
change replication stats save path for windows (#15690) 2022-09-14 13:49:13 -07:00
Klaus Post eee1ce305c
When listing, do not count delete markers (#15689)
When limiting listing do not count delete, since they may be discarded.

Extend limit, since we may be discarding the forward-to marker.

Fix directories always being sent to resolve, since they didn't return as match.
2022-09-14 12:11:27 -07:00
Klaus Post 5c61c3ccdc
Fix flaky TestGetObjectWithOutdatedDisks (#15687)
On occasion this test fails:

```
2022-09-12T17:22:44.6562737Z === RUN   TestGetObjectWithOutdatedDisks
2022-09-12T17:22:44.6563751Z     erasure-object_test.go:1214: Test 2: Expected data to have md5sum = `c946b71bb69c07daf25470742c967e7c`, found `7d16d23f07072af1a809707ba101ae07`
2
```

Theory: Both objects are written with the same timestamp due to lower timer resolution on Windows. This results in secondary resolution, which is deterministic, but random.

Solution: Instead of hacking in a wait we request the specific version we want. Should still keep the test relevant.

Bonus: Remote action dependency for vulncheck
2022-09-14 08:17:39 -07:00
Poorna a0fb0c1835
panic if replication config could not be read from disk (#15685)
If replication config could not be read from bucket metadata for some
reason, issue a panic so that unexpected replication outcomes can
be avoided for replicated buckets.

For similar reasons, adding a panic while fetching object-lock config
if it failed for reason other than non-existence of config.
2022-09-13 21:23:33 -07:00
Aditya Manthramurthy e152b2a975
Pass groups claim into condition values (#15679)
This allows using `jwt:groups` as a multi-valued condition key in policies.
2022-09-13 09:45:36 -07:00
Poorna 6b9fd256e1
Persist in-memory replication stats to disk (#15594)
to avoid relying on scanner-calculated replication metrics.
This will improve the accuracy of the replication stats reported.

This PR also adds on to #15556 by handing replication
traffic that could not be queued by available workers to the 
MRF queue so that entries in `PENDING` status are healed faster.
2022-09-12 12:40:02 -07:00
Klaus Post ff9a74b91f
Add fast max-keys=1 support for Listing (#15670)
Add a listing option to stop when the limit is reached.  
This can be used by stateless listings for fast results.
2022-09-09 08:13:06 -07:00
Harshavardhana b579163802
limit number of buckets to 500k (#15668)
500k is a reasonable limit for any single MinIO
cluster deployment, in future we may increase this
value.

However for now we are going to keep this limit.
2022-09-09 03:06:34 -07:00
Krishnan Parthasarathi 96bfa77856
serialize updates to healing tracker (#15647)
When healing is parallelized by setting the ` _MINIO_HEAL_WORKERS` 
environment variable, multiple goroutines may race while updating the disk's 
healing tracker. This change serializes only these concurrent updates using a
channel. Note, the healing tracker is still not concurrency safe in other contexts.
2022-09-07 08:47:21 -07:00
Harshavardhana 8e997eba4a
fix: trigger Heal when xl.meta needs healing during PUT (#15661)
This PR is a continuation of the previous change instead
of returning an error, instead trigger a spot heal on the
'xl.meta' and return only after the healing is complete.

This allows for future GETs on the same resource to be
consistent for any version of the object.
2022-09-07 07:25:39 -07:00
Harshavardhana 228c6686f8
allow non-standards fallback for all http.TimeFormats (#15662)
fixes #15645
2022-09-07 07:24:54 -07:00
Harshavardhana 7776d064cf
allow non-standards fallback for Expires header (#15655)
fixes #15645
2022-09-05 19:18:18 -07:00
Harshavardhana 2d9b5a65f1
verify RenameData() versions to be consistent (#15649)
xl.meta gets written and never rolled back, however
we definitely need to validate the state that is
persisted on the disk, if there are inconsistencies

- more than write quorum we should return an error
  to the client

- if write quorum was achieved however there are
  inconsistent xl.meta's we should simply trigger
  an MRF on them
2022-09-05 16:51:37 -07:00
Shireesh Anjal c240da6568
Reuse madmin.ClusterRegistrationInfo (#15654)
The `clusterInfo` struct in admin-handlers is same as
madmin.ClusterRegistrationInfo, except for small differences in field
names.

Removing this and using madmin.ClusterRegistrationInfo in its place will
help in following ways:

- The JSON payload generated by mc in case of cluster registration will
  be consistent (same keys) with cluster.info generated by minio as part
  of the profile and inspect zip
- health-analyzer can parse the cluster.info using the same struct and
  won't have to define it's own
2022-09-05 10:02:25 -07:00
Harshavardhana 157272dc5b
fix: use optimized json.NewEncoder instead for metrics (#15648) 2022-09-05 08:06:35 -07:00
yudoutingle f4c56026a2
fix: potential deadLock caused by unlocking a non-existing lock (#15635) 2022-09-02 14:24:32 -07:00
Harshavardhana 37e3f5de10
do not print object not found errors in MRF healing (#15646) 2022-09-02 14:22:40 -07:00
Harshavardhana 5ea629beb2
avoid printing io.ErrUnexpectedEOF for .metacache objects (#15642) 2022-09-02 12:47:17 -07:00
Anis Elleuch cf52691959
Save resync status in the backend using a last update timestamp (#15638)
Currently, there is a short time window where the code is allowed 
to save the status of a replication resync. Currently, the window is
`now.Sub(st.EndTime) <= resyncTimeInterval`. Also, any failure to 
write in the backend disks is not retried.

Refactor the code a little bit to rely on the last timestamp of a
successful write of the resync status of any given bucket in the 
backend disks.
2022-09-01 16:53:36 -07:00
Anis Elleuch 10e75116ef
Avoid replicating dirs in listing with replication enabled (#15641)
When replication is enabled in a particular bucket, the listing will send
objects to bucket replication, but it is also sending prefixes for non
recursive listing which is useless and shows a lot of error logs.

This commit will ignore prefixes.
2022-09-01 15:22:11 -07:00
Harshavardhana f649968c69
tier: avoid stats infinite loop in forwardTo method (#15640)
under some sequence of events following code would
reach an infinite loop.

```
idx1, idx2 := 0, 1
for ; idx2 != idx1; idx2++ {
        fmt.Println(idx2)
}
```

fixes #15639
2022-09-01 13:51:06 -07:00
Harshavardhana bcedc2b0d9
fix: add healing metric type for heal tracing (#15631)
changes the `heal.checkBucket` to `heal.Bucket` instead
since the latter is more meaningful.
2022-08-31 12:28:03 -07:00
Klaus Post 8e4a45ec41
fix: encrypt checksums in metadata (#15620) 2022-08-31 08:13:23 -07:00
Klaus Post dec942beb6
feat: Add healing trace (#15616) 2022-08-31 01:56:12 -07:00
Abirdcfly d4e0f13bb3
chore: remove duplicate word in comments (#15607)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>

Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-08-30 08:26:43 -07:00
Anis Elleuch 1f28a3bb80
Avoid messages from go test output (#15601)
A lot of warning messages are printed in CI/CD failures generated by go
test. Avoid that by requiring at least Error level for logging when
doing go test.
2022-08-30 08:23:40 -07:00
Krishnan Parthasarathi 3a1d3a7952
audit-log: Add time to get/restore object from remote-tier (#15602) 2022-08-29 21:33:59 -07:00
Klaus Post a9f1ad7924
Add extended checksum support (#15433) 2022-08-29 16:57:16 -07:00
Poorna 929b9e164e
site replication: Avoid returning root svcacct info in sr metadata (#15608)
Service accounts of root users should not be replicated.
2022-08-29 11:19:51 -07:00
Harshavardhana 97376f6e8f
improve performance for inlined data (#15603)
inlined data often is bigger than the allowed
O_DIRECT alignment, so potentially we can write
'xl.meta' without O_DSYNC instead we can rely on
O_DIRECT + fdatasync() instead.

This PR allows O_DIRECT on inlined data that
would gain the benefits of performing O_DIRECT,
eventually performing an fdatasync() at the end.

Performance boost can be observed here for small
objects < 128KiB. The performance boost is mainly
seen on HDD, and marginal on NVMe setups.
2022-08-29 11:19:29 -07:00
Febriananda Wida Pramudita 1f22a16b15
fix: endpoints for single local disks must retain port info (#15585) 2022-08-26 12:53:15 -07:00
Harshavardhana 433b6fa8fe
upgrade golang-lint to the latest (#15600) 2022-08-26 12:52:29 -07:00
Krishnan Parthasarathi 99fbfe2421
Add concurrency to healing objects on a fresh disk (#15575) 2022-08-25 13:07:15 -07:00
Poorna b1b6264bea
fix: validate deployment id when adding peer clusters (#15591)
Fixes: #15573
2022-08-25 11:30:52 -07:00
Aditya Manthramurthy 18dffb26e7
Allow querying a single target in config get API (#15587) 2022-08-25 00:17:05 -07:00
Harshavardhana edba7c987b
fix: objects matching prefixes should not leave delete markers (#15586)
This is needed to ensure that we do not leave prefixes where
version is suspended, instead we never leave versions on
these paths.
2022-08-24 13:46:29 -07:00
Anis Elleuch b737c83a66
Ensure that only one node performs site replication healing (#15584)
When a node finds a change in the other replication cluster and applies
to itself will already notify other peers. No need for all nodes in a
given cluster to do site replication healing, only one node is
sufficient.
2022-08-24 13:46:09 -07:00
Anis Elleuch 97a6322de1
Fix regression in notifying peers about new policy mapping (#15583)
Switch from mux.Vars() to r.Form to avoid the issue of missing arguments
passed to LoadPolicyMappingHandler.
2022-08-24 12:34:52 -07:00
Klaus Post 037fe4afdc
Add listing block reuse (#15579)
When streaming results, pool metadata slices when sent.
2022-08-24 09:11:16 -07:00
Aditya Manthramurthy afbb63a197
Factor out external event notification funcs (#15574)
This change moves external event notification functionality into
`event-notification.go`. This simplifies notification related code.
2022-08-24 06:42:36 -07:00
Harshavardhana 8902561f3c
use new xxml for XML responses to support rare control characters (#15511)
use new xxml/XML responses to support rare control characters

fixes #15023
2022-08-23 17:04:11 -07:00
Anis Elleuch b8cdf060c8
Properly replicate policy mapping for virtual users (#15558)
Currently, replicating policy mapping for STS users does not work. Fix
it is by passing user type to PolicyDBSet.
2022-08-23 11:11:45 -07:00
Poorna 4155c5b695
replication: improve MRF healing. (#15556)
This PR improves the replication failure healing by persisting
most recent failures to disk and re-queuing them until the replication
is successful.

While this does not eliminate the need for healing during a full scan, 
queuing MRF vastly improves the ETA to keeping replicated buckets 
in sync as it does not wait for the scanner visit to detect unreplicated 
object versions.
2022-08-22 16:53:06 -07:00
Poorna 471467d310
fix: ensure metadata update happens after deletemarker replication (#15564)
Fixes regression caused by #15521
2022-08-22 15:59:06 -07:00
Harshavardhana ae4ee95d25
change default lock retry interval to 50ms (#15560)
competing calls on the same object on versioned bucket
mutating calls on the same object may unexpected have
higher delays.

This can be reproduced with a replicated bucket
overwriting the same object writes, deletes repeatedly.

For longer locks like scanner keep the 1sec interval
2022-08-19 16:21:05 -07:00
Harshavardhana e9055e9ef7
fix: walk() should cancel itself upon context cancellation (#15553)
This PR fixes possible leaks that may emanate from not
listening on context cancelation or timeouts.

```
goroutine 60957610 [chan send, 16 minutes]:
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1.1(...)
        github.com/minio/minio/cmd/erasure-server-pool.go:1724 +0x368
github.com/minio/minio/cmd.listPathRaw({0x4a9a740, 0xc0666dffc0},...
        github.com/minio/minio/cmd/metacache-set.go:1022 +0xfc4
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1.1()
        github.com/minio/minio/cmd/erasure-server-pool.go:1764 +0x528
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1
        github.com/minio/minio/cmd/erasure-server-pool.go:1697 +0x1b7
```
2022-08-18 17:49:08 -07:00
Harshavardhana d350b666ff
feat: add idempotent delete marker support (#15521)
The bottom line is delete markers are a nuisance,
most applications are not version aware and this
has simply complicated the version management.

AWS S3 gave an unnecessary complication overhead
for customers, they need to now manage these
markers by applying ILM settings and clean
them up on a regular basis.

To make matters worse all these delete markers
get replicated as well in a replicated setup,
requiring two ILM settings on each site.

This PR is an attempt to address this inferior
implementation by deviating MinIO towards an
idempotent delete marker implementation i.e
MinIO will never create any more than single
consecutive delete markers.

This significantly reduces operational overhead
by making versioning more useful for real data.

This is an S3 spec deviation for pragmatic reasons.
2022-08-18 16:41:59 -07:00
Harshavardhana 895357607a
avoid using errors.As for 'errors.New' use errors.Is (#15549)
Bonus: ignore coredns CVE, for now, there is no fix yet

https://github.com/coredns/coredns/issues/5574
2022-08-18 11:10:49 -07:00
Harshavardhana bf38c0c0d1
fix: increase concurrency of DeleteObjects() to N/10th (#15546)
instead of keeping the value 10 and static, make
the concurrency a function of incoming number of
objects being deleted.
2022-08-18 09:33:56 -07:00
Poorna 21fe14201f
replication: centralize healthcheck for remote targets (#15516)
This PR moves health check from minio-go client to being
managed on the server.

Additionally integrating health check into site replication
2022-08-16 17:46:22 -07:00
Harshavardhana 48640b1de2
fix: trim arn:aws:kms from incoming SSE aws-kms-key-id (#15540) 2022-08-16 11:28:30 -07:00
Anis Elleuch 5682685c80
Introduce disk io stats metrics (#15512) 2022-08-16 07:13:49 -07:00
Harshavardhana c7d535c648
init console after IAM init() (#15531)
fixes #15527
2022-08-13 12:54:41 -07:00
Aditya Manthramurthy 9986e103cf
Fix env var output in config get/export APIs (#15528)
Fix a bug where env vars are not output when the config for the
subsystem is specified solely via env vars.
2022-08-13 10:39:01 -07:00
Krishnan Parthasarathi 91e6af4470
Add trace support for decommissioning (#15502)
* Add trace support for decommissioning
* Add support for tracing errors during decommission
2022-08-10 12:46:45 -07:00
Shireesh Anjal 316c492842
Upgrade madmin-go to latest version (v1.4.15) (#15510) 2022-08-10 07:36:13 -07:00
Harshavardhana 74418b542a
fix: incorrect context timeout during listPath() (#15509)
This PR cleans up the listing code for single drive
to ensure that we do not add an incorrect context
timeout, while resuming the listing.

fixes #15508
2022-08-10 07:35:29 -07:00
Poorna 172e63dbb6
fix: site replication group updates to set status correctly (#15507)
Fixes: #15486
2022-08-09 15:17:43 -07:00
Poorna 21bf5b4db7
replication: heal proactively upon access (#15501)
Queue failed/pending replication for healing during listing and GET/HEAD
API calls. This includes healing of existing objects that were never
replicated or those in the middle of a resync operation.

This PR also fixes a bug in ListObjectVersions where lifecycle filtering
should be done.
2022-08-09 15:00:24 -07:00
Harshavardhana a406bb0288
restrict number of disks used for scanning buckets upto GOMAXPROCS (#15492)
control scanner parallelism to avoid higher CPU
usage on nodes that have more drives but an old CPU.
2022-08-08 16:16:44 -07:00
Harshavardhana 1823ab6808
LDAP/OpenID must be initialized IAM Init() (#15491)
This allows for LDAP/OpenID to be non-blocking,
allowing for unreachable Identity targets to be
initialized in IAM.
2022-08-08 16:16:27 -07:00
Harshavardhana 8eec49304d use logger.Info instead of logger.LogIf 2022-08-08 16:13:58 -07:00
Harshavardhana ecdc2f2f5f
fix: maxConcurrent '0' is an invalid value (#15500)
log and continue with defaults instead of
crashing the service.
2022-08-08 15:18:45 -07:00
Harshavardhana e178c55bc3
remove non-working GetRawData() from FS mode (#15498) 2022-08-08 11:34:09 -07:00
Poorna 2c137c0d04
fix: handle invalid endpoint errors in site replication(#15499)
fixes #15497
2022-08-08 11:12:05 -07:00
Harshavardhana 638c57e466 revert changes in FS implementation for umask
fixes #15494
2022-08-08 09:48:24 -07:00
Harshavardhana 5e4213b3be
fix: keep writing previous speedtest result (#15484)
when object speedtest is running keep writing
previous speedtest result back to client until
we have a new result - this avoids sending back
blank entries in between the speedtest when it
is running in 'autotune' mode.
2022-08-07 23:04:03 -07:00
Harshavardhana e0b0a351c6
remove IAM old migration code (#15476)
```
commit 7bdaf9bc50
Author: Aditya Manthramurthy <donatello@users.noreply.github.com>
Date:   Wed Jul 24 17:34:23 2019 -0700

    Update on-disk storage format for users system (#7949)
```

Bonus: fixes a bug when etcd keys were being re-encrypted.
2022-08-05 17:53:23 -07:00
Anis Elleuch 1d2ff46a89
Ensure lock/versioning permissions when creating a bucket (#15432)
Currently, the code doesn't check if the user creating a bucket with
locking feature has bucket locking and versioning permissions enabled,
adding it in accordance with S3 spec.

https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateBucket.html

Object Lock - If ObjectLockEnabledForBucket is set to true in your CreateBucket request,
s3:PutBucketObjectLockConfiguration and s3:PutBucketVersioning permissions are required.
2022-08-05 16:27:09 -07:00
Harshavardhana 8f7c739328
feat: add SpeedTest ResponseTimes and TTFB (#15479)
Capture average, p50, p99, p999 response times
and ttfb values. These are needed for latency
measurements and overall understanding of our
speedtest results.
2022-08-05 09:40:03 -07:00
Poorna 1beea3daba
fix: import bucket metadata import to return a summary (#15462) 2022-08-05 01:52:50 -07:00
Aditya Manthramurthy 3d94c38ec4
Add env variables to configuration APIs output (#15465)
Config export and config get APIs now include environment 
variables set on the server
2022-08-04 22:21:52 -07:00
Harshavardhana f4af2d3cdc
fix: decodeDirObject() in single drive DeleteObjects() call (#15477)
Thanks to @bh4t for reproducing this issue.
2022-08-04 18:57:43 -07:00
ebozduman b57e7321e7
Replaces 'disk'=>'drive' visible to end user (#15464) 2022-08-04 16:10:08 -07:00
Anis Elleuch e93867488b
actively cancel listIAMConfigItems to avoid goroutine leak (#15471)
listConfigItems creates a goroutine but sometimes callers will
exit without properly asking listAllIAMConfigItems() to stop sending
results, hence a goroutine leak.

Create a new context and cancel it for each listAllIAMConfigItems
call.
2022-08-04 13:20:43 -07:00
Harshavardhana 3bd9615d0e
fix: log if there is readDir() failure with ListBuckets (#15461)
This is actionable and must be logged.

Bonus: also honor umask by using 0o666 for all Open() syscalls.
2022-08-04 07:23:05 -07:00
Harshavardhana a6e0ec4e6f
Add support converting non-inlined to inlined (#15444)
This is a feature to allow for inode compaction on
large clusters that use a lot of small files spread
across a large heirarchy.
2022-08-02 23:10:22 -07:00
Andreas Auernhammer d774a3309b
kes: automatically reload KES client certificate (#15450)
This commit adds support for automatically reloading
the MinIO client certificate for authentication to KES.

The client certificate will now be reloaded:
 - when the private key / certificate file changes
 - when a SIGHUP signal is received
 - every 15 minutes

Fixes #14869

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-08-02 16:58:09 -07:00
Anis Elleuch b3edb25377
bloom: healObject to mark a path dirty only for dangling objects (#15458)
The path is marked dirty automatically when healObject() is called, which is
wrong. HealObject() is called during self-healing and this will lead to
an increase in the false positive result of the bloom filter.

Also move NSUpdated() from renameData() and call it directly in
CompleteMultipart and PutObject, this is not a functional change but
it will make it less prone to errors in the future.
2022-08-02 16:57:39 -07:00
Harshavardhana 53a816b17a
fix: readdir fallback on root of the drive (#15457)
fixes #15452
2022-08-02 14:57:36 -07:00
Harshavardhana 043aaa792d
fix: intrument os.OpenFile differently for Reads and Writes (#15449)
allows us to trace latency for READs or WRITEs
2022-08-01 13:22:43 -07:00
Shireesh Anjal e6eab2091f
fix: Incorrect ServersCount in cluster.info (#15431)
The `ServersCount` field in cluster.info is expected to contain the
number of nodes, and not number of endpoints.
2022-07-29 22:21:40 -07:00
Harshavardhana 3cdb609cca
allow root users to return appropriate policy in AccountInfo (#15437)
fixes #15436

This fixes a regression caused after the removal of "consoleAdmin"
policy usage for 'root users' in PR #15402
2022-07-29 20:58:03 -07:00
Harshavardhana aa874010e2
fix: regression in resolving the right versions (#15430)
fix: regression in resolving right versions

commit d480022711 caused a regression in real
resolver, by picking up incorrect versionID.
2022-07-29 10:03:53 -07:00
Cesar Celis Hernandez 8ec888d13d
feat: update binary once and push it to other servers (#15407) 2022-07-29 08:34:30 -07:00
Harshavardhana 916f274c83
choose starting concurrency based on number of local disks (#15428)
smaller setups may have less drives per server choosing
the concurrency based on number of local drives, and let
the MinIO server change the overall concurrency as
necessary.
2022-07-29 00:00:06 -07:00
Aditya Manthramurthy 7ac53c07af
fix: passing application configuration to console (#15409)
This is an update to MinIO server after swagger codegen related build
fixes added after issues introduced in 39fd7b0b3b
2022-07-28 18:30:24 -07:00
Harshavardhana bc72e4226e
do not allow filesystem fallback in server download (#15429)
It is possible for anyone with admin access to relatively
to get any content of any random OS location by simply
providing the file with 'mc admin update alias/ /etc/passwd`.

Workaround is to disable 'admin:ServiceUpdate' action. Everyone
is advised to upgrade to this patch.

Thanks to @alevsk for finding this bug.
2022-07-28 17:44:21 -07:00
Poorna 5e0776e96a
replication: Include replica object versions for resync (#15427) 2022-07-28 13:43:02 -07:00
Anis Elleuch 2f1ef02d35
Do not update directory access time (#15426)
Most setups will have relatime it only updates the access time 
following a change in the directory.
2022-07-28 12:40:48 -07:00
Harshavardhana aff236e20e
fix: cluster healthcheck for single drive setups (#15415)
single drive setups must return '200 OK' if
drive is accessible, current master returns '503'
2022-07-27 16:46:34 -07:00
Harshavardhana cbd70d26b5
optimize speedtest for smaller setups (#15414)
this has been observed in multiple environments
where the setups are small `speedtest` naturally
fails with default '10s' and the concurrency
of '32' is big for such clusters.

choose a smaller value i.e equal to number of
drives in such clusters and let 'autotune'
increase the concurrency instead.
2022-07-27 14:41:59 -07:00
Harshavardhana 5e763b71dc
use logger.LogOnce to reduce printing disconnection logs (#15408)
fixes #15334

- re-use net/url parsed value for http.Request{}
- remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649
- unwrapErrs upto leafErr to ensure that we store exactly the correct errors
2022-07-27 09:44:59 -07:00
Aditya Manthramurthy 7e4e7a66af
Remove internal usage of consoleAdmin (#15402)
"consoleAdmin" was used as the policy for root derived accounts, but this
lead to unexpected bugs when an administrator modified the consoleAdmin
policy

This change avoids evaluating a policy for root derived accounts as by
default no policy is mapped to the root user. If a session policy is
attached to a root derived account, it will be evaluated as expected.
2022-07-26 19:06:55 -07:00
Shireesh Anjal 906947a285
fix: typo in json key ClusterInfo DeploymentID (#15406)
deployement_id -> deployment_id
2022-07-26 19:05:33 -07:00
Poorna 426c902b87
site replication: fix healing of bucket deletes. (#15377)
This PR changes the handling of bucket deletes for site 
replicated setups to hold on to deleted bucket state until 
it syncs to all the clusters participating in site replication.
2022-07-25 17:51:32 -07:00
Anis Elleuch e4b51235f8
upgrade: Split in two steps to ensure a stable retry (#15396)
Currently, if one server in a distributed setup fails to upgrade 
due to any reasons, it is not possible to upgrade again unless 
nodes are restarted.

To fix this, split the upgrade process into two steps :

- download the new binary on all servers
- If successful, overwrite the old binary with the new one
2022-07-25 17:49:47 -07:00
Eng Zer Jun 0a3b1ad4eb
test: use `T.TempDir` to create temporary test directory (#15400)
This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, temporary directory created using `ioutil.TempDir`
needs to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-07-25 12:37:26 -07:00
Anis Elleuch f23f442d33
Add cluster info to inspect/profiling archive (#15360)
Add cluster info to inspect and profiling archive.

In addition to the existing data generation for both inspect and profiling,
cluster.info file is added. This latter contains some info of the cluster.
The generation of cluster.info is is done as the last step and it can fail
if it exceed 10 seconds.
2022-07-25 09:11:35 -07:00
Klaus Post 3795b2c8ba
Add compression scheme to header (#15395)
For easier debugging. We still do not return compressed size for security reasons.
2022-07-24 07:15:49 -07:00
Harshavardhana 7725425e05
fix: fork os.MkdirAll to optimize cases where parent exists (#15379)
a/b/c/d/ where `a/b/c/` exists results in additional syscalls
such as an Lstat() call to verify if the `a/b/c/` exists
and its a directory.

We do not need to do this on MinIO since the parent prefixes
if exist, we can simply return success without spending
additional syscalls.

Also this implementation attempts to simply use Access() calls
to avoid os.Stat() calls since the latter does memory allocation
for things we do not need to use.

Access() is simpler since we have a predictable structure on
the backend and we know exactly how our path structures are.
2022-07-24 00:43:11 -07:00
Aditya Manthramurthy 39fd7b0b3b
Pass multiple IDP config to console (#15270)
This change passes multiple IDP config via a struct 
rather than env variables.
2022-07-22 15:28:02 -07:00
Harshavardhana b0d70a0e5e
support additional claim info in Auditing STS calls (#15381)
Bonus: Adds a missing AuditLog from AssumeRoleWithCertificate API

Fixes #9529
2022-07-22 11:12:03 -07:00
Poorna 7d8c8de827
single drive: Remove bucket metadata on DeleteBucket (#15378)
from disk and in-memory map
2022-07-21 19:51:53 -07:00
jiuker 3faef829c5
expect full quorum for writing 'format.json' everywhere (#15362) 2022-07-21 18:04:17 -07:00
Poorna 7560fb6f9a
save IAM export assets relative at a folder prefix (#15355) 2022-07-21 17:51:33 -07:00
Klaus Post 69bf39f42e
fix: make complete multipart uploads faster encrypted/compressed backends (#15375)
- Only fetch the parts we need and abort as soon as one is missing.
- Only fetch the number of parts requested by "ListObjectParts".
2022-07-21 16:47:58 -07:00
Minio Trusted 564a0afae1 Revert "tests: Add context cancelation (#15374)"
This reverts commit 1e332f0eb1.

Reverting this as tests are failing randomly.
2022-07-21 13:58:56 -07:00
Klaus Post 1e332f0eb1
tests: Add context cancelation (#15374)
A huge number of goroutines would build up from various monitors

When creating test filesystems provide a context so they can shut down when no longer needed.
2022-07-21 11:52:18 -07:00
Poorna cab8d3d568
feat: add API to return list of objects waiting to be replicated (#15091) 2022-07-21 11:05:44 -07:00
Klaus Post be8c4cb24a
fix: support multiple validateAdminReq actions (#15372)
handle multiple validateAdminReq actions and remove duplicate error responses.
2022-07-21 10:26:59 -07:00
Harshavardhana 65166e4ce4
fix: readQuorum calculation when defaultParityCount is 0 (#15363)
when parity is '0' the readQuorum must be equal
to the number of data disks.
2022-07-21 07:25:54 -07:00
Harshavardhana d3f89fa6e3
remove unnecessary logs in IAM store (#15356) 2022-07-20 08:19:12 -07:00
Harshavardhana ce8397f7d9
use partInfo only for intermediate part.x.meta (#15353) 2022-07-19 18:56:24 -07:00
Klaus Post cae9aeca00
fix: reused field crash in PartIndices (#15351)
`PartIndices` may be set if xlMetaV2Version is reused.

Clear before unmarshaling and add sanity check when reading.
2022-07-19 16:49:46 -07:00
Klaus Post f939d1c183
Independent Multipart Uploads (#15346)
Do completely independent multipart uploads.

In distributed mode, a lock was held to merge each multipart 
upload as it was added. This lock was highly contested and 
retries are expensive (timewise) in distributed mode.

Instead, each part adds its metadata information uniquely. 
This eliminates the per object lock required for each to merge.
The metadata is read back and merged by "CompleteMultipartUpload" 
without locks when constructing final object.

Co-authored-by: Harshavardhana <harsha@minio.io>
2022-07-19 08:35:29 -07:00
Andreas Auernhammer 242d06274a
kms: add `context.Context` to KMS API calls (#15327)
This commit adds a `context.Context` to the
the KMS `{Stat, CreateKey, GenerateKey}` API
calls.

The context will be used to terminate external calls
as soon as the client requests gets canceled.

A follow-up PR will add a `context.Context` to
the remaining `DecryptKey` API call.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-07-18 18:54:27 -07:00
Poorna 957e3ed729
export IAM: include site replicator svcacct (#15339) 2022-07-18 17:38:53 -07:00
Harshavardhana b6eb8dff64
Add decommission compression+encryption enabled tests (#15322)
update compression environment variables to follow
the expected sub-system style, however support fallback
mode.
2022-07-17 08:43:14 -07:00
Harshavardhana 7da9e3a6f8
support encrypted/compressed objects properly during decommission (#15320)
fixes #15314
2022-07-16 19:35:24 -07:00
Anis Elleuch 876970baea
Exclude upload-ids with incomplete part upload in multipart listing (#15318)
Uploading a part object can leave an inconsistent state inside
.minio.sys/multipart where data are uploaded but xl.meta is not
committed yet.

Do not list upload-ids that have this state in the multipart listing.
2022-07-16 13:25:58 -07:00
LHHDZ e68e76e143
fix: data race, which caused tests execution to fail (#15313) 2022-07-16 07:57:55 -07:00
Harshavardhana e7ac1ea54c
allow decommission to continue when healing (#15312)
Bonus:

- heal buckets in-case during startup the new
  pools have bucket missing.
2022-07-15 21:03:23 -07:00
Harshavardhana 5ac6d91525
support 'admin update' for hotfix versions (#15308)
hotfixed versions are rejected as invalid,
allow `mc admin update` from hotfix repos.
2022-07-15 16:00:34 -07:00
Harshavardhana 1cd6713e24
copy query values before update to preserve the expected keys (#15310)
in success_action_redirect we were missing required
query params as per S3 spec - updated tests.
2022-07-15 15:04:48 -07:00
Harshavardhana 1b339ea062
allow force delete on decom pool (#15302)
Bonus:

- skip suspended pool from being
  considered for multipart uploads

- add more context for decomErrors()
2022-07-14 20:44:22 -07:00
Harshavardhana 236ef03dbd
fix: skip objects expired via lifecycle rules during decommission (#15300) 2022-07-14 16:47:09 -07:00
Poorna 7e32a17742
fix: site replication healing of missing buckets (#15298)
fixes a regression from #15186

- Adding tests to cover healing of buckets.
- Also dereference quota in SiteReplicationStatus only when non-nil
2022-07-14 14:27:47 -07:00
Krishnan Parthasarathi 1d42133d44
listing: Expire object versions past expiry (#15287)
We skip object versions which are past their ILM expiry. This change schedules
them for expiry while at it.
2022-07-14 07:21:26 -07:00
Poorna b4f6901903
resync: Avoid concurrent access/write on map (#15286)
fixes a crash

```
fatal error: concurrent map iteration and map write
minio[19309]: goroutine 18640 [running]:
minio[19309]: runtime.throw({0x27a3399?, 0x1785?})
minio[19309]: runtime/panic.go:992 +0x71 fp=0xc0062f1c80 sp=0xc0062f1c50 pc=0x438671
minio[19309]: runtime.mapiternext(0xc0062f1e90?)
minio[19309]: runtime/map.go:871 +0x4eb fp=0xc0062f1cf0 sp=0xc0062f1c80 pc=0x41002b
minio[19309]: github.com/minio/minio/cmd.(*ReplicationPool).periodicResyncMetaSave(0xc0056c00c0, {0x4d06a48, 0xc0005b2480}, {0x4d22fc0, 0xc0015ea0
```
2022-07-13 16:29:10 -07:00
Klaus Post 0149382cdc
Add padding to compressed+encrypted files (#15282)
Add up to 256 bytes of padding for compressed+encrypted files.

This will obscure the obvious cases of extremely compressible content 
and leave a similar output size for a very wide variety of inputs.

This does *not* mean the compression ratio doesn't leak information 
about the content, but the outcome space is much smaller, 
so often *less* information is leaked.
2022-07-13 07:52:15 -07:00
Klaus Post 697c9973a7
Upgrade compression package (#15284)
Includes mitigation for CVE-2022-30631 (Go should still be updated)

Remove functions now available upstream.
2022-07-13 07:48:14 -07:00
Harshavardhana 788fd3df81
preserve incoming query params in success_action_redirect (#15280)
fixes #15274
2022-07-13 07:46:44 -07:00
Anis Elleuch 996cac5fed
Avoid listing buckets from a suspended pool (#15283)
Make bucket requests sent after decommissioning is started are not
created in a suspended pool. Therefore listing buckets should avoid
suspended pools as well.
2022-07-13 07:44:50 -07:00
Harshavardhana 0a8b78cb84
fix: simplify passing auditLog eventType (#15278)
Rename Trigger -> Event to be a more appropriate
name for the audit event.

Bonus: fixes a bug in AddMRFWorker() it did not
cancel the waitgroup, leading to waitgroup leaks.
2022-07-12 10:43:32 -07:00
Harshavardhana b4eb74f5ff
allow custom speedtest bucket (#15271)
this allows for specifying existing buckets with

- object replication enabled
- object encryption enabled
- object versioning enabled
- object locking enabled
2022-07-12 10:12:47 -07:00
Anis Elleuch 57d1f31054
Do not log erasure read failure when disk goes offline (#15277)
Avoid printing the following log

```
API: SYSTEM
Time: Fri Jul 08 2022 11:48:40 GMT+0100
Error: Error(disk not found) reading erasure shards at...

Backtrace:
0: internal/logger/logger.go:278:logger.LogIf()
1: cmd/bitrot-streaming.go:156:cmd.(*streamingBitrotReader).ReadAt()
2: cmd/erasure-decode.go:165:cmd.(*parallelReader).Read.func1()
```
2022-07-12 09:56:56 -07:00
Klaus Post 9f02f51b87
Add 4K minimum compressed size (#15273)
There is no point in compressing very small files.

Typically the effective size on disk will be the same due to disk blocks.

So don't waste resources on extremely small files.

We don't check on multipart. 1) because we don't know and 2) this is very likely a big object anyway.
2022-07-12 07:42:04 -07:00
Klaus Post 911a17b149
Add compressed file index (#15247) 2022-07-11 17:30:56 -07:00
Poorna 3d969bd2b4
fix: ignore missing targets/replication config during site removal (#15269) 2022-07-11 14:11:46 -07:00
Andreas Auernhammer f800cee4fa
metric: add KMS-related metrics (#15258)
This commit adds a minimal set of KMS-related metrics:
```
 # HELP minio_cluster_kms_online Reports whether the KMS is online (1) or offline (0)
 # TYPE minio_cluster_kms_online gauge
 minio_cluster_kms_online{server="127.0.0.1:9000"} 1
 # HELP minio_cluster_kms_request_error Number of KMS requests that failed with a well-defined error
 # TYPE minio_cluster_kms_request_error counter
 minio_cluster_kms_request_error{server="127.0.0.1:9000"} 16790
 # HELP minio_cluster_kms_request_success Number of KMS requests that succeeded
 # TYPE minio_cluster_kms_request_success counter
 minio_cluster_kms_request_success{server="127.0.0.1:9000"} 348031
```

Currently, we report whether the KMS is available and how many requests
succeeded/failed. However, KES exposes much more metrics that can be
exposed if necessary. See: https://pkg.go.dev/github.com/minio/kes#Metric

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-07-11 09:17:28 -07:00
Praveen raj Mani b49fc33cb3
purge objects immediately with `x-minio-force-delete` in DeleteObject and DeleteBucket API (#15148) 2022-07-11 09:15:54 -07:00
Klaus Post 37a6b2da67
Allow compaction at bucket top level. (#15266)
If more than 1M folders (objects or prefixes) are found at the top level in a bucket allow it to be compacted.

While very suboptimal structure we should limit memory usage at some point.
2022-07-11 07:59:03 -07:00
Harshavardhana 913e977c8d
remove auto-port warning for console-address (#15260) 2022-07-08 13:36:41 -07:00
Harshavardhana c2ddcb3b40
do not recreate deprecated delete-journal.bin, only read it (#15185)
simplify deprecated code, re-enable hot-swap replace disks
2022-07-08 12:17:02 -07:00
Anis Elleuch ed0cbfb31e
fix: rootdisk detection by not using cached value when GetDiskInfo() errors out (#15249)
GetDiskInfo() uses timedValue to cache the disk info for one second.

timedValue behavior was recently changed to return an old cached value
when calculating a new value returns an error.

When a mount point is empty, GetDiskInfo() will return errUnformattedDisk,
timedValue will return cached disk info with unexpected IsRootDisk value,
e.g. false if the mount point belongs to a root disk. Therefore, the mount
point will be considered a valid disk and will be formatted as well.

This commit will also add more defensive code when marking root disks:
always mark a disk offline for any GetDiskInfo() error except
errUnformattedDisk. The server will try anyway to reconnect to those
disks every 10 seconds.
2022-07-07 17:05:23 -07:00
Harshavardhana 32b2f6117e
fix: do not pass around sync.Map (#15250)
it is not safe to pass around sync.Map
through pointers, as it may be concurrently
updated by different callers.

this PR simplifies by avoiding sync.Map
altogether, we do not need sync.Map
to keep object->erasureMap association.

This PR fixes a crash when concurrently
using this value when audit logs are
configured.

```
fatal error: concurrent map iteration and map write

goroutine 247651580 [running]:
runtime.throw({0x277a6c1?, 0xc002381400?})
        runtime/panic.go:992 +0x71 fp=0xc004d29b20 sp=0xc004d29af0 pc=0x438671
runtime.mapiternext(0xc0d6e87f18?)
        runtime/map.go:871 +0x4eb fp=0xc004d29b90 sp=0xc004d29b20 pc=0x41002b
```
2022-07-07 17:04:25 -07:00
Harshavardhana ae92521310
remove unnecessary nAgreed value in partial() func (#15242) 2022-07-07 13:45:34 -07:00
Harshavardhana 5802df4365
retry and resume decom operation upon retriable failures (#15244)
it is possible in a k8s-like system reading pool.bin
might not have quorum during startup, however, add
a way to retry after this failure.
2022-07-07 12:31:44 -07:00
Anis Elleuch 8d98282afd
Better reporting of total/free usable capacity of the cluster (#15230)
The current code uses approximation using a ratio. The approximation 
can skew if we have multiple pools with different disk capacities.

Replace the algorithm with a simpler one which counts data 
disks and ignore parity disks.
2022-07-06 13:29:49 -07:00
Harshavardhana 3af6073576
no 'replicate status' without replication config (#15233)
'replicate status' shouldn't be displaying historic
values unless replication config is present on the
relevant bucket.
2022-07-06 09:53:33 -07:00
Harshavardhana 2518af5f9e
fix: allow certain mutations on objects during decommissioning (#15231)
fix: allow certain mutation on objects during decommission

currently by mistake deletion of objects was skipped,
if the object resided on the pool being decommissioned.

delete's are okay to be allowed since decommission is
designed to run on a cluster with active I/O.
2022-07-06 09:53:16 -07:00
Harshavardhana 7b793d84c8
fix: calculate scanner metric paths for single drive (#15232)
Additionally use pathJoin() to avoid double `//`
in path names.
2022-07-06 07:48:38 -07:00
Aditya Manthramurthy af9bc7ea7d
Add external IDP management Admin API for OpenID (#15152) 2022-07-05 18:18:04 -07:00
Klaus Post ac055b09e9
Add detailed scanner metrics (#15161) 2022-07-05 14:45:49 -07:00
haslersn df42914da6
Fix missing whitespace in error message for IncompleteBody (#15227) 2022-07-05 12:19:57 -07:00
Klaus Post 2471bdda00
fix: for DiskInfo call cache disk metrics (#15229)
Small uploads spend a significant amount of time (~5%) fetching disk info metrics. Also maps are allocated for each call.

Add a 100ms cache to disk metrics.
2022-07-05 11:02:30 -07:00
Harshavardhana 9d80ff5a05
fix: decommission delete markers for non-current objects (#15225)
versioned buckets were not creating the delete markers
present in the versioned stack of an object, this essentially
would stop decommission to succeed.

This PR fixes creating such delete markers properly during
a decommissioning process, adds tests as well.
2022-07-05 07:37:24 -07:00
Harshavardhana b311abed31
decom IAM, Bucket metadata properly (#15220)
Current code incorrectly passed the
config asset object name while decommissioning,
make sure that we pass the right object name
to be hashed on the newer set of pools.

This PR fixes situations after a successful
decommission, the users and policies might go
missing due to wrong hashed set.
2022-07-04 14:02:54 -07:00
Harshavardhana ce667ddae0
do not print errFileNotFound in entries.resolve() (#15216) 2022-07-04 06:40:46 -07:00
Harshavardhana 0fee993a4b
return appropriate error under 'decom status' (#15213)
fixes #15208
2022-07-01 16:21:23 -07:00
Poorna 0ea5c9d8e8
site healing: Skip stale iam asset updates from peer. (#15203)
Allow healing to apply IAM change only when peer
gave the most recent update.
2022-07-01 13:19:13 -07:00
Harshavardhana 63ac260bd5
Simplify Prometheus metrics gather (#15210) 2022-07-01 13:18:39 -07:00
Harshavardhana f9a4ad7904
update banner with version+runtime (#15206) 2022-06-30 13:58:09 -07:00
Minio Trusted e60b67d246 Revert "Tighten enforcement of object retention (#14993)"
This reverts commit 5e3010d455.

This commit causes regression on object locked buckets causine
delete-markers to be not created.
2022-06-30 13:06:32 -07:00
Klaus Post 9004d69c6f
Make ReqInfo concurrency safe (#15204)
Some read/writes of ReqInfo did not get appropriate locks, leading to races.

Make sure reading and writing holds appropriate locks.
2022-06-30 10:48:50 -07:00
Harshavardhana 8856a2d77b
finalize startup-banner and remove unnecessary logs (#15202) 2022-06-29 16:32:04 -07:00
Anis Elleuch 54a061bdda
Save minio version information centrally (#15181) 2022-06-29 14:45:49 -07:00
Poorna 7cc9286e0f
site healing: Skip stale bucket metadata updates from peer (#15186)
Allow healing to apply bucket metadata change only when peer
gave the most recent update.
2022-06-28 18:09:20 -07:00
Harshavardhana 2f25639ea0
update banner to reflect the final agreed UI (#15192) 2022-06-28 16:37:40 -07:00
Harshavardhana 2070c215a2
handle missing funcNames for handlers (#15188)
also use designated names for internal
calls

- storageREST calls are storageR
- lockREST calls are lockR
- peerREST calls are just peer

Named in this fashion to facilitate wildcard matches
by having prefixes of the same name.

Additionally, also enable funcNames for generic handlers
that return errors, currently we disable '<unknown>'
2022-06-28 05:04:10 -07:00
Harshavardhana 9c605ad153
allow support for parity '0', '1' enabling support for 2,3 drive setups (#15171)
allows for further granular setups

- 2 drives (1 parity, 1 data)
- 3 drives (1 parity, 2 data)

Bonus: allows '0' parity as well.
2022-06-27 20:22:18 -07:00
Anis Elleuch b7c7e59dac
Revert proxying requests with precondition errors (#15180)
In a replicated setup, when an object is updated in one cluster but
still waiting to be replicated to the other cluster, GET requests with
if-match, and range headers will likely fail. It is better to proxy
requests instead.

Also, this commit avoids printing verbose logs about precondition &
range errors.
2022-06-27 14:03:44 -07:00
Harshavardhana 699cf6ff45
perform object sweep after equeue the latest CopyObject() (#15183)
keep it similar to PutObject/CompleteMultipart
2022-06-27 12:11:33 -07:00
Anis Elleuch 9201870f6c
Remove unnecessary code in WalkDir() (#15168)
Recalculating forward is useless. It is never used and it will be
computed again when calling scanDir() again.
2022-06-27 10:26:56 -07:00
Harshavardhana 6722f58668
save MinIO version with each version (8-bytes extra) (#15170)
store MinIO version along with each version in 'xl.meta'
for future purposes, can be used as ways to add specific
code for bug fixes if any.
2022-06-27 03:59:41 -07:00
Harshavardhana 7b9b7cef11
add license banner for GNU AGPLv3 (#15178)
Bonus: rewrite subnet re-use of Transport
2022-06-27 03:58:25 -07:00
Harshavardhana bd099f5e71
fix: change timedValue to return the previously cached value (#15169)
fix: change timedvalue to return previous cached value

caller can interpret the underlying error and decide
accordingly, places where we do not interpret the
errors upon timedValue.Get() - we should simply use
the previously cached value instead of returning "empty".

Bonus: remove some unused code
2022-06-25 08:50:16 -07:00
Klaus Post baf257adcb
fix: health client leak when calling UpdateAllTargets (#15167)
When `LoadBucketMetadataHandler` is called and `UpdateAllTargets` gets called.

Since targets are rebuilt we cancel all.
2022-06-24 11:12:52 -07:00
Anis Elleuch 4fd1986885
Trace all http requests (#15064)
Add a generic handler that adds a new tracing context to the request if
tracing is enabled. Other handlers are free to modify the tracing
context to update information on the fly, such as, func name, enable
body logging etc..

With this commit, requests like this 

```
curl -H "Host: ::1:3000" http://localhost:9000/
```

will be traced as well.
2022-06-23 23:19:24 -07:00
Harshavardhana e1afac9439
reduce sha256 CPU usage by turning it off for speedtests (#15154)
continuation of the PR #15151, keeping signature v4 for
the headers however avoiding sha256 for the body.
2022-06-23 11:26:53 -07:00
Poorna 580d9db85e
Add APIs to import/export IAM data (#15014) 2022-06-23 09:25:15 -07:00
Anis Elleuch 42e2fd35d8
heal: Include dir markers when healing a fresh disk (#15158)
Directories markers are not healed when healing a new fresh disk. A
a proper fix would be moving object names encoding/decoding to erasure
object level but it is too late now since the object to set distribution is
calculated at a higher level.
2022-06-23 06:47:33 -07:00
Harshavardhana 1a40c7c27c
use signature-v2 for 'object perf' tests to avoid CPU using sha256 (#15151)
It is observed in a local 8 drive system the CPU seems to be
bottlenecked at

```
(pprof) top
Showing nodes accounting for 1385.31s, 88.47% of 1565.88s total
Dropped 1304 nodes (cum <= 7.83s)
Showing top 10 nodes out of 159
      flat  flat%   sum%        cum   cum%
      724s 46.24% 46.24%       724s 46.24%  crypto/sha256.block
   219.04s 13.99% 60.22%    226.63s 14.47%  syscall.Syscall
   158.04s 10.09% 70.32%    158.04s 10.09%  runtime.memmove
   127.58s  8.15% 78.46%    127.58s  8.15%  crypto/md5.block
    58.67s  3.75% 82.21%     58.67s  3.75%  github.com/minio/highwayhash.updateAVX2
    40.07s  2.56% 84.77%     40.07s  2.56%  runtime.epollwait
    33.76s  2.16% 86.93%     33.76s  2.16%  github.com/klauspost/reedsolomon._galMulAVX512Parallel84
     8.88s  0.57% 87.49%     11.56s  0.74%  runtime.step
     7.84s   0.5% 87.99%      7.84s   0.5%  runtime.memclrNoHeapPointers
     7.43s  0.47% 88.47%     22.18s  1.42%  runtime.pcvalue
```

Bonus changes:

- re-use transport for bucket replication clients, also site replication clients.
- use 32KiB buffer for all read and writes at transport layer seems to help
  TLS read connections.
- Do not have 'MaxConnsPerHost' this is problematic to be used with net/http
  connection pooling 'MaxIdleConnsPerHost' is enough.
2022-06-22 16:28:25 -07:00
Poorna cb097e6b0a
CopyObject: fix read/write err on closed pipe (#15135)
Fixes: #15128
Regression from PR#14971
2022-06-21 19:20:11 -07:00
Poorna 1cfb03fb74
replication: Avoid proxying when precondition failed (#15134)
Proxying is not required when content is on this cluster and
does not meet pre-conditions specified in the request.

Fixes #15124
2022-06-21 14:11:35 -07:00
Harshavardhana f293df647c
s3/zip: extract metadata properly for Zipped objects (#15123)
s3/zip: extra metadata properly for Zipped objects

fixes #15121
2022-06-21 14:11:12 -07:00
sota e2e5bd6f19
fix: cant parse comment without '=' in environment file (#15130) 2022-06-21 10:37:15 -07:00
Andreas Auernhammer cd7a0a9757
fips: simplify TLS configuration (#15127)
This commit simplifies the TLS configuration.
It inlines the FIPS / non-FIPS code.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-06-21 07:54:48 -07:00
Anis Elleuch b3eda248a3
Parallelize new disks healing of different erasure sets (#15112)
- Always reformat all disks when a new disk is detected, this will
  ensure new uploads to be written in new fresh disks
- Always heal all buckets first when an erasure set started to be healed
- Use a lock to prevent two disks belonging to different nodes but in
  the same erasure set to be healed in parallel
- Heal different sets in parallel

Bonus:
- Avoid logging errUnformattedDisk when a new fresh disk is inserted but
  not detected by healing mechanism yet (10 seconds lag)
2022-06-21 07:53:55 -07:00
Harshavardhana 486888f595
remove gateway banner and some other TODO loggers (#15125) 2022-06-21 05:25:40 -07:00
Poorna b3ebc69034
improve error message for bucket metadata export/import API (#15120) 2022-06-20 16:13:45 -07:00
Harshavardhana 761dde2f1b
fix: add 'mc support inspect' support for single drive deployment (#15122) 2022-06-20 16:11:19 -07:00
Harshavardhana 2bb6a3f4d0
cleanup site replication error handling (#15113)
site replication errors were printed at
various random locations, repeatedly - this
PR attempts to remove double logging and
capture all of them at a common place.

This PR also enhances the code to show
partial success and errors as well.
2022-06-20 10:48:11 -07:00
Anis Elleuch 73733a8fb9
heal: Report correctly in multip-pools setup (#15117)
`mc admin heal -r <alias>` in a multi setup pools returns incorrectly
grey objects. The reason is that erasure-server-pools.HealObject() runs
HealObject in all pools and returns the result of the first nil
error. However, in the lower erasureObject level, HealObject() returns
nil if an object does not exist + missing error in each disk of the object
in that pool, therefore confusing mc.

Make erasureObject.HealObject() to return not found error in the lower
level, so at least erasureServerPools will know what pools to ignore.
2022-06-20 08:07:45 -07:00
Poorna 2fa1d8ac48
Add import/export APIs to migrate bucket metadata (#14929) 2022-06-18 06:55:39 -07:00
Poorna 8b9a19eef1
fix: typo in site replication version healing (#15103) 2022-06-17 16:43:24 -07:00
Aditya Manthramurthy 7f629df4d5
Add generic function to retrieve config value with metadata (#15083)
`config.ResolveConfigParam` returns the value of a configuration for any
subsystem based on checking env, config store, and default value. Also returns info
about which config source returned the value.

This is useful to return info about config params overridden via env in the user
APIs. Currently implemented only for OpenID subsystem, but will be extended for
others subsequently.
2022-06-17 11:39:21 -07:00
Anis Elleuch 98ddc3596c
Avoid CompleteMultipart freeze with unexpected network issue (#15102)
If sending a white space during a long S3 handler call fails,
the whitespace goroutine forgets to return a result to the caller.
Therefore, the complete multipart handler will be blocked.

Remember to send the header written result to the caller 
or/and close the channel.
2022-06-17 10:41:25 -07:00
Harshavardhana 5d23be6242
fix: ignore printing io.EOF during WalkDir() on concurrently modified objects (#15100)
fix: ignore print io.EOF during WalkDir() on concurrently modified objects
2022-06-17 08:23:47 -07:00
Poorna 55ee94bed0
initialize site replication subsys after loading metadata (#15099) 2022-06-16 19:00:35 -07:00
Harshavardhana d228d29944
update '-v' flag behavior to include copyRight and license (#15097)
```
~ minio -v
minio version DEVELOPMENT.2022-06-16T20-40-14Z (commit-id=e083228e2a06bfdcd006fee28d449cd2b47c542a)
Runtime: go1.18.3 linux/amd64
Copyright (c) 2015-2022 MinIO, Inc.
Licence AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
```
2022-06-16 16:10:48 -07:00
Harshavardhana 013cc66d8e
add dataErrs for healing debug log (#15092) 2022-06-16 09:42:45 -07:00
Harshavardhana c7ed6eee5e
fix: background local test also via channel (#15086)
current implementation for `standalone` setups
was blocking the `perf drive`.

Bonus: remove all old unused complicated code.
2022-06-15 14:51:42 -07:00
Harshavardhana 8082d1fed6
add bucket level S3 received/sent bytes (#15084)
adds bucket level metrics for bytes received and sent bytes on all S3 API calls.
2022-06-14 15:14:24 -07:00
Harshavardhana d2a10dbe69
fix: simplify healthcheck code to freeze calls only once (#15082)
- currently subnet health check was freezing and calling
  locks at multiple locations, avoid them.

- throw errors if first attempt itself fails with no results
2022-06-14 11:22:07 -07:00
Anis Elleuch 14645142db
erasure-sd: Evaluate versioning Prefix in multi-delete objects (#15081)
Erasure SD DeleteObjects() is only inheriting bucket versioning status
from the handler layer.

Add the missing versioning prefix evaluation for each object that will
deleted.
2022-06-14 10:05:12 -07:00
Anis Elleuch 0d00f3a55b
kms: initialize after cli parsing (#15076)
KMS depends on the --certs-dir flag. 

Ensure KMS is initialized after loading the flag.
2022-06-13 13:06:13 -07:00
Anis Elleuch dd53b287f2
sts: Avoid printing all STS errors (#15065)
Limit printing STS errors to 

- STS internal error
- STS not initialized
- STS upstream error
2022-06-11 12:55:32 -07:00
Harshavardhana 7413045f0e
fix: add missing minio_s3_requests_total (#15070)
PR #15052 caused a regression, add the missing metrics back.

Bonus:

- internode information should be only for distributed setups 
- update the dashboard to include 4xx and 5xx error panels.
2022-06-11 00:50:31 -07:00
Harshavardhana af1944f28d
support reading systemctl config automatically on baremetal setups (#15066)
this allows for customers to use `mc admin service restart`
directly even when performing RPM, DEB upgrades. Upon such 'restart'
after upgrade MinIO will re-read the /etc/default/minio for any
newer environment variables.

As long as `MINIO_CONFIG_ENV_FILE=/etc/default/minio` is set, this
is honored.
2022-06-10 09:59:15 -07:00
Harshavardhana 214ea14f29
fix: for frozen calls return if client disconnects (#15062) 2022-06-09 05:06:47 -07:00
Anis Elleuch 5fb420c703
prometheus: Add S3 4xx and 5xx S3 monitoring (#15052)
Currently minio_s3_requests_errors_total covers 4xx and 
5xx S3 responses which can be confusing when s3 applications 
sent a lot of HEAD requests with obvious 404 responses or 
when the replication is enabled.

Add 
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total

to help users monitor 4xx and 5xx HTTP status codes separately.
2022-06-08 11:22:34 -07:00
Harshavardhana 2420f6c000
fix: make metrics endpoint responsive by reducing the chatter (#15055)
peerOnlineCounter was making NxN calls to many peers, this
can be really long and tedious if there are random servers
that are going down.

Instead we should calculate online peers from the point of
view of "self" and return those online and offline appropriately
by performing a healthcheck.
2022-06-08 02:43:13 -07:00
Harshavardhana b0d7332a0c
healthcheck cluster endpoint should honor write/readQuorum per pool (#15053) 2022-06-07 19:08:21 -07:00
Harshavardhana d55efc791f
relax O_DIRECT in single drive mode if unsupported (#15045) 2022-06-07 06:44:01 -07:00
Minio Trusted e2d4d097e7 do not print errors upon 'nil' err 2022-06-06 17:33:41 -07:00
Shireesh Anjal 4ce81fd07f
Add periodic callhome functionality (#14918)
* Add periodic callhome functionality

Periodically (every 24hrs by default), fetch callhome information and
upload it to SUBNET.

New config keys under the `callhome` subsystem:

enable - Set to `on` for enabling callhome. Default `off`
frequency - Interval between callhome cycles. Default `24h`

* Improvements based on review comments

- Update `enableCallhome` safely
- Rename pctx to ctx
- Block during execution of callhome
- Store parsed proxy URL in global subnet config
- Store callhome URL(s) in constants
- Use existing global transport
- Pass auth token to subnetPostReq
- Use `config.EnableOn` instead of `"on"`

* Use atomic package instead of lock

* Use uber atomic package

* Use `Cancel` instead of `cancel`

Co-authored-by: Harshavardhana <harsha@minio.io>

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2022-06-06 16:14:52 -07:00
Harshavardhana df9eeb7f8f
fix: do not log concurrently when multiple disks return errors (#15044)
since the values inside 'context' are mutated internally by
logger, make sure to log serially upon errors not concurrently.
2022-06-06 15:15:11 -07:00
Harshavardhana 31c4fdbf79
fix: resyncing 'null' version on pre-existing content (#15043)
PR #15041 fixed replicating 'null' version however
due to a regression from #14994 caused the target
versions for these 'null' versioned objects to have
different 'versions', this may cause confusion with
bi-directional replication and cause double replication.

This PR fixes this properly by making sure we replicate
the correct versions on the objects.
2022-06-06 15:14:56 -07:00
Harshavardhana 48e367ff7d
reject resync start on misconfigured replication rules (#15041)
we expect resync to start on buckets with replication
rule ExistingObjects enabled, if not we reject such
calls.
2022-06-06 02:54:39 -07:00
Anis Elleuch fd02492cb7
avoid limits on the number of parallel trace/bucket notifications listeners (#14799)
Simplifies overall limits on the incoming listeners for notifications.

Fixes #14566
2022-06-05 14:29:12 -07:00