Commit Graph

4678 Commits

Author SHA1 Message Date
Harshavardhana 2420f6c000
fix: make metrics endpoint responsive by reducing the chatter (#15055)
peerOnlineCounter was making NxN calls to many peers, this
can be really long and tedious if there are random servers
that are going down.

Instead we should calculate online peers from the point of
view of "self" and return those online and offline appropriately
by performing a healthcheck.
2022-06-08 02:43:13 -07:00
Harshavardhana b0d7332a0c
healthcheck cluster endpoint should honor write/readQuorum per pool (#15053) 2022-06-07 19:08:21 -07:00
Harshavardhana d55efc791f
relax O_DIRECT in single drive mode if unsupported (#15045) 2022-06-07 06:44:01 -07:00
Minio Trusted e2d4d097e7 do not print errors upon 'nil' err 2022-06-06 17:33:41 -07:00
Shireesh Anjal 4ce81fd07f
Add periodic callhome functionality (#14918)
* Add periodic callhome functionality

Periodically (every 24hrs by default), fetch callhome information and
upload it to SUBNET.

New config keys under the `callhome` subsystem:

enable - Set to `on` for enabling callhome. Default `off`
frequency - Interval between callhome cycles. Default `24h`

* Improvements based on review comments

- Update `enableCallhome` safely
- Rename pctx to ctx
- Block during execution of callhome
- Store parsed proxy URL in global subnet config
- Store callhome URL(s) in constants
- Use existing global transport
- Pass auth token to subnetPostReq
- Use `config.EnableOn` instead of `"on"`

* Use atomic package instead of lock

* Use uber atomic package

* Use `Cancel` instead of `cancel`

Co-authored-by: Harshavardhana <harsha@minio.io>

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2022-06-06 16:14:52 -07:00
Harshavardhana df9eeb7f8f
fix: do not log concurrently when multiple disks return errors (#15044)
since the values inside 'context' are mutated internally by
logger, make sure to log serially upon errors not concurrently.
2022-06-06 15:15:11 -07:00
Harshavardhana 31c4fdbf79
fix: resyncing 'null' version on pre-existing content (#15043)
PR #15041 fixed replicating 'null' version however
due to a regression from #14994 caused the target
versions for these 'null' versioned objects to have
different 'versions', this may cause confusion with
bi-directional replication and cause double replication.

This PR fixes this properly by making sure we replicate
the correct versions on the objects.
2022-06-06 15:14:56 -07:00
Harshavardhana 48e367ff7d
reject resync start on misconfigured replication rules (#15041)
we expect resync to start on buckets with replication
rule ExistingObjects enabled, if not we reject such
calls.
2022-06-06 02:54:39 -07:00
Anis Elleuch fd02492cb7
avoid limits on the number of parallel trace/bucket notifications listeners (#14799)
Simplifies overall limits on the incoming listeners for notifications.

Fixes #14566
2022-06-05 14:29:12 -07:00
Harshavardhana 5afdc56796
allow single drive mode to run on root disk (#15037)
for practical reasons, allow root disk based installs for single drive mode.
2022-06-03 12:53:42 -07:00
Harshavardhana c3e1da8e04
honor canceled context and do not leak on mergeChannels (#15034)
mergeEntryChannels has the potential to perpetually
wait on the results channel, context might be closed
and we did not honor the caller context canceling.
2022-06-03 05:59:02 -07:00
Anis Elleuch 20a753e2e5
Fix a possible service freeze after perf object (#15036)
The S3 service can be frozen indefinitely if a client or mc asks for object
perf API but quits early or has some networking issues. The reason is
that partialWrite() can block indefinitely.

This commit makes partialWrite() listens to context cancellation as well. It
also renames deadlinedCtx to healthCtx since it covers handler context
cancellation and not only not only the speedtest deadline.
2022-06-03 05:58:45 -07:00
Aditya Manthramurthy 61a7434379
Update --version option behavior (#15032)
- Add git commit ID
- Add go version
2022-06-02 18:40:53 -07:00
Poorna 29edb4ccfe
fix: site replication bucket heal to not panic if replication config is missing (#15025) 2022-06-02 12:34:03 -07:00
Anis Elleuch d4e565e595
Add defensive check for one stream message size (#15029)
In a streaming response, the client knows the size of a streamed
message but never checks the message size. Add the check to error 
out if the response message is truncated.
2022-06-02 09:16:26 -07:00
Klaus Post f7cecf0945
Make isIndexedMetaV2 return errors (#15012)
Indexed streams would be decoded by the legacy loader if there 
was an error loading it. Return an error when the stream is indexed 
and it cannot be loaded.

Fixes "unknown minor metadata version" on corrupted xl.meta files and 
returns an actual error.
2022-05-31 19:06:57 -07:00
Harshavardhana 52221db7ef
fix: for unexpected errors in reading versioning config panic (#14994)
We need to make sure if we cannot read bucket metadata
for some reason, and bucket metadata is not missing and
returning corrupted information we should panic such
handlers to disallow I/O to protect the overall state
on the system.

In-case of such corruption we have a mechanism now
to force recreate the metadata on the bucket, using
`x-minio-force-create` header with `PUT /bucket` API
call.

Additionally fix the versioning config updated state
to be set properly for the site replication healing
to trigger correctly.
2022-05-31 02:57:57 -07:00
Anis Elleuch 56a61bab56
test: Add GetObjectNInfo test with some outdated disks (#15004)
Add a test reading an object which has some old data in some outdated
disks, in a versioned and non-versioned bucket.
2022-05-30 17:52:59 -07:00
Harshavardhana d480022711
fix: invalidate outdated disks appropriately during readAllXL (#15002)
readAllXL would return inlined data for outdated disks
causing "read" to return incorrect content to the client,

this PR fixes this behavior by making sure we skip such
outdated disks appropriately based on the latest ModTime
on the disk.
2022-05-30 12:43:54 -07:00
Harshavardhana f1abb92f0c
feat: Single drive XL implementation (#14970)
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.

This PR also brings features such as versioning, replication,
transitioning to single drive setups.
2022-05-30 10:58:37 -07:00
Harshavardhana 5792be71fa
fix: add timeouts to avoid goroutine leaks in net/http (#14995)
Following code can reproduce an unending go-routine buildup,
while keeping connections established due to lack of client
not closing the connections.

https://gist.github.com/harshavardhana/2d00e6f909054d2d2524c71485ad02e1

Without this PR all MinIO deployments can be put into
denial of service attacks, causing entire service to be
unavailable.

We bring in two timeouts at this stage to control such
go-routine build ups, new change

- IdleTimeout (to kill off idle connections)
- ReadHeaderTimeout (to kill off connections that are too slow)

This new change also brings two hidden options to make any
additional relevant changes if desired in some setups.
2022-05-30 06:24:51 -07:00
Poorna 5e3010d455
Tighten enforcement of object retention (#14993)
Ref issue#14991 - in the rare case that object in bucket under
retention has null version, make sure to enforce retention
rules.
2022-05-28 02:21:19 -07:00
Anis Elleuch ccbf65c8e8
site-repl: Fix deadlock after an IAM loading error (#14990)
Fix forgotten IAM cache lock releases when reading some data from
disk/etcd

Co-authored-by: Anis Elleuch <anis@min.io>
2022-05-27 10:26:38 -07:00
Harshavardhana 9d07cde385
use crypto/sha256 only for FIPS 140-2 compliance (#14983)
It would seem like the PR #11623 had chewed more
than it wanted to, non-fips build shouldn't really
be forced to use slower crypto/sha256 even for
presumed "non-performance" codepaths. In MinIO
there are really no "non-performance" codepaths.
This assumption seems to have had an adverse
effect in certain areas of CPU usage.

This PR ensures that we stick to sha256-simd
on all non-FIPS builds, our most common build
to ensure we get the best out of the CPU at
any given point in time.
2022-05-27 06:00:19 -07:00
Aditya Manthramurthy 464b9d7c80
Add support for Identity Management Plugin (#14913)
- Adds an STS API `AssumeRoleWithCustomToken` that can be used to 
  authenticate via the Id. Mgmt. Plugin.
- Adds a sample identity manager plugin implementation
- Add doc for plugin and STS API
- Add an example program using go SDK for AssumeRoleWithCustomToken
2022-05-26 17:58:09 -07:00
Poorna 5c81d0d89a
site replication: heal missing/invalid replication config (#14979)
Validate remote target ARNs and heal any stale rules in
the replication config
2022-05-26 17:57:23 -07:00
Klaus Post c0bf02b8b2
Ignore disks with 0 total space (#14981)
Ignore disks with 0 total

Mainly defensive to ensure no `/0` in percent calculation.
2022-05-26 06:01:50 -07:00
Harshavardhana fd46a1c3b3
fix: some races when accessing ldap/openid config globally (#14978) 2022-05-25 18:32:53 -07:00
Aditya Manthramurthy 5aae7178ad
Fix listing of service and sts accounts (#14977)
Now returns user does not exist error if the user is not known to the system
2022-05-25 15:28:54 -07:00
Harshavardhana dea8220eee
do not heal outdated disks > parityBlocks (#14976)
this PR also fixes a situation where incorrect
partsMetadata slice was used where fi.Data was
re-used from a single drive causing duplication
of the shards across all drives.

This happens for situations where shouldHeal()
returns true for all drives > parityBlocks.

To avoid this we should never attempt to heal on all
drives > parityBlocks, unless we are doing metadata
migration from xl.json -> xl.meta
2022-05-25 15:17:10 -07:00
Klaus Post a4be0b88f6
Add server pool reserved space (#14974)
If one or more pools reach 85% usage in a set, we will only 
use pools that have more free space.

In case all pools are above 85% we allow all of them to be used 
with the regular distribution.
2022-05-25 13:20:20 -07:00
Poorna d8101573be
Disallow deletion of ARN when under active replication (#14972)
fixes a regression from #12880
2022-05-24 19:40:45 -07:00
Klaus Post 41cdb357bb
Compensate for different server pool sizes (#14968)
When a server pool with a different number of sets is added they are 
not compensated when choosing a destination pool for new objects. 
This leads to the unbalanced placement of objects with smaller pools 
getting a bigger number of objects since we only compare the destination 
sets directly.

This change will compensate for differences in set sizes when choosing
the destination pool.

Different set sizes are already compensated by fewer disks.
2022-05-24 18:57:14 -07:00
Harshavardhana 38caddffe7
fix: copyObject on versioned bucket when updating metadata (#14971)
updating metadata with CopyObject on a versioned bucket
causes the latest version to be not readable, this PR fixes
this properly by handling the inline data bug fix introduced
in PR #14780.

This bug affects only inlined data.
2022-05-24 17:27:45 -07:00
Poorna 0e26f983d6
site replication: Allow replication rule edit (#14969)
Revert commit b42cfcea60 as too
restrictive
2022-05-24 13:27:33 -07:00
Anis Elleuch 77dc99e71d
Do not use inline data size in xl.meta quorum calculation (#14831)
* Do not use inline data size in xl.meta quorum calculation

Data shards of one object can different inline/not-inline decision
in multiple disks. This happens with outdated disks when inline
decision changes. For example, enabling bucket versioning configuration
will change the small file threshold.

When the parity of an object becomes low, GET object can return 503
because it is not unable to calculate the xl.meta quorum, just because
some xl.meta has inline data and other are not.

So this commit will be disable taking the size of the inline data into
consideration when calculating the xl.meta quorum.

* Add tests for simulatenous inline/notinline object

Co-authored-by: Anis Elleuch <anis@min.io>
2022-05-24 06:26:38 -07:00
Anis Elleuch 5041bfcb5c
replication healing: Fix typo when healing bucket quota info (#14966)
A typo is found in the replication healing code where an empty quota
configuration is sent to peer sites instead of the correct one.
.io>
2022-05-24 06:26:13 -07:00
Harshavardhana f8650a3493
fetch bucket replication stats across peers in single call (#14956)
current implementation relied on recursively calling one bucket
at a time across all peers, this would be very slow and chatty
when there are 100's of buckets which would mean 100*peerCount
amount of network operations.

This PR attempts to reduce this entire call into `peerCount`
amount of network calls only. This functionality addresses also a
concern where the Prometheus metrics would significantly slow
down when one of the peers is offline.
2022-05-23 09:15:30 -07:00
Klaus Post 90a52a29c5
Fix WalkDir fallback hot loop (#14961)
Fix fallback hot loop

fd was never refreshed, leading to an infinite hot loop if a disk failed and the fallback disk fails as well.

Fix & simplify retry loop.

Fixes #14960
2022-05-23 06:28:46 -07:00
Poorna 8859c92f80
Relax site replication syncing of service accounts (#14955)
Synchronous replication of service/sts accounts can be relaxed
as site replication healing should catch up when peer clusters
are back online.
2022-05-20 19:09:11 -07:00
Anis Elleuch 01e5632949
mrf: Fix stale MRF data showed in heal info (#14953)
One usee reported having mc admin heal status output ETA increasing
by time. It turned out it is MRF that is not clearing its data due to a
bug in the code.

pendingItems is increased when an object is queued to be healed but
never decreasd when there is a healing error. This commit will decrease
pendingItems and pendingBytes even when there is an error to give
accurate reporting.
2022-05-20 07:33:18 -07:00
Anis Elleuch 95a6b2c991
Merge LDAP STS policy evaluation with the generic STS code (#14944)
If LDAP is enabled, STS security token policy is evaluated using a
different code path and expects ldapUser claim to exist in the security
token. This means other STS temporary accounts generated by any Assume
Role function, such as AssumeRoleWithCertificate, won't be allowed to do any
operation as these accounts do not have LDAP user claim.

Since IsAllowedLDAPSTS() is similar to IsAllowedSTS(), this commit will
merge both.

Non harmful changes:
- IsAllowed for LDAP will start supporting RoleARN claim
- IsAllowed for LDAP will not check for parent claim anymore. This check doesn't
  seem to be useful since all STS login compare access/secret/security-token
  with the one saved in the disk.
- LDAP will support $username condition in policy documents.

Co-authored-by: Anis Elleuch <anis@min.io>
Co-authored-by: Aditya Manthramurthy <donatello@users.noreply.github.com>
2022-05-19 11:06:55 -07:00
Harshavardhana 30c9e50701
make sure to ignore expected errors and dirname deletes (#14945) 2022-05-18 17:58:19 -07:00
Aditya Manthramurthy 9aadd725d2
Avoid calling .Reset() on active timer (#14941)
.Reset() documentation states:

    For a Timer created with NewTimer, Reset should be invoked only on stopped
    or expired timers with drained channels.

This change is just to comply with this requirement as there might be some
runtime dependent situation that might lead to unexpected behavior.
2022-05-18 15:37:58 -07:00
Harshavardhana 6cfb1cb6fd
fix: timer usage across codebase (#14935)
it seems in some places we have been wrongly using the
timer.Reset() function, nicely exposed by an example
shared by @donatello https://go.dev/play/p/qoF71_D1oXD

this PR fixes all the usage comprehensively
2022-05-17 22:42:59 -07:00
Harshavardhana 2dc8ac1e62
allow IAM cache load to be granular and capture missed state (#14930)
anything that is stuck on the disk today can cause latency
spikes for all incoming S3 I/O, we need to have this
de-coupled so that we can make sure that latency in loading
credentials are not reflected back to the S3 API calls.

The approach this PR takes is by checking if the calls were
updated just in case when the IAM load was in progress,
so that we can use merge instead of "replacement" to avoid
missing state.
2022-05-17 19:58:47 -07:00
Harshavardhana 040ac5cad8
fix: when logger queue is full exit quickly upon doneCh (#14928)
Additionally only reload requested sub-system not everything
2022-05-16 16:10:51 -07:00
Harshavardhana 03f8b25b50
disable connectDisks loop under testing (#14920)
avoids races during tests, keeps tests predictable
2022-05-16 05:36:00 -07:00
Aditya Manthramurthy f28a8eca91
Add Access Management Plugin tests with OpenID (#14919) 2022-05-13 12:48:02 -07:00
Anis Elleuch ca69e54cb6
tests: Fix sporadic failure of TestXLStorageDeleteFile (#14911)
The test expects from DeleteFile to return errDiskNotFound when the disk
is not available. It calls os.RemoveAll() to remove one disk after XL storage
initialization. However, this latter contains some goroutines which can
race with os.RemoveAll() and then the test fails sporadically with
returning random errors.

The commit will tweak the initialization routine of the XL storage to
only run deletion of temporary and metacache data in the  background,
so TestXLStorageDeleteFile won't fail anymore.
2022-05-12 15:24:58 -07:00
Aditya Manthramurthy 4629abd5a2
Add tests for Access Management Plugin (#14909) 2022-05-12 15:24:19 -07:00
Harshavardhana dc99f4a7a3
allow bucket to be listed when GetBucketLocation is enabled (#14903)
currently, we allowed buckets to be listed from the
API call if and when the user has ListObject()
permission at the global level, this is okay to be
extended to GetBucketLocation() as well since

GetBucketLocation() is a "read" call and allowing "reads"
on a bucket has an implicit assumption that ListBuckets()
should be allowed.

This makes discoverability of access for read-only users
becomes easier or users with specific restrictions on their
policies.
2022-05-12 10:46:20 -07:00
Harshavardhana 9341201132
logger lock should be more granular (#14901)
This PR simplifies few things by splitting
the locks between audit, logger targets to
avoid potential contention between them.

any failures inside audit/logger HTTP
targets must only log to console instead
of other targets to avoid cyclical dependency.

avoids unneeded atomic variables instead
uses RWLock to differentiate a more common
read phase v/s lock phase.
2022-05-12 07:20:58 -07:00
Krishnan Parthasarathi 88dd83a365
lifecycle: Set opts.VersionSuspended when expiring objects (#14902) 2022-05-12 06:09:24 -07:00
Harshavardhana 60d0611ac2
use BadRequest HTTP status instead of Conflict for certain errors (#14900)
PutBucketVersioning API should return BadRequest for errors
instead of Conflict, Conflict is used for "AlreadyExists"
resource situations.
2022-05-11 13:44:16 -07:00
Harshavardhana f939222942
add support for extra prometheus labels (#14899)
fixes #14353
2022-05-11 13:04:53 -07:00
Krishna Srinivas e34ca9acd1
retry each object decom upto 3 times, in-case of failure (#14861) 2022-05-11 11:37:32 -07:00
Aditya Manthramurthy 83071a3459
Add support for Access Management Plugin (#14875)
- This change renames the OPA integration as Access Management Plugin - there is
nothing specific to OPA in the integration, it is just a webhook.

- OPA configuration is automatically migrated to Access Management Plugin and
OPA specific configuration is marked as deprecated.

- OPA doc is updated and moved.
2022-05-10 17:14:55 -07:00
Anis Elleuch edf364bf21
tracing: Add disk path to storage tracing (#14883)
Example:

2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/2 / 227.834µs
2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/4 / 236.042µs
2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/3 / 130.958µs
2022-05-09T17:14:04:000 [STORAGE] storage.ListVols 127.0.0.1:9000 /tmp/xl/1 / 102.875µs
2022-05-10 07:48:07 -07:00
Anis Elleuch 1e037883b0
pools: GetObjectNInfo should cover locking during object read (#14887)
In case of multi-pools setup, GetObjectNInfo returns a GetObjectReader
but it unlocks the read lock when quitting GetObjectNInfo. This should
not happen, unlock should only happen when GetObjectReader is closed.
2022-05-10 07:47:40 -07:00
Klaus Post d909f167ff
tests: Add localLocker RUnlock test (#14882) 2022-05-09 09:55:52 -07:00
Harshavardhana 62aa42cccf
avoid replication proxy on version excluded paths (#14878)
no need to attempt proxying objects that were
never replicated, but do have local `null`
versions on them.
2022-05-08 16:50:31 -07:00
Harshavardhana 5cffd3780a
fix: multiple fixes in prefix exclude implementation (#14877)
- do not need to restrict prefix exclusions that do not
  have `/` as suffix, relax this requirement as spark may
  have staging folders with other autogenerated characters
  , so we are better off doing full prefix March and skip. 

- multiple delete objects was incorrectly creating a
  null delete marker on a versioned bucket instead of
  creating a proper versioned delete marker.

- do not suspend paths on the excluded prefixes during
  delete operations to avoid creating `null` delete markers,
  honor suspension of versioning only at bucket level for
  delete markers.
2022-05-07 22:06:44 -07:00
Harshavardhana def75ffcfe
allow versioning config changes under site replication (#14876)
PR #14828 introduced prefix-level exclusion of versioning
and replication - however our site replication implementation
since it defaults versioning on all buckets did not allow
changing versioning configuration once the bucket was created.

This PR changes this and ensures that such changes are honored
and also propagated/healed across sites appropriately.
2022-05-07 18:39:40 -07:00
Krishnan Parthasarathi ad8e611098
feat: implement prefix-level versioning exclusion (#14828)
Spark/Hadoop workloads which use Hadoop MR 
Committer v1/v2 algorithm upload objects to a 
temporary prefix in a bucket. These objects are 
'renamed' to a different prefix on Job commit. 
Object storage admins are forced to configure 
separate ILM policies to expire these objects 
and their versions to reclaim space.

Our solution:

This can be avoided by simply marking objects 
under these prefixes to be excluded from versioning, 
as shown below. Consequently, these objects are 
excluded from replication, and don't require ILM 
policies to prune unnecessary versions.

-  MinIO Extension to Bucket Version Configuration
```xml
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"> 
        <Status>Enabled</Status>
        <ExcludeFolders>true</ExcludeFolders>
        <ExcludedPrefixes>
          <Prefix>app1-jobs/*/_temporary/</Prefix>
        </ExcludedPrefixes>
        <ExcludedPrefixes>
          <Prefix>app2-jobs/*/__magic/</Prefix>
        </ExcludedPrefixes>

        <!-- .. up to 10 prefixes in all -->     
</VersioningConfiguration>
```
Note: `ExcludeFolders` excludes all folders in a bucket 
from versioning. This is required to prevent the parent 
folders from accumulating delete markers, especially
those which are shared across spark workloads 
spanning projects/teams.

- To enable version exclusion on a list of prefixes

```
mc version enable --excluded-prefixes "app1-jobs/*/_temporary/,app2-jobs/*/_magic," --exclude-prefix-marker myminio/test
```
2022-05-06 19:05:28 -07:00
Shireesh Anjal 3ec1844e4a
return kubernetes info in health report (#14865) 2022-05-06 12:41:07 -07:00
Poorna 523670ba0d
fix: site removal API error handling (#14870)
when the site is being removed is missing replication config. This can
happen when a new deployment is brought in place of a site that
is lost/destroyed and needs to delink old deployment from site
replication.
2022-05-06 12:40:34 -07:00
Harshavardhana 35dea24ffd
fix: console log peer API from its broken implementation (#14873)
console logging peer API was broken as it would
timeout after 15minutes, this never really worked
beyond this value and basically failed to provide
the streaming "log" functionality that was expected
from this implementation.

also fix convoluted channel handling by keeping things
simple, this is rewritten.
2022-05-06 12:39:58 -07:00
Harshavardhana c7df1ffc6f
avoid concurrent reads and writes to opts.UserDefined (#14862)
do not modify opts.UserDefined after object-handler
has set all the necessary values, any mutation needed
should be done on a copy of this value not directly.

As there are other pieces of code that access opts.UserDefined
concurrently this becomes challenging.

fixes #14856
2022-05-05 04:14:41 -07:00
Aditya Manthramurthy 2b7e75e079
Add OPA doc and remove deprecation marking (#14863) 2022-05-04 23:53:42 -07:00
Anis Elleuch 44a3b58e52
Add audit log for decommissioning (#14858) 2022-05-04 00:45:27 -07:00
Anis Elleuch 46de9ac03e
Decom: Easily restart decommission when it is done (#14855)
When a decommission task is successfully completed, failed, or canceled,
this commit allows restarting the decommission again. Restarting is not
allowed when there is an ongoing decommission task.
2022-05-03 13:36:08 -07:00
Harshavardhana f0462322fd
fix: remove embedded-policy as requested by the user (#14847)
this PR introduces a few changes such as

- sessionPolicyName is not reused in an extracted manner
  to apply policies for incoming authenticated calls,
  instead uses a different key to designate this
  information for the callers.

- this differentiation is needed to ensure that service
  account updates do not accidentally store JSON representation
  instead of base64 equivalent on the disk.

- relax requirements for Deleting a service account, allow
  deleting a service account that might be unreadable, i.e
  a situation where the user might have removed session policy 
  which now carries a JSON representation, making it unparsable.

- introduce some constants to reuse instead of strings.

fixes #14784
2022-05-02 17:56:19 -07:00
Klaus Post c59d2a6288
Log Range Header if present in the request (#14851)
Add Range header as param to easier debug of Range requests.
2022-05-02 10:37:26 -07:00
Klaus Post 3e3ff2a70b
Check error status codes (#14850)
If an invalid status code is generated from an error we risk panicking. Even if there 
are no potential problems at the moment we should prevent this in the future.

Add safeguards against this.

Sample trace:

```
May 02 06:41:39   minio[52806]: panic: "GET /20180401230655.PDF": invalid WriteHeader code 0
May 02 06:41:39   minio[52806]: goroutine 16040430822 [running]:
May 02 06:41:39   minio[52806]: runtime/debug.Stack(0xc01fff7c20, 0x25c4b00, 0xc0490e4080)
May 02 06:41:39   minio[52806]:         runtime/debug/stack.go:24 +0x9f
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1(0xc022048800, 0x4f38ab0, 0xc0406e0fc0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/generic-handlers.go:469 +0x85
May 02 06:41:39   minio[52806]: panic(0x25c4b00, 0xc0490e4080)
May 02 06:41:39   minio[52806]:         runtime/panic.go:965 +0x1b9
May 02 06:41:39   minio[52806]: net/http.checkWriteHeaderCode(...)
May 02 06:41:39   minio[52806]:         net/http/server.go:1092
May 02 06:41:39   minio[52806]: net/http.(*response).WriteHeader(0xc0406e0fc0, 0x0)
May 02 06:41:39   minio[52806]:         net/http/server.go:1126 +0x718
May 02 06:41:39   minio[52806]: github.com/minio/minio/internal/logger.(*ResponseWriter).WriteHeader(0xc032fa3ea0, 0x0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/internal/logger/audit.go:116 +0xb1
May 02 06:41:39   minio[52806]: github.com/minio/minio/internal/logger.(*ResponseWriter).WriteHeader(0xc032fa3f40, 0x0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/internal/logger/audit.go:116 +0xb1
May 02 06:41:39   minio[52806]: github.com/minio/minio/internal/logger.(*ResponseWriter).WriteHeader(0xc002ce8000, 0x0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/internal/logger/audit.go:116 +0xb1
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.writeResponse(0x4f364a0, 0xc002ce8000, 0x0, 0xc0443b86c0, 0x1cb, 0x224, 0x2a9651e, 0xf)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/api-response.go:736 +0x18d
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.writeErrorResponse(0x4f44218, 0xc069086ae0, 0x4f364a0, 0xc002ce8000, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc00656afc0)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/api-response.go:798 +0x306
May 02 06:41:39   minio[52806]: github.com/minio/minio/cmd.objectAPIHandlers.getObjectHandler(0x4b73768, 0x4b73730, 0x4f44218, 0xc069086ae0, 0x4f82090, 0xc002d80620, 0xc040e03885, 0xe, 0xc040e03894, 0x61, ...)
May 02 06:41:39   minio[52806]:         github.com/minio/minio/cmd/object-handlers.go:456 +0x252c
```
2022-05-02 10:36:29 -07:00
Harshavardhana 16bc11e72e
fix: disallow newer policies, users & groups with space characters (#14845)
space characters at the beginning or at the end can lead to
confusion under various UI elements in differentiating the
actual name of "policy, user or group" - to avoid this behavior
this PR onwards we shall reject such inputs for newer entries.

existing saved entries will behave as is and are going to be
operable until they are removed/renamed to something more
meaningful.
2022-05-02 09:27:35 -07:00
Harshavardhana 2719f1efaa
fix: reject invalid r.Host headers (#14846)
r.Host headers can come in unparsed that may contain
invalid hostnames, reject such requests as invalid.

This is a continuation fix from #14844
2022-05-02 04:42:41 -07:00
Harshavardhana 39ac62a1a1
fix: panic in browser redirect handler for unexpected r.Host (#14844)
```
panic: "GET /": invalid hostname
goroutine 148 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
	github.com/minio/minio/cmd/generic-handlers.go:469 +0x8e
panic({0x2201f00, 0xc001f1ddd0})
	runtime/panic.go:1038 +0x215
github.com/minio/pkg/net.URL.String({{0x25aa417, 0x5}, {0x0, 0x0}, 0x0, {0xc000174380, 0xd7}, {0x0, 0x0}, {0x0, ...}, ...})
	github.com/minio/pkg@v1.1.23/net/url.go:97 +0xfe
github.com/minio/minio/cmd.setBrowserRedirectHandler.func1({0x49af080, 0xc0003c20e0}, 0xc00002ea00)
	github.com/minio/minio/cmd/generic-handlers.go:136 +0x118
net/http.HandlerFunc.ServeHTTP(0xc00002ea00, {0x49af080, 0xc0003c20e0}, 0xa)
	net/http/server.go:2047 +0x2f
github.com/minio/minio/cmd.setAuthHandler.func1({0x49af080, 0xc0003c20e0}, 0xc00002ea00)
	github.com/minio/minio/cmd/auth-handler.go:525 +0x3d8
net/http.HandlerFunc.ServeHTTP(0xc00002e900, {0x49af080, 0xc0003c20e0}, 0xc001f33701)
	net/http/server.go:2047 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0025d0780, {0x49af080, 0xc0003c20e0}, 0xc00002e800)
	github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1cf
github.com/rs/cors.(*Cors).Handler.func1({0x49af080, 0xc0003c20e0}, 0xc00002e800)
	github.com/rs/cors@v1.7.0/cors.go:219 +0x1bd
net/http.HandlerFunc.ServeHTTP(0x0, {0x49af080, 0xc0003c20e0}, 0xc00068d9f8)
	net/http/server.go:2047 +0x2f
github.com/minio/minio/cmd.setCriticalErrorHandler.func1({0x49af080, 0xc0003c20e0}, 0x4a5cd3)
	github.com/minio/minio/cmd/generic-handlers.go:476 +0x83
net/http.HandlerFunc.ServeHTTP(0x72, {0x49af080, 0xc0003c20e0}, 0x0)
	net/http/server.go:2047 +0x2f
github.com/minio/minio/internal/http.(*Server).Start.func1({0x49af080, 0xc0003c20e0}, 0x10000c001f1dda0)
	github.com/minio/minio/internal/http/server.go:105 +0x1b6
net/http.HandlerFunc.ServeHTTP(0x0, {0x49af080, 0xc0003c20e0}, 0x46982e)
	net/http/server.go:2047 +0x2f
net/http.serverHandler.ServeHTTP({0xc003dc1950}, {0x49af080, 0xc0003c20e0}, 0xc00002e800)
	net/http/server.go:2879 +0x43b
net/http.(*conn).serve(0xc000514d20, {0x49cfc38, 0xc0010c0e70})
	net/http/server.go:1930 +0xb08
created by net/http.(*Server).Serve
	net/http/server.go:3034 +0x4e8
```
2022-05-01 13:45:45 -07:00
Harshavardhana 85f3a9f3b0 Remove Azure gateway implementation (#14418)
refer #14331
2022-04-29 12:51:23 -07:00
Klaus Post 13ba4b433d
Clean up cpuio profiling (#14838)
Don't start regular cpu profile as well. Use bed madmin const.
2022-04-29 09:35:42 -07:00
Aditya Manthramurthy 0e502899a8
Add support for multiple OpenID providers with role policies (#14223)
- When using multiple providers, claim-based providers are not allowed. All
providers must use role policies.

- Update markdown config to allow `details` HTML element
2022-04-28 18:27:09 -07:00
Harshavardhana 424b44c247
allow changing server command line from http->https (#14832)
this is allowed as long as order is preserved as is
on an existing setup, the new command line is updated
in `pool.bin` to facilitate future decommission's on
these pools.
2022-04-28 16:27:53 -07:00
Harshavardhana 01a71c366d
allow service accounts and temp credentials site-level healing (#14829)
This PR introduces support for site level

- service account healing
- temporary credentials healing
2022-04-28 02:39:00 -07:00
Harshavardhana 5a9a898ba2
allow forcibly creating metadata on buckets (#14820)
introduce x-minio-force-create environment variable
to force create a bucket and its metadata as required,
it is useful in some situations when bucket metadata
needs recovery.
2022-04-27 04:44:07 -07:00
Harshavardhana c56a139fdc
fix: support decommissioning directory objects (#14822)
improvements in this PR include

- decommission objects that have __XLDIR__ suffix
- decommission objects that have `null` version on
  a versioned bucket.
- make sure to look for any "decom" failures to ensure
  that we do not wrong conclude decom as complete without
  all files getting copied over.
- break out eagerly upon first error for objects with
  multiple versions, leave the object as is for support
  debugging and analysis.
2022-04-26 20:06:41 -07:00
Anis Elleuch df50eda811
Add number of versions in server info API (#14812)
The goal is to show the number of versions in the server info API.
2022-04-25 22:04:10 -07:00
Aditya Manthramurthy f5d3313210
Increase context timeout for IAM concurrency test (#14817)
- This should reduce failures in Windows CI
2022-04-25 20:14:20 -07:00
Daniel Valdivia b7dd61f6bc
Fix double slash subpath for console (#14815)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2022-04-25 13:05:56 -07:00
Harshavardhana 0cc993f403 Remove GCS, HDFS gateway implementations #14418
refer #14331
2022-04-24 10:19:17 -07:00
Poorna 3a64580663
Add support for site replication healing (#14572)
heal bucket metadata and IAM entries for
sites participating in site replication from
the site with the most updated entry.

Co-authored-by: Harshavardhana <harsha@minio.io>
Co-authored-by: Aditya Manthramurthy <aditya@minio.io>
2022-04-24 02:36:31 -07:00
Harshavardhana d087e28dce
start using t.SetEnv instead of os.Setenv (#14787) 2022-04-23 15:33:45 -07:00
Klaus Post 96adfaebe1
Make storage class config dynamic (#14791)
Updating the storage class is already thread safe, so we can do this safely.
2022-04-21 12:07:33 -07:00
Aditya Manthramurthy ddf84f8257
fix: concurrency bug in site-replication (#14786)
The site replication status call was using a loop iteration variable sent
directly into go-routines instead of being passed as an argument. As the
variable is being updated in the loop, previously launched go routines do not
necessarily use the value at the time they were launched.
2022-04-20 16:20:07 -07:00
Harshavardhana 507f993075
attempt to real resolve when there is a quorum failure on reads (#14613) 2022-04-20 12:49:05 -07:00
Harshavardhana 73a6a60785
fix: replication deleteObject() regression and CopyObject() behavior (#14780)
This PR fixes two issues

- The first fix is a regression from #14555, the fix itself in #14555
  is correct but the interpretation of that information by the
  object layer code for "replication" was not correct. This PR
  tries to fix this situation by making sure the "Delete" replication
  works as expected when "VersionPurgeStatus" is already set.

  Without this fix, there is a DELETE marker created incorrectly on
  the source where the "DELETE" was triggered.

- The second fix is perhaps an older problem started since we inlined-data
  on the disk for small objects, CopyObject() incorrectly inline's
  a non-inlined data. This is due to the fact that we have code where
  we read the `part.1` under certain conditions where the size of the
  `part.1` is less than the specific "threshold".

  This eventually causes problems when we are "deleting" the data that
  is only inlined, which means dataDir is ignored leaving such
  dataDir on the disk, that looks like an inconsistent content on
  the namespace.

fixes #14767
2022-04-20 10:22:05 -07:00
Anis Elleuch cf4cf58faf
Do not allow parallel upgrade in one server (#14782)
It is wasteful to allow parallel upgrades of MinIO server. This also generates
 weird error invoked by selfupdate module when it happens such as:

'rename /opt/bin/.minio.old /opt/bin/..minio.old.old'
2022-04-20 06:18:21 -07:00
polaris-megrez 6bc3c74c0c
honor client context in IAM user/policy listing calls (#14682) 2022-04-19 09:00:19 -07:00
Harshavardhana 598ce1e354
supply prefix filtering when necessary (#14772)
currently filterPefix was never used and set
that would filter out entries when needed
when `prefix` doesn't end with `/` - this
often leads to objects getting Walked(), Healed()
that were never requested by the caller.
2022-04-19 08:20:48 -07:00
Harshavardhana 7e248fc0ba
wait on parallel decom to complete before returning (#14764)
without this wait there is a potential for some objects
that are in actively being decommissioned would cancel,
however the decommission status might wrongly conclude
this as "Complete".

To avoid this make sure to add waitgroups on the parallel
workers, allowing parallel copies to complete fully before
we return.
2022-04-18 13:26:29 -07:00
Daniel Valdivia c526fa9119
Support console UI access at a subpath on a subdomain (#14761)
fixes #14285 

Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2022-04-17 16:01:49 -07:00
Anis Elleuch a5b3548ede
Bring back listing LDAP users temporarly (#14760)
In previous releases, mc admin user list would return the list of users
that have policies mapped in IAM database. However, this was removed but
this commit will bring it back until we revamp this.
2022-04-15 21:26:02 -07:00
Harshavardhana 8318aa0113
cancel active routine only after metadata has been saved (#14757)
currently updated pool.bin was not saved properly, that would
lead to unable to remove a pool upon a successful decommission.

fixes #14756
2022-04-15 13:16:15 -07:00
Harshavardhana e69c42956b
fix: IAM reload should only list at config/iam/ precisely (#14753) 2022-04-15 12:12:45 -07:00
Aditya Manthramurthy e8e48e4c4a
S3 select switch to new parquet library and reduce locking (#14731)
- This change switches to a new parquet library
- SelectObjectContent now takes a single lock at the beginning and holds it
during the operation. Previously the operation took a lock every time the
parquet library performed a Seek on the underlying object stream.
- Add basic support for LogicalType annotations for timestamps.
2022-04-14 06:54:47 -07:00
Harshavardhana 2a6a40e93b
enable go1.18.x builds (#14746) 2022-04-13 14:21:55 -07:00
Harshavardhana eda34423d7 update gofumpt -w - new changes 2022-04-13 12:00:11 -07:00
Shireesh Anjal 5c53620a72
Include speedtest as part of healthinfo api (#14696)
Execute the object, drive and net speedtests as part of the healthinfo
(if requested by the client), and include their result in the response.

The options for the speedtests have been picked from the default values
used by `mc support perf` command.
2022-04-12 13:17:44 -07:00
Krishna Srinivas 5f94cec1e2
Allow parallel decom migration threads to be more than erasure sets (#14733) 2022-04-12 10:49:53 -07:00
Krishnan Parthasarathi 28d3ad3ada
Honor object retention when applying ILM policies (#14732) 2022-04-11 21:55:56 -07:00
Aditya Manthramurthy 66b14a0d32
Fix service account privilege escalation (#14729)
Ensure that a regular unprivileged user is unable to create service accounts for other users/root.
2022-04-11 15:30:28 -07:00
Harshavardhana 153a612253
fetch bucket retention config once for ILM evalAction (#14727)
This is mainly an optimization, does not change any
existing functionality.
2022-04-11 13:25:32 -07:00
Krishnan Parthasarathi 1a1b55e133
Add support for minio tier type (#14468) 2022-04-11 13:24:40 -07:00
Harshavardhana e77ad3f9bb
make sure to pass Lifecycle if set for List filtering (#14722)
PR #14606 never really passed the Lifecycle filter
down to the listing callers to ensure skipping the
entries.
2022-04-10 11:14:52 -07:00
Harshavardhana 4ce86ff5fa
align atomic variables once more for 32bit (#14721) 2022-04-09 22:19:44 -07:00
Harshavardhana 601a744159
pass the necessary query params for remote NSSCanner (#14719)
fixes a regression from #14464
2022-04-09 08:09:52 -07:00
Poorna a1b01e6d5f
Combine profiling start/stop APIs into one (#14662)
Take profile duration as a query parameter for profile API
2022-04-08 12:44:35 -07:00
Krishna Srinivas 48594617b5
Parallelize decommissioning process (#14704) 2022-04-07 23:19:13 -07:00
Krishna Srinivas b35b9dcff7
Use S3 client for uplooads/downloads during perf test (#14570) 2022-04-07 21:20:40 -07:00
Lenin Alevski a3e317773a
Skip commented lines when parsing MinIO configuration file (#14710)
Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-04-07 16:02:51 -07:00
Anis Elleuch 16431d222c
heal: Enable periodic bitrot scan configuration (#14464) 2022-04-07 08:10:40 -07:00
Harshavardhana ee49a23220
resume/start decommission on the first node of the pool under decommission (#14705)
Additionally fixes

- IsSuspended() can use read locks
- Avoid double cancels panic on canceler
2022-04-06 23:42:05 -07:00
Harshavardhana a9eef521ec skip config/history/ during IAM load (#14698) 2022-04-06 21:03:41 -07:00
Klaus Post 901d33b59c
Tweak listing quorum (#14703)
Always go for 50% quorum, and only use non-healing disks.

Fixes #14635
2022-04-06 12:24:21 -07:00
Harshavardhana 00ebea2536
skip config/history/ during IAM load (#14698) 2022-04-05 19:00:59 -07:00
Klaus Post dedf9774c7
Set inspect-input.txt modtime (#14688)
If no time given, use current time.
2022-04-05 13:06:10 -07:00
Andreas Auernhammer 6b1c62133d
listing: improve listing of encrypted objects (#14667)
This commit improves the listing of encrypted objects:
 - Use `etag.Format` and `etag.Decrypt`
 - Detect SSE-S3 single-part objects in a single iteration
 - Fix batch size to `250`
 - Pass request context to `DecryptAll` to not waste resources
   when a client cancels the operation.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-04-04 11:42:03 -07:00
Anis Elleuch d4251b2545
Remove unnecessary log printing (#14685)
Co-authored-by: Anis Elleuch <anis@min.io>
2022-04-04 11:10:06 -07:00
Andreas Auernhammer b9d1698d74
etag: add `Format` and `Decrypt` functions (#14659)
This commit adds two new functions to the
internal `etag` package:
 - `ETag.Format`
 - `Decrypt`

The `Decrypt` function decrypts an encrypted
ETag using a decryption key. It returns not
encrypted / multipart ETags unmodified.

The `Decrypt` function is mainly used when
handling SSE-S3 encrypted single-part objects.
In particular, the ETag of an SSE-S3 encrypted
single-part object needs to be decrypted since
S3 clients expect that this ETag is equal to the
content MD5.

The `ETag.Format` method also covers SSE ETag handling.
MinIO encrypts all ETags of SSE single part objects.
However, only the ETag of SSE-S3 encrypted single part
objects needs to be decrypted.
The ETag of an SSE-C or SSE-KMS single part object
does not correspond to its content MD5 and can be
a random value.
The `ETag.Format` function formats an ETag such that
it is an AWS S3 compliant ETag. In particular, it
returns non-encrypted ETags (single / multipart)
unmodified. However, for encrypted ETags it returns
the trailing 16 bytes as ETag. For encrypted ETags
the last 16 bytes will be a random value.

The main purpose of `Format` is to format ETags
such that clients accept them as well-formed AWS S3
ETags.
It differs from the `String` method since `String`
will return string representations for encrypted
ETags that are not AWS S3 compliant.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-04-03 13:29:13 -07:00
Shireesh Anjal 7c696e1cb6
Write deployment id to health report at the start (#14673)
The deployment id was being written to the health report towards the end
of the handler. Because of this, if there was a timeout in any of the
data fetching, the deployment id was not getting written at all. Upload
of such reports fails on SUBNET as deployment id is the unique
identifier for a cluster in subnet.

Fixed by writing the deployment id at the beginning of the processing.
2022-04-03 13:15:02 -07:00
Aditya Manthramurthy 165d60421d
Add metrics for observing IAM sync operations (#14680) 2022-04-03 13:08:59 -07:00
Poorna 0e6aedc7ed
Capture cmdline args for inspect API (#14668)
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2022-03-31 16:05:43 -07:00
Aditya Manthramurthy fc9668baa5
Increase IAM refresh rate to every 10 mins (#14661)
Add timing information for IAM init and refresh
2022-03-30 17:02:59 -07:00
Andreas Auernhammer ba17d46f15
ListObjectParts: simplify ETag decryption and size adjustment (#14653)
This commit simplifies the ETag decryption and size adjustment
when listing object parts.

When listing object parts, MinIO has to decrypt the ETag of all
parts if and only if the object resp. the parts is encrypted using
SSE-S3.
In case of SSE-KMS and SSE-C, MinIO returns a pseudo-random ETag.
This is inline with AWS S3 behavior.

Further, MinIO has to adjust the size of all encrypted parts due to
the encryption overhead.

The ListObjectParts does specifically not use the KMS bulk decryption
API (4d2fc530d0) since the ETags of all
parts are encrypted using the same object encryption key. Therefore,
MinIO only has to connect to the KMS once, even if there are multiple
parts resp. ETags. It can simply reuse the same object encryption key.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-30 15:23:25 -07:00
Krishna Srinivas bdd816488d
Get the BackendInfo to fill the apporpriate struct fields (#14660) 2022-03-30 10:48:35 -07:00
Krishna Srinivas 36dcfee2f7
Allow decomission of pool even if a drive in it is down (#14656) 2022-03-29 22:51:31 -07:00
Poorna 4d13ddf6b3
Avoid shadowing error during replication proxy check (#14655)
Fixes #14652
2022-03-29 10:53:09 -07:00
Poorna 9e25475475
Validate tier manager is initialized in tier Empty() check (#14646)
Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io>
2022-03-29 10:10:06 -07:00
Andreas Auernhammer e955aa7f2a
kes: add support for encrypted private keys (#14650)
This commit adds support for encrypted KES
client private keys.

Now, it is possible to encrypt the KES client
private key (`MINIO_KMS_KES_KEY_FILE`) with
a password.

For example, KES CLI already supports the
creation of encrypted private keys:
```
kes identity new --encrypt --key client.key --cert client.crt MinIO
```

To decrypt an encrypted private key, the password
needs to be provided:
```
MINIO_KMS_KES_KEY_PASSWORD=<password>
```

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-29 09:53:33 -07:00
Harshavardhana 7956ff0313
fix: multiple pool setup return incorrect DeleteMarker metadata (#14642) 2022-03-27 23:39:50 -07:00
Aditya Manthramurthy 9ff25fb64b
Load IAM in-memory cache using only a single list call (#14640)
- Increase global IAM refresh interval to 30 minutes
- Also print a log after loading IAM subsystem
2022-03-27 18:48:01 -07:00
Andreas Auernhammer 04df69f633
listing: decrypt only SSE-S3 single-part ETags (#14638)
This commit optimises the ETag decryption when
listing objects.

When MinIO lists objects, it has to decrypt the
ETags of single-part SSE-S3 objects.

It does not need to decrypt ETags of
 - plaintext objects => Their ETag is not encrypted
 - SSE-C objects     => Their ETag is not the content MD5
 - SSE-KMS objects   => Their ETag is not the content MD5
 - multipart objects => Their ETag is not encrypted

Hence, MinIO only needs to make a call to the KMS
when it needs to decrypt a single-part SSE-S3 object.
It can resolve the ETags off all other object types
locally.

This commit implements the above semantics by
processing an object listing in batches.
If the batch contains no single-part SSE-S3 object,
then no KMS calls will be made.

If the batch contains at least one single-part
SSE-S3 object we have to make at least one KMS call.
No we first filter all single-part SSE-S3 objects
such that we only request the decryption keys for
these objects.
Once we know which objects resp. ETags require a
decryption key, MinIO either uses the KES bulk
decryption API (if supported) or decrypts each
ETag serially.

This commit is a significant improvement compared
to the previous listing code. Before, a single
non-SSE-S3 object caused MinIO to fall-back to
a serial ETag decryption.
For example, if a batch consisted of 249 SSE-S3
objects and one single SSE-KMS object, MinIO would
send 249 requests to the KMS.
Now, MinIO will send a single request for exactly
those 249 objects and skip the one SSE-KMS object
since it can handle its ETag locally.

Further, MinIO would request decryption keys
for SSE-S3 multipart objects in the past - even
though multipart ETags are not encrypted.
So, if a bucket contained only multipart SSE-S3
objects, MinIO would make totally unnecessary
requests to the KMS.
Now, MinIO simply skips these multipart objects
since it can handle the ETags locally.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-27 18:34:11 -07:00
Anis Elleuch 908eb57795
Always get the actual object size (#14637)
In bulk ETag decryption, do not rely on the etag to check if it is
encrypted or not to decide if we should set the actual object size in
ObjectInfo. The reason is that multipart objects ETags are not
encrypted.

Always get the actual object size in that case.
2022-03-27 08:54:25 -07:00
Harshavardhana 5cfedcfe33
askDisks for strict quorum to be equal to read quorum (#14623) 2022-03-25 16:29:45 -07:00
Andreas Auernhammer 4d2fc530d0
add support for SSE-S3 bulk ETag decryption (#14627)
This commit adds support for bulk ETag
decryption for SSE-S3 encrypted objects.

If KES supports a bulk decryption API, then
MinIO will check whether its policy grants
access to this API. If so, MinIO will use
a bulk API call instead of sending encrypted
ETags serially to KES.

Note that MinIO will not use the KES bulk API
if its client certificate is an admin identity.

MinIO will process object listings in batches.
A batch has a configurable size that can be set
via `MINIO_KMS_KES_BULK_API_BATCH_SIZE=N`.
It defaults to `500`.

This env. variable is experimental and may be
renamed / removed in the future.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-25 15:01:41 -07:00
Harshavardhana f046f557fa
request only 1 best version for latest version resolution (#14625)
ListObjects, ListObjectsV2 calls are being heavily taxed when
there are many versions on objects left over from a previous
release or ILM was never setup to clean them up. Instead
of being absolutely correct at resolving the exact latest
version of an object, we simply rely on the top most 1
version and resolve the rest.

Once we have obtained the top most "1" version for
ListObject, ListObjectsV2 call we break out.
2022-03-25 08:50:07 -07:00
Harshavardhana 401958938d
add load balance properly restClientFromHash() bucket/prefix (#14621)
spread out resuming further to other nodes
2022-03-25 03:41:31 -07:00
Poorna 566cffe53d
save format.json by default for inspect API (#14620) 2022-03-25 02:02:17 -07:00
Minio Trusted a42b576382 keep maximum concurrent operations to 512 (to sustain upto 1024 open fds) 2022-03-23 17:02:04 -07:00
Klaus Post 2ac54e5a7b
ListObjects: Filter lifecycle expired objects (#14606)
For ListObjects and ListObjectsV2 perform lifecycle checks on 
all objects before returning. This will filter out objects that are 
pending lifecycle expiration.

Bonus: Cheaper server pool conflict resolution by not converting to FileInfo.
2022-03-22 12:39:45 -07:00
Harshavardhana 8eecdc6d1f
odd stripe sizes should choose (odd+1)/2 to get correct quorum (#14610) 2022-03-22 12:21:14 -07:00
Klaus Post 50577e2bd2
Allow adjusting request pool both ways (#14609)
When reloading a dynamic config allow the request pool to scale both ways.

Existing requests hold on to the previous pool, so they will pop the elements from that.
2022-03-22 11:28:54 -07:00
Klaus Post 7bc1f986e8
Do not wait for results when canceled (#14607)
When canceled nobody may be listening for the results.

Prevents memory buildup from cancelled requests.
2022-03-22 09:37:01 -07:00
Harshavardhana d796621ccc
choose smaller default deadline for diagnostics without --full (#14599) 2022-03-21 23:25:24 -07:00
Harshavardhana f6113264f4 add detection for GOMAXPROCS < NumCPU 2022-03-21 19:05:10 -07:00
Harshavardhana a3534a730b
fallback quorum should be "strict" globally if config is not loaded (#14589) 2022-03-20 17:39:06 -07:00
Harshavardhana bd6f7b6d83
fix: make decommission restart non-blocking (#14591)
currently an on-going decommission, during a server
restart might block the startup sequence for relatively
longer periods, instead start the decommission in
background lazily.
2022-03-20 14:46:43 -07:00
Andreas Auernhammer b0a4beb66a
PutObjectPart: set SSE-KMS headers and truncate ETags. (#14578)
This commit fixes two bugs in the `PutObjectPartHandler`.
First, `PutObjectPart` should return SSE-KMS headers
when the object is encrypted using SSE-KMS.
Before, this was not the case.

Second, the ETag should always be a 16 byte hex string,
perhaps followed by a `-X` (where `X` is the number of parts).
However, `PutObjectPart` used to return the encrypted ETag
in case of SSE-KMS. This leaks MinIO internal etag details
through the S3 API.

The combination of both bugs causes clients that use SSE-KMS
to fail when trying to validate the ETag. Since `PutObjectPart`
did not send the SSE-KMS response headers, the response looked
like a plaintext `PutObjectPart` response. Hence, the client
tries to verify that the ETag is the content-md5 of the part.
This could never be the case, since MinIO used to return the
encrypted ETag.

Therefore, clients behaving as specified by the S3 protocol
tried to verify the ETag in a situation they should not.

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-19 10:15:12 -07:00
Harshavardhana 01ee49045e
fix: handle race in server setup global CI/CD variable (#14579) 2022-03-18 18:21:09 -07:00
Harshavardhana 7bd9f821dd
return correct context errors for locking operations (#14569)
if a context is canceled do not need to return a timeout error
instead, return the appropriate error for context canceled.
2022-03-18 15:32:45 -07:00
Klaus Post 61eb9d4e29
Fix listing fallback re-using disks (#14576)
When more than 2 disks are unavailable for listing, the same disk will be used for fallback.

This makes quorum calculations incorrect since the same disk will have multiple entries.

This PR keeps track of which fallback disks have been handed out and only every returns a disk once.
2022-03-18 11:35:27 -07:00
Harshavardhana 43eb5a001c
re-use transport for AdminInfo() call (#14571)
avoids creating new transport for each `isServerResolvable`
request, instead re-use the available global transport and do
not try to forcibly close connections to avoid TIME_WAIT
build upon large clusters.

Never use httpClient.CloseIdleConnections() since that can have
a drastic effect on existing connections on the transport pool.

Remove it everywhere.
2022-03-17 16:20:10 -07:00
Klaus Post c1760fb764
Move apiCalls to front for field alignment (#14568)
Fixes #14565
2022-03-17 10:57:52 -07:00
Minio Trusted ffcadcd99e Revert "Use S3 client for uplooads/downloads during perf test (#14553)"
This reverts commit ff811f594b.

Speedtest is broken need to fix this more cleanly.
2022-03-16 23:34:49 -07:00
Krishnan Parthasarathi 7b81967a3c
Fix handling of object versions pending purge (#14555)
- GetObject() with vid should return 405
- GetObject() without vid should return 404
- ListObjects() should ignore this object if this is the "latest" version of the object
- ListObjectVersions() should list this object as "DELETE marker"
- Remove data parts before sync'ing the version pending purge
2022-03-16 16:59:43 -07:00
Krishna Srinivas ff811f594b
Use S3 client for uplooads/downloads during perf test (#14553) 2022-03-16 16:58:46 -07:00
Harshavardhana e3071157f0
allow MakeBucketLocation to work for metaBucket (#14548)
decommission would fail to start due to failure
in MakeBucketLocation() error on .minio.sys/ bucket
creation.

Allow these special buckets.
2022-03-14 11:25:24 -07:00
Klaus Post c07af89e48
select: Add ScanRange to CSV&JSON (#14546)
Implements https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ScanRange

Fixes #14539
2022-03-14 09:48:36 -07:00
Harshavardhana 9c846106fa
decouple service accounts from root credentials (#14534)
changing root credentials makes service accounts
in-operable, this PR changes the way sessionToken
is generated for service accounts.

It changes service account behavior to generate
sessionToken claims from its own secret instead
of using global root credential.

Existing credentials will be supported by
falling back to verify using root credential.

fixes #14530
2022-03-14 09:09:22 -07:00
Harshavardhana cf94d1f1f1
do not crash readXLMetaNoData - if the `xl.meta` has incorrect content (#14538)
```
tmp = buf[want:]
```

Would potentially crash when `buf` is truncated for some reason
and does not have the expected bytes, this is of course considered
not normal and is an odd situation. But we do not need to crash
here instead allow for errors to be returned and let callers handle
the errors.
2022-03-14 09:07:46 -07:00
Poorna f8d6eaaa96
fix: regression from range GET proxy on replicated buckets #14345 (#14532)
Fixes: #14531
2022-03-11 15:56:49 -08:00
Poorna 75b925c326
Deprecate root disk for disk caching (#14527)
This PR modifies #14513 to issue a deprecation
warning rather than reject settings on startup.
2022-03-10 18:42:44 -08:00
Harshavardhana 91d419ee6c
warn issues about large block I/O performance for Linux older than 4.0.0 (#14524)
This PR simply adds a warning message when it detects older kernel
versions and warn's them about potential performance issues on this
kernel.

The issue can be seen only with parallel I/O across all drives
on denser setups such as 90 drives or 45 drives per server configurations.
2022-03-10 17:36:13 -08:00
Harshavardhana 41079f1015
heal: remove blocking healDiskMeta upon startup (#14514)
This type of code is not necessary, read's of all
metadata content at `.minio.sys/config` automatically
triggers healing when necessary in the GetObjectNInfo()
call-path.

Having this code is not useful and this also adds to
the overall startup time of MinIO when there are lots
of users and policies.
2022-03-10 02:45:14 -08:00
Poorna 712dfa40cd
Add missing site replication hook for clearing sse config (#14512) 2022-03-10 00:04:34 -08:00
Klaus Post b890bbfa63
Add local disk health checks (#14447)
The main goal of this PR is to solve the situation where disks stop 
responding to operations. This generally causes an FD build-up and 
eventually will crash the server.

This adds detection of hung disks, where calls on disk get stuck.

We add functionality to `xlStorageDiskIDCheck` where it keeps 
track of the number of concurrent requests on a given disk.

A total number of 100 operations are allowed. If this limit is reached 
we will block (but not reject) new requests, but we will monitor the 
state of the disk.

If no requests have been completed or updated within a 15-second 
window, we mark the disk as offline. Requests that are blocked will be 
unblocked and return an error as "faulty disk".

New requests will be rejected until the disk is marked OK again.

Once a disk has been marked faulty, a check will run every 5 seconds that 
will attempt to write and read back a file. As long as this fails the disk will 
remain faulty.

To prevent lots of long-running requests to mark the disk faulty we 
implement a callback feature that allows updating the status as parts 
of these operations are running.

We add a reader and writer wrapper that will update the status of each 
successful read/write operation. This should allow fine enough granularity 
that a slow, but still operational disk will not reach 15 seconds where 
50 operations have not progressed.

Note that errors themselves are not enough to mark a disk faulty. 
A nil (or io.EOF) error will mark a disk as "good".

* Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`.

* de-couple IsOnline() from disk health tracker

The purpose of IsOnline() is to ensure that we
reconnect the drive only when the "drive" was

- disconnected from network we need to validate
  if the drive is "correct" and is the same drive
  which belongs to this server.

- drive was replaced we have to format it - we
  support hot swapping of the drives.

IsOnline() is not meant for taking the drive offline
when it is hung, it is not useful we can let the
drive be online instead "return" errors for relevant
calls.

* return errFaultyDisk for DiskInfo() call

Co-authored-by: Harshavardhana <harsha@minio.io>

Possible future Improvements:

* Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly.
* Allow reads/writes to be aborted by the context.
* Add usage stats, concurrent count, blocked operations, etc.
2022-03-09 11:38:54 -08:00
Poorna 46ba15ab03
Return MethodNotAllowed if force del on replicated bucket (#14505) 2022-03-08 14:28:51 -08:00
Poorna 1e39ca39c3
fix: consistent replies for incorrect range requests on replicated buckets (#14345)
Propagate error from replication proxy target correctly to the client if range GET is unsatisfiable.
2022-03-08 13:58:55 -08:00
Krishnan Parthasarathi 80ef1ae51c
Simplify assembling of tierStats from data-usage (#14504) 2022-03-08 12:08:29 -08:00
Krishna Srinivas 4d0715d226
Implement netperf for "mc support perf net" (#14397)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2022-03-08 09:54:38 -08:00
Klaus Post 8a274169da
heal: Fix first entry on dangling (#14495)
Instead of the first, the last entry was returned
pointerizing the range value.
2022-03-08 09:04:20 -08:00
Harshavardhana 5d6f6d8d5b
create missing .minio.sys/config, .minio.sys/buckets during decommission (#14497) 2022-03-07 16:18:57 -08:00
Anis Elleuch bacf6156c1
metrics: Avoid crash when fetching tier metrics (#14493)
Data usage does not always contain tiering info even if the data usage
information is valid. Avoid a crash in that case.

(e.g. the scanner scanned the namespace, the user enables tiering,
prometheus scrapes the server before the scanner gets a chance to
update the data usage with new tiering information)
2022-03-07 10:59:32 -08:00
Klaus Post 1d1b213f1f
scanner: Consider preselection bias when selecting for Healing (#14492)
Healing decisions would align with skipped folder counters. This can lead to files 
never being selected for heal checks on "clean" paths.

Use different hashing methods and take objectHealProbDiv into account when 
calculating the cycle.

Found by @vadmeste
2022-03-07 09:25:53 -08:00
Harshavardhana 92a77cc78e
update pkg v1.1.20 to reload certs in k8s always (#14470) 2022-03-04 20:34:39 -08:00
Harshavardhana b0c84e3de7
fix: deleteVersions causing xl.meta to have empty Versions[] slice (#14483)
This is a side-affect of the optimization done in PR #13544 which
causes a certain type of delete operations on given object versions
can cause lastVersion indication to be skipped, which leads to
an `xl.meta` where Versions[] slice is empty while the entire
file is intact by itself.

This PR tries to ensure that such files are visible and deletable
by regular means of listing as null 'delete-marker' and also
avoid the situation where this potential issue might arise.
2022-03-04 20:01:26 -08:00
Anis Elleuch bbc914e174
heal: Do not override heal scan mode mode if it is set (#14476)
mc admin heal has --scan=deep flag which enforces bitrot checking 
when doing the healing.

Do not force override an existing heal scan option.
2022-03-04 18:25:06 -08:00
Anis Elleuch 3fca4055d2
heal: Re-heal an object when a corruption is found during normal scan (#14482)
When scanning using normal mode, HealObject() can report an 
error saying that it found a corrupted part. This doesn't have 
when HealObject() is called with bitrot scan flag. However, when 
this happens, we can still restart HealObject() with the bitrot scan.

This is also important because this means the scanner and the 
new disks healer will not be able to heal an object that doesn't 
exist in a specific disk and has corruption in another disk.

Also without this PR, mc admin heal command without bitrot will report
an error.
2022-03-04 18:24:34 -08:00
Harshavardhana 66afa16aed
canceled PUTs throw frivolous logs (#14475)
remote drives might throw frivolous logs,
if the caller canceled the PUT operation
in such scenarios there is no reason to log.
2022-03-04 10:31:33 -08:00
Harshavardhana 0e3bafcc54
improve logs, fix banner formatting (#14456) 2022-03-03 13:21:16 -08:00
Andreas Auernhammer b48f719b8e
kes: remove unnecessary error conversion (#14459)
This commit removes some duplicate code that
converts KES API errors.

This code was added since KES `0.18.0` changed
some exported API errors. However, the KES SDK
handles this error conversion itself.
Therefore, it is not necessary to duplicate this
behavior in MinIO.

See: 21555fa624/error.go (L94)

Signed-off-by: Andreas Auernhammer <hi@aead.dev>
2022-03-03 09:42:37 -08:00
Lenin Alevski 289fcbd08c
KES dependency upgrade (#14454)
- Updating KES dependency to v.0.18.0
- Fixing incompatibility issue when checking for errors during KES key creation

Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
2022-03-02 23:03:40 -08:00
Harshavardhana 7e803adf13
do not attempt force delete on bucket (#14452)
caller needs to ask explicitly for force delete
otherwise, the force delete might end up deleting
an existing bucket with data.

fixes #14445
2022-03-02 20:47:53 -08:00
Anis Elleuch 4a15bd8ff8
Return info for DiskInfo when the disk is unformatted (#14427)
In a distributed setup, a DiskInfo REST call to an unformatted disk
returns an error with no disk information, such as the disk endpoint
URL, which is unexpected.
2022-03-01 15:06:47 -08:00
Klaus Post b030ef1aca
tests: Clean up dsync package (#14415)
Add non-constant timeouts to dsync package.

Reduce test runtime by minutes. Hopefully not too aggressive.
2022-03-01 11:14:28 -08:00
Harshavardhana cc46a99f97
skip object-lock headers without values (#14430)
metadata headers can have headers without values
as per AWS S3 spec however, we need to skip some
headers that do not have values that potentially
can have empty values set.
2022-03-01 11:04:47 -08:00
Xuehan Xu becec6cb6b
correct mrf.newSetReconnected invocation's param order (#14426)
Signed-off-by: xuxuehan <xuxuehan@qianxin.com>
2022-02-28 09:13:19 -08:00
Harshavardhana b7c90751b0 allow drive tests to respond only drive paths 2022-02-25 18:54:46 -08:00
Harshavardhana e43cc316ff
remove errCh usage from HealObjects() simplify it (#14414)
errCh is not needed instead, rely on errs slice to
capture and return errors instead.

most probably fixes #14247
2022-02-25 12:20:41 -08:00
hellivan 03b35ecdd0
collect correct parentUser for OIDC creds auto expiration (#14400) 2022-02-24 11:43:15 -08:00
Harshavardhana c08540c7b7
reject speedtest when there isn't enough disk space available (#14402)
small setups do not return appropriate errors when speedtest
cannot run on small tiny setups, allow the tests to fail
appropriately more pro-actively.

many users bring toy setups, this PR simply returns an error
in such situations.
2022-02-24 09:06:18 -08:00
Shireesh Anjal 3934700a08
Make audit webhook and kafka config dynamic (#14390) 2022-02-24 09:05:33 -08:00
Harshavardhana 2d78e20120
enable CI environment additionally for MINIO_CI_CD (#14395)
all CI/CD environments set CI=true this is enough
for MinIO to be run inside CI environments, support
it.
2022-02-23 16:01:59 -08:00
Harshavardhana 2e6f8bdf19
do not skip healing disks during deletes (#14394)
healing disks take active I/O it is possible
that deleted objects might stay in .trash
folder for a really long time until the drive
is fully healed.

this PR changes it such that we are making sure
we purge the active content written to these
disks as well.
2022-02-23 14:30:46 -08:00
Shireesh Anjal 25144fedd5
Send deployment id and minio version in http header (#14378) 2022-02-23 13:36:01 -08:00
Krishnan Parthasarathi 27f64dd9a4
Add support for tier-remove and tier-verify (#14382)
* Add tier remove support only if it's empty
* Add support for tier verify
2022-02-23 13:34:25 -08:00
Harshavardhana 9d7648f02f
reduce unnecessary logging during speedtest (#14387)
- speedtest logs calls that were canceled
  spuriously, in situations where it should
  be ignored.

- all errors of interest are always sent back
  to the client there is no need to log them
  on the server console.

- PUT failures should negate the increments
  such that GET is not attempted on unsuccessful
  calls.

- do not attempt MRF on speedtest objects.
2022-02-23 11:59:13 -08:00
Poorna 1ef8babfef
cache: improve error reported for atime check (#14384) 2022-02-23 11:57:06 -08:00
Poorna 4ea7bf0510
Use custom transport for site replication (#14391)
Also, ensure that tiering uses a different instance of custom transport
2022-02-23 11:50:40 -08:00
Anis Elleuch 5dcf1d13a9
ci: Always set disks as non root disks (#14389)
In the testing mode, reformatting disks will fail because the healing
code will complain if one disk is in root mode. This commit will
automatically set all disks as non-root if MINIO_CI_CD is set.
2022-02-23 10:11:33 -08:00
Shireesh Anjal 94d37d05e5
Apply dynamic config at sub-system level (#14369)
Currently, when applying any dynamic config, the system reloads and
re-applies the config of all the dynamic sub-systems.

This PR refactors the code in such a way that changing config of a given
dynamic sub-system will work on only that sub-system.
2022-02-22 10:59:28 -08:00
Harshavardhana 0cbdc458c5
fix: do not reload disk format.json on a reconnected disk (#14351)
An onlineDisk means its a valid disk but it may be a
re-connected disk, this PR verifies that based on LastConn()
to only trigger MRF. Current code would again re-load the
disk 'format.json' which is not necessary and perhaps an
unnecessary call.

A potential side affect of this is closing perfectly online
disks and getting re-replaced by reloading 'format.json'.

This PR tries to avoid this situation by making sure MRF
is triggered but not reloading 'format.json' because of MRF.
2022-02-21 15:51:54 -08:00
Harshavardhana 65b1a4282e
fix: console logger regression with dynamic logger webhook registration (#14346)
fixes a regression from #14289
2022-02-17 17:50:10 -08:00
Harshavardhana af3dc25dfe
align 32bit integers with atomic values in structs (#14344)
fixes #14341
2022-02-17 15:22:26 -08:00
Krishnan Parthasarathi 5a0c0079a1
Don't add free-version on restore-object (#14340) 2022-02-17 15:05:19 -08:00
Harshavardhana af8f563ed3
allow clearing FIFO config as fallback (#14338)
FIFO is already removed, for users who upgrade are allowed to clear their configs.
2022-02-17 12:49:46 -08:00
Poorna 93af4a4864
Handle non existent kms key correctly (#14329)
- in PutBucketEncryption API
- admin APIs for  `mc admin KMS key [create|info]`
- PutObject API when invalid KMS key is specified
2022-02-17 11:36:14 -08:00
Shireesh Anjal 28f188e3ef
Make logger webhook config dynamic (#14289)
It should not be required to restart the 
server after setting the logger webhook config.
2022-02-17 11:11:15 -08:00
Harshavardhana d756da41b9 fix: print gateway banner on removal notice 2022-02-16 20:34:47 -08:00
Krishnan Parthasarathi cdab4a3b85
Update hourly tier-stats only on succesful tiering (#14330) 2022-02-16 17:29:12 -08:00
Klaus Post b88c57ba93
Add fgprof profiles (#14321)
https://github.com/felixge/fgprof#rocket-fgprof---the-full-go-profiler
2022-02-16 12:00:10 -08:00
Klaus Post 60cd513a33
Fix leaked healing goroutines (#14322)
Only the first `listAndHeal` would ever be able to write on errCh, blocking all others infinitely.

Instead read all errors but return the first non-nil, if any.

The intention appears to be that this should cancel on any error, 
so that part is kept. 

Regression from #13990
2022-02-16 08:40:18 -08:00
Harshavardhana 03a6e8aee2
fix: creating steep directory structure on trash folder (#14314)
weird directory structures get created on the '.trash'
folder upon server restarts, this PR fixes this.
2022-02-15 16:34:03 -08:00
Anis Elleuch 4afbb89774
nas: Clean stale background appended files (#14295)
When more than one gateway reads and writes from the same mount point
and there is a load balancer pointing to those gateways. Each gateway 
will try to create its own temporary append file but fails to clear it later 
when not needed.

This commit creates a routine that checks all upload IDs saved in
multipart directory and remove any stale entry with the same upload id
in the memory and in the temporary background append folder as well.
2022-02-15 09:25:47 -08:00
Klaus Post 5ec57a9533
Add GetObject gzip option (#14226)
Enabled with `mc admin config set alias/ api gzip_objects=on`

Standard filtering applies (1K response minimum, not compressed content 
type, not range request, gzip accepted by client).
2022-02-14 09:19:01 -08:00
Anis Elleuch 1f92fc3fc0
Always check for root disks unless MINIO_CI_CD is set (#14232)
The current code considers a pool with all root disks to be as part
of a testing environment even if there are other pools with mounted
disks. This will result to illegitimate writing in root disks.

Fix this by simplifing the logic: require MINIO_CI_CD in order to skip
root disk check.
2022-02-13 15:42:07 -08:00
Harshavardhana fad3d66093
parallelize background cleanup on local disks across sets (#14290) 2022-02-11 14:22:48 -08:00
Poorna ed3418c046
Refactor replication resync to be an active process (#14266)
When resync is triggered, walk the bucket namespace and
resync objects that are unreplicated. This PR also adds
an API to report resync progress.
2022-02-10 10:16:52 -08:00
Anis Elleuch 71bab74148
Fix adding bucket forwarder handler in server mode (#14288)
MinIO configuration is loaded after the initialization of the server
handlers, which will miss the initialization of the bucket forwarder
handler.

Though the federation is deprecated, let's fix this for the time being.
2022-02-10 08:49:36 -08:00
Anis Elleuch 661ea57907
restore: Add quotes some fields in x-amz-restore header (#14281)
S3 spec returns x-amz-restore header in HEAD/GET object with the
following format:

```
x-amz-restore: ongoing-request="false", expiry-date="Fri, 21 Dec 2012
00:00:00 GMT"
```

This commit adds quotes as the current code does not support it. It will
also supports the old format saved in the disk (in xl.meta) for backward
compatibility.
2022-02-09 13:17:41 -08:00
Anis Elleuch 1f18efb0ba
gateway: Active bucket forwarding handler (#14277)
A regression removed support of federation in the gateway mode. 
Enable it again.

Federation is deprecated for a while but let's fix this for the time being.
2022-02-09 09:31:47 -08:00
Daniel 8ae46bce93
fix the error logs have been omitted because of retryCount never exceed 10 (#14268) 2022-02-09 03:14:22 -08:00
Harshavardhana f19a414e09
fix: allow danging objects to be purged properly deleteMultipleObjects() (#14273)
Deleting bulk objects had an issue since the relevant versionID
is not passed through the layers to ensure that the dangling
object purge actually works cleanly.

This is a continuation of quorum related error returned by
multi-object delete API from #14248

This PR ensures that we pass down correct information as
well as extend the scope of dangling object detection.
2022-02-08 20:08:23 -08:00
Krishnan Parthasarathi 0ee2933234
Export tier metrics via Prometheus (#13413)
e.g
```
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-1"} 1.36317772e+08
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="S3TIER-2"} 2892
minio_cluster_ilm_transitioned_bytes{server="minio3:9000",tier="STANDARD"}
1.3631488e+08

minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-1"} 1
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="S3TIER-2"} 0
minio_cluster_ilm_transitioned_objects{server="minio3:9000",tier="STANDARD"} 1

minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-1"} 3
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="S3TIER-2"} 2
minio_cluster_ilm_transitioned_versions{server="minio3:9000",tier="STANDARD"} 1
```
2022-02-08 12:45:28 -08:00
Shireesh Anjal 9890f579f8
Add subsystem level validation on `config set` (#14269)
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
2022-02-08 10:36:41 -08:00
Anis Elleuch 2ee337ead5
prometheus: Add incoming requests metrics since last scrape (#14261)
Some users running MinIO claim that their system became slow. One 
way to investigate is to look at this Prometheus history of the number of
the requests reaching the server. The existing current S3 requests metric
is not enough because it can increase of the system really becomes slow, 
due to disk issues for example.
2022-02-07 16:30:14 -08:00
Harshavardhana 3c87e1e60d
fix: rename some function names to avoid confusion (#14262) 2022-02-07 11:49:07 -08:00
Harshavardhana 0cac868a36
speed-up startup time, do not block on ListBuckets() (#14240)
Bonus fixes #13816
2022-02-07 10:39:57 -08:00
Harshavardhana 186c477f3c init console server after server config is initialized
fixes #14259
2022-02-07 00:17:33 -08:00
Harshavardhana 6123377e66
speedup getFormatErasureInQuorum use driveCount (#14239)
startup speed-up, currently getFormatErasureInQuorum()
would spend up to 2-3secs when there are 3000+ drives
for example in a setup, simplify this implementation
to use drive counts.
2022-02-04 12:21:21 -08:00
Harshavardhana 0256dae657
fix: quorum requirement for DeleteMarkers and parity upgraded objects (#14248)
DeleteMarkers do not have a default quorum, i.e it is possible that
DeleteMarkers were created with n/2+1 quorum as well to make sure
that we satisfy situations such as those we need to make sure delete
markers only expect n/2 read quorum.

Additionally we should also look at additional metadata on the
actual objects that might have been "erasure" upgraded with new
parity when disks are down.

In such a scenario do not default to the standard storage class
parity, instead use the parityBlocks present on the FileInfo to
ensure that we are dealing with the correct quorum for READs and
DELETEs.
2022-02-04 02:47:36 -08:00
Harshavardhana 84b121bbe1
return error with empty x-amz-copy-source-range headers (#14249)
fixes #14246
2022-02-03 16:58:27 -08:00
Harshavardhana 01e550a9be
ignore unreadable metrics on certain closed systems (#14234)
fixes #14233
2022-02-03 09:45:12 -08:00
Poorna 63a2e0bab6
Remove notification from NotificationSys on bucket deletion (#14236) 2022-02-02 17:11:56 -08:00
Harshavardhana 24657859a8
when o_direct is disabled do not attempt fadvise call (#14230) 2022-02-02 08:54:52 -08:00
Sidhartha Mani d7df6bc738
add support for speedtest drive (#14182) 2022-02-01 22:38:05 -08:00
Poorna a4e1de93a7
Add API for removing site(s) from site replication (#14104) 2022-02-01 17:26:09 -08:00
Klaus Post 067d21d0f2
fs: Retry listing if no marker (#14221)
Retry listings, when no next marker is returned and the result isn't truncated.

This can happen when an object is queued, but no info can be fetched.

Fixes #14190
2022-02-01 10:00:14 -08:00
Shireesh Anjal 3882da6ac5
Add subnet proxy config (#14225)
Will store the HTTP(S) proxy URL to use for connecting to SUBNET.
2022-02-01 09:52:38 -08:00
Anis Elleuch 127e8bf3b6
heal: Avoid printing repetitive error to heal a root disk (#14220)
The healing code repeatedly tries to heal a root disk when it is empty
the reason is that connectEndpoint() returns errUnformattedDisk even
if the disk is a root disk. Changing that to returning another error
will avoid queueing the disk to the healing code in each connect disks
iteration.
2022-01-31 17:28:20 -08:00
Harshavardhana 74faed166a
Add quota usage as part of prometheus metrics (#14222)
Bonus: pass caller context when needed to all bucket metadata handling calls.
2022-01-31 17:27:43 -08:00