When using a chain provider all providers do not return a valid
access and secret key, an anonymous request is sent, which makes it hard
for users to figure out what is going on
In the case of S3 tiering, when AWS IAM temporary account generation returns
an error, an anonymous login will be used because of the chain provider.
Avoid this and use the AWS IAM provider directly to get a good error
message.
This helps reduce disk operations as these periodic routines would not
run concurrently any more.
Also add expired STS purging periodic operation: Since we do not scan
the on-disk STS credentials (and instead only load them on-demand) a
separate routine is needed to purge expired credentials from storage.
Currently this runs about a quarter as often as IAM refresh.
Also fix a bug where with etcd, STS accounts could get loaded into the
iamUsersMap instead of the iamSTSAccountsMap.
This allows scanner to avoid lengthy scans, skip
things appropriately and also not lose metrics in
any manner.
reduce longer deadlines for usage-cache loads/saves
to match the disk timeout which is 2minutes now per
IOP.
In situations with large number of STS credentials on disk, IAM load
time is high. To mitigate this, STS accounts will now be loaded into
memory only on demand - i.e. when the credential is used.
In each IAM cache (re)load we skip loading STS credentials and STS
policy mappings into memory. Since STS accounts only expire and cannot
be deleted, there is no risk of invalid credentials being reused,
because credential validity is checked when it is used.
Currently we have IOPs of these patterns
```
[OS] os.Mkdir play.min.io:9000 /disk1 2.718µs
[OS] os.Mkdir play.min.io:9000 /disk1/data 2.406µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys 4.068µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp 2.843µs
[OS] os.Mkdir play.min.io:9000 /disk1/data/.minio.sys/tmp/d89c8ceb-f8d1-4cc6-b483-280f87c4719f 20.152µs
```
It can be seen that we can save quite Nx levels such as
if your drive is mounted at `/disk1/minio` you can simply
skip sending an `Mkdir /disk1/` and `Mkdir /disk1/minio`.
Since they are expected to exist already, this PR adds a way
for us to ignore all paths upto the mount or a directory which
ever has been provided to MinIO setup.
Previously existing objects were queued to single worker and MRF re-queues
are also handled by same worker - this does not fully use the available
bandwidth in case there is no incoming workload.
Errors such as
```
returned an error (context deadline exceeded) (*fmt.wrapError)
```
```
(msgp: too few bytes left to read object) (*fmt.wrapError)
```
This change enables embedding files in ZIP with custom permissions.
Also uses default creds for starting MinIO based on inspect data.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
objects with 10,000 parts and many of them can
cause a large memory spike which can potentially
lead to OOM due to lack of GC.
with previous PR reducing the memory usage significantly
in #17963, this PR reduces this further by 80% under
repeated calls.
Scanner sub-system has no use for the slice of Parts(),
it is better left empty.
```
benchmark old ns/op new ns/op delta
BenchmarkToFileInfo/ToFileInfo-8 295658 188143 -36.36%
benchmark old allocs new allocs delta
BenchmarkToFileInfo/ToFileInfo-8 61 60 -1.64%
benchmark old bytes new bytes delta
BenchmarkToFileInfo/ToFileInfo-8 1097210 227255 -79.29%
```
- this PR avoids sending a large ChecksumInfo slice
when its not needed
- also for a file with XLV2 format there is no reason
to allocate Checksum slice while reading
Keys are helpful to ensure the strict ordering of messages, however currently the
code uses a random request id for every log, hence using the request-id
as a Kafka key is not serve any purpose;
This commit removes the usage of the key, to also fix the audit issue from
internal subsystem that does not have a request ID.
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.
This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.
Add prometheus metric to track credential errors since uptime
replicationTimestamp might differ if there were retries
in replication and the retried attempt overwrote in
quorum but enough shards with newer timestamp causing
the existing timestamps on xl.meta to be invalid, we
do not rely on this value for anything external.
this is purely a hint for debugging purposes, but there
is no real value in it considering the object itself
is in-tact we do not have to spend time healing this
situation.
we may consider healing this situation in future but
that needs to be decoupled to make sure that we do not
over calculate how much we have to heal.
.metacache objects are transient in nature, and are better left to
use page-cache effectively to avoid using more IOPs on the disks.
this allows for incoming calls to be not taxed heavily due to
multiple large batch listings.
given a versionId the mtime is always the same, it
can never be different than its original value.
versionIds also do not conflict, since they are uuid's
and unique practically forever.