A cache structure will be kept with a tree of usages.
The cache is a tree structure where each keeps track
of its children.
An uncompacted branch contains a count of the files
only directly at the branch level, and contains link to
children branches or leaves.
The leaves are "compacted" based on a number of properties.
A compacted leaf contains the totals of all files beneath it.
A leaf is only scanned once every dataUsageUpdateDirCycles,
rarer if the bloom filter for the path is clean and no lifecycles
are applied. Skipped leaves have their totals transferred from
the previous cycle.
A clean leaf will be included once every healFolderIncludeProb
for partial heal scans. When selected there is a one in
healObjectSelectProb that any object will be chosen for heal scan.
Compaction happens when either:
- The folder (and subfolders) contains less than dataScannerCompactLeastObject objects.
- The folder itself contains more than dataScannerCompactAtFolders folders.
- The folder only contains objects and no subfolders.
- A bucket root will never be compacted.
Furthermore, if a has more than dataScannerCompactAtChildren recursive
children (uncompacted folders) the tree will be recursively scanned and the
branches with the least number of objects will be compacted until the limit
is reached.
This ensures that any branch will never contain an unreasonable amount
of other branches, and also that small branches with few objects don't
take up unreasonable amounts of space.
Whenever a branch is scanned, it is assumed that it will be un-compacted
before it hits any of the above limits. This will make the branch rebalance
itself when scanned if the distribution of objects has changed.
TLDR; With current values: No bucket will ever have more than 10000
child nodes recursively. No single folder will have more than 2500 child
nodes by itself. All subfolders are compacted if they have less than 500
objects in them recursively.
We accumulate the (non-deletemarker) version count for paths as well,
since we are changing the structure anyway.
MRF does not detect when a node is disconnected and reconnected quickly
this change will ensure that MRF is alerted by comparing the last disk
reconnection timestamp with the last MRF check time.
Signed-off-by: Anis Elleuch <anis@min.io>
Co-authored-by: Klaus Post <klauspost@gmail.com>
wait groups are necessary with io.Pipes() to avoid
races when a blocking function may not be expected
and a Write() -> Close() before Read() races on each
other. We should avoid such situations..
Co-authored-by: Klaus Post <klauspost@gmail.com>
This commit replaces the custom KES client implementation
with the KES SDK from https://github.com/minio/kes
The SDK supports multi-server client load-balancing and
requests retry out of the box. Therefore, this change reduces
the overall complexity within the MinIO server and there
is no need to maintain two separate client implementations.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
This commit enforces the usage of AES-256
for config and IAM data en/decryption in FIPS
mode.
Further, it improves the implementation of
`fips.Enabled` by making it a compile time
constant. Now, the compiler is able to evaluate
the any `if fips.Enabled { ... }` at compile time
and eliminate unused code.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
p.writers is a verbatim value of bitrotWriter
backed by a pipe() that should never be nil'ed,
instead use the captured errors to skip the writes.
additionally detect also short writes, and reject
them as errors.
currently GetUser() returns 403 when IAM is not initialized
this can lead to applications crashing, instead return 503
so that the applications can retry and backoff.
fixes#12078
as there is no automatic way to detect if there
is a root disk mounted on / or /var for the container
environments due to how the root disk information
is masked inside overlay root inside container.
this PR brings an environment variable to set
root disk size threshold manually to detect the
root disks in such situations.
This commit fixes a bug in the single-part object decryption
that is triggered in case of SSE-KMS. Before, it was assumed
that the encryption is either SSE-C or SSE-S3. In case of SSE-KMS
the SSE-C branch was executed. This lead to an invalid SSE-C
algorithm error.
This commit fixes this by inverting the `if-else` logic.
Now, the SSE-C branch only gets executed when SSE-C headers
are present.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
This commit fixes a bug introduced by af0c65b.
When there is no / an empty client-provided SSE-KMS
context the `ParseMetadata` may return a nil map
(`kms.Context`).
When unsealing the object key we must check that
the context is nil before assigning a key-value pair.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
UpdateServiceAccount ignores updating fields when not passed from upper
layer, such as empty policy, empty account status, and empty secret key.
This PR will check for a secret key only if it is empty and add more
check on the value of the account status.
Signed-off-by: Anis Elleuch <anis@min.io>
This commit adds basic SSE-KMS support.
Now, a client can specify the SSE-KMS headers
(algorithm, optional key-id, optional context)
such that the object gets encrypted using the
SSE-KMS method. Further, auto-encryption now
defaults to SSE-KMS.
This commit does not try to do any refactoring
and instead tries to implement SSE-KMS as a minimal
change to the code base. However, refactoring the entire
crypto-related code is planned - but needs a separate
effort.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
It is possible in some scenarios that in multiple pools,
two concurrent calls for the same object as a multipart operation
can lead to duplicate entries on two different pools.
This PR fixes this
- hold locks to serialize multiple callers so that we don't race.
- make sure to look for existing objects on the namespace as well
not just for existing uploadIDs
When running MinIO server without LDAP/OpenID, we should error out when
the code tries to create a service account for a non existant regular
user.
Bonus: refactor the check code to be show all cases more clearly
Signed-off-by: Anis Elleuch <anis@min.io>
Co-authored-by: Anis Elleuch <anis@min.io>
This commit adds basic SSE-KMS support.
Now, a client can specify the SSE-KMS headers
(algorithm, optional key-id, optional context)
such that the object gets encrypted using the
SSE-KMS method. Further, auto-encryption now
defaults to SSE-KMS.
This commit does not try to do any refactoring
and instead tries to implement SSE-KMS as a minimal
change to the code base. However, refactoring the entire
crypto-related code is planned - but needs a separate
effort.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
Co-authored-by: Klaus Post <klauspost@gmail.com>
avoid time_wait build up with getObject requests if there are
pending callers and they timeout, can lead to time_wait states
Bonus share the same buffer pool with erasure healing logic,
additionally also fixes a race where parallel readers were
never cleanup during Encode() phase, because pipe.Reader end
was never closed().
Added closer right away upon an error during Encode to make
sure to avoid racy Close() while stream was still being
Read().
This commit fixes a bug when parsing the env. variable
`MINIO_KMS_SECRET_KEY`. Before, the env. variable
name - instead of its value - was parsed. This (obviously)
did not work properly.
This commit fixes this.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
cleanup functions should never be cleaned before the reader is
instantiated, this type of design leads to situations where order
of lockers and places for them to use becomes confusing.
Allow WithCleanupFuncs() if the caller wishes to add cleanupFns
to be run upon close() or an error during initialization of the
reader.
Also make sure streams are closed before we unlock the resources,
this allows for ordered cleanup of resources.
upon errors to acquire lock context would still leak,
since the cancel would never be called. since the lock
is never acquired - proactively clear it before returning.
failed queue should be used for retried requests to
avoid cascading the failures into incoming queue, this
would allow for a more fair retry for failed replicas.
Additionally also avoid taking context in queue task
to avoid confusion, simplifies its usage.
There can be situations where replication completed but the
`X-Amz-Replication-Status` metadata update failed such as
when the server returns 503 under high load. This object version will
continue to be picked up by the scanner and replicateObject would perform
no action since the versions match between source and target.
The metadata would never reflect that replication was successful
without this fix, leading to repeated re-queuing.
This commit enhances the docs about IAM encryption.
It adds a quick-start section that explains how to
get started quickly with `MINIO_KMS_SECRET_KEY`
instead of setting up KES.
It also removes the startup message that gets printed
when the server migrates IAM data to plaintext.
We will point this out in the release notes.
Signed-off-by: Andreas Auernhammer <aead@mail.de>
OpenID connect generated service accounts do not work
properly after console logout, since the parentUser state
is lost - instead use sub+iss claims for parentUser, according
to OIDC spec both the claims provide the necessary stability
across logins etc.
Currently, only credentials could be updated with
`mc admin bucket remote edit`.
Allow updating synchronous replication flag, path,
bandwidth and healthcheck duration on buckets, and
a flag to disable proxying in active-active replication.