Fix races in IAM cache
Fixes#19344
On the top level we only grab a read lock, but we write to the cache if we manage to fetch it.
a03dac41eb/cmd/iam-store.go (L446) is also flipped to what it should be AFAICT.
Change the internal cache structure to a concurrency safe implementation.
Bonus: Also switch grid implementation.
we must attempt to convert all errors at storage-rest-client
into StorageErr() regardless of what functionality is being
called in, this PR fixes this for multiple callers including
some internally used functions.
- old version was unable to retain messages during config reload
- old version could not go from memory to disk during reload
- new version can batch disk queue entries to single for to reduce I/O load
- error logging has been improved, previous version would miss certain errors.
- logic for spawning/despawning additional workers has been adjusted to trigger when half capacity is reached, instead of when the log queue becomes full.
- old version would json marshall x2 and unmarshal 1x for every log item. Now we only do marshal x1 and then we GetRaw from the store and send it without having to re-marshal.
panic seen due to premature closing of slow channel while listing is still sending or
list has already closed on the sender's side:
```
panic: close of closed channel
goroutine 13666 [running]:
github.com/minio/minio/internal/ioutil.SafeClose[...](0x101ff51e4?)
/Users/kp/code/src/github.com/minio/minio/internal/ioutil/ioutil.go:425 +0x24
github.com/minio/minio/cmd.(*erasureServerPools).Walk.func1()
/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:2142 +0x170
created by github.com/minio/minio/cmd.(*erasureServerPools).Walk in goroutine 1189
/Users/kp/code/src/github.com/minio/minio/cmd/erasure-server-pool.go:1985 +0x228
```
Object names of directory objects qualified for ExpiredObjectAllVersions
must be encoded appropriately before calling on deletePrefix on their
erasure set.
e.g., a directory object and regular objects with overlapping prefixes
could lead to the expiration of regular objects, which is not the
intention of ILM.
```
bucket/dir/ ---> directory object
bucket/dir/obj-1
```
When `bucket/dir/` qualifies for expiration, the current implementation would
remove regular objects under the prefix `bucket/dir/`, in this case,
`bucket/dir/obj-1`.
In handlers related to health diagnostics e.g. CPU, Network, Partitions,
etc, globalMinioHost was being passed as the addr, resulting in empty
value for the same in the health report.
Using globalLocalNodeName instead fixes the issue.
IAM loading is a lazy operation, allow these
fallbacks to be in place when we cannot find
in-memory state().
this allows us to honor the request even if pay
a small price for lookup and populating the data.
When objects have more versions than their ILM policy expects to retain
via NewerNoncurrentVersions, but they don't qualify for expiry due to
NoncurrentDays are configured in that rule.
In this case, applyNewerNoncurrentVersionsLimit method was enqueuing empty
tasks, which lead to a panic (panic: runtime error: index out of range [0] with
length 0) in newerNoncurrentTask.OpHash method, which assumes the task
to contain at least one version to expire.
When returning the status of a decommissioned pool, a pool with zero
time StartedTime will be considered an active pool, which is unexpected.
This commit will always ensure that a pool's canceled/failed/completed
status is returned.
This commit changes how MinIO generates the object encryption key (OEK)
when encrypting an object using server-side encryption.
This change is fully backwards compatible. Now, MinIO generates
the OEK as following:
```
Nonce = RANDOM(32) // generate 256 bit random value
OEK = HMAC-SHA256(EK, Context || Nonce)
```
Before, the OEK was computed as following:
```
Nonce = RANDOM(32) // generate 256 bit random value
OEK = SHA256(EK || Nonce)
```
The new scheme does not technically fix a security issue but
uses a more familiar scheme. The only requirement for the
OEK generation function is that it produces a (pseudo)random value
for every pair (`EK`,`Nonce`) as long as no `EK`-`Nonce` combination
is repeated. This prevents a faulty PRNG from repeating or generating
a "bad" key.
The previous scheme guarantees that the `OEK` is a (pseudo)random
value given that no pair (`EK`,`Nonce`) repeats under the assumption
that SHA256 is indistinguable from a random oracle.
The new scheme guarantees that the `OEK` is a (pseudo)random value
given that no pair (`EK`, `Nonce`) repeats under the assumption that
SHA256's underlying compression function is a PRF/PRP.
While the later is a weaker assumption, and therefore, less likely
to be false, both are considered true. SHA256 is believed to be
indistinguable from a random oracle AND its compression function
is assumed to be a PRF/PRP.
As far as the OEK generating is concerned, the OS random number
generator is not required to be pseudo-random but just non-repeating.
Apart from being more compatible to standard definitions and
descriptions for how to generate crypto. keys, this change does not
have any impact of the actual security of the OEK key generation.
Signed-off-by: Andreas Auernhammer <github@aead.dev>
avoids error during upgrades such as
```
API: SYSTEM()
Time: 19:19:22 UTC 03/18/2024
DeploymentID: 24e4b574-b28d-4e94-9bfa-03c363a600c2
Error: Invalid api configuration: found invalid keys (expiry_workers=100 ) for 'api' sub-system, use 'mc admin config reset myminio api' to fix invalid keys (*fmt.wrapError)
11: internal/logger/logger.go:260:logger.LogIf()
...
```
we were prematurely not writing 4k pages while we
could have due to the fact that most buffers would
be multiples of 4k upto some number and there shall
be some remainder.
We only need to write the remainder without O_DIRECT.
at scale customers might start with failed drives,
causing skew in the overall usage ratio per EC set.
make this configurable such that customers can turn
this off as needed depending on how comfortable they
are.
Cosmetic change, but breaks up a big code block and will make a goroutine
dumps of streams are more readable, so it is clearer what each goroutine is doing.
Currently, the code relies on object parity to decide whether it is a
delete marker or a regular object. In the case of a delete marker, the
return quorum is half of the disks in the erasure set. However, this
calculation must be corrected with objects with EC = 0, mainly
because EC is not a one-time fixed configuration.
Though all data are correct, the manifested symptom is a 503 with an
EC=0 object. This bug was manifested after we introduced the
fast Get Object feature that does not read all data from all disks in
case of inlined objects
Metrics v3 is mainly a reorganization of metrics into smaller groups of
metrics and the removal of internal aggregation of metrics received from
peer nodes in a MinIO cluster.
This change adds the endpoint `/minio/metrics/v3` as the top-level metrics
endpoint and under this, various sub-endpoints are implemented. These
are currently documented in `docs/metrics/v3.md`
The handler will serve metrics at any path
`/minio/metrics/v3/PATH`, as follows:
when PATH is a sub-endpoint listed above => serves the group of
metrics under that path; or when PATH is a (non-empty) parent
directory of the sub-endpoints listed above => serves metrics
from each child sub-endpoint of PATH. otherwise, returns a no
resource found error
All available metrics are listed in the `docs/metrics/v3.md`. More will
be added subsequently.
When an object qualifies for both tiering and expiration rules and is
past its expiration date, it should be expired without requiring to tier
it, even when tiering event occurs before expiration.