This commit fixes a subtle bug in the ETag
`IsEncrypted` implementation.
An encrypted ETag may contain random bytes,
i.e. some randomness used for encryption.
This random value can contain a '-' byte
simple due to being randomly generated.
Before, the `IsEncrypted` implementation
incorrectly assumed that an encrypted ETag
cannot contain a '-' since it would be a
multipart ETag. Multipart ETags have a
16 byte value followed by a '-' and the part number.
For example:
```
059ba80b807c3c776fb3bcf3f33e11ae-2
```
However, the following encrypted ETag
```
20000f00db2d90a7b40782d4cff2b41a7799fc1e7ead25972db65150118dfbe2ba76a3c002da28f85c840cd2001a28a9
```
also contains a '-' byte but is not a multipart ETag.
This commit fixes the `IsEncrypted` implementation
simply by checking whether the ETag is at least 32
bytes long. A valid multipart ETag is never 32 bytes
long since a part number must be <= 10000.
However, an encrypted ETag must be at least 32 bytes
long. It contains the encrypted ETag bytes (16 bytes)
and the authentication tag added by the AEAD cipher (again
16 bytes).
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
This commit adds support for bulk ETag
decryption for SSE-S3 encrypted objects.
If KES supports a bulk decryption API, then
MinIO will check whether its policy grants
access to this API. If so, MinIO will use
a bulk API call instead of sending encrypted
ETags serially to KES.
Note that MinIO will not use the KES bulk API
if its client certificate is an admin identity.
MinIO will process object listings in batches.
A batch has a configurable size that can be set
via `MINIO_KMS_KES_BULK_API_BATCH_SIZE=N`.
It defaults to `500`.
This env. variable is experimental and may be
renamed / removed in the future.
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
In riscv64, the `syscall.Uname` function will return a uint8 slice.
func main() {
var buf syscall.Utsname
fmt.Printf("Buffer Type: %T\n", buf.Release)
}
output:
Buffer Type: [65]uint8
This is tested in the Arch Linux RISC-V 64 QEMU environment.
Signed-off-by: Avimitin <avimitin@gmail.com>
Fix `panic: "POST /minio/peer/v21/signalservice?signal=2": sync: WaitGroup is reused before previous Wait has returned`
Log entries already on the channel would cause `logEntry` to increment the
waitgroup when sending messages, after Cancel has been called.
Instead of tracking every single message, just check the send goroutine. Faster
and safe, since it will not decrement until the channel is closed.
Regression from #14289
avoids creating new transport for each `isServerResolvable`
request, instead re-use the available global transport and do
not try to forcibly close connections to avoid TIME_WAIT
build upon large clusters.
Never use httpClient.CloseIdleConnections() since that can have
a drastic effect on existing connections on the transport pool.
Remove it everywhere.
PR introduced in #13819 was incorrect and was not
handling the situation where a buffer is full can
cause incessant amount of logs that would keep the
logger webhook overrun by the requests.
To avoid this only log failures to console logger
instead of all targets as it can cause self reference,
leading to an infinite loop.
This PR simply adds a warning message when it detects older kernel
versions and warn's them about potential performance issues on this
kernel.
The issue can be seen only with parallel I/O across all drives
on denser setups such as 90 drives or 45 drives per server configurations.
The main goal of this PR is to solve the situation where disks stop
responding to operations. This generally causes an FD build-up and
eventually will crash the server.
This adds detection of hung disks, where calls on disk get stuck.
We add functionality to `xlStorageDiskIDCheck` where it keeps
track of the number of concurrent requests on a given disk.
A total number of 100 operations are allowed. If this limit is reached
we will block (but not reject) new requests, but we will monitor the
state of the disk.
If no requests have been completed or updated within a 15-second
window, we mark the disk as offline. Requests that are blocked will be
unblocked and return an error as "faulty disk".
New requests will be rejected until the disk is marked OK again.
Once a disk has been marked faulty, a check will run every 5 seconds that
will attempt to write and read back a file. As long as this fails the disk will
remain faulty.
To prevent lots of long-running requests to mark the disk faulty we
implement a callback feature that allows updating the status as parts
of these operations are running.
We add a reader and writer wrapper that will update the status of each
successful read/write operation. This should allow fine enough granularity
that a slow, but still operational disk will not reach 15 seconds where
50 operations have not progressed.
Note that errors themselves are not enough to mark a disk faulty.
A nil (or io.EOF) error will mark a disk as "good".
* Make concurrent disk setting configurable via `_MINIO_DISK_MAX_CONCURRENT`.
* de-couple IsOnline() from disk health tracker
The purpose of IsOnline() is to ensure that we
reconnect the drive only when the "drive" was
- disconnected from network we need to validate
if the drive is "correct" and is the same drive
which belongs to this server.
- drive was replaced we have to format it - we
support hot swapping of the drives.
IsOnline() is not meant for taking the drive offline
when it is hung, it is not useful we can let the
drive be online instead "return" errors for relevant
calls.
* return errFaultyDisk for DiskInfo() call
Co-authored-by: Harshavardhana <harsha@minio.io>
Possible future Improvements:
* Unify the REST server and local xlStorageDiskIDCheck. This would also improve stats significantly.
* Allow reads/writes to be aborted by the context.
* Add usage stats, concurrent count, blocked operations, etc.
This commit removes some duplicate code that
converts KES API errors.
This code was added since KES `0.18.0` changed
some exported API errors. However, the KES SDK
handles this error conversion itself.
Therefore, it is not necessary to duplicate this
behavior in MinIO.
See: 21555fa624/error.go (L94)
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
- Updating KES dependency to v.0.18.0
- Fixing incompatibility issue when checking for errors during KES key creation
Signed-off-by: Lenin Alevski <alevsk.8772@gmail.com>
fixes a regression introduced in #14269 that refactored
the notification registration logic, all the amqp targets
however online will not be available for use anymore.
fixes#14451
Up until now `InitializeProvider` method of `Config` struct was
implemented on a value receiver which is why changes on `provider`
field where never reflected to method callers. In order to fix this
issue, the method was implemented on a pointer receiver.
This is a regression from #14037, distributed setups
with MQTT was not working anymore. According to MQTT
spec it is expected this is unique per server.
We shall proceed to use unix nano timestamp hex
value instead here.
Currently, when applying any dynamic config, the system reloads and
re-applies the config of all the dynamic sub-systems.
This PR refactors the code in such a way that changing config of a given
dynamic sub-system will work on only that sub-system.
The `LookupConfig` code was not using `GetWithDefault`, because of which
some of the config values were being returned as empty string, and calls
like `strconv.Atoi` and `time.ParseDuration` on these were failing.
Enabled with `mc admin config set alias/ api gzip_objects=on`
Standard filtering applies (1K response minimum, not compressed content
type, not range request, gzip accepted by client).
When setting a config of a particular sub-system, validate the existing
config and notification targets of only that sub-system, so that
existing errors related to one sub-system (e.g. notification target
offline) do not result in errors for other sub-systems.
This change allows the MinIO server to lookup users in different directory
sub-trees by allowing specification of multiple search bases separated by
semicolons.
This speed-up is intended for faster startup times
for almost all MinIO operations. Changes here are
- Drives are not re-read for 'format.json' on a regular
basis once read during init is remembered and refreshed
at 5 second intervals.
- Do not do O_DIRECT tests on drives with existing 'format.json'
only fresh setups need this check.
- Parallelize initializing erasureSets for multiple sets.
- Avoid re-reading format.json when migrating 'format.json'
from really old V1->V2->V3
- Keep a copy of local drives for any given server in memory
for a quick lookup.
repeated reads on single large objects in HPC like
workloads, need the following option to disable
O_DIRECT for a more effective usage of the kernel
page-cache.
However this optional should be used in very specific
situations only, and shouldn't be enabled on all
servers.
NVMe servers benefit always from keeping O_DIRECT on.
The code was not properly deciding if a lock needs to be removed
when it doesn't have quorum anymore. After this commit, a lock will be
forcefully unlocked if nodes reporting they are not able to find a lock
internally breaks the quorum.
Simplify the code as well.
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```
```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```
```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```
```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```
```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```
> NOTE: Canceled decommission will not make the pool active again, since we might have
> Potentially partial duplicate content on the other pools, to avoid this scenario be
> very sure to start decommissioning as a planned activity.
```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID │ Pools │ Capacity │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```