kms: Expose API available when bucket federation is enabled
When bucket federation feature is enabled, KMS API will not work, such
as `mc admin kms key list`
The commit will fix the issue by disabling bucket forwarding when this
is a KMS request.
Tracing syscalls, opening and reading an `xl.meta` looks like this:
```
openat(AT_FDCWD, "/mnt/drive1/ss8-old/testbucket/ObjSize4MiBThreads72/(554O51H/peTb(0iztdbTKw59.csv/xl.meta", O_RDONLY|O_NOATIME|O_CLOEXEC) = 34 <0.000>
fcntl(34, F_GETFL) = 0x48000 (flags O_RDONLY|O_LARGEFILE|O_NOATIME) <0.000>
fcntl(34, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) = 0 <0.000>
epoll_ctl(4, EPOLL_CTL_ADD, 34, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3172471557, u64=8145488475984499461}}) = -1 EPERM (Operation not permitted) <0.000>
fcntl(34, F_GETFL) = 0x48800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) <0.000>
fcntl(34, F_SETFL, O_RDONLY|O_LARGEFILE|O_NOATIME) = 0 <0.000>
fstat(34, {st_mode=S_IFREG|0644, st_size=354, ...}) = 0 <0.000>
read(34, "XL2 \1\0\3\0\306\0\0\1P\2\2\1\304$\225\304\20\0\0\0\0\0\0\0\0\0\0\0"..., 354) = 354 <0.000>
close(34) = 0 <0.000>
```
Everything until `fstat` is the `os.Open` call.
Looking at the code: https://github.com/golang/go/blob/master/src/os/file_unix.go#L212-L243
It seems for every file it "tries" to see if it is pollable. This causes `syscall.SetNonblock(fd, true)` to be called. This is the first `F_SETFL`.
It then calls `f.pfd.Init("file", true)`. This will attempt to set it as pollable using `epoll_ctl`. This will always fail for files. It therefore calls `syscall.SetNonblock(fd, false)` resulting in the second `F_SETFL`.
If we set the `O_NONBLOCK` call on the initial open, we should avoid the 4 `fcntl` syscalls per file.
I don't see any way to avoid the `epoll_ctl` call, since kind is either `kindOpenFile` or `kindNonBlock`, so "pollable" will always be true. However avoiding 4 of 6 syscalls still seems worth it.
This should not have any effect, since files will end up with "nonblock" anyway.
allow non-inlined on disk to be inlined via
an unversioned ReadVersion() call, we only
need ReadXL() to resolve objects with multiple
versions only.
The choice of this block makes it to be dynamic
and chosen by the user via `mc admin config set`
Other bonus things
- Start measuring internode TTFB performance.
- Set TCP_NODELAY, TCP_CORK for low latency
removes contentious usage of mutexes in LRU, which
were never really reused in any manner; we do not
need it.
To trust hosts, the correct way is TLS certs; this PR completely
removes this dependency, which has never been useful.
```
0 0% 100% 25.83s 26.76% github.com/hashicorp/golang-lru/v2/expirable.(*LRU[...])
0 0% 100% 28.03s 29.04% github.com/hashicorp/golang-lru/v2/expirable.(*LRU[...])
```
Bonus: use `x-minio-time` as a nanosecond to avoid unnecessary
parsing logic of time strings instead of using a more
straightforward mechanism.
- Also, fix failure reporting at the end.
- Also, avoid parsing report objects when listing or resuming jobs, this
does not cause any bugs, it is only printing, not useful errors.
add unit test for v3 metrics for all its exposed endpoints
Bonus:
- support OpenMetrics encoding
- adds boot time for prometheus
- continueOnError is better to serve as
much metrics as possible.
```
curl http://localhost:9000/minio/metrics/v3/cluster/usage/buckets
```
Did not work as documented, due to the fact that there was a typo
in the bucket usage metrics registration group. This endpoint is
a cluster endpoint and does not require any `buckets` argument.
When creating the async listing, if the first request does not return within 3
minutes, it is stopped, since it isn't being kept alive.
Keep updating `lastHandout` while we are waiting for the initial request to be fulfilled.
When a fresh drive healing is finished, add more checks for the drive listing
errors. If any, re-list and heal again. Although this is an infrequent use
case to have listPathRaw() returning nil when minDisks is set to 1, we
still need to handle all possible use cases to avoid missing healing
any object.
Also, check for HealObject result to decide of an object is healed in the
fresh disk since HealObject returns nil if an object is healed in any
disk, and not in the new fresh drive.
In regular listing, this commit will avoid showing an object when its
latest version has a pending or failed deletion. In replicated setup.
It will also prevent showing older versions in the same case.
Also, sort the error map for multiple sites in ascending order
of deployment IDs, so that the error message generated is always
definitive order and same.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Add `ConnDialer` to abstract connection creation.
- `IncomingConn(ctx context.Context, conn net.Conn)` is provided as an entry point for
incoming custom connections.
- `ConnectWS` is provided to create web socket connections.
If `SkipReader` is called with a small initial buffer it may be doing a huge number if Reads to skip the requested number of bytes. If a small buffer is provided grab a 32K buffer and use that.
Fixes slow execution of `testAPIGetObjectWithMPHandler`.
Bonuses:
* Use `-short` with `-race` test.
* Do all suite test types with `-short`.
* Enable compressed+encrypted in `testAPIGetObjectWithMPHandler`.
* Disable big file tests in `testAPIGetObjectWithMPHandler` when using `-short`.
The code was advertenly passing max openfds to debug.SetMemoryLimit(),
fixing this accelerate go test in my machine.
This is only a testing bug, since the server context has always a valid
MaxMem, so the buggy code was never called in users environments.
Currently the status of a completed or failed batch is held in the
memory, a simple restart will lose the status and the user will not
have any visibility of the job that was long running.
In addition to the metrics, add a new API that reads the batch status
from the drives. A batch job will be cleaned up three days after
completion.
Also add the batch type in the batch id, the reason is that the batch
job request is removed immediately when the job is finished, then we
do not know the type of batch job anymore, hence a difficulty to locate
the job report
Hot load a policy document when during account authorization evaluation
to avoid returning 403 during server startup, when not all policies are
already loaded.
Add this support for group policies as well.
you can attempt a rebalance first i.e, start with 2 pools.
```
mc admin rebalance start alias/
```
and after that you can add a new pool, this would
potentially crash.
```
Jun 27 09:22:19 xxx minio[7828]: panic: runtime error: invalid memory address or nil pointer dereference
Jun 27 09:22:19 xxx minio[7828]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x22cc225]
Jun 27 09:22:19 xxx minio[7828]: goroutine 1 [running]:
Jun 27 09:22:19 xxx minio[7828]: github.com/minio/minio/cmd.(*erasureServerPools).findIndex(...)
```
specifying customer HTTP client makes the gcs SDK
ignore the passed credentials, instead let the GCS
SDK manage the transport.
this PR fixes#19922 a regression from #19565
Currently, bucket metadata is being loaded serially inside ListBuckets
Objet API. Fix that by loading the bucket metadata as the number of
erasure sets * 10, which is a good approximation.
since the connection is active, the
response recorder body can grow endlessly
causing leak, as this bytes buffer is
never given back to GC due to an goroutine.
skip healing properly in scanner when drive is hotplugged
due to how the state is passed around the SkipHealing
might not be the true state() of the system always, causing
a situation where we might healing from the scanner on the
same drive which is being. Due to this competing heals get
triggered that slow each other down.
due to a historic bug in CopyObject() where
an inlined object loses its metadata, the
check causes an incorrect fallback verifying
data-dir.
CopyObject() bug was fixed in ffa91f9794 however
the occurrence of this problem is historic, so
the aforementioned check is stretching too much.
Bonus: simplify fileInfoRaw() to read xl.json as well,
also recreate buckets properly.
* Multipart SSEC checksums were not transferred.
* Remove key mismatch logging. This key is user-controlled with SSEC.
* If the source is SSEC and the destination reports ErrSSEEncryptedObject,
assume replication is good.
avoid concurrent callers for LoadUser() to even initiate
object read() requests, if an on-going operation is in progress.
this avoids many callers hitting the drives causing I/O
spikes, also allows for loading credentials faster.
the reason for this is to avoid STS mappings to be
purged without a successful load of other policies,
and all the credentials only loaded successfully
are properly handled.
This also avoids unnecessary cache store which was
implemented earlier for optimization.
Directory objects are used by applications that simulate the folder
structure of an on-disk filesystem. These are zero-byte objects with names
ending with '/'. They are only used to check whether a 'folder' exists in
the namespace.
StartSize starts with the raw free space of all disks in the given pool,
however during the status, CurrentSize is not showing the current free
raw space, as expected at least by `mc admin decom status` since it was
written.
Go's net/http is notoriously difficult to have a streaming
deadlines per READ/WRITE on the net.Conn if we add them they
interfere with the Go's internal requirements for a HTTP
connection.
Remove this support for now
fixes#19853
In the very rare case when all drives in a erasure set need to be healed,
remove .healing.bin from all drives, otherwise it will be stuck in a
loop
Also, fix a unit test that fails sometimes due to wrong test.
since #19688 there was a regression introduced during drive
lookups for single node multi-drive setups, drive replacement
would not work correctly without this PR.
This does not fix any current issue, but merging https://github.com/minio/madmin-go/pull/282
can lose the validation of the service account expiration time.
Add more defensive code for now. In the future, we should avoid doing
validation in another library.
precondition check was being honored before, validating
if anonymous access is allowed on the metadata of an
object, leading to metadata disclosure of the following
headers.
```
Last-Modified
Etag
x-amz-version-id
Expires:
Cache-Control:
```
although the information presented is minimal in nature,
and of opaque nature. It still simply discloses that an
object by a specific name exists or not without even having
enough permissions.
fix: authenticate LDAP via actual DN instead of normalized DN
Normalized DN is only for internal representation, not for
external communication, any communication to LDAP must be
based on actual user DN. LDAP servers do not understand
normalized DN.
fixes#19757
This change uses the updated ldap library in minio/pkg (bumped
up to v3). A new config parameter is added for LDAP configuration to
specify extra user attributes to load from the LDAP server and to store
them as additional claims for the user.
A test is added in sts_handlers.go that shows how to access the LDAP
attributes as a claim.
This is in preparation for adding SSH pubkey authentication to MinIO's SFTP
integration.
This commit will fix one rare case of a multipart object that
can be read in theory but GetObject API returned an error.
It turned out that a six years old code was marking a drive offline
when the bitrot streaming fails to read a part in a disk with any error.
This can affect reading a subsequent part, though having enough shards,
but unable to construct because one drive was marked offline earlier.
This commit will remove the drive marking offline code. It will also
close the bitrotstreaming reader before marking it as nil.
Currently, on enabling callhome (or restarting the server), the callhome
job gets scheduled. This means that one has to wait for 24hrs (the
default frequency duration) to see it in action and to figure out if it
is working as expected.
It will be a better user experience to perform the first callhome
execution immediately after enabling it (or on server start if already
enabled).
Also, generate audit event on callhome execution, setting the error
field in case the execution has failed.
* Store ModTime in the upload ID; return it when listing instead of the current time.
* Use this ModTime to expire and skip reading the file info.
* Consistent upload sorting in listing (since it now has the ModTime).
* Exclude healing disks to avoid returning an empty list.
```
==================
WARNING: DATA RACE
Read at 0x0000082be990 by goroutine 205:
github.com/minio/minio/cmd.setCommonHeaders()
Previous write at 0x0000082be990 by main goroutine:
github.com/minio/minio/cmd.lookupConfigs()
```
Recent Veeam is very picky about storage class names. Add `_MINIO_VEEAM_FORCE_SC` env var.
It will override the storage class returned by the storage backend if it is non-standard
and we detect a Veeam client by checking the User Agent.
Applies to HeadObject/GetObject/ListObject*
add deadlines that can be dynamically changed via
the drive max timeout values.
Bonus: optimize "file not found" case and hung drives/network - circuit break the check and return right
away instead of waiting.