Tracing syscalls, opening and reading an `xl.meta` looks like this:
```
openat(AT_FDCWD, "/mnt/drive1/ss8-old/testbucket/ObjSize4MiBThreads72/(554O51H/peTb(0iztdbTKw59.csv/xl.meta", O_RDONLY|O_NOATIME|O_CLOEXEC) = 34 <0.000>
fcntl(34, F_GETFL) = 0x48000 (flags O_RDONLY|O_LARGEFILE|O_NOATIME) <0.000>
fcntl(34, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) = 0 <0.000>
epoll_ctl(4, EPOLL_CTL_ADD, 34, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3172471557, u64=8145488475984499461}}) = -1 EPERM (Operation not permitted) <0.000>
fcntl(34, F_GETFL) = 0x48800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOATIME) <0.000>
fcntl(34, F_SETFL, O_RDONLY|O_LARGEFILE|O_NOATIME) = 0 <0.000>
fstat(34, {st_mode=S_IFREG|0644, st_size=354, ...}) = 0 <0.000>
read(34, "XL2 \1\0\3\0\306\0\0\1P\2\2\1\304$\225\304\20\0\0\0\0\0\0\0\0\0\0\0"..., 354) = 354 <0.000>
close(34) = 0 <0.000>
```
Everything until `fstat` is the `os.Open` call.
Looking at the code: https://github.com/golang/go/blob/master/src/os/file_unix.go#L212-L243
It seems for every file it "tries" to see if it is pollable. This causes `syscall.SetNonblock(fd, true)` to be called. This is the first `F_SETFL`.
It then calls `f.pfd.Init("file", true)`. This will attempt to set it as pollable using `epoll_ctl`. This will always fail for files. It therefore calls `syscall.SetNonblock(fd, false)` resulting in the second `F_SETFL`.
If we set the `O_NONBLOCK` call on the initial open, we should avoid the 4 `fcntl` syscalls per file.
I don't see any way to avoid the `epoll_ctl` call, since kind is either `kindOpenFile` or `kindNonBlock`, so "pollable" will always be true. However avoiding 4 of 6 syscalls still seems worth it.
This should not have any effect, since files will end up with "nonblock" anyway.
allow non-inlined on disk to be inlined via
an unversioned ReadVersion() call, we only
need ReadXL() to resolve objects with multiple
versions only.
The choice of this block makes it to be dynamic
and chosen by the user via `mc admin config set`
Other bonus things
- Start measuring internode TTFB performance.
- Set TCP_NODELAY, TCP_CORK for low latency
Use `runtime.Gosched()` if we have less than maxMergeMessages and the
queue is empty. Up maxMergeMessages to 50 to merge more messages into
a single write.
Add length check for an early bailout on readAllInto when we know packet length.
This commit enforces FIPS-compliant TLS ciphers in FIPS mode
by importing the `fipsonly` module.
Otherwise, MinIO still accepts non-FIPS compliant TLS connections.
removes contentious usage of mutexes in LRU, which
were never really reused in any manner; we do not
need it.
To trust hosts, the correct way is TLS certs; this PR completely
removes this dependency, which has never been useful.
```
0 0% 100% 25.83s 26.76% github.com/hashicorp/golang-lru/v2/expirable.(*LRU[...])
0 0% 100% 28.03s 29.04% github.com/hashicorp/golang-lru/v2/expirable.(*LRU[...])
```
Bonus: use `x-minio-time` as a nanosecond to avoid unnecessary
parsing logic of time strings instead of using a more
straightforward mechanism.
- Also, fix failure reporting at the end.
- Also, avoid parsing report objects when listing or resuming jobs, this
does not cause any bugs, it is only printing, not useful errors.
Split the read and write sides of handleMessages into two separate functions
Cosmetic. The only non-copy-and-paste change is that `cancel(ErrDisconnected)` is moved
into the defer on `readStream`.
add unit test for v3 metrics for all its exposed endpoints
Bonus:
- support OpenMetrics encoding
- adds boot time for prometheus
- continueOnError is better to serve as
much metrics as possible.
```
curl http://localhost:9000/minio/metrics/v3/cluster/usage/buckets
```
Did not work as documented, due to the fact that there was a typo
in the bucket usage metrics registration group. This endpoint is
a cluster endpoint and does not require any `buckets` argument.
When creating the async listing, if the first request does not return within 3
minutes, it is stopped, since it isn't being kept alive.
Keep updating `lastHandout` while we are waiting for the initial request to be fulfilled.
When a fresh drive healing is finished, add more checks for the drive listing
errors. If any, re-list and heal again. Although this is an infrequent use
case to have listPathRaw() returning nil when minDisks is set to 1, we
still need to handle all possible use cases to avoid missing healing
any object.
Also, check for HealObject result to decide of an object is healed in the
fresh disk since HealObject returns nil if an object is healed in any
disk, and not in the new fresh drive.
In regular listing, this commit will avoid showing an object when its
latest version has a pending or failed deletion. In replicated setup.
It will also prevent showing older versions in the same case.
Also, sort the error map for multiple sites in ascending order
of deployment IDs, so that the error message generated is always
definitive order and same.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Add `ConnDialer` to abstract connection creation.
- `IncomingConn(ctx context.Context, conn net.Conn)` is provided as an entry point for
incoming custom connections.
- `ConnectWS` is provided to create web socket connections.
If `SkipReader` is called with a small initial buffer it may be doing a huge number if Reads to skip the requested number of bytes. If a small buffer is provided grab a 32K buffer and use that.
Fixes slow execution of `testAPIGetObjectWithMPHandler`.
Bonuses:
* Use `-short` with `-race` test.
* Do all suite test types with `-short`.
* Enable compressed+encrypted in `testAPIGetObjectWithMPHandler`.
* Disable big file tests in `testAPIGetObjectWithMPHandler` when using `-short`.