sendfile implementation to perform DMA on all platforms
Go stdlib already supports sendfile/splice implementations
for
- Linux
- Windows
- *BSD
- Solaris
Along with this change however O_DIRECT for reads() must be
removed as well since we need to use sendfile() implementation
The main reason to add O_DIRECT for reads was to reduce the
chances of page-cache causing OOMs for MinIO, however it would
seem that avoiding buffer copies from user-space to kernel space
this issue is not a problem anymore.
There is no Go based memory allocation required, and neither
the page-cache is referenced back to MinIO. This page-
cache reference is fully owned by kernel at this point, this
essentially should solve the problem of page-cache build up.
With this now we also support SG - when NIC supports Scatter/Gather
https://en.wikipedia.org/wiki/Gather/scatter_(vector_addressing)
replace io.Discard usage to fix NUMA copy() latencies
On NUMA systems copying from 8K buffer allocated via
io.Discard leads to large latency build-up for every
```
copy(new8kbuf, largebuf)
```
can in-cur upto 1ms worth of latencies on NUMA systems
due to memory sharding across NUMA nodes.
Also shutdown poll add jitter, to verify if the shutdown
sequence can finish before 500ms, this reduces the overall
time taken during "restart" of the service.
Provides speedup for `mc admin service restart` during
active I/O, also ensures that systemd doesn't treat the
returned 'error' as a failure, certain configurations in
systemd can cause it to 'auto-restart' the process by-itself
which can interfere with `mc admin service restart`.
It can be observed how now restarting the service is
much snappier.
DNS refresh() in-case of MinIO can safely re-use
the previous values on bare-metal setups, since
bare-metal arrangements do not change DNS in any
manner commonly.
This PR simplifies that, we only ever need DNS caching
on bare-metal setups.
- On containerized setups do not enable DNS
caching at all, as it may have adverse effects on
the overall effectiveness of k8s DNS systems.
k8s DNS systems are dynamic and expect applications
to avoid managing DNS caching themselves, instead
provide a cleaner container native caching
implementations that must be used.
- update IsDocker() detection, including podman runtime
- move to minio/dnscache fork for a simpler package
Ensure delete marker replication success, especially since the
recent optimizations to heal on HEAD, LIST and GET can force
replication attempts on delete marker before underlying object
version could have synced.
Following code can reproduce an unending go-routine buildup,
while keeping connections established due to lack of client
not closing the connections.
https://gist.github.com/harshavardhana/2d00e6f909054d2d2524c71485ad02e1
Without this PR all MinIO deployments can be put into
denial of service attacks, causing entire service to be
unavailable.
We bring in two timeouts at this stage to control such
go-routine build ups, new change
- IdleTimeout (to kill off idle connections)
- ReadHeaderTimeout (to kill off connections that are too slow)
This new change also brings two hidden options to make any
additional relevant changes if desired in some setups.
introduce x-minio-force-create environment variable
to force create a bucket and its metadata as required,
it is useful in some situations when bucket metadata
needs recovery.
- Go might reset the internal http.ResponseWriter() to `nil`
after Write() failure if the go-routine has returned, do not
flush() such scenarios and avoid spurious flushes() as
returning handlers always flush.
- fix some racy tests with the console
- avoid ticker leaks in certain situations
read/writers are not concurrent in handlers
and self contained - no need to use atomics on
them.
avoids unnecessary contentions where it's not
required.
various situations where the client is retrying the request
server going through shutdown might incorrectly send 403
which is a non-retriable error, this PR allows for clients
when they retry an attempt to go to another healthy pod
or server in a distributed cluster - assuming it is a properly
load-balanced setup.
3DES is enabled by default in Golang, this commit will use
tls.CipherSuites() which returns all ciphers excluding those with
security issues, such as 3DES.
additionally optimize for IP only setups, avoid doing
unnecessary lookups if the Dial addr is an IP.
allow support for multiple listeners on same socket,
this is mainly meant for future purposes.