The current implementation requires server pools to have
the same erasure stripe size, to provide the same SLA
and expectations.
This PR allows server pools to be variadic, i.e. they
do not have to have the same erasure stripe size; instead
they must meet the same SLA for parity ratio.
If the parity ratio cannot be guaranteed by the new
server pool, the deployment is rejected, i.e. server
pool expansion is not allowed.
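A minimal sketch of the expansion check described above. The `pool` type, its fields, and `canExpand` are hypothetical names for illustration, and the rule "the new pool must offer at least the parity ratio of every existing pool" is an assumption about what the SLA means, not the actual implementation.

```go
package main

import "fmt"

// pool is a hypothetical description of a server pool: the erasure stripe
// size (drives per set) and how many of those drives hold parity.
type pool struct {
	setDriveCount int
	parityDrives  int
}

// canExpand returns an error if candidate cannot guarantee the parity ratio
// of the existing pools. Stripe sizes are allowed to differ.
// Assumption: "guarantee" means parity/stripe is at least as high.
func canExpand(existing []pool, candidate pool) error {
	if candidate.setDriveCount <= 0 {
		return fmt.Errorf("invalid stripe size %d", candidate.setDriveCount)
	}
	for i, p := range existing {
		// Cross-multiply instead of dividing to avoid floating-point rounding:
		// candidate.parity/candidate.stripe >= p.parity/p.stripe
		if candidate.parityDrives*p.setDriveCount < p.parityDrives*candidate.setDriveCount {
			return fmt.Errorf("pool %d has parity %d/%d, new pool only offers %d/%d",
				i, p.parityDrives, p.setDriveCount,
				candidate.parityDrives, candidate.setDriveCount)
		}
	}
	return nil
}

func main() {
	existing := []pool{{setDriveCount: 16, parityDrives: 4}}
	// Different stripe size, same ratio: accepted under the assumed rule.
	fmt.Println(canExpand(existing, pool{setDriveCount: 8, parityDrives: 2}))
	// Lower ratio: rejected, so the expansion is not allowed.
	fmt.Println(canExpand(existing, pool{setDriveCount: 8, parityDrives: 1}))
}
```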
This ensures that Prometheus monitoring and usage
trackers keep behaving consistently and do not trip configured
alerts, although we cannot support v1 to v2 here; we can support v2 to v3.
A lot of memory is consumed when uploading small files in parallel. Use
the default upload parameters and add MINIO_AZURE_UPLOAD_CONCURRENCY for
users to tweak.
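A sketch of how such a tunable could be read, assuming an integer value with a fallback. Only the variable name MINIO_AZURE_UPLOAD_CONCURRENCY comes from this change; `uploadConcurrency` and the default of 4 are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultUploadConcurrency is a hypothetical fallback, not the real default.
const defaultUploadConcurrency = 4

// uploadConcurrency reads MINIO_AZURE_UPLOAD_CONCURRENCY through getenv and
// falls back to the default when the value is unset, malformed, or not positive.
// getenv is injected so the function can be exercised without touching the
// process environment.
func uploadConcurrency(getenv func(string) string) int {
	v := getenv("MINIO_AZURE_UPLOAD_CONCURRENCY")
	if v == "" {
		return defaultUploadConcurrency
	}
	n, err := strconv.Atoi(v)
	if err != nil || n <= 0 {
		return defaultUploadConcurrency
	}
	return n
}

func main() {
	fmt.Println("upload concurrency:", uploadConcurrency(os.Getenv))
}
```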
Synchronous replication can be enabled by setting the --sync
flag while adding a remote replication target.
This PR also adds proxying of GET/HEAD to another node in an
active-active replication setup in the event of a 404 on the current node.
This PR refactors the way we use buffers for O_DIRECT and
re-uses those buffers for the messagepack reader and writer.
After some extensive benchmarking we found that not all objects
see this benefit; only objects smaller than 64KiB benefit
overall.
Benefits are seen for almost all objects from
1KiB to 32KiB.
Beyond this, no objects benefit from the bulk call approach,
as the latency of sending bytes over the wire versus streaming
content directly from disk cancel each other out, with no
remarkable benefit.
Other optimizations include reuse of msgp.Reader and
msgp.Writer using sync.Pool for all internode calls.
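The reuse pattern can be sketched with the standard library. Here bufio.Reader and bufio.Writer stand in for msgp.Reader and msgp.Writer (an assumption made to keep the example self-contained); `relay` and `relayString` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// Pools of reusable buffered readers and writers. They are created detached
// and attached to a stream with Reset on each use.
var (
	readerPool = sync.Pool{New: func() interface{} { return bufio.NewReader(nil) }}
	writerPool = sync.Pool{New: func() interface{} { return bufio.NewWriter(nil) }}
)

// relay copies src to dst through pooled buffered wrappers instead of
// allocating a new reader and writer per call.
func relay(dst io.Writer, src io.Reader) (int64, error) {
	r := readerPool.Get().(*bufio.Reader)
	r.Reset(src)
	defer func() {
		r.Reset(nil) // drop the reference to src before pooling
		readerPool.Put(r)
	}()

	w := writerPool.Get().(*bufio.Writer)
	w.Reset(dst)
	defer func() {
		w.Reset(nil) // drop the reference to dst before pooling
		writerPool.Put(w)
	}()

	n, err := io.Copy(w, r)
	if err != nil {
		return n, err
	}
	return n, w.Flush()
}

// relayString is a small helper that runs relay over in-memory data.
func relayString(s string) (string, error) {
	var out bytes.Buffer
	_, err := relay(&out, strings.NewReader(s))
	return out.String(), err
}

func main() {
	out, err := relayString("internode payload")
	fmt.Println(out, err)
}
```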
30 seconds between white spaces is too long for some setups, which time out
when there is no read activity for a short time. Reduce the subnet health
white space ticker to 5 seconds, since it has no cost at all.
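A rough sketch of a white space keep-alive ticker, assuming the mechanism is "write a single space on every tick until the real response is ready". `keepAlive` and `countKeepAlives` are hypothetical names; the real handler is not shown in this message.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"time"
)

// keepAlive writes one white space to w on every tick so that intermediaries
// see read activity, and stops when done is closed or a write fails.
func keepAlive(w io.Writer, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if _, err := w.Write([]byte(" ")); err != nil {
				return
			}
		case <-done:
			return
		}
	}
}

// countKeepAlives runs keepAlive for totalMs milliseconds with the given
// interval and reports how many white spaces were written.
func countKeepAlives(intervalMs, totalMs int) int {
	var buf bytes.Buffer
	done := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		keepAlive(&buf, time.Duration(intervalMs)*time.Millisecond, done)
	}()
	time.Sleep(time.Duration(totalMs) * time.Millisecond)
	close(done)
	<-finished // buf is only read after the writer goroutine has exited
	return buf.Len()
}

func main() {
	fmt.Println("white spaces written:", countKeepAlives(5, 50))
}
```

In a real setup the interval would be the 5 seconds mentioned above; milliseconds are used here only so the example finishes quickly.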
Use separate sync.Pool for writes/reads
Avoid passing buffers to io.CopyBuffer()
if the reader implements io.WriterTo or the writer
implements io.ReaderFrom; it is then useless for sync.Pool
to allocate buffers on its own, since they will be completely
ignored by the io.CopyBuffer Go implementation.
Improve this wherever we see this to be optimal.
This allows us to be more efficient on memory usage.
```
// copyBuffer is the actual implementation of Copy and CopyBuffer.
// if buf is nil, one is allocated.
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	// ... (rest of the function is not part of this excerpt)
}
```
From readahead package
```
// WriteTo writes data to w until there's no more data to write or when an error occurs.
// The return value n is the number of bytes written.
// Any error encountered during the write is also returned.
func (a *reader) WriteTo(w io.Writer) (n int64, err error) {
	if a.err != nil {
		return 0, a.err
	}
	n = 0
	for {
		err = a.fill()
		if err != nil {
			return n, err
		}
		n2, err := w.Write(a.cur.buffer())
		a.cur.inc(n2)
		n += int64(n2)
		if err != nil {
			return n, err
		}
		// ... (rest of the loop is not part of this excerpt)
	}
}
```
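The two excerpts above explain why a pooled buffer is wasted when either fast path applies. A minimal sketch of the idea follows; `needsBuffer`, `copyPooled`, `copyString` and the 32KiB buffer size are hypothetical and only illustrate the check, not the actual code in this change.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"sync"
)

// bufPool hands out reusable copy buffers. A pointer to the slice is stored
// so that Put does not allocate.
var bufPool = sync.Pool{New: func() interface{} {
	b := make([]byte, 32*1024)
	return &b
}}

// needsBuffer reports whether io.CopyBuffer would actually use a
// caller-supplied buffer for this dst/src pair.
func needsBuffer(dst io.Writer, src io.Reader) bool {
	if _, ok := src.(io.WriterTo); ok {
		return false
	}
	if _, ok := dst.(io.ReaderFrom); ok {
		return false
	}
	return true
}

// copyPooled only takes a buffer from the pool when it will be used.
func copyPooled(dst io.Writer, src io.Reader) (int64, error) {
	if !needsBuffer(dst, src) {
		return io.Copy(dst, src)
	}
	bufp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bufp)
	return io.CopyBuffer(dst, src, *bufp)
}

// readerOnly and writerOnly hide WriteTo/ReadFrom so the buffered path runs.
type readerOnly struct{ io.Reader }
type writerOnly struct{ io.Writer }

// copyString copies s through copyPooled and reports whether a pooled
// buffer was needed.
func copyString(s string, hideFastPaths bool) (out string, usedPool bool, err error) {
	var buf bytes.Buffer
	var src io.Reader = strings.NewReader(s)
	var dst io.Writer = &buf
	if hideFastPaths {
		src, dst = readerOnly{src}, writerOnly{dst}
	}
	usedPool = needsBuffer(dst, src)
	_, err = copyPooled(dst, src)
	return buf.String(), usedPool, err
}

func main() {
	fmt.Println(copyString("payload", false))
	fmt.Println(copyString("payload", true))
}
```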
With changes present to automatically throttle the crawler
at runtime, there is no need to have an environment
value to disable crawling. Crawling is a fundamental
piece for healing, lifecycle and many other features;
there is no good reason anyone would need to disable
it on a production system.
* Apply suggestions from code review
globalSubscribers.NumSubscribers() is heavily used in S3 requests and it
takes a mutex. Use an atomic load instead, since it is faster.
Co-authored-by: Anis Elleuch <anis@min.io>
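A sketch of a counter read with an atomic load instead of a mutex. The `subscribers` type and its `add`/`remove` methods are hypothetical; only the NumSubscribers() name comes from the message above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// subscribers keeps a count that is read on every request, so the read path
// must be cheap.
type subscribers struct {
	n int32
}

// add and remove update the count atomically.
func (s *subscribers) add()    { atomic.AddInt32(&s.n, 1) }
func (s *subscribers) remove() { atomic.AddInt32(&s.n, -1) }

// NumSubscribers reads the count without taking a lock.
func (s *subscribers) NumSubscribers() int32 {
	return atomic.LoadInt32(&s.n)
}

func main() {
	var s subscribers
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.add()
		}()
	}
	wg.Wait()
	fmt.Println("subscribers:", s.NumSubscribers())
}
```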
Rewrite the parentIsObject() function. Currently, if a client uploads
a/b/c/d, we always check whether c, b, a are actual objects or not.
The new code checks in the reverse order and quits quickly if
a segment doesn't exist.
So if a, b, c in 'a/b/c' do not exist in the first place, it returns
false quickly.
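One way to read the reverse-order check, as a sketch only: walk the parents from the top-most segment down and stop at the first segment that does not exist. The `segmentKind` type and the `lookup` callback are hypothetical stand-ins for the real storage calls, and this is not the actual rewritten function.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentKind describes what, if anything, is stored at a path.
type segmentKind int

const (
	kindMissing segmentKind = iota // nothing exists at this path
	kindPrefix                     // a plain prefix (directory)
	kindObject                     // an actual object
)

// parentIsObject reports whether any parent of key is an object. It checks
// a, then a/b, then a/b/c for the key a/b/c/d and returns false as soon as
// a segment is missing, since nothing below a missing segment can exist.
func parentIsObject(key string, lookup func(string) segmentKind) bool {
	parts := strings.Split(strings.Trim(key, "/"), "/")
	// The last element is the object being uploaded, not a parent.
	for i := 1; i < len(parts); i++ {
		switch lookup(strings.Join(parts[:i], "/")) {
		case kindObject:
			return true
		case kindMissing:
			return false
		}
	}
	return false
}

func main() {
	store := map[string]segmentKind{"a": kindPrefix, "a/b": kindObject}
	lookup := func(p string) segmentKind { return store[p] }
	fmt.Println(parentIsObject("a/b/c/d", lookup))
	fmt.Println(parentIsObject("x/y/z/obj", lookup))
}
```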
The only purpose of the check-dir flag in
ReadVersion is to return 404 when
an object has xl.meta but no data.
This causes an extra call to the disk,
which can be penalizing on a busy system
where disks receive many concurrent accesses.
mc admin trace does not show the correct handler name in the output: it
prints `maxClients` for all handlers. The cause is the wrong
order of handler wrapping.
Fixes two problems
- Double healing when bitrot is enabled; instead attempt the heal
once in applyActions() before lifecycle is applied.
- If applyActions() is successful and getSize() returns a proper
value, then the object is accounted for and should be removed
from the oldCache namespace map to avoid double heal attempts.
The main reason is that HealObjects starts a recursive listing
for each object, which can take a really long time on
large namespaces. Instead, avoid the recursive listing and just
perform HealObject() at the prefix.
Deletes already handle purging dangling content; we
don't need to achieve this by doing a recursive listing,
which in turn can delay crawling significantly.