Services are unfrozen before `initBackgroundReplication` is finished. This means that
the globalReplicationStats write is racy. Switch to an atomic pointer.
Provide the `ReplicationPool` with the stats, so it doesn't have to be grabbed
from the atomic pointer on every use.
All other loads and checks are nil, and calls return empty values when stats
still haven't been initialized.
context deadline was introduced to avoid a slow transfer from blocking
replication queue(s) shared by other buckets that may not be under throttling.
This PR removes this context deadline for larger objects since they are
anyway restricted to a limited set of workers. Otherwise, objects would
get dequeued when the throttle limit is exceeded and cannot proceed
within the deadline.
When a drive is in a failed state when a single node multiple drives
deployment is started, a replacement of a fresh disk will not be
properly healed unless the user restarts the node.
Fix this by always adding the new fresh disk to globalLocalDrivesMap. Also
remove globalLocalDrives for simplification, a map to store local node
drives can still be used since the order of local drives of a node is
not defined.
Currently, bucket metadata is being loaded serially inside ListBuckets
Objet API. Fix that by loading the bucket metadata as the number of
erasure sets * 10, which is a good approximation.
* Multipart SSEC checksums were not transferred.
* Remove key mismatch logging. This key is user-controlled with SSEC.
* If the source is SSEC and the destination reports ErrSSEEncryptedObject,
assume replication is good.
If used, 'opts.Marker` will cause many missed entries since results are returned
unsorted, and pools are serialized.
Switch to fully concurrent listing and merging across pools to return sorted entries.
calling a remote target remove with a perfectly
well constructed ARN can lead to a crash for a bucket
with no replication configured.
This PR fixes, and adds a crash check for ImportMetadata
as well.
This PR makes a feasible approach to handle all the scenarios
that we must face to avoid returning "panic."
Instead, we must return "errServerNotInitialized" when a
bucketMetadataSys.Get() is called, allowing the caller to
retry their operation and wait.
Bonus fix the way data-usage-cache stores the object.
Instead of storing usage-cache.bin with the bucket as
`.minio.sys/buckets`, the `buckets` must be relative
to the bucket `.minio.sys` as part of the object name.
Otherwise, there is no way to decommission entries at
`.minio.sys/buckets` and their final erasure set positions.
A bucket must never have a `/` in it. Adds code to read()
from existing data-usage.bin upon upgrade.
This reverts commit 928c0181bf.
This change was not correct, reverting.
We track 3 states with the ProxyRequest header - if replication process wants
to know if object is already replicated with a HEAD, it shouldn't proxy back
- Poorna
Create new code paths for multiple subsystems in the code. This will
make maintaing this easier later.
Also introduce bugLogIf() for errors that should not happen in the first
place.
If site replication enabled across sites, replicate the SSE-C
objects as well. These objects could be read from target sites
using the same client encryption keys.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
Each Put, List, Multipart operations heavily rely on making
GetBucketInfo() call to verify if bucket exists or not on
a regular basis. This has a large performance cost when there
are tons of servers involved.
We did optimize this part by vectorizing the bucket calls,
however its not enough, beyond 100 nodes and this becomes
fairly visible in terms of performance.
support proxying of tagging requests in active-active replication
Note: even if proxying is successful, PutObjectTagging/DeleteObjectTagging
will continue to report a 404 since the object is not present locally.
historically, we have always kept storage-rest-server
and a local storage API separate without much trouble,
since they both can independently operate due to no
special state() between them.
however, over some time, we have added state()
such as
- drive monitoring threads now there will be "2" of
them per drive instead of just 1.
- concurrent tokens available per drive are now twice
instead of just single shared, allowing unexpectedly
high amount of I/O to go through.
- applying serialization by using walkMutexes can now
be adequately honored for both remote callers and local
callers.
Regression from #18285. CopyObject options were inheriting source MTime
for metadata timestamps if unspecified, removing this prevented metadata
updates from being applied on target.
This allows batch replication to basically do not
attempt to copy objects that do not have read quorum.
This PR also allows walk() to provide custom
values for quorum under batch replication, and
key rotation.
Bonus: allow replication to attempt Deletes/Puts when
the remote returns quorum errors of some kind, this is
to ensure that MinIO can rewrite the namespace with the
latest version that exists on the source.
There can be rare situations where errors seen in bucket metadata
load on startup or subsequent metadata updates can result in missing
replication remotes.
Attempt a refresh of remote targets backed by a good replication config
lazily in 5 minute intervals if there ever occurs a situation where
remote targets go AWOL.