The `clusterInfo` struct in admin-handlers is same as
madmin.ClusterRegistrationInfo, except for small differences in field
names.
Removing this and using madmin.ClusterRegistrationInfo in its place will
help in following ways:
- The JSON payload generated by mc in case of cluster registration will
be consistent (same keys) with cluster.info generated by minio as part
of the profile and inspect zip
- health-analyzer can parse the cluster.info using the same struct and
won't have to define it's own
when object speedtest is running keep writing
previous speedtest result back to client until
we have a new result - this avoids sending back
blank entries in between the speedtest when it
is running in 'autotune' mode.
smaller setups may have less drives per server choosing
the concurrency based on number of local drives, and let
the MinIO server change the overall concurrency as
necessary.
this has been observed in multiple environments
where the setups are small `speedtest` naturally
fails with default '10s' and the concurrency
of '32' is big for such clusters.
choose a smaller value i.e equal to number of
drives in such clusters and let 'autotune'
increase the concurrency instead.
This PR changes the handling of bucket deletes for site
replicated setups to hold on to deleted bucket state until
it syncs to all the clusters participating in site replication.
Currently, if one server in a distributed setup fails to upgrade
due to any reasons, it is not possible to upgrade again unless
nodes are restarted.
To fix this, split the upgrade process into two steps :
- download the new binary on all servers
- If successful, overwrite the old binary with the new one
Add cluster info to inspect and profiling archive.
In addition to the existing data generation for both inspect and profiling,
cluster.info file is added. This latter contains some info of the cluster.
The generation of cluster.info is is done as the last step and it can fail
if it exceed 10 seconds.
This commit adds a `context.Context` to the
the KMS `{Stat, CreateKey, GenerateKey}` API
calls.
The context will be used to terminate external calls
as soon as the client requests gets canceled.
A follow-up PR will add a `context.Context` to
the remaining `DecryptKey` API call.
Signed-off-by: Andreas Auernhammer <hi@aead.dev>
The current code uses approximation using a ratio. The approximation
can skew if we have multiple pools with different disk capacities.
Replace the algorithm with a simpler one which counts data
disks and ignore parity disks.
- currently subnet health check was freezing and calling
locks at multiple locations, avoid them.
- throw errors if first attempt itself fails with no results
Currently minio_s3_requests_errors_total covers 4xx and
5xx S3 responses which can be confusing when s3 applications
sent a lot of HEAD requests with obvious 404 responses or
when the replication is enabled.
Add
- minio_s3_requests_4xx_errors_total
- minio_s3_requests_5xx_errors_total
to help users monitor 4xx and 5xx HTTP status codes separately.
The S3 service can be frozen indefinitely if a client or mc asks for object
perf API but quits early or has some networking issues. The reason is
that partialWrite() can block indefinitely.
This commit makes partialWrite() listens to context cancellation as well. It
also renames deadlinedCtx to healthCtx since it covers handler context
cancellation and not only not only the speedtest deadline.
Main motivation is move towards a common backend format
for all different types of modes in MinIO, allowing for
a simpler code and predictable behavior across all features.
This PR also brings features such as versioning, replication,
transitioning to single drive setups.
it seems in some places we have been wrongly using the
timer.Reset() function, nicely exposed by an example
shared by @donatello https://go.dev/play/p/qoF71_D1oXD
this PR fixes all the usage comprehensively
Execute the object, drive and net speedtests as part of the healthinfo
(if requested by the client), and include their result in the response.
The options for the speedtests have been picked from the default values
used by `mc support perf` command.
The deployment id was being written to the health report towards the end
of the handler. Because of this, if there was a timeout in any of the
data fetching, the deployment id was not getting written at all. Upload
of such reports fails on SUBNET as deployment id is the unique
identifier for a cluster in subnet.
Fixed by writing the deployment id at the beginning of the processing.