Calling unfreezeServices twice results in panic:
```
panic: "POST /minio/peer/v32/signalservice?signal=4&sub-sys=": close of nil channel
goroutine 14703 [running]:
runtime/debug.Stack()
runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
github.com/minio/minio/cmd/generic-handlers.go:549 +0x8e
panic({0x27c3020, 0x4c9b370})
runtime/panic.go:884 +0x212
github.com/minio/minio/cmd.unfreezeServices()
github.com/minio/minio/cmd/service.go:112 +0xc7
github.com/minio/minio/cmd.(*peerRESTServer).SignalServiceHandler(0x0?, {0x4cb6af0, 0xc010b96420}, 0xc01affab00)
github.com/minio/minio/cmd/peer-rest-server.go:837 +0x13a
net/http.HandlerFunc.ServeHTTP(...)
```
If the function was called a second time `val` would not be nil, but the returned channel `ch` would be, causing the panic.
Check the channel isn't nil and also use Swap for an atomic swap instead of 2 separate operations (though we are in a mutex).
Disk level O_DIRECT support checking at xl storage initialization was
conditional on a config setting being enabled. (This never took effect
because config initialization happens after ObjectLayer is ready.) This
is not necessary as the config setting is dynamic - O_DIRECT should be
enabled via runtime config. So we need to do the disk level support
check regardless of the config setting.
- Trace needs higher buffered channels than 4000 to ensure
when we run `mc admin trace -a` it captures all information
sufficiently.
- Listen event notification needs the event channel to be
`apiRequestsMaxPerNode` * number of nodes
Currently, the retry is not fully used when there is no backup copy of
the data usage; use 5 retry attempts when we don't have any valid data,
new or backup, unless we have seen an un-recognized error.
comment in the code provides more detailed explanation
on what this PR entails and its assumptions.
this PR reduces the amount of listing() by an order
of magnitude, however there are other such calls that
still needs further optimization that shall be done
in subsequent PRs.
Add a new endpoint for "resource" metrics `/v2/metrics/resource`
This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.
Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.
Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
without this the rename2() can rename the previous dataDir
causing issues for different versions of the object, only
latest version is preserved due to this bug.
Added healing code to ensure recovery of such content.
not checking w.Close() can prematurely make us
think that the w.Write() actually succeeded, apparently
Write() may or may not return an error but sometimes
only during a Close() call to the fd we may see the
error from Write() propagate.
Fdatasync(w) on the FD would return an error requiring
Close() error handling is less of a concern, however it may
happen such that fdatasync() did not return an error, where
as Close() would.
Currently, setting a new tiering target returns an error when a bucket
is versioned and the tiering credentials does not have authorization to
specify a version-id when reading or removing a specific version;
Since tiering does not require versioning anymore; avoid doing versioned
operations when performing checklist ops while adding a new tiering
configuration.
Do not error out when a provided marker is before or after the prefix, but instead just ignore it if before and return an empty list when after.
Fixes#18093
Include object and versions heal scan times when checking non-empty abandoned folders.
Furthermore don't add delay between healing versions, instead do one per object wait.
This PR changes the StatObject() to be must have for non-minio source
to being a conditional API call.
- Calls StatObject() when needed
- Calls GetObjectTagging() when needed
These calls if we do without these conditionals can cause a lot of
delays, so we avoid them if not needed in more common scenario.
all retries must not be counted as failed messages,
a failed message is a single counter not for all
retries, this PR fixes this.
Also we do not need to retry 10-times, instead we should
retry at max 3 times with some jitter to deliver the
messages.
If MinIO started with KMS enabled, MINIO_KMS_KES_KEY_NAME should
be set for server to start.
Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
In a perf test, one node will run speed test with all nodes. If there is
an error with a peer node, the peer node name is not included in the
error hence confusing the user.
This commit will add the peer endpoint string to the netperf error.
To ensure that policy mappings are current for service accounts
belonging to (non-derived) STS accounts (like an LDAP user's service
account) we periodically reload such mappings.
This is primarily to handle a case where a policy mapping update
notification is missed by a minio node. Such a node would continue to
have the stale mapping in memory because STS creds/mappings were never
periodically scanned from storage.