Calling unfreezeServices twice results in panic:
```
panic: "POST /minio/peer/v32/signalservice?signal=4&sub-sys=": close of nil channel
goroutine 14703 [running]:
runtime/debug.Stack()
runtime/debug/stack.go:24 +0x65
github.com/minio/minio/cmd.setCriticalErrorHandler.func1.1()
github.com/minio/minio/cmd/generic-handlers.go:549 +0x8e
panic({0x27c3020, 0x4c9b370})
runtime/panic.go:884 +0x212
github.com/minio/minio/cmd.unfreezeServices()
github.com/minio/minio/cmd/service.go:112 +0xc7
github.com/minio/minio/cmd.(*peerRESTServer).SignalServiceHandler(0x0?, {0x4cb6af0, 0xc010b96420}, 0xc01affab00)
github.com/minio/minio/cmd/peer-rest-server.go:837 +0x13a
net/http.HandlerFunc.ServeHTTP(...)
```
If the function was called a second time `val` would not be nil, but the returned channel `ch` would be, causing the panic.
Check the channel isn't nil and also use Swap for an atomic swap instead of 2 separate operations (though we are in a mutex).
Disk level O_DIRECT support checking at xl storage initialization was
conditional on a config setting being enabled. (This never took effect
because config initialization happens after ObjectLayer is ready.) This
is not necessary as the config setting is dynamic - O_DIRECT should be
enabled via runtime config. So we need to do the disk level support
check regardless of the config setting.
- Trace needs higher buffered channels than 4000 to ensure
when we run `mc admin trace -a` it captures all information
sufficiently.
- Listen event notification needs the event channel to be
`apiRequestsMaxPerNode` * number of nodes
Currently, the retry is not fully used when there is no backup copy of
the data usage; use 5 retry attempts when we don't have any valid data,
new or backup, unless we have seen an un-recognized error.
comment in the code provides more detailed explanation
on what this PR entails and its assumptions.
this PR reduces the amount of listing() by an order
of magnitude, however there are other such calls that
still needs further optimization that shall be done
in subsequent PRs.
Add a new endpoint for "resource" metrics `/v2/metrics/resource`
This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.
Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.
Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
without this the rename2() can rename the previous dataDir
causing issues for different versions of the object, only
latest version is preserved due to this bug.
Added healing code to ensure recovery of such content.
not checking w.Close() can prematurely make us
think that the w.Write() actually succeeded, apparently
Write() may or may not return an error but sometimes
only during a Close() call to the fd we may see the
error from Write() propagate.
Fdatasync(w) on the FD would return an error requiring
Close() error handling is less of a concern, however it may
happen such that fdatasync() did not return an error, where
as Close() would.
Currently, setting a new tiering target returns an error when a bucket
is versioned and the tiering credentials does not have authorization to
specify a version-id when reading or removing a specific version;
Since tiering does not require versioning anymore; avoid doing versioned
operations when performing checklist ops while adding a new tiering
configuration.
Do not error out when a provided marker is before or after the prefix, but instead just ignore it if before and return an empty list when after.
Fixes#18093
Include object and versions heal scan times when checking non-empty abandoned folders.
Furthermore don't add delay between healing versions, instead do one per object wait.