Commit Graph

260 Commits

Author SHA1 Message Date
Harshavardhana 8c9ab85cfa
Add multipart uploads cache for ListMultipartUploads() (#20407)
this cache will be honored only when `prefix=""` while
performing ListMultipartUploads() operation.

This is mainly to satisfy applications like alluxio
for their underfs implementation and tests.

replaces https://github.com/minio/minio/pull/20181
2024-09-09 09:58:30 -07:00
Klaus Post f1302c40fe
Fix uninitialized replication stats (#20260)
Services are unfrozen before `initBackgroundReplication` is finished. This means that 
the globalReplicationStats write is racy. Switch to an atomic pointer.

Provide the `ReplicationPool` with the stats, so it doesn't have to be grabbed 
from the atomic pointer on every use.

All other loads and checks are nil, and calls return empty values when stats 
still haven't been initialized.
2024-08-15 05:04:40 -07:00
Klaus Post d96798ae7b
Add support profile deadlines and concurrent operations (#20244)
* Allow a maximum of 10 seconds to start profiling operations.
* Download up to 16 profiles concurrently, but only allow 10 seconds for
  each (does not include write time).
* Add cluster info as the first operation.
* Ignore remote download errors.
* Stop remote profiles if the request is terminated.
2024-08-15 03:36:00 -07:00
Klaus Post 53eb7656de
Add admin info timeouts (#20249)
Since a lot of operations load from storage, do remote calls, add a 10 second timeout to each operation.

This should make `mc admin info` return values even under extreme conditions.
2024-08-12 10:24:29 -07:00
Klaus Post fae563b85d
Add fixed timed restarts to updates (#19960) 2024-06-20 07:49:22 -07:00
Aditya Manthramurthy 5f78691fcf
ldap: Add user DN attributes list config param (#19758)
This change uses the updated ldap library in minio/pkg (bumped
up to v3). A new config parameter is added for LDAP configuration to
specify extra user attributes to load from the LDAP server and to store
them as additional claims for the user.

A test is added in sts_handlers.go that shows how to access the LDAP
attributes as a claim.

This is in preparation for adding SSH pubkey authentication to MinIO's SFTP
integration.
2024-05-24 16:05:23 -07:00
Harshavardhana 0b3eb7f218
add more deadlines and pass around context under most situations (#19752) 2024-05-15 15:19:00 -07:00
Klaus Post e7baf78ee8
fix: list operations resuming when hitting different node (#19494)
The rest of the peer clients were not consistent across nodes. So, meta cache requests 
would not go to the same server if a continuation happens on a different node.
2024-04-12 11:13:36 -07:00
Anis Eleuch 95bf4a57b6
logging: Add subsystem to log API (#19002)
Create new code paths for multiple subsystems in the code. This will
make maintaing this easier later.

Also introduce bugLogIf() for errors that should not happen in the first
place.
2024-04-04 05:04:40 -07:00
Aditya Manthramurthy b2c5b75efa
feat: Add Metrics V3 API (#19068)
Metrics v3 is mainly a reorganization of metrics into smaller groups of
metrics and the removal of internal aggregation of metrics received from
peer nodes in a MinIO cluster.

This change adds the endpoint `/minio/metrics/v3` as the top-level metrics
endpoint and under this, various sub-endpoints are implemented. These
are currently documented in `docs/metrics/v3.md`

The handler will serve metrics at any path
`/minio/metrics/v3/PATH`, as follows:

when PATH is a sub-endpoint listed above => serves the group of
metrics under that path; or when PATH is a (non-empty) parent 
directory of the sub-endpoints listed above => serves metrics
from each child sub-endpoint of PATH. otherwise, returns a no 
resource found error

All available metrics are listed in the `docs/metrics/v3.md`. More will
be added subsequently.
2024-03-10 01:15:15 -08:00
Harshavardhana 607cafadbc
converge clusterRead health into cluster health (#19063) 2024-02-15 16:48:36 -08:00
Harshavardhana 62761a23e6
remove unnecessary metrics in 'mc admin info' output (#19020)
Reduce the amount of data transfer on large deployments
2024-02-08 19:28:46 -08:00
Klaus Post 6005ad3d48
Fix shared top locks client (#19018)
`client` is shared across goroutines.

Seen with `mc support top locks` on minio built with `-race`.
2024-02-08 12:28:05 -08:00
Poorna 27d02ea6f7
metrics: add replication metrics on proxied requests (#18957) 2024-02-05 22:00:45 -08:00
Harshavardhana 100c35c281
avoid excessive logs when peer is down (#18969) 2024-02-04 23:25:42 -08:00
Harshavardhana 1d3bd02089
avoid close 'nil' panics if any (#18890)
brings a generic implementation that
prints a stack trace for 'nil' channel
closes(), if not safely closes it.
2024-01-28 10:04:17 -08:00
Harshavardhana 88837fb753
add new update v2 that updates per node, allows idempotent behavior (#18859)
add new update v2 that updates per node, allows idempotent behavior

new API ensures that

- binary is correct and can be downloaded checksummed verified
- committed to actual path
- restart returns back the relevant waiting drives
2024-01-26 08:40:13 -08:00
Harshavardhana 961f7dea82
compress binary while sending it to all the nodes (#18837)
Also limit the amount of concurrency when sending
binary updates to peers, avoid high network over
TX that can cause disconnection events for the
node sending updates.
2024-01-22 12:16:36 -08:00
Harshavardhana f9b4a8d6e8
improve server update behavior by re-using memory properly (#18831) 2024-01-19 18:27:58 -08:00
Harshavardhana ac81f0248c
introduce new ServiceV2 API to handle guided restarts (#18826)
New API now verifies any hung disks before restart/stop,
provides a 'per node' break down of the restart/stop results.

Provides also how many blocked syscalls are present on the
drives and what users must do about them.

Adds options to do pre-flight checks to provide information
to the user regarding any hung disks. Provides 'force' option
to forcibly attempt a restart() even with waiting syscalls
on the drives.
2024-01-19 14:22:36 -08:00
Anis Eleuch 8432fd5ac2
prom: Add online and healing drives metrics per erasure set (#18700) 2023-12-21 16:56:43 -08:00
Anis Eleuch aed7a1818a
info: Populate pool/set/disk indexes for offline disks (#18613)
This can be calculated from the disk layout and some external
applications would like to know the location of the offline
disks.
2023-12-08 08:13:04 -08:00
bestgopher 95d6f43cc8
fix(cmd/notification.go): no error when retry successful (#18530) 2023-11-27 22:41:03 -08:00
Shireesh Anjal f6e581ce54
Capture network device info in health report (#18381) 2023-11-02 09:49:49 -07:00
Shireesh Anjal bf1c6edb76
Revert "Capture network device info in health report" (#18241)
Introducing a new version of healthinfo struct for adding this info is
not correct. It needs to be implemented differently without adding a new
version.

This reverts commit 8737025d940f80360ed4b3686b332db5156f6659.
2023-10-13 07:46:36 -07:00
Shireesh Anjal a66a7f3e97
Capture network device info in health report (#18213) 2023-10-12 15:33:31 -07:00
Klaus Post 7cd08594f6
Use better host names for metric errors (#18188)
Typically hosts would end up like this:

```
   "hosts": [
        ":9000",
        ":9000",
        ":9000",
...
```

Also add host name to errors.
2023-10-09 17:27:11 -07:00
Shireesh Anjal 6d20ec3bea
Add support for resource metrics (#18057)
Add a new endpoint for "resource" metrics `/v2/metrics/resource`

This should return system metrics related to drives, network, CPU and
memory. Except for drives, other metrics should have corresponding "avg"
and "max" values also.

Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.

Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
2023-09-30 13:40:20 -07:00
Shubhendu bfddbb8b40
Embed file in ZIP with custom permissions (#17954)
This change enables embedding files in ZIP with custom permissions.
Also uses default creds for starting MinIO based on inspect data.

Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>
2023-09-06 09:24:01 -07:00
Harshavardhana 5b114b43f7
refactor bandwidth throttling for replication target (#17980)
This refactor is to allow using the bandwidth throttling
for other purposes.
2023-09-05 20:21:59 -07:00
Aditya Manthramurthy 1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Poorna b48bbe08b2
Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes,
number of active workers in use and in-queue stats to get
an idea of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
Harshavardhana 1c5af7c31a
serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886)
we expect a certain level of IOPs and latency so this is okay.

fixes other miscellaneous bugs

- such as hanging on mrfCh <- when the context is canceled
- queuing MRF heal when the context is canceled
- remove unused saveStateCh channel
2023-08-21 16:44:50 -07:00
Harshavardhana 6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
users/customers do not have a reasonable number of buckets anymore,
this is why we must avoid overpopulating cluster endpoints, instead
move the bucket monitoring to a separate endpoint.

some of it's a breaking change here for a couple of metrics, but
it is imperative that we do it to improve the responsiveness of
our Prometheus cluster endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
Poorna 5e2f8d7a42
replication: Simplify mrf requeueing and add backlog handler (#17171)
Simplify MRF queueing and add backlog handler

- Limit re-tries to 3 to avoid repeated re-queueing. Fall offs
to be re-tried when the scanner revisits this object or upon access.

- Change MRF to have each node process only its MRF entries.

- Collect MRF backlog by the node to allow for current backlog visibility
2023-07-12 23:51:33 -07:00
Kaan Kabalak f64d62b01d
Fix style of logOnceIf calls w/unique identifiers (#17631) 2023-07-11 13:17:45 -07:00
Harshavardhana f41edb23e2
add variadic delays in peer notification retries (#17592)
just adds more `jitter` in our retries to avoid
burst flooding for peer calls.
2023-07-07 07:47:38 -07:00
Kaan Kabalak 21fbe88e1f
Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Harshavardhana 7605d07bb2
add support for bucket level request count per API (#17468)
New metrics added to calculate API request count
per bucket, per API.  Captures errors, including
4xx, 5xx HTTP status codes separately.
2023-06-21 09:41:59 -07:00
Praveen raj Mani 7c72b25ef0
Add an option to make bucket notifications synchronous (#17406)
With the current asynchronous behaviour in sending notification events
to the targets, we can't provide guaranteed delivery as the systems
might go for restarts.

For such event-driven use-cases, we can provide an option to enable
synchronous events where the APIs wait until the event is successfully
sent or persisted.

This commit adds 'MINIO_API_SYNC_EVENTS' env which when set to 'on'
will enable sending/persisting events to targets synchronously.
2023-06-20 17:38:59 -07:00
Aditya Manthramurthy 5a1612fe32
Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Praveen raj Mani 72802a5972
Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069) 2023-04-25 22:57:40 -07:00
Anis Eleuch 6addc7a35d
server-info: Return initializing state properly (#17070) 2023-04-24 09:10:02 -07:00
Krishnan Parthasarathi 56c57e2c53
tier-stats: Avoid repeated logs (#16774) 2023-03-07 16:43:05 -08:00
Poorna a6057c35cc
Avoid peer notification when peer is offline, tune retries (#16737) 2023-03-07 08:13:28 -08:00
Klaus Post 9acf1024e4
Remove bloom filter (#16682)
Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner.

Also removes a tiny bit of CPU by each write operation.
2023-02-24 09:03:31 +05:30
Poorna 1b02e046c2
Fix bandwidth monitoring to be per remote target (#16360) 2023-01-19 18:52:16 +05:30
Krishnan Parthasarathi 71c95ad0d0
Signal stop-rebalance to all rebalancing pools (#16438) 2023-01-19 06:54:23 +05:30
Aditya Manthramurthy a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Harshavardhana 5a8df7efb3
re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00