minio

Commit Graph

Author	SHA1	Message	Date
Shireesh Anjal	bf1c6edb76	Revert "Capture network device info in health report" (#18241 ) Introducing a new version of healthinfo struct for adding this info is not correct. It needs to be implemented differently without adding a new version. This reverts commit 8737025d940f80360ed4b3686b332db5156f6659.	2023-10-13 07:46:36 -07:00
Shireesh Anjal	a66a7f3e97	Capture network device info in health report (#18213 )	2023-10-12 15:33:31 -07:00
Klaus Post	7cd08594f6	Use better host names for metric errors (#18188 ) Typically hosts would end up like this: ``` "hosts": [ ":9000", ":9000", ":9000", ... ``` Also add host name to errors.	2023-10-09 17:27:11 -07:00
Shireesh Anjal	6d20ec3bea	Add support for resource metrics (#18057 ) Add a new endpoint for "resource" metrics `/v2/metrics/resource` This should return system metrics related to drives, network, CPU and memory. Except for drives, other metrics should have corresponding "avg" and "max" values also. Reuse the real-time feature to capture the required data, introducing CPU and memory metrics in it. Collect the data every minute and keep updating the average and max values accordingly, returning the latest values when the API is called.	2023-09-30 13:40:20 -07:00
Shubhendu	bfddbb8b40	Embed file in ZIP with custom permissions (#17954 ) This change enables embedding files in ZIP with custom permissions. Also uses default creds for starting MinIO based on inspect data. Signed-off-by: Shubhendu Ram Tripathi <shubhendu@minio.io>	2023-09-06 09:24:01 -07:00
Harshavardhana	5b114b43f7	refactor bandwidth throttling for replication target (#17980 ) This refactor is to allow using the bandwidth throttling for other purposes.	2023-09-05 20:21:59 -07:00
Aditya Manthramurthy	1c99fb106c	Update to minio/pkg/v2 (#17967 )	2023-09-04 12:57:37 -07:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293 ) to track the replication transfer rate across different nodes, number of active workers in use and in-queue stats to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, prometheus metrics are no longer at the bucket level - but at the cluster level. Add prometheus metric to track credential errors since uptime	2023-08-30 01:00:59 -07:00
Harshavardhana	1c5af7c31a	serialize queueMRFHeal(), add timeouts and avoid normal build-ups (#17886 ) we expect a certain level of IOPs and latency so this is okay. fixes other miscellaneous bugs - such as hanging on mrfCh <- when the context is canceled - queuing MRF heal when the context is canceled - remove unused saveStateCh channel	2023-08-21 16:44:50 -07:00
Harshavardhana	6426b74770	move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663 ) users/customers do not have a reasonable number of buckets anymore, this is why we must avoid overpopulating cluster endpoints, instead move the bucket monitoring to a separate endpoint. some of it's a breaking change here for a couple of metrics, but it is imperative that we do it to improve the responsiveness of our Prometheus cluster endpoint. Bonus: Added new cluster metrics for usage, objects and histograms	2023-07-18 22:25:12 -07:00
Poorna	5e2f8d7a42	replication: Simplify mrf requeueing and add backlog handler (#17171 ) Simplify MRF queueing and add backlog handler - Limit re-tries to 3 to avoid repeated re-queueing. Fall offs to be re-tried when the scanner revisits this object or upon access. - Change MRF to have each node process only its MRF entries. - Collect MRF backlog by the node to allow for current backlog visibility	2023-07-12 23:51:33 -07:00
Kaan Kabalak	f64d62b01d	Fix style of logOnceIf calls w/unique identifiers (#17631 )	2023-07-11 13:17:45 -07:00
Harshavardhana	f41edb23e2	add variadic delays in peer notification retries (#17592 ) just adds more `jitter` in our retries to avoid burst flooding for peer calls.	2023-07-07 07:47:38 -07:00
Kaan Kabalak	21fbe88e1f	Print certain log messages once per error (#17484 )	2023-06-24 20:29:13 -07:00
Harshavardhana	7605d07bb2	add support for bucket level request count per API (#17468 ) New metrics added to calculate API request count per bucket, per API. Captures errors, including 4xx, 5xx HTTP status codes separately.	2023-06-21 09:41:59 -07:00
Praveen raj Mani	7c72b25ef0	Add an option to make bucket notifications synchronous (#17406 ) With the current asynchronous behaviour in sending notification events to the targets, we can't provide guaranteed delivery as the systems might go for restarts. For such event-driven use-cases, we can provide an option to enable synchronous events where the APIs wait until the event is successfully sent or persisted. This commit adds 'MINIO_API_SYNC_EVENTS' env which when set to 'on' will enable sending/persisting events to targets synchronously.	2023-06-20 17:38:59 -07:00
Aditya Manthramurthy	5a1612fe32	Bump up madmin-go and pkg deps (#17469 )	2023-06-19 17:53:08 -07:00
Praveen raj Mani	72802a5972	Use 'minio/pkg/sync/errgroup' and 'minio/pkg/workers' (#17069 )	2023-04-25 22:57:40 -07:00
Anis Eleuch	6addc7a35d	server-info: Return initializing state properly (#17070 )	2023-04-24 09:10:02 -07:00
Krishnan Parthasarathi	56c57e2c53	tier-stats: Avoid repeated logs (#16774 )	2023-03-07 16:43:05 -08:00
Poorna	a6057c35cc	Avoid peer notification when peer is offline, tune retries (#16737 )	2023-03-07 08:13:28 -08:00
Klaus Post	9acf1024e4	Remove bloom filter (#16682 ) Removes the bloom filter since it has so limited usability, often gets saturated anyway and adds a bunch of complexity to the scanner. Also removes a tiny bit of CPU by each write operation.	2023-02-24 09:03:31 +05:30
Poorna	1b02e046c2	Fix bandwidth monitoring to be per remote target (#16360 )	2023-01-19 18:52:16 +05:30
Krishnan Parthasarathi	71c95ad0d0	Signal stop-rebalance to all rebalancing pools (#16438 )	2023-01-19 06:54:23 +05:30
Aditya Manthramurthy	a30cfdd88f	Bump up madmin-go to v2 (#16162 )	2022-12-06 13:46:50 -08:00
Harshavardhana	5a8df7efb3	re-implement StorageInfo to be a peer call (#16155 )	2022-12-01 14:31:35 -08:00
Anis Elleuch	7e73fc2870	Implement inspect data API v2 (#15474 ) Co-authored-by: Klaus Post <klauspost@gmail.com>	2022-11-02 13:36:38 -07:00
Krishnan Parthasarathi	4523da6543	feat: introduce pool-level rebalance (#15483 )	2022-10-25 12:36:57 -07:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Harshavardhana	2a13cc28f2	feat: implement support batch replication (#15554 )	2022-10-05 23:00:43 -07:00
Poorna	6b9fd256e1	Persist in-memory replication stats to disk (#15594 ) to avoid relying on scanner-calculated replication metrics. This will improve the accuracy of the replication stats reported. This PR also adds on to #15556 by handing replication traffic that could not be queued by available workers to the MRF queue so that entries in `PENDING` status are healed faster.	2022-09-12 12:40:02 -07:00
Harshavardhana	433b6fa8fe	upgrade golang-lint to the latest (#15600 )	2022-08-26 12:52:29 -07:00
Aditya Manthramurthy	afbb63a197	Factor out external event notification funcs (#15574 ) This change moves external event notification functionality into `event-notification.go`. This simplifies notification related code.	2022-08-24 06:42:36 -07:00
Anis Elleuch	b8cdf060c8	Properly replicate policy mapping for virtual users (#15558 ) Currently, replicating policy mapping for STS users does not work. Fix it is by passing user type to PolicyDBSet.	2022-08-23 11:11:45 -07:00
Anis Elleuch	5682685c80	Introduce disk io stats metrics (#15512 )	2022-08-16 07:13:49 -07:00
ebozduman	b57e7321e7	Replaces 'disk'=>'drive' visible to end user (#15464 )	2022-08-04 16:10:08 -07:00
Cesar Celis Hernandez	8ec888d13d	feat: update binary once and push it to other servers (#15407 )	2022-07-29 08:34:30 -07:00
Harshavardhana	5e763b71dc	use logger.LogOnce to reduce printing disconnection logs (#15408 ) fixes #15334 - re-use net/url parsed value for http.Request{} - remove gosimple, structcheck and unusued due to https://github.com/golangci/golangci-lint/issues/2649 - unwrapErrs upto leafErr to ensure that we store exactly the correct errors	2022-07-27 09:44:59 -07:00
Anis Elleuch	e4b51235f8	upgrade: Split in two steps to ensure a stable retry (#15396 ) Currently, if one server in a distributed setup fails to upgrade due to any reasons, it is not possible to upgrade again unless nodes are restarted. To fix this, split the upgrade process into two steps : - download the new binary on all servers - If successful, overwrite the old binary with the new one	2022-07-25 17:49:47 -07:00
Anis Elleuch	f23f442d33	Add cluster info to inspect/profiling archive (#15360 ) Add cluster info to inspect and profiling archive. In addition to the existing data generation for both inspect and profiling, cluster.info file is added. This latter contains some info of the cluster. The generation of cluster.info is is done as the last step and it can fail if it exceed 10 seconds.	2022-07-25 09:11:35 -07:00
Harshavardhana	b4eb74f5ff	allow custom speedtest bucket (#15271 ) this allows for specifying existing buckets with - object replication enabled - object encryption enabled - object versioning enabled - object locking enabled	2022-07-12 10:12:47 -07:00
Klaus Post	ac055b09e9	Add detailed scanner metrics (#15161 )	2022-07-05 14:45:49 -07:00
Harshavardhana	c7ed6eee5e	fix: background local test also via channel (#15086 ) current implementation for `standalone` setups was blocking the `perf drive`. Bonus: remove all old unused complicated code.	2022-06-15 14:51:42 -07:00
Harshavardhana	8082d1fed6	add bucket level S3 received/sent bytes (#15084 ) adds bucket level metrics for bytes received and sent bytes on all S3 API calls.	2022-06-14 15:14:24 -07:00
Harshavardhana	2420f6c000	fix: make metrics endpoint responsive by reducing the chatter (#15055 ) peerOnlineCounter was making NxN calls to many peers, this can be really long and tedious if there are random servers that are going down. Instead we should calculate online peers from the point of view of "self" and return those online and offline appropriately by performing a healthcheck.	2022-06-08 02:43:13 -07:00
Harshavardhana	f8650a3493	fetch bucket replication stats across peers in single call (#14956 ) current implementation relied on recursively calling one bucket at a time across all peers, this would be very slow and chatty when there are 100's of buckets which would mean 100*peerCount amount of network operations. This PR attempts to reduce this entire call into `peerCount` amount of network calls only. This functionality addresses also a concern where the Prometheus metrics would significantly slow down when one of the peers is offline.	2022-05-23 09:15:30 -07:00
Harshavardhana	040ac5cad8	fix: when logger queue is full exit quickly upon doneCh (#14928 ) Additionally only reload requested sub-system not everything	2022-05-16 16:10:51 -07:00
Harshavardhana	2a6a40e93b	enable go1.18.x builds (#14746 )	2022-04-13 14:21:55 -07:00
Harshavardhana	eda34423d7	update gofumpt -w - new changes	2022-04-13 12:00:11 -07:00
Krishna Srinivas	4d0715d226	Implement netperf for "mc support perf net" (#14397 ) Co-authored-by: Klaus Post <klauspost@gmail.com>	2022-03-08 09:54:38 -08:00

1 2 3 4 5

236 Commits