Commit Graph

182 Commits

Harshavardhana 1971c54a50
update buffer channels for both trace and listen events (#18171)
- Trace needs a buffered channel larger than 4000 entries to ensure
  that `mc admin trace -a` captures all information sufficiently.

- Listen event notification needs the event channel to be sized at
  `apiRequestsMaxPerNode` * the number of nodes
2023-10-05 18:16:04 -06:00
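As a rough illustration of the sizing described above, a minimal Go sketch assuming hypothetical values for `apiRequestsMaxPerNode` and the node count; the actual constants and channel wiring in MinIO differ.

```go
package main

import "fmt"

// Hypothetical values for illustration only; the real limits are
// configured inside MinIO and may differ.
const (
	traceChannelSize      = 8000 // deeper than the previous 4000-entry buffer
	apiRequestsMaxPerNode = 1000
	totalNodes            = 4
)

func main() {
	// Trace events: a larger buffer so `mc admin trace -a` does not drop entries.
	traceCh := make(chan string, traceChannelSize)

	// Listen (bucket notification) events: sized per the commit message,
	// apiRequestsMaxPerNode * number of nodes.
	listenCh := make(chan string, apiRequestsMaxPerNode*totalNodes)

	fmt.Println("trace buffer:", cap(traceCh))
	fmt.Println("listen buffer:", cap(listenCh))
}
```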
Shireesh Anjal 6d20ec3bea
Add support for resource metrics (#18057)
Add a new endpoint for "resource" metrics `/v2/metrics/resource`

This should return system metrics related to drives, network, CPU and
memory. Except for drives, the other metrics also have corresponding "avg"
and "max" values.

Reuse the real-time feature to capture the required data,
introducing CPU and memory metrics in it.

Collect the data every minute and keep updating the average and max values
accordingly, returning the latest values when the API is called.
2023-09-30 13:40:20 -07:00
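A minimal sketch of the per-minute avg/max bookkeeping described above; the `resourceMetric` type and its fields are hypothetical, not the actual MinIO implementation.

```go
package main

import "fmt"

// resourceMetric keeps a running average and maximum for one metric,
// updated once per collection interval (every minute in the commit above).
type resourceMetric struct {
	Current float64
	Avg     float64
	Max     float64
	count   uint64
}

func (m *resourceMetric) update(v float64) {
	m.Current = v
	if v > m.Max {
		m.Max = v
	}
	// Incremental mean: avg += (v - avg) / n
	m.count++
	m.Avg += (v - m.Avg) / float64(m.count)
}

func main() {
	var cpu resourceMetric
	for _, sample := range []float64{10, 30, 20} { // pretend one sample per minute
		cpu.update(sample)
	}
	fmt.Printf("current=%.0f avg=%.0f max=%.0f\n", cpu.Current, cpu.Avg, cpu.Max)
}
```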
Aditya Manthramurthy 1c99fb106c
Update to minio/pkg/v2 (#17967) 2023-09-04 12:57:37 -07:00
Poorna b48bbe08b2
Add additional info for replication metrics API (#17293)
to track the replication transfer rate across different nodes, the
number of active workers in use, and in-queue stats to get an idea
of the current workload.

This PR also adds replication metrics to the site replication
status API. For site replication, prometheus metrics are
no longer at the bucket level - but at the cluster level.

Add prometheus metric to track credential errors since uptime
2023-08-30 01:00:59 -07:00
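As an illustration of the last point, a sketch of a Prometheus counter tracking credential errors since uptime, using the standard Go client library; the metric name here is a placeholder, not MinIO's actual metric.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Placeholder metric name; MinIO's real credential-error metric may be named differently.
var credentialErrors = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "replication_credential_errors_total",
	Help: "Total credential errors seen since server uptime.",
})

func main() {
	prometheus.MustRegister(credentialErrors)

	// Somewhere in the replication path, bump the counter on an auth failure.
	credentialErrors.Inc()

	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":2112", nil)
}
```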
jiuker a99cd825ab
fix: byHost realTime metrics API (#17681) 2023-07-18 23:50:30 -07:00
Harshavardhana 6426b74770
move bucket centric metrics to /minio/v2/metrics/bucket handlers (#17663)
Users/customers no longer have a reasonable number of buckets; to avoid
overpopulating the cluster endpoint, bucket monitoring moves to a
separate endpoint.

This is a breaking change for a couple of metrics, but it is imperative
that we do it to improve the responsiveness of our Prometheus cluster
endpoint.

Bonus: Added new cluster metrics for usage, objects and histograms
2023-07-18 22:25:12 -07:00
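A minimal sketch of serving bucket metrics on a separate path from cluster metrics, assuming plain `net/http` handlers with placeholder bodies; only the endpoint split mirrors the commit above.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	mux := http.NewServeMux()

	// Cluster-wide metrics stay on their own endpoint.
	mux.HandleFunc("/minio/v2/metrics/cluster", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# cluster-level metrics (usage, objects, histograms, ...)")
	})

	// Bucket-centric metrics move to a dedicated endpoint so large bucket
	// counts do not slow down the cluster scrape.
	mux.HandleFunc("/minio/v2/metrics/bucket", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "# per-bucket metrics")
	})

	_ = http.ListenAndServe(":9000", mux)
}
```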
Poorna 5e2f8d7a42
replication: Simplify mrf requeueing and add backlog handler (#17171)

- Limit retries to 3 to avoid repeated re-queueing. Entries that fall off
  are retried when the scanner revisits the object or upon access.

- Change MRF to have each node process only its own MRF entries.

- Collect the MRF backlog per node to allow visibility into the current backlog
2023-07-12 23:51:33 -07:00
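A simplified sketch of the retry-limited requeueing described above; the `mrfEntry` type and queue are hypothetical stand-ins for MinIO's internal MRF structures.

```go
package main

import "fmt"

const maxMRFRetries = 3 // per the commit above

// mrfEntry is a hypothetical stand-in for a failed replication operation.
type mrfEntry struct {
	Object  string
	Retries int
}

// requeue returns false once an entry has exhausted its retries; such entries
// are left for the scanner to pick up on its next pass, or healed on access.
func requeue(queue chan mrfEntry, e mrfEntry) bool {
	if e.Retries >= maxMRFRetries {
		return false
	}
	e.Retries++
	select {
	case queue <- e:
		return true
	default:
		return false // queue full, fall back to scanner/access heal
	}
}

func main() {
	queue := make(chan mrfEntry, 10)
	e := mrfEntry{Object: "bucket/object"}
	for requeue(queue, e) {
		e = <-queue
		fmt.Println("retrying", e.Object, "attempt", e.Retries)
	}
	fmt.Println("giving up; scanner will revisit", e.Object)
}
```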
Anis Eleuch 6d0bc5ab1e
prometheus: Fix internode stats (#17594)
Internode stats were calculated inside S3 handlers; fix this by moving
the calculation to the internode handlers.

Remove admin stats since they are not used.
2023-07-08 07:35:11 -07:00
Kaan Kabalak 21fbe88e1f
Print certain log messages once per error (#17484) 2023-06-24 20:29:13 -07:00
Harshavardhana 7605d07bb2
add support for bucket level request count per API (#17468)
New metrics added to calculate the API request count
per bucket, per API. Errors are captured separately,
including 4xx and 5xx HTTP status codes.
2023-06-21 09:41:59 -07:00
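A rough sketch of counting requests per bucket, per API, with 4xx/5xx errors tracked separately; the key type and helper below are hypothetical, not MinIO's actual metric names.

```go
package main

import "fmt"

// apiKey identifies one counter: S3 bucket name, API name, and the
// HTTP status class ("2xx", "4xx", "5xx").
type apiKey struct {
	Bucket, API, Class string
}

// Note: a real implementation would need locking or atomics for concurrent use.
var requestCounts = map[apiKey]uint64{}

func record(bucket, api string, statusCode int) {
	class := "2xx"
	switch {
	case statusCode >= 500:
		class = "5xx"
	case statusCode >= 400:
		class = "4xx"
	}
	requestCounts[apiKey{bucket, api, class}]++
}

func main() {
	record("photos", "GetObject", 200)
	record("photos", "GetObject", 404)
	record("photos", "PutObject", 503)
	for k, v := range requestCounts {
		fmt.Printf("%s %s %s = %d\n", k.Bucket, k.API, k.Class, v)
	}
}
```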
Aditya Manthramurthy 5a1612fe32
Bump up madmin-go and pkg deps (#17469) 2023-06-19 17:53:08 -07:00
Aditya Manthramurthy 518f6e4d39
fix: missing return after error response (#16920) 2023-03-29 16:21:13 -07:00
Krishnan Parthasarathi 31fba6f434
Save bootstrap trace events in a circular buffer (#16823) 2023-03-17 16:01:03 -07:00
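A minimal ring-buffer sketch in the spirit of the circular buffer named above; it is not the actual implementation from #16823.

```go
package main

import "fmt"

// ringBuffer keeps the most recent entries, overwriting the oldest once full.
type ringBuffer struct {
	events []string
	next   int
	full   bool
}

func newRingBuffer(size int) *ringBuffer {
	return &ringBuffer{events: make([]string, size)}
}

func (r *ringBuffer) add(ev string) {
	r.events[r.next] = ev
	r.next = (r.next + 1) % len(r.events)
	if r.next == 0 {
		r.full = true
	}
}

// snapshot returns the retained events, oldest first.
func (r *ringBuffer) snapshot() []string {
	if !r.full {
		return append([]string(nil), r.events[:r.next]...)
	}
	return append(append([]string(nil), r.events[r.next:]...), r.events[:r.next]...)
}

func main() {
	rb := newRingBuffer(3)
	for _, ev := range []string{"init disks", "load config", "load IAM", "start API"} {
		rb.add(ev)
	}
	fmt.Println(rb.snapshot()) // only the last 3 bootstrap events remain
}
```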
Klaus Post 9acf1024e4
Remove bloom filter (#16682)
Removes the bloom filter since it has very limited usability, often gets saturated anyway, and adds a bunch of complexity to the scanner.

Also saves a tiny bit of CPU on each write operation.
2023-02-24 09:03:31 +05:30
Daniel Valdivia fb17f97cf3
move audit and logger message structure to minio/pkg (#16655)
Signed-off-by: Daniel Valdivia <18384552+dvaldivia@users.noreply.github.com>
2023-02-21 21:21:17 -08:00
Harshavardhana 31b0decd46
migrate to minio/mux from gorilla/mux (#16456) 2023-01-23 16:42:47 +05:30
Anis Elleuch 3039fd4519
Optimize background heal status to use LocalStorageInfo (#16414) 2023-01-17 05:02:00 +05:30
Aditya Manthramurthy a30cfdd88f
Bump up madmin-go to v2 (#16162) 2022-12-06 13:46:50 -08:00
Harshavardhana 5a8df7efb3
re-implement StorageInfo to be a peer call (#16155) 2022-12-01 14:31:35 -08:00
Poorna d6bc141bd1
feat: Add support for site level resync (#15753) 2022-11-14 07:16:40 -08:00
Klaus Post ddeca9f12a
fix: filter rest errors and logs returned (#16019) 2022-11-07 10:38:08 -08:00
Klaus Post 71954faa3a
mark pubsub type safe via generics (#15961) 2022-10-28 10:55:42 -07:00
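A small sketch of a type-safe pub/sub built with generics, in the spirit of the change above; the API shown is illustrative, not MinIO's internal `pubsub` package.

```go
package main

import (
	"fmt"
	"sync"
)

// PubSub delivers values of a single concrete type T to all subscribers,
// so callers never need type assertions on interface{} payloads.
type PubSub[T any] struct {
	mu   sync.Mutex
	subs []chan T
}

func (p *PubSub[T]) Subscribe(buf int) <-chan T {
	p.mu.Lock()
	defer p.mu.Unlock()
	ch := make(chan T, buf)
	p.subs = append(p.subs, ch)
	return ch
}

func (p *PubSub[T]) Publish(v T) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, ch := range p.subs {
		select {
		case ch <- v:
		default: // drop instead of blocking slow subscribers
		}
	}
}

func main() {
	var events PubSub[string]
	sub := events.Subscribe(4)
	events.Publish("object created")
	fmt.Println(<-sub)
}
```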
Krishnan Parthasarathi 4523da6543
feat: introduce pool-level rebalance (#15483) 2022-10-25 12:36:57 -07:00
Harshavardhana 2a13cc28f2
feat: implement support for batch replication (#15554) 2022-10-05 23:00:43 -07:00
Poorna 6b9fd256e1
Persist in-memory replication stats to disk (#15594)
to avoid relying on scanner-calculated replication metrics.
This will improve the accuracy of the replication stats reported.

This PR also adds on to #15556 by handing replication
traffic that could not be queued by available workers to the 
MRF queue so that entries in `PENDING` status are healed faster.
2022-09-12 12:40:02 -07:00
Anis Elleuch 97a6322de1
Fix regression in notifying peers about new policy mapping (#15583)
Switch from mux.Vars() to r.Form to avoid the issue of missing arguments
passed to LoadPolicyMappingHandler.
2022-08-24 12:34:52 -07:00
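A short sketch of the pattern described above: reading request parameters from `r.Form` rather than route variables; the handler and parameter names are illustrative only.

```go
package main

import (
	"fmt"
	"net/http"
)

// loadPolicyMapping reads its arguments from query/form values instead of
// relying on route variables, so no arguments are lost on the peer call.
func loadPolicyMapping(w http.ResponseWriter, r *http.Request) {
	if err := r.ParseForm(); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	userOrGroup := r.Form.Get("userOrGroup") // illustrative parameter names
	isGroup := r.Form.Get("isGroup")
	fmt.Fprintf(w, "reloading policy mapping for %s (group=%s)\n", userOrGroup, isGroup)
}

func main() {
	http.HandleFunc("/peer/load-policy-mapping", loadPolicyMapping)
	_ = http.ListenAndServe(":9000", nil)
}
```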
Aditya Manthramurthy afbb63a197
Factor out external event notification funcs (#15574)
This change moves external event notification functionality into
`event-notification.go`. This simplifies notification related code.
2022-08-24 06:42:36 -07:00
Anis Elleuch b8cdf060c8
Properly replicate policy mapping for virtual users (#15558)
Currently, replicating policy mapping for STS users does not work. Fix
it by passing the user type to PolicyDBSet.
2022-08-23 11:11:45 -07:00
Anis Elleuch 5682685c80
Introduce disk io stats metrics (#15512) 2022-08-16 07:13:49 -07:00
Cesar Celis Hernandez 8ec888d13d
feat: update binary once and push it to other servers (#15407) 2022-07-29 08:34:30 -07:00
Anis Elleuch e4b51235f8
upgrade: Split in two steps to ensure a stable retry (#15396)
Currently, if one server in a distributed setup fails to upgrade
due to any reason, it is not possible to upgrade again unless
nodes are restarted.

To fix this, split the upgrade process into two steps:

- download the new binary on all servers
- if successful, overwrite the old binary with the new one
2022-07-25 17:49:47 -07:00
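A high-level sketch of the two-phase flow described above, with hypothetical peer methods; the real server-update RPCs differ.

```go
package main

import "fmt"

// peer is a hypothetical stand-in for one server in the distributed setup.
type peer struct{ name string }

func (p peer) downloadBinary(url string) error {
	fmt.Println(p.name, "downloaded", url)
	return nil
}

func (p peer) commitBinary() error {
	fmt.Println(p.name, "committed new binary")
	return nil
}

func upgrade(peers []peer, url string) error {
	// Phase 1: every server downloads and verifies the new binary.
	for _, p := range peers {
		if err := p.downloadBinary(url); err != nil {
			return fmt.Errorf("%s failed to download, retry is safe: %w", p.name, err)
		}
	}
	// Phase 2: only after all downloads succeed, overwrite the old binary.
	for _, p := range peers {
		if err := p.commitBinary(); err != nil {
			return fmt.Errorf("%s failed to commit: %w", p.name, err)
		}
	}
	return nil
}

func main() {
	peers := []peer{{"minio1"}, {"minio2"}, {"minio3"}}
	if err := upgrade(peers, "https://dl.min.io/server/minio/release/linux-amd64/minio"); err != nil {
		fmt.Println("upgrade failed:", err)
	}
}
```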
Harshavardhana b4eb74f5ff
allow custom speedtest bucket (#15271)
this allows for specifying existing buckets with

- object replication enabled
- object encryption enabled
- object versioning enabled
- object locking enabled
2022-07-12 10:12:47 -07:00
Klaus Post ac055b09e9
Add detailed scanner metrics (#15161) 2022-07-05 14:45:49 -07:00
Harshavardhana c7ed6eee5e
fix: background local test also via channel (#15086)
The current implementation for `standalone` setups
was blocking `perf drive`.

Bonus: remove all old unused complicated code.
2022-06-15 14:51:42 -07:00
Harshavardhana 8082d1fed6
add bucket level S3 received/sent bytes (#15084)
adds bucket-level metrics for bytes received and bytes sent on all S3 API calls.
2022-06-14 15:14:24 -07:00
Harshavardhana 2420f6c000
fix: make metrics endpoint responsive by reducing the chatter (#15055)
peerOnlineCounter was making NxN calls across peers; this
can be really slow and tedious if random servers are
going down.

Instead, we should calculate online peers from the point of
view of "self" and report them as online or offline appropriately
by performing a healthcheck.
2022-06-08 02:43:13 -07:00
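A rough sketch of counting online and offline peers from the point of view of "self" with a short health check, instead of fanning out NxN calls; the probe path and timeout are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// onlinePeers performs one health probe per peer from the local node only,
// so the cost stays O(N) no matter how many peers are down.
func onlinePeers(peers []string) (online, offline int) {
	client := &http.Client{Timeout: 2 * time.Second} // short timeout keeps the scrape responsive
	for _, p := range peers {
		resp, err := client.Get(p + "/minio/health/live") // assumed liveness path
		if err != nil {
			offline++
			continue
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			online++
		} else {
			offline++
		}
	}
	return online, offline
}

func main() {
	up, down := onlinePeers([]string{"http://minio1:9000", "http://minio2:9000"})
	fmt.Println("online:", up, "offline:", down)
}
```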
Anis Elleuch fd02492cb7
avoid limits on the number of parallel trace/bucket notifications listeners (#14799)
Simplifies overall limits on the incoming listeners for notifications.

Fixes #14566
2022-06-05 14:29:12 -07:00
Harshavardhana f8650a3493
fetch bucket replication stats across peers in single call (#14956)
The current implementation relied on recursively calling one bucket
at a time across all peers; this is very slow and chatty
when there are hundreds of buckets, which would mean 100*peerCount
network operations.

This PR reduces the entire exchange to only `peerCount`
network calls. It also addresses a concern where the Prometheus
metrics would slow down significantly when one of the peers is offline.
2022-05-23 09:15:30 -07:00
Harshavardhana 040ac5cad8
fix: when logger queue is full exit quickly upon doneCh (#14928)
Additionally, only reload the requested sub-system, not everything
2022-05-16 16:10:51 -07:00
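A small sketch of the non-blocking enqueue pattern implied above: a send to the logger queue also watches `doneCh`, so a full queue cannot delay shutdown; channel names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// enqueue tries to queue a log entry, but gives up as soon as doneCh is
// closed, so a full queue cannot block shutdown.
func enqueue(logCh chan string, doneCh <-chan struct{}, entry string) bool {
	select {
	case logCh <- entry:
		return true
	case <-doneCh:
		return false
	}
}

func main() {
	logCh := make(chan string, 1)
	doneCh := make(chan struct{})

	fmt.Println(enqueue(logCh, doneCh, "first")) // buffered, succeeds
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(doneCh)
	}()
	fmt.Println(enqueue(logCh, doneCh, "second")) // queue full, returns once doneCh closes
}
```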
Harshavardhana 35dea24ffd
fix: console log peer API from its broken implementation (#14873)
The console logging peer API was broken: it would
time out after 15 minutes, never really worked
beyond that value, and basically failed to provide
the streaming "log" functionality that was expected
from this implementation.

Also fix the convoluted channel handling by keeping things
simple; this is rewritten.
2022-05-06 12:39:58 -07:00
Krishna Srinivas 4d0715d226
Implement netperf for "mc support perf net" (#14397)
Co-authored-by: Klaus Post <klauspost@gmail.com>
2022-03-08 09:54:38 -08:00
Harshavardhana 9d7648f02f
reduce unnecessary logging during speedtest (#14387)
- speedtest spuriously logs calls that were canceled,
  in situations where they should be ignored.

- all errors of interest are always sent back
  to the client; there is no need to log them
  on the server console.

- PUT failures should negate the increments
  such that GET is not attempted on unsuccessful
  calls.

- do not attempt MRF on speedtest objects.
2022-02-23 11:59:13 -08:00
Sidhartha Mani d7df6bc738
add support for speedtest drive (#14182) 2022-02-01 22:38:05 -08:00
Harshavardhana cf407f7176
do not expect 'speedtest' to be a bucket (#14199)
fixes #14196
2022-01-27 08:13:03 -08:00
Krishnan Parthasarathi d2e5f01542
feat: maintain in-memory tier stats for the last 24hrs (#13782) 2022-01-26 14:33:10 -08:00
Harshavardhana 76b21de0c6
feat: decommission feature for pools (#14012)
```
λ mc admin decommission start alias/ http://minio{1...2}/data{1...4}
```

```
λ mc admin decommission status alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────┐
│ ID  │ Pools                           │ Capacity                         │ Status │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Active │
│ 2nd │ http://minio{3...4}/data{1...4} │ 329 GiB (used) / 421 GiB (total) │ Active │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────┘
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
Progress: ===================> [1GiB/sec] [15%] [4TiB/50TiB]
Time Remaining: 4 hours (started 3 hours ago)
```

```
λ mc admin decommission status alias/ http://minio{1...2}/data{1...4}
ERROR: This pool is not scheduled for decommissioning currently.
```

```
λ mc admin decommission cancel alias/
┌─────┬─────────────────────────────────┬──────────────────────────────────┬──────────┐
│ ID  │ Pools                           │ Capacity                         │ Status   │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining │
└─────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘
```

> NOTE: A canceled decommission will not make the pool active again, since we might have
> potentially partial duplicate content on the other pools. To avoid this scenario, be
> very sure to start decommissioning as a planned activity.

```
λ mc admin decommission cancel alias/ http://minio{1...2}/data{1...4}
┌─────┬─────────────────────────────────┬──────────────────────────────────┬────────────────────┐
│ ID  │ Pools                           │ Capacity                         │ Status             │
│ 1st │ http://minio{1...2}/data{1...4} │ 439 GiB (used) / 561 GiB (total) │ Draining(Canceled) │
└─────┴─────────────────────────────────┴──────────────────────────────────┴────────────────────┘
```
2022-01-10 09:07:49 -08:00
Harshavardhana f527c708f2
run gofumpt cleanup across code-base (#14015) 2022-01-02 09:15:06 -08:00
Aditya Manthramurthy 6fbf4f96b6
Move last remaining IAM notification calls into IAMSys methods (#13941) 2021-12-21 02:16:50 -08:00
Harshavardhana 818f0201fc
re-implement prometheus metrics endpoint to be simpler (#13922)
Data structures were repeatedly initialized,
causing GC pressure; instead, re-use the
collectors.

Initialize collectors in `init()`, and also make
sure to honor the cache semantics for performance
requirements.

Avoid a global map and a global lock for metrics
lookup; instead, keep them all lock-free unless
the cache is being invalidated.
2021-12-17 10:11:04 -08:00
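A condensed sketch of the two ideas above: collectors created once at start-up and a time-bounded cache in front of the expensive metrics gathering; the names and the 10-second TTL are assumptions, not the actual MinIO code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// metricsCache recomputes the expensive metrics snapshot at most once per TTL,
// so concurrent scrapes within the TTL reuse the cached result.
type metricsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	updated time.Time
	value   []string
	compute func() []string
}

func (c *metricsCache) Get() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if time.Since(c.updated) > c.ttl {
		c.value = c.compute()
		c.updated = time.Now()
	}
	return c.value
}

// The collector/cache is created once at start-up, not per request,
// avoiding repeated allocations and GC pressure.
var clusterCache = &metricsCache{
	ttl:     10 * time.Second, // assumed TTL
	compute: func() []string { return []string{"minio_cluster_capacity_usable_total_bytes 1"} },
}

func main() {
	fmt.Println(clusterCache.Get())
	fmt.Println(clusterCache.Get()) // served from cache within the TTL
}
```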
Harshavardhana b9aae1aaae
fix: speedtest should exit upon errors cleanly (#13851)
- deleteBucket() should be called for cleanup
  if the client abruptly disconnects

- out-of-disk errors should be sent to the client
  properly and the calls also canceled

- limit concurrency to the available MAXPROCS instead of
  32 for auto-tuned setups; if procs are beyond
  32 then continue normally. This is to handle
  smaller setups.

fixes #13834
2021-12-06 16:36:14 -08:00