minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	f1302c40fe	Fix uninitialized replication stats (#20260). Services are unfrozen before `initBackgroundReplication` finishes, so the write to globalReplicationStats is racy. Switch to an atomic pointer. Provide the `ReplicationPool` with the stats, so they don't have to be loaded from the atomic pointer on every use. All other loads are nil-checked, and calls return empty values while the stats are still uninitialized.	2024-08-15 05:04:40 -07:00
Harshavardhana	53aa8f5650	use typos instead of codespell (#19088)	2024-02-21 22:26:06 -08:00
Poorna	27d02ea6f7	metrics: add replication metrics on proxied requests (#18957)	2024-02-05 22:00:45 -08:00
Poorna	b48bbe08b2	Add additional info for replication metrics API (#17293) to track the replication transfer rate across different nodes, the number of active workers in use, and in-queue stats, to get an idea of the current workload. This PR also adds replication metrics to the site replication status API. For site replication, Prometheus metrics are no longer at the bucket level but at the cluster level. Add a Prometheus metric to track credential errors since server start.	2023-08-30 01:00:59 -07:00
Poorna	ca2a1c3f60	replication: clone metrics while loading metrics cache (#16462)	2023-01-24 02:10:32 -08:00
Harshavardhana	a5f8af4efb	serialize replication stats() only when needed (#16280)	2022-12-20 00:07:53 -08:00
Klaus Post	2894dd4d1a	fix: hold lock while serializing replication stats (#16007)	2022-11-04 09:59:14 -07:00
Harshavardhana	94dbb4a427	fix: generalize SC config and also skip healing sub-sys under SD (#15757)	2022-09-26 09:04:54 -07:00
Minio Trusted	d89f6af6c4	avoid replication stats crash in Prometheus	2022-09-16 17:09:45 -07:00
Poorna	6b9fd256e1	Persist in-memory replication stats to disk (#15594) to avoid relying on scanner-calculated replication metrics. This improves the accuracy of the reported replication stats. This PR also builds on #15556 by handing replication traffic that could not be queued by available workers to the MRF queue, so that entries in `PENDING` status are healed faster.	2022-09-12 12:40:02 -07:00
Harshavardhana	f8650a3493	fetch bucket replication stats across peers in single call (#14956). The current implementation relied on recursively calling one bucket at a time across all peers, which is very slow and chatty when there are hundreds of buckets: 100 buckets means 100*peerCount network operations. This PR attempts to reduce the entire call to only `peerCount` network calls. It also addresses a concern where the Prometheus metrics would slow down significantly when one of the peers is offline.	2022-05-23 09:15:30 -07:00
Aditya Manthramurthy	9aadd725d2	Avoid calling .Reset() on active timer (#14941). The .Reset() documentation states: "For a Timer created with NewTimer, Reset should be invoked only on stopped or expired timers with drained channels." This change is just to comply with this requirement, as there might be some runtime-dependent situation that could lead to unexpected behavior.	2022-05-18 15:37:58 -07:00
Harshavardhana	6cfb1cb6fd	fix: timer usage across codebase (#14935). It seems that in some places we have been using the timer.Reset() function incorrectly, nicely exposed by an example shared by @donatello (https://go.dev/play/p/qoF71_D1oXD). This PR fixes all the usages comprehensively.	2022-05-17 22:42:59 -07:00
Harshavardhana	f527c708f2	run gofumpt cleanup across code-base (#14015)	2022-01-02 09:15:06 -08:00
Aditya Manthramurthy	997e808088	fix: race in bucket replication stats (#13942). r.ulock was not locked when r.UsageCache was being modified. Bonus: simplify code by removing some unnecessary clone methods (we can do this because Go arrays are values, not pointers/references, and are automatically copied on assignment); remove some unnecessary map allocation calls.	2021-12-17 15:33:13 -08:00
Anis Elleuch	4caed7cc0d	metrics: Add replication latency metrics (#13515). Add a new Prometheus metric for bucket replication latency, e.g.: minio_bucket_replication_latency_ns{bucket="testbucket", operation="upload", range="LESS_THAN_1_MiB", server="127.0.0.1:9001", targetArn="arn:minio:replication::45da043c-14f5-4da4-9316-aba5f77bf730:testbucket"} 2.2015663e+07 Co-authored-by: Klaus Post <klauspost@gmail.com>	2021-11-17 12:10:57 -08:00
Poorna K	e7f559c582	Fixes to replication metrics (#13493) for reporting ReplicaSize and loading initial replication metrics correctly.	2021-10-21 18:52:55 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880)	2021-09-18 13:31:35 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418). This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`.	2021-06-01 14:59:40 -07:00
Poorna Krishnamoorthy	3690de0c6b	Drop Pending size and count from replication metrics (#12378). Real-time metrics calculated in-memory rely on the initial replication metrics saved with data usage. However, these can lag behind the actual state of the cluster at the time of server restart, leading to inaccurate Pending size/counts reported to Prometheus. Dropping the Pending metrics, as pending state can be more reliably monitored by applications with replication notifications. Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>	2021-05-31 20:26:52 -07:00
Harshavardhana	82dc6aff1c	add support for configurable replication MRF workers (#12125). Just like replication workers, allow failed replication workers to be configurable, so that in situations like DR failures replication can catch up sooner when DR is back online. Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 21:58:45 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Poorna Krishnamoorthy	40409437cd	Add initial usage in GetBucketReplicationMetrics API (#11985)	2021-04-06 11:32:52 -07:00
Poorna Krishnamoorthy	075bccda42	Fix cluster bucket stats API for prometheus (#11970). Metrics calculation was accumulating initial usage across all nodes rather than using initial usage only once. Also fixes a bug where all peer traffic was going to the same node, and resets counters when replication status changes from PENDING -> FAILED.	2021-04-06 08:36:54 -07:00
Harshavardhana	09ee303244	add cluster support for realtime bucket stats (#11963). The implementation in #11949 only catered to a single node, but we need cluster metrics captured from all peers. Introduce a bucket stats API that will eventually also be used for capturing in-line bucket usage.	2021-04-04 15:34:33 -07:00
Poorna Krishnamoorthy	47c09a1e6f	Various improvements in replication (#11949): collect real-time replication metrics for Prometheus; add pending_count and failed_count metrics for total pending/failed replication operations; add an API to get replication metrics; add an MRF worker to handle spill-over replication operations. Multiple issues found with replication are also fixed: when a client sends a bucket name with `/` at the end in a SetRemoteTarget API call, trim the bucket name to avoid any extra `/`; hold write locks in GetObjectNInfo during replication to ensure that the object version stack is not overwritten while reading the content; add additional protection during WriteMetadata() to ensure that we always write a valid FileInfo{} and never write an empty FileInfo{} to the lowest layers. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-03 09:03:42 -07:00

26 Commits
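
The most recent commit (#20260) describes replacing a racy global write with an atomic pointer, with readers returning empty values until the stats are initialized. A minimal sketch of that pattern follows, assuming Go 1.19 or later for `atomic.Pointer`. The `stats` type, `globalStats` variable, and `queuedCount` function are hypothetical stand-ins and do not reflect the actual MinIO types or signatures.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stats is a hypothetical stand-in for a replication stats type.
type stats struct {
	queued int
}

// globalStats is published once initialization finishes.
// Until then, Load returns nil.
var globalStats atomic.Pointer[stats]

// queuedCount returns an empty value (0) while stats are uninitialized,
// instead of dereferencing a nil pointer.
func queuedCount() int {
	s := globalStats.Load()
	if s == nil {
		return 0
	}
	return s.queued
}

func main() {
	fmt.Println(queuedCount()) // stats not stored yet: prints 0

	// Simulates background initialization completing.
	globalStats.Store(&stats{queued: 3})
	fmt.Println(queuedCount()) // prints 3
}
```

A long-lived component can also be handed the `*stats` value once at construction, which avoids an atomic load on every use, as the commit message describes for `ReplicationPool`.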