minio

Commit Graph

Author	SHA1	Message	Date
Anis Elleuch	4caed7cc0d	metrics: Add replication latency metrics (#13515 ) Add a new Prometheus metric for bucket replication latency e.g.: minio_bucket_replication_latency_ns{ bucket="testbucket", operation="upload", range="LESS_THAN_1_MiB", server="127.0.0.1:9001", targetArn="arn:minio:replication::45da043c-14f5-4da4-9316-aba5f77bf730:testbucket"} 2.2015663e+07 Co-authored-by: Klaus Post <klauspost@gmail.com>	2021-11-17 12:10:57 -08:00
Daniel A. Ochoa	07dd0692b6	Fix hdfs gateway concurrent map writes (#13596 ) Co-authored-by: Harshavardhana <harsha@minio.io>	2021-11-08 09:07:58 -08:00
Anis Elleuch	20761e053e	replication: Fix replica stats during crawling (#13499 ) Also show replica stats with an ARN in Prometheus output.	2021-10-22 19:13:50 -07:00
Poorna K	e7f559c582	Fixes to replication metrics (#13493 ) For reporting ReplicaSize and loading initial replication metrics correctly.	2021-10-21 18:52:55 -07:00
Poorna Krishnamoorthy	19ecdc75a8	replication: Simplify metrics calculation (#13274 ) Also doing some code cleanup	2021-09-22 10:48:45 -07:00
Poorna Krishnamoorthy	c4373ef290	Add support for multi site replication (#12880 )	2021-09-18 13:31:35 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Poorna Krishnamoorthy	3690de0c6b	Drop Pending size and count from replication metrics (#12378 ) Real-time metrics calculated in-memory rely on the initial replication metrics saved with data usage. However, this can lag behind the actual state of the cluster at the time of server restart leading to inaccurate Pending size/counts reported to Prometheus. Dropping the Pending metrics as this can be more reliably monitored by applications with replication notifications. Signed-off-by: Poorna Krishnamoorthy <poorna@minio.io>	2021-05-31 20:26:52 -07:00
Harshavardhana	fdc2020b10	move to iam, bucket policy from minio/pkg (#12400 )	2021-05-29 21:16:42 -07:00
Harshavardhana	ebf75ef10d	fix: remove all unused code (#12360 )	2021-05-24 09:28:19 -07:00
Harshavardhana	1aa5858543	move madmin to github.com/minio/madmin-go (#12239 )	2021-05-06 08:52:02 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	d0d67f9de0	feat: allow prometheus for only authorized users (#12121 ) allow restrictions on who can access Prometheus endpoint, additionally add prometheus as part of diagnostics canned policy. Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-22 18:55:30 -07:00
Poorna Krishnamoorthy	075bccda42	Fix cluster bucket stats API for prometheus (#11970 ) Metrics calculation was accumulating inital usage across all nodes rather than using initial usage only once. Also fixing: - bug where all peer traffic was going to the same node. - reset counters when replication status changes from PENDING -> FAILED	2021-04-06 08:36:54 -07:00
Harshavardhana	09ee303244	add cluster support for realtime bucket stats (#11963 ) implementation in #11949 only catered from single node, but we need cluster metrics by capturing from all peers. introduce bucket stats API that will be used for capturing in-line bucket usage as well eventually	2021-04-04 15:34:33 -07:00
Poorna Krishnamoorthy	47c09a1e6f	Various improvements in replication (#11949 ) - collect real time replication metrics for prometheus. - add pending_count, failed_count metric for total pending/failed replication operations. - add API to get replication metrics - add MRF worker to handle spill-over replication operations - multiple issues found with replication - fixes an issue when client sends a bucket name with `/` at the end from SetRemoteTarget API call make sure to trim the bucket name to avoid any extra `/`. - hold write locks in GetObjectNInfo during replication to ensure that object version stack is not overwritten while reading the content. - add additional protection during WriteMetadata() to ensure that we always write a valid FileInfo{} and avoid ever writing empty FileInfo{} to the lowest layers. Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-04-03 09:03:42 -07:00
Anis Elleuch	2c296652f7	Simplify access to local node name (#11907 ) The local node name is heavily used in tracing, create a new global variable to store it. Multiple goroutines can access it since it won't be changed later.	2021-03-26 11:37:58 -07:00
Klaus Post	749e9c5771	metrics: Add canceled requests (#11881 ) Add metric for canceled requests	2021-03-24 10:25:27 -07:00
Harshavardhana	c6a120df0e	fix: Prometheus metrics to re-use storage disks (#11647 ) also re-use storage disks for all `mc admin server info` calls as well, implement a new LocalStorageInfo() API call at ObjectLayer to lookup local disks storageInfo also fixes bugs where there were double calls to StorageInfo()	2021-03-02 17:28:04 -08:00
Harshavardhana	ffea6fcf09	fix: rename crawler as scanner in config (#11549 )	2021-02-17 12:04:11 -08:00
Ritesh H Shukla	67a8f37df0	fix: disk usage capacity metric reporting (#11435 )	2021-02-04 12:26:58 -08:00
Ritesh H Shukla	b4add82bb6	Updated Prometheus metrics (#11141 ) * Add metrics for nodes online and offline * Add cluster capacity metrics * Introduce v2 metrics	2021-01-18 20:35:38 -08:00
Harshavardhana	a4383051d9	remove/deprecate crawler disable environment (#11214 ) with changes present to automatically throttle crawler at runtime, there is no need to have an environment value to disable crawling. crawling is a fundamental piece for healing, lifecycle and many other features there is no good reason anyone would need to disable this on a production system. * Apply suggestions from code review	2021-01-04 09:43:31 -08:00
Harshavardhana	e7ae49f9c9	fix: calculate prometheus disks_offline/disks_total correctly (#11215 ) fixes #11196	2021-01-04 09:42:09 -08:00
Anis Elleuch	2ecaab55a6	admin: ServerInfo returns info without object layer initialized (#11142 )	2020-12-21 09:35:19 -08:00
Harshavardhana	4a564336fe	Revert "Add metrics for nodes online and offline (#11050 )" This reverts commit `f60bbdf86b`.	2020-12-08 09:23:35 -08:00
Ritesh H Shukla	f60bbdf86b	Add metrics for nodes online and offline (#11050 )	2020-12-08 01:06:27 -08:00
Poorna Krishnamoorthy	f3beb1236a	Add cache usage, total capacity to prometheus metrics (#11026 )	2020-12-07 16:35:11 -08:00
Ritesh H Shukla	038bcd9079	Add replication capacity metrics support in crawler (#10786 )	2020-12-07 13:47:48 -08:00
Harshavardhana	a0d0645128	remove safeMode behavior in startup (#10645 ) In almost all scenarios MinIO now is mostly ready for all sub-systems independently, safe-mode is not useful anymore and do not serve its original intended purpose. allow server to be fully functional even with config partially configured, this is to cater for availability of actual I/O v/s manually fixing the server. In k8s like environments it will never make sense to take pod into safe-mode state, because there is no real access to perform any remote operation on them.	2020-10-09 09:59:52 -07:00
飞雪无情	ea1803417f	Use constants for gateway names to avoid bugs caused by spelling. (#10355 )	2020-08-26 08:52:46 -07:00
Harshavardhana	caad314faa	add ruleguard support, fix all the reported issues (#10335 )	2020-08-24 12:11:20 -07:00
Harshavardhana	6669560cb9	turn-off bucket usage metrics in gateway mode (#10150 ) closes #10147	2020-07-28 13:04:26 -07:00
Harshavardhana	e7d7d5232c	fix: admin info output and improve overall performance (#10015 ) - admin info node offline check is now quicker - admin info now doesn't duplicate the code across doing the same checks for disks - rely on StorageInfo to return appropriate errors instead of calling locally. - diskID checks now return proper errors when disk not found v/s format.json missing. - add more disk states for more clarity on the underlying disk errors.	2020-07-13 09:51:07 -07:00
Harshavardhana	f9aa239973	fix: export prometheus metrics for cache GC triggers (#9815 ) Bonus change to use channel to serialize triggers, instead of using atomic variables. More efficient mechanism for synchronization. Co-authored-by: Nitish Tiwari <nitish@minio.io>	2020-06-15 09:05:35 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
Harshavardhana	b2db8123ec	Preserve errors returned by diskInfo to detect disk errors (#9727 ) This PR basically reverts #9720 and re-implements it differently	2020-05-28 13:03:04 -07:00
Harshavardhana	53aaa5d2a5	Export bucket usage counts as part of bucket metrics (#9710 ) Bonus fixes in quota enforcement to use the new datastructure and use timedValue to cache a value/reload automatically avoids one less global variable.	2020-05-27 06:45:43 -07:00
Harshavardhana	6ac48a65cb	fix: use unused cacheMetrics code in prometheus (#9588 ) remove all other unusued/deadcode	2020-05-13 08:15:26 -07:00
Harshavardhana	27d716c663	simplify usage of mutexes and atomic constants (#9501 )	2020-05-03 22:35:40 -07:00
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	2020-04-09 09:30:02 -07:00
poornas	336460f67e	fix: gateway_s3_bytes_sent metric for all API methods (#9242 ) Co-authored-by: Harshavardhana <harsha@minio.io>	2020-04-01 12:52:31 -07:00
Nitish Tiwari	6b984410d5	Add support for self-healing related metrics in Prometheus (#9079 ) Fixes #8988 Co-authored-by: Anis Elleuch <vadmeste@users.noreply.github.com> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-03-24 22:40:45 -07:00
Anis Elleuch	d4dcf1d722	metrics: Use StorageInfo() instead to have consistent info (#9006 ) Metrics used to have its own code to calculate offline disks. StorageInfo() was avoided because it is an expensive operation by sending calls to all nodes. To make metrics & server info share the same code, a new argument `local` is added to StorageInfo() so it will only query local disks when needed. Metrics now calls StorageInfo() as server info handler does but with the local flag set to false. Co-authored-by: Praveen raj Mani <praveen@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-02-20 09:21:33 +05:30
Nitish Tiwari	63be4709b7	Add metrics support for Azure & GCS Gateway (#8954 ) We added support for caching and S3 related metrics in #8591. As a continuation, it would be helpful to add support for Azure & GCS gateway related metrics as well.	2020-02-11 21:08:01 +05:30
Nitish Tiwari	24ad59316d	Use atomic.Uint64 for gateway metrics count instead of mutex (#8615 )	2019-12-07 11:21:52 +05:30
Nitish Tiwari	3df7285c3c	Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591 ) This PR adds support below metrics - Cache Hit Count - Cache Miss Count - Data served from Cache (in Bytes) - Bytes received from AWS S3 - Bytes sent to AWS S3 - Number of requests sent to AWS S3 Fixes #8549	2019-12-05 23:16:06 -08:00
Harshavardhana	347b29d059	Implement bucket expansion (#8509 )	2019-11-19 17:42:27 -08:00
Harshavardhana	822eb5ddc7	Bring in safe mode support (#8478 ) This PR refactors object layer handling such that upon failure in sub-system initialization server reaches a stage of safe-mode operation wherein only certain API operations are enabled and available. This allows for fixing many scenarios such as - incorrect configuration in vault, etcd, notification targets - missing files, incomplete config migrations unable to read encrypted content etc - any other issues related to notification, policies, lifecycle etc	2019-11-09 09:27:23 -08:00
Harshavardhana	9e7a3e6adc	Extend further validation of config values (#8469 ) - This PR allows config KVS to be validated properly without being affected by ENV overrides, rejects invalid values during set operation - Expands unit tests and refactors the error handling for notification targets, returns error instead of ignoring targets for invalid KVS - Does all the prep-work for implementing safe-mode style operation for MinIO server, introduces a new global variable to toggle safe mode based operations NOTE: this PR itself doesn't provide safe mode operations	2019-10-30 23:39:09 -07:00

1 2

61 Commits