minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	c0ac25bfff	fix: readiness needs to be like liveness (#9941) Readiness has no reason to be cluster-scoped, because that is not how k8s networking works for pods: the pods of a deployment do not share the network as a singleton. Instead they run with scopes local to themselves; on readiness failures the pod is potentially taken out of the network and stops being resolvable, which affects the distributed setup in a myriad of ways. Instead, readiness should behave like liveness, with local scope alone, and should be a dummy implementation. With this PR, startup times, including overall k8s startup time, improve dramatically. Adds another handler, `/minio/health/cluster`, to report cluster-scope health.	2020-06-30 11:28:27 -07:00
Harshavardhana	4790868878	allow background IAM load to speed up startup (#9796) Also fix the healthcheck handler to succeed only if the object layer has initialized fully for S3 API access.	2020-06-09 19:19:03 -07:00
Harshavardhana	5e529a1c96	simplify context timeout for readiness (#9772) Additionally, add CORS support to restrict requests to a specific origin; adds a new config and updates the documentation as well.	2020-06-04 14:58:34 -07:00
Krishna Srinivas	7d19ab9f62	readiness returns error quickly if any of the sets is down (#9662) This PR adds a new configuration parameter which allows the readiness check to respond within 10 seconds; this can be reduced to a lower value if necessary using `mc admin config set api ready_deadline=5s` or `export MINIO_API_READY_DEADLINE=5s`	2020-05-23 17:38:39 -07:00
Anis Elleuch	d4dcf1d722	metrics: Use StorageInfo() instead to have consistent info (#9006) Metrics used to have its own code to calculate offline disks. StorageInfo() was avoided because it is an expensive operation that sends calls to all nodes. To make metrics & server info share the same code, a new argument `local` is added to StorageInfo() so it will only query local disks when needed. Metrics now calls StorageInfo() as the server info handler does, but with the local flag set to false. Co-authored-by: Praveen raj Mani <praveen@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-02-20 09:21:33 +05:30
Harshavardhana	0879a4f743	rest/storage: Remove racy LastError usage (#8817) Instead, perform a liveness check call to verify if the server is online and print relevant errors. Also introduce a StorageErr string error type instead of errors.New(), and deprecate usage of VerifyFileError and DeleteFileError for gob. The change in data structure also requires a bump in the storage REST version to v13. Fixes #8811	2020-01-14 18:45:17 -08:00
Praveen raj Mani	157721f694	Fix readiness to return 200 for read-only mode (#8728) - We should declare a cluster ready even if only read quorum is achieved (at least n/2 disks are online). - All the zones should have enough read quorum, thus making the cluster ready for reads.	2020-01-02 05:05:01 -08:00
Praveen raj Mani	5d09233115	Fix Readiness check (#8681) - Remove goroutine-check in Readiness check - Bring in quorum check for readiness Fixes #8385 Co-authored-by: Harshavardhana <harsha@minio.io>	2019-12-28 22:24:43 +05:30
Harshavardhana	347b29d059	Implement bucket expansion (#8509)	2019-11-19 17:42:27 -08:00
Harshavardhana	822eb5ddc7	Bring in safe mode support (#8478) This PR refactors object layer handling such that upon failure in sub-system initialization server reaches a stage of safe-mode operation wherein only certain API operations are enabled and available. This allows for fixing many scenarios such as - incorrect configuration in vault, etcd, notification targets - missing files, incomplete config migrations unable to read encrypted content etc - any other issues related to notification, policies, lifecycle etc	2019-11-09 09:27:23 -08:00
Harshavardhana	07a556a10b	Avoid ListBuckets() call, instead rely on simple HTTP GET (#8475) This avoids making calls to the backend and requiring gateways to allow permissions for the ListBuckets() operation just for liveness checks, and makes our liveness checks more performant.	2019-11-01 16:58:10 -07:00
Harshavardhana	9e7a3e6adc	Extend further validation of config values (#8469) - This PR allows config KVS to be validated properly without being affected by ENV overrides, rejects invalid values during set operation - Expands unit tests and refactors the error handling for notification targets, returns error instead of ignoring targets for invalid KVS - Does all the prep-work for implementing safe-mode style operation for MinIO server, introduces a new global variable to toggle safe mode based operations NOTE: this PR itself doesn't provide safe mode operations	2019-10-30 23:39:09 -07:00
Nitish Tiwari	496fba3e9a	Return 200 OK for liveness checks while distributed cluster starts (#8176) With this PR, the liveness check responds with 200 OK with a "server-not-initialized" header while objectLayer gets initialized. The header is removed once objectLayer is initialized. This allows a MinIO distributed cluster to get started when running on an orchestration platform like Docker Swarm. This PR also updates the sample Swarm yaml files to use correct values for healthcheck fields. Fixes #8140	2019-09-05 14:50:56 +05:30
Harshavardhana	5a28ef0d47	Bump readiness check up to 10000 go-routines (#8057) Most of our current workloads reach this value regularly; it doesn't make sense to keep the 1000 go-routine limit.	2019-08-10 18:13:14 +05:30
Anis Elleuch	e857b6741d	Add one log in health checker liveness code (#7861)	2019-07-06 16:38:39 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494)	2019-04-09 11:39:42 -07:00
Krishna Srinivas	267f183fc8	Do not do StorageInfo() and ListBuckets() for FS/Erasure in health check handler (#7090) Health checking programs very frequently use /minio/health/live to check health, hence we can avoid doing StorageInfo() and ListBuckets() for FS/Erasure backend.	2019-01-20 10:28:36 +05:30
Harshavardhana	166e998788	Fix healthcheck for NAS gateway (#6452) It was expected that in gateway mode we do not know the backend types, whereas the NAS gateway is an extension of FS mode (standalone). This leads to an issue in LivenessCheckHandler(), which would perpetually return 503; this would affect all kubernetes and openshift deployments of the NAS gateway.	2018-09-11 13:44:10 -07:00
Nitish Tiwari	197af49c99	Fix healthcheck handler to verify gateway backend liveness (#6218) Fixes #6217	2018-07-31 10:55:34 -07:00
Harshavardhana	157ed65c35	Fix healthcheck handler to check errors in local disks only (#6184) The healthcheck handler in the current implementation was performing ListBuckets() to check for liveness of the Minio service. The ListBuckets() implementation, on the other hand, doesn't do quorum-based listing, and if one of the disks returned an I/O error it would lead to kubernetes taking the minio pod down prematurely, even if the disk is not local to that minio server. The reason is that the ListBuckets() call cannot be trusted to provide the valid information that we need; Minio is a clustered application which is designed to handle disk failures. An error on one of the disks doesn't mean the pod should become fully non-operational. This PR attempts to fix this by only checking for alive disks which are local to each setup, and by simply performing a Stat() operation: if Stat() returned an error on all disks local to a particular server then we can let kubernetes safely take it down; until then we should be operational.	2018-07-23 12:21:25 -07:00
Krishna Srinivas	9ede179a21	Use context.Background() instead of nil. Rename Context[Get|Set] -> [Get|Set]Context	2018-03-15 16:28:25 -07:00
Krishna Srinivas	e452377b24	Add context to the object-interface methods. Make necessary changes to xl, fs, azure, sia	2018-03-15 16:28:25 -07:00
Nitish Tiwari	10b01ac836	Add healthcheck endpoints (#5543) This PR adds readiness and liveness endpoints to probe Minio server instance health. The endpoints are accessed without authentication, and the paths are /minio/health/live and /minio/health/ready for liveness and readiness respectively. The new healthcheck liveness endpoint is now used for the Docker healthcheck. Fixes #5357 Fixes #5514	2018-03-12 11:46:53 +05:30

23 Commits
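
The history above converges on three probe paths: `/minio/health/live` and `/minio/health/ready` answering from local scope, and `/minio/health/cluster` reporting cluster-scope health. The Go sketch below illustrates that split only; it is not MinIO's implementation. The names `newHealthMux`, `hasReadQuorum`, `probe`, and the `diskCounts` callback are hypothetical, and the "at least n/2 disks online" rule is borrowed from the read-quorum wording in #8728 as an assumption about what the cluster handler might check.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Paths as named in the commit messages.
const (
	livePath    = "/minio/health/live"
	readyPath   = "/minio/health/ready"
	clusterPath = "/minio/health/cluster"
)

// hasReadQuorum reports whether at least half of the disks are online.
// Assumption: this mirrors the "at least n/2 disks" wording of #8728.
func hasReadQuorum(online, total int) bool {
	return total > 0 && online*2 >= total
}

// newHealthMux wires the three probes. diskCounts is a hypothetical
// callback returning (online, total) disk counts for the cluster.
func newHealthMux(diskCounts func() (online, total int)) *http.ServeMux {
	mux := http.NewServeMux()

	// Liveness and readiness are local-scope "dummy" checks:
	// if the process can answer HTTP, it reports 200.
	ok := func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}
	mux.HandleFunc(livePath, ok)
	mux.HandleFunc(readyPath, ok)

	// Only the cluster probe looks beyond the local process.
	mux.HandleFunc(clusterPath, func(w http.ResponseWriter, r *http.Request) {
		online, total := diskCounts()
		if !hasReadQuorum(online, total) {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues an in-memory GET against the handler and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// One of four disks online: the pod stays live and ready,
	// while the cluster probe reports the lost quorum.
	mux := newHealthMux(func() (int, int) { return 1, 4 })
	for _, p := range []string{livePath, readyPath, clusterPath} {
		fmt.Println(p, probe(mux, p))
	}
}
```

Keeping readiness local means an orchestrator does not pull a pod out of the network because a remote disk is down, which is the failure mode #6184 and #9941 describe; cluster-wide state is left to the separate cluster path.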