minio

Commit Graph

Author	SHA1	Message	Date
Andreas Auernhammer	8b660e18f2	kms: add support for MinKMS and remove some unused/broken code (#19368 ) This commit adds support for MinKMS. Now, there are three KMS implementations in `internal/kms`: Builtin, MinIO KES and MinIO KMS. Adding another KMS integration required some cleanup. In particular: - Various KMS APIs that haven't been and are not used have been removed. A lot of the code was broken anyway. - Metrics are now monitored by the `kms.KMS` itself. For basic metrics this is simpler than collecting metrics for external servers. In particular, each KES server returns its own metrics and no cluster-level view. - The builtin KMS now uses the same en/decryption implemented by MinKMS and KES. It still supports decryption of the previous ciphertext format. It's backwards compatible. - Data encryption keys now include a master key version since MinKMS supports multiple versions (~4 billion in total and 10000 concurrent) per key name. Signed-off-by: Andreas Auernhammer <github@aead.dev>	2024-05-07 16:55:37 -07:00
Andreas Auernhammer	faeb2b7e79	use `GenerateKey` as more reliable KMS health-check (#19404 ) This commit replaces the `KMS.Stat` API call with a `KMS.GenerateKey` call. This approach is more reliable since data key generation also works when the KMS backend is unavailable (temp. offline), but KES has cached the key. Ref: KES offline caching. With this change, it is less likely that MinIO readiness checks fail in cases where the KMS backend is offline. Signed-off-by: Andreas Auernhammer <github@aead.dev>	2024-04-03 14:13:20 -07:00
Harshavardhana	607cafadbc	converge clusterRead health into cluster health (#19063 )	2024-02-15 16:48:36 -08:00
Harshavardhana	dd2542e96c	add codespell action (#18818 ) Original work here, #18474, refixed and updated.	2024-01-17 23:03:17 -08:00
Andreas Auernhammer	0daa2dbf59	health: split liveness and readiness handler (#18457 ) This commit splits the liveness and readiness handler into two separate handlers. In K8S, a liveness probe is used to determine whether the pod is in "live" state and functioning at all. In contrast, the readiness probe is used to determine whether the pod is ready to serve requests. A failing liveness probe causes pod restarts while a failing readiness probe causes k8s to stop routing traffic to the pod. Hence, a liveness probe should be as robust as possible while a readiness probe should be used for load balancing. Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ Signed-off-by: Andreas Auernhammer <github@aead.dev>	2023-11-16 01:51:27 -08:00
Harshavardhana	114fab4c70	export cluster health as prometheus metrics (#17741 )	2023-07-28 01:16:53 -07:00
Harshavardhana	5569acd95c	disallow EC:0 if not set during server startup (#17141 )	2023-05-04 14:44:30 -07:00
Anis Eleuch	06cd0a636e	Avoid calling KES Status when peers ping each other (#17140 )	2023-05-04 11:28:33 -07:00
Minio Trusted	4bc52897b2	Update yaml files to latest version RELEASE.2023-03-22T06-36-24Z	2023-03-22 21:16:15 +00:00
Harshavardhana	12047702f5	fix: tweak the maintenance=true to satisfy baremetal first (#16864 )	2023-03-21 08:48:38 -07:00
Harshavardhana	ae029191a3	liveness returns "busy" if queued requests > available capacity (#16719 )	2023-02-27 08:34:52 -08:00
Harshavardhana	21885f9457	fix: liveness/readiness must return errors if KMS is unreachable (#16540 )	2023-02-06 08:55:56 -08:00
Harshavardhana	23b329b9df	remove gateway completely (#15929 )	2022-10-24 17:44:15 -07:00
Harshavardhana	6d53e3c2d7	reduce number of middleware handlers (#13546 ) - combine similar looking functionalities into single handlers, and remove unnecessary proxying of the requests at handler layer. - remove bucket forwarding handler as part of default setup add it only if bucket federation is enabled. Improvements observed for 1kiB object reads. ``` ------------------- Operation: GET Operations: 4538555 -> 4595804 * Average: +1.26% (+0.2 MiB/s) throughput, +1.26% (+190.2) obj/s * Fastest: +4.67% (+0.7 MiB/s) throughput, +4.67% (+739.8) obj/s * 50% Median: +1.15% (+0.2 MiB/s) throughput, +1.15% (+173.9) obj/s ```	2021-11-01 08:04:03 -07:00
Harshavardhana	3837d2b94b	simplify credentials handling in S3 gateway (#13373 ) change credentials handling to prefer MINIO_* envs first if they work, and otherwise fall back to AWS credentials. If those fail, we fail to start anyway.	2021-10-07 15:34:01 -07:00
Harshavardhana	1250312287	fail ready/liveness if etcd is unhealthy in gateway mode (#13146 )	2021-09-03 17:05:41 -07:00
Harshavardhana	a2cd3c9a1d	use ParseForm() to allow query param lookups once (#12900 ) ``` cpu: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz BenchmarkURLQueryForm BenchmarkURLQueryForm-4 247099363 4.809 ns/op 0 B/op 0 allocs/op BenchmarkURLQuery BenchmarkURLQuery-4 2517624 462.1 ns/op 432 B/op 4 allocs/op PASS ok github.com/minio/minio/cmd 3.848s ```	2021-08-07 22:43:01 -07:00
Harshavardhana	1f262daf6f	rename all remaining packages to internal/ (#12418 ) This is to ensure that there are no projects that try to import `minio/minio/pkg` into their own repo. Any such common packages should go to `https://github.com/minio/pkg`	2021-06-01 14:59:40 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Krishna Srinivas	876b79b8d8	read-health check endpoint returns success if cluster can serve read requests (#11310 )	2021-02-09 01:00:44 -08:00
Harshavardhana	97692bc772	re-route requests if IAM is not initialized (#10850 )	2020-11-07 21:03:06 -08:00
Harshavardhana	02cfa774be	allow requests to be proxied when server is booting up (#10790 ) While the server is booting up, users might see '503' because the object layer is not yet initialized; in that case the request is proxied to the first neighboring peer that is online.	2020-10-30 12:20:28 -07:00
Harshavardhana	8b74a72b21	fix: rename READY deadline to CLUSTER deadline ENV (#10535 )	2020-09-23 09:14:33 -07:00
Harshavardhana	b0e1d4ce78	re-attach offline drive after new drive replacement (#10416 ) Drive healing was inconsistent when one of the drives was offline while a new drive was being replaced; this change ensures that we can add the offline drive back into the mix by healing it again.	2020-09-04 17:09:02 -07:00
Harshavardhana	8a291e1dc0	Cluster healthcheck improvements (#10408 ) - do not fail the healthcheck if heal status was not obtained from one of the nodes, if many nodes fail then report this as a catastrophic error. - add "x-minio-write-quorum" value to match the write tolerance supported by server. - admin info now states if a drive is healing where madmin.Disk.Healing is set to true and madmin.Disk.State is "ok"	2020-09-02 22:54:56 -07:00
Daniel Valdivia	7d1734d033	indicate through HTTP header cluster healing in progress (#10342 )	2020-08-24 15:20:50 -07:00
Harshavardhana	fe157166ca	fix: Pass context all the way down to the network call in lockers (#10161 ) Context timeout might race on each other when timeouts are lower i.e when two lock attempts happened very quickly on the same resource and the servers were yet trying to establish quorum. This situation can lead to locks held which wouldn't be unlocked and subsequent lock attempts would fail. This would require a complete server restart. A potential of this issue happening is when server is booting up and we are trying to hold a 'transaction.lock' in quick bursts of timeout.	2020-07-29 23:15:34 -07:00
Harshavardhana	ec06089eda	fix: re-implement cluster healthcheck (#10101 )	2020-07-20 18:31:22 -07:00
Harshavardhana	c0ac25bfff	fix: readiness needs to be like liveness (#9941 ) Readiness has no reason to be cluster-scoped because that is not how k8s networking works for pods: the pods of a deployment do not share the network as a singleton. Instead they run with scopes local to themselves, and on readiness failures the pod is potentially taken out of the network and is no longer resolvable, which affects the distributed setup in a myriad of ways. Instead readiness should behave like liveness, with local scope alone, and should be a dummy implementation. With this PR, individual startup times and overall k8s startup time improve dramatically. Added another handler, `/minio/health/cluster`, to report cluster-scope health.	2020-06-30 11:28:27 -07:00
Harshavardhana	4790868878	allow background IAM load to speed up startup (#9796 ) Also fix healthcheck handler to run success only if object layer has initialized fully for S3 API access call.	2020-06-09 19:19:03 -07:00
Harshavardhana	5e529a1c96	simplify context timeout for readiness (#9772 ) additionally also add CORS support to restrict for specific origin, adds a new config and updated the documentation as well	2020-06-04 14:58:34 -07:00
Krishna Srinivas	7d19ab9f62	readiness returns error quickly if any of the set is down (#9662 ) This PR adds a new configuration parameter which allows readiness check to respond within 10secs, this can be reduced to a lower value if necessary using ``` mc admin config set api ready_deadline=5s ``` or ``` export MINIO_API_READY_DEADLINE=5s ```	2020-05-23 17:38:39 -07:00
Anis Elleuch	d4dcf1d722	metrics: Use StorageInfo() instead to have consistent info (#9006 ) Metrics used to have its own code to calculate offline disks. StorageInfo() was avoided because it is an expensive operation by sending calls to all nodes. To make metrics & server info share the same code, a new argument `local` is added to StorageInfo() so it will only query local disks when needed. Metrics now calls StorageInfo() as server info handler does but with the local flag set to false. Co-authored-by: Praveen raj Mani <praveen@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-02-20 09:21:33 +05:30
Harshavardhana	0879a4f743	rest/storage: Remove racy LastError usage (#8817 ) instead perform a liveness check call to verify if server is online and print relevant errors. Also introduce a StorageErr string error type instead of errors.New() deprecate usage of VerifyFileError, DeleteFileError for gob, change in datastructure also requires bump in storage REST version to v13. Fixes #8811	2020-01-14 18:45:17 -08:00
Praveen raj Mani	157721f694	Fix readiness to return 200 for read-only mode (#8728 ) - We should declare a cluster ready even if only read quorum is achieved (at least n/2 disks are online). - All the zones should have enough read quorum, thus making the cluster ready for reads.	2020-01-02 05:05:01 -08:00
Praveen raj Mani	5d09233115	Fix Readiness check (#8681 ) - Remove goroutine-check in Readiness check - Bring in quorum check for readiness Fixes #8385 Co-authored-by: Harshavardhana <harsha@minio.io>	2019-12-28 22:24:43 +05:30
Harshavardhana	347b29d059	Implement bucket expansion (#8509 )	2019-11-19 17:42:27 -08:00
Harshavardhana	822eb5ddc7	Bring in safe mode support (#8478 ) This PR refactors object layer handling such that upon failure in sub-system initialization server reaches a stage of safe-mode operation wherein only certain API operations are enabled and available. This allows for fixing many scenarios such as - incorrect configuration in vault, etcd, notification targets - missing files, incomplete config migrations unable to read encrypted content etc - any other issues related to notification, policies, lifecycle etc	2019-11-09 09:27:23 -08:00
Harshavardhana	07a556a10b	Avoid ListBuckets() call instead rely on simple HTTP GET (#8475 ) This is to avoid making calls to backend and requiring gateways to allow permissions for ListBuckets() operation just for Liveness checks, we can avoid this and make our liveness checks to be more performant.	2019-11-01 16:58:10 -07:00
Harshavardhana	9e7a3e6adc	Extend further validation of config values (#8469 ) - This PR allows config KVS to be validated properly without being affected by ENV overrides, rejects invalid values during set operation - Expands unit tests and refactors the error handling for notification targets, returns error instead of ignoring targets for invalid KVS - Does all the prep-work for implementing safe-mode style operation for MinIO server, introduces a new global variable to toggle safe mode based operations NOTE: this PR itself doesn't provide safe mode operations	2019-10-30 23:39:09 -07:00
Nitish Tiwari	496fba3e9a	Return 200 OK for liveness checks while distributed cluster starts (#8176 ) With this PR, liveness check responds with 200 OK with "server-not-initialized" header while objectLayer gets initialized. The header is removed as objectLayer is initialized. This is to allow MinIO distributed cluster to get started when running on orchestration platforms like Docker Swarm. This PR also updates sample Swarm yaml files to use correct values for healthcheck fields. Fixes #8140	2019-09-05 14:50:56 +05:30
Harshavardhana	5a28ef0d47	Bump readiness check upto 10000 go-routines (#8057 ) Most of our current workloads reach this value regularly, it doesn't make sense to keep 1000 go-routine limit.	2019-08-10 18:13:14 +05:30
Anis Elleuch	e857b6741d	Add one log in health checker liveness code (#7861 )	2019-07-06 16:38:39 -07:00
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	2019-04-09 11:39:42 -07:00
Krishna Srinivas	267f183fc8	Do not do StorageInfo() and ListBuckets() for FS/Erasure in health check handler (#7090 ) Health checking programs very frequently use /minio/health/live to check health, hence we can avoid doing StorageInfo() and ListBuckets() for FS/Erasure backend.	2019-01-20 10:28:36 +05:30
Harshavardhana	166e998788	Fix healthcheck for NAS gateway (#6452 ) In gateway mode we are expected not to know the backend type, but the NAS gateway is an extension of FS mode (standalone). This led to an issue in LivenessCheckHandler(), which would perpetually return 503, affecting all kubernetes and openshift deployments of NAS gateway.	2018-09-11 13:44:10 -07:00
Nitish Tiwari	197af49c99	Fix healthcheck handler to verify gateway backend liveness (#6218 ) Fixes #6217	2018-07-31 10:55:34 -07:00
Harshavardhana	157ed65c35	Fix healthcheck handler to check errors in local disks only (#6184 ) Healthcheck handler in current implementation was performing ListBuckets() to check for liveness of Minio service. ListBuckets() implementation on the other hand doesn't do quorum based listing, and if one of the disks returned an error such as an I/O error, it would lead to kubernetes taking the minio pod down prematurely even if the disk is not local to that minio server. The reason is that the ListBuckets() call cannot be trusted to provide the valid information that we need. Minio is a clustered application which is designed to handle disk failures; an error on one of the disks doesn't mean the pod should become fully non-operational. This PR attempts to fix this by only checking for alive disks which are local to each setup and also by simply performing a Stat() operation. If Stat() returns an error on all disks local to a particular server then we can let kubernetes safely take it down; until then we should be operational.	2018-07-23 12:21:25 -07:00
Krishna Srinivas	9ede179a21	Use context.Background() instead of nil Rename Context[Get\|Set] -> [Get\|Set]Context	2018-03-15 16:28:25 -07:00
Krishna Srinivas	e452377b24	Add context to the object-interface methods. Make necessary changes to xl fs azure sia	2018-03-15 16:28:25 -07:00

51 Commits
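
Several commits above (for example 0daa2dbf59 and c0ac25bfff) turn on the difference between a liveness probe and a readiness probe. The Go sketch below illustrates that split under stated assumptions and is not MinIO's implementation: `nodeState`, `livenessStatus`, `readinessStatus`, `newMux`, and `probe` are hypothetical names, the two boolean checks stand in for whatever the real handlers inspect, and the `/minio/health/ready` path is assumed by analogy with the `/minio/health/live` path named in the log.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// nodeState is a hypothetical snapshot of what a local probe can observe.
type nodeState struct {
	objectLayerReady bool // storage layer finished initializing
	kmsReachable     bool // a dependency check (e.g. a KMS call) succeeded
}

// livenessStatus is deliberately robust: if the process can answer at all,
// it reports 200, so an orchestrator does not restart a merely degraded pod.
func livenessStatus(_ nodeState) int {
	return http.StatusOK
}

// readinessStatus reports 503 while the node should not receive traffic.
func readinessStatus(s nodeState) int {
	if !s.objectLayerReady || !s.kmsReachable {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// newMux wires the two probes to separate handlers.
func newMux(state func() nodeState) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/minio/health/live", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(livenessStatus(state()))
	})
	// Assumed path, mirroring the liveness endpoint.
	mux.HandleFunc("/minio/health/ready", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(readinessStatus(state()))
	})
	return mux
}

// probe issues an in-memory GET and returns the HTTP status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// The node is up, but its KMS dependency is unreachable.
	s := nodeState{objectLayerReady: true, kmsReachable: false}
	mux := newMux(func() nodeState { return s })
	fmt.Println("live:", probe(mux, "/minio/health/live"))
	fmt.Println("ready:", probe(mux, "/minio/health/ready"))
}
```

With the KMS check failing, the liveness probe still succeeds while the readiness probe does not, so an orchestrator would stop routing traffic to the pod without restarting it.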