minio

Commit Graph

Author	SHA1	Message	Date
Harshavardhana	64f6020854	fix: cleanup locking, cancel context upon lock timeout (#12183 ) upon errors to acquire lock context would still leak, since the cancel would never be called. since the lock is never acquired - proactively clear it before returning.	2021-04-29 20:55:21 -07:00
Anis Elleuch	9e797532dc	lock: Always cancel the returned Get(R)Lock context (#12162 ) * lock: Always cancel the returned Get(R)Lock context There is a leak with cancel created inside the locking mechanism. The cancel purpose was to cancel operations such as erasure get/put that are holding non-refreshable locks. This PR will ensure the created context.Cancel is passed to the unlock API so it will cleanup and avoid leaks. * locks: Avoid returning nil cancel in local lockers Since there is no Refresh mechanism in the local locking mechanism, we do not generate a new context or cancel. Currently, a nil cancel function is returned but this can cause a crash. Return a dummy function instead.	2021-04-27 16:12:50 -07:00
Krishnan Parthasarathi	c829e3a13b	Support for remote tier management (#12090 ) With this change, MinIO's ILM supports transitioning objects to a remote tier. This change includes support for Azure Blob Storage, AWS S3 compatible object storage incl. MinIO and Google Cloud Storage as remote tier storage backends. Some new additions include: - Admin APIs remote tier configuration management - Simple journal to track remote objects to be 'collected' This is used by object API handlers which 'mutate' object versions by overwriting/replacing content (Put/CopyObject) or removing the version itself (e.g DeleteObjectVersion). - Rework of previous ILM transition to fit the new model In the new model, a storage class (a.k.a remote tier) is defined by the 'remote' object storage type (one of s3, azure, GCS), bucket name and a prefix. * Fixed bugs, review comments, and more unit-tests - Leverage inline small object feature - Migrate legacy objects to the latest object format before transitioning - Fix restore to particular version if specified - Extend SharedDataDirCount to handle transitioned and restored objects - Restore-object should accept version-id for version-suspended bucket (#12091) - Check if remote tier creds have sufficient permissions - Bonus minor fixes to existing error messages Co-authored-by: Poorna Krishnamoorthy <poorna@minio.io> Co-authored-by: Krishna Srinivas <krishna@minio.io> Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Harshavardhana	069432566f	update license change for MinIO Signed-off-by: Harshavardhana <harsha@minio.io>	2021-04-23 11:58:53 -07:00
Andreas Auernhammer	3455f786fa	kms: encrypt IAM/config data with the KMS (#12041 ) This commit changes the config/IAM encryption process. Instead of encrypting config data (users, policies etc.) with the root credentials MinIO now encrypts this data with a KMS - if configured. Therefore, this PR moves the MinIO-KMS configuration (via env. variables) to a "top-level" configuration. The KMS configuration cannot be stored in the config file since it is used to decrypt the config file in the first place. As a consequence, this commit also removes support for Hashicorp Vault - which has been deprecated anyway. Signed-off-by: Andreas Auernhammer <aead@mail.de>	2021-04-22 09:51:09 -07:00
Andreas Auernhammer	885c170a64	introduce new package pkg/kms (#12019 ) This commit introduces a new package `pkg/kms`. It contains basic types and functions to interact with various KMS implementations. This commit also moves KMS-related code from `cmd/crypto` to `pkg/kms`. Now, it is possible to implement a KMS-based config data encryption in the `pkg/config` package.	2021-04-15 08:47:33 -07:00
Harshavardhana	641e564b65	fips build tag uses relevant binary link for updates (#12014 ) This code is necessary for `mc admin update` command to work with fips compiled binaries, with fips tags the releaseInfo will automatically point to fips specific binaries.	2021-04-08 09:51:11 -07:00
Klaus Post	48c5e7e5b6	Add runtime mem stats to server info (#11995 ) Adds information about runtime+gc memory use.	2021-04-07 10:40:51 -07:00
Harshavardhana	abb55bd49e	fix: properly close leaking bandwidth monitor channel (#11967 ) This PR fixes - close leaking bandwidth report channel leakage - remove the closer requirement for bandwidth monitor instead if Read() fails remember the error and return error for all subsequent reads. - use locking for usage-cache.bin updates, with inline data we cannot afford to have concurrent writes to usage-cache.bin corrupting xl.meta	2021-04-05 16:07:53 -07:00
Ritesh H Shukla	3ddd8b04d1	fix: handle unsupported APIs more granularly (#11674 )	2021-03-30 23:19:36 -07:00
Anis Elleuch	d8b5adfd10	trace: Add storage & OS tracing (#11889 )	2021-03-26 23:24:07 -07:00
Anis Elleuch	2c296652f7	Simplify access to local node name (#11907 ) The local node name is heavily used in tracing, create a new global variable to store it. Multiple goroutines can access it since it won't be changed later.	2021-03-26 11:37:58 -07:00
Klaus Post	749e9c5771	metrics: Add canceled requests (#11881 ) Add metric for canceled requests	2021-03-24 10:25:27 -07:00
Anis Elleuch	0eb146e1b2	add additional metrics per disk API latency, API call counts (#11250 ) ``` mc admin info --json ``` provides these details for now; we shall eventually expose this at the Prometheus level. Co-authored-by: Harshavardhana <harsha@minio.io>	2021-03-16 20:06:57 -07:00
Klaus Post	fa9cf1251b	Improve healing and reporting (#11312 ) * Provide information on actively healing, buckets healed/queued, objects healed/failed. * Add concurrent healing of multiple sets (typically on startup). * Add bucket level resume, so restarts will only heal non-healed buckets. * Print summary after healing a disk is done.	2021-03-04 14:36:23 -08:00
Anis Elleuch	7be7109471	locking: Add Refresh for better locking cleanup (#11535 ) Co-authored-by: Anis Elleuch <anis@min.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2021-03-03 18:36:43 -08:00
Harshavardhana	c6a120df0e	fix: Prometheus metrics to re-use storage disks (#11647 ) also re-use storage disks for all `mc admin server info` calls as well, implement a new LocalStorageInfo() API call at ObjectLayer to lookup local disks storageInfo also fixes bugs where there were double calls to StorageInfo()	2021-03-02 17:28:04 -08:00
Shireesh Anjal	289b22d911	fix: pool number not added for one server (#11670 ) The previous code was iterating over replies from peers and assigning pool numbers to them, thus missing to add it for the local server. Fixed by iterating over the server properties of all the servers including the local one.	2021-03-01 08:09:43 -08:00
Anis Elleuch	98d3f94996	metrics: Add the number of requests in the waiting queue (#11580 ) We can use this metric to check if there are too many S3 clients in the queue and could explain why some of those S3 clients are timing out. ``` minio_s3_requests_waiting_total{server="127.0.0.1:9000"} 9981 ``` If max_requests is 10000 then there is a strong possibility that clients are timing out because of the queue deadline.	2021-02-20 00:21:55 -08:00
Shireesh Anjal	3afa499885	fix: empty buckets/objects nodes in new setup (#11493 )	2021-02-09 09:52:38 -08:00
Andreas Auernhammer	33554651e9	crypto: deprecate native Hashicorp Vault support (#11352 ) This commit deprecates the native Hashicorp Vault support and removes the legacy Vault documentation. The native Hashicorp Vault documentation is marked as outdated and deprecated for over a year now. We give another 6 months before we start removing Hashicorp Vault support and show a deprecation warning when a MinIO server starts with a native Vault configuration.	2021-01-29 17:55:37 -08:00
Anis Elleuch	00cff1aac5	audit: per object send pool number, set number and servers per operation (#11233 )	2021-01-26 13:21:51 -08:00
Harshavardhana	9cdd981ce7	fix: expire locks only on participating lockers (#11335 ) additionally also add a new ForceUnlock API, to allow forcibly unlocking locks if possible.	2021-01-25 10:01:27 -08:00
Harshavardhana	a6c146bd00	validate storage class across pools when setting config (#11320 ) ``` mc admin config set alias/ storage_class standard=EC:3 ``` should only succeed if parity ratio is valid for all server pools, if not we should fail proactively. This PR also needs to bring other changes now that we need to cater for variadic drive counts per pool. Bonus fixes also various bugs reproduced with - GetObjectWithPartNumber() - CopyObjectPartWithOffsets() - CopyObjectWithMetadata() - PutObjectPart,PutObject with truncated streams	2021-01-22 12:09:24 -08:00
Anis Elleuch	6f781c5e7a	heal: Reduce whitespace ticker to 5 seconds (#11234 ) 30 seconds white spaces is long for some setups which time out when no read activity in short time, reduce the subnet health white space ticker to 5 seconds, since it has no cost at all.	2021-01-06 13:29:50 -08:00
Harshavardhana	e7ae49f9c9	fix: calculate prometheus disks_offline/disks_total correctly (#11215 ) fixes #11196	2021-01-04 09:42:09 -08:00
Anis Elleuch	2ecaab55a6	admin: ServerInfo returns info without object layer initialized (#11142 )	2020-12-21 09:35:19 -08:00
Anis Elleuch	e63a10e505	Profiling does not require object layer to be initialized (#11133 )	2020-12-18 11:51:15 -08:00
Harshavardhana	4550ac6fff	fix: refactor locks to apply them uniquely per node (#11052 ) This refactor is done for few reasons below - to avoid deadlocks in scenarios when number of nodes are smaller < actual erasure stripe count where in N participating local lockers can lead to deadlocks across systems. - avoids expiry routines to run 1000 of separate network operations and routes per disk where as each of them are still accessing one single local entity. - it is ideal to have a single globalLockServer per instance. - In a 32node deployment however, each server group is still concentrated towards the same set of lockers that participate during the write/read phase, unlike previous minio/dsync implementation - this potentially avoids sending 32 requests instead we will still send at max requests of unique nodes participating in a write/read phase. - reduces overall chattiness on smaller setups.	2020-12-10 07:28:37 -08:00
Klaus Post	a896125490	Add crawler delay config + dynamic config values (#11018 )	2020-12-04 09:32:35 -08:00
Ritesh H Shukla	7e2b79984e	Stream bucket bandwidth measurements (#11014 )	2020-12-03 11:34:42 -08:00
Harshavardhana	86409fa93d	add audit/admin trace support for browser requests (#10947 ) To support this functionality we had to fork the gorilla/rpc package with relevant changes	2020-11-20 22:52:17 -08:00
Shireesh Anjal	7bc47a14cc	Rename OBD to Health (#10842 ) Also, Remove thread stats and openfds from the health report as we already have process stats and numfds	2020-11-20 12:52:53 -08:00
Harshavardhana	0bcb1b679d	fix: disallow update if dates are same (#10890 ) fixes #10889	2020-11-12 14:18:59 -08:00
Harshavardhana	cbdab62c1e	fix: heal user/metadata right away upon server startup (#10863 ) this is needed such that we make sure to heal the users, policies and bucket metadata right away as we do listing based on list cache which only lists '3' sufficiently good drives, to avoid possibly losing access to these users upon upgrade make sure to heal them.	2020-11-10 09:02:06 -08:00
Klaus Post	2294e53a0b	Don't retain context in locker (#10515 ) Use the context for internal timeouts, but disconnect it from outgoing calls so we always receive the results and cancel it remotely.	2020-11-04 08:25:42 -08:00
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648 ) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: * Cross-server caching and listing will use the same data across servers and requests. * Lists can be arbitrarily resumed at a constant speed. * Metadata for all files scanned is stored for streaming retrieval. * The existing bloom filters controlled by the crawler is used for validating caches. * Concurrent requests for the same data (or parts of it) will not spawn additional walkers. * Listing a subdirectory of an existing recursive cache will use the cache. * All listing operations are fully streamable so the number of objects in a bucket no longer dictates the amount of memory. * Listings can be handled by any server within the cluster. * Caches are cleaned up when out of date or superseded by a more recent one.	2020-10-28 09:18:35 -07:00
Shireesh Anjal	858e2a43df	Remove logging info from OBDInfoHandler (#10727 ) A lot of logging data is counterproductive. A better implementation with precise useful log data can be introduced later.	2020-10-27 17:41:48 -07:00
Harshavardhana	646d6917ed	turn-off checking for updates completely if MINIO_UPDATE=off (#10752 )	2020-10-24 22:39:44 -07:00
Harshavardhana	d9db7f3308	expire lockers if lockers are offline (#10749 ) lockers currently might leave stale lockers, in unknown ways waiting for downed lockers. locker check interval is high enough to safely cleanup stale locks.	2020-10-24 13:23:16 -07:00
Ritesh H Shukla	8ceb2a93fd	fix: peer replication bandwidth monitoring in distributed setup (#10652 )	2020-10-12 09:04:55 -07:00
Ritesh H Shukla	c2f16ee846	Add basic bandwidth monitoring for replication. (#10501 ) This change tracks bandwidth for a bucket and object - [x] Add Admin API - [x] Add Peer API - [x] Add BW throttling - [x] Admin APIs to set replication limit - [x] Admin APIs for fetch bandwidth	2020-10-09 20:36:00 -07:00
Harshavardhana	a0d0645128	remove safeMode behavior in startup (#10645 ) In almost all scenarios MinIO now is mostly ready for all sub-systems independently, safe-mode is not useful anymore and do not serve its original intended purpose. allow server to be fully functional even with config partially configured, this is to cater for availability of actual I/O v/s manually fixing the server. In k8s like environments it will never make sense to take pod into safe-mode state, because there is no real access to perform any remote operation on them.	2020-10-09 09:59:52 -07:00
Harshavardhana	736e58dd68	fix: handle concurrent lockers with multiple optimizations (#10640 ) - select lockers which are non-local and online to have affinity towards remote servers for lock contention - optimize lock retry interval to avoid sending too many messages during lock contention, reduces average CPU usage as well - if bucket is not set, when deleteObject fails make sure setPutObjHeaders() honors lifecycle only if bucket name is set. - fix top locks to list out always the oldest lockers always, avoid getting bogged down into map's unordered nature.	2020-10-08 12:32:32 -07:00
Harshavardhana	c6a9a94f94	fix: optimize ServerInfo() handler to avoid reading config (#10626 ) fixes #10620	2020-10-02 16:19:44 -07:00
Harshavardhana	66174692a2	add '.healing.bin' for tracking currently healing disk (#10573 ) add a hint on the disk to allow for tracking fresh disk being healed, to allow for restartable heals, and also use this as a way to track and remove disks. There are more pending changes where we should move all the disk formatting logic to backend drives, this PR doesn't deal with this refactor instead makes it easier to track healing in the future.	2020-09-28 19:39:32 -07:00
Harshavardhana	eafa775952	fix: add lock ownership to expire locks (#10571 ) - Add owner information for expiry, locking, unlocking a resource - TopLocks returns now locks in quorum by default, provides a way to capture stale locks as well with `?stale=true` - Simplify the quorum handling for locks to avoid from storage class, because there were challenges to make it consistent across all situations. - And other tiny simplifications to reset locks.	2020-09-25 19:21:52 -07:00
飞雪无情	d778d034e7	Remove redundant mgmtQueryKey type. (#10557 ) Remove redundant type conversion.	2020-09-24 08:40:21 -07:00
Anis Elleuch	8ea55f9dba	obd: Add console log to OBD output (#10372 )	2020-09-15 18:02:54 -07:00
Harshavardhana	c13afd56e8	Remove MaxConnsPerHost settings to avoid potential hangs (#10438 ) MaxConnsPerHost can potentially hang a call without any way to timeout, we do not need this setting for our proxy and gateway implementations instead IdleConn settings are good enough. Also ensure to use NewRequestWithContext and make sure to take the disks offline only for network errors. Fixes #10304	2020-09-08 14:22:04 -07:00
Andreas Auernhammer	fbd1c5f51a	certs: refactor cert manager to support multiple certificates (#10207 ) This commit refactors the certificate management implementation in the `certs` package such that multiple certificates can be specified at the same time. Therefore, the following layout of the `certs/` directory is expected: ``` certs/ │ ├─ public.crt ├─ private.key ├─ CAs/ // CAs directory is ignored │ │ │ ... │ ├─ example.com/ │ │ │ ├─ public.crt │ └─ private.key └─ foobar.org/ │ ├─ public.crt └─ private.key ... ``` However, directory names like `example.com` are just for human readability/organization and don't have any meaning w.r.t whether a particular certificate is served or not. This decision is made based on the SNI sent by the client and the SAN of the certificate. *** The `Manager` will pick a certificate based on the client trying to establish a TLS connection. In particular, it looks at the client hello (i.e. SNI) to determine which host the client tries to access. If the manager can find a certificate that matches the SNI it returns this certificate to the client. However, the client may choose to not send an SNI or tries to access a server directly via IP (`https://<ip>:<port>`). In this case, we cannot use the SNI to determine which certificate to serve. However, we also should not pick "the first" certificate that would be accepted by the client (based on crypto. parameters - like a signature algorithm) because it may be an internal certificate that contains internal hostnames. We would disclose internal infrastructure details doing so. Therefore, the `Manager` returns the "default" certificate when the client does not specify an SNI. The default certificate the top-level `public.crt` - i.e. `certs/public.crt`. This approach has some consequences: - It's the operator's responsibility to ensure that the top-level `public.crt` does not disclose any information (i.e. hostnames) that are not publicly visible. 
However, this was the case in the past already. - Any other `public.crt` - except for the top-level one - must not contain any IP SAN. The reason for this restriction is that the Manager cannot match a SNI to an IP b/c the SNI is the server host name. The entire purpose of SNI is to indicate which host the client tries to connect to when multiple hosts run on the same IP. So, a client will not set the SNI to an IP. If we would allow IP SANs in a lower-level `public.crt` a user would expect that it is possible to connect to MinIO directly via IP address and that the MinIO server would pick "the right" certificate. However, the MinIO server cannot determine which certificate to serve, and therefore always picks the "default" one. This may lead to all sorts of confusing errors like: "It works if I use `https://instance.minio.local` but not when I use `https://10.0.2.1`". These consequences/limitations should be pointed out / explained in our docs in an appropriate way. However, the support for multiple certificates should not have any impact on how deployments with a single certificate function today. Co-authored-by: Harshavardhana <harsha@minio.io>	2020-09-03 23:33:37 -07:00
Harshavardhana	8a291e1dc0	Cluster healthcheck improvements (#10408 ) - do not fail the healthcheck if heal status was not obtained from one of the nodes, if many nodes fail then report this as a catastrophic error. - add "x-minio-write-quorum" value to match the write tolerance supported by server. - admin info now states if a drive is healing where madmin.Disk.Healing is set to true and madmin.Disk.State is "ok"	2020-09-02 22:54:56 -07:00
Harshavardhana	2acb530ccd	update rulesguard with new rules (#10392 ) Co-authored-by: Nitish Tiwari <nitish@minio.io> Co-authored-by: Praveen raj Mani <praveen@minio.io>	2020-09-01 16:58:13 -07:00
Andreas Auernhammer	18725679c4	crypto: allow multiple KES endpoints (#10383 ) This commit addresses a maintenance / automation problem when MinIO-KES is deployed on bare-metal. In orchestrated env. the orchestrator (K8S) will make sure that `n` KES servers (IPs) are available via the same DNS name. There it is sufficient to provide just one endpoint.	2020-08-31 18:10:52 -07:00
Klaus Post	1b119557c2	getDisksInfo: Attribute failed disks to correct endpoint (#10360 ) If DiskInfo calls failed the information returned was used anyway resulting in no endpoint being set. This would make the drive be attributed to the local system since `disk.Endpoint == disk.DrivePath` in that case. Instead, if the call fails record the endpoint and the error only.	2020-08-26 10:11:26 -07:00
Harshavardhana	e57c742674	use single dynamic timeout for most locked API/heal ops (#10275 ) newDynamicTimeout should be allocated once, in-case of temporary locks in config and IAM we should have allocated timeout once before the `for loop` This PR doesn't fix any issue as such, but provides enough dynamism for the timeout as per expectation.	2020-08-17 11:29:58 -07:00
Harshavardhana	2a9819aff8	fix: refactor background heal for cluster health (#10225 )	2020-08-07 19:43:06 -07:00
Harshavardhana	6c6137b2e7	add cluster maintenance healthcheck drive heal affinity (#10218 )	2020-08-07 13:22:53 -07:00
Harshavardhana	0b8255529a	fix: proxies set keep-alive timeouts to be system dependent (#10199 ) Split the DialContext's one for internode and another for all other external communications especially proxy forwarders, gateway transport etc.	2020-08-04 14:55:53 -07:00
Harshavardhana	25a55bae6f	fix: avoid buffering of server sent events by proxies (#10164 )	2020-07-30 19:45:12 -07:00
Harshavardhana	3a73f1ead5	refactor server update behavior (#10107 )	2020-07-23 08:03:31 -07:00
Harshavardhana	e7d7d5232c	fix: admin info output and improve overall performance (#10015 ) - admin info node offline check is now quicker - admin info now doesn't duplicate the code across doing the same checks for disks - rely on StorageInfo to return appropriate errors instead of calling locally. - diskID checks now return proper errors when disk not found v/s format.json missing. - add more disk states for more clarity on the underlying disk errors.	2020-07-13 09:51:07 -07:00
Andreas Auernhammer	a317a2531c	admin: new API for creating KMS master keys (#9982 ) This commit adds a new admin API for creating master keys. An admin client can send a POST request to: ``` /minio/admin/v3/kms/key/create?key-id=<keyID> ``` The name / ID of the new key is specified as request query parameter `key-id=<ID>`. Creating new master keys requires KES - it does not work with the native Vault KMS (deprecated) nor with a static master key (deprecated). Further, this commit removes the `UpdateKey` method from the `KMS` interface. This method is not needed and not used anymore.	2020-07-08 18:50:43 -07:00
Harshavardhana	2743d4ca87	fix: Add support for preserving mtime for replication (#9995 ) This PR is needed for bucket replication support	2020-07-08 17:36:56 -07:00
Harshavardhana	cdb0e6ffed	support proper values for listMultipartUploads/listParts (#9970 ) object KMS is configured with auto-encryption, there were issues when using docker registry - this has been left unnoticed for a while. This PR fixes an issue with compatibility. Additionally also fix the continuation-token implementation infinite loop issue which was missed as part of #9939 Also fix the heal token to be generated as a client facing value instead of what is remembered by the server, this allows for the server to be stateless regarding the token's behavior.	2020-07-03 19:27:13 -07:00
Anis Elleuch	2be20588bf	Reroute requests based token heal/listing (#9939 ) When manual healing is triggered, one node in a cluster will become the authority to heal. mc regularly sends new requests to fetch the status of the ongoing healing process, but a load balancer could land the healing request to a node that is not doing the healing request. This PR will redirect a request to the node based on the node index found described as part of the client token. A similar technique is also used to proxy ListObjectsV2 requests by encoding this information in continuation-token	2020-07-03 11:53:03 -07:00
Harshavardhana	a38ce29137	fix: simplify background heal and trigger heal items early (#9928 ) Bonus fix during versioning merge one of the PR was missing the offline/online disk count fix from #9801 port it correctly over to the master branch from release. Additionally, add versionID support for MRF Fixes #9910 Fixes #9931	2020-06-29 13:07:26 -07:00
Harshavardhana	b8cb21c954	allow more than N number of locks in TopLocks (#9883 )	2020-06-20 06:33:01 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377 ) - Implement a new xl.json 2.0.0 format to support, this moves the entire marshaling logic to POSIX layer, top layer always consumes a common FileInfo construct which simplifies the metadata reads. - Implement list object versions - Migrate to siphash from crchash for new deployments for object placements. Fixes #2111	2020-06-12 20:04:01 -07:00
Harshavardhana	96ed0991b5	fix: optimize IAM users load, add fallback (#9809 ) Bonus fix, load service accounts properly when service accounts were generated with LDAP	2020-06-11 14:11:30 -07:00
Harshavardhana	b2db8123ec	Preserve errors returned by diskInfo to detect disk errors (#9727 ) This PR basically reverts #9720 and re-implements it differently	2020-05-28 13:03:04 -07:00
Harshavardhana	53aaa5d2a5	Export bucket usage counts as part of bucket metrics (#9710 ) Bonus fixes in quota enforcement to use the new datastructure and use timedValue to cache a value/reload automatically avoids one less global variable.	2020-05-27 06:45:43 -07:00
Sidhartha Mani	c121d27f31	progressively report obd results (#9639 )	2020-05-22 17:56:45 -07:00
Harshavardhana	bd032d13ff	migrate all bucket metadata into a single file (#9586 ) this is a major overhaul by migrating off all bucket metadata related configs into a single object '.metadata.bin' this allows us for faster bootups across 1000's of buckets and as well as keeps the code simple enough for future work and additions. Additionally also fixes #9396, #9394	2020-05-19 13:53:54 -07:00
Harshavardhana	1bc32215b9	enable full linter across the codebase (#9620 ) enable linter using golangci-lint across codebase to run a bunch of linters together, we shall enable new linters as we fix more things in the codebase. This PR fixes the first stage of this cleanup.	2020-05-18 09:59:45 -07:00
Harshavardhana	814ddc0923	add missing admin actions, enhance AccountUsageInfo (#9607 )	2020-05-15 18:16:45 -07:00
Harshavardhana	6ac48a65cb	fix: use unused cacheMetrics code in prometheus (#9588 ) remove all other unused/dead code	2020-05-13 08:15:26 -07:00
Harshavardhana	337c2a7cb4	add audit logging for all admin calls (#9568 ) - add ServiceRestart/ServiceStop actions - audit log appropriately in all admin handlers fixes #9522	2020-05-11 10:34:08 -07:00
Harshavardhana	27d716c663	simplify usage of mutexes and atomic constants (#9501 )	2020-05-03 22:35:40 -07:00
Harshavardhana	7a5271ad96	fix: re-use connections in webhook/elasticsearch (#9461 ) - elasticsearch client should rely on the SDK helpers instead of pure HTTP calls. - webhook shouldn't need to check for IsActive() for all notifications, failure should be delayed. - Remove DialHTTP as its never used properly Fixes #9460	2020-04-28 13:57:56 -07:00
Praveen raj Mani	322385f1b6	fix: only show active/available ARNs in server startup banner (#9392 )	2020-04-21 09:38:32 -07:00
Klaus Post	c4464e36c8	fix: limit HTTP transport tunables to affordable values (#9383 ) Close connections pro-actively in transient calls	2020-04-17 11:20:56 -07:00
Harshavardhana	69fb68ef0b	fix simplify code to start using context (#9350 )	2020-04-16 10:56:18 -07:00
Sidhartha Mani	ec11e99667	implement configurable timeout for OBD tests (#9324 )	2020-04-14 11:48:32 -07:00
Harshavardhana	37d066b563	fix: deprecate requirement of session token for service accounts (#9320 ) This PR fixes couple of behaviors with service accounts - not need to have session token for service accounts - service accounts can be generated by any user for themselves implicitly, with a valid signature. - policy input for AddNewServiceAccount API is not fully typed allowing for validation before it is sent to the server. - also bring in additional context for admin API errors if any when replying back to client. - deprecate GetServiceAccount API as we do not need to reply back session tokens	2020-04-14 11:28:56 -07:00
Praveen raj Mani	bfec5fe200	fix: fetchLambdaInfo should return consistent results (#9332 ) - Introduced a function `FetchRegisteredTargets` which will return a complete set of registered targets irrespective to their states, if the `returnOnTargetError` flag is set to `False` - Refactor NewTarget functions to return non-nil targets - Refactor GetARNList() to return a complete list of configured targets	2020-04-14 11:19:25 -07:00
Harshavardhana	4314ee1670	fix: remove unused PerfInfoHandler code (#9328 ) - Removes PerfInfo admin API as its not OBDInfo - Keep the drive path without the metaBucket in OBD global latency map. - Remove all the unused code related to PerfInfo API - Do not redefine global mib,gib constants use humanize.MiByte and humanize.GiByte instead always	2020-04-12 19:37:09 -07:00
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280 ) This change is throughout the codebase to ensure that all codepaths honor GlobalContext	2020-04-09 09:30:02 -07:00
Harshavardhana	2642e12d14	fix: change policies API to return and take struct (#9181 ) This allows for order guarantees in returned values can be consumed safely by the caller to avoid any additional parsing and validation. Fixes #9171	2020-04-07 19:30:59 -07:00
Sidhartha Mani	c8243706b4	Add Parallel NetOBD tests to saturate all nodes at once (#9241 )	2020-03-31 17:08:28 -07:00
Sidhartha Mani	7b732b566f	[Bugfix] Fix Net tests being omitted (#9234 )	2020-03-31 01:15:21 -07:00
Sidhartha Mani	0c80bf45d0	Implement onboard diagnostics admin API (#9024 ) - Implement a graph algorithm to test network bandwidth from every node to every other node - Saturate any network bandwidth adaptively, accounting for slow and fast network capacity - Implement parallel drive OBD tests - Implement a paging mechanism for OBD test to provide periodic updates to client - Implement Sys, Process, Host, Mem OBD Infos	2020-03-26 21:07:39 -07:00
Harshavardhana	3d3beb6a9d	Add response header timeouts (#9170 ) - Add conservative timeouts up to 3 minutes for internode communication - Add aggressive timeouts of 30 seconds for gateway communication Fixes #9105 Fixes #8732 Fixes #8881 Fixes #8376 Fixes #9028	2020-03-21 22:10:13 -07:00
Harshavardhana	b4bfdc92cc	fix: admin console logger changes to log.Info	2020-03-20 15:14:14 -07:00
Harshavardhana	ae654831aa	Add madmin package context support (#9172 ) This is to improve responsiveness for all admin API operations and allowing callers to cancel any on-going admin operations, if they happen to be waiting too long.	2020-03-20 15:00:44 -07:00
Harshavardhana	cfd12914e1	fix: crash in serverInfo handler when ldap is configured (#9123 )	2020-03-11 23:13:32 -07:00
Anis Elleuch	fdf65aa9b9	heal: Add info about the next background healing round (#9122 ) - avoid setting last heal activity when starting self-healing This can be confusing to users thinking that the self healing cycle was already performed. - add info about the next background healing round	2020-03-11 23:00:31 -07:00
Nitish Tiwari	7c32f3f554	Fix the URL for MinIO update when using custom download server (#9111 ) Co-authored-by: Nitish Tiwari <nitish@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-03-11 20:09:20 +05:30
Anis Elleuch	d4dcf1d722	metrics: Use StorageInfo() instead to have consistent info (#9006 ) Metrics used to have its own code to calculate offline disks. StorageInfo() was avoided because it is an expensive operation by sending calls to all nodes. To make metrics & server info share the same code, a new argument `local` is added to StorageInfo() so it will only query local disks when needed. Metrics now calls StorageInfo() as server info handler does but with the local flag set to false. Co-authored-by: Praveen raj Mani <praveen@minio.io> Co-authored-by: Harshavardhana <harsha@minio.io>	2020-02-20 09:21:33 +05:30
Andreas Auernhammer	086fbb745e	fix and improve KMS server info (#8944 ) This commit fixes typos in the displayed server info w.r.t. the KMS and removes the update status. For more information about why the update status is removed see: PR #8943	2020-02-06 06:18:34 +05:30
Andreas Auernhammer	4f37c8ccf2	refine the KMS admin API (#8943 ) This commit removes the `Update` functionality from the admin API. While this is technically a breaking change I think this will not cause any harm because: - The KMS admin API is not complete, yet. At the moment only the status can be fetched. - The `mc` integration hasn't been merged yet. So no `mc` client could have used this API in the past. The `Update`/`Rewrap` status is not useful anymore. It provided a way to migrate from one master key version to another. However, KES does not support the concept of key versions. Instead, key migration should be implemented as migration from one master key to another. Basically, the `Update` functionality has been implemented just for Vault.	2020-02-05 22:47:35 +05:30
Anis Elleuch	52bdbcd046	Add new admin API to return Accounting Usage (#8689 )	2020-02-04 18:20:39 -08:00
Anis Elleuch	7432b5c9b2	Use user CAs in checkEndpoint() call (#8911 ) The server info handler makes a http connection to other nodes to check if they are up but does not load the custom CAs in ~/.minio/certs/CAs. This commit fix it. Co-authored-by: Harshavardhana <harsha@minio.io>	2020-02-02 07:15:29 +05:30
poornas	5d838edcef	Fix panic in ServerInfoHandler when (#8915 ) Co-authored-by: Harshavardhana <harsha@minio.io>	2020-02-01 17:50:04 +05:30
poornas	2232e095d5	Make admin permissions more granular for admin handlers. (#8888 )	2020-01-26 20:47:52 -06:00
Harshavardhana	f14f60a487	fix: Avoid double usage calculation on every restart (#8856 ) On every restart of the server, usage was being calculated which is not useful instead wait for sufficient time to start the crawling routine. This PR also avoids lots of double allocations through strings, optimizes usage of string builders and also avoids crawling through symbolic links. Fixes #8844	2020-01-21 14:07:49 -08:00
Klaus Post	2bf6cf0e15	Enable multiple concurrent profile types (#8792 )	2020-01-10 17:19:58 -08:00
Praveen raj Mani	4cd1bbb50a	This PR fixes two things (#8772 ) - Stop spawning store replay routines when testing the notification targets - Properly honor the target.Close() to clean the resources used Fixes #8707 Co-authored-by: Harshavardhana <harsha@minio.io>	2020-01-09 19:45:44 +05:30
Harshavardhana	6695fd6a61	Add more context aware error for policy parsing errors (#8726 ) In existing functionality we simply return a generic error such as "MalformedPolicy" which indicates just a generic string "invalid resource" which is not very meaningful when there might be multiple types of errors during policy parsing. This PR ensures that we send these errors back to client to indicate the actual error, brings in two concrete types such as - iampolicy.Error - policy.Error Refer #8202	2020-01-03 11:28:52 -08:00
Ashish Kumar Sinha	abc266caa1	Add bucket and object count along with total object size (#8639 )	2019-12-12 09:58:59 -08:00
Anis Elleuch	555969ee42	Add data usage collect with its new admin API (#8553 ) Admin data usage info API returns the following (Only FS & XL, for now) - Number of buckets - Number of objects - The total size of objects - Objects histogram - Bucket sizes	2019-12-12 06:02:37 -08:00
Ashish Kumar Sinha	e2c5d29017	Bucket,Object count & Usage removed if set to default (#8638 )	2019-12-11 21:56:47 -08:00
kannappanr	d266b3a066	Admin Info: Modify Uptime to return seconds (#8635 )	2019-12-11 17:56:02 -08:00
Ashish Kumar Sinha	24fb1bf258	New Admin Info (#8497 )	2019-12-11 14:27:03 -08:00
Nitish Tiwari	3df7285c3c	Add Support for Cache and S3 related metrics in Prometheus endpoint (#8591 ) This PR adds support below metrics - Cache Hit Count - Cache Miss Count - Data served from Cache (in Bytes) - Bytes received from AWS S3 - Bytes sent to AWS S3 - Number of requests sent to AWS S3 Fixes #8549	2019-12-05 23:16:06 -08:00
poornas	929951fd49	Add support for multiple admins (#8487 ) Also define IAM policies for administering MinIO server	2019-11-19 02:03:18 -08:00
Harshavardhana	e9b2bf00ad	Support MinIO to be deployed on more than 32 nodes (#8492 ) This PR implements locking from a global entity into a more localized set level entity, allowing for locks to be held only on the resources which are writing to a collection of disks rather than a global level. In this process this PR also removes the top-level limit of 32 nodes to an unlimited number of nodes. This is a precursor change before bring in bucket expansion.	2019-11-13 12:17:45 -08:00
Harshavardhana	822eb5ddc7	Bring in safe mode support (#8478 ) This PR refactors object layer handling such that upon failure in sub-system initialization server reaches a stage of safe-mode operation wherein only certain API operations are enabled and available. This allows for fixing many scenarios such as - incorrect configuration in vault, etcd, notification targets - missing files, incomplete config migrations unable to read encrypted content etc - any other issues related to notification, policies, lifecycle etc	2019-11-09 09:27:23 -08:00
Harshavardhana	4e63e0e372	Return appropriate errors API versions changes across REST APIs (#8480 ) This PR adds code to appropriately handle versioning issues that come up quite constantly across our API changes. Currently we were also routing our requests wrong which sort of made it harder to write a consistent error handling code to appropriately reject or honor requests. This PR potentially fixes issues - old mc is used against new minio release which is incompatible returns an appropriate for client action. - any older servers talking to each other, report appropriate error - incompatible peer servers should report error and reject the calls with appropriate error	2019-11-04 09:30:59 -08:00
Andreas Auernhammer	eac518b178	admin API: change returned HTTP error in hardware info (#8471 ) This commit replaces the returned error message by the hardware info handler from `Method-Not-Allowed` to `Bad-Request` since the current HTTP error is not correct according to the HTTP spec. In particular: ``` The origin server MUST generate an Allow header field in a 405 response containing a list of the target resource's currently supported methods. ``` From: https://tools.ietf.org/html/rfc7231#section-6.5.5	2019-10-30 23:41:18 -07:00
Harshavardhana	9e7a3e6adc	Extend further validation of config values (#8469 ) - This PR allows config KVS to be validated properly without being affected by ENV overrides, rejects invalid values during set operation - Expands unit tests and refactors the error handling for notification targets, returns error instead of ignoring targets for invalid KVS - Does all the prep-work for implementing safe-mode style operation for MinIO server, introduces a new global variable to toggle safe mode based operations NOTE: this PR itself doesn't provide safe mode operations	2019-10-30 23:39:09 -07:00
Harshavardhana	47b13cdb80	Add etcd part of config support, add noColor/json support (#8439 ) - Add color/json mode support for get/help commands - Support ENV help for all sub-systems - Add support for etcd as part of config	2019-10-30 00:04:39 -07:00
Harshavardhana	ee4a6a823d	Migrate config to KV data format (#8392 ) - adding oauth support to MinIO browser (#8400) by @kanagaraj - supports multi-line get/set/del for all config fields - add support for comments, allow toggle - add extensive validation of config before saving - support MinIO browser to support proper claims, using STS tokens - env support for all config parameters, legacy envs are also supported with all documentation now pointing to latest ENVs - preserve accessKey/secretKey from FS mode setups - add history support implements three APIs - ClearHistory - RestoreHistory - ListHistory - add help command support for each config parameters - all the bug fixes after migration to KV, and other bug fixes encountered during testing.	2019-10-22 22:59:13 -07:00
Praveen raj Mani	8836d57e3c	The prometheus metrics refactoring (#8003 ) The measures are consolidated to the following metrics - `disk_storage_used` : Disk space used by the disk. - `disk_storage_available`: Available disk space left on the disk. - `disk_storage_total`: Total disk space on the disk. - `disks_offline`: Total number of offline disks in current MinIO instance. - `disks_total`: Total number of disks in current MinIO instance. - `s3_requests_total`: Total number of s3 requests in current MinIO instance. - `s3_errors_total`: Total number of errors in s3 requests in current MinIO instance. - `s3_requests_current`: Total number of active s3 requests in current MinIO instance. - `internode_rx_bytes_total`: Total number of internode bytes received by current MinIO server instance. - `internode_tx_bytes_total`: Total number of bytes sent to the other nodes by current MinIO server instance. - `s3_rx_bytes_total`: Total number of s3 bytes received by current MinIO server instance. - `s3_tx_bytes_total`: Total number of s3 bytes sent by current MinIO server instance. - `minio_version_info`: Current MinIO version with commit-id. - `s3_ttfb_seconds_bucket`: Histogram that holds the latency information of the requests. And this PR also modifies the current StorageInfo queries - Decouples StorageInfo from ServerInfo . - StorageInfo is enhanced to give endpoint information. NOTE: ADMIN API VERSION IS BUMPED UP IN THIS PR Fixes #7873	2019-10-22 21:01:14 -07:00
Ashish Kumar Sinha	18cb15559d	Add network hardware info (#8358 ) peerRESTVersion changed to v6	2019-10-17 04:09:49 -07:00
Harshavardhana	d48fd6fde9	Remove unused params and functions (#8399 )	2019-10-15 18:35:41 -07:00
poornas	d7060c4c32	Allow logging targets to be configured to receive `minio` (#8347 ) specific errors, `application` errors or `all` by default. console logging on server by default lists all logs - enhance admin console API to accept `type` as query parameter to subscribe to application/minio logs.	2019-10-11 18:50:54 -07:00
Harshavardhana	290ad0996f	Move etcd, logger, crypto into their own packages (#8366 ) - Deprecates _MINIO_PROFILER, `mc admin profile` does the job - Move ENVs to common location in cmd/config/	2019-10-08 11:17:56 +05:30
Ashish Kumar Sinha	74008446fe	CPU hardware info (#8187 )	2019-10-03 20:18:38 +05:30
Bala FA	2a2ff96ee1	change `ReadPerf` into `ReadThroughput` in NetPerfInfo. (#8316 ) Previously `ReadPerf` was in time.Duration is changed to `ReadThroughput` in uint64.	2019-09-27 00:01:18 +05:30
Harshavardhana	fd53057654	Add InfoCannedPolicy API to fetch only necessary policy (#8307 ) This PR adds - InfoCannedPolicy() API for efficiency in fetching policies - Send group memberships for LDAPUser if available	2019-09-26 23:53:13 +05:30
Harshavardhana	9ac12cf898	Remove unused Set/GetConfigKeys API (#8235 )	2019-09-13 16:34:34 -07:00
Krishnan Parthasarathi	6ba323b009	Add ability to test drive speeds on a MinIO setup (#7664 ) - Extends existing Admin API to measure disk performance	2019-09-13 03:22:30 +05:30
Harshavardhana	73e4e99942	Hosts should be skipped, when calculating local info (#8191 ) endpoint.IsLocal will not have .Host entries so using them to skip double entries will never work. change the code such that we look for endpoint.Host outside of endpoint.IsLocal logic to skip double hosts appropriately. Move these functions to their appropriate file.	2019-09-12 23:36:12 +05:30
Aditya Manthramurthy	a0456ce940	LDAP STS API (#8091 ) Add LDAP based users-groups system This change adds support to integrate an LDAP server for user authentication. This works via a custom STS API for LDAP. Each user accessing the MinIO who can be authenticated via LDAP receives temporary credentials to access the MinIO server. LDAP is enabled only over TLS. User groups are also supported via LDAP. The administrator may configure an LDAP search query to find the group attribute of a user - this may correspond to any attribute in the LDAP tree (that the user has access to view). One or more groups may be returned by such a query. A group is mapped to an IAM policy in the usual way, and the server enforces a policy corresponding to all the groups and the user's own mapped policy. When LDAP is configured, the internal MinIO users system is disabled.	2019-09-10 04:42:29 +05:30
Harshavardhana	b52a3e523c	Avoid using fastjson parser pool, move back to jsoniter (#8190 ) It looks like from implementation point of view fastjson parser pool doesn't behave the same way as expected when dealing many `xl.json` from multiple disks. The fastjson parser pool usage ends up returning incorrect xl.json entries for checksums, with references pointing to older entries. This led to the subtle bug where checksum info is duplicated from a previous xl.json read of a different file from different disk.	2019-09-06 04:21:27 +05:30
Andreas Auernhammer	810a44e951	KMS Admin-API: add route and handler for KMS key info (#7955 ) This commit adds an admin API route and handler for requesting status information about a KMS key. Therefore, the client specifies the KMS key ID (when empty / not set the server takes the currently configured default key-ID) and the server tries to perform a dummy encryption, re-wrap and decryption operation. If all three succeed we know that the server can access the KMS and has permissions to generate, re-wrap and decrypt data keys (policy is set correctly).	2019-09-05 01:49:44 +05:30
poornas	8a71b0ec5a	Add admin API to send console log messages (#7784 ) Utilized by mc admin console command.	2019-09-03 23:40:48 +05:30
Bala FA	fa3546bb03	Add NetPerfInfo() API in madmin (#8112 )	2019-08-31 08:27:53 +05:30
Aditya Manthramurthy	847a3ea0a2	Add unit tests and refactor to improve coverage (#7617 )	2019-08-29 13:53:27 -07:00
Harshavardhana	83d4c5763c	Decouple ServiceUpdate to ServerUpdate to be more native (#8138 ) The change now is to ensure that we take custom URL as well for updating the deployment, this is required for hotfix deliveries for certain deployments - other than the community release. This commit changes the previous work `d65a2c6725` with newer set of requirements. Also deprecates PeerUptime()	2019-08-28 15:04:43 -07:00
Harshavardhana	d65a2c6725	Implement cluster-wide in-place updates (#8070 ) This PR is a breaking change and also deprecates `minio update` command, from this release onwards all users are advised to just use `mc admin update`	2019-08-27 11:37:47 -07:00
Bala FA	60f52f461f	add network read performance collection support. (#8038 ) ReST API on /minio/admin/v1/performance?perfType=net[?size=N] returns ``` { "PEER-1": [ { "addr": ADDR, "readPerf": DURATION, "error": ERROR, }, ... ], ... ... "PEER-N": [ { "addr": ADDR, "readPerf": DURATION, "error": ERROR, }, ... ] } ```	2019-08-19 08:26:32 +05:30
Aditya Manthramurthy	bf9b619d86	Set the policy mapping for a user or group (#8036 ) Add API to set policy mapping for a user or group Contains a breaking Admin APIs change. - Also enforce all applicable policies - Removes the previous /set-user-policy API Bump up peerRESTVersion Add get user info API to show groups of a user	2019-08-13 13:41:06 -07:00
Harshavardhana	e6d8e272ce	Use const slashSeparator instead of "/" everywhere (#8028 )	2019-08-06 12:08:58 -07:00
Aditya Manthramurthy	414a7eca83	Add IAM groups support (#7981 ) This change adds admin APIs and IAM subsystem APIs to: - add or remove members to a group (group addition and deletion is implicit on add and remove) - enable/disable a group - list and fetch group info	2019-08-02 14:25:00 -07:00
Harshavardhana	123cccaed1	Honor connection pooling while tracing (#7979 ) This PR fixes relying on r.Context().Done() by setting ``` Connection: "close" ``` HTTP Header, this has detrimental issues for client side connection pooling. Since this header explicitly tells clients to turn-off connection pooling. This causing pro-active connections to be closed leaving many conn's in TIME_WAIT state. This can be observed with `mc admin trace -a` when running distributed setup. This PR also fixes tracing filtering issue when bucket names have `minio` as prefixes, trace was erroneously ignoring them.	2019-07-31 11:08:39 -07:00
Aditya Manthramurthy	7bdaf9bc50	Update on-disk storage format for users system (#7949 ) - Policy mapping is now at `config/iam/policydb/users/myuser1.json` and includes version. - User identity file is now versioned. - Migrate old data to the new format.	2019-07-24 17:34:23 -07:00
poornas	0373a1699b	Add error filter to admin trace API (#7923 ) This allows MinIO to have the ability to send back only error trace	2019-07-20 01:38:26 +01:00
Krishnan Parthasarathi	fbfc9a61ec	Add node address information to logs (#7941 )	2019-07-18 09:58:37 -07:00

356 Commits