minio

Author	SHA1	Message	Date
Klaus Post	c5b3a675fa	Block profiling tweaks (#11612) The base profiles contain no valuable data, so don't record them. Reduce the block profiling rate by 2 orders of magnitude; this should still capture equally valuable data with less CPU strain.	2021-02-27 09:22:14 -08:00
Harshavardhana	79b6a43467	fix: avoid timed value for network calls (#11531) Additionally, simplify timedValue to use an RWMutex so that concurrent calls to DiskInfo() are not serialized. This affects all calls that use GetDiskInfo() on the same disks, such as getOnlineDisks and getOnlineDisksWithoutHealing.	2021-02-12 18:17:52 -08:00
Harshavardhana	2a7b123895	turn off http2 for TLS setups for now (#11523) Due to many issues with x/net/http2, both it and the bundled h2_bundle.go in the Go runtime should be avoided for now. https://github.com/golang/go/issues/23559 https://github.com/golang/go/issues/42534 https://github.com/golang/go/issues/43989 https://github.com/golang/go/issues/33425 https://github.com/golang/go/issues/29246 With this collection of issues present, it makes sense to remove HTTP2 support for now.	2021-02-11 15:53:04 -08:00
Harshavardhana	6cd255d516	fix: allow updated domain names in federation (#11365) Additionally, disallow overlapping domain names.	2021-01-28 11:44:48 -08:00
Harshavardhana	f903cae6ff	Support variable server pools (#11256) The current implementation requires server pools to have the same erasure stripe size, to provide the same SLA and expectations. This PR allows server pools to be variadic, i.e. they do not have to share the same erasure stripe size; instead they must meet the SLA for the parity ratio. If the new server pool cannot guarantee the parity ratio, the deployment is rejected, i.e. server pool expansion is not allowed.	2021-01-16 12:08:02 -08:00
Harshavardhana	8565cefe4e	fix: allow HTTP2.0 to always be configured	2020-12-22 16:32:58 -08:00
Harshavardhana	5c451d1690	update x/net/http2 to address a few bugs (#11144) Additionally, configure http2 health-check values to quickly detect unstable connections and let them time out. Also use a single transport for proxying requests.	2020-12-21 21:42:38 -08:00
Harshavardhana	790833f3b2	Revert "Support variable server sets (#10314)" This reverts commit `aabf053d2f`.	2020-12-01 12:02:29 -08:00
Harshavardhana	aabf053d2f	Support variable server sets (#10314)	2020-11-25 16:28:47 -08:00
Harshavardhana	bd2131ba34	add DNS cache support to avoid DNS flooding (#10693) The Go stdlib resolver doesn't cache DNS resolutions. Since we compile with CGO disabled, we are more prone to DNS flooding, because every network call resolves names against the DNS server. In various containerized environments such as VMware this becomes a problem because no DNS cache is available, and we may end up overloading the kube-dns resolver under concurrent I/O. To circumvent this issue, implement a DNSCache resolver that resolves DNS names and caches them for around 10 seconds, attempting invalidation every 3 seconds.	2020-10-16 14:49:05 -07:00
Harshavardhana	2760fc86af	Bump default idleConnsPerHost to control conns in time_wait (#10653) This PR fixes a hang that occurs quite commonly at higher concurrency with the following changes: - fewer connections in time_wait allow faster socket opens - a lower idle connection timeout lets the kernel reclaim time_wait connections quickly - increase somaxconn to 4096 instead of 2048 to allow larger TCP SYN backlogs. Fixes #10413	2020-10-12 14:19:46 -07:00
Harshavardhana	736e58dd68	fix: handle concurrent lockers with multiple optimizations (#10640) - select lockers that are non-local and online, to have affinity towards remote servers for lock contention - optimize the lock retry interval to avoid sending too many messages during lock contention, which also reduces average CPU usage - when deleteObject fails, make sure setPutObjHeaders() honors lifecycle only if the bucket name is set - fix top locks to always list the oldest lockers, instead of getting bogged down by the map's unordered nature.	2020-10-08 12:32:32 -07:00
Harshavardhana	1f9abbee4d	make sure to release locks upon timeout (#10596) Fixes #10418	2020-09-29 15:18:34 -07:00
Harshavardhana	37a5d5d7a0	reduce timeouts between servers for faster disconnects (#10562)	2020-09-24 20:10:07 -07:00
Krishna Srinivas	230fc0d186	Support for "directory" objects (#10499)	2020-09-19 08:39:41 -07:00
Klaus Post	b7438fe4e6	Copy metadata before spawning goroutine + prealloc maps (#10458) In `(*cacheObjects).GetObjectNInfo`, copy the metadata before spawning a goroutine. Clean up a few map[string]string copies as well, reducing allocs and simplifying the code. Fixes #10426	2020-09-10 11:37:22 -07:00
Harshavardhana	c13afd56e8	Remove MaxConnsPerHost settings to avoid potential hangs (#10438) MaxConnsPerHost can hang a call with no way to time out. We do not need this setting for our proxy and gateway implementations; the IdleConn settings are good enough. Also make sure to use NewRequestWithContext, and take disks offline only for network errors. Fixes #10304	2020-09-08 14:22:04 -07:00
Anis Elleuch	46ee8659b4	fix write quorum calculation for bucket operations (#10364) When the number of disks is odd, the quorum calculation for bucket operations was incorrect; fix it.	2020-08-27 12:55:32 -07:00
Harshavardhana	1e2ebc9945	feat: time to bring back http2.0 support (#10230) Bonus: move our CI/CD to go1.14	2020-08-10 09:02:29 -07:00
Harshavardhana	0b8255529a	fix: proxies set keep-alive timeouts to be system dependent (#10199) Split DialContext into one for internode communication and another for all other external communication, especially proxy forwarders, gateway transport, etc.	2020-08-04 14:55:53 -07:00
Klaus Post	968342c732	Remove usage of go-ieproxy for Windows (#10009) There is a potential for deadlock on Windows 10 (refer to https://github.com/mattn/go-ieproxy/issues/17); remove this dependency for now.	2020-07-10 12:08:14 -07:00
Harshavardhana	72e0745e2f	fix: migrate to go.etcd.io import path (#9987) With the merge of https://github.com/etcd-io/etcd/pull/11823, etcd v3.5.0 will have a properly versioned import path; this resolves our pending migration to the newer repo.	2020-07-07 19:04:29 -07:00
Klaus Post	aa4d1021eb	Remove timeout from putobject and listobjects (#9986) Use a separate client for these calls, which can take a long time. Instead, add the request context to them so they are canceled when the client disconnects, except for ListObject, which doesn't have any equivalent.	2020-07-07 12:19:57 -07:00
Harshavardhana	e59ee14f40	Tune TCP keep-alives with new kernel timeout options (#9963) For a deeper understanding, see https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/	2020-07-03 10:03:41 -07:00
Harshavardhana	4915433bd2	Support bucket versioning (#9377) - Implement a new xl.json 2.0.0 format to support versioning; this moves the entire marshaling logic to the POSIX layer, and the top layer always consumes a common FileInfo construct, which simplifies metadata reads - Implement list object versions - Migrate from crchash to siphash for object placement in new deployments. Fixes #2111	2020-06-12 20:04:01 -07:00
Klaus Post	95814359bd	cache disk info to avoid repeated calls (#9682) This value is requested on every upload when there are multiple zones. Since each request results in an RPC call to every remote disk, this scales quite badly in a distributed setup. Reload the value at 1-second intervals. Benchmark with 2 servers, localhost only; in large distributed setups much bigger gains can be expected. ``` Operations: 21743 -> 22454 * Average: +3.28% (+0.0 MiB/s) throughput, +3.28% (+11.9) obj/s * Fastest: +3.37% (+0.0 MiB/s) throughput, +3.37% (+13.0) obj/s * 50% Median: +3.03% (+0.0 MiB/s) throughput, +3.03% (+11.2) obj/s * Slowest: +8.03% (+0.0 MiB/s) throughput, +8.03% (+22.8) obj/s ``` For easy management of this, a generic helper has been added.	2020-05-26 12:52:24 -07:00
Harshavardhana	6ac48a65cb	fix: use unused cacheMetrics code in prometheus (#9588) Remove all other unused/dead code.	2020-05-13 08:15:26 -07:00
Anis Elleuch	6d76efb9bb	Add support for TCP fast open in internode calls (#9486)	2020-05-08 14:33:23 -07:00
Harshavardhana	60d415bb8a	deprecate/remove global WORM mode (#9436) Global WORM mode is a complex feature whose time has passed. With the advent of the S3-compatible object locking and retention implementation, global WORM is effectively deprecated; this has been mentioned in our documentation for some time, and now the time has come for it to go.	2020-04-24 16:37:05 -07:00
poornas	582953260b	Increase response header timeout for gateway (#9400) Fixes #9295	2020-04-21 19:21:27 -07:00
Sidhartha Mani	3e78ea8acc	improve obd tests and optimize network (#9378) - keep long-running OBD network tests alive - fix error: wrong number of parents in process OBD info - ensure that osinfo does not error out inside containers - remove the limit on the maximum number of connections per client transport. The generic client transport uses a default limit of 64 conns per transport, which could limit and throttle usage and artificially slow down MinIO even on hardware capable of doing better.	2020-04-18 11:06:11 -07:00
Klaus Post	c4464e36c8	fix: limit HTTP transport tunables to affordable values (#9383) Close connections proactively in transient calls.	2020-04-17 11:20:56 -07:00
Harshavardhana	f44cfb2863	use GlobalContext whenever possible (#9280) This change spans the codebase to ensure that all code paths honor GlobalContext.	2020-04-09 09:30:02 -07:00
Harshavardhana	30707659b5	[feature] allow for an odd number of erasure packs (#9221) Too many deployments come up with an odd number of hosts or drives; to facilitate even distribution in those setups, allow packs based on odd and prime numbers.	2020-03-31 09:32:16 -07:00
Sidhartha Mani	0c80bf45d0	Implement onboard diagnostics admin API (#9024) - Implement a graph algorithm to test network bandwidth from every node to every other node - Saturate any network bandwidth adaptively, accounting for slow and fast network capacity - Implement parallel drive OBD tests - Implement a paging mechanism for OBD tests to provide periodic updates to the client - Implement Sys, Process, Host, and Mem OBD infos	2020-03-26 21:07:39 -07:00
Anis Elleuch	791821d590	sa: Allow empty policy to indicate parent user's policy is inherited (#9185)	2020-03-23 14:17:18 -07:00
Harshavardhana	3d3beb6a9d	Add response header timeouts (#9170) - Add conservative timeouts of up to 3 minutes for internode communication - Add aggressive timeouts of 30 seconds for gateway communication. Fixes #9105 Fixes #8732 Fixes #8881 Fixes #8376 Fixes #9028	2020-03-21 22:10:13 -07:00
Anis Elleuch	23a0415eb7	profiling: Fix crash when enabling goroutines profiling (#9097) This commit replaces 'goroutines' with 'goroutine' when passing the profile name to the pprof library to activate goroutine profiling.	2020-03-06 13:22:47 -08:00
poornas	9fc7537f2a	Enforce md5sum checks for object retention APIs (#9030) This PR enforces md5sum verification for the following APIs to be compatible with the AWS S3 spec: - PutObjectRetention - PutObjectLegalHold Co-authored-by: Harshavardhana <harsha@minio.io>	2020-03-04 07:04:12 -08:00
Klaus Post	f1b2462193	Add goroutine profiles (#9078) Allow downloading a goroutine dump to help detect leaks or overuse of goroutines. Extensions are now type-dependent. Change the `profiling` -> `profile` prefix, since that is what they are, not the abstract concept.	2020-03-04 06:58:12 -08:00
Harshavardhana	712e82344c	acl: Support PUT calls with success for 'private' ACLs (#9000) Add dummy calls that respond with success when ACLs are set to private, and fail if the user tries to change them from their default 'private'. Some applications such as Nuxeo may have an unnecessary requirement for this operation; we support it anyway without fully implementing the functionality, so that we can respond with success for default ACLs.	2020-02-16 11:37:52 +05:30
Harshavardhana	c56c2f5fd3	fix routing issue for esoteric characters in gorilla/mux (#8967) The first step is to ensure that the Path component is not decoded by gorilla/mux, to avoid routing issues when handling certain characters while uploading through PutObject(). Delay the decoding and use PathUnescape() to unescape the `object` path component. Thanks to @buengese and @ncw for neat test cases for us to test with. Fixes #8950 Fixes #8647	2020-02-12 09:08:02 +05:30
Harshavardhana	d7dc9aaf52	fix: remove response header timeout (#8919) Adding a response header timeout seems to cause premature timeouts, which lead to potential disconnections.	2020-02-01 08:31:55 +05:30
Klaus Post	c7178d2066	Profiling: Add base, fix memory profiling (#8850) For 'snapshot' type profiles, record a 'before' profile that can be used as `go tool pprof -base=before ...` to compare before and after. "Before" profiles are included in the zipped package. [`runtime.MemProfileRate`](https://golang.org/pkg/runtime/#pkg-variables) should not be updated while the application is running, so we set it at startup. Co-authored-by: Harshavardhana <harsha@minio.io>	2020-01-21 15:49:25 -08:00
Harshavardhana	f14f60a487	fix: Avoid double usage calculation on every restart (#8856) On every restart of the server, usage was being calculated, which is not useful; instead, wait a sufficient time before starting the crawling routine. This PR also avoids many double allocations through strings, optimizes the use of string builders, and avoids crawling through symbolic links. Fixes #8844	2020-01-21 14:07:49 -08:00
poornas	60e60f68dd	Add support for object locking with legal hold (#8634)	2020-01-16 15:41:56 -08:00
Klaus Post	d8660b30cc	Reduce MemProfileRate (#8814) Enabling memory profiling has a significant impact on performance. Reduce the profiling rate by 2 orders of magnitude. It is still 128x smaller than the default, so it should be plenty.	2020-01-14 16:18:45 -08:00
poornas	30922148fb	Fix bug preventing overwrite of object if object lock config is enabled for a bucket (#8796) Creating a bucket with object lock configuration enabled does not automatically cause WORM protection to be applied. The PUT operation needs to explicitly request object locking, or the bucket has to have default retention settings configured. Fixes regression introduced in #8657	2020-01-13 17:29:31 -08:00
Klaus Post	2bf6cf0e15	Enable multiple concurrent profile types (#8792)	2020-01-10 17:19:58 -08:00
Harshavardhana	5aa5dcdc6d	lock: improve locker initialization at init (#8776) Use reference format to initialize lockers during startup; also handle `nil` for NetLocker in dsync and remove the errorLocker implementation. Add further tuning parameters such as - DialTimeout is now 15 seconds, down from 30 seconds - KeepAliveTimeout is now 20 seconds, 5 seconds more than the default 15 seconds - ResponseHeaderTimeout is set to 10 seconds - ExpectContinueTimeout is reduced to 3 seconds - DualStack is enabled by default, so remove setting it to `true` - Reduce IdleConnTimeout to 30 seconds from 1 minute to avoid idleConn build-up. Fixes #8773	2020-01-10 02:35:06 -08:00


145 Commits